5072 Commits

Author SHA1 Message Date
Philip Reames
31c37a4a5e
[RISCV][TTI] Adjust VLS shuffle costing to account for sub-mask reuse (#129793)
If we have a shuffle which can be split via VLA where two or more of the
destinations have exactly the same elements, then we only need to
account for them once in costing. The duplicate copies are (at
worst) whole register moves.

Note that this change only handles the single source case. Doing the
multiple source case seemed a bit more complicated, and I didn't have a
motivating test case.
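
A rough illustration of the sub-mask reuse idea for the single source case. This is only a sketch, not the RISCV TTI code: `count_unique_submasks` and the `elts_per_reg` parameter are hypothetical names, and it assumes the mask splits evenly into one sub-mask per destination register.

```python
def count_unique_submasks(mask, elts_per_reg):
    """Split a single-source shuffle mask into per-destination-register
    sub-masks and report (distinct sub-masks, total sub-masks).

    Hypothetical helper: only the distinct sub-masks would need real
    shuffle work; repeats are at worst whole register moves.
    """
    if len(mask) % elts_per_reg != 0:
        raise ValueError("mask length must be a multiple of elts_per_reg")
    submasks = [
        tuple(mask[i:i + elts_per_reg])
        for i in range(0, len(mask), elts_per_reg)
    ]
    return len(set(submasks)), len(submasks)


# Three destination registers, two of which want the same elements.
unique, total = count_unique_submasks([0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7], 4)
```

Here `unique` is 2 and `total` is 3, so only two of the three destinations would be charged a shuffle cost under this model.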
2025-03-29 15:18:44 -07:00
David Green
9c6eca28cb [AArch64] Return an invalid cost for vscale x 2 x i128 srem.
This protects against invalid size requests on scalable vectors by checking the
original VT, not the legalized type when checking for scalars. The cost
returned is now invalid, which lines up with the codegen not being able to
produce a result.
2025-03-29 19:25:17 +00:00
Florian Hahn
8bdcd0a96e
[LAA] Add missing test coverage for retrying with runtime checks.
Adds extra test coverage showing the change made by
https://github.com/llvm/llvm-project/pull/128045.
2025-03-27 19:09:10 +00:00
David Green
c6406c8dba
[AArch64] Add getVectorInstrCost Codesize costs handling. (#130946)
We have a lot of missing Codesize costs for vector operations. This
patch starts things off by adding codesize costs for getVectorInstrCost,
returning a single cost instead of the VectorInsertExtractBaseCost
(which is typically 2). Inserts of a load are given a cost of 0 as they
use ld1, otherwise the cost is 1.
2025-03-27 17:25:02 +00:00
David Green
e2202b944b
[AArch64] Update costs for scalarizing i64->f32 int_to_fp. (#132366)
After #130665 these operations are scalarized to avoid
double-rounding. This updates the cost model to match.

In the future we might be able to use SVE instructions to help, but for
the moment the costs should be higher. CodeSize and Latency costs are
not yet expected to be accurate. The vector insert/extract will use the
cost of VectorInsertExtractBaseCost (2 by default).
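
The double-rounding problem mentioned above can be shown with a worked example. This is a sketch in plain Python, not backend code: `int_to_f32_direct` and `round_f64_to_f32` are hypothetical helpers, and the input is chosen so that going through f64 first lands exactly on an f32 tie.

```python
import struct


def round_f64_to_f32(value):
    """Round a Python float (f64) to the nearest f32, returned as a float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def int_to_f32_direct(x):
    """Round a positive integer to f32 in one step (nearest, ties to even).

    Returns the rounded value as an integer; overflow is not handled.
    """
    shift = x.bit_length() - 24  # f32 keeps 24 significant bits
    if shift <= 0:
        return x  # already exactly representable
    kept, dropped = x >> shift, x & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if dropped > half or (dropped == half and kept & 1):
        kept += 1
    return kept << shift


x = 2**60 + 2**36 + 1                      # just above an f32 halfway point
direct = int_to_f32_direct(x)              # single rounding: rounds up
via_f64 = int(round_f64_to_f32(float(x)))  # f64 drops the +1, then the tie rounds down
```

With this input, `direct` is `2**60 + 2**37` while `via_f64` is `2**60`, so the two-step conversion is off by one f32 ulp.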
2025-03-26 07:26:17 +00:00
Alex MacLean
fd3a6b6005
[NVPTX] Improve modeling of inline PTX (#130675)
Improve the modeling of the memory effects and instruction cost of
inline assembly.

- MemoryEffects: The CUDA spec states that inline assembly is not
assumed to have any side-effects or read or write to memory. An inline
assembly may be treated as NoModRef unless it is explicitly marked as
having side effects or has an explicit memory clobber.
https://docs.nvidia.com/cuda/inline-ptx-assembly/index.html#incorrect-optimization

> Normally any memory that is written to will be specified as an out
operand, but if there is a hidden read or write on user memory (for
example, indirect access of a memory location via an operand), or if you
want to stop any memory optimizations around the asm() statement
performed during generation of PTX, you can add a “memory” clobbers
specification after a 3rd colon.

- InstructionCost: This change implements a very rough string-parsing
system to count the number of instructions in an inline-asm. There are
corner cases it will not handle well, but in general this is an
improvement over the current cost of the number of arguments plus one.
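
To give a feel for what such rough string parsing could look like, here is a minimal sketch. It is not the parser from this patch: `estimate_inline_asm_instructions` is a hypothetical helper that assumes statements are separated by `;` and that anything starting with `.` is a directive. Labels and comments are among the corner cases it ignores.

```python
def estimate_inline_asm_instructions(asm):
    """Roughly count the instructions in an inline PTX string.

    Hypothetical heuristic: split on ';', skip empty statements and
    directives. Labels and comments are not handled.
    """
    count = 0
    for statement in asm.split(";"):
        # Drop surrounding whitespace and braces that open or close a scope.
        text = statement.strip(" \t\r\n{}")
        if not text:
            continue  # empty statement, e.g. after the final ';'
        if text.startswith("."):
            continue  # directive such as ".reg .u32 t", not an instruction
        count += 1
    return count


ptx = "{ .reg .u32 t; add.u32 t, %1, %2; mul.lo.u32 %0, t, t; }"
estimated = estimate_inline_asm_instructions(ptx)
```

For this string the estimate is 2 (the `add` and the `mul`), whereas a cost based on the number of arguments plus one would not depend on the asm text at all.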
2025-03-25 13:46:16 -07:00
Graham Hunter
f737df73a3
[AArch64][CostModel] Increase the cost of illegal SVE int-to-fp converts (#130756)
If a scalable vector uitofp or sitofp effectively extends the size of
each element as part of the conversion, the AArch64 backend
may need to plant multiple unpacks before converting. Increase
the cost in those cases to account for this.
2025-03-25 10:43:44 +00:00
Simon Pilgrim
b62e149f06 [CostModel][X86] check fma cost kinds using -cost-kind=all 2025-03-20 17:20:47 +00:00
Nikita Popov
38e8dff84b
[AA][BasicAA] Move more call logic to BasicAA (#131144)
Currently, the handling for calls is split between AA and BasicAA in an
awkward way. BasicAA does argument alias analysis for non-escaping
objects (but without considering MemoryEffects), while AA handles the
generic case using MemoryEffects. However, fundamentally, both of these
are really trying to do the same thing.

The new merged logic first tries to remove the OtherMR component of the
memory effects, which includes accesses to escaped memory. If a
function-local object does not escape, OtherMR can be set to NoModRef.

Then we perform the argument scan in basically the same way as AA
previously did. However, we also need to look at the operand bundles. To
support that, I've adjusted getArgModRefInfo to accept operand bundle
arguments.
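
The merged flow described above can be modelled roughly as follows. This is a simplified sketch, not the BasicAA implementation: `ModRef` and `call_mod_ref` are hypothetical stand-ins, memory effects are reduced to an argument component and an "other" component, and operand bundles are not modelled.

```python
from enum import IntFlag


class ModRef(IntFlag):
    NO_MOD_REF = 0
    REF = 1
    MOD = 2
    MOD_REF = 3


def call_mod_ref(arg_mem, other_mem, object_escapes, args):
    """Model how a call may access one function-local object.

    arg_mem / other_mem: the call's effects on argument memory and on
    other (including escaped) memory.
    args: list of (may_alias_object, arg_mod_ref) pairs, one per pointer
    argument.
    """
    # Step 1: a non-escaping object cannot be reached through "other" memory.
    other = other_mem if object_escapes else ModRef.NO_MOD_REF
    # Step 2: scan the arguments that may alias the object.
    arg_mr = ModRef.NO_MOD_REF
    for may_alias, mr in args:
        if may_alias:
            arg_mr |= mr
    # Argument accesses are further limited by the call's argmem effects.
    return (arg_mr & arg_mem) | other


# A call that only reads its arguments but may do anything to other memory.
result = call_mod_ref(ModRef.REF, ModRef.MOD_REF, False, [(True, ModRef.MOD_REF)])
```

In this model the non-escaping object passed as an argument is only read (`ModRef.REF`). If it is not passed at all, the result is `NO_MOD_REF`.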
2025-03-19 15:44:52 +01:00
Simon Pilgrim
0cb9c5045b [CostModel][X86] check fp<->int conversion cost kinds using -cost-kind=all 2025-03-19 14:16:08 +00:00
Nashe Mncube
4ddc8df6ca
[CostModel][ARM]Adjust cost of muls in (U/S)MLAL and patterns (#122713)
PR #117350 made changes to the SLP vectorizer which introduced a
regression on some ARM benchmarks. Investigation narrowed it down to
suboptimal codegen for benchmarks that previously only used scalar (U/S)MLAL
instructions. The linked change meant the SLPVectorizer thought that
these could be vectorized. This change makes the cost of muls in
(U/S)MLAL patterns slightly cheaper to make sure scalar instructions are
preferred in these cases over SLP vectorization on targets supporting DSP.
2025-03-19 12:25:44 +00:00
Simon Pilgrim
945ce9642b
[CostModel][X86] check all reduction cost kinds using -cost-kind=all (#132000) 2025-03-19 11:26:10 +00:00
Simon Pilgrim
4686b8a663
[CostModel][X86] merge masked intrinsics costs tests using -cost-kind=all (#131999) 2025-03-19 11:25:54 +00:00
Simon Pilgrim
6ca1424fc1
[CostModel][X86] merge fmaxnum/fminnum costs tests using -cost-kind=all (#131922) 2025-03-19 10:02:38 +00:00
Simon Pilgrim
e9daafdd5e
[CostModel][X86] merge integer comparison costs tests using -cost-kind=all (#131875) 2025-03-19 10:01:31 +00:00
Simon Pilgrim
841d6c45f3
[CostModel][X86] merge fp comparison costs tests using -cost-kind=all (#131874) 2025-03-19 10:01:19 +00:00
Simon Pilgrim
7cd9b3fcec
[CostModel][X86] merge truncation costs tests using -cost-kind=all (#131872) 2025-03-19 10:00:49 +00:00
Simon Pilgrim
61b0bf5e01
[CostModel][X86] merge funnel shifts costs tests using -cost-kind=all (#131867) 2025-03-19 10:00:16 +00:00
Simon Pilgrim
0f2fb2b5c5
[CostModel][X86] merge integer multiply costs tests using -cost-kind=all (#131864) 2025-03-19 09:20:56 +00:00
Simon Pilgrim
72240fae4a
[CostModel][X86] merge select costs tests using -cost-kind=all (#131865) 2025-03-19 09:19:07 +00:00
Simon Pilgrim
a6c09d40ed
[CostModel][X86] merge integer div/rem costs tests using -cost-kind=all (#131873) 2025-03-18 21:46:32 +00:00
David Green
b42f8ec26d [AArch64] Update a number of costmodel tests with -cost-kind=all. NFC 2025-03-18 18:48:35 +00:00
Simon Pilgrim
a5a9b2b92f
[CostModel][X86] merge integer arithmetic costs tests using -cost-kind=all (#131840) 2025-03-18 17:26:31 +00:00
Simon Pilgrim
40c6f89841
[CostModel][X86] merge fp arithmetic costs tests using -cost-kind=all (#131839) 2025-03-18 17:24:05 +00:00
Simon Pilgrim
168177a0bd
[CostModel][X86] merge arithmetic integer min/max costs tests using -cost-kind=all (#131834) 2025-03-18 17:08:52 +00:00
Simon Pilgrim
24fbf9dd42
[CostModel][X86] merge saturated arithmetic costs tests using -cost-kind=all (#131828) 2025-03-18 16:51:26 +00:00
Nikita Popov
93df3e8166 [BasicAA] Add additional test for call AA (NFC) 2025-03-18 17:36:50 +01:00
Simon Pilgrim
33e5d013b7
[CostModel][X86] merge vector shuffle costs tests using -cost-kind=all (#131819) 2025-03-18 16:19:51 +00:00
Simon Pilgrim
e8f79eb898
[CostModel][X86] merge cttz costs tests using -cost-kind=all (#131810) 2025-03-18 15:43:21 +00:00
Simon Pilgrim
df544b73e4
[CostModel][X86] merge vector shifts costs tests using -cost-kind=all (#131806) 2025-03-18 15:32:00 +00:00
Simon Pilgrim
05dbabe329
[CostModel][X86] merge ctpop costs tests using -cost-kind=all (#131802) 2025-03-18 15:22:20 +00:00
Simon Pilgrim
034dd4c26f
[CostModel][X86] merge ctlz costs tests using -cost-kind=all (#131797) 2025-03-18 14:34:44 +00:00
Simon Pilgrim
a2d7451a13
[CostModel][X86] merge bitreverse costs tests using -cost-kind=all (#131791) 2025-03-18 13:30:01 +00:00
Simon Pilgrim
4f5eed0a37
[CostModel][X86] merge bswap costs tests using -cost-kind=all (#131784) 2025-03-18 13:29:25 +00:00
Simon Pilgrim
31e98c7037
[CostModel][X86] merge abs costs tests using -cost-kind=all (#131619)
Now that we have #130490, merge the cost test files to avoid bitrot.

Lots more sets of files to do, but this gives an example.
2025-03-18 11:19:05 +00:00
Yingwei Zheng
c5a491e9ea
[SCEV] Check whether the start is non-zero in ScalarEvolution::howFarToZero (#131522)
https://github.com/llvm/llvm-project/pull/94525 assumes that the loop
will be infinite when the stride is zero. However, this doesn't hold when
the start value of the addrec is also zero.

Closes https://github.com/llvm/llvm-project/issues/131465.
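
The zero-start corner case is easy to check by brute force. The sketch below is not SCEV code: `how_far_to_zero` here is a hypothetical Python model that walks `{start,+,stride}` in wrapping fixed-width arithmetic and reports how many steps it takes to reach zero.

```python
def how_far_to_zero(start, stride, bits=8):
    """Brute-force the number of steps until start + k*stride wraps to zero.

    Models `for (i = start; i != 0; i += stride)` in `bits`-wide
    arithmetic. Returns None when zero is never reached (infinite loop).
    """
    modulus = 1 << bits
    value = start % modulus
    # The sequence repeats within `modulus` steps, so this bound is enough.
    for count in range(modulus):
        if value == 0:
            return count
        value = (value + stride) % modulus
    return None
```

With a zero stride, `how_far_to_zero(5, 0)` returns `None` (the loop never exits), but `how_far_to_zero(0, 0)` returns 0 because the exit condition already holds on entry.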
2025-03-17 13:59:16 +08:00
Mircea Trofin
b034905c82
[ctxprof] Capture sampling info for context roots (#131201)
When we collect a contextual profile, we sample the threads entering its root and only collect on one at a time (see `ContextRoot::Taken`). If we want to compare contextual profiles with each other, and/or with flat profiles, we have a problem: we don't know how to compare the counter values relative to each other. To that end, we add `ContextRoot::TotalEntries`, which is incremented every time a root is entered and serves as a multiplier for the counter values collected under that root.

We expose this in the profile and leave the normalization to the user of the profile, for a few reasons:

* it's only needed if reasoning about all profiles in aggregate.
* the goal, in compiler_rt, is to flush out the profile as quickly as possible, and performing multiplications adds an overhead that may not even be necessary if the consumer of the profile doesn't care about combining profiles
* the information itself may be interesting as an indication of relative sampling of various contexts.
2025-03-14 21:10:22 -07:00
Florian Hahn
dfb661cd1c
[LAA] Add extra tests for #128061.
Extend test coverage for
https://github.com/llvm/llvm-project/pull/128061.
2025-03-13 21:42:32 +00:00
Nikita Popov
de895751d2
[CaptureTracking][AA] Only consider provenance captures (#130777)
For the purposes of alias analysis, we should only consider provenance
captures, not address captures. To support this, change (or add)
CaptureTracking APIs to accept a Mask and StopFn argument. The Mask
determines which components we are interested in (for AA that would be
Provenance).

The StopFn determines when we can abort the walk early. Currently, we
want to do this as soon as any of the components in the Mask is
captured. The purpose of making this a separate predicate is that in the
future we will also want to distinguish between capturing full
provenance and read-only provenance. In that case, we can only stop
early once full provenance is captured. The earliest escape analysis
does not get a StopFn, because it must always inspect all captures.
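
A simplified model of the Mask and StopFn interplay, to make the early-exit behaviour concrete. This is a sketch, not the CaptureTracking API: `Capture`, `walk_captures` and the two-component split are hypothetical simplifications, and each use is reduced to the set of components it captures.

```python
from enum import Flag, auto


class Capture(Flag):
    NONE = 0
    ADDRESS = auto()
    PROVENANCE = auto()


def walk_captures(uses, mask, stop_fn=None):
    """Accumulate the captured components that fall inside `mask`.

    uses: iterable of Capture values, one per use of the pointer.
    stop_fn: optional predicate on the components seen so far; the walk
    ends as soon as it returns True.
    Returns (components seen, number of uses visited).
    """
    seen = Capture.NONE
    visited = 0
    for use in uses:
        visited += 1
        seen |= use & mask  # components outside the mask are ignored
        if stop_fn is not None and stop_fn(seen):
            break
    return seen, visited


uses = [Capture.ADDRESS, Capture.PROVENANCE, Capture.ADDRESS | Capture.PROVENANCE]
# AA-style query: only provenance matters, stop at the first capture.
seen, visited = walk_captures(uses, Capture.PROVENANCE, stop_fn=bool)
```

The address-only capture is skipped, and the walk stops after the second use with `seen == Capture.PROVENANCE`. Without a `stop_fn`, all three uses are inspected.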
2025-03-13 09:54:36 +01:00
David Green
adb44ed2b8 [AArch64] Add -cost-kind=all coverage for insert-extract.ll and shuffle-load.ll. NFC 2025-03-12 09:16:01 +00:00
David Green
6f89c1ff6b [AArch64] Remove Kryo run lines from insert-extract.ll. NFC
They are expected to match the other CHECK lines now.
2025-03-12 09:15:43 +00:00
David Green
c542f42579 [AArch64] Update cost test to use -cost-kind=all. NFC
This is essentially the tests from b021bdbb3997 re-done with the new cost-model
output format from #130490, to add cost-model coverage for all the cost kinds.
More to come..
2025-03-11 15:31:50 +00:00
David Green
5c8760b1ab [AArch64] Update arith-fp.ll codegen test. NFC
A run line with and without +fullfp16 is added to check the differences between
the two, and the fp16 tests are separated out to keep the other check lines
simpler. FP128 tests are added for all operations, and fmuladd tests are added
similar to fma.
2025-03-11 12:50:59 +00:00
David Green
cdf18331eb
[CostModel] Add -cost-kind=all costmodel output (#130490)
In order to make the different cost model kinds easier to test, and to
manage the complexity of all the different variants, this patch
introduces a -cost-kind=all option that will print the output of all
cost model kinds. It feels especially helpful for tests that already have
multiple run lines (with / without +fullfp16 for example).

It currently produces the output:
```
Cost Model: Found costs of RThru:1 CodeSize:1 Lat:3 SizeLat:1 for: %F16 = fadd half undef, undef
```

The output is collapsed into a single value if all costs are the same.
Invalid costs print "Invalid" via the normal InstructionCost printing.
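
The collapsing rule can be sketched as a small formatter. This is an illustration only, not the printing code from the patch: `format_costs` is a hypothetical helper, and the exact text of the collapsed form is an assumption based on the description above.

```python
def format_costs(costs, instruction):
    """Format one cost-model line in the style of -cost-kind=all.

    costs: dict mapping the kind labels (RThru, CodeSize, Lat, SizeLat)
    to a cost, in print order. The collapsed form is an assumption.
    """
    values = list(costs.values())
    if all(value == values[0] for value in values):
        summary = str(values[0])  # every kind agrees: print a single value
    else:
        summary = " ".join(f"{kind}:{value}" for kind, value in costs.items())
    return f"Cost Model: Found costs of {summary} for: {instruction}"


line = format_costs(
    {"RThru": 1, "CodeSize": 1, "Lat": 3, "SizeLat": 1},
    "%F16 = fadd half undef, undef",
)
```

With the costs above, `line` reproduces the example output shown earlier. If all four costs were 1, the summary would shrink to a single `1`.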

Two test files are updated to show some examples with
-intrinsic-cost-strategy=type-based-intrinsic-cost and Invalid costs.
Once we have something we are happy with I will try to use this to
update more tests, as in b021bdbb3997ef6dd13980dc44f24754f15f3652 but
for more variants.
2025-03-11 10:55:37 +00:00
Sushant Gokhale
c4808741e8
[AArch64][CostModel] Alter sdiv/srem cost where the divisor is constant (#123552)
This patch revises the cost model for sdiv/srem and draws its inspiration from the udiv/urem patch #122236.

The typical codegen for the different scenarios is described in notes/comments in the code itself (there are too many scenarios to list them all in the patch description).
2025-03-09 22:26:39 -07:00
David Green
e44e24dfe6
[AArch64] Improve vector funnel shift by constant costs. (#130044)
We now have better codegen, and can have better costs to match. The
generated code should now produce a shl+usra and can be seen in
testcases such as:
7e5821bae8/llvm/test/CodeGen/AArch64/fsh.ll (L3941).
2025-03-09 18:01:45 +00:00
David Sherwood
db5e4016c0
[CostModel] Add type-based cost model for get.active.lane.mask intrinsic (#130132)
I recently realised that we return an invalid cost when requesting
the type-based cost for the get.active.lane.mask intrinsic. I've
fixed that in this patch by reusing the existing code for the
non-type-based model.
2025-03-07 16:12:35 +00:00
Benjamin Maxwell
5239f6777a
[CostModel][Test] Replace multiple flags with -intrinsic-cost-strategy (#128885)
This replaces the `-prefer-intrinsic-cost` and
`-type-based-intrinsic-cost` flags with a single
`-intrinsic-cost-strategy=<strategy>` flag.

The possible strategies are:

 * `instruction-cost`
   - Use TargetTransformInfo::getInstructionCost()
 * `intrinsic-cost`
   - Use TargetTransformInfo::getIntrinsicInstrCost()
 * `type-based-intrinsic-cost`
   - Calculate the intrinsic cost based only on argument types
2025-03-07 10:51:16 +00:00
DianQK
462eb7e28e
[ValueTracking] Skip incoming values that are the same as the phi in isGuaranteedNotToBeUndefOrPoison (#130111)
Fixes (keep it open) #130110.

If the incoming value is the PHI itself, we can skip it. If we can
guarantee that the other incoming values are neither undef nor poison,
then we can also guarantee that the value isn't either. If we cannot
guarantee that, there is no point in calculating it.
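
The reasoning above amounts to a filter over the incoming values. The sketch below is a toy model, not the ValueTracking code: `phi_is_guaranteed` is a hypothetical helper, values are plain Python objects, and `is_guaranteed` stands in for the recursive query on non-phi values.

```python
def phi_is_guaranteed(phi, incoming, is_guaranteed):
    """Check a phi by inspecting only the incoming values that are not
    the phi itself (self-references cannot introduce undef or poison).

    is_guaranteed: predicate saying whether a non-phi incoming value is
    known to be neither undef nor poison.
    """
    return all(is_guaranteed(value) for value in incoming if value is not phi)


phi = object()                      # stands in for the phi node
known_good = {"i32 0", "%checked"}  # values assumed to be well defined


def is_guaranteed(value):
    return value in known_good


loop_phi_ok = phi_is_guaranteed(phi, [phi, "i32 0"], is_guaranteed)
loop_phi_bad = phi_is_guaranteed(phi, [phi, "undef"], is_guaranteed)
```

Here `loop_phi_ok` is `True` because the self-reference is skipped and the remaining value is known good, while `loop_phi_bad` is `False`.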
2025-03-07 05:46:32 +08:00
Matt Arsenault
5faa4130b9
AMDGPU: Add gfx950 cost model tests for minimum and maximum (#130029) 2025-03-06 17:27:36 +07:00