5072 Commits

Author SHA1 Message Date
David Green
c856e8def4 [ARM] Update cmps.ll, control-flow.ll and divrem.ll to use -cost-kind=all. NFC 2025-08-20 12:59:32 +01:00
Benjamin Maxwell
bb3066d42b
[LAA] Move scalable vector check into getStrideFromAddRec() (#154013)
This moves the check closer to the `.getFixedValue()` call and fixes
#153797 (which is a regression from #126971).
2025-08-19 06:40:07 +01:00
Panagiotis Karouzakis
c2e7fad446
[DemandedBits] Support non-constant shift amounts (#148880)
This patch adds support for the shift operators to handle non-constant
shift operands.

ashr proof --> https://alive2.llvm.org/ce/z/EN-siK
lshr proof --> https://alive2.llvm.org/ce/z/eeGzyB
shl proof --> https://alive2.llvm.org/ce/z/dpvbkq
2025-08-19 01:11:16 +08:00
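
To illustrate the idea behind c2e7fad446, the following is a minimal sketch of one conservative way to compute demanded bits through a `shl` whose shift amount is only known to lie in a range. It is not taken from the patch; the function name and the range parameters are hypothetical, and the real analysis may bound the shift amount differently.

```python
def demanded_operand_bits_shl(demanded_out, min_shift, max_shift, bits):
    """Demanded bits of the shifted operand of `shl` (hypothetical helper).

    Operand bit i lands in result bit i + s, so for one shift amount s the
    demanded operand bits are demanded_out >> s. With an unknown amount we
    take the union over every amount in [min_shift, max_shift].
    """
    mask = (1 << bits) - 1
    demanded_in = 0
    for s in range(min_shift, max_shift + 1):
        demanded_in |= (demanded_out & mask) >> s
    return demanded_in & mask


# Only result bit 4 is demanded and the amount can be anything in 0..7,
# so operand bits 0..4 may all matter.
any_shift = demanded_operand_bits_shl(0b00010000, 0, 7, bits=8)
# With a known amount of 2 the answer collapses to the single bit 2.
fixed_shift = demanded_operand_bits_shl(0b00010000, 2, 2, bits=8)
```

The wider the possible range of shift amounts, the more operand bits have to be treated as demanded.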
Diana Picus
ac005e16f6
Reapply "[AMDGPU] Intrinsic for launching whole wave functions" (#153584)
This reverts commit 14cd1339318b16e08c1363ec6896bd7d1e4ae281. The
buildbot failure seems to have been a cmake issue which has been
discussed in more detail in this Discourse post:

https://discourse.llvm.org/t/cmake-doesnt-regenerate-all-tablegen-target-files/87901

If any buildbots fail to select arbitrary intrinsics with this patch,
it's worth considering using clean builds with ccache instead of
incremental builds, as recommended here:

https://llvm.org/docs/HowToAddABuilder.html#:~:text=Use%20CCache%20and%20NOT%20incremental%20builds

The original commit message for this patch:
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.

Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.

Tail calls are handled in a future patch.
2025-08-15 10:12:47 +02:00
joaosaffran
d56fa96524
[DirectX] Add Range Overlap validation (#152229)
As part of the Root Signature Spec, we need to validate that Root
Signatures do not define overlapping ranges.
Closes: https://github.com/llvm/llvm-project/issues/126645

---------

Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
Co-authored-by: Joao Saffran <{ID}+{username}@users.noreply.github.com>
Co-authored-by: Joao Saffran <jderezende@microsoft.com>
2025-08-14 18:40:11 -04:00
David Green
5836bae463
[AArch64] Change the cost of fma and fmuladd to match fmul. (#152963)
As fmul and fmadd are so similar, their performance characteristics tend
to be the same on most platforms, at least in terms of reciprocal
throughputs. Processors capable of performing a given number of fmul per
cycle can usually perform the same number of fma, with the extra add
being relatively simple on top. This patch makes the scores of the two
operations the same, which brings the throughput cost of a fma/fmuladd
to 2, and the latency to 3, which are the defaults for fmul.

Note that we might also want to change the throughput cost of a fmul to
1, as most processors have ample bandwidth for them, but they should
still stay in-line with one another.
2025-08-14 21:53:45 +01:00
David Green
d9d9d9ad19
[ARM][MVE] Add shuffle costs for LDn and STn instructions. (#145304)
LD2 is represented in IR as deinterleave-shuffle(load), and ST2 as
store(interleave-shuffle). Whilst the shuffle would be expensive in
general for MVE (it does not have zip/uzp instructions), it should be
treated as cheap when part of the LD2/ST2 pattern. This borrows some
code from the AArch64 backend to produce lower costs. (Some of which
still shows as higher than it should - that just shows how broken the
generic shuffle costs are at the moment, they would be lower if
getShuffleCost was called directly as opposed to going through
getInstructionCost).
2025-08-14 06:59:37 +01:00
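
As a rough picture of the LD2/ST2 pattern described in d9d9d9ad19, the sketch below models the deinterleaving and interleaving shuffles on plain Python lists. The helper names are hypothetical and the lists stand in for vector lanes; nothing here reflects the cost-model code itself.

```python
def deinterleave(lanes, factor):
    """Split interleaved lanes into `factor` vectors, as LD2 does for factor 2."""
    return [lanes[i::factor] for i in range(factor)]


def interleave(parts):
    """Inverse shuffle: weave equally sized vectors back together, as ST2 does."""
    return [lane for group in zip(*parts) for lane in group]


loaded = list(range(8))               # lanes as they sit in memory
evens, odds = deinterleave(loaded, 2)  # the two results of an LD2
stored = interleave([evens, odds])     # what an ST2 writes back
```

The shuffle masks here are the fixed strided patterns `0, 2, 4, 6` and `1, 3, 5, 7`, which is why they can be folded into the load or store instead of being costed as general shuffles.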
Luke Lau
81b576e66b
[RISCV] Cost casts with illegal types that can't be legalized (#153030)
If we have a floating point vector and no zve32f/zve64f/zve64d, we can
end up with an invalid type-legalization cost from
getTypeLegalizationCost.

Previously this triggered an assertion that the type must have been
legalized if the "legal" type is a vector, but in this case, when it's
not possible to legalize, the original type is spat back out.

This fixes it by just checking that the legalization cost is valid.

We don't have much testing for zve64x, so we may have other places in
the cost model with this issue.

Fixes #153008
2025-08-12 00:29:39 +08:00
Ramkumar Ramachandra
4443b37877
[LAA] Pre-commit tests exercising different types (#151091)
Pre-commit tests exercising different types of source/sink in
depend_diff_types.ll, in preparation to weaken the HasSameSize check in
LoopAccessAnalysis.

Co-authored-by: Igor Kirillov <igor.kirillov@arm.com>
2025-08-11 10:19:10 +01:00
Alexander Richardson
3a4b351ba1
[IR] Introduce the ptrtoaddr instruction
This introduces a new `ptrtoaddr` instruction which is similar to
`ptrtoint` but has two differences:

1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance
2) `ptrtoaddr` only extracts (and then extends/truncates) the low
   index-width bits of the pointer

For most architectures, difference 2) does not matter since index (address)
width and pointer representation width are the same, but this does make a
difference for architectures that have pointers that aren't just plain
integer addresses such as AMDGPU fat pointers or CHERI capabilities.

This commit introduces textual and bitcode IR support as well as basic code
generation, but optimization passes do not handle the new instruction yet
so it may result in worse code than using ptrtoint. Follow-up changes will
update capture tracking, etc. for the new instruction.

RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54

Reviewed By: nikic

Pull Request: https://github.com/llvm/llvm-project/pull/139357
2025-08-08 10:12:39 -07:00
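
The second difference listed in 3a4b351ba1 can be pictured with a small integer model. This is only a sketch: the function name, the 128-bit representation and the 64-bit index width are illustrative assumptions, and the extension is assumed to be a zero-extension.

```python
def ptrtoaddr_model(ptr_repr, index_width, result_width):
    """Model of ptrtoaddr on an integer pointer representation (hypothetical).

    Keeps only the low `index_width` bits (the address), then truncates to
    `result_width`. Zero-extension is implicit for non-negative Python ints.
    """
    address = ptr_repr & ((1 << index_width) - 1)
    return address & ((1 << result_width) - 1)


# Illustrative 128-bit pointer: metadata in the high half, address in the low half.
metadata = 0xABCD
address = 0x123456789ABCDEF0
fat_pointer = (metadata << 64) | address

full = ptrtoaddr_model(fat_pointer, index_width=64, result_width=64)
narrow = ptrtoaddr_model(fat_pointer, index_width=64, result_width=32)
```

For a plain integer pointer whose representation width equals its index width, the model returns the pointer value unchanged, which matches the note that the difference does not matter on most architectures.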
Ivan R. Ivanov
7c141e2118
[ValueTracking] Add missing check for two-value PN recurrence matching (#152700)
When InstTy is a type like IntrinsicInst which can have a variable
number of arguments, we can encounter a case where Operation will have
fewer than two arguments and error at the getOperand() calls.

Fixes: https://github.com/llvm/llvm-project/issues/152725.
2025-08-08 17:39:24 +02:00
David Green
7f1638efc1
[AArch64] Generalize costing for FP16 instructions (#150033)
This extracts the code for modelling a fp16 operation as
`fptrunc(fpop(fpext,fpext))` into a new function named
getFP16BF16PromoteCost so that it can be reused by the
arithmetic instructions. The function takes a lambda to
calculate the cost of the operation with the promoted type.
2025-08-08 13:40:07 +01:00
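
The shape of the helper described in 7f1638efc1 can be sketched as follows. This is not the signature of getFP16BF16PromoteCost; the Python function, its parameters and the cost numbers are all hypothetical, and the sketch assumes every operand needs its own `fpext`.

```python
def promote_cost(op_cost_for_type, fpext_cost, fptrunc_cost,
                 num_operands=2, promoted_type="float"):
    """Cost of fptrunc(fpop(fpext, fpext)) for an op promoted to a wider type.

    `op_cost_for_type` plays the role of the lambda: it returns the cost of
    the operation once it is performed on the promoted type.
    """
    extends = num_operands * fpext_cost
    return extends + op_cost_for_type(promoted_type) + fptrunc_cost


# Made-up numbers: each conversion costs 1 and the promoted op costs 2.
fadd_cost = promote_cost(lambda ty: 2, fpext_cost=1, fptrunc_cost=1)
```

Passing the operation cost as a callable lets the same wrapper serve different arithmetic instructions without knowing how each one is costed.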
Ryotaro Kasuga
bd39ae6125
[Delinearization] Add function for fixed size array without relying on GEP (#145050)
The existing functions `getIndexExpressionsFromGEP` and
`tryDelinearizeFixedSizeImpl` provide functionality to delinearize
memory accesses for fixed size arrays. They use the GEP source element
type in their optimization heuristics. However, driving optimization
heuristics based on GEP type information is not allowed.

This patch introduces new functions `findFixedSizeArrayDimensions` and
`delinearizeFixedSizeArray` to delinearize a fixed size array without
using the type information in GEP. The new function
`findFixedSizeArrayDimensions` infers the size of each dimension of the
array based on the value to be added to the address as induction
variables are incremented. `delinearizeFixedSizeArray` attempts to
restore the subscripts of each dimension based on the estimated array
size.

This is an initial implementation that may not cover all cases, but is
intended to replace the existing function in the future.

Related:
- https://discourse.llvm.org/t/enabling-loop-interchange/82589/4
- https://github.com/llvm/llvm-project/pull/124911#issuecomment-2962499501
2025-08-08 19:08:14 +09:00
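
The approach outlined in bd39ae6125 can be illustrated with a simplified model that works on element strides. The two functions below are hypothetical stand-ins, not the real `findFixedSizeArrayDimensions` and `delinearizeFixedSizeArray`; they assume strides are given in elements and that each stride divides the next larger one.

```python
def infer_dimensions(strides):
    """Infer inner dimension sizes from per-induction-variable strides.

    For A[i][j][k] with inner sizes N and M the strides are N*M, M and 1,
    so dividing neighbouring strides recovers [N, M].
    """
    ordered = sorted(strides, reverse=True)
    dims = []
    for outer, inner in zip(ordered, ordered[1:]):
        if inner == 0 or outer % inner != 0:
            return None  # strides do not describe a fixed size array
        dims.append(outer // inner)
    return dims


def recover_subscripts(offset, dims):
    """Split a linear element offset into one subscript per dimension."""
    subscripts = []
    for size in reversed(dims):
        offset, remainder = divmod(offset, size)
        subscripts.append(remainder)
    subscripts.append(offset)  # outermost subscript is whatever is left
    return subscripts[::-1]


dims = infer_dimensions([12, 4, 1])              # strides of i, j, k
subscripts = recover_subscripts(2 * 12 + 1 * 4 + 3, dims)
```

No type information is consulted anywhere: the dimensions come only from how far the address moves when each induction variable is incremented.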
Graham Hunter
de72cca671
[CostModel] Provide a default model for histogram intrinsics (#149348)
Since we scalarize these intrinsics when the target does not support
them, we should model that for costing purposes.
2025-08-08 11:00:00 +01:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
David Green
6a32e2225e [AArch64] Add SVE fmuladd and fma cost tests. NFC 2025-08-08 08:52:28 +01:00
Ryotaro Kasuga
05dd957cda
[DA] Fix the check between Subscript and Size after delinearization (#151326)
Delinearization provides two values: the size of the array, and the
subscript of the access. DA checks their validity (`0 <= subscript <
size`), with some special handling. In particular, to ensure `subscript
< size`, calculate the maximum value of `subscript - size` and check if
it is negative. There was an issue in its process: when `subscript -
size` is expressed as an affine format like `init + step * i`, the value
in the last iteration (`init + step * (num_iterations - 1)`) was
assumed to be the maximum value. This assumption is incorrect in the
following cases:

- When `step` is negative
- When the AddRec wraps

This patch introduces extra checks to ensure the sign of `step` and
verify the existence of nsw/nuw flags.

Also, `isKnownNonNegative(S - smax(1, Size))` was used as a regular
check, which is incorrect when `Size` is negative. This patch also
replaces it with `isKnownNonNegative(S - Size)`, although it's still
unclear whether using `isKnownNonNegative` is appropriate in the first
place.

Fix #150604
2025-08-08 10:58:13 +09:00
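
The two failure cases named in 05dd957cda can be reproduced with a few lines of arithmetic. The sketch below uses hypothetical helper names and an 8-bit signed width chosen only to keep the numbers small; it evaluates `init + step * i` for every iteration and checks whether the last value really is the maximum.

```python
def affine_values(init, step, num_iterations, bits=None):
    """Values of init + step * i for i in [0, num_iterations).

    When `bits` is given, each value is wrapped to a signed two's-complement
    integer of that width, modelling an AddRec without nsw.
    """
    values = []
    for i in range(num_iterations):
        value = init + step * i
        if bits is not None:
            value &= (1 << bits) - 1
            if value >= 1 << (bits - 1):
                value -= 1 << bits
        values.append(value)
    return values


def last_is_max(values):
    return values[-1] == max(values)


# Positive step, no wrapping: the last iteration holds the maximum.
increasing = affine_values(init=-10, step=2, num_iterations=4)
# Negative step: the first iteration holds the maximum instead.
decreasing = affine_values(init=-1, step=-2, num_iterations=4)
# Signed 8-bit wrap: the maximum is reached in the middle of the loop.
wrapping = affine_values(init=120, step=4, num_iterations=4, bits=8)
```

Only the first sequence satisfies the assumption, which is why both the sign of `step` and the no-wrap flags need to be checked before using the last-iteration value as an upper bound.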
Shilei Tian
351b38f266
[AMDGPU] Mark address space cast from private to flat as divergent if target supports globally addressable scratch (#152376)
Globally addressable scratch is a new feature introduced in gfx1250.
However, this feature changes how scratch space is mapped into the flat
aperture, making address space casts from private to flat no longer
uniform.
2025-08-06 17:08:56 -04:00
Pedro Lobo
2bbc614713
[InstCombine] Support offsets in memset to load forwarding (#151924)
Adds support for load offsets when performing `memset` load forwarding.
2025-08-05 17:09:06 +01:00
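
As a rough model of what 2bbc614713 enables, the sketch below forwards a `memset` fill value to a load at a byte offset. The function name and the bounds condition are assumptions made for illustration; the actual InstCombine conditions may be stricter.

```python
def forward_memset_to_load(fill_byte, memset_len, load_offset, load_size):
    """Value a load would read from memset-filled memory, or None.

    Forwarding is only attempted when the loaded bytes lie entirely inside
    the region written by the memset (assumed condition).
    """
    if load_offset < 0 or load_offset + load_size > memset_len:
        return None
    splat = bytes([fill_byte & 0xFF]) * load_size
    return int.from_bytes(splat, "little")


# A 4-byte load at offset 4 of a 16-byte memset(p, 0xAB, 16).
in_bounds = forward_memset_to_load(0xAB, memset_len=16, load_offset=4, load_size=4)
# The same load at offset 14 reads past the filled region.
out_of_bounds = forward_memset_to_load(0xAB, memset_len=16, load_offset=14, load_size=4)
```

Because every byte is the same, the forwarded value does not depend on the offset itself, only on whether the load stays inside the filled range.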
Nikita Popov
ba099c516d
[StackLifetime] Remove handling for lifetime size mismatch (#151965)
Split out from #150248:

Since #150944 the size passed to lifetime.start/end is considered
meaningless. The lifetime always applies to the whole alloca.

Accordingly remove handling for size mismatch in the StackLifetime
analysis.
2025-08-05 09:19:10 +02:00
David Green
fcae1ba775 [ARM] Use -cost-kind=all for cast and active_lane_mask tests. NFC 2025-08-05 08:14:47 +01:00
Stanislav Mekhanoshin
a153e83e41
[AMDGPU] gfx1250 v_wmma_scale[16]_f32_16x16x128_f8f6f4 codegen (#152036) 2025-08-04 19:16:34 -07:00
Nikita Popov
b18377ccad [GlobalsModRef] Generate test checks (NFC) 2025-08-04 17:13:28 +02:00
David Green
b30d5315b7
[AArch64] Add better fcmp costs for expanded predicates (#147940)
Certain fcmp predicates need to be expanded into multiple operations and
or'd together. This adds some more accurate cost modelling for them
based on the predicate. Unsupported operations are given the cost of a
libcall and the latency is set to 2 as that seemed to be fairly common
between different CPUs.
2025-08-04 13:42:57 +01:00
David Green
e136fb04f2
[AArch64] Add sve bf16 fpext and fpround costs. (#150485)
This prevents them from generating Invalid costs, as generating the
instructions seems to work fine with and without +bf16. The costs are
mostly taken from the number of instructions (minus ptrue and constants).
2025-08-04 09:47:41 +01:00
Stanislav Mekhanoshin
33abf05af4
[AMDGPU] gfx1250 v_permlane_* instructions (#151749) 2025-08-01 16:14:19 -07:00
Ramkumar Ramachandra
e7200c734d
[LV] Pre-commit test for #151664 (#151671)
Hoisted vector instructions are costed incorrectly.
2025-08-01 17:09:11 +01:00
Florian Hahn
2ae996cbbe
[LAA] Support assumptions in evaluatePtrAddRecAtMaxBTCWillNotWrap (#147047)
This patch extends the logic added in
https://github.com/llvm/llvm-project/pull/128061 to support
dereferenceability information from assumptions as well.

Unfortunately both assumption cache and the dominator tree need to be
threaded through multiple layers to make them available where needed.

PR: https://github.com/llvm/llvm-project/pull/147047
2025-08-01 14:18:07 +01:00
Antonio Frighetto
3eda63c958
[GVN] Add MemorySSA coverage to tests (NFC)
Test check lines have been regenerated by ensuring parity between
MemDep and MemorySSA, while migrating towards the latter.
2025-08-01 15:10:58 +02:00
Florian Hahn
d74d841b65
[SCEV] Try to push the op into ZExt: A + zext (-A + B) -> zext (B) (#151227)
Try to push the constant operand into a ZExt:
A + zext (-A + B) -> zext (B), if trunc (A) + -A + B does not
unsigned-wrap.

The actual code supports ZExts with arbitrary number of arguments, hence
the getAddExpr in the return.

This helps SCEV reasoning in some cases, commonly when adding an offset
to a zero-extended SCEV that subtracts the same offset.

Note that this is restricted to cases where we can fold away an operand
of the inner Add. This is needed to avoid bad interactions with patterns
when forming ZExts, which try to push the ZExt to add operands.

https://alive2.llvm.org/ce/z/q7d303

PR: https://github.com/llvm/llvm-project/pull/151227
2025-07-30 21:10:57 +01:00
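
The fold in d74d841b65 and its no-wrap condition can be checked exhaustively on small types. The sketch below is one reading of the condition, using an 8-bit narrow type and a 16-bit wide type chosen for illustration; the helper names are hypothetical and `A` is assumed to fit in the narrow type.

```python
NARROW_BITS = 8
WIDE_BITS = 16
NARROW_MASK = (1 << NARROW_BITS) - 1
WIDE_MASK = (1 << WIDE_BITS) - 1


def inner_add(a, b):
    """(-A + B) computed in the narrow type, with wrap-around."""
    return (b - a) & NARROW_MASK


def fold_is_safe(a, b):
    """True when trunc(A) + (-A + B) does not wrap as an unsigned narrow add."""
    return (a & NARROW_MASK) + inner_add(a, b) <= NARROW_MASK


def unfolded(a, b):
    """A + zext(-A + B), evaluated in the wide type."""
    return (a + inner_add(a, b)) & WIDE_MASK


# A = 5, B = 7: the inner add is 2, nothing wraps, and the sum is zext(B).
safe_case = (fold_is_safe(5, 7), unfolded(5, 7))
# A = 5, B = 3: the inner add wraps to 254, so the sum is 259, not 3.
unsafe_case = (fold_is_safe(5, 3), unfolded(5, 3))
```

Whenever the condition holds, the narrow sum equals `B` exactly, so extending it gives the same wide value as adding `A` after the extension.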
Ramkumar Ramachandra
ec0c79df59
[RISCV] Fix bug in [l](lrint|lround) vector-cost (#151298)
Follow up on a review of bd66fd0 ([CostModel/RISCV] Fix costs of vector
[l](lrint|lround)) post-landing to fix a subtle problem with the cost
of vector [l](lrint|lround). We should use source LMUL in the case of
a narrowing op.

Co-authored-by: Luke Lau <luke@igalia.com>
2025-07-30 19:41:11 +01:00
Florian Hahn
c6f7fa7437
[SCEV] Add test for pushing constant add into zext.
Adds SCEV-only tests for
https://github.com/llvm/llvm-project/pull/151227.
2025-07-30 10:04:40 +01:00
Luke Lau
2a5ac19605 Revert "[RISCV] Cost bf16/f16 vector non-unit memory accesses as legal without zvfhmin/zvfbfmin (#150882)"
This reverts commit fe4f6c1a58ab4f00a88a97af01000b6783b573ee, but leaves
the tests that were added.

The original commit mistakenly assumed that if regular bf16/f16 loads
and stores could be lowered without zvfbfmin/zvfhmin, then so too could
masked loads/stores and gathers/scatters.

However SelectionDAG can't actually type-legalize masked.load/stores
since it needs to be done in ScalarizeMaskedMemIntrinPass.

This was causing crashes on IREE because we now returned true for
isLegalMaskedLoadStore.

The original intent of this was to remove a discrepancy in the loop
vectorizer tests whenever predication was enabled, but this has gone
away after 92d09245d61dce80d3e68a27cc34d5fc6f062c93. So I don't think we
need to reapply this patch.
2025-07-30 13:29:47 +08:00
Ramkumar Ramachandra
bd66fd0d01
[CostModel/RISCV] Fix costs of vector [l](lrint|lround) (#146058)
Take the actual instruction cost into account, and don't fall through to
code that doesn't apply to [l]lrint. Also strip invalid costs for
[b]f16, as a companion to #146507, and unify it with [l]lround costs as
a companion to #147713.
2025-07-29 19:22:11 +01:00
David Green
46526f879f [ARM] Use -cost-kind=all for arith-overflow.ll, arith-ssat.ll and arith-usat.ll. NFC 2025-07-29 15:08:45 +01:00
Luke Lau
fe4f6c1a58
[RISCV] Cost bf16/f16 vector non-unit memory accesses as legal without zvfhmin/zvfbfmin (#150882)
When vectorizing with predication some loops that were previously
vectorized without zvfhmin/zvfbfmin will no longer be vectorized because
the masked load/store or gather/scatter cost returns illegal.

This is due to a discrepancy where for these costs we check
isLegalElementTypeForRVV but for regular memory accesses we don't.

But for bf16 and f16 vectors we don't actually need the extension
support for loads and stores, so this adds a new function which takes
this into account.

For regular memory accesses we should probably also e.g. return an
invalid cost for i64 elements on zve32x, but it doesn't look like we
have tests for this yet.

We also should probably not be vectorizing these bf16/f16 loops to begin
with if we don't have zvfhmin/zvfbfmin and zfhmin/zfbfmin. I think this
is due to the scalar costs being too cheap. I've added tests for this in
a100f6367205c6a909d68027af6a8675a8091bd9 to fix in another patch.
2025-07-28 22:59:49 +08:00
Luke Lau
a100f63672 [RISCV] Add FP cost model tests for no zfhmin/zfbfmin. NFC
Vector costs without zvfhmin/zvfbfmin and zfhmin/zfbfmin are somehow
cheaper than with zvfhmin/zvfbfmin at smaller vector sizes, despite the
fact that the former are scalarized to libcalls. This adds a RUN line to
showcase this, splitting out the bfloat tests into their own functions
so we don't have duplicate lines for the regular float/double costs.
2025-07-28 11:27:30 +08:00
Luke Lau
e259ba8bec [RISCV] Modernize FP cost model tests. NFC
* Replace undef -> poison
* Remove overloaded type in intrinsic signature
2025-07-28 10:59:39 +08:00
Simon Pilgrim
3820206194
[CostModel][X86] Update SK_Reverse based on cost kinds (#150650)
When these were converted to CostKindTblEntry the throughput was mainly
copied to all cost kinds

Regenerated with my check_cost_tables.py helper script
2025-07-26 18:21:56 +01:00
Florian Hahn
9e7782db73
[LV,LAA] Add tests where RT checks are known false after expansion. 2025-07-26 14:17:35 +01:00
Simon Pilgrim
0fa0ce1f3a
[CostModel][X86] Update SK_Broadcast based on cost kinds (#150620)
When these were converted to CostKindTblEntry the throughput was mainly copied to all cost kinds

Regenerated with my check_cost_tables.py helper script
2025-07-26 13:52:47 +01:00
Ryotaro Kasuga
b06f10d96c
[DA] Add check for base pointer invariance (#148241)
As specified in #53942, DA assumes base pointer invariance in its
process. Some cases were fixed by #116628. However, that PR only
addressed the parts related to AliasAnalysis, so the original issue
persists in later stages, especially when the AliasAnalysis results in
`MustAlias`.
This patch inserts explicit loop-invariant checks for the base pointer
and skips analysis when it is not loop-invariant.

Fix the cases added in #148240.
2025-07-26 03:25:01 +09:00
Simon Pilgrim
81eb63ad7f [CostModel][X86] Complicate the cross lane single/two source shuffle masks
Try to ensure shuffle masks don't simplify too much to easier shuffle kinds when splitting
2025-07-25 17:25:08 +01:00
Simon Pilgrim
f1122a64c6 [CostModel][X86] load-broadcast.ll - regenerate checks for all cost kinds 2025-07-25 13:33:32 +01:00
David Green
66dd09a232 [AArch64] Change sve-fcmp.ll to test scalable vectors. NFC
Whilst testing fixed length vectors with +sve might be useful, this was just a
mistake in the generation of the test and should be using scalable vectors.
2025-07-25 10:09:54 +01:00
David Green
526b672a2c [AArch64] Add sve bf16 fpext and fptrunc costs. NFC 2025-07-24 19:09:30 +01:00
Luke Lau
61110e0f62
[TTI] Share value and type based llvm.vector.reverse cost (#150415)
We currently provide a generic cost for llvm.vector.reverse in BasicTTI
by reusing the reverse shuffle cost, but only for the value based cost.
Since the argument values aren't actually used, move this into the type
based costing method so that type based costing can also reuse it.
2025-07-24 22:06:40 +08:00
Luke Lau
0e42eaa668 [RISCV] Add type based RUN line for vector intrinsic cost model tests. NFC 2025-07-24 20:50:47 +08:00
David Green
52499bbd90 [ARM] Test all cost kinds in arith.ll. NFC 2025-07-23 22:01:46 +01:00
Nikita Popov
2c6eec219d [Tests] Avoid lifetime intrinsics on non-allocas (NFC)
Don't rely on auto-upgrade, instead either remove unnecessary
casts or remove no longer applicable tests.
2025-07-23 15:05:43 +02:00