1963 Commits

Author SHA1 Message Date
David Green
c856e8def4 [ARM] Update cmps.ll, control-flow.ll and divrem.ll to use -cost-kind=all. NFC 2025-08-20 12:59:32 +01:00
David Green
5836bae463
[AArch64] Change the cost of fma and fmuladd to match fmul. (#152963)
As fmul and fmadd are so similar, their performance characteristics tend
to be the same on most platforms, at least in terms of reciprocal
throughputs. Processors capable of performing a given number of fmul per
cycle can usually perform the same number of fma, with the extra add
being relatively simple on top. This patch makes the scores of the two
operations the same, which brings the throughput cost of a fma/fmuladd
to 2, and the latency to 3, which are the defaults for fmul.

Note that we might also want to change the throughput cost of a fmul to
1, as most processors have ample bandwidth for them, but they should
still stay in-line with one another.
2025-08-14 21:53:45 +01:00
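As a rough illustration of the fma/fmuladd change above, the sketch below keeps a single cost entry for fmul and has fma and fmuladd look it up rather than carry their own numbers. The table layout and the function name are hypothetical and are not the AArch64 TTI code; only the values 2 (throughput) and 3 (latency) come from the commit message.

```python
# Hypothetical toy cost table. Only the fmul defaults (throughput 2,
# latency 3) are taken from the commit message; everything else is assumed.
BASE_COSTS = {"fmul": {"throughput": 2, "latency": 3}}

# fma and fmuladd are treated as aliases of fmul, so the three stay in line
# with one another if the fmul numbers are ever changed.
ALIASES = {"fma": "fmul", "fmuladd": "fmul"}


def fp_op_cost(opcode: str, cost_kind: str) -> int:
    """Return the toy cost of `opcode` for the given cost kind."""
    base = ALIASES.get(opcode, opcode)
    return BASE_COSTS[base][cost_kind]
```

With this shape, lowering the fmul throughput cost to 1 (as the note above suggests might be desirable) would automatically lower fma and fmuladd as well.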
David Green
d9d9d9ad19
[ARM][MVE] Add shuffle costs for LDn and STn instructions. (#145304)
LD2 is represented in IR as deinterleave-shuffle(load), and ST2 as
store(interleave-shuffle). Whilst the shuffle would be expensive in
general for MVE (it does not have zip/uzp instructions), it should be
treated as cheap when part of the LD2/ST2 pattern. This borrows some
code from the AArch64 backend to produce lower costs. (Some of which
still shows as higher than it should - that just shows how broken the
generic shuffle costs are at the moment, they would be lower if
getShuffleCost was called directly as opposed to going through
getInstructionCost).
2025-08-14 06:59:37 +01:00
Luke Lau
81b576e66b
[RISCV] Cost casts with illegal types that can't be legalized (#153030)
If we have a floating point vector and no zve32f/zve64f/zve64d, we can
end up with an invalid type-legalization cost from
getTypeLegalizationCost.

Previously this triggered an assertion that the type must have been
legalized if the "legal" type is a vector, but in this case, when it's
not possible to legalize, the original type is spat back out.

This fixes it by just checking that the legalization cost is valid.

We don't have much testing for zve64x, so we may have other places in
the cost model with this issue.

Fixes #153008
2025-08-12 00:29:39 +08:00
David Green
7f1638efc1
[AArch64] Generalize costing for FP16 instructions (#150033)
This extracts the code for modelling a fp16 operation as
`fptrunc(fpop(fpext,fpext))` into a new function named
getFP16BF16PromoteCost so that it can be reused by the
arithmetic instructions. The function takes a lambda to
calculate the cost of the operation with the promoted type.
2025-08-08 13:40:07 +01:00
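The `fptrunc(fpop(fpext,fpext))` model described above can be sketched as a helper that takes a callback for the promoted operation. This is a minimal sketch with hypothetical names and unit costs; it does not reproduce the signature of getFP16BF16PromoteCost, only the idea of passing in a lambda that costs the operation at the promoted type.

```python
from typing import Callable


def fp16_promote_cost(
    num_operands: int,
    ext_cost: int,
    trunc_cost: int,
    promoted_op_cost: Callable[[str], int],
    promoted_type: str = "f32",  # assumed promotion target
) -> int:
    """Cost of an fp16 op modelled as fptrunc(fpop(fpext, ..., fpext))."""
    # One fpext per operand, the operation itself at the promoted type,
    # then a single fptrunc back to the narrow type.
    return num_operands * ext_cost + promoted_op_cost(promoted_type) + trunc_cost


# Usage: a binary op whose promoted form is assumed to cost 2.
total = fp16_promote_cost(2, ext_cost=1, trunc_cost=1,
                          promoted_op_cost=lambda ty: 2)
```

Taking the promoted-operation cost as a callback lets the same helper serve different arithmetic instructions, which is the reuse the commit message describes.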
Graham Hunter
de72cca671
[CostModel] Provide a default model for histogram intrinsics (#149348)
Since we scalarize these intrinsics when the target does not support
them, we should model that for costing purposes.
2025-08-08 11:00:00 +01:00
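A scalarized histogram update touches each lane separately, so a default cost naturally scales with the lane count. The sketch below assumes a per-lane sequence of extract, load, add and store with unit costs; the step list and the numbers are illustrative assumptions, not the values used by the default model in the patch.

```python
def scalarized_histogram_cost(
    num_lanes: int,
    extract_cost: int = 1,  # assumed: pull one pointer out of the vector
    load_cost: int = 1,     # assumed: load the current bucket value
    add_cost: int = 1,      # assumed: add the increment
    store_cost: int = 1,    # assumed: write the bucket back
) -> int:
    """Toy cost of a histogram update expanded into one scalar update per lane."""
    per_lane = extract_cost + load_cost + add_cost + store_cost
    return num_lanes * per_lane
```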
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
David Green
6a32e2225e [AArch64] Add SVE fmuladd and fma cost tests. NFC 2025-08-08 08:52:28 +01:00
David Green
fcae1ba775 [ARM] Use -cost-kind=all for cast and active_lane_mask tests. NFC 2025-08-05 08:14:47 +01:00
David Green
b30d5315b7
[AArch64] Add better fcmp costs for expanded predicates (#147940)
Certain fcmp predicates need to be expanded into multiple operations and
or'd together. This adds some more accurate cost modelling for them
based on the predicate. Unsupported operations are given the cost of a
libcall and the latency is set to 2 as that seemed to be fairly common
between different CPUs.
2025-08-04 13:42:57 +01:00
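To make the "expanded into multiple operations and or'd together" point concrete, the sketch below spells out two such predicates using ordinary float comparisons, which are false whenever an operand is NaN. The operation lists and their counts describe this sketch's expansion only; they are not the costs assigned by the AArch64 patch.

```python
def fcmp_one(a: float, b: float) -> bool:
    """Ordered and not equal: (a olt b) or (a ogt b)."""
    return a < b or a > b


def fcmp_ueq(a: float, b: float) -> bool:
    """Unordered or equal: the inverse of `one`."""
    return not fcmp_one(a, b)


# Operations used by the expansions above (illustrative, not target costs).
EXPANSIONS = {
    "one": ["fcmp olt", "fcmp ogt", "or"],
    "ueq": ["fcmp olt", "fcmp ogt", "or", "not"],
}


def expanded_op_count(predicate: str) -> int:
    """Number of operations in this sketch's expansion of `predicate`."""
    return len(EXPANSIONS[predicate])
```

A cost model that charges a single compare for these predicates would undercount them, which is the inaccuracy the commit addresses.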
David Green
e136fb04f2
[AArch64] Add sve bf16 fpext and fpround costs. (#150485)
This prevents them from generating Invalid costs, as generating the
instructions seems to work fine with and without +bf16. The costs are
mostly taken from the number of instructions (minus ptrue and constants).
2025-08-04 09:47:41 +01:00
Ramkumar Ramachandra
e7200c734d
[LV] Pre-commit test for #151664 (#151671)
Hoisted vector instructions are costed incorrectly.
2025-08-01 17:09:11 +01:00
Ramkumar Ramachandra
ec0c79df59
[RISCV] Fix bug in [l](lrint|lround) vector-cost (#151298)
Follow up on a review of bd66fd0 ([CostModel/RISCV] Fix costs of vector
[l](lrint|lround)) post-landing to fix a subtle problem with the cost
of vector [l](lrint|lround). We should use source LMUL in the case of
a narrowing op.

Co-authored-by: Luke Lau <luke@igalia.com>
2025-07-30 19:41:11 +01:00
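The "source LMUL for a narrowing op" rule can be illustrated with a small calculation. This is a sketch under stated assumptions: LMUL is approximated as vector bits divided by an assumed VLEN of 128, the helper names are hypothetical, and using the destination LMUL for non-narrowing ops is this sketch's choice rather than something the commit states.

```python
from fractions import Fraction


def lmul(elt_bits: int, num_elts: int, vlen: int = 128) -> Fraction:
    """Approximate LMUL of a vector type for an assumed VLEN."""
    return Fraction(elt_bits * num_elts, vlen)


def costing_lmul(src_elt_bits: int, dst_elt_bits: int, num_elts: int,
                 vlen: int = 128) -> Fraction:
    """Pick the LMUL to cost with: the source type when the op narrows."""
    src = lmul(src_elt_bits, num_elts, vlen)
    dst = lmul(dst_elt_bits, num_elts, vlen)
    return src if dst_elt_bits < src_elt_bits else dst


# Usage: lrint from 8 x f64 to 8 x i32 is narrowing, so the wider source
# type determines the LMUL.
narrowing = costing_lmul(64, 32, 8)
```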
Luke Lau
2a5ac19605 Revert "[RISCV] Cost bf16/f16 vector non-unit memory accesses as legal without zvfhmin/zvfbfmin (#150882)"
This reverts commit fe4f6c1a58ab4f00a88a97af01000b6783b573ee, but leaves
the tests that were added.

The original commit mistakenly assumed that if regular bf16/f16 loads
and stores could be lowered without zvfbfmin/zvfhmin, then so too could
masked loads/stores and gathers/scatters.

However SelectionDAG can't actually type-legalize masked.load/stores
since it needs to be done in ScalarizeMaskedMemIntrinPass.

This was causing crashes on IREE because we now returned true for
isLegalMaskedLoadStore.

The original intent of this was to remove a discrepancy in the loop
vectorizer tests whenever predication was enabled, but this has gone
away after 92d09245d61dce80d3e68a27cc34d5fc6f062c93. So I don't think we
need to reapply this patch.
2025-07-30 13:29:47 +08:00
Ramkumar Ramachandra
bd66fd0d01
[CostModel/RISCV] Fix costs of vector [l](lrint|lround) (#146058)
Take the actual instruction cost into account, and don't fall through to
code that doesn't apply to [l]lrint. Also strip invalid costs for
[b]f16, as a companion to #146507, and unify it with [l]lround costs as
a companion to #147713.
2025-07-29 19:22:11 +01:00
David Green
46526f879f [ARM] Use -cost-kind=all for arith-overflow.ll, arith-ssat.ll and arith-usat.ll. NFC 2025-07-29 15:08:45 +01:00
Luke Lau
fe4f6c1a58
[RISCV] Cost bf16/f16 vector non-unit memory accesses as legal without zvfhmin/zvfbfmin (#150882)
When vectorizing with predication some loops that were previously
vectorized without zvfhmin/zvfbfmin will no longer be vectorized because
the masked load/store or gather/scatter cost returns illegal.

This is due to a discrepancy where for these costs we check
isLegalElementTypeForRVV but for regular memory accesses we don't.

But for bf16 and f16 vectors we don't actually need the extension
support for loads and stores, so this adds a new function which takes
this into account.

For regular memory accesses we should probably also e.g. return an
invalid cost for i64 elements on zve32x, but it doesn't look like we
have tests for this yet.

We also should probably not be vectorizing these bf16/f16 loops to begin
with if we don't have zvfhmin/zvfbfmin and zfhmin/zfbfmin. I think this
is due to the scalar costs being too cheap. I've added tests for this in
a100f6367205c6a909d68027af6a8675a8091bd9 to fix in another patch.
2025-07-28 22:59:49 +08:00
Luke Lau
a100f63672 [RISCV] Add FP cost model tests for no zfhmin/zfbfmin. NFC
Vector costs without zvfhmin/zvfbfmin and zfhmin/zfbfmin are somehow
cheaper than with zvfhmin/zvfbfmin at smaller vector sizes, despite the
fact that the former are scalarized to libcalls. This adds a RUN line to
showcase this, splitting out the bfloat tests into their own functions
so we don't have duplicate lines for the regular float/double costs.
2025-07-28 11:27:30 +08:00
Luke Lau
e259ba8bec [RISCV] Modernize FP cost model tests. NFC
* Replace undef -> poison
* Remove overloaded type in intrinsic signature
2025-07-28 10:59:39 +08:00
Simon Pilgrim
3820206194
[CostModel][X86] Update SK_Reverse based on cost kinds (#150650)
When these were converted to CostKindTblEntry the throughput was mainly
copied to all cost kinds.

Regenerated with my check_cost_tables.py helper script
2025-07-26 18:21:56 +01:00
Simon Pilgrim
0fa0ce1f3a
[CostModel][X86] Update SK_Broadcast based on cost kinds (#150620)
When these were converted to CostKindTblEntry the throughput was mainly copied to all cost kinds.

Regenerated with my check_cost_tables.py helper script
2025-07-26 13:52:47 +01:00
Simon Pilgrim
81eb63ad7f [CostModel][X86] Complicate the cross lane single/two source shuffle masks
Try to ensure shuffle masks don't simplify too much to easier shuffle kinds when splitting
2025-07-25 17:25:08 +01:00
Simon Pilgrim
f1122a64c6 [CostModel][X86] load-broadcast.ll - regenerate checks for all cost kinds 2025-07-25 13:33:32 +01:00
David Green
66dd09a232 [AArch64] Change sve-fcmp.ll to test scalable vectors. NFC
Whilst testing fixed length vectors with +sve might be useful, this was just a
mistake in the generation of the test and should be using scalable vectors.
2025-07-25 10:09:54 +01:00
David Green
526b672a2c [AArch64] Add sve bf16 fpext and fptrunc costs. NFC 2025-07-24 19:09:30 +01:00
Luke Lau
61110e0f62
[TTI] Share value and type based llvm.vector.reverse cost (#150415)
We currently provide a generic cost for llvm.vector.reverse in BasicTTI
by reusing the reverse shuffle cost, but only for the value based cost.
Since the argument values aren't actually used, move this into the type
based costing method so that type based costing can also reuse it.
2025-07-24 22:06:40 +08:00
Luke Lau
0e42eaa668 [RISCV] Add type based RUN line for vector intrinsic cost model tests. NFC 2025-07-24 20:50:47 +08:00
David Green
52499bbd90 [ARM] Test all cost kinds in arith.ll. NFC 2025-07-23 22:01:46 +01:00
Elvis Wang
324773e238
[RISCV][TTI] Implement vector costs for llvm.fpto{u|s}i.sat(). (#143655)
This patch implements vector costs for `llvm.fptoui.sat()` in RISCV TTI.
2025-07-23 09:52:33 +08:00
Nikita Popov
92c55a315e
[IR] Only allow lifetime.start/end on allocas (#149310)
lifetime.start and lifetime.end are primarily intended for use on
allocas, to enable stack coloring and other liveness optimizations. This
is necessary because all (static) allocas are hoisted into the entry
block, so lifetime markers are the only way to convey the actual
lifetimes.

However, lifetime.start and lifetime.end are currently *allowed* to be
used on non-alloca pointers. We don't actually do this in practice, but
just the mere fact that this is possible breaks the core purpose of the
lifetime markers, which is stack coloring of allocas. Stack coloring can
only work correctly if all lifetime markers for an alloca are
analyzable.

* If a lifetime marker may operate on multiple allocas via a select/phi,
we don't know which lifetime actually starts/ends and handle it
incorrectly (https://github.com/llvm/llvm-project/issues/104776).
* Stack coloring operates on the assumption that all lifetime markers
are visible, and not, for example, hidden behind a function call or
escaped pointer. It's not possible to change this, as part of the
purpose of lifetime markers is that they work even in the presence of
escaped pointers, where simple use analysis is insufficient.

I don't think there is any way to have coherent semantics for lifetime
markers on allocas, while also permitting them on arbitrary pointer
values.

This PR restricts lifetimes to operate on allocas only. As a followup, I
will also drop the size argument, which is superfluous if we always
operate on an alloca. (This change also renders various code handling
lifetime markers on non-alloca dead. I plan to clean up that kind of
code after dropping the size argument as well.)

In practice, I've only found a few places that currently produce
lifetimes on non-allocas:

* CoroEarly replaces the promise alloca with the result of an intrinsic,
which will later be replaced back with an alloca. I think this is the
only place where there is some legitimate loss of functionality, but I
don't think this is particularly important (I don't think we'd expect
the promise in a coroutine to admit useful lifetime optimization.)
* SafeStack moves unsafe allocas onto a separate frame. We can safely
drop lifetimes here, as SafeStack performs its own stack coloring.
* Similar for AddressSanitizer, it also moves allocas into separate
memory.
* LSR sometimes replaces the lifetime argument with a GEP chain of the
alloca (where the offsets ultimately cancel out). This is just
unnecessary. (Fixed separately in
https://github.com/llvm/llvm-project/pull/149492.)
* InferAddrSpaces sometimes makes lifetimes operate on an addrspacecast
of an alloca. I don't think this is necessary.
2025-07-21 15:04:50 +02:00
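For readers unfamiliar with why analyzable lifetimes matter, the sketch below shows the basic idea behind stack coloring: allocas whose live ranges do not overlap can share a slot. It is a simplified greedy interval assignment with hypothetical names, ignoring alloca sizes, alignment and control flow, and is not LLVM's StackColoring pass.

```python
def color_stack_slots(lifetimes: dict) -> dict:
    """Assign each alloca a slot so that overlapping lifetimes never share one.

    `lifetimes` maps an alloca name to a half-open (start, end) range, where
    start and end stand in for lifetime.start / lifetime.end positions.
    """
    slot_busy_until = []  # end position of the last range placed in each slot
    assignment = {}
    # Visit allocas in order of their lifetime.start position.
    for name, (start, end) in sorted(lifetimes.items(), key=lambda kv: kv[1][0]):
        for slot, busy_until in enumerate(slot_busy_until):
            if busy_until <= start:  # previous occupant is already dead
                slot_busy_until[slot] = end
                assignment[name] = slot
                break
        else:
            slot_busy_until.append(end)  # no free slot: open a new one
            assignment[name] = len(slot_busy_until) - 1
    return assignment


# Usage: "a" and "b" never overlap and can share a slot; "c" overlaps both.
slots = color_stack_slots({"a": (0, 5), "b": (5, 9), "c": (3, 7)})
```

If a marker could refer to an unknown alloca, the ranges fed into such an assignment would be unreliable, which is the problem the restriction to allocas avoids.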
David Green
828a867ee0
[AArch64] Reduce the costs of and/or/xor reductions (#148553)
Since the costs were added the codegen for i8/i16 and/or/xor reductions
has improved. This updates the cost model to produce the same costs in
terms of number of instructions.
2025-07-16 09:59:36 +01:00
David Green
0967957d7a
[CostModel] Handle all cost kinds in getCmpSelInstrCost (#148233)
Currently we always produce a cost of 1 for all CostKinds that are not
RecipThroughput, which can underestimate the cost if the type has a
higher legalization cost (like larger vectors). This relaxes it to cover
all cost kinds.
2025-07-15 18:08:52 +01:00
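The underestimate described above can be seen with a toy legalization factor. In this sketch the factor is the number of registers a vector splits into, assuming 128-bit registers; the function names and the register width are assumptions for illustration, and the cost-kind strings merely mirror the TCK_* names.

```python
def legalization_factor(vector_bits: int, register_bits: int = 128) -> int:
    """Number of legal registers a vector of `vector_bits` is split into."""
    return max(1, -(-vector_bits // register_bits))  # ceiling division


def cmp_sel_cost_before(vector_bits: int, cost_kind: str) -> int:
    """Old behaviour: only RecipThroughput sees the legalization cost."""
    if cost_kind == "RecipThroughput":
        return legalization_factor(vector_bits)
    return 1


def cmp_sel_cost_after(vector_bits: int, cost_kind: str) -> int:
    """New behaviour: every cost kind sees the legalization cost."""
    return legalization_factor(vector_bits)
```

For a 512-bit vector, the old model reports 1 for CodeSize while the new one reports 4, matching the number of split operations.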
Florian Hahn
02d3738be9
[AArch64,TTI] Remove RealUse check for vector insert/extract costs. (#146526)
getVectorInstrCostHelper would return costs of zero for vector
inserts/extracts that move data between GPR and vector registers, if
there was no 'real' use, i.e. there was no corresponding existing
instruction.

This meant that passes like LoopVectorize and SLPVectorizer, which
likely are the main users of the interface, would underestimate the cost
of insert/extracts that move data between GPR and vector registers,
which has non-trivial costs.

The patch removes the special case and only returns costs of zero for
lane 0 if there is no need to transfer between integer and vector
registers.

This impacts a number of SLP tests, and most of them look like general
improvements. I think the change should make things more accurate for any
AArch64 target, but if not it could also just be Apple CPU specific.

I am seeing +2% end-to-end improvements on SLP-heavy workloads.

PR: https://github.com/llvm/llvm-project/pull/146526
2025-07-15 15:19:27 +01:00
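The rule after this patch can be summarized in a few lines. The sketch below is a simplification with a hypothetical function name and an assumed non-zero base cost of 2; it does not reproduce getVectorInstrCostHelper.

```python
def vector_insert_extract_cost(lane: int, crosses_gpr: bool,
                               base_cost: int = 2) -> int:
    """Toy cost of a vector insert/extract after the RealUse check is removed.

    `crosses_gpr` is True when the element has to move between a general
    purpose register and a vector register. `base_cost` is an assumed value.
    """
    # Only lane 0 with no GPR<->vector transfer is free; nothing else is.
    if lane == 0 and not crosses_gpr:
        return 0
    return base_cost
```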
David Green
58d79aaba6
[AArch64] Guard against non-simple types in udiv sve costs. (#148580)
The code here probably needs to change to handle types more uniformly,
but this patch prevents it from trying to use a simple type where it does
not exist.

Fixes #148438.
2025-07-15 10:25:08 +01:00
David Green
a647fd7dda [AArch64] Add a cost for v2i32 vecreduce.add.
These can lower to a addp. The score does not alter with this patch, but this
should help keep the scores the same with #146526.
2025-07-13 08:06:10 +01:00
Luke Lau
7eb14d9dd1
[TTI] Fix value-based BasicTTIImpl vp.{gather,scatter} costing (#148020)
After #147677 we now preserve value based costing for vp intrinsics
instead of switching it to type based costing.

However, for vp.gather and vp.scatter, even though they fall back to their
functionally equivalent masked.gather and masked.scatter, the number of
arguments is different due to the alignment being a dedicated argument.

This caused a crash detected at
https://lab.llvm.org/staging/#/builders/210/builds/988

This fixes it by explicitly handling the two intrinsics and adding test
coverage.

Note that the type based costing isn't yet implemented for
masked.gather/masked.scatter so it doesn't show up correctly.
2025-07-11 17:29:53 +08:00
Luke Lau
763db3841d
[TTI] Handle experimental.vp.reverse in BasicTTIImpl (#147868)
An experimental_vp_reverse isn't exactly functionally the same as
vector_reverse, so previously it wasn't getting picked up by the generic
VP costing code that reuses the non-VP equivalents.

But for costing purposes it's good enough so we can reuse it.

The type-based cost for vector_reverse is still incorrect and should be
fixed in another patch.
2025-07-10 22:10:02 +08:00
Luke Lau
1d8b51667a
[TTI] Don't drop VP intrinsic args when delegating to non-vp equivalent (#147677)
Previously we only carried the type arguments which caused value-based
costs to be inadvertently changed into type-based costs.

I'm just using vp.is.fpclass as an example intrinsic for now since the
type based cost seems to differ from the value based cost, and most
normal intrinsics e.g. min/max have the same value + type based cost.

We still need to handle the cost properly for is.fpclass in a second
patch.

This is needed for an upcoming patch to handle the cost of
llvm.experimental.vp.reverse which suffers from the same problem.
2025-07-10 19:41:49 +08:00
David Green
2052d7bf9a [AArch64] Expand fcmp cost model tests. NFC 2025-07-10 12:13:35 +01:00
Luke Lau
da8d7f49ff
[RISCV] Unify non-vp and vp rounding intrinsic costing (#147872)
Currently we have slightly different costing for the vp and non-vp
version of the rounding intrinsics.

We can delete this code and use the generic BasicTTIImpl code for the vp
intrinsics which falls back to the non-vp versions.

I'm not sure if the zvfh costing is correct, this should probably be
fixed in a follow up patch. At the moment the non-vp cost is more
important since it is what the loop vectorizer will use.
2025-07-10 15:46:05 +08:00
Elvis Wang
213735487e
[TTI] Check type legalization of both src and result for fpto{u|s}i.sat. (#147657)
For cast instructions such as `fptoui.sat` and `fptosi.sat`, we need to
check that both the source type and the result type can be legally
lowered. If either of them is invalid, return an invalid cost.

--
Fixes https://github.com/llvm/llvm-project/issues/142973.

---------

Co-authored-by: Craig Topper <craig.topper@sifive.com>
2025-07-10 14:44:26 +08:00
David Green
10f782456e
[AArch64] Enable other cost kinds for getCmpSelInstrCost. (#144375)
This removes the CostKind == TCK_RecipThroughput limitation from
getCmpSelInstrCost, allowing it to return more accurate costs for CodeSize and
Lat / SizeLat. Especially for larger vectors under CodeSize, the returned costs
are currently 1, not the legalization cost.
2025-07-10 07:12:21 +01:00
Luke Lau
4c7cfe3fdb [RISCV] Remove intrinsic declares from costmodel tests. NFC
Declaring an intrinsic is no longer needed these days, and for intrinsic
tests we end up with a lot of them due to the various type overloads.
2025-07-08 18:31:24 +08:00
Ramkumar Ramachandra
1c8283a30c
[BasicTTIImpl] Add cost entries for ldexp, [l]lround (#146373)
The ldexp intrinsic is incorrectly costed as 1, due to a missing entry
in BasicTTIImpl::getTypeBasedIntrinsicCost: fix this. While at it, fix
missing entries for [l]lround, which is costed as 1 anyway, making the
change non-functional.
2025-07-07 09:26:30 +01:00
David Green
aa9e02cc10 [AArch64] Add lrint and lround costmodel tests. NFC
This adds some costmodel tests for lrint, llrint, lround and llround.
2025-07-06 17:50:01 +01:00
David Green
f1549befc1 [AArch64] Add ldexp cost-model tests. NFC
Add costmodel test coverage for ldexp. The codegen for SVE is not implemented
yet. Costs should improve with #146373.
2025-07-06 16:42:47 +01:00
Graham Hunter
85bc868417
[AArch64][TTI] Reduce cost for splatting whole first vector segment (SVE) (#145701)
Improve cost modeling for splatting the first 128b segment.
2025-07-02 09:51:56 +01:00
Ramkumar Ramachandra
1e2ddc8a3d
[CostModel/RISCV] Add tests for ldexp, [l]lround (#146108) 2025-06-28 11:40:41 +01:00
Simon Pilgrim
08f074a59f
[TTI] getInstructionCost - consistently treat all undef/poison shuffle masks as free (#146039)
#145920 exposed an issue where we were treating undef/poison shuffles as SK_Select kinds
2025-06-27 10:53:01 +01:00
Gheorghe-Teodor Bercea
3df36a2b18
[AMDGPU] Enable vectorization of i8 values. (#134934)
This patch adjusts the cost model to account for the ability of the
AMDGPU optimizer to group together i8 values into i32 values.

Co-authored-by: Erich Keane <ekeane@nvidia.com>
2025-06-26 19:15:31 -04:00