182 Commits

Author SHA1 Message Date
David Green
c856e8def4 [ARM] Update cmps.ll, control-flow.ll and divrem.ll to use -cost-kind=all. NFC 2025-08-20 12:59:32 +01:00
David Green
d9d9d9ad19
[ARM][MVE] Add shuffle costs for LDn and STn instructions. (#145304)
LD2 is represented in IR as deinterleave-shuffle(load), and ST2 as
store(interleave-shuffle). Whilst the shuffle would be expensive in
general for MVE (it does not have zip/uzp instructions), it should be
treated as cheap when part of the LD2/ST2 pattern. This borrows some
code from the AArch64 backend to produce lower costs. (Some of the costs
still show up higher than they should; that just shows how broken the
generic shuffle costs are at the moment - they would be lower if
getShuffleCost were called directly as opposed to going through
getInstructionCost.)
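For reference, a minimal IR sketch of the LD2-style pattern described above (the
names and types here are illustrative, not taken from the patch):

%wide = load <8 x i16>, ptr %src, align 2
; two deinterleaving shuffles extract the even and odd lanes
%even = shufflevector <8 x i16> %wide, <8 x i16> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%odd  = shufflevector <8 x i16> %wide, <8 x i16> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>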
2025-08-14 06:59:37 +01:00
David Green
fcae1ba775 [ARM] Use -cost-kind=all for cast and active_lane_mask tests. NFC 2025-08-05 08:14:47 +01:00
David Green
46526f879f [ARM] Use -cost-kind=all for arith-overflow.ll, arith-ssat.ll and arith-usat.ll. NFC 2025-07-29 15:08:45 +01:00
David Green
52499bbd90 [ARM] Test all cost kinds in arith.ll. NFC 2025-07-23 22:01:46 +01:00
David Green
0967957d7a
[CostModel] Handle all cost kinds in getCmpSelInstrCost (#148233)
Currently we always produce a cost of 1 for all CostKinds that are not
RecipThroughput, which can underestimate the cost if the type has a
higher legalization cost (like larger vectors). This relaxes it to cover
all cost kinds.
2025-07-15 18:08:52 +01:00
Simon Pilgrim
e25db2f6b3
[CostModel] getInstructionCost - match SK_InsertSubvector shuffle patterns before SK_Select (#145920)
More closely match improveShuffleKindFromMask's shuffle ordering by
trying to match SK_InsertSubvector shuffle patterns before SK_Select -
both can match many of the same patterns, but it's much easier to
recognise when an SK_InsertSubvector can be converted to SK_Select than
vice versa.

Another step towards #145335 - which I'm hoping will allow us to
generalise improveShuffleKindFromMask and remove getInstructionCost's
shuffle matching entirely.
2025-06-26 20:15:51 +01:00
David Green
2545d6f723 [ARM] Add MVE test coverage for LD2/ST2 shuffle costs. NFC 2025-06-23 11:06:25 +01:00
Harald van Dijk
32752913b1
[ARM] Do not assume memory intrinsics specify alignment. (#138356) 2025-05-07 16:25:03 +01:00
Nashe Mncube
4ddc8df6ca
[CostModel][ARM]Adjust cost of muls in (U/S)MLAL and patterns (#122713)
PR #117350 made changes to the SLP vectorizer which introduced a
regression on some ARM benchmarks. Investigation narrowed it down to
suboptimal codegen for benchmarks that previously only used scalar (U/S)MLAL
instructions. The linked change meant the SLPVectorizer thought that
these could be vectorized. This change makes the cost of muls in
(U/S)MLAL patterns slightly cheaper to make sure scalar instructions are
preferred in these cases over SLP vectorization on targets supporting DSP.
2025-03-19 12:25:44 +00:00
David Green
b2165f214e
[CostModel] Account for power-2 urem in funnel shift costs (#127037)
As can be seen in https://godbolt.org/z/qvMqY79cK, a urem by a power-2
constant will be code-generated as an And of a mask. The cost model for
funnel shifts tries to account for that by passing OP_PowerOf2 as the
operand info for the second operand. As far as I can tell, returning a
lower cost for urem with an OP_PowerOf2 operand is only implemented on
X86 though.

This patch short-cuts that by calling getArithmeticInstrCost(And, ..)
directly when we know the type size will be a power of 2. This is an
alternative to the patch in #126912, which is a more general solution for
power-of-2 udiv/urem costs; this one more narrowly fixes only funnel shifts.
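As an illustration (not taken from the patch), the shift amount of a funnel
shift is reduced modulo the bit width, and for i32 that urem-by-32 lowers to a
mask because 32 is a power of 2:

%r = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 %amt)
; the implicit '%amt urem 32' becomes 'and i32 %amt, 31'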
2025-02-13 16:05:00 +00:00
Paul Walker
38fffa630e
[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548) 2024-11-06 11:53:33 +00:00
Matt Arsenault
0bc6407748
TTI: Check legalization cost of add/sub overflow ISD nodes (#100518) 2024-08-08 23:44:07 +04:00
Matt Arsenault
47d831f2c9
TTI: Check legalization cost of min/max ISD nodes (#100514)
Instead of counting the cost of the assumed expansion.

The AMDGPU costs for the i64 case look too high to me.

Preserve default expansion logic
2024-08-08 17:06:11 +04:00
Matt Arsenault
4f067dc467
TTI: Fix special casing vectorization costs of saturating add/sub (#97463) 2024-08-06 17:33:52 +04:00
David Green
dcd246cbde
[ARM] Add scalar add_sat costs. (#100988)
These can usually generate:
 - qadd / qsub for signed i32 scalars
 - uqadd16 / qadd16 / uqsub16 / qsub16 with an extend for signed/unsigned i8/i16
 - an add + cmp + sel expansion otherwise

This can lead to differences in unrolling etc, but should give a better
cost for the instructions.
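For example (illustrative IR, not part of the patch), the saturating intrinsics
being costed look like:

%s = call i32 @llvm.sadd.sat.i32(i32 %a, i32 %b)   ; a single qadd on targets with DSP
%u = call i16 @llvm.usub.sat.i16(i16 %x, i16 %y)   ; likely a uqsub16 plus an extend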
2024-08-05 18:56:04 +01:00
Chris Copeland
651bdb96b1
[ARM] Armv8-R does not require fp64 or neon. (#88287)
This was [addressed for AArch64
here](https://github.com/llvm/llvm-project/pull/79004), but the same
applies to ARM.

Move the enablement of neon+fp64 to `-mcpu=cortex-r52`, which optionally
supports these features.
2024-05-07 11:48:30 +01:00
Kevin P. Neal
9c9f94063c [FPEnv][CostModel] Correct strictfp test.
Correct strictfp tests to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

These tests needed the strictfp attribute added to some function
definitions.

Test changes verified with D146845.
2024-04-02 13:53:56 -04:00
David Green
7433120137
[CostModel] Mark ssa_copy as free (#75294)
These intrinsics are only used ephemerally and should be given a
zero cost.
2023-12-13 11:24:47 +00:00
David Green
b003fed283 [CostModel] Add some ssa.copy costmodel tests. NFC 2023-12-13 07:26:17 +00:00
Alex Richardson
e39f6c1844 [opt] Infer DataLayout from triple if not specified
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.

One thing that is currently not possible is differentiating an explicit empty
`target datalayout = ""` in the IR file from a missing datalayout, since the
current APIs don't allow detecting this case. If it is considered useful to
support this case (instead of passing "-data-layout=" on the command
line), I can change the IR parsers to track whether they have seen such a
directive and change the callback type.

Differential Revision: https://reviews.llvm.org/D141060
2023-10-26 12:07:37 -07:00
David Green
233fb987fc [ARM] Improve bitwise reduction costs
This adds some basic and/or/xor reduction costs for NEON/MVE, handling them
like other reductions where vector operations are used to reduce to legal
sizes, followed by an optional VREV+VAND/VORR/VEOR step and scalarization from
there.
2023-09-04 16:22:52 +01:00
David Green
4cef24a886 [ARM] Improve reduction integer min/max costs
This adds some basic smin/smax/umin/umax reduction costs for MVE/NEON, similar
to the existing Add reduction costs. They follow the same style as Add
reductions, but include a higher cost as the costs tend to be dependent on the
element size for vminv/vmaxv. These costs may not be precise, but will be
closer to reality than the default, which extracts each element.
2023-09-04 15:47:06 +01:00
David Green
2955cc15ff [ARM] Improve costs for FMin/Max reductions
Similar to the other reductions, this changes the cost of fmin/fmax reductions
under MVE/NEON to perform vector operations until the types need to be
scalarized. The fp16 vectors can perform a VREV+FMIN/FMAX to skip a step of the
reduction, and otherwise need lanewise extracts from the top lanes.
2023-09-04 12:49:13 +01:00
David Green
4530f02916 [ARM] Improve reduction fadd/fmul costs
This adds some basic fadd/fmul reduction costs for MVE/NEON. It reduces by
halving the vector size until it gets scalarized, with some additional costs
for fp16, which may require extracting the top lanes.

Differential Revision: https://reviews.llvm.org/D159367
2023-09-04 11:37:14 +01:00
David Green
5afb161ed5 [ARM] Add various vector reduce costmodel tests. NFC
See D159367 and the followups.
2023-09-04 10:50:58 +01:00
David Green
12025cef3e [CostModel] Use min/max intrinsics for vecreduce.min/max costs
This changes the cost modelling of the vecreduce.min/max nodes to use the costs
of the relevant min/max intrinsics instead of expanding them to compares and
selects. getMinMaxReductionCost has been changed to take an Opcode for the
relevant intrinsic, dropping the IsUnsigned and CondTy parameters as they are
no longer needed.

A follow up patch will add some basic fminimum/fmaximum costmodelling.

Differential Revision: https://reviews.llvm.org/D153547
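A sketch of the kind of node being costed (illustrative only):

%m = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> %v)
; now costed via the llvm.smax intrinsic cost rather than an expanded cmp+select tree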
2023-07-04 15:02:30 +01:00
Luke Lau
a68dcd09e8 [TTI] Use users of GEP to guess access type in getGEPCost
Currently getGEPCost uses the target type of the GEP as a heuristic for
the type that will be accessed, to pass onto isLegalAddressingMode.
Targets use this to work out if a GEP can then be folded into the
load/store instruction that uses the GEP.
For example, on RISC-V loads and stores can have an offset added to a
base register folded into a single instruction, so the following GEP is
free:

%p = getelementptr i32, ptr %base, i32 42       ; getInstructionCost = 0
%x = load i32, ptr %p                           ; getInstructionCost = 1
------------------------------------------------------------------------
lw t0, 42(a0)

However vector loads and stores cannot have an offset folded into them,
so the following GEP is costed:

%p = getelementptr <2 x i32>, ptr %base, i32 42 ; getInstructionCost = 1
%x = load <2 x i32>, ptr %p                     ; getInstructionCost = 1
------------------------------------------------------------------------
addi  a0, a0, 42
vle32 v8, (a0)

The issue arises whenever there is a mismatch between the target type of
the GEP and the type that is actually accessed:

%p = getelementptr i32, ptr %base, i32 42       ; getInstructionCost = 0
%x = load <2 x i32>, ptr %p                     ; getInstructionCost = 1
------------------------------------------------------------------------
addi  a0, a0, 42
vle32 v8, (a0)

Even though this GEP will result in an add instruction, because TTI
thinks it's loading an i32, it will think it can be folded and not
charge for it.

The target type can become mismatched with the memory access during
transformations, noticeably during SLP where a scalar base pointer will
be reused to perform a vector load or store.

This patch adds an optional AccessType argument to getGEPCost which
allows the type of memory accessed by users to be passed in as a hint,
so that we can more accurately determine if the GEP can be folded into
its users.

If AccessType is not provided, getGEPCost falls back to the old
behaviour of using the PointeeType to guess the memory access type. This
can be revisited in a later patch.

Also for now, only GEPs with exactly one user use the access type hint.
Whilst we could look through all users and use all access types to
determine if we can fold the GEP, this patch avoids doing so to prevent
O(N) behaviour.

Differential Revision: https://reviews.llvm.org/D149889
2023-06-29 13:44:37 +01:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
David Green
a39d2d50af [ARM] Increase the Scalarized cost of masked gather/scatter operations
If a gather/scatter is masked and will need to be scalarized then the cost
should be higher than we currently produce. An additional cost for scalarizing
the mask, extracting i1s and branching on the result needs to be added, which
this patch gives a cost of 5.

Differential Revision: https://reviews.llvm.org/D147331
2023-04-11 14:49:46 +01:00
Nikita Popov
68c50b111d [CostModel] Convert some tests to opaque pointers (NFC) 2022-12-15 09:50:34 +01:00
David Green
de6dfbbb30 [ARM] Fix for MVE i128 vector icmp costs.
We were hitting an assert as the legalized type needn't be a vector.

Fixes #58364
2022-10-14 18:49:25 +01:00
Simon Pilgrim
fdec50182d [CostModel] Replace getUserCost with getInstructionCost
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.

Original Patch by @samparker (Sam Parker)

Differential Revision: https://reviews.llvm.org/D79483
2022-08-18 11:55:23 +01:00
Simon Pilgrim
4178e33470 [CostModel] Update RUN -passes=* to double quotes to appease update scripts on windows
DOS really doesn't like `` quotes to be used in command lines

Some prep work as I'm intending to resurrect D79483 soon
2022-08-10 17:54:06 +01:00
David Green
0a11ad2aa8 [ARM] Expand MVE i1 fptoint and inttofp if mve.fp is not present.
If MVE.fp is not present then we cannot select the vector i1 fp
operations to VCMP instructions, so they need to be expanded.
2022-07-11 13:03:30 +01:00
David Green
438ffdb821 [ARM] Switch the costs of mve1beat and mve4beat
These three subtarget features are meant to control whether MVE
instructions take 1 vs 2 vs 4 architectural beats. The mve1beat feature
is described as "Model MVE instructions as a 1 beat per tick
architecture", meaning MVE instructions will execute over 4 cycles.
mve4beat is the opposite, where the entire 4 beats of an MVE instruction
execute in a single cycle. The costs for the two were backwards though,
not matching the cycle counts like they should. This patch switches the
costs of the two to bring them in line with expectations.

Differential Revision: https://reviews.llvm.org/D129141
2022-07-07 16:10:00 +01:00
David Green
53be6ab25c [ARM] Fix MVE getShuffleCost legalized type check
The MVE shuffle costing for VREV instructions was making incorrect
assumptions as to legalized vector types remaining as vectors. Add a
quick check to ensure they are indeed vectors before attempting to get
the number of elements.
2022-06-07 14:36:04 +01:00
David Green
b4dd9fc370 [ARM] Cost modelling for MVE vector fptoi_sat
Building on top of D125665, this adds MVE costs for fptosi.sat and
fptoui.sat, provided MVE is available and the types are legal.

Differential Revision: https://reviews.llvm.org/D125666
2022-05-20 11:00:34 +01:00
David Green
80aab0312a [ARM] Cost modelling for scalar fptoi_sat
Similar to D124357, this adds some cost modelling for fptoi_sat for Arm
targets. Where VFP2 is available (and FP64/FP16 for the relevant types),
the operations are legal as the Arm instructions naturally saturate.
Otherwise they will need an extra smin/smax clamp, similar to AArch64.

Differential Revision: https://reviews.llvm.org/D125665
2022-05-19 19:53:21 +01:00
David Green
4a8c13a6f4 [CostModel] Add basic fptoi_sat costs
This adds some basic target-independent fptosi_sat and fptoui_sat cost
modelling. The fptosi_sat is modelled as an fmin/fmax to saturate the
value, followed by an fp convert. The signed values then have an
additional fcmp+select for handling NaN correctly.

The AArch64/Arm costs may be more incorrect, as the instructions exist
natively. This can be fixed with target-specific cost updates.

Differential Revision: https://reviews.llvm.org/D124269
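For reference, an illustrative form of the intrinsic being modelled (the types
are an assumption, not from the patch):

%s = call i32 @llvm.fptosi.sat.i32.f32(float %x)
; costed generically as an fmin/fmax clamp plus an fptosi, with an extra
; fcmp+select on the signed path to handle NaN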
2022-04-27 09:30:00 +01:00
David Green
1159984802 [CostModel] Add fptoi_sat costmodel tests. NFC 2022-04-25 18:44:35 +01:00
David Sherwood
e7b89c2fc3 Add BasicTTIImpl cost model for llvm.get.active.lane.mask intrinsic
The vectoriser sometimes generates predicated vector loops using
the llvm.get.active.lane.mask intrinsic so it's important that we
are able to calculate a valid cost for the call instruction. When
SVE is enabled we are able to use a single whilelo instruction
for some vector types - in such cases I've marked the cost as 1.
For all other cases I've set the cost according to how the intrinsic
will be expanded.

Tests added here:

  Analysis/CostModel/AArch64/sve-intrinsics.ll
  Analysis/CostModel/ARM/active_lane_mask.ll
  Analysis/CostModel/RISCV/active_lane_mask.ll

Differential Revision: https://reviews.llvm.org/D121109
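As an illustration (not from the patch), the call being costed takes the current
induction value and the trip count and returns a predicate vector:

%mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %index, i32 %n)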
2022-03-14 09:35:05 +00:00
Arthur Eubanks
15ba588d6d [test] Migrate '-analyze -cost-model' to '-passes=print<cost-model>' 2022-02-09 15:42:16 -08:00
Andrew Litteken
4ff4e7ea30 [CostModel] Use cost of target trunc type when it is the only use of a non-register-sized load
The code size cost model for most targets uses the legalization cost for the type of the pointer of a load. If this load is followed directly by a trunc instruction, and is the only use of the result of the load, only one instruction is generated in the target assembly language. This adds a check for this case, and uses the target type of the trunc instruction if so.

This did not show any changes in CTMark code size benchmarks.

Reviewers: paquette, samparker, dmgreen

Differential Revision: https://reviews.llvm.org/D109388
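A minimal illustration of the pattern (hypothetical values, not from the patch):

%v = load i32, ptr %p, align 4
%t = trunc i32 %v to i8   ; sole use of %v: the code-size cost now uses the i8 trunc type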
2022-01-12 18:03:50 -06:00
David Green
255ad73424 [ARM] Make MVE v2i1 predicates legal
MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the
same 16bit VPR.P0 register, with v2i1 holding two 8 bit values for the
two halves. This was never treated as a legal type in llvm in the past
as there are not many 64bit instructions and no 64bit compares. There
are a few instructions that could use it though, notably a VSELECT (as
it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for
similar reasons, some gathers/scatters and long multiplies, and VCTP64
instructions.

This patch goes through and makes v2i1 a legal type, handling all the
cases that fall out of that. It also makes VSELECT legal for v2i64 as a
side benefit. A lot of the codegen changes as a result - usually in a way
that is a little better or a little worse, but still expensive. Costs
can change a little too in the process, again such that expensive
things remain expensive. A lot of the tests that changed are mainly to
ensure correctness - the code can hopefully be improved in the future
where it comes up in practice.

The intrinsics currently remain using the v4i1 they previously did to
emulate a v2i1. This will be changed in a followup patch but this one
was already large enough.

Differential Revision: https://reviews.llvm.org/D114449
2021-12-03 14:05:41 +00:00
Zarko Todorovski
7f7dac7126 [NFC][llvm] Inclusive language: reword uses of sanity test and check
Part of continuing work to use more inclusive language. Reworded uses
of sanity check and sanity test in llvm/test/
2021-11-25 07:21:42 -05:00
David Green
309f1e4ac8 [ARM] Add datalayout to costmodel tests. NFC
This adds a sensible datalayout to the ARM cost model tests, to prevent
the costs reported being incorrect for the size of pointers.
2021-11-16 09:49:42 +00:00
Simon Pilgrim
7bd097fd1e [CostModel][TTI] Fix ops used for generic smulo/umulo cost expansion
Fix copy+pasta that was checking for smul_fix instead of smul_with_overflow to detect signed values.

The LShr is performed on the extended type as we use it to truncate+extract the upper/hi bits of the extended multiply.

More closely matches the default expansion from TargetLowering::expandMULO
2021-10-06 19:11:32 +01:00
Craig Topper
765348298c [CostModel] Update default cost model for sadd/ssub overflow to match TargetLowering
The expansion for these was updated in https://reviews.llvm.org/D47927 but the cost model was not adjusted.

I believe the cost model was also incorrect for the old expansion.
The expansion prior to D47927 used 3 icmps using LHS, RHS, and Result
to calculate their signs, then 2 icmps to compare the signs, followed
by an And. The previous cost model was using 3 icmps and 2 selects.
Digging back through git blame, those 2 selects in the cost model used to
be 2 icmps, but were changed in https://reviews.llvm.org/D90681

Differential Revision: https://reviews.llvm.org/D110739
2021-09-30 09:41:14 -07:00
Simon Pilgrim
7397dcb403 [TTI] Add basic SK_InsertSubvector shuffle mask recognition
This patch adds an initial ShuffleVectorInst::isInsertSubvectorMask helper to recognize 2-op shuffles where the lowest elements of one of the sources are being inserted into the "in-place" other operand; this includes "concat_vectors" patterns, as can be seen in the Arm shuffle cost changes. This also helped fix an x86 issue with irregular/length-changing SK_InsertSubvector costs - I'm hoping this will help with D107188.

This doesn't currently attempt to work with 1-op shuffles that could either be a "widening" shuffle or a self-insertion.

The self-insertion case is tricky, but we currently always match this with the existing SK_PermuteSingleSrc logic.

The widening case will be addressed in a follow up patch that treats the cost as 0.

Masks with a high number of undef elts will still struggle to match optimal subvector widths - it's currently bounded by the minimum-width possible insertion, whilst some cases would benefit from wider (pow2?) subvectors.

Differential Revision: https://reviews.llvm.org/D107228
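An illustrative insert-subvector style 2-op shuffle (not taken from the patch):

; lanes 0-1 take the low elements of %sub, lanes 2-3 stay in place from %base
%r = shufflevector <4 x i32> %base, <4 x i32> %sub, <4 x i32> <i32 4, i32 5, i32 2, i32 3>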
2021-08-02 11:23:44 +01:00