466 Commits

Author SHA1 Message Date
David Green
8274be509e [AArch64] Remove header dependencies of AArch64ISelLowering.h. NFC
This patch aims to reduce the include used by AArch64ISelLowering, allowing it
to be included by unittests so that they can reference the AArch64ISD nodes.
It:
 - Moves the inclusion of AArch64SMEAttributes.h to the uses.
 - Moves LowerPtrAuthGlobalAddressStatically to a static function, so that
   AArch64PACKey is not required in the header.
 - Moves the definitions of getExceptionPointerRegister to the cpp file, to
   remove the reference of AArch64::X0.
2024-10-28 18:53:37 +00:00
Graham Hunter
091a235ec5
Revert "[AArch64][SVE] Enable max vector bandwidth for SVE" (#112873)
Reverts llvm/llvm-project#109671

Reverting due to some performance regressions on neoverse-v1.
2024-10-18 11:05:55 +01:00
Danila Malyutin
1a609052b6
[AArch64][InstCombine] Eliminate redundant barrier intrinsics (#112023)
If there are no memory ops on the path from one dmb to another then one
barrier can be eliminated.
2024-10-17 21:04:04 +04:00
Graham Hunter
c980a20b10
[AArch64][SVE] Enable max vector bandwidth for SVE (#109671)
Returns true for shouldMaximizeVectorBandwidth when the register type
is a scalable vector and SVE or streaming SVE are available.
2024-10-17 13:17:24 +01:00
Philip Reames
b3c687b4e9
[LV] Check early for supported interleave factors with scalable types [nfc] (#111592)
Previously, the cost model was returning an invalid cost. This simply
moves the check from one place to another. This is mostly to make the
cost modeling code a bit easier to follow.

---------

Co-authored-by: Mel Chen <mel.chen@sifive.com>
2024-10-15 07:37:46 -07:00
Rahul Joshi
fa789dffb1
[NFC] Rename Intrinsic::getDeclaration to getOrInsertDeclaration (#111752)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
2024-10-11 05:26:03 -07:00
Jeffrey Byrnes
853c43d04a
[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564)
Porting to TTI provides direct access to the instruction cost model,
which can enable instruction cost based sinking without introducing code
duplication.
2024-10-09 14:30:09 -07:00
Paul Walker
d283705829
[AArch64][SVE] Fix definition of bfloat fcvt intrinsics. (#110281)
Affected intrinsics:
  llvm.aarch64.sve.fcvt.bf16f32
  llvm.aarch64.sve.fcvtnt.bf16f32
    
The named intrinsics took a predicate based on the smallest element type
when it should be based on the largest. The intrinsics have been replace
by v2 equivalents and affected code ported to use them.
    
Patch includes changes to getSVEPredicateBitCast() that ensure the
generated code for the auto-upgraded old intrinsics is unchanged.
2024-10-03 12:36:01 +01:00
Paul Walker
be9461cda6
[LLVM][InstCombine][SVE] fcvtnt(a,all_active,b) != fcvtnt(undef,all_active,b) (#110278)
The "narrowing top" convert instructions leave the bottom half of active
elements untouched and thus the first paramater of their associated
intrinsic remains live even when there are no inactive lanes.
2024-10-01 11:13:04 +01:00
Philip Reames
d288574363
[TTI][RISCV] Model cost of loading constants arms of selects and compares (#109824)
This follows in the spirit of 7d82c99403f615f6236334e698720bf979959704,
and extends the costing API for compares and selects to provide
information about the operands passed in an analogous manner. This
allows us to model the cost of materializing the vector constant, as
some select-of-constants are significantly more expensive than others
when you account for the cost of materializing the constants involved.

This is a stepping stone towards fixing
https://github.com/llvm/llvm-project/issues/109466. A separate SLP patch
will be required to utilize the new API.
2024-09-25 07:25:57 -07:00
Paul Walker
622ae7ffa4
[LLVM][InstCombine][AArch64] sve.insr(splat(x), x) ==> splat(x) (#109445)
Fixes https://github.com/llvm/llvm-project/issues/100497
2024-09-24 15:11:36 +01:00
Sushant Gokhale
c5672e21ca
[AArch64][CostModel] Reduce the cost of fadd reduction with fast flag (#108791)
fadd reduction with
  1. Fast flag set
2. No of elements in input vector is power of 2 results in series of
faddp instructions. faddp instruction has latency/throughput identical
to fadd instruction and hence, we set relative cost=1 for faddp as well.

The change didn't show any regression with SPEC17-FP(C/C++),
llvm-test-suite on Neoverse-V2.
2024-09-24 14:35:01 +05:30
Matthew Devereau
1808fc13c8
[AArch64][InstCombine] Bail from combining SRAD on +/-1 divisor (#109274)
This fixes a crash when svdiv's third parameter is svdup_s64(1)
2024-09-20 13:53:02 +01:00
Samuel Tebbs
b1b436c108 [AArch64] Fix build error from extra !
This fixes a build failure caused by https://github.com/llvm/llvm-project/pull/108521
2024-09-19 14:45:30 +01:00
Sam Tebbs
b49a6b2a9d
[AArch64] Consider histcnt smaller than i32 in the cost model (#108521)
This PR updates the AArch64 cost model to consider the cheaper cost of
<i32 histograms to reflect the improvements from
https://github.com/llvm/llvm-project/pull/101017 and
https://github.com/llvm/llvm-project/pull/103037

Work by Max Beck-Jones (@DevM-uk)

---------

Co-authored-by: DevM-uk <max.beck-jones@arm.com>
2024-09-19 13:56:52 +01:00
Lukacma
d57be195e3
[AArch64] replace SVE intrinsics with no active lanes with zero (#107413)
This patch extends https://github.com/llvm/llvm-project/pull/73964 and
optimises SVE intrinsics into zero constants when predicate is zero.
2024-09-09 10:28:01 +01:00
Jon Roelofs
bded3b3ea9
[llvm][AArch64] Improve the cost model for i128 div's (#107306) 2024-09-05 07:42:23 -07:00
Lukacma
113806d187
[AArch64] optimise SVE cvt intrinsics with no active lanes (#104809)
This patch extends https://github.com/llvm/llvm-project/pull/73964 and
optimises SVE cvt intrinsics away when predicate is zero.
2024-08-29 11:45:14 +01:00
Maciej Gabka
95d2d1cba0
Move stepvector intrinsic out of experimental namespace (#98043)
This patch is moving out stepvector intrinsic from the experimental
namespace.

This intrinsic exists in LLVM for several years now, and is widely used.
2024-08-28 12:48:20 +01:00
cceerczw
67a9093a47
[instCombine][bugfix] Fix crash caused by using of cast in instCombineSVECmpNE (#102472) 2024-08-23 15:30:51 +01:00
Lukacma
29cb1e6b4f
[AArch64] optimise SVE cmp intrinsics with no active lanes (#104779)
This patch extends https://github.com/llvm/llvm-project/pull/73964 and
optimises SVE cmp intrinsics to zero vector when predicate is zero.
2024-08-22 15:51:51 +01:00
David Green
c61d565721
[AArch64] Set scalar fneg to free for fnmul (#104814)
A fneg(fmul(..)) or fmul(fneg(..)) can be folded into a fnmul under
AArch64. https://clang.godbolt.org/z/znPj34Mae

This discounts the cost of the fneg in such patterns to be free.
2024-08-21 18:10:16 +01:00
Lukacma
d7aeea626d
[AArch64] optimise SVE prefetch intrinsics with no active lanes (#103052)
This patch extends https://github.com/llvm/llvm-project/pull/73964 and
optimises away SVE prefetch intrinsics when predicate is zero.
2024-08-15 13:52:35 +01:00
Madhur Amilkanthwar
b73771cf0f
[AArch64] Increase scatter overhead on Neoverse-V2 (#101296)
This patch increases scatter overhead on Neoverse-V2 to 13. This
benefits s128 kernel from TSVC_2 test suite.
SPEC 17, RAJAPerf, and Sptter are unaffected by this patch.

This patch boosts s128 kernel's performance from TSVC test suite by about
40% as this enables vectorization. Also, handle minor code refactoring
for gather related part.
2024-08-14 10:12:40 +05:30
David Green
0b745a1084
[AArch64] Add invalid 1 x vscale costs for reductions and reduction-operations. (#102105)
The code-generator is currently not able to handle scalable vectors of
<vscale x 1 x eltty>. The usual "fix" for this until it is supported is
to mark the costs of loads/stores with an invalid cost, preventing the
vectorizer from vectorizing at those factors. But on rare occasions
loops do not contain load/stores, only reductions.

So whilst this is still unsupported return an invalid cost to avoid
selecting vscale x 1 VFs. The cost of a reduction is not currently used
by the vectorizer so this adds the cost to the add/mul/and/or/xor or
min/max that should feed the reduction. It includes reduction costs
too, for completeness. This change will be removed when code-generation
for these types is sufficiently reliable.

Fixes #99760
2024-08-09 14:25:07 +01:00
Paul Walker
7775a4882d
[LLVM][TTI][SME] Allow optional auto-vectorisation for streaming functions. (#101679)
The command line option enable-scalable-autovec-in-streaming-mode is
used to enable scalable vectors but the same check is missing from
enableScalableVectorization, which is blocking auto-vectorisation.
2024-08-05 11:25:44 +01:00
Sander de Smalen
fb470db7b3
[AArch64] Avoid inlining if ZT0 needs preserving. (#101343)
Inlining may result in different behaviour when the callee clobbers ZT0,
because normally the call-site will have code to preserve ZT0. When
inlining the function this code to preserve ZT0 will no longer be
emitted, and so the resulting behaviour of the program is changed.
2024-08-02 10:29:08 +01:00
David Spickett
d1f3a92ea9 Revert "[AArch64] Remove special-case inserted shuffle cost."
This reverts commit 19b785b7334d01354e8430634bab3c3341c671ca.

My bisect must have been wrong because they're still failing,
and there are follow ups to this that would need unpicking anyway.
2024-07-29 11:24:39 +00:00
David Spickett
19b785b733 Revert "[AArch64] Remove special-case inserted shuffle cost."
This reverts commit f67fa3be4db68afc08c7f3d9523f1533fa5687b7.

Caused test suite failures on AArch64:
https://lab.llvm.org/buildbot/#/builders/17/builds/1349
2024-07-29 11:03:25 +00:00
David Green
6907ab4939 [AArch64] Extend costs for fptoi.sat intrinsics.
Most of these bring the costs in line with the code generation. The f16 costs
without FullFP16 are usually converted to f32. Extended v2f32->v2f64 vectors
similarly use fcvtl + fcvt. As a backup we use the costs similar to the target
independent code, which should give a relatively high cost.
2024-07-28 10:47:40 +01:00
David Green
f67fa3be4d [AArch64] Remove special-case inserted shuffle cost.
This special case tried to measure if the shuffle vector will be multiple
inserts into an existing vector, with one of the lanes already in-place. If so
it reduces the cost by 1 to to represent it will can insert n-1 vector lanes.
This isn't always true though as the original vector may need to be moved to a
new value to start inserting new values into it, if other values from the
original are still needed.

This didn't effect performance much when I tried it, but should hopefully start
to address a regression we see from differences in SLP vectorization lane
orders.
2024-07-25 17:46:48 +01:00
Sander de Smalen
d94ed83a9e [AArch64] Fix assertion failure in getCastInstrCost
We should not call `getVectorElementType` on the result MVT from
`getTypeLegalizationCost` when we don't if the legal type is a
vector. This is the case when the type needs to be legalized
using scalarization.
2024-07-16 10:43:07 +00:00
Graham Hunter
2c0add93b2
[TTI] Return a more sensible cost for histogram intrinsic. (#97397)
This is just an initial cost, making it invalid for any target which
doesn't specifically return a cost for now. Also adds an AArch64
specific cost check.

We will need to improve that later, e.g. by returning a scalarization
cost for generic targets and possibly introducing a new TTI method, at
least once LoopVectorize has changed it's cost model. The reason is
that the histogram intrinsic also effectively contains a gather and
scatter, and we will need details of the addressing to determine an
appropriate cost for that.
2024-07-04 10:59:21 +01:00
Lukacma
9ceb45cc19
[AArch64][SVE] optimisation for unary SVE store intrinsics with no active lanes (#95793)
This patch extends https://github.com/llvm/llvm-project/pull/73964 and
adds optimisation of store SVE intrinsics when predicate is zero.
2024-07-02 11:37:52 +02:00
Nikita Popov
2d209d964a
[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...

`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
2024-06-27 16:38:15 +02:00
David Sherwood
2dd4167a09
[LoopVectorize][AArch64] Add limited support for scalable vectorisation of i1 types (#95920)
Previously isElementTypeLegalForScalableVector returned false for i1
types, which also prevented vectorisation of loops with i1 reductions.
This is overkill - we only need to disable vectorisation for loads
and/or stores of i1 types. I've added i1 as a legal type, but changed
the cost model to return an invalid cost for loads and stores.
2024-06-25 15:04:24 +01:00
Sander de Smalen
c436649313
[AArch64] Remove all instances of the 'hasSVEorSME' interfaces. (#96543)
I've not added any new tests for these, because the original conditions
were wrong (they did not consider streaming mode) and we have tests for
the positive cases.
2024-06-25 13:27:06 +01:00
Lukacma
0bd9c49a29
[AArch64][SVE] optimisation for SVE load intrinsics with no active lanes (#95269)
This patch extends #73964 and adds optimisation of load SVE intrinsics
when predicate is zero.
2024-06-25 10:58:16 +02:00
Sander de Smalen
738533c84a
[AArch64] Consider streaming mode in TTI interfaces for vectorization. (#96305)
At the moment, vectorization is only enabled in streaming(-compatible)
mode when enabled through an option. But the interfaces should check
more than just 'hasSVE()', because a function with +sme in streaming
mode should also vectorize with the option enabled.

Additionally, a streaming-compatible function should only be able to use
fixed-length autovec if SVE is available, otherwise the vector code will
be scalarised by the backend.
2024-06-24 11:06:16 +01:00
Graham Hunter
e16f2f5d24
[AArch64] Override isLSRCostLess, take number of instructions into account (#84189)
Adds an AArch64-specific version of isLSRCostLess, changing the relative
importance of the various terms from the formulae being evaluated.

This has been split out from my vscale-aware LSR work, see the RFC for
reference:
https://discourse.llvm.org/t/rfc-vscale-aware-loopstrengthreduce/77131
2024-06-06 14:45:36 +01:00
Graham Hunter
2e8d815596
[TTI] Support scalable offsets in getScalingFactorCost (#88113)
Part of the work to support vscale-relative immediates in LSR.
2024-05-10 11:22:11 +01:00
David Green
363ec6f691 [AArch64][GlobalISel] Common some shuffle mask functions.
This removes the GISel versions of isREVMask, isTRNMask, isUZPMask and
isZipMask. They are combined with the existing versions from SDAG into
AArch64PerfectShuffle.h.
2024-05-06 18:37:04 +01:00
David Sherwood
96b2e35a58
[CostModel][AArch64] Improve fixed-width vector costs for get.active.lane.mask (#89068)
When SVE is available we can lower calls to get.active.lane.mask using
the SVE whilelo instruction, however in practice since vXi1 types are
not legal for NEON we often end up expanding the predicate into a vector
of integers, e.g. v4i1 -> v4i32. This usually happens when we have to
keep the predicate live out of the block, for example when the predicate
is the incoming value to a PHI node in a tail-folded vector loop.
Currently in such cases the intrinsic call has a cost of 1, which is far
too low when considering the extra instructions required to expand the
predicate. This patch fixes that by basing the cost on the number of
lane moves required for expansion. This is required for a follow-on
patch that adds the cost of the intrinsic call to the vectorisation cost
model, so that we can teach the vectoriser to make better choices.
2024-04-24 14:31:06 +01:00
David Green
18bb175428 [AArch64] Add costs for LD3/LD4 shuffles.
Similar to #87934, this adds costs to the shuffles in a canonical LD3/LD4
pattern, which are represented in LLVM as deinterleaving-shuffle(load). This
likely has less effect at the moment than the ST3/ST4 costs as instcombine will
perform certain transforms without considering the cost.
2024-04-21 13:53:22 +01:00
David Green
6c2cc8240e [AArch64] Improve cost of non-zero lane splats
This adds a cost for non-zero lane splats, which is not included by default in
SK_Broadcast but can be handled by aarch64 dup lane instruction.
2024-04-14 12:09:14 +01:00
David Green
a53674359d
[AArch64] Add ZIP and UZP shuffle costs. (#88150)
This adds some costs for the shuffle instructions that should be lowered
to zip1/zip2/uzp1/uzp2 instructions.
2024-04-11 08:45:28 +01:00
David Green
f0e79d9152 [AArch64] Add a cost for identity shuffles.
These are mostly handled at a higher level when costing shuffles, but some
masks can end up being identity or concat masks which we can treat as free.
2024-04-09 17:16:14 +01:00
David Green
4ac2721e51
[AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934)
This tries to add some costs for the shuffle in a ST3/ST4 instruction,
which are represented in LLVM IR as store(interleaving shuffle). In
order to detect the store, it needs to add a CxtI context instruction to
check the users of the shuffle. LD3 and LD4 are added, LD2 should be a
zip1 shuffle, which will be added in another patch.

It should help fix some of the regressions from #87510.
2024-04-09 16:36:08 +01:00
Il-Capitano
308ed0233a
[Intrinsics] Make patchpoint.i64 generic on its return type (#85911)
Currently patchpoints can only have two result types, `void` and `i64`.
This limits the result to general purpose registers.
This patch makes `patchpoint.i64` an overloadable intrinsic, allowing
result values that can fit in a single register (e.g. integers,
pointers, floats).
2024-03-26 19:08:52 +05:30
Jeremy Morse
b9d83eff25
[NFC][RemoveDIs] Use iterators for insertion at various call-sites (#84736)
These are the last remaining "trivial" changes to passes that use
Instruction pointers for insertion. All of this should be NFC, it's just
changing the spelling of how we identify a position.

In one or two locations, I'm also switching uses of getNextNode etc to
using std::next with iterators. This too should be NFC.

---------

Merged by: Stephen Tozer <stephen.tozer@sony.com>
2024-03-19 16:36:29 +00:00