393 Commits

Author SHA1 Message Date
zhongyunde
f41223eeca [AArch64][SVE2] Delete an unused parameter for isExtPartOfAvgExpr, NFC
Depend on D157628, which set the cost of extends 0 because they will fold into
the s/urhadd.

Reviewed By: kmclaughlin
Differential Revision: https://reviews.llvm.org/D159273
2023-09-01 23:40:52 +08:00
Sander de Smalen
7e815dd76d [AArch64][SME] Create new interface for isSVEAvailable.
When a function is compiled to be in Streaming(-compatible) mode, the full
set of SVE instructions may not be available. This patch adds an interface
to query that and changes the codegen for FADDA (not legal in Streaming-SVE
mode) to instead be expanded for fixed-length vectors, or otherwise not to
code-generate for scalable vectors.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D156109
2023-09-01 12:00:36 +00:00
Sander de Smalen
0a32a999ae [AArch64][SME] NFC: Rename hasNewZAInterface to hasNewZABody.
__arm_new_za is a declaration attribution, not a type attribute,
and is therefore not part of the interface of a function.
2023-08-30 13:14:42 +00:00
Kerry McLaughlin
9a98ab589a [AArch64][SVE2] Change the cost of extends with S/URHADD to 0
When SVE2 is enabled, we can combine an add of 1, add & shift right by 1
to a single s/urhadd instruction. If the operands to the adds are extended,
these extends will fold into the s/urhadd and their costs should be 0.

Reviewed By: david-arm, dtemirbulatov

Differential Revision: https://reviews.llvm.org/D157628
2023-08-29 12:24:47 +00:00
Alexey Bataev
9a207578ac [TTI]Add InsertSubvector pattern in improveShuffleKindFromMask().
It improves shuffle instructions estimation and improves vectorization
outcome.

Differential Revision: https://reviews.llvm.org/D157425
2023-08-18 13:47:01 -07:00
Kerry McLaughlin
5d814b3848 Revert "[AArch64][SVE2] Change the cost of extends with S/URHADD to 0"
This reverts commit dda2cd2505301aa626fcd3e8dea2a447227d00ca.
2023-08-14 10:44:13 +00:00
Kerry McLaughlin
dda2cd2505 [AArch64][SVE2] Change the cost of extends with S/URHADD to 0
When SVE2 is enabled, we can combine an add of 1, add & shift right by 1
to a single s/urhadd instruction. If the operands to the adds are extended,
these extends will fold into the s/urhadd and their costs should be 0.

Reviewed By: dtemirbulatov

Differential Revision: https://reviews.llvm.org/D157628
2023-08-14 10:32:06 +00:00
Mel Chen
425e9e81a0 [LV] Rename the Select[I|F]Cmp reduction pattern to [I|F]AnyOf. (NFC)
Regarding this NFC change, please refer to the discussion in this thread. https://reviews.llvm.org/D150851#4467261

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D155786
2023-08-03 00:37:19 -07:00
David Green
2a859b2014 [AArch64] Change the cost of vector insert/extract to 2
The cost of vector instructions has always been high under AArch64, in order to
add a high cost for inserts/extracts, shuffles and scalarization. This is a
conservative approach to limit the scope of unusual SLP vectorization where the
codegen ends up being quite poor, but has always been higher than the correct
costs would be for any specific core.

This relaxes that, reducing the vector insert/extract cost from 3 to 2. It is a
generalization of D142359 to all AArch64 cpus. The ScalarizationOverhead is
also overridden for integer vector at the same time, to remove the effect of
lane 0 being considered free for integer vectors (something that should only be
true for float when scalarizing).

The lower insert/extract cost will reduce the cost of insert, extracts,
shuffling and scalarization. The adjustments of ScalaizationOverhead will
increase the cost on integer, especially for small vectors. The end result will
be lower cost for float and long-integer types, some higher cost for some
smaller vectors. This, along with the raw insert/extract cost being lower, will
generally mean more vectorization from the Loop and SLP vectorizer.

We may end up regretting this, as that vectorization is not always profitable.
In all the benchmarking I have done this is generally an improvement in the
overall performance, and I've attempted to address the places where it wasn't
with other costmodel adjustments.

Differential Revision: https://reviews.llvm.org/D155459
2023-07-28 21:26:50 +01:00
David Green
8da62b865f [AArch64] Basic vector bswap costs
This adds some basic vector bswap costs, providing the type is supported.

Differential Revision: https://reviews.llvm.org/D155806
2023-07-21 08:48:53 +01:00
Sander de Smalen
08fd44b300 [AArch64] Force streaming-compatible codegen when attributes are set.
Before this patch, the only way to generate streaming-compatible code
was to use the `-force-streaming-compatible-sve` flag, but the compiler
should also avoid the use of instructions invalid in streaming mode
when a function has the aarch64_pstate_sm_enabled/compatible attribute.

Reviewed By: paulwalker-arm, david-arm

Differential Revision: https://reviews.llvm.org/D155428
2023-07-18 10:26:00 +00:00
Sander de Smalen
ec6af93d02 [AArch64] NFC: Replace 'forceStreamingCompatibleSVE' with 'isNeonAvailable'.
The AArch64Subtarget interface 'isNeonAvailable' is more appropriate going
forward, as we may also want to generate 'streaming SVE' code (not just
'streaming-compatible SVE' code), but here we must still make sure not to
use NEON instructions which are invalid in streaming SVE mode.
2023-07-17 08:24:10 +00:00
David Green
1712ae6709 [AArch64] Improve cost of umull from known bits
As in D140287, we can now generate umull from mul(zext(x), y) in cases where we
know that the top bits of y are zero. This teaches that to the cost model,
adjusting how isWideningInstruction detects mul operations that can extend both
operands. This helps for constants and other cases where the operands of the
mul are known to be extended, but not directly extends.

Differential Revision: https://reviews.llvm.org/D154936
2023-07-12 13:13:06 +01:00
Tuan Chuong Goh
e36dd3ea8a [AArch64] Fix cost modelling for SVE Min/Max Intrinsics
Add more legal types for SMIN, SMAX, UMIN, UMAX in cost modelling for AArch64

Differential Revision: https://reviews.llvm.org/D154622
2023-07-12 07:46:12 +01:00
Youngsuk Kim
f69b9b7cce [llvm] Remove uses of Type::getPointerTo() (NFC)
Partial progress towards removing in-tree uses of `getPointerTo()`,
by employing the following options:

* Drop the call entirely if the sole purpose of it is to support
  a no-op bitcast (remove the no-op bitcast as well).

* Replace with `PointerType::get()`/`PointerType::getUnqual()`.

Also, remove no-op function `EmitBitCastOfLValueToProperType()`.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D154392
2023-07-08 13:05:58 -04:00
David Green
12025cef3e [CostModel] Use min/max intrinsics for vecreduce.min/max costs
This changes the costmodelling of the vecreduce.min/max nodes to use the costs
of the relevant min/max intrinsics instead of expanding them to compare and
selects. The getMinMaxReductionCost have changed to take a Opcode for the
relevant intrinsic, dropping the IsUnsigned and CondTy parameters as they are
no longer needed.

A follow up patch will add some basic fminimum/fmaximum costmodelling.

Differential Revision: https://reviews.llvm.org/D153547
2023-07-04 15:02:30 +01:00
David Green
5106b221c8 [AArch64] Treat the icmp in icmp(and(..), 0) as free
As in https://godbolt.org/z/4dafd9Geq, the icmp from an And may use an Ands to
set flags, meaning the icmp is free.

This could also be done for add/sub, but those patterns often happen in the
induction variable of a loop, making them quite performance sensitive.

Differential Revision: https://reviews.llvm.org/D153611
2023-07-01 21:59:54 +01:00
Igor Kirillov
17bde328d6 [LV] Add mask support for vectorizing interleaved groups
This patch extends LoopVectorize to handle the vectorization of interleaved
memory accesses with scalable vectors when mask is required or/and predicated
tail folding is enabled.

Differential Revision: https://reviews.llvm.org/D152258
2023-06-29 17:50:56 +00:00
Jolanta Jensen
5cd16e2cb7 [NFC SVE ACLE] Remove IR combines that no longer apply.
Remove IR combines that no longer apply after the SVE merging
intrinsics taking an all active predicate, have been canonicalised
to their equivalent undef (_u) variants.

Differential Revision: https://reviews.llvm.org/D153415
2023-06-22 10:26:20 +00:00
Jolanta Jensen
ecb07f481b [SVE ACLE] Implement IR combines to convert intrinsics used for _m C/C++ builtins
This patch implements IR combines to convert intrinsics used for _m C/C++ builtins
which take an all active predicate to their equivalent _u intrinsic.

Differential Revision: https://reviews.llvm.org/D152005
2023-06-21 10:35:13 +00:00
Zhongyunde
cb353dc74e [LV] Add cost model for simd vector select instructions of type float
For simd vector selects, use cmeq + bsl for v2f32/v4f32/v2f64, so their cost are cheep.
Fix https://github.com/llvm/llvm-project/issues/63082

Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D152523
2023-06-20 13:12:19 +08:00
Paul Walker
b7287a82d3 [SVE][AArch64TTI] Fix invalid mla combine that miscomputes the value of inactive lanes.
Consider: add(pg, a, mul_u(pg, b, c))

Although the multiply's inactive lanes are undefined, they don't
contribute to the final result.  The overall result of the inactive
lanes come from "a" and thus the above is another form of mla
rather than mla_u.
2023-06-18 13:07:03 +01:00
Paul Walker
c7c71aa123 [NFC][AArch64TTI] Breakout add/sub combines into discrete functions. 2023-06-18 13:07:03 +01:00
Nikita Popov
f9f8517e03 [InstCombine][AArch64] Fix phi insertion point
Fix the issue reported at https://reviews.llvm.org/rG724f4a5bac25#inline-9083,
by specifying the correct insertion point for the new phi.
2023-06-16 14:58:33 +02:00
Florian Hahn
baebe719a5
[AArch64] Address post-commit comments from D150482.
Address @v01dXYZ's comments, thanks!
2023-06-15 16:38:09 +01:00
Graham Hunter
95bfb1902d [LV][AArch64] Allow (limited) interleaving for scalable vectors
This patch uses the (de)interleaving intrinsics introduced in
D141924 to handle vectorization of interleaving groups with a
factor of 2 for scalable vectors.

Reviewed By: fhahn, reames

Differential Revision: https://reviews.llvm.org/D145163
2023-06-09 11:42:10 +01:00
zhongyunde
df19d87227 [LV] Add option to tune the cost model, NFC
For Neon, the default nonconst stride cost is conservative,
and it is a local variable, which is not convenience to
to tune the loop vectorize.
So I try to use a option, which is similar to SVEGatherOverhead brought in D115143.
Fix https://github.com/llvm/llvm-project/issues/63082.

Reviewed By: dmgreen, fhahn
Differential Revision: https://reviews.llvm.org/D152253
2023-06-07 22:08:29 +08:00
Jolanta Jensen
a963dbb5ac [SVE ACLE] Extend existing aarch64_sve_mul combines to also act on aarch64_sve_mul_u.
Differential Revision: https://reviews.llvm.org/D152004
2023-06-06 15:26:33 +00:00
Jolanta Jensen
dc63b35b02 [SVE ACLE] Extend IR combines for fmul, fsub, fadd to cover _u variants
This patch extends existing IR combines for: fmul, fsub and fadd,
relying on all active predicate to also apply to their equivalent
undef (_u) intrinsics.

Differential Revision: https://reviews.llvm.org/D150768
2023-06-02 11:06:57 +00:00
Florian Hahn
e97b8a7e3f
[AArch64] Don't use tbl lowering if ZExt can be folded into user.
If the ZExt can be lowered to a single ZExt to the next power-of-2 and
the remaining ZExt folded into the user, don't use tbl lowering.

Fixes #62620.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D150482
2023-06-02 11:53:04 +01:00
David Green
eb764a7f38 [AArch64] Increase the cost of i1 inserts / extracts
i1 inserts will need an extra cset, and i1 extracts need a cmp (or tst) in
order to be used. This increase the cost of them a little to account for those
extra instructions.
https://godbolt.org/z/3c5z4G7Mh

Differential Revision: https://reviews.llvm.org/D151189
2023-06-01 10:54:53 +01:00
David Green
e79fac2968 [AArch64] Adjust costs of i1 and/or/xor reductions
This expands the reduction cost of i1 and/or/xor, so that larger type sizes get
handled by the existing code. For i1 reductions - and will use maxv, or will use
minv and xor will use addv, plus the cost of legalizing the type for larger
vectors using and/or/xor. The i1 vectors will be legalized to higher width
integers (say v16i8), which this overrides the cost of. As with all i1 vectors
there is a chance that the types the i1 vector is created with and how it is
used will not match, introducing extra extends that are not necessarily
costmodelled.
https://godbolt.org/z/6Gc9K6b7T

Differential Revision: https://reviews.llvm.org/D151184
2023-06-01 09:28:48 +01:00
Craig Topper
6006d43e2d LLVM_FALLTHROUGH => [[fallthrough]]. NFC
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D150996
2023-05-24 12:40:10 -07:00
Dinar Temirbulatov
7489301c03 [AArch64][LV] Disable maximising bandwidth for streaming compatible sve
Fixing last commit by adding actual change to AArch64TargetTransformInfo.cpp

Differential Revision: https://reviews.llvm.org/D150336
2023-05-23 13:24:01 +00:00
Sander de Smalen
11926e6149 [SME2/SVE2p1] Extend llvm.aarch64.sve.convert.to/from.svbool to accept target("aarch64.svcount")
The convert intrinsics can be used to implement existing operations on svcount_t
when the actual bits/content of the predicate register doesn't matter (such
as PSEL, which copies the full contents of the first source register to the
destination register).

Reviewed By: CarolineConcatto, david-arm

Differential Revision: https://reviews.llvm.org/D150959
2023-05-22 13:52:18 +00:00
David Sherwood
c7dbe326df [AArch64][LoopVectorize] Enable tail-folding of simple loops on neoverse-v1
This patch enables the tail-folding of simple loops by default
when targeting the neoverse-v1 CPU. Simple loops exclude those
with recurrences or reductions or loops that are reversed.

New tests have been added here:

Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

In terms of SPEC2017 only one benchmark is really affected when
building with "-Ofast -mcpu=neoverse-v1 -flto", which is
(+ faster, - slower):

525.x264: +7.0%

Differential Revision: https://reviews.llvm.org/D130618
2023-05-18 10:35:57 +00:00
David Sherwood
7beb2ca8fa [AArch64][NFC] Refactor the tail-folding option
This patch does simple refactoring of the tail-folding
option in preparation for enabling tail-folding by
default for neoverse-v1. It adds a default tail-folding
option field to the AArch64Subtarget class that can
be set on a per-CPU.

Differential Revision: https://reviews.llvm.org/D149659
2023-05-17 08:39:40 +00:00
Jolanta Jensen
105d63a250 [SVE ACLE] Change the lowering of SVE integer builtins
Change the lowering of SVE integer mla_x/mls_x and mad_x/msb_x
builtins to use dedicated undef (_u) intrinsics.

Differential Revision: https://reviews.llvm.org/D150553
2023-05-16 17:44:24 +00:00
Nikita Popov
724f4a5bac [AArch64] Use correct IRBuilder in InstCombine hooks
These need to use the IRBuilder provided by InstCombine for proper
worklist management.
2023-05-16 18:13:59 +02:00
Dinar Temirbulatov
73668cceae [AArch64][CostModel] Add costs for fixed operations when using fixed vectors over SVE.
Currently any cast operation with fixed length vectors uses NEON costs,
If those operations are end up using SVE instruction then we estimate
those operations based upon SVE costs.

Differential Revision: https://reviews.llvm.org/D133955
2023-05-15 16:18:45 +00:00
David Green
1ba9ec0d10 [AArch64] Update FP16 vector cmp costs
Without FP16, a fp16 v4f16 comparison will be converted to a v4f32 and back.
v8f16 get scalarized currently. Update the costs of v4f16 to match.
2023-05-14 23:28:11 +01:00
Craig Topper
9ad9380fbc [LegalizeVectorOps][AArch64][RISCV][X86] Use OpVT for ISD::SETCC in LegalizeVectorOps.
Previously, LegalizeVectorOps used the result VT while LegalizeDAG
used the operand VT. This patch makes them both use the operand VT.

This also makes it consistent with how the default cost model works.

I've hacked the AArch64 cost model to maintain old behavior for some
f16 vectors.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D149572
2023-05-13 23:33:00 -07:00
Philip Reames
e41dce4d49 [LAA/LV] Simplify stride speculation logic [NFC] (try 2)
The original commit wasn't quite NFC, and this was caught by an arguably overly strong assert.  Specifically, I'd failed to strip off the integer cast off the SCEV before saving it in the map.  The result - other than a failed assert - is that we'd speculate on the casted unknown, not the unknown.  The only case I can think of where that might change behavior would be a sext(i1 load).  I doubt that case is interesting in practice, but it's good to be strictly NFC on this change regardless.

Original commit message follows..

The existing code makes it hard to tell that collectStridedAccess is really about identifying some loop invariant SCEV which is *profitable* to speculate is equal to one. The odd dual usage structure of Value and SCEV confuses this point.

We could choose to loosen the profitability analysis if desired. I'm not proposing doing so at this time as it exposes too many cases where the speculation is unprofitable.

Differential Revision: https://reviews.llvm.org/D147750
2023-05-11 10:19:23 -07:00
Philip Reames
dc0d00c5fc Revert "[LAA/LV] Simplify stride speculation logic [NFC]"
This reverts commit d5b840131223f2ffef4e48ca769ad1eb7bb1869a.  Running this through broader testing after rebasing is revealing a crash.  Reverting while I investigate.
2023-05-11 09:26:35 -07:00
Philip Reames
d5b8401312 [LAA/LV] Simplify stride speculation logic [NFC]
The existing code makes it hard to tell that collectStridedAccess is really about identifying some loop invariant SCEV which is *profitable* to speculate is equal to one. The odd dual usage structure of Value and SCEV confuses this point.

We could choose to loosen the profitability analysis if desired. I'm not proposing doing so at this time as it exposes too many cases where the speculation is unprofitable.

Differential Revision: https://reviews.llvm.org/D147750
2023-05-11 08:32:56 -07:00
ManuelJBrito
d22edb9794 [IR][NFC] Change UndefMaskElem to PoisonMaskElem
Following the change in shufflevector semantics,
poison will be used to represent undefined elements in shufflevector masks.

Differential Revision: https://reviews.llvm.org/D149256
2023-04-27 18:01:54 +01:00
Zain Jaffal
3d3d8fef07
[AArch64] Improve fsh(l|r) cost modeling if 3rd arg is constant.
In that case, the cost for i32 and i64 should be 1 (a single EXTR
instruction). For v4i32 and v2i64 it should be 3 (USHR + SHL + ORR).

Other sizes smaller than 64 bits require an extra instruction for
conversion to i32/i64.

This recovers a SLP regression revealed by D140392.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D147322
2023-04-20 18:20:01 +01:00
David Sherwood
afc2b7db02 [AArch64][CostModel] Make sext/zext free if folded into a masked load
The BasicTTIImpl implementation of getCastInstrCost ensures
that the cost of zext/sext is 0 when following a load if we
know the combined extending load is legal. For SVE we can do
the same for masked loads too, since they use exactly the
same underlying instruction.

Differential Revision: https://reviews.llvm.org/D148123
2023-04-20 08:48:57 +00:00
Alexey Bataev
8cf0290c4a [SLP]Fix cost estimation for buildvectors with extracts and/or constants.
If the partial matching is found and some other scalars must be
inserted, need to account the cost of the extractelements, transformed
to shuffles, and/or reused entries and calculate the cost of inserting
constants properly into the non-poison vectors.
Also, fixed the cost calculation for final gather/buildvector sequence.

Differential Revision: https://reviews.llvm.org/D148362
2023-04-19 05:54:58 -07:00
Hassnaa Hamdi
045eec61f3 [AArch64][CostModel]: Add costs for zero/sign extend.
Add cost for extending to illegal scalable vector types.
Add testing file for the extend operations.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D142456
2023-04-19 10:26:43 +00:00