2709 Commits

Author SHA1 Message Date
Rohit Aggarwal
dfb60bb919
Adding more vector calls for -fveclib=AMDLIBM (#109662)
AMD has it's own implementation of vector calls.
New vector calls are introduced in the library for exp10, log10, sincos and finite asin/acos
Please refer [https://github.com/amd/aocl-libm-ose]

---------

Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal@amd.com>
2024-10-29 10:09:55 +00:00
Florian Hahn
0d0abb351b
[VPlan] Use ResumePhi to create reduction resume phis. (#110004)
Use VPInstruction::ResumePhi to create phi nodes for reduction resume
values in the scalar preheader, similar to how ResumePhis are used for
first-order recurrence resume values after 9a5a8731e77.

This allows simplifying createAndCollectMergePhiForReduction to only
collect reduction resume phis when vectorizing epilogue loops and adding
extra incoming edges from the main vector loop. Updating phis for the
epilogue vector loops requires special attention, because additional
incoming values from the bypass blocks need to be added.

PR: https://github.com/llvm/llvm-project/pull/110004
2024-10-28 20:14:08 +01:00
Shih-Po Hung
266ff98cba
[LV][VPlan] Use VF VPValue in VPVectorPointerRecipe (#110974)
Refactors VPVectorPointerRecipe to use the VF VPValue to obtain the
runtime VF, similar to #95305.

Since only reverse vector pointers require the runtime VF, the patch
sets VPUnrollPart::PartOpIndex to 1 for vector pointers and 2 for
reverse vector pointers. As a result, the generation of reverse vector
pointers is moved into a separate recipe.
2024-10-26 23:18:50 +08:00
Florian Hahn
e724226da7
[VPlan] Return cost of 0 for VPWidenCastRecipe without underlying value.
In some cases, VPWidenCastRecipes are created but not considered in the
legacy cost model, including truncates/extends when evaluating a reduction
in a smaller type. Return 0 for such casts for now, to avoid divergences
between VPlan and legacy cost models.

Fixes https://github.com/llvm/llvm-project/issues/113526.
2024-10-25 21:25:44 +02:00
Tex Riddell
c03d09ce3e
[aarch64] atan2 intrinsic lowering (p5) (#112611)
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

- `VecFuncs.def`: define intrinsic to sleef/armpl mapping
- `LegalizerHelper.cpp`: add missing fewerElementsVector handling for
the new atan2 intrinsic
- `AArch64ISelLowering.cpp`: Add arch64 specializations for lowering
like neon instructions
- `AArch64LegalizerInfo.cpp`: Legalize atan2.

Part 5 for Implement the atan2 HLSL Function #70096.
2024-10-24 17:53:12 -07:00
Florian Hahn
2dfb1c664c
[VPlan] Try to hoist Previous (and operands), if sinking fails for FORs. (#108945)
In some cases, Previous (and its operands) can be hoisted. This allows
supporting additional cases where sinking of all users of to FOR fails,
e.g. due having to sink recipes with side-effects.

This fixes a crash where we fail to create a scalar VPlan for a
first-order recurrence, but can create a vector VPlan, because the trunc
instruction of an IV which generates the previous value of the
recurrence has been optimized to a truncated induction recipe, thus
hoisting it to the beginning.

Fixes https://github.com/llvm/llvm-project/issues/106523.

PR: https://github.com/llvm/llvm-project/pull/108945
2024-10-23 13:12:03 -07:00
Florian Hahn
ddbb382a7c
[LV] Regenerate check-lines for some tests. 2024-10-23 04:34:13 +01:00
Paul Walker
5bb34803a4 [NFC] Migrate tests to use autoupdate for CHECK lines. 2024-10-22 12:55:15 +00:00
Ramkumar Ramachandra
f719cfa868
LAA: be less conservative in isNoWrap (#112553)
isNoWrap has exactly one caller which handles Assume = true separately,
but too conservatively. Instead, pass Assume to isNoWrap, so it is
threaded into getPtrStride, which has the correct handling for the
Assume flag. Also note that the Stride == 1 check in isNoWrap is
incorrect: getPtrStride returns Strides == 1 or -1, except when
isNoWrapAddRec or Assume are true, assuming ShouldCheckWrap is true; we
can include the case of -1 Stride, and when isNoWrapAddRec is true. With
this change, passing Assume = true to getPtrStride could return a
non-unit stride, and we correctly handle that case as well.
2024-10-22 09:55:51 +01:00
Florian Hahn
2a6b09e0d3
[LV] Use type from InsertPos for cost computation of interleave groups.
Previously the legacy cost model would pick the type for the cost
computation depending on the order of the members in the input IR.
This is incompatible with the VPlan-based cost model (independent of
original IR order) and also doesn't match code-gen, which uses the type
of the insert position.

Update the legacy cost model to use the type (and address space) from
the Group's insert position.

This brings the legacy cost model in line with the legacy cost model and
fixes a divergence between both models.

Note that the X86 cost model seems to assign different costs to groups
with i64 and double types. Added a TODO to check.

Fixes https://github.com/llvm/llvm-project/issues/112922.
2024-10-18 19:12:40 -07:00
Florian Hahn
c7496cebac
[LV] Use SCEV to check if minimum iteration check is known. (#111310)
Use SCEV to check if the minimum iteration check (TC < Step) is known to
be false.

This is a first step towards addressing
https://github.com/llvm/llvm-project/issues/111098. To catch the exact
case from the issue, we need to do extra work to make sure the wrap
flags on the shl are preserved and used by SCEV.

Note that skeleton creation will be gradually moved to VPlan and this
simplification should be done as VPlan transform eventually. The current
plan is to move skeleton creation to VPlan starting from parts closest
to the parts already created by VPlan, starting with induction resume
value creation (started with
https://github.com/llvm/llvm-project/pull/110577), then memory and SCEV
checks and finally minimum iteration checks.

PR: https://github.com/llvm/llvm-project/pull/111310
2024-10-18 15:22:59 -07:00
Alexey Bataev
f148d5791b
[LV]Initial support for safe distance in predicated DataWithEVL vectorization mode.
Enabled initial support for max safe distance in DataWithEVL mode. If
max safe distance is required, need to emit special code:
CMP = icmp ult AVL, MAX_SAFE_DISTANCE
SAFE_AVL = select CMP, AVL, MAX_SAFE_DISTANCE
EVL = call i32 @llvm.experimental.get.vector.length(i64 SAFE_AVL)

while vectorize the loop in DataWithEVL tail folding mode.

Reviewers: fhahn

Reviewed By: fhahn

Pull Request: https://github.com/llvm/llvm-project/pull/102897
2024-10-18 15:51:49 -04:00
Graham Hunter
091a235ec5
Revert "[AArch64][SVE] Enable max vector bandwidth for SVE" (#112873)
Reverts llvm/llvm-project#109671

Reverting due to some performance regressions on neoverse-v1.
2024-10-18 11:05:55 +01:00
Florian Hahn
b497010854
[VPlan] Use VPInstruction::Name when assigning names (NFCI).
This slightly improves the printing of VPInstructions. NFC except debug
output.
2024-10-18 05:52:35 +01:00
Florian Hahn
b060661da8
[SCEVExpander] Expand UDiv avoiding UB when in seq_min/max. (#92177)
Update SCEVExpander to introduce an SafeUDivMode, which is set
when expanding operands of SCEVSequentialMinMaxExpr. In this mode,
the expander will make sure that the divisor of the expanded UDiv is
neither 0 nor poison.

Fixes https://github.com/llvm/llvm-project/issues/89958.


PR https://github.com/llvm/llvm-project/pull/92177
2024-10-17 13:55:20 -07:00
David Sherwood
76f3776185
[NFC][LoopVectorize] Restructure simple early exit tests (#112721)
The previous simple_early_exit.ll was growing too large and difficult to
manage. Instead I've decided to refactor the tests by splitting out into
notional groups:

1. single_early_exit.ll: loops with a single uncountable exit that do
not have live-outs from the loop.
2. single_early_exit_live_outs.ll: loops with a single uncountable exit
with live-outs.
3. multi_early_exit.ll: loops with multiple early exits, i.e. a mixture
of countable and uncountable exits, but with no live-outs from the loop.
4. multi_early_exit_live_outs.ll: as above, but with live-outs.
5. single_early_exit_unsafe_ptrs.ll: loops with a single uncountable
exit, but with pointers that are not unconditionally dereferenceable.
6. unsupported_early_exit.ll: loops with uncountable exits that we
cannot yet vectorise.
7. early_exit_legality.ll: tests the debug output from
LoopVectorizationLegality to make sure we handle different scenarios
correctly.

Only the last test now requires asserts. Over time some of these tests
should start vectorising as more support is added.

I also tried to rename the multi early exit tests to make it clear there
what mixture of countable and uncountable exits are present.
2024-10-17 16:50:59 +01:00
Yingwei Zheng
095d49da76
[InstCombine] Set samesign when converting signed predicates into unsigned (#112642)
Alive2: https://alive2.llvm.org/ce/z/6cqdt-
2024-10-17 20:43:48 +08:00
Graham Hunter
c980a20b10
[AArch64][SVE] Enable max vector bandwidth for SVE (#109671)
Returns true for shouldMaximizeVectorBandwidth when the register type
is a scalable vector and SVE or streaming SVE are available.
2024-10-17 13:17:24 +01:00
David Sherwood
671976ff59
[NFC][LoopVectorize] Add more simple early exit tests (#112529)
I realised we are missing tests to cover more loops with multiple early
exits - some countable and some uncountable.

I've also added a few SVE versions of the test in the AArch64 directory.
Once we can vectorise such early exit loops it's a good sanity check to
make sure they also vectorise for SVE. Also, for some of the tests I
expect there to be some divergence from the same tests in the top level
directory once we start vectorising them.
2024-10-17 09:49:51 +01:00
Florian Hahn
24423107ab
[LV] Add additional trip count expansion tests for #92177.
Extra tests for https://github.com/llvm/llvm-project/pull/92177, split
off the PR.
2024-10-16 07:40:04 +01:00
Florian Hahn
bbff5b8891
[VPlan] Use alloc-type to compute interleave group offset.
Use getAllocTypeSize to get compute the offset to the start of
interleave groups instead getScalarSizeInBits, which may return 0 for
pointers. This is in line with the analysis building the interleave
groups and fixes a mis-compile reported for
https://github.com/llvm/llvm-project/pull/106431.
2024-10-16 07:21:58 +01:00
Florian Hahn
cc5b5ca34b
[LV] Add test where interleave group start pointer is incorrect.
Test case from https://github.com/llvm/llvm-project/pull/106431.
2024-10-16 07:13:38 +01:00
Florian Hahn
3860e29e0e
[VPlan] Mark VPVectorPointerRecipe as not having sideeffects.
VectorPointer doesn't read from memory or have any sideeffects. Mark it
accordingly.
2024-10-16 06:10:19 +01:00
Florian Hahn
34cdd67c85
[VPlan] Use VPWidenIntrinsicRecipe to vp.select. (#110489)
Use VPWidenIntrinsicRecipe
(https://github.com/llvm/llvm-project/pull/110486)
to create vp.select intrinsics. This potentially offers an alternative
to duplicating EVL recipes for all existing recipes.

There are some recipes that will need duplicates (at least at the
moment), due to extra code-gen needs (e.g. widening loads and stores).
But in cases the intrinsic can directly be used, creating the widened
intrinsic directly would reduce the need to duplicate some recipes.


PR: https://github.com/llvm/llvm-project/pull/110489
2024-10-15 21:48:15 +01:00
Florian Hahn
7f06d8afb0
[SCEV] Retain SCEVSequentialMinMaxExpr if an operand may trigger UB. (#110824)
Retain SCEVSequentialMinMaxExpr if an operand may trigger UB, e.g. if
there is an UDiv operand that may divide by 0 or poison

PR: https://github.com/llvm/llvm-project/pull/110824
2024-10-14 13:08:49 +01:00
Florian Hahn
65da32c634
[LV] Account for any-of reduction when computing costs of blend phis.
Any-of reductions are narrowed to i1. Update the legacy cost model to
use the correct type when computing the cost of a phi that gets lowered
to selects (BLEND).

This fixes a divergence between legacy and VPlan-based cost models after
36fc291b6ec6d.

Fixes https://github.com/llvm/llvm-project/issues/111874.
2024-10-11 11:27:22 +01:00
David Sherwood
72f339de45
[LoopVectorize] Use predicated version of getSmallConstantMaxTripCount (#109928)
There are a number of places where we call getSmallConstantMaxTripCount
without passing a vector of predicates:

getSmallBestKnownTC
isIndvarOverflowCheckKnownFalse
computeMaxVF
isMoreProfitable

I've changed all of these to now pass in a predicate vector so that
we get the benefit of making better vectorisation choices when we
know the max trip count for loops that require SCEV predicate checks.

I've tried to add tests that cover all the cases affected by these
changes.
2024-10-11 10:10:15 +01:00
Florian Hahn
bb937e276d
[LV] Compute value of escaped induction based on the computed end value. (#110576)
Update fixupIVUsers to compute the value for escaped inductions using
the already computed end value of the induction (EndValue), but
subtracting the step.

This results in slightly simpler codegen, as we avoid computing the full
transformed index at VectorTripCount - 1.

PR: https://github.com/llvm/llvm-project/pull/110576
2024-10-10 20:04:46 +01:00
Florian Hahn
01cbbc52dc
[VPlan] Request lane 0 for pointer arg in PtrAdd.
After 7f74651, the pointer operand may be replicated of a PtrAdd. Instead
of requesting a single scalar, request lane 0, which correctly handles the
case when there is a scalar-per-lane.

Fixes https://github.com/llvm/llvm-project/issues/111606.
2024-10-09 13:18:54 +01:00
Florian Hahn
6fbbe152fa
[VPlan] Introduce VPWidenIntrinsicRecipe to separate from libcall. (#110486)
This patch splits off intrinsic hanlding to a new
VPWidenIntrinsicRecipe. VPWidenIntrinsicRecipes only need access to the
intrinsic ID to widen and the scalar result type (in case the intrinsic
is overloaded on the result type). It does not need access to an
underlying IR call instruction or function.

This means VPWidenIntrinsicRecipe can be created easily without access
to underlying IR.
2024-10-08 22:37:20 +01:00
Florian Hahn
36fc291b6e
[VPlan] Implement VPBlendRecipe::computeCost.
Implement VPBlendRecipe::computeCost. VPBlendRecipe is currently is also
used if only the first lane is used.

This also requires pre-computing costs for forced scalars and
instructions considered profitable to scalarize. For those, the cost
will be computed separately in the legacy cost model. This will also be
needed when implementing VPReplicateRecipe::computeCost.
2024-10-08 21:33:42 +01:00
Florian Hahn
3ec6f805c5
[VPlan] Don't created GEP x, 0 for interleave group pointers.
The GEP with offet 0 is redundant, remove it. This addresses a TODO
from 7f74651837b ((#106431).
2024-10-08 12:08:13 +01:00
Luke Lau
366e469db9 [RISCV] Add cost tests for more interleave factors. NFC
This shows how we're not properly scaling the cost with the number of
factors, i.e. a factor 8 interleave costs the same as a factor 2
interleave at VF=2.
2024-10-08 17:16:17 +08:00
Luke Lau
e98875af4c [RISCV] Add scalable interleave cost tests. NFC
This gets the cost from the recipe output rather than the individual
instruction cost.

The factor 3 test was left alone since we don't support anything else
other than factor 2 for scalable vectors currently.
2024-10-08 13:39:13 +08:00
Florian Hahn
7f74651837
[VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (#106431)
Update VPInterleaveRecipe to always use the pointer to member 0 as
pointer argument. This in many cases helps to remove unneeded index
adjustments and simplifies VPInterleaveRecipe::execute.

In some rare cases, the address of member 0 does not dominate the insert
position of the interleave group. In those cases a PtrAdd VPInstruction
is emitted to compute the address of member 0 based on the address of
the insert position. Alternatively we could hoist the recipe computing
the address of member 0.
2024-10-06 22:53:13 +01:00
Florian Hahn
89d2a9de05
[VPlan] Add additional FOR hoisting test.
Additional tests for https://github.com/llvm/llvm-project/pull/108945.
2024-10-06 14:36:33 +01:00
Florian Hahn
45b526afa2
[LV] Honor uniform-after-vectorization in setVectorizedCallDecision.
The legacy cost model always computes the cost for uniforms as cost of
VF = 1, but VPWidenCallRecipes would be created, as
setVectorizedCallDecisions would not consider uniform calls.

Fix setVectorizedCallDecision to set to Scalarize, if the call is
uniform-after-vectorization.

This fixes a bug in VPlan construction uncovered by the VPlan-based
cost model.

Fixes https://github.com/llvm/llvm-project/issues/111040.
2024-10-06 10:35:06 +01:00
Florian Hahn
68210c7c26
[VPlan] Only generate first lane for VPPredInstPHI if no others used.
IF only the first lane of the result is used, only generate the first
lane.

Fixes https://github.com/llvm/llvm-project/issues/111042.
2024-10-05 19:15:05 +01:00
Shih-Po Hung
26fca7256e
[VPlan][NFC] Use patterns in test check (#111086) 2024-10-04 17:19:07 +08:00
Benjamin Maxwell
01a1398971
[AArch64][Test] Update test variable names (NFC) (#110667)
Simply by running update_test_checks.py with no changes. This is to make
updating these tests for later changes easier.
2024-10-03 16:14:21 +01:00
Florian Hahn
9de327c94d
[LV] Generalize predication checks from 2c8836c899 for operands.
This fixes another case where the VPlan-based and legacy cost models
disagree. If any of the operands is predicated, it can't be trivially
hoisted and we should consider the cost for evaluating it each loop
iteration.

Fixes https://github.com/llvm/llvm-project/issues/108697.
2024-10-02 20:16:41 +01:00
Florian Hahn
6d6eea92e3
[LV] Use SCEV to simplify wide binop operand to constant.
The legacy cost model uses SCEV to determine if the second operand of a
binary op is a constant. Update the VPlan construction logic to mirror
the current legacy behavior, to fix a difference in the cost models.

Fixes https://github.com/llvm/llvm-project/issues/109528.
Fixes https://github.com/llvm/llvm-project/issues/110440.
2024-10-02 13:45:49 +01:00
Nikita Popov
9f3d1695eb
[SCEVExpander] Preserve gep nuw during expansion (#102133)
When expanding SCEV adds to geps, transfer the nuw flag to the resulting
gep. (Note that this doesn't apply to IV increment GEPs, which go
through a different code path.)
2024-10-02 11:45:00 +02:00
David Sherwood
0b2403197f
[LoopVectorize] In LoopVectorize.cpp start using getSymbolicMaxBackedgeTakenCount (#108833)
LoopVectorizationLegality currently only treats a loop as legal to vectorise
if PredicatedScalarEvolution::getBackedgeTakenCount returns a valid
SCEV, or more precisely that the loop must have an exact backedge taken
count. Therefore, in LoopVectorize.cpp we can safely replace all calls to
getBackedgeTakenCount with calls to getSymbolicMaxBackedgeTakenCount,
since the result is the same.

This also helps prepare the loop vectoriser for PR #88385.
2024-10-02 10:28:54 +01:00
Florian Hahn
0344123ffb
[VPlan] Manage FMFs for VPWidenCall via VPRecipeWithIRFlags. (NFC)
Update VPWidenCallRecipe to manage fast-math flags directly via
VPRecipeWithIRFlags. This addresses a TODO and allows adjusting the FMFs
directly on the recipe. Also fixes printing for flags for
VPWidenCallRecipe.
2024-10-01 13:20:34 +01:00
Ramkumar Ramachandra
f2ad39b77b
LV/test: improve a couple of tests, regen with UTC (#107225)
Add noalias, where applicable, to eliminate unnecessary memory check,
and regen with UTC.
2024-09-30 15:47:17 +01:00
Mel Chen
f8373cb0f9
[LV] Reuse VPReplicateRecipe to handle scalar stores in exit block. (#106342)
This patch separates the computation of the final reduction result and
the intermediate stores of reduction.

---------

Co-authored-by: Florian Hahn <flo@fhahn.com>
2024-09-30 15:35:09 +08:00
Florian Hahn
2c8836c899
[LV] Don't consider predicated insts as invariant unconditionally in CM.
Predicated instructions cannot hoisted trivially, so don't treat them as
uniform value in the cost model.

This fixes a difference between legacy and VPlan-based cost model.

Fixes https://github.com/llvm/llvm-project/issues/110295.
2024-09-29 20:31:24 +01:00
Florian Hahn
2f7ccaf4a8
[SCEV] Add predicate in SolveLinEq to ensure B is a multiple of A. (#108777)
This can help in cases where pointer alignment info is missing, e.g.
https://github.com/llvm/llvm-project/pull/108210

The predicate is formed for the complex expression that's passed to
SolveLinEquationWithOverflow and the checks could probably be pushed
closer to the root nodes, which in some cases may be cheaper to check.


PR: https://github.com/llvm/llvm-project/pull/108777
2024-09-28 14:19:57 +01:00
Graham Hunter
6f1a8c2da2
[LV] Vectorize histogram operations (#99851)
This patch implements autovectorization support for the 'all-in-one'
histogram intrinsic, which seems to have more support than the
'standalone' intrinsic. See
https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788/
for an overview of the work and my notes on the tradeoffs between the
two approaches.
2024-09-27 13:08:55 +01:00