2428 Commits

Author SHA1 Message Date
Pietro Ghiglio
83d9aa2768
[VPlan] Add scalar inferencing support for addrspace cast (#92107)
Fixes https://github.com/llvm/llvm-project/issues/91434

PR: https://github.com/llvm/llvm-project/pull/92107
2024-05-15 14:03:21 +01:00
Florian Hahn
b0a1ae2cca
[LV] Add additional variants of tests with udiv/urem/sdiv/srem in TC.
Add additional tests with udiv/urem/sdiv/srem in trip counts, where the
divisor is constant.

For https://github.com/llvm/llvm-project/pull/92177.
2024-05-15 11:17:23 +01:00
Florian Hahn
d187005cad
[VPlan] Update VPBlendRecipe codegen for first-lane only.
Update VPBlendRecipe::execute to support generating code for first-lane
only. This fixes a crash in the newly added test
@test_not_first_lane_only_wide_compare_incoming_order_swapped.
2024-05-15 11:00:15 +01:00
Florian Hahn
cf5db39907
[LV] Add tests with trip counts containing UDIV expressions.
Add test cases for
https://github.com/llvm/llvm-project/issues/89958.
2024-05-14 20:28:27 +01:00
Florian Hahn
67d840b60f
[VPlan] Relax over-aggressive assertion in VPTransformState::get().
There are cases where a vector value has some users that demand the
single scalar value only (NeedsScalar), while other users demand the
vector value (see attached test cases). In those cases, the NeedsScalar
users should only demand the first lane.

Fixes https://github.com/llvm/llvm-project/issues/91883.
2024-05-14 19:10:49 +01:00
Florian Hahn
632317e9ab
[VPlan] Add non-poison propagating LogicalAnd VPInstruction opcode. (#91897)
Add a new opcode to model non-poison propagating logical AND operations
used when generating edge masks. This follows the similar decision to
model Not as dedicated opcode as well, to improve clarity.

This also helps to simplify the matchers for
https://github.com/llvm/llvm-project/pull/89386.


PR: https://github.com/llvm/llvm-project/pull/91897
2024-05-14 09:42:49 +01:00
Fangrui Song
ef9090fcb5
[test] Fix check prefixes
2024-05-13 14:01:00 -07:00
Simon Pilgrim
079fdef7d2
[TTI] getCommonMaskedMemoryOpCost - use the target getMemoryOpCost/getCFInstrCost implementations.
We were using the default implementations instead of the CRTP versions.
2024-05-11 12:50:26 +01:00
Florian Hahn
082c81ae4a
[LV] Properly extend versioned constant strides.
We only version unknown strides to 1. If the original type is i1, then
the sign of the extension matters. Properly extend the stride value
before replacing it.

Fixes https://github.com/llvm/llvm-project/issues/91369.
2024-05-07 21:31:42 +01:00
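Why the sign of the extension matters for an i1 stride can be seen in a small model. This is only an illustrative sketch in Python, not LLVM code; the helper names `zext` and `sext` are made up for the example and operate on plain integers.

```python
def zext(value: int, from_bits: int) -> int:
    """Zero-extend: interpret the low `from_bits` bits as unsigned."""
    return value & ((1 << from_bits) - 1)


def sext(value: int, from_bits: int) -> int:
    """Sign-extend: interpret the low `from_bits` bits as two's complement."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    return value - (1 << from_bits) if value & sign_bit else value


# A stride versioned to 1, held in a 1-bit value.
stride_i1 = 1
print(zext(stride_i1, 1))  # 1
print(sext(stride_i1, 1))  # -1
```

For a 1-bit value the only set bit is also the sign bit, so the two extensions of the same versioned stride disagree.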
Florian Hahn
c76ccf0f1e
[LV] Add test case for #91369.
Add tests for https://github.com/llvm/llvm-project/issues/91369.
2024-05-07 20:41:55 +01:00
Florian Hahn
b54a78d69b
[LV,LAA] Don't vectorize loops with load and store to invar address.
Code checking stores to invariant addresses and reductions made an
incorrect assumption that the case of both a load & store to the same
invariant address does not need to be handled.

In some cases when vectorizing with runtime checks, there may be
dependences with a load and store to the same address, storing a
reduction value.

Update LAA to separately track if there was a store-store and a
load-store dependence with an invariant address.

Bail out early if there was a load-store dependence with invariant
address. If there was a store-store one, still apply the logic checking
if they all store a reduction.
2024-05-04 20:53:54 +01:00
Florian Hahn
401ecb4ccc
[LV] Add test showing miscompile with store reductions and RT checks.
Add a new test showing how a loop gets vectorized incorrectly with an
invariant store reduction where the same location is also read, when
vectorizing with runtime checks.
2024-05-03 18:54:00 +01:00
Mel Chen
3f1fef3699
[RISCV] Support interleaved accesses for scalable vector. (#90583)
The support for interleaved accesses for scalable vector with a factor
of 2 is enabled in vectorizer. Therefore, the patch removed the
restriction for scalable vector with a factor of 2.
2024-05-03 21:56:31 +08:00
Florian Hahn
bccb7ed8ac
Reapply "[LV] Improve AnyOf reduction codegen. (#78304)"
This reverts the revert commit c6e01627acf859.

This patch includes a fix for any-of reductions and epilogue
vectorization. Extra test coverage for the issue that caused the revert
has been added in bce3bfced5fe0b019 and an assertion has been added in
c7209cbb8be7a3c65813.

--------------------------------
Original commit message:

Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.

The patch incorporates feedback from https://reviews.llvm.org/D153697.

This fixes #62565, as now there aren't multiple uses of the
start/new values.

Fixes https://github.com/llvm/llvm-project/issues/62565

PR: https://github.com/llvm/llvm-project/pull/78304
2024-05-03 14:40:49 +01:00
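The AnyOf scheme described in the original commit message can be modelled roughly as follows. This is a hedged Python simulation, not the generated IR; the function name `any_of_reduction`, its parameters, and the fixed vector factor are assumptions made for the example.

```python
from typing import Callable, List


def any_of_reduction(data: List[int], pred: Callable[[int], bool],
                     start: int, new: int, vf: int = 4) -> int:
    """Simulate an AnyOf reduction over `data` with vector factor `vf`.

    Inside the loop only a boolean vector is carried; the start/new values
    are selected once, after the loop (the "middle block").
    """
    assert len(data) % vf == 0, "sketch assumes no scalar remainder"
    any_of = [False] * vf                      # boolean vector carried by the loop
    for i in range(0, len(data), vf):
        lanes = data[i:i + vf]
        cmp = [pred(x) for x in lanes]         # widened compare
        any_of = [a or c for a, c in zip(any_of, cmp)]
    # Middle block: reduce the boolean vector, then a single select.
    return new if any(any_of) else start
```

In this model the start and new values each have exactly one use, in the final select, which matches the motivation given for #62565.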
Alexey Bataev
1d43cdc9f5
[LV][EVL]Support reversed loads/stores.
Support for predicated vector reverse intrinsic was added some time ago.
Adds support for predicated reversed loads/stores in the loop
vectorizer.

Reviewers: fhahn

Reviewed By: fhahn

Pull Request: https://github.com/llvm/llvm-project/pull/88025
2024-05-03 07:28:56 -04:00
Florian Hahn
bce3bfced5
[LV] Add another epilogue test with an AnyOfReduction of i1.
Additional test case from
https://github.com/llvm/llvm-project/pull/78304.
2024-05-02 21:00:40 +01:00
Florian Hahn
9c3f5fe88f
[LV] Don't consider the latch block as ScalarPredicatedBB.
The conditional branch from the loop latch will be replaced by a
single branch controlling the loop, so there is no extra overhead from
scalarization. This improves the cost estimates in some cases.
2024-04-29 19:15:46 +01:00
David Green
d486a4c29a
[ARM] Ensure extra uses are not dead in tail-folding-counting-down.ll. NFC
This might help keep the test valid if vplan is removing dead instructions.
2024-04-29 15:47:24 +01:00
Maciej Gabka
bfc0317153
Move several vector intrinsics out of experimental namespace (#88748)
This patch moves the following intrinsics out of the experimental
namespace:
* vector.interleave2/deinterleave2
* vector.reverse
* vector.splice

All these intrinsics have existed in LLVM for more than a year now and
are widely used, so they should not be considered experimental.
2024-04-29 10:16:45 +01:00
Florian Hahn
b6a8f5486b
[LV] Consider all exit branch conditions uniform.
If we vectorize a loop with multiple exits, all exiting branches should
be considered uniform, as the resulting loop will be controlled by the
canonical IV only. Previously we were overestimating the cost of values
contributing to the other exits.
2024-04-28 13:15:55 +01:00
Florian Hahn
6084dcbfce
[LV] Add additional cost model coverage for loops with casted inds.
Add test coverage for cost-model code-paths not covered by current unit
tests in preparation for
https://github.com/llvm/llvm-project/pull/67934.
2024-04-27 21:30:12 +01:00
Florian Hahn
9ee8e38cdc
[VPlan] Also propagate versioned strides to users via sext/zext.
The versioned value may not be used in the loop directly but through a
sext/zext. Add new live-ins in those cases.
2024-04-26 21:29:43 +01:00
Florian Hahn
c49b74a4e6
[LV] Add tests showing missed propagation of versioned stride values.
Strides are used through a sext/zext and the known constant value (1)
isn't propagated during codegen.
2024-04-26 18:41:22 +01:00
Andreas Jonson
b8f3024a31
[InstCombine] Swap out range metadata to range attribute for cttz/ctlz/ctpop (#88776)
Since all optimizations that use range metadata now also handle the range attribute, this patch replaces writes of
range metadata for call instructions with range attributes.
2024-04-25 01:45:50 +08:00
Patrick O'Neill
adb0126ef1
[VPlan] Add scalar inferencing support for Not and Or insns (#89160)
Fixes #87394.

PR: https://github.com/llvm/llvm-project/pull/89160
2024-04-23 15:48:43 +01:00
Florian Hahn
dadf6f2c5a
[VPlan] Ignore incoming values with constant false mask. (#89384)
Ignore incoming values with constant false masks when trying to simplify
VPBlendRecipes.

As a follow-on optimization, we should also be able to drop all incoming
values with false masks by creating a new VPBlendRecipe with those
operands dropped.

PR: https://github.com/llvm/llvm-project/pull/89384
2024-04-23 13:59:01 +01:00
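The simplification described in this commit can be pictured with a small model. This is a sketch under stated assumptions, not the VPlan implementation: a blend is represented as a list of (value, mask) pairs, a mask is either a Python constant or a symbolic name, and `simplify_blend` is a hypothetical helper.

```python
from typing import List, Optional, Tuple, Union

Mask = Union[bool, str]  # constant True/False, or a symbolic mask name


def simplify_blend(incoming: List[Tuple[str, Mask]]) -> Optional[str]:
    """Return the single value the blend folds to, or None if it cannot fold.

    `incoming` pairs each incoming value with its mask. Pairs whose mask is
    the constant False can never be selected, so they are skipped.
    """
    live = {value for value, mask in incoming if mask is not False}
    return live.pop() if len(live) == 1 else None
```

For example, `simplify_blend([("a", "m0"), ("b", False), ("a", "m1")])` folds to `"a"`, because the only differing incoming value has a constant false mask.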
Florian Hahn
17fb3e82f6
[VPlan] Skip extending ICmp results in truncateToMinimalBitwidth.
Results of icmp don't need extending after truncating their operands, as
the result will always be i1. Skip them during extending.

Fixes https://github.com/llvm/llvm-project/issues/79742
Fixes https://github.com/llvm/llvm-project/issues/85185
2024-04-23 11:50:26 +01:00
Florian Hahn
55fc5eb95f
[LV] Add additional cost model tests with inductions and truncates.
Add test coverage for additional cases not covered by current tests with
multiple inductions and truncates.
2024-04-23 09:24:01 +01:00
Florian Hahn
e2a72fa583
[VPlan] Introduce recipes for VP loads and stores. (#87816)
Introduce new subclasses of VPWidenMemoryRecipe for VP
(vector-predicated) loads and stores to address multiple TODOs from
https://github.com/llvm/llvm-project/pull/76172

Note that the introduction of the new recipes also improves code-gen for
VP gather/scatters by removing the redundant header mask. With the new
approach, it is not sufficient to look at users of the widened canonical
IV to find all uses of the header mask.

In some cases, a widened IV is used instead of separately widening the
canonical IV. To handle that, first collect all VPValues representing header
masks (by looking at users of both the canonical IV and widened inductions
that are canonical) and then checking all users (recursively) of those header
masks.

Depends on https://github.com/llvm/llvm-project/pull/87411.

PR: https://github.com/llvm/llvm-project/pull/87816
2024-04-19 09:44:23 +01:00
Ramkumar Ramachandra
73e7f2ff70
LoopVectorize: guard marking iv as scalar; fix bug (#88730)
When collecting loop scalars, LoopVectorize over-eagerly marks the
induction variable and its update as scalars after vectorization, even
if the induction variable update is a first-order recurrence. Guard the
process with this check, fixing a crash.

Fixes #72969.
2024-04-18 14:41:07 +01:00
Ramkumar Ramachandra
63d8058ef5
LoopVectorize: guard appending InstsToScalarize; fix bug (#88720)
In the process of collecting instructions to scalarize, LoopVectorize
uses faulty reasoning whereby it also adds instructions that will be
scalar after vectorization. If an instruction satisfies
isScalarAfterVectorization() for the given VF, it should not be appended
to InstsToScalarize. Add this extra guard, fixing a crash.

Fixes #55096.
2024-04-18 10:03:07 +01:00
Florian Hahn
a9bafe91dd
[VPlan] Split VPWidenMemoryInstructionRecipe (NFCI). (#87411)
This patch introduces a new VPWidenMemoryRecipe base class and distinct
sub-classes to model loads and stores.

This is a first step in an effort to simplify and modularize code
generation for widened loads and stores and enable adding further more
specialized memory recipes.

PR: https://github.com/llvm/llvm-project/pull/87411
2024-04-17 11:00:58 +01:00
Arthur Eubanks
c6e01627ac
Revert "Reapply "[LV] Improve AnyOf reduction codegen. (#78304)""
This reverts commit c6e38b928c56f562aea68a8e90f02dbdf0eada85.

Causes miscompiles, see comments on #78304.
2024-04-16 20:40:21 +00:00
Noah Goldstein
b6bd41db31
[InstCombine] Add canonicalization of sitofp -> uitofp nneg
This is essentially the same as #82404 but has the `nneg` flag which
allows the backend to reliably undo the transform.

Closes #88299
2024-04-16 15:26:25 -05:00
Florian Hahn
34777c238b
[VPlan] Don't mark VPBlendRecipe as phi-like.
VPBlendRecipes don't get lowered to phis and usually do not appear at
the beginning of blocks, due to their masks appearing before them.

This effectively relaxes an over-eager verifier message.

Fixes https://github.com/llvm/llvm-project/issues/88297.
Fixes https://github.com/llvm/llvm-project/issues/88804.
2024-04-16 21:24:25 +01:00
Alexey Bataev
e84b2fb48d
[LV][NFCI]Use integer for cost/trip count calculations instead of double, fix possible UB.
Using a floating-point type in the compiler is not the best idea; here it
is used in a comparison for equality with 0 and may cause undefined
behavior in some cases.

Reviewers: fhahn

Reviewed By: fhahn

Pull Request: https://github.com/llvm/llvm-project/pull/87241
2024-04-16 09:48:13 -04:00
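The general idea of staying in integer arithmetic can be illustrated as follows. This is a hedged sketch and not the code from the patch: the cost model, the function `break_even_trip_count`, and all parameter names are assumptions chosen only to show a round-up division and an explicit guard replacing a floating-point comparison with 0.

```python
from typing import Optional


def divide_ceil(numerator: int, denominator: int) -> int:
    """Round-up integer division without going through floating point."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return (numerator + denominator - 1) // denominator


def break_even_trip_count(check_cost: int, scalar_cost: int,
                          vector_cost: int, vf: int) -> Optional[int]:
    """Smallest trip count at which a vector loop with runtime checks costs
    no more than the scalar loop, under a hypothetical cost model:
    scalar = TC * scalar_cost, vector = check_cost + TC * vector_cost / vf.
    """
    saving = scalar_cost * vf - vector_cost   # saving per vector iteration
    if saving <= 0:
        return None                           # vector loop never catches up
    return divide_ceil(check_cost * vf, saving)
```

The explicit `saving <= 0` test takes the place of dividing by a floating-point value that might be zero.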
Shih-Po Hung
f3a8112d98
[RISCV][TTI] Scale the cost of ICmp with LMUL (#88235)
Use the Val type to estimate the instruction cost for ICmp.
2024-04-16 09:37:32 +08:00
Florian Hahn
6254b6dd89
[VPlan] Version VPValue names in VPSlotTracker. (#81411)
This patch restructures the way names for printing VPValues are handled.
It moves the logic to generate names for printing to VPSlotTracker.

VPSlotTracker will now version names of the same underlying value if it
is used by multiple VPValues, by adding a .V suffix to the name.

This fixes cases where at the moment the same name is printed for
different VPValues.

PR: https://github.com/llvm/llvm-project/pull/81411
2024-04-15 12:27:45 +01:00
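Name versioning of this kind might look like the following. This is a minimal sketch, not VPSlotTracker itself: the class `NameVersioner` is hypothetical, the exact suffix scheme is an assumption based on the ".V suffix" wording above, and collisions with names that already end in a numeric suffix are ignored.

```python
from collections import defaultdict


class NameVersioner:
    """Hypothetical stand-in for the naming logic, not LLVM's actual code."""

    def __init__(self) -> None:
        self._uses = defaultdict(int)  # base name -> times handed out

    def assign(self, base: str) -> str:
        version = self._uses[base]
        self._uses[base] += 1
        # First user keeps the plain name; later users get ".1", ".2", ...
        return base if version == 0 else f"{base}.{version}"
```

With this scheme, two values derived from the same underlying `%x` would print as `%x` and `%x.1` instead of sharing one name.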
wanglei
1e7763557b
[LoongArch] Add support for getNumberOfRegisters() (#88372)
The `TTI` hooks are used during vectorization for calculating register
pressure. The default implementation defined the wrong register count
(8 registers for every register class).

This patch also defines LoongArch's own register classes.
2024-04-12 16:15:02 +08:00
Yingwei Zheng
b109477615
[InstCombine] Infer nsw/nuw for trunc (#87910)
This patch adds support for inferring trunc's nsw/nuw flags.
2024-04-11 19:10:53 +08:00
Paschalis Mpeis
e50c4c83b6
[AArch64][TLI] Add TLI mappings for ArmPL modf, sincos, sincospi (#83143)
ArmPL 24.04 release fixes a bug concerning these methods,
so now they can be re-introduced to TLI mappings.
2024-04-10 09:34:46 +01:00
Florian Hahn
a8ec1eb843
[VPlan] Don't assign slots to VPValues with an underlying value.
This makes sure the numbering for VPValues without underlying
values is consecutive.
2024-04-09 21:30:51 +01:00
Simon Pilgrim
3bfd5c6424
[TTI] getCommonMaskedMemoryOpCost - consistently use getScalarizationOverhead instead of ExtractElement costs for address/mask extraction. (#87771)
These aren't unknown extraction indices, we will be extracting every address/mask element in sequence.
2024-04-09 15:42:51 +01:00
Florian Hahn
c836983671
[VPlan] Remove unused first mask op from VPBlendRecipe. (#87770)
VPBlendRecipe does not use the first mask operand. Removing it allows
VPlan-based DCE to remove unused mask computations.

This also fixes #87410, where unused Not VPInstructions are considered
having only their first lane demanded, but some of their operands
providing a vector value due to other users.

Fixes https://github.com/llvm/llvm-project/issues/87410

PR: https://github.com/llvm/llvm-project/pull/87770
2024-04-09 11:14:05 +01:00
Florian Hahn
fa8a726672
[LV] Make global_alias.ll test independent of O1 pipeline.
Update global_alias.ll with the IR after the O1 pipeline. Depending on
the O1 makes the tests more fragile and also makes it more difficult to
reason about the behavior of the tests, as it doesn't show the IR before
LoopVectorize.
2024-04-06 14:48:41 +01:00
Florian Hahn
233c030dcb
[LV] Add extra tests for induction cost modeling.
2024-04-06 12:36:07 +01:00
Florian Hahn
c6e38b928c
Reapply "[LV] Improve AnyOf reduction codegen. (#78304)"
This reverts the revert commit 589c7abb03448.

This patch includes a fix for any-of reductions and epilogue
vectorization. Extra test coverage for the issue that caused the revert
has been added in 399ff08e29d.

--------------------------------
Original commit message:

Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.

The patch incorporates feedback from https://reviews.llvm.org/D153697.

This fixes #62565, as now there aren't multiple uses of the
start/new values.

Fixes https://github.com/llvm/llvm-project/issues/62565

PR: https://github.com/llvm/llvm-project/pull/78304
2024-04-05 13:45:13 +01:00
Alexey Bataev
413a66f339
[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172)
This patch introduces generating VP intrinsics in the Loop Vectorizer.

Currently the Loop Vectorizer supports vector predication in a very
limited capacity via tail-folding and masked load/store/gather/scatter
intrinsics. However, this does not let architectures with active vector
length predication support take advantage of their capabilities.
Architectures with general masked predication support also can only take
advantage of predication on memory operations. Having a way for the
Loop Vectorizer to generate Vector Predication intrinsics, which (will)
provide a target-independent way to model predicated vector
instructions, lets these architectures make better use of their
predication capabilities.

Our first approach (implemented in this patch) builds on top of the
existing tail-folding mechanism in the LV (just adds a new tail-folding
mode using EVL), but instead of generating masked intrinsics for memory
operations it generates VP intrinsics for load/store instructions. The
patch adds a new VPlanTransforms to replace the wide header predicate
compare with EVL and updates codegen for load/stores to use VP
store/load with EVL.

Another important part of this approach is how the Explicit Vector Length
is computed. (VP intrinsics define this vector length parameter as
Explicit Vector Length (EVL)). We use an experimental intrinsic
`get_vector_length`, that can be lowered to architecture specific
instruction(s) to compute EVL.

Also, added a new recipe to emit instructions for computing EVL. Using
VPlan in this way will eventually help build and compare VPlans
corresponding to different strategies and alternatives.

Differential Revision: https://reviews.llvm.org/D99750
2024-04-04 18:30:17 -04:00
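The EVL-based tail folding described above can be modelled in a few lines. This is an illustrative Python simulation, not the vectorizer's output: `get_vector_length` here is a simplified stand-in that returns the minimum of the remaining element count and the vector factor (an assumption; a real target may lower it differently), and `vp_add_one` is a made-up example loop.

```python
from typing import List


def get_vector_length(remaining: int, vf: int) -> int:
    """Illustrative model: process at most `vf` of the remaining elements."""
    return min(remaining, vf)


def vp_add_one(src: List[int], vf: int = 4) -> List[int]:
    """dst[i] = src[i] + 1, tail-folded using an explicit vector length."""
    n = len(src)
    dst = [0] * n
    i = 0
    while i < n:
        evl = get_vector_length(n - i, vf)
        lanes = src[i:i + evl]                   # models a VP load of `evl` lanes
        dst[i:i + evl] = [x + 1 for x in lanes]  # models a VP store
        i += evl                                 # advance by EVL, not by VF
    return dst
```

The final iteration simply runs with a shorter EVL, so no scalar remainder loop and no lane mask on the memory operations are needed in this model.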
Florian Hahn
7bd163d0a4
[VPlan] Clean up dead recipes after UF & VF specific simplification.
Recursively remove dead recipes after simplifying vector loop exit
branch.
2024-04-04 12:05:08 +01:00
Florian Hahn
399ff08e29
[LV] Precommit tests with any-of reductions and epilogue vectorization.
Test case for failures from
https://lab.llvm.org/buildbot/#/builders/74/builds/26697
which caused the revert of 95fef1d in 589c7ab.
2024-04-03 13:32:32 +01:00