2458 Commits

Author SHA1 Message Date
Florian Hahn
e775efcec4
[LV] Apply loop guards when checking recur during hoisting RT checks.
Apply loop guards when checking if the recurrence is non-negative in
cases where runtime checks are hoisted out of an inner loop.
2024-06-04 20:37:46 +01:00
Florian Hahn
164597616c
[LV] Add test for RT check hoisting where loop guards simplify check.
Add a test case with a missed simplification when hoisting runtime
checks due to not applying loop guards.
2024-06-04 09:32:22 +01:00
Ramkumar Ramachandra
59cb55d384
VPlan: add missing case for LogicalAnd; fix crash (#93553)
VPTypeAnalysis::inferScalarTypeForRecipe is missing the case for
VPInstruction::LogicalAnd, due to which the test
vplan-incomplete-cases.ll crashes. Add this missing case, and move the
test in vplan-infer-not-or-type.ll to vplan-incomplete-cases.ll, showing
correct codegen for trip-counts 2 and 3.
2024-06-04 08:58:16 +01:00
Florian Hahn
07b330132c
[VPlan] Model FOR extract of exit value in VPlan. (#93395)
This patch introduces a new ExtractFromEnd VPInstruction opcode to
extract the value of a FOR for users outside the loop (i.e. in the
scalar loop's exits). This moves the first part of fixing first order
recurrences to VPlan, and removes some additional code to patch up
live-outs, which is now handled automatically.

The majority of test changes are due to changes in the order in which the
extracts are generated now. As we are now using VPTransformState to
generate the extracts, we may be able to re-use existing extracts in the
loop body in some cases. For scalable vectors, in some cases we now have
to compute the runtime VF twice, as each extract is now independent, but
those should be trivial to clean up for later passes (and in line with
other places in the code that also liberally re-compute runtime VFs).

PR: https://github.com/llvm/llvm-project/pull/93395
2024-06-03 20:20:30 +01:00
Florian Hahn
f7e63e8b46
[LV] Operands feeding pointers of interleave member pointers are free.
For interleave groups we only create a pointer for the start of the
interleave group, not all original loads/stores. Mark single-use ops
feeding interleave group mem ops as free when vectorizing.
2024-06-01 13:59:29 +01:00
Florian Hahn
4c6367b3e5
[LV] Add test with strided interleave groups and maximizing bandwidth. 2024-06-01 12:26:00 +01:00
Florian Hahn
f38d84ce32
[VPlan] Use ir-bb prefix for VPIRBasicBlock.
Follow-up to adjust the names and tests after
https://github.com/llvm/llvm-project/pull/93398.
2024-05-30 17:43:40 -07:00
Ramkumar Ramachandra
43100766f2
LV: generalize profitability criterion over TC (#93300)
Generalize LoopVectorizationPlanner::isMoreProfitable smoothly across
the fixed-vector and scalable-vector cases, taking the trip-count into
account, and fixing logical pitfalls that arise from a lack of
generality.
2024-05-30 10:54:32 +01:00
Florian Hahn
8b037862b6
[VPlan] Preserve DT (and SCEV) in VPlan-native path (#93287)
As a follow-up to b2f65e80, use the DTU to also update and preserve
the DT in the native path. This should also allow preserving SCEV in the
native path.

PR: https://github.com/llvm/llvm-project/pull/93287
2024-05-27 17:03:53 -07:00
Florian Hahn
bb4c8f9219
[SCEV] Don't add predicates already implied by UnionPredicate. (#93397)
Update SCEVUnionPredicate::add to only add predicates from another union
predicate if they aren't already implied by the union predicate we add
them to.

Note that there exists logic elsewhere to avoid adding predicates if
they are already implied, but this logic misses cases when only some
predicates of a union predicate are implied by the current set of
predicates.

PR: https://github.com/llvm/llvm-project/pull/93397
2024-05-26 18:31:36 -07:00
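The filtering described in bb4c8f9219 can be sketched as a toy model. This is not the SCEV implementation: the `LessThan` predicate, its `implies` rule, and the `UnionPredicate` class are hypothetical stand-ins, chosen only to show how members of an incoming union are skipped one by one when the receiving union already implies them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LessThan:
    """Hypothetical toy predicate meaning `var < bound`."""
    var: str
    bound: int

    def implies(self, other: "LessThan") -> bool:
        # `x < 4` implies `x < 8` for the same variable.
        return self.var == other.var and self.bound <= other.bound


@dataclass
class UnionPredicate:
    """Toy conjunction of predicates (assumption: not the LLVM class)."""
    preds: list = field(default_factory=list)

    def implies(self, pred: LessThan) -> bool:
        # The union implies `pred` if any single member does.
        return any(p.implies(pred) for p in self.preds)

    def add(self, other: "UnionPredicate") -> None:
        # Check each member of the other union individually, so a union
        # that is only partly implied contributes just its new members.
        for pred in other.preds:
            if not self.implies(pred):
                self.preds.append(pred)


current = UnionPredicate([LessThan("n", 4)])
incoming = UnionPredicate([LessThan("n", 8), LessThan("m", 2)])
current.add(incoming)
print(current.preds)
```

Here only `m < 2` is appended, because `n < 8` already follows from `n < 4`.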
Florian Hahn
686600b521
[LV] Add test showing missed removal of implied predicate.
Tests for https://github.com/llvm/llvm-project/pull/93397
2024-05-26 17:23:14 -07:00
Florian Hahn
ac17fbc076
[VPlan] Add test for printing FOR with live-out.
Add additional test coverage for printing VPlans with a first-order
recurrence with its result used outside the loop.
2024-05-25 21:25:57 -07:00
Shih-Po Hung
0338c55ea5
[LV, VPlan] Check if plan is compatible to EVL transform (#92092)
The transform updates all users of inductions to work based on EVL,
instead of the VF directly. At the moment, widened inductions cannot be
updated, so bail out if the plan contains any.
This patch introduces a check before applying EVL transform. If any
recipes in loop rely on RuntimeVF, the plan is discarded.
2024-05-25 08:22:49 +08:00
Ramkumar Ramachandra
bb0d29a72d
[LV] fix logical error in trunc cost (#91136)
In LoopVectorizationCostModel::getInstructionCost(), when the condition
canTruncateToMinimalBitwidth() is satisfied, for a trunc, the source
type is computed as the smallest type of the source vector and the
destination vector, and the destination type is computed as the largest
type of the instruction and destination type. This is clearly a logical
error, as the original source vector type could be smaller than the
original destination vector type, and the trunc semantics are broken
because we're attempting to widen.

Fixes #47665.
2024-05-24 18:01:58 +01:00
Shih-Po Hung
b008a2d12a
[LV][NFC] precommit test for EVL transform (#92203)
A precommit test case to show vector loops generated from EVL transform.
This is a precommit test for
https://github.com/llvm/llvm-project/pull/92092
2024-05-24 23:21:59 +08:00
Ramkumar Ramachandra
dc148c9fb8
[LV] add test for #47665, #88802 (#91135) 2024-05-24 10:50:43 +01:00
Freddy Ye
4def1ce101
Reland "[X86] Remove knl/knm specific ISAs supports (#92883)" (#93136)
This reverts commit aa4069ea96e5eb62bc8c7895b9d920f129611b3a.
2024-05-24 13:46:34 +08:00
David Green
46541a3636 [ARM] Add an extra MVE low-trip-count loop. NFC
This makes use of half floats, which makes the masked stores expensive.
2024-05-23 21:50:47 +01:00
Freddy Ye
aa4069ea96
Revert "[X86] Remove knl/knm specific ISAs supports (#92883)" (#93123)
This reverts commit 282d2ab58f56c89510f810a43d4569824a90c538.
2024-05-23 10:25:23 +08:00
Freddy Ye
282d2ab58f
[X86] Remove knl/knm specific ISAs supports (#92883)
Cont. patch after https://github.com/llvm/llvm-project/pull/75580
2024-05-23 09:46:44 +08:00
Simon Pilgrim
0873b4ca29 [LoopVectorize] optimal-epilog-vectorization-profitability.ll - fix LABLE -> LABEL typo
Typo identified in #91854
2024-05-22 11:07:24 +01:00
Florian Hahn
a56e6dfd2e
[LV] Add test for header mask and invariant compare cost-modeling.
Additional test coverage for the VPlan-based cost model work.
2024-05-22 09:57:35 +01:00
Sander de Smalen
1015f51dd9
[AArch64] NFC: Rename -force-streaming-compatible-sve to -force-streaming-compatible (#92774)
The behaviour of the flag should be equivalent to
__arm_streaming_compatible.

At the moment, the name suggests that '-force-streaming-compatible-sve'
on its own (i.e. without specifying `+sve`) enables the compiler to use
the streaming-compatible subset of SVE instructions, but the semantics
merely are that the function can be called with either PSTATE.SM=0 or
PSTATE.SM=1.
2024-05-22 07:58:54 +01:00
Florian Hahn
352dc7d4bb
[LV] Propagate PredicatedBBsAfterVectorization to predecessors.
This fixes some cases where predicated BBs were missed previously,
leading to under-estimating the cost of those blocks.
2024-05-21 10:27:32 +01:00
hev
1e86e92428
[LoongArch] Enable interleaved vectorization (#92629)
This PR enables interleaved vectorization for LoongArch, with a default
interleaving factor of `2`.
2024-05-21 15:31:02 +08:00
Florian Hahn
82c5d350d2
[VPlan] Add commutative binary OR matcher, use in transform. (#92539)
Split off from https://github.com/llvm/llvm-project/pull/89386, this
extends the binary matcher to support matching commutative operations.
This is used for a new m_c_BinaryOr matcher, used in simplifyRecipe.

PR: https://github.com/llvm/llvm-project/pull/92539
2024-05-20 13:03:48 +01:00
Nikita Popov
8e8d2595da
[ConstantFolding] Canonicalize constexpr GEPs to i8 (#89872)
This patch canonicalizes constant expression GEPs to use i8 source
element type, aka ptradd. This is the ConstantFolding equivalent of the
InstCombine canonicalization introduced in #68882.

I believe all our optimizations working on constant expression GEPs
(like GlobalOpt etc) have already been switched to work on offsets, so I
don't expect any significant fallout from this change.

This is part of:
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699
2024-05-20 11:47:30 +02:00
Florian Hahn
b050048d35
[VPlan] Simplify (X && Y) || (X && !Y) -> X. (#89386)
Simplify a common pattern generated for masks when folding the tail.

PR: https://github.com/llvm/llvm-project/pull/89386
2024-05-19 15:45:23 +00:00
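The identity behind b050048d35 can be checked exhaustively for plain booleans. The sketch below is only a truth-table check with hypothetical function names; it does not model VPlan recipes, vector lanes, or poison semantics.

```python
from itertools import product


def tail_folded_mask(x: bool, y: bool) -> bool:
    """The unsimplified pattern: (X && Y) || (X && !Y)."""
    return (x and y) or (x and not y)


def simplification_holds() -> bool:
    """Check that the pattern equals X for every boolean input pair."""
    return all(
        tail_folded_mask(x, y) == x
        for x, y in product([False, True], repeat=2)
    )


print(simplification_holds())
```

The check passes because exactly one of `Y` and `!Y` is true, so the disjunction reduces to `X`.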
Florian Hahn
1e7d047c71
[VPlan] Mark LoopInfo preserved in native-path as well (NFC).
LoopInfo is updated during VPlan execution now, so it will also be
updated correctly in the native path.
2024-05-17 12:18:01 +01:00
Craig Topper
487b43cdc9
[RISCV] Pass subvector type to isLegalInterleavedAccessType in getInterleavedMemoryOpCost. (#91825)
isLegalInterleavedAccessType expects the subvector type, but
getInterleavedMemoryOpCost is called with the full vector type. So we
need to divide by Factor.
2024-05-15 21:47:29 -07:00
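As a rough illustration of the "divide by Factor" step in 487b43cdc9: an interleaved access over a full vector is split into `Factor` members, so the type that the legality check should see has the full element count divided by the factor. The helper name below is hypothetical and only models the element-count arithmetic, not the RISC-V TTI code.

```python
def subvector_num_elements(full_num_elements: int, factor: int) -> int:
    """Element count of one interleave member (hypothetical helper)."""
    if factor <= 0 or full_num_elements % factor != 0:
        raise ValueError("full vector must split evenly into `factor` members")
    return full_num_elements // factor


# An 8-element wide access with interleave factor 2 has 4-element members.
print(subvector_num_elements(8, 2))
```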
Pietro Ghiglio
83d9aa2768
[VPlan] Add scalar inferencing support for addrspace cast (#92107)
Fixes https://github.com/llvm/llvm-project/issues/91434

PR: https://github.com/llvm/llvm-project/pull/92107
2024-05-15 14:03:21 +01:00
Florian Hahn
b0a1ae2cca
[LV] Add additional variants of tests with udiv/urem/sdiv/srem in TC.
Add additional tests with udiv/urem/sdiv/srem in trip counts, where the
divisor is constant.

For https://github.com/llvm/llvm-project/pull/92177.
2024-05-15 11:17:23 +01:00
Florian Hahn
d187005cad
[VPlan] Update VPBlendRecipe codegen for first-lane only.
Update VPBlendRecipe::execute to support generating code for first-lane
only. This fixes a crash in the newly added test
@test_not_first_lane_only_wide_compare_incoming_order_swapped.
2024-05-15 11:00:15 +01:00
Florian Hahn
cf5db39907
[LV] Add tests with trip counts containing UDIV expressions.
Add test cases for
https://github.com/llvm/llvm-project/issues/89958.
2024-05-14 20:28:27 +01:00
Florian Hahn
67d840b60f
[VPlan] Relax over-aggressive assertion in VPTransformState::get().
There are cases where a vector value has some users that demand the
single scalar value only (NeedsScalar), while other users demand the
vector value (see attached test cases). In those cases, the NeedsScalar
users should only demand the first lane.

Fixes https://github.com/llvm/llvm-project/issues/91883.
2024-05-14 19:10:49 +01:00
Florian Hahn
632317e9ab
[VPlan] Add non-poison propagating LogicalAnd VPInstruction opcode. (#91897)
Add a new opcode to model non-poison propagating logical AND operations
used when generating edge masks. This follows the similar decision to
model Not as dedicated opcode as well, to improve clarity.

This also helps to simplify the matchers for
https://github.com/llvm/llvm-project/pull/89386.


PR: https://github.com/llvm/llvm-project/pull/91897
2024-05-14 09:42:49 +01:00
Fangrui Song
ef9090fcb5 [test] Fix check prefixes 2024-05-13 14:01:00 -07:00
Simon Pilgrim
079fdef7d2 [TTI] getCommonMaskedMemoryOpCost - use the target getMemoryOpCost/getCFInstrCost implementations.
We were using the default implementations instead of the CRTP versions.
2024-05-11 12:50:26 +01:00
Florian Hahn
082c81ae4a
[LV] Properly extend versioned constant strides.
We only version unknown strides to 1. If the original type is i1, then
the sign of the extension matters. Properly extend the stride value
before replacing it.

Fixes https://github.com/llvm/llvm-project/issues/91369.
2024-05-07 21:31:42 +01:00
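Why the sign of the extension matters for an i1 stride (082c81ae4a) can be seen with a small bit-level sketch. The `zext` and `sext` helpers are illustrative and operate on Python integers, not on LLVM values.

```python
def zext(value: int, from_bits: int) -> int:
    """Zero-extend: interpret the low `from_bits` bits as unsigned."""
    return value & ((1 << from_bits) - 1)


def sext(value: int, from_bits: int) -> int:
    """Sign-extend: interpret the low `from_bits` bits as two's complement."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    return (value ^ sign_bit) - sign_bit


# For an i1 holding the bit pattern 1, the two extensions disagree.
print(zext(1, 1), sext(1, 1))
# For wider types the value 1 extends to 1 either way.
print(zext(1, 8), sext(1, 8))
```

With a one-bit type, the single bit is also the sign bit, so sign-extending the pattern 1 yields -1 while zero-extending yields 1. Replacing the stride with the wrong one of the two changes the direction of the access.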
Florian Hahn
c76ccf0f1e
[LV] Add test case for #91369.
Add tests for https://github.com/llvm/llvm-project/issues/91369.
2024-05-07 20:41:55 +01:00
Florian Hahn
b54a78d69b
[LV,LAA] Don't vectorize loops with load and store to invar address.
Code checking stores to invariant addresses and reductions made an
incorrect assumption that the case of both a load & store to the same
invariant address does not need to be handled.

In some cases when vectorizing with runtime checks, there may be
dependences with a load and store to the same address, storing a
reduction value.

Update LAA to separately track if there was a store-store and a
load-store dependence with an invariant address.

Bail out early if there was a load-store dependence with an invariant
address. If there was a store-store one, still apply the logic checking
if they all store a reduction.
2024-05-04 20:53:54 +01:00
Florian Hahn
401ecb4ccc
[LV] Add test showing miscompile with store reductions and RT checks.
Add a new test showing how a loop gets vectorized incorrectly with an
invariant store reduction where the same location is also read, when
vectorizing with runtime checks.
2024-05-03 18:54:00 +01:00
Mel Chen
3f1fef3699
[RISCV] Support interleaved accesses for scalable vector. (#90583)
The support for interleaved accesses for scalable vector with a factor
of 2 is enabled in vectorizer. Therefore, the patch removed the
restriction for scalable vector with a factor of 2.
2024-05-03 21:56:31 +08:00
Florian Hahn
bccb7ed8ac
Reapply "[LV] Improve AnyOf reduction codegen. (#78304)"
This reverts the revert commit c6e01627acf859.

This patch includes a fix for any-of reductions and epilogue
vectorization. Extra test coverage for the issue that caused the revert
has been added in bce3bfced5fe0b019 and an assertion has been added in
c7209cbb8be7a3c65813.

--------------------------------
Original commit message:

Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.

The patch incorporates feedback from https://reviews.llvm.org/D153697.

This fixes #62565, as now there aren't multiple uses of the
start/new values.

Fixes https://github.com/llvm/llvm-project/issues/62565

PR: https://github.com/llvm/llvm-project/pull/78304
2024-05-03 14:40:49 +01:00
Alexey Bataev
1d43cdc9f5
[LV][EVL]Support reversed loads/stores.
Support for predicated vector reverse intrinsic was added some time ago.
Adds support for predicated reversed loads/stores in the loop
vectorizer.

Reviewers: fhahn

Reviewed By: fhahn

Pull Request: https://github.com/llvm/llvm-project/pull/88025
2024-05-03 07:28:56 -04:00
Florian Hahn
bce3bfced5
[LV] Add another epilogue test with an AnyOfReduction of i1.
Additional test case from
https://github.com/llvm/llvm-project/pull/78304.
2024-05-02 21:00:40 +01:00
Florian Hahn
9c3f5fe88f
[LV] Don't consider the latch block as ScalarPredicatedBB.
The conditional branch from the loop latch will be replaced by a
single branch controlling the loop, so there is no extra overhead from
scalarization. This improves the cost estimates in some cases.
2024-04-29 19:15:46 +01:00
David Green
d486a4c29a [ARM] Ensure extra uses are not dead in tail-folding-counting-down.ll. NFC
This might help keep the test valid if vplan is removing dead instructions.
2024-04-29 15:47:24 +01:00
Maciej Gabka
bfc0317153
Move several vector intrinsics out of experimental namespace (#88748)
This patch moves the following intrinsics:
* vector.interleave2/deinterleave2
* vector.reverse
* vector.splice

out of the experimental namespace.

All these intrinsics have existed in LLVM for more than a year now, and are
widely used, so they should not be considered experimental.
2024-04-29 10:16:45 +01:00
Florian Hahn
b6a8f5486b
[LV] Consider all exit branch conditions uniform.
If we vectorize a loop with multiple exits, all exiting branches should
be considered uniform, as the resulting loop will be controlled by the
canonical IV only. Previously we were overestimating the cost of values
contributing to the other exits.
2024-04-28 13:15:55 +01:00