2026 Commits

Author SHA1 Message Date
Florian Hahn
b0da998494
[LV] Extend recurrence test coverage for sinking memory instructions.
Extra coverage for D143604, D143605.
2023-04-17 13:08:15 +01:00
Florian Hahn
02369b75fd
[VPlan] Mark recurrence recipes as not having side-effects.
Add support for FirstOrderRecurrenceSplice and VPFirstOrderRecurrencePHI
recipes to mayHaveSideEffects. They both don't have side-effects.
2023-04-17 12:30:52 +01:00
Florian Hahn
a0d667b89b
[LV] Add users to recurrence tests to make sure they are not removable.
This ensures VPlan-based DCE won't be able to remove the unused
recurrences.

It also adds a dedicated new test (@unused_recurrence) where an unused
recurrence can be removed.
2023-04-17 11:56:56 +01:00
David Sherwood
69ee653313 [LoopVectorize] Take vscale into account when deciding to create epilogues
In LoopVectorizationCostModel::isEpilogueVectorizationProfitable we
check to see if the chosen main vector loop VF >= 16. If so, we
decide to create a vector epilogue loop. However, this doesn't
take VScaleForTuning into account because we could be targeting a
CPU where vscale > 1, and hence the runtime VF would be a multiple
of the known minimum value.

This patch multiplies scalable VFs by VScaleForTuning and several
tests have been updated that now produce vector epilogues.

Differential Revision: https://reviews.llvm.org/D147522
2023-04-17 10:49:40 +00:00
Florian Hahn
83ab5708d1
[LV] Don't sink scalar instructions that may read from memory.
The current sinking code doesn't prevent us from sinking a load past an
aliasing store. Skip sinking instructions that may read from memory to
avoid a mis-compile.

See @minimal_bit_widths_with_aliasing_store for an example where 2 loads
are sunk past aliasing stores before this fix.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147259
2023-04-17 09:30:25 +01:00
Florian Hahn
7fc0b3049d
[VPlan] Switch to checking sinking legality for recurrences in VPlan.
Building on D142885 and D142589, retire the SinkAfter map from the
recurrence handling code. It is replaced by checking whether it is
possible to sink all users of a recurrence directly in VPlan. This
results in simpler code overall and allows to handle additional cases
(see the improvements in @test_crash).

Depends on D142885.
Depends on D142589.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142886
2023-04-13 22:00:52 +01:00
Florian Hahn
cdb48eefd0
[LV] Add exit users for recurrences in test, fix names.
Add users of exit values for recurrences to make sure exit value
generation will be checked in a follow-up change.

Also adjusts/fixes naming in the test.
2023-04-13 20:23:56 +01:00
Simon Pilgrim
aa754f7e0f [IR] llvm::createMinMaxOp - create integer min/max intrinsics instead of icmp/sel
Based off D148215, when expanding a min/max reduction we should be creating min/max intrinsics directly instead of relying on instcombine to fold them back together.

This patch handles integer min/max cases. Hopefully we can add floating point support soon (at least for fastmath/nnan cases) - but we're missing some of the plumbing to pass the correct FMF to the intrinsic at the moment.

Differential Revision: https://reviews.llvm.org/D148221
2023-04-13 16:40:43 +01:00
Craig Topper
4b47d875a1 [LV] Optimize trip count SCEV.
To calculate the trip count we need to add 1 to the backedge
taken count. If we need to widen the backedge count, it's better
to do the add before the widening if we can guarantee it won't
overflow.

The code here is based on similar code I found in
LoopIdiomRecognize.

This is the vectorizer version of this InstCombine patch D142783.
Looking at the IR diffs, this does look like it gets more cases
than the InstCombine patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D147355
2023-04-12 16:17:58 -07:00
Sjoerd Meijer
d827865e9f Recommit "[AArch64][TTI] Cost model FADD/FSUB/FNEG""
Fixed two test cases that relied on Asserts, and added a fallthrough
annotation to the switch case.
2023-04-11 12:48:15 +01:00
Florian Hahn
f9d0b35d22
[LV] Re-use already computed runtime VF in fixFixedOrderRecurrence.
This was suggested as independent cleanup in D147472.

This removes a redundant runtime VF computation when using scalable
vectors.
2023-04-10 21:25:12 +01:00
Florian Hahn
35af27c30a
[VPlan] Only create extracts for recurrence exits if there are live-outs.
Move the code to collect live-out earlier and only generate extracts for
exit values if there are any live-outs that use them.

Depends on D147472.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147567
2023-04-10 21:08:34 +01:00
Florian Hahn
b0e118bd77
[LV] Update tests checking VPlans to use patterns for VPValues.
This makes the tests more robust to changes in value numbering for
VPValues.
2023-04-09 20:32:09 +01:00
Florian Hahn
620e011a25
[VPlan] Don't add live-outs if scalar epilogue is required.
Instead of clearing live outs when a scalar epilogue is required late,
don't add live outs during VPlan construction if a scalar epilogue is
required.

This enables more VPlan-based DCE (if the live out would be the only
user in the plan) and is a step towards removing an access of the cost
model in fixedVectorizedLoop (which is after VPlan execution).

Depends on D147468.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147471
2023-04-09 09:18:24 +01:00
David Sherwood
9278dd7b2b [LoopVectorize] Fix zext/sext cost calculations when types are shrunk
In getInstructionCost if we know a zext/sext is going to be shrunk
we should only be changing the destination type, and leave the
source type unchanged. For example, we may change a zext from

  zext <16 x i8> %a to <16 x i32>

to

  zext <16 x i8> %a to <16 x i16>

However, we were previously calculating the cost for doing

  zext <16 x i16> %a to <16 x i16>

which is incorrect.

Differential Revision: https://reviews.llvm.org/D147152
2023-04-06 08:52:25 +00:00
Nikita Popov
53f7f85703 [LoopVectorize] Convert some tests to opaque pointers (NFC) 2023-04-06 09:38:47 +02:00
Philip Reames
c416f6700f [IVDescriptors] Add pointer InductionDescriptors with non-constant strides (try 2)
(JFYI - This has been heavily reframed since original attempt at landing.)

This change updates the InductionDescriptor logic to allow matching a pointer IV with a non-constant stride, but also updates the LoopVectorizer to bailout on such descriptors by default. This preserves the default vectorizer behavior.

In review, it was pointed out that there's multiple unfortunate performance implications which need to be addressed before this can be enabled. Having a flag allows us to exercise the behavior, and write test cases for logic which is otherwise unreachable (or hard to reach).

This will also enable non-constant stride pointer recurrences for other consumers. I've audited said code, and don't see any obvious issues.

Differential Revision: https://reviews.llvm.org/D147336
2023-04-05 09:32:35 -07:00
Philip Reames
37646a2c28 [RISCV] Account for LMUL in memory op costs
Generally, the cost of a memory op will scale with the number of vector registers accessed. Machines might exist which have a narrow memory access than vector register width, but machines with a wider memory access width than vector register width seem unlikely.

I noticed this because we were preferring wide loads + deinterleaves on examples where the cost of a short gather (actually a strided load) would be better. Touching 8 vector registers instead of doing a 4 element gather is not a good tradeoff.

Differential Revision: https://reviews.llvm.org/D147470
2023-04-05 07:58:56 -07:00
Graham Hunter
185863f7de [LV] Use available masked vector function variants when required
LLVM has the ability to vectorize using function variants that require
a mask by creating an all-true mask, and to vectorize a conditional
call via scalarization, now we want to join the two parts together
and use a masked variant when a mask is required.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D136251
2023-04-05 11:18:38 +01:00
Florian Hahn
b5925a22c9
[LV] Add uses of recurrences in exit blocks in some tests.
This preserves the spirit of the tests even if a follow-up changes only
generates exit values for recurrences if they are actually used.
2023-04-04 21:19:29 +01:00
Luke Lau
971a4501f7 [RISCV] Model vlseg/vsseg in interleaved memory ops
If the legalized type is a legal interleaved access type (i.e. there's a
supported vlseg/vsseg instruction for it), the interleaved access pass
will pick any interleaved memory op (wide load + shuffles) and lower it
into a vlseg/vsseg intrinsic.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D146522
2023-04-04 15:05:14 +01:00
David Sherwood
af2ed59f2b [NFC][LoopVectorize] Add zext/sext cost tests when there is type shrinkage
Differential Revision: https://reviews.llvm.org/D147151
2023-04-03 13:12:11 +00:00
Florian Hahn
0d61ffd350
[Loads] Support SCEVAddExpr as start for pointer AddRec.
Extend handling to support `%base + offset` as start for AddRecs in
isDereferenceableAndAlignedInLoop. This is done by adjusting AccessSize
by the offset and effectively checking if the full object starting from
%base to %base + offset + access-size is dereferenceable.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D147260
2023-04-02 12:33:44 +01:00
Florian Hahn
fdaebbeff4
[LV] Improve test added in 74dee4791a2.
Adjust test so it triggers a case missed in the original version
of D147260.
2023-03-31 21:50:39 +01:00
Florian Hahn
74dee4791a
[LV] Add test with predicated load where EltSize % Align != 0. 2023-03-31 21:15:24 +01:00
Philip Reames
78ae870f11 {tests] Rerun autogen to reduce a diff [nfc] 2023-03-31 12:47:08 -07:00
Philip Reames
a512ce5e12 [LV] Add tests for non-constant stride pointer inductions
Reduced from the case which triggered the revert of 498aa534f472, and then generalized to cover both expansion paths.
2023-03-31 09:10:59 -07:00
David Green
965a090f02 Revert "[IVDescriptors] Add pointer InductionDescriptors with non-constant strides"
Multiple errors have being reported on
https://reviews.llvm.org/rG498aa534f472d28db893aa9a8627d0b46e17f312

Reverting until the correctness issues can be resolved.

We are also seeing a lot of performance differences from the patch.  Some are
looking good, but some are looking pretty bad.
2023-03-31 11:08:50 +01:00
Florian Hahn
b060ca7042
[LV] Regenerate check lines for test to reduce diff in follow-up patch. 2023-03-30 20:17:12 +01:00
Philip Reames
498aa534f4 [IVDescriptors] Add pointer InductionDescriptors with non-constant strides
This matches the handling for integer IVs.  I left the non-opaque cases alone, mostly because they're largely irrelevant today.

This doesn't actually make much difference in vectorization right now as we immediately fail on aliasing checks (which also bail on non-constant strides).  Slightly suprisingly, it's the case which *do* need runtime checks which work after this patch as they don't use the same dependency analysis path.

This will also enable non-constant stride pointer recurrences for other consumers.  I've auditted said code, and don't see any obvious issues.
2023-03-30 11:56:00 -07:00
Philip Reames
1c5bb25d62 [RISCV][LV] Add test coverage for strided access patterns [nfc] 2023-03-30 10:02:27 -07:00
Florian Hahn
4173ed1382
[LV] Add test cases for global struct dereferencability.
Currently LLVM fails to determine that conditional loads in
@accesses_to_struct_dereferenceable are dereferenceable unconditionally.
2023-03-29 17:47:41 +01:00
Paul Osmialowski
6b6f312cce [TLI][AArch64] Extend SLEEF vectorized functions mapping with VLA functions
This commit extends D134719 "[AArch64] Enable libm vectorized
functions via SLEEF" with the mappings for the scalable functions.

It also introduces all the necessary changes needed to support masked
interfaces.

Reviewed By: danielkiss, sdesmalen

Differential Revision: https://reviews.llvm.org/D146839
2023-03-29 13:07:09 +01:00
Paul Osmialowski
f8f1909d36 Revert "[TLI][AArch64] Extend SLEEF vectorized functions mapping with VLA functions"
Reverting it so I could land it with Arcanist.

This reverts commit 59dcf927ee43e995374907b6846b657f68d7ea49.
2023-03-29 12:54:22 +01:00
Paul Osmialowski
59dcf927ee [TLI][AArch64] Extend SLEEF vectorized functions mapping with VLA functions
This commit extends D134719 "[AArch64] Enable libm vectorized
functions via SLEEF" with the mappings for the scalable functions.

It also introduces all the necessary changes needed to support masked
interfaces.

Signed-off-by: Paul Osmialowski <pawel.osmialowski@arm.com>
2023-03-29 11:07:35 +01:00
Graham Hunter
fba2a7c695 [LV][AArch64] Precommit interleaved access tests
Precommit for D145163
2023-03-29 10:26:14 +01:00
Philip Reames
64f69e453e [RISCV] Cost model for general case of single vector permute
The cost model was not accounting for the fact that we can generate vrgather + an index expression.

Two cases to call out.
1) I did not model the difference between vrgather and vrgatherei16. The result is the constant pool cost can be slightly understated on RV32. I don't think we care, but if someone disagrees, this would be easy to add.
2) Our current codegen for i8 vectors longer than 256 (which is the limit of what this costs) has some room for improvement.

Differential Revision: https://reviews.llvm.org/D147000
2023-03-28 07:34:11 -07:00
David Sherwood
636efd2e35 [SVE][LoopVectorize] Add option to disable tail-folding for reverse loops
If we use tail-folding for reverse loops that contain loads
and stores then we will need to reverse the loop predicate.
This patch adds a new 'reverse' sve-tail-folding option and
ensures they are not considered 'simple'.

I did this by adding a function called
containsDecreasingPointers to AArch64TargetTransformInfo.cpp
that searches all instructions in the loop for loads or
stores with negative strides.

Differential Revision: https://reviews.llvm.org/D146128
2023-03-27 14:10:15 +00:00
David Green
3f8160e9ce [LV][ARM][AArch64] Add multi-exit Loop Vectorizer tests. NFC
These are useful to test with the various predication schemes available on
different targets.
2023-03-27 10:21:24 +01:00
David Sherwood
1c4fedfa35 [LoopVectorize] Don't tail-fold for scalable VFs when there is no scalar tail
Currently in LoopVectorize we avoid tail-folding if we can
prove the trip count is always a multiple of the maximum
fixed-width VF. This works because we know the vectoriser
only ever chooses a VF that is a power of 2. However, if
we are also considering scalable VFs then we conservatively
bail out of the optimisation because we don't know the value
of vscale, which could be an odd or prime number, etc.

This patch tries to enable the same optimisation for scalable
VFs by asking if vscale is known to be a power of 2. If so,
we can then query the maximum value of vscale and use the same
logic as we do for fixed-width VFs. I've also added a new TTI
hook called isVScaleKnownToBeAPowerOfTwo that does the same
thing as the existing TargetLowering hook.

Differential Revision: https://reviews.llvm.org/D146199
2023-03-27 08:34:30 +00:00
David Sherwood
bd0c281fcd [NFC][LoopVectorize] Change trip counts for some tests to guarantee a scalar tail
Quite a few vectoriser tests were using a trip count of 1024,
which meant:

1. For fixed-length VFs we would never actually tail-fold, e.g.
see Transforms/LoopVectorize/RISCV/uniform-load-store.ll. This
is because we can prove at compile-time there will never be a
scalar tail.
2. As of D146199 the same optimisation mentioned above will also
apply to scalable VFs too.

I've changed all such trip counts to be 1025 instead.

Differential Revision: https://reviews.llvm.org/D146219
2023-03-24 09:43:50 +00:00
Luke Lau
8d16c6809a [RISCV] Increase default vectorizer LMUL to 2
After some discussion and experimentation, we have seen that changing the default number of vector register bits to LMUL=2 strikes a sweet spot.
Whilst we could be clever here and make the vectorizer smarter about dynamically selecting an LMUL that
a) Doesn't affect register pressure
b) Suitable for the microarchitecture
we would need to teach its heuristics about RISC-V register grouping specifics.
Instead this just does the easy, pragmatic thing by changing the default to a safe value that doesn't affect register pressure signifcantly[1], but should increase throughput and unlock more interleaving.

[1] Register spilling when compiling sqlite at various levels of `-riscv-v-register-bit-width-lmul`:

LMUL=1    2573 spills
LMUL=2    2583 spills
LMUL=4    2819 spills
LMUL=8    3256 spills

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D143723
2023-03-23 10:33:50 +00:00
Luke Lau
06f16232b1 [RISCV][NFC] Make interleaved access test more vectorizable
The previous test case stored the result of a deinterleaved load and add
into the same source address, which resulted in some scatters which we
weren't testing for and made the tests harder to understand.
Store it at a separate address, which will make the tests easier to read
when the cost model is changed after D145085 is landed

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D146442
2023-03-22 16:02:44 +00:00
Anna Thomas
4277d932ef [LV] Use speculatability within entire loop to avoid strided load predication
Use existing functionality for identifying total access size by strided
loads. If we can speculate the load across all vector iterations, we can
avoid predication for these strided loads (or masked gathers in
architectures which support it).

Differential Revision: https://reviews.llvm.org/D145616
2023-03-21 12:08:25 -04:00
Florian Hahn
962c306a11
[LV] Don't consider pointer as uniform if it is also stored.
Update isVectorizedMemAccessUse to also check if the pointer is stored.
This prevents LV to incorrectly consider a pointer as uniform if it is
used as both pointer and stored by the same StoreInst.

Fixes #61396.
2023-03-17 16:26:16 +00:00
Florian Hahn
a4bb037418
[LV] Add test where pointer is incorrectly marked as uniform.
Test for #61396.
2023-03-17 14:24:13 +00:00
Florian Hahn
565b98e793
[LV] Convert consecutive-ptr-uniforms.ll to use opaque pointers (NFC). 2023-03-17 14:07:11 +00:00
Douglas Yung
3e16488769 Mark test added in D145155 as requiring asserts since it uses the "-debug-only" option.
This should fix the test failure in Release builds.
2023-03-16 15:03:19 -07:00
Luke Lau
b9238abe05 [RISCV] Enable interleaved access vectorization
The loop vectorizer supports generating interleaved loads and stores via
shuffle patterns for fixed length vectors.
This enables it for RISC-V, since interleaved shuffle patterns can be
lowered to vlseg/vsseg in https://reviews.llvm.org/D145022

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D145155
2023-03-16 15:48:55 +00:00
Luke Lau
fc220a1aa9 Revert "[RISCV] Enable interleaved access vectorization"
This reverts commit acc03ad10af4f379a644e3956cb9aca54e40696c.
2023-03-15 22:00:48 +00:00