2372 Commits

Author SHA1 Message Date
Benjamin Kramer
f872043e05 Revert "[VPlan] Replace disjoint or with add instead of dropping disjoint. (#83821)"
This reverts commit c2c1e6ee4ce0df3d000ba880fa6cf58441da6462. It creates
a use after free.

==8342==ERROR: AddressSanitizer: heap-use-after-free on address 0x50f000001760 at pc 0x55b9fb84a8fb bp 0x7ffc18468a10 sp 0x7ffc18468a08
READ of size 1 at 0x50f000001760 thread T0
 #0 0x55b9fb84a8fa in dropPoisonGeneratingFlags llvm/lib/Transforms/Vectorize/VPlan.h:1040:13
 #1 0x55b9fb84a8fa in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>)::$_0::operator()(llvm::VPRecipeBase*) const llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:1236:23
 #2 0x55b9fb84a196 in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

Can be reproduced with asan on
Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll
Transforms/LoopVectorize/X86/pr81872.ll
Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll
2024-03-20 15:14:58 +01:00
Yingwei Zheng
f0420c7bc6
[ValueTracking] Handle not in isImpliedCondition (#85397)
This patch handles `not` in `isImpliedCondition` to enable more fold in
some multi-use cases.
2024-03-20 16:16:42 +08:00
Noah Goldstein
6960ace534 Revert "[InstCombine] Canonicalize (sitofp x) -> (uitofp x) if x >= 0"
This reverts commit d80d5b923c6f611590a12543bdb33e0c16044d44.

It wasn't a particularly important transform to begin with and caused
some codegen regressions on targets that prefer `sitofp` so dropping.

Might re-visit along with adding `nneg` flag to `uitofp` so its easily
reversable for the backend.
2024-03-20 00:50:45 -05:00
Florian Hahn
c2c1e6ee4c
[VPlan] Replace disjoint or with add instead of dropping disjoint. (#83821)
Dropping disjoint from an OR may yield incorrect results, as some
analysis may have converted it to an Add implicitly (e.g. SCEV used for
dependence analysis). Instead, replace it with an equivalent Add.

This is possible as all users of the disjoint OR only access lanes where
the operands are disjoint or poison otherwise.

Note that replacing all disjoint ORs with ADDs instead of dropping the
flags is not strictly necessary. It is only needed for disjoint ORs that
SCEV treated as ADDs, but those are not tracked.

There are other places that may drop poison-generating flags; those
likely need similar treatment.

Fixes https://github.com/llvm/llvm-project/issues/81872


PR: https://github.com/llvm/llvm-project/pull/83821
2024-03-19 20:16:18 +01:00
Michele Scandale
09eb9f1136
[InstCombine] Fix for folding select into floating point binary operators. (#83200)
Folding a `select` into a floating point binary operators can only be
done if the result is preserved for both case. In particular, if the
other operand of the `select` can be a NaN, then the transformation
won't preserve the result value.
2024-03-19 09:47:07 -07:00
Kirill Stoimenov
589c7abb03 Revert "[LV] Improve AnyOf reduction codegen. (#78304)"
Broke sanitizer bots: https://lab.llvm.org/buildbot/#/builders/74/builds/26697

This reverts commit 95fef1dfefd5467206e74c089d29806fcd82889b.
2024-03-14 14:57:01 +00:00
Florian Hahn
95fef1dfef
[LV] Improve AnyOf reduction codegen. (#78304)
Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.

The patch incorporates feedback from https://reviews.llvm.org/D153697.

This fixes the #62565, as now there aren't multiple uses of the
start/new values.

Fixes https://github.com/llvm/llvm-project/issues/62565

PR: https://github.com/llvm/llvm-project/pull/78304
2024-03-14 11:22:06 +00:00
Noah Goldstein
d80d5b923c [InstCombine] Canonicalize (sitofp x) -> (uitofp x) if x >= 0
Just a standard canonicalization.

Proofs: https://alive2.llvm.org/ce/z/9W4VFm

Closes #82404
2024-03-13 18:26:21 -05:00
Florian Hahn
1402c016ff
[VPlan] Use VPBuilder to create BranchOnCond in VPHCFGBuilder.
This simplifies the code to create the recipe slightly as well as
properly retaining the debug location of the input IR.
2024-03-13 14:30:09 +00:00
annamthomas
866ac9a165
[LV] Address postcommit review for PR84782 (#84797)
This testcase was added to show miscompile in
https://github.com/llvm/llvm-project/issues/81872
2024-03-11 13:23:00 -04:00
annamthomas
34acdb3ec2
Precommit testcase for pr81872 (#84782)
Testcase shows miscompile when dropping disjoint flag from disjoint or
during vectorization.
2024-03-11 12:16:52 -04:00
Cameron McInally
416debf79b
[test] Move pr73894.ll to AArch64 directory and update the target triple (#84269)
pr73894.ll is failing on a number of non-AArch64 buildbots. I'm not
certain that this is a proper fix, but I think it's best to move the
test to the test/Transforms/LoopVectorize/AArch64/ directory and replace
the triple with one commonly used in that directory.

llvm#73894
2024-03-06 21:25:28 -05:00
Cameron McInally
012d217174
[LV] Use scalar CMP for active-lane-mask with scalar VF (#83902)
Instead of generating a <1 x i1> active lane mask intrinsic, generate
the equivalent scalar ICMP instead. This allows us to avoid
unnecessarily extracting the scalar part from the vector mask.

Fixes llvm#73894.
2024-03-06 15:59:35 -05:00
Niwin Anto
eaf0d82529
[LV] Disable fold tail by masking when IV is used outside (#81609)
When induction variable are used outside the loop body, tail folding
by masking mis-compiles, because for users outside of the loop the
final value of the induction is computed separately from the vector
loop.

Fixes https://github.com/llvm/llvm-project/issues/76069
Fixes https://github.com/llvm/llvm-project/issues/51677
2024-03-04 11:33:30 +00:00
Shih-Po Hung
6ee9c8afbc
[RISCV][CostModel] Updates reduction and shuffle cost (#77342)
- Make `andi` cost 1 in SK_Broadcast
- Query the cost of VID_V, VRSUB_VX/VRSUB_VI which would scale with LMUL
2024-02-29 15:41:19 +08:00
Nilanjana Basu
1c211bc76e
[LV] Remove unused configuration option (#82955)
Recent set of changes (PR #67725) in loop interleaving algorithm caused removal of the loop trip count threshold for allowing interleaving. Therefore configuration option interleave-small-loop-scalar-reduction is no longer needed.
2024-02-28 10:17:25 -08:00
Niwin Anto
ce0687e2df
[LV] Add test for tail fold by masking with external IV users. (#82329)
Test case for https://github.com/llvm/llvm-project/issues/76069
2024-02-28 13:46:00 +00:00
Florian Hahn
15d9d0fa8f
[VPlan] Also print final VPlan directly before codegen/execute. (#82269)
Some optimizations are apply after UF and VF have been chosen. This
patch adds an extra print of the final VPlan just before
codegen/execution.

In the future, there will be additional transforms that are applied
later (interleaving for example).

PR: https://github.com/llvm/llvm-project/pull/82269
2024-02-28 13:19:43 +00:00
Florian Hahn
e421c12e47
[VPlan] Remove left-over CHECK-NOT line.
This removes a  CHECK-NOT: vector.body line from the test which seems to
imply the test does not get vectorized, but it does now.

This line was left over from when the test was pre-committed, remove it.
2024-02-27 09:38:40 +00:00
Florian Hahn
911055e34f
[VPlan] Consistently use (Part, 0) for first lane scalar values (#80271)
At the moment, some VPInstructions create only a single scalar value,
but use VPTransformatState's 'vector' storage for this value. Those
values are effectively uniform-per-VF (or in some cases
uniform-across-VF-and-UF). Using the vector/per-part storage doesn't
interact well with other recipes, that more accurately using (Part,
Lane) to look up scalar values and prevents VPInstructions creating
scalars from interacting with other recipes working with scalars.

This PR tries to unify handling of scalars by using (Part, 0) for scalar
values where only the first lane is demanded. This allows using
VPInstructions with other recipes like VPScalarCastRecipe and is also
needed when using VPInstructions in more cases otuside the vector loop
region to generate scalars.

Depends on https://github.com/llvm/llvm-project/pull/80269
2024-02-26 19:06:43 +00:00
Benjamin Kramer
e7c60915e6 Remove duplicated REQUIRES: asserts 2024-02-23 12:01:30 +01:00
Ramkumar Ramachandra
f5c8e9e531
LoopVectorize/test: guard pr72969 with asserts (#82653)
Follow up on 695a9d8 (LoopVectorize: add test for crash in #72969) to
guard pr72969.ll with REQUIRES: asserts, in order to be reasonably
confident that it will crash reliably.
2024-02-22 19:55:18 +00:00
Benjamin Kramer
3168af56bc LoopVectorize: Mark crash test as requiring assertions 2024-02-22 20:25:58 +01:00
Philip Reames
f67ef1a8d9 [RISCV][LV] Add additional small trip count loop coverage 2024-02-22 08:30:25 -08:00
Philip Reames
9eb5f94f9b [RISCV][AArch64] Add vscale_range attribute to tests per architecture minimums
Spent a bunch of time tracing down an odd issue "in SCEV" which turned out
to be the fact that SCEV doesn't have access to TTI.  As a result, the only
way for it to get range facts on vscales (to avoid collapsing ranges of
element counts and type sizes to trivial ranges on multiplies) is to look
at the vscale_range attribute.  Since vscale_range is set by clang by
default, manually setting it in the tests shouldn't interfere with the
test intent.
2024-02-22 08:11:24 -08:00
Ramkumar Ramachandra
695a9d84dc
LoopVectorize: add test for crash in #72969 (#74111) 2024-02-22 16:00:33 +00:00
Florian Hahn
9923d29cfa
[VPlan] Merge main VPlan verifer with HCFG verifier.
Unify VPlan verifiers in verifyVPlanIsValid. This adds verification for
various properties on blocks to the verifier used for VPlans generated
by the inner loop vectorizer. It also adds def-use checks for the
verifier used in the VPlan native path.

This drops the separate flag to enable HCFG verification. Instead, all
VPlans are verified once they have been created, if assertions are
enabled.

This also removes VPWidenPHIRecipe from VPHeaderPHIRecipe; it is used to
model any phi node in the native path.
2024-02-20 16:43:57 +00:00
Florian Hahn
0dacba3ad1
[VPlan] Handle truncating ICMPs in truncateToMinimalBWs.
Update truncateToMinimalBitwidths to handle truncating ICMPs. For ICMPs,
the new target type will be the same as the original type. In that case,
only truncate the operands, but skip the extend. This is in line with
what the original truncateToMinimalBitwidths did for compares.

Fixes https://github.com/llvm/llvm-project/issues/81415.
2024-02-16 12:58:56 +00:00
Rohit Aggarwal
36adfec155
Adding support of AMDLIBM vector library (#78560)
Hi,

AMD has it's own implementation of vector calls. This patch include the
changes to enable the use of AMD's math library using -fveclib=AMDLIBM.
Please refer https://github.com/amd/aocl-libm-ose 

---------

Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal@amd.com>
2024-02-15 12:13:07 +05:30
David Sherwood
1c10821022
[LoopVectorize] Fix divide-by-zero bug (#80836) (#81721)
When attempting to use the estimated trip count to refine the costs of
the runtime memory checks we should also check for sane trip counts to
prevent divide-by-zero faults on some platforms.

Fixes #80836
2024-02-14 16:07:51 +00:00
Fangrui Song
3d18c8cd26 [test] Replace aarch64-*-{eabi,gnueabi}{,hf} with aarch64
Similar to d39b4ce3ce8a3c256e01bdec2b140777a332a633
Using "eabi" or "gnueabi" for aarch64 targets is a common mistake and
warned by Clang Driver. We want to avoid them elsewhere as well. Just
use the common "aarch64" without other triple components.
2024-02-12 18:29:55 -08:00
Nikita Popov
7c0d52ca91
[ValueTracking] Support dominating known bits condition in and/or (#74728)
This extends computeKnownBits() support for dominating conditions to
also handle and/or conditions. We'll look through either and or or
depending on which edge we're considering.

This change is mainly for the sake of completeness, so we don't start
missing optimizations if SimplifyCFG decides to merge some branches.
2024-02-08 09:47:49 +01:00
Philip Reames
1aafe7605b [test] Regen a test for naming changes 2024-02-06 18:06:24 -08:00
Philip Reames
c5bf1f4b8f [test] Autogen a test for ease of update in forthcoming patch 2024-02-06 17:59:54 -08:00
Nilanjana Basu
c1c5b854ad
[LV] Remove loop trip count threshold for deciding whether to interleave a loop (#67725)
A set of microbenchmarks (https://github.com/llvm/llvm-test-suite/pull/26) showed that loop interleaving can be beneficial for loops with low trip count as well. Loop interleaving count computation is updated accordingly in prior patches while this patch removes the loop trip count threshold for interleaving.
2024-02-05 17:23:58 -08:00
Florian Hahn
8cb2de7fae
[VPlan] Implement type inference for ICmp.
This fixes a crash in the attached test case due to missing type
inference for ICmp VPInstructions.
2024-02-05 15:42:07 +00:00
Nikita Popov
2d69827c5c [Transforms] Convert tests to opaque pointers (NFC) 2024-02-05 11:57:34 +01:00
Florian Hahn
47abbf4fe9
[VPlan] Update VPInst::onlyFirstLaneUsed to check users. (#80269)
A VPInstruction only has its first lane used if all users use its first
lane only. Use vputils::onlyFirstLaneUsed to continue checking the
recipe's users to handle more cases.

Besides allowing additional introduction of scalar steps when
interleaving in some cases, this also enables using an Add VPInstruction
to model the increment - as a follow up.
2024-02-03 16:19:10 +00:00
Maciej Gabka
0f26441cb8
[TLI][AArch64] Adjust TLI mappings to vector functions taking linear pointers (#80296)
The masked symbols in SLEEF are incorrectly implemented as calls to non
masked variants, what only works fine for functions which do not modify
memory.
For vector variants which modify memory we can only use a non masked
symbols for now.
The SVE ArmPL mappings need to be removed for now as well.
2024-02-02 08:42:29 +00:00
Florian Hahn
cec24f0d7e
[VPlan] Update stale test after 9536a6286, fix formatting. 2024-01-31 13:45:38 +00:00
Florian Hahn
9536a6286e
[VPlan] Preserve original induction order when creating scalar steps.
Update createScalarIVSteps to take an insert point as parameter. This
ensures that the inserted scalar steps are in the same order as the
recipes they replace (vs in reverse order as currently). This helps to
reduce the diff for follow-up changes.
2024-01-31 13:31:28 +00:00
Nilanjana Basu
c492eb6b28
[LV] Update interleaving count computation when scalar epilogue loop needs to run at least once (#79651)
Update loop interleaving count computation to address loops that require at least one scalar iteration in the epilogue loop. For this case, the available trip count for interleaving the loop is one less.
2024-01-29 13:41:15 -08:00
Nilanjana Basu
155f24b11e
[Tests][LV][AArch64] Pre-commit tests for changing loop interleaving count computation for loops that need to run scalar iterations (#79640)
This patch contains a set of pre-commit tests for changing the loop interleaving count computation in a subsequent patch in order to address loops that need to execute at least a single scalar iteration in the epilogue.
2024-01-29 10:21:23 -08:00
David Sherwood
962fbafecf
[LoopVectorize] Refine runtime memory check costs when there is an outer loop (#76034)
When we generate runtime memory checks for an inner loop it's
possible that these checks are invariant in the outer loop and
so will get hoisted out. In such cases, the effective cost of
the checks should reduce to reflect the outer loop trip count.

This fixes a 25% performance regression introduced by commit

49b0e6dcc296792b577ae8f0f674e61a0929b99d

when building the SPEC2017 x264 benchmark with PGO, where we
decided the inner loop trip count wasn't high enough to warrant
the (incorrect) high cost of the runtime checks. Also, when
runtime memory checks consist entirely of diff checks these are
likely to be outer loop invariant.
2024-01-26 14:43:48 +00:00
Florian Hahn
731c2049a4
[VPlan] Relax IV user assertion after 0ab539f for epilogue vec.
After 0ab539fd6748adf2f638e10514dd9419597d8863, the canonical IV in the
epilogue vector loop may be used by a trunc. Relax the corresponding
assert.

This should fix some build-bot failures, including
    https://lab.llvm.org/buildbot/#/builders/187/builds/14113
    https://lab.llvm.org/buildbot/#/builders/98/builds/32350
    https://lab.llvm.org/buildbot/#/builders/239/builds/5473
2024-01-26 13:19:25 +00:00
Graham Hunter
d4c0171423
[LV] Fix handling of interleaving linear args (#78725)
Currently when interleaving vector calls with linear arguments,
the Part is ignored and all vector calls use the initial value
from the first lane of the current iteration.

Fix this to extract from the correct part of the linear vector.
2024-01-26 11:30:35 +00:00
Florian Hahn
0ab539fd67
[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. (#78113)
Add a new recipe to model scalar cast instructions, without relying on
an underlying instruction.

This allows creating scalar casts, without relying on an underlying
instruction (like the current VPReplicateRecipe). The new recipe is 
used to explicitly model both truncating the induction step and the
VPDerivedIVRecipe, thus simplifying both the recipe and code
needed to introduce it.

Truncating VPWidenIntOrFpInductionRecipes should also be modeled using
the new recipe, as follow-up.

PR: https://github.com/llvm/llvm-project/pull/78113
2024-01-26 11:13:05 +00:00
David Spickett
4a91206359 [llvm][LV] Move new test into X86 subfolder
Added in a04f6152914ea21f3068aaba9d8fc21d2e703d3e.

Failing on our Arm only bots:
https://lab.llvm.org/buildbot/#/builders/245/builds/19684
2024-01-25 17:04:34 +00:00
Florian Hahn
a04f615291
[LV] Check for innermost loop instead of EnableVPlanNativePath in CM.
Replace EnableVPlanNativePath checks in the cost-model by assertions
that the code is only called for innermost loops. This ensures that the
cost model isn't used in the VPlanNativePath, which is only used for
outer-loop vectorization.

Even with EnableVPlanNativePath, inner loops are processed by the
inner loop vectorization path, not the native path, so checking for
EnableVPlanNativePath may impact decisions for inner loops and can
cause crashes, like in the attached test case.
2024-01-25 12:49:52 +00:00
Nikita Popov
90ba33099c
[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882)
This patch canonicalizes getelementptr instructions with constant
indices to use the `i8` source element type. This makes it easier for
optimizations to recognize that two GEPs are identical, because they
don't need to see past many different ways to express the same offset.

This is a first step towards
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699.
This is limited to constant GEPs only for now, as they have a clear
canonical form, while we're not yet sure how exactly to deal with
variable indices.

The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives
two representative examples of the kind of optimization improvement we
expect from this change. In the first test SimplifyCFG can now realize
that all switch branches are actually the same. In the second test it
can convert it into simple arithmetic. These are representative of
common optimization failures we see in Rust.

Fixes https://github.com/llvm/llvm-project/issues/69841.
2024-01-24 15:25:29 +01:00