2363 Commits

Author SHA1 Message Date
annamthomas
866ac9a165
[LV] Address postcommit review for PR84782 (#84797)
This testcase was added to show miscompile in
https://github.com/llvm/llvm-project/issues/81872
2024-03-11 13:23:00 -04:00
annamthomas
34acdb3ec2
Precommit testcase for pr81872 (#84782)
Testcase shows a miscompile when dropping the disjoint flag from a disjoint `or`
during vectorization.
2024-03-11 12:16:52 -04:00
Cameron McInally
416debf79b
[test] Move pr73894.ll to AArch64 directory and update the target triple (#84269)
pr73894.ll is failing on a number of non-AArch64 buildbots. I'm not
certain that this is a proper fix, but I think it's best to move the
test to the test/Transforms/LoopVectorize/AArch64/ directory and replace
the triple with one commonly used in that directory.

llvm#73894
2024-03-06 21:25:28 -05:00
Cameron McInally
012d217174
[LV] Use scalar CMP for active-lane-mask with scalar VF (#83902)
Instead of generating a <1 x i1> active lane mask intrinsic, generate
the equivalent scalar ICMP instead. This allows us to avoid
unnecessarily extracting the scalar part from the vector mask.

Fixes llvm#73894.
2024-03-06 15:59:35 -05:00
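The equivalence behind the commit above can be sketched in a few lines. This is a minimal Python model, not LLVM code. It assumes the documented semantics of `llvm.get.active.lane.mask`, where lane `i` is active when `base + i` is unsigned-less-than `n`. The names `active_lane_mask` and `scalar_active_lane` are hypothetical.

```python
from typing import List


def active_lane_mask(base: int, n: int, vf: int) -> List[bool]:
    """Model of the mask: lane i is active iff base + i < n.

    Assumes non-negative inputs, so Python's `<` stands in for an
    unsigned compare.
    """
    return [base + i < n for i in range(vf)]


def scalar_active_lane(base: int, n: int) -> bool:
    """Scalar equivalent for VF = 1: a single unsigned less-than compare."""
    return base < n


# With a scalar VF the one-element mask carries the same information as
# the plain comparison, so no extract from a <1 x i1> vector is needed.
for base, n in [(0, 0), (0, 1), (3, 8), (8, 8), (9, 8)]:
    assert active_lane_mask(base, n, vf=1) == [scalar_active_lane(base, n)]
```

For wider VFs the mask still has to be materialized per lane; only the single-lane case collapses to one compare.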
Niwin Anto
eaf0d82529
[LV] Disable fold tail by masking when IV is used outside (#81609)
When induction variables are used outside the loop body, tail folding
by masking miscompiles, because for users outside of the loop the
final value of the induction is computed separately from the vector
loop.

Fixes https://github.com/llvm/llvm-project/issues/76069
Fixes https://github.com/llvm/llvm-project/issues/51677
2024-03-04 11:33:30 +00:00
Shih-Po Hung
6ee9c8afbc
[RISCV][CostModel] Updates reduction and shuffle cost (#77342)
- Make `andi` cost 1 in SK_Broadcast
- Query the cost of VID_V, VRSUB_VX/VRSUB_VI which would scale with LMUL
2024-02-29 15:41:19 +08:00
Nilanjana Basu
1c211bc76e
[LV] Remove unused configuration option (#82955)
A recent set of changes (PR #67725) to the loop interleaving algorithm removed the loop trip count threshold for allowing interleaving. The configuration option interleave-small-loop-scalar-reduction is therefore no longer needed.
2024-02-28 10:17:25 -08:00
Niwin Anto
ce0687e2df
[LV] Add test for tail fold by masking with external IV users. (#82329)
Test case for https://github.com/llvm/llvm-project/issues/76069
2024-02-28 13:46:00 +00:00
Florian Hahn
15d9d0fa8f
[VPlan] Also print final VPlan directly before codegen/execute. (#82269)
Some optimizations are applied after UF and VF have been chosen. This
patch adds an extra print of the final VPlan just before
codegen/execution.

In the future, there will be additional transforms that are applied
later (interleaving for example).

PR: https://github.com/llvm/llvm-project/pull/82269
2024-02-28 13:19:43 +00:00
Florian Hahn
e421c12e47
[VPlan] Remove left-over CHECK-NOT line.
This removes a CHECK-NOT: vector.body line from the test, which seems to
imply the test does not get vectorized, but it does now.

This line was left over from when the test was pre-committed; remove it.
2024-02-27 09:38:40 +00:00
Florian Hahn
911055e34f
[VPlan] Consistently use (Part, 0) for first lane scalar values (#80271)
At the moment, some VPInstructions create only a single scalar value,
but use VPTransformState's 'vector' storage for this value. Those
values are effectively uniform-per-VF (or in some cases
uniform-across-VF-and-UF). Using the vector/per-part storage doesn't
interact well with other recipes, which more accurately use (Part,
Lane) to look up scalar values, and prevents VPInstructions creating
scalars from interacting with other recipes working with scalars.

This PR tries to unify handling of scalars by using (Part, 0) for scalar
values where only the first lane is demanded. This allows using
VPInstructions with other recipes like VPScalarCastRecipe and is also
needed when using VPInstructions in more cases outside the vector loop
region to generate scalars.

Depends on https://github.com/llvm/llvm-project/pull/80269
2024-02-26 19:06:43 +00:00
Benjamin Kramer
e7c60915e6 Remove duplicated REQUIRES: asserts 2024-02-23 12:01:30 +01:00
Ramkumar Ramachandra
f5c8e9e531
LoopVectorize/test: guard pr72969 with asserts (#82653)
Follow up on 695a9d8 (LoopVectorize: add test for crash in #72969) to
guard pr72969.ll with REQUIRES: asserts, in order to be reasonably
confident that it will crash reliably.
2024-02-22 19:55:18 +00:00
Benjamin Kramer
3168af56bc LoopVectorize: Mark crash test as requiring assertions 2024-02-22 20:25:58 +01:00
Philip Reames
f67ef1a8d9 [RISCV][LV] Add additional small trip count loop coverage 2024-02-22 08:30:25 -08:00
Philip Reames
9eb5f94f9b [RISCV][AArch64] Add vscale_range attribute to tests per architecture minimums
Spent a bunch of time tracing down an odd issue "in SCEV" which turned out
to be the fact that SCEV doesn't have access to TTI.  As a result, the only
way for it to get range facts on vscales (to avoid collapsing ranges of
element counts and type sizes to trivial ranges on multiplies) is to look
at the vscale_range attribute.  Since vscale_range is set by clang by
default, manually setting it in the tests shouldn't interfere with the
test intent.
2024-02-22 08:11:24 -08:00
Ramkumar Ramachandra
695a9d84dc
LoopVectorize: add test for crash in #72969 (#74111) 2024-02-22 16:00:33 +00:00
Florian Hahn
9923d29cfa
[VPlan] Merge main VPlan verifier with HCFG verifier.
Unify VPlan verifiers in verifyVPlanIsValid. This adds verification for
various properties on blocks to the verifier used for VPlans generated
by the inner loop vectorizer. It also adds def-use checks for the
verifier used in the VPlan native path.

This drops the separate flag to enable HCFG verification. Instead, all
VPlans are verified once they have been created, if assertions are
enabled.

This also removes VPWidenPHIRecipe from VPHeaderPHIRecipe; it is used to
model any phi node in the native path.
2024-02-20 16:43:57 +00:00
Florian Hahn
0dacba3ad1
[VPlan] Handle truncating ICMPs in truncateToMinimalBWs.
Update truncateToMinimalBitwidths to handle truncating ICMPs. For ICMPs,
the new target type will be the same as the original type. In that case,
only truncate the operands, but skip the extend. This is in line with
what the original truncateToMinimalBitwidths did for compares.

Fixes https://github.com/llvm/llvm-project/issues/81415.
2024-02-16 12:58:56 +00:00
Rohit Aggarwal
36adfec155
Adding support of AMDLIBM vector library (#78560)
Hi,

AMD has its own implementation of vector calls. This patch includes the
changes to enable the use of AMD's math library using -fveclib=AMDLIBM.
Please refer to https://github.com/amd/aocl-libm-ose

---------

Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal@amd.com>
2024-02-15 12:13:07 +05:30
David Sherwood
1c10821022
[LoopVectorize] Fix divide-by-zero bug (#80836) (#81721)
When attempting to use the estimated trip count to refine the costs of
the runtime memory checks we should also check for sane trip counts to
prevent divide-by-zero faults on some platforms.

Fixes #80836
2024-02-14 16:07:51 +00:00
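The kind of guard described in the commit above can be illustrated with a small sketch. It is a Python model under stated assumptions, not the LoopVectorize implementation: `amortized_check_cost` is a hypothetical helper, and clamping the trip count to at least one is an illustrative choice rather than a statement of what the patch does.

```python
from typing import Optional


def amortized_check_cost(check_cost: int,
                         estimated_trip_count: Optional[int]) -> int:
    """Spread a runtime-check cost over an estimated trip count.

    Hypothetical helper: a missing estimate leaves the cost unchanged,
    and a zero or negative estimate is clamped to 1 so the division
    below can never fault.
    """
    if estimated_trip_count is None:
        return check_cost
    trip_count = max(estimated_trip_count, 1)
    # Keep the result at least 1 so the checks never look free.
    return max(check_cost // trip_count, 1)


assert amortized_check_cost(100, 0) == 100   # zero estimate: no fault
assert amortized_check_cost(100, 4) == 25
```

Without the clamp, an estimate of zero would raise `ZeroDivisionError` here, which mirrors the divide-by-zero fault the commit message mentions.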
Fangrui Song
3d18c8cd26 [test] Replace aarch64-*-{eabi,gnueabi}{,hf} with aarch64
Similar to d39b4ce3ce8a3c256e01bdec2b140777a332a633
Using "eabi" or "gnueabi" for aarch64 targets is a common mistake and
warned by Clang Driver. We want to avoid them elsewhere as well. Just
use the common "aarch64" without other triple components.
2024-02-12 18:29:55 -08:00
Nikita Popov
7c0d52ca91
[ValueTracking] Support dominating known bits condition in and/or (#74728)
This extends computeKnownBits() support for dominating conditions to
also handle and/or conditions. We'll look through either `and` or `or`,
depending on which edge we're considering.

This change is mainly for the sake of completeness, so we don't start
missing optimizations if SimplifyCFG decides to merge some branches.
2024-02-08 09:47:49 +01:00
Philip Reames
1aafe7605b [test] Regen a test for naming changes 2024-02-06 18:06:24 -08:00
Philip Reames
c5bf1f4b8f [test] Autogen a test for ease of update in forthcoming patch 2024-02-06 17:59:54 -08:00
Nilanjana Basu
c1c5b854ad
[LV] Remove loop trip count threshold for deciding whether to interleave a loop (#67725)
A set of microbenchmarks (https://github.com/llvm/llvm-test-suite/pull/26) showed that loop interleaving can be beneficial for loops with low trip count as well. Loop interleaving count computation is updated accordingly in prior patches while this patch removes the loop trip count threshold for interleaving.
2024-02-05 17:23:58 -08:00
Florian Hahn
8cb2de7fae
[VPlan] Implement type inference for ICmp.
This fixes a crash in the attached test case due to missing type
inference for ICmp VPInstructions.
2024-02-05 15:42:07 +00:00
Nikita Popov
2d69827c5c [Transforms] Convert tests to opaque pointers (NFC) 2024-02-05 11:57:34 +01:00
Florian Hahn
47abbf4fe9
[VPlan] Update VPInst::onlyFirstLaneUsed to check users. (#80269)
A VPInstruction only has its first lane used if all users use its first
lane only. Use vputils::onlyFirstLaneUsed to continue checking the
recipe's users to handle more cases.

Besides allowing additional introduction of scalar steps when
interleaving in some cases, this also enables using an Add VPInstruction
to model the increment - as a follow up.
2024-02-03 16:19:10 +00:00
Maciej Gabka
0f26441cb8
[TLI][AArch64] Adjust TLI mappings to vector functions taking linear pointers (#80296)
The masked symbols in SLEEF are incorrectly implemented as calls to
non-masked variants, which only works for functions that do not modify
memory.
For vector variants which modify memory we can only use non-masked
symbols for now.
The SVE ArmPL mappings need to be removed for now as well.
2024-02-02 08:42:29 +00:00
Florian Hahn
cec24f0d7e
[VPlan] Update stale test after 9536a6286, fix formatting. 2024-01-31 13:45:38 +00:00
Florian Hahn
9536a6286e
[VPlan] Preserve original induction order when creating scalar steps.
Update createScalarIVSteps to take an insert point as parameter. This
ensures that the inserted scalar steps are in the same order as the
recipes they replace (vs in reverse order as currently). This helps to
reduce the diff for follow-up changes.
2024-01-31 13:31:28 +00:00
Nilanjana Basu
c492eb6b28
[LV] Update interleaving count computation when scalar epilogue loop needs to run at least once (#79651)
Update loop interleaving count computation to address loops that require at least one scalar iteration in the epilogue loop. For this case, the available trip count for interleaving the loop is one less.
2024-01-29 13:41:15 -08:00
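The adjustment described in the commit above can be sketched as follows. This is a simplified Python model, not the vectorizer's code: the function names, the power-of-two rounding, and the `max_ic` cap are assumptions made for illustration. The only detail taken from the commit message is that the trip count available for interleaving is one less when a scalar epilogue iteration is required.

```python
def bit_floor(x: int) -> int:
    """Largest power of two not exceeding x (0 for x <= 0)."""
    return 1 << (x.bit_length() - 1) if x > 0 else 0


def interleave_count_upper_bound(trip_count: int, vf: int, max_ic: int,
                                 requires_scalar_epilogue: bool) -> int:
    """Hypothetical upper bound on the interleave count.

    When at least one iteration must run in the scalar epilogue, that
    iteration is not available to the vector loop.
    """
    available_tc = trip_count - 1 if requires_scalar_epilogue else trip_count
    return bit_floor(max(1, min(available_tc // vf, max_ic)))


# 32 iterations at VF=4 allow 8 interleaved parts; reserving one
# iteration for the epilogue leaves 31, which only fits 4.
assert interleave_count_upper_bound(32, 4, 8, False) == 8
assert interleave_count_upper_bound(32, 4, 8, True) == 4
```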
Nilanjana Basu
155f24b11e
[Tests][LV][AArch64] Pre-commit tests for changing loop interleaving count computation for loops that need to run scalar iterations (#79640)
This patch contains a set of pre-commit tests for changing the loop interleaving count computation in a subsequent patch in order to address loops that need to execute at least a single scalar iteration in the epilogue.
2024-01-29 10:21:23 -08:00
David Sherwood
962fbafecf
[LoopVectorize] Refine runtime memory check costs when there is an outer loop (#76034)
When we generate runtime memory checks for an inner loop it's
possible that these checks are invariant in the outer loop and
so will get hoisted out. In such cases, the effective cost of
the checks should reduce to reflect the outer loop trip count.

This fixes a 25% performance regression introduced by commit

49b0e6dcc296792b577ae8f0f674e61a0929b99d

when building the SPEC2017 x264 benchmark with PGO, where we
decided the inner loop trip count wasn't high enough to warrant
the (incorrect) high cost of the runtime checks. Also, when
runtime memory checks consist entirely of diff checks these are
likely to be outer loop invariant.
2024-01-26 14:43:48 +00:00
Florian Hahn
731c2049a4
[VPlan] Relax IV user assertion after 0ab539f for epilogue vec.
After 0ab539fd6748adf2f638e10514dd9419597d8863, the canonical IV in the
epilogue vector loop may be used by a trunc. Relax the corresponding
assert.

This should fix some build-bot failures, including
    https://lab.llvm.org/buildbot/#/builders/187/builds/14113
    https://lab.llvm.org/buildbot/#/builders/98/builds/32350
    https://lab.llvm.org/buildbot/#/builders/239/builds/5473
2024-01-26 13:19:25 +00:00
Graham Hunter
d4c0171423
[LV] Fix handling of interleaving linear args (#78725)
Currently when interleaving vector calls with linear arguments,
the Part is ignored and all vector calls use the initial value
from the first lane of the current iteration.

Fix this to extract from the correct part of the linear vector.
2024-01-26 11:30:35 +00:00
Florian Hahn
0ab539fd67
[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. (#78113)
Add a new recipe to model scalar cast instructions, without relying on
an underlying instruction.

This allows creating scalar casts, without relying on an underlying
instruction (like the current VPReplicateRecipe). The new recipe is 
used to explicitly model both truncating the induction step and the
VPDerivedIVRecipe, thus simplifying both the recipe and code
needed to introduce it.

Truncating VPWidenIntOrFpInductionRecipes should also be modeled using
the new recipe, as follow-up.

PR: https://github.com/llvm/llvm-project/pull/78113
2024-01-26 11:13:05 +00:00
David Spickett
4a91206359 [llvm][LV] Move new test into X86 subfolder
Added in a04f6152914ea21f3068aaba9d8fc21d2e703d3e.

Failing on our Arm only bots:
https://lab.llvm.org/buildbot/#/builders/245/builds/19684
2024-01-25 17:04:34 +00:00
Florian Hahn
a04f615291
[LV] Check for innermost loop instead of EnableVPlanNativePath in CM.
Replace EnableVPlanNativePath checks in the cost-model by assertions
that the code is only called for innermost loops. This ensures that the
cost model isn't used in the VPlanNativePath, which is only used for
outer-loop vectorization.

Even with EnableVPlanNativePath, inner loops are processed by the
inner loop vectorization path, not the native path, so checking for
EnableVPlanNativePath may impact decisions for inner loops and can
cause crashes, like in the attached test case.
2024-01-25 12:49:52 +00:00
Nikita Popov
90ba33099c
[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882)
This patch canonicalizes getelementptr instructions with constant
indices to use the `i8` source element type. This makes it easier for
optimizations to recognize that two GEPs are identical, because they
don't need to see past many different ways to express the same offset.

This is a first step towards
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699.
This is limited to constant GEPs only for now, as they have a clear
canonical form, while we're not yet sure how exactly to deal with
variable indices.

The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives
two representative examples of the kind of optimization improvement we
expect from this change. In the first test SimplifyCFG can now realize
that all switch branches are actually the same. In the second test it
can convert it into simple arithmetic. These are representative of
common optimization failures we see in Rust.

Fixes https://github.com/llvm/llvm-project/issues/69841.
2024-01-24 15:25:29 +01:00
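A toy model may help show why the `i8` form in the commit above makes identical GEPs easier to recognize. The Python sketch below is illustrative only: it covers integers and arrays, ignores structs and padding, and the tuple type encoding and function names are hypothetical.

```python
def size_of(ty) -> int:
    """Size in bytes of a toy type: ("int", bytes) or ("array", n, elem)."""
    if ty[0] == "int":
        return ty[1]
    if ty[0] == "array":
        return ty[1] * size_of(ty[2])
    raise ValueError(f"unsupported type: {ty!r}")


def constant_gep_offset(source_ty, indices) -> int:
    """Byte offset of a GEP with constant indices in the toy type system.

    The first index steps over whole source elements; each later index
    steps into the current array type.
    """
    offset = indices[0] * size_of(source_ty)
    ty = source_ty
    for idx in indices[1:]:
        if ty[0] != "array":
            raise ValueError("cannot index into a non-array type")
        ty = ty[2]
        offset += idx * size_of(ty)
    return offset


I32 = ("int", 4)
ARR4_I32 = ("array", 4, I32)

# Two differently spelled GEPs that address the same byte; both would
# canonicalize to an i8 GEP with offset 24.
assert constant_gep_offset(ARR4_I32, [1, 2]) == 24
assert constant_gep_offset(I32, [6]) == 24
```

Once both are expressed as a single byte offset, comparing them is a plain integer comparison rather than a structural walk over two different source element types.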
wanglei
fcff4582f0
[LoongArch] Permit auto-vectorization using LSX/LASX with auto-vec feature (#78943)
With enough codegen complete, we can now correctly report the size of
vector registers for LSX/LASX, allowing auto-vectorization (the
`auto-vec` feature needs to be enabled simultaneously).

As described, the `auto-vec` feature is experimental. It ensures
that automatic vectorization is not enabled by default, because the
information provided by the current `TTI` cannot yield additional
benefits for automatic vectorization.
2024-01-23 09:06:35 +08:00
Alexandros Lamprineas
530c72b498
[TLI] Add missing ArmPL mappings (#78474)
Adds TLI mappings for fixed and scalable vector variants of cospi(f),
fmax(f), ilogb(f) and ldexp(f).
2024-01-22 17:15:17 +00:00
Jay Foad
7017efa1a1 Fix typo "widended" 2024-01-19 13:50:26 +00:00
Graham Hunter
689da340ed [NFC][LV] Test precommit for interleaved linear args 2024-01-19 12:59:09 +00:00
Alexandros Lamprineas
92289db82f
[VFABI] Move the Vector ABI demangling utility to LLVMCore. (#77513)
This fixes #71892, allowing us to check mangled names in the IR verifier.
2024-01-17 09:55:30 +00:00
Fangrui Song
9e9907f1cf
[AMDGPU,test] Change llc -march= to -mtriple= (#75982)
Similar to 806761a7629df268c8aed49657aeccffa6bca449.

For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.

Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outright.

This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:

```
  LLVM :: CodeGen/AMDGPU/fabs.f64.ll
  LLVM :: CodeGen/AMDGPU/fabs.ll
  LLVM :: CodeGen/AMDGPU/floor.ll
  LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
  LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
  LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
  LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
2024-01-16 21:54:58 -08:00
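The difference between the two flags described in the commit above can be modelled in a few lines. This is a Python sketch, not the behaviour of any LLVM API: `apply_march` and `apply_mtriple` are hypothetical names, and the default triple `x86_64-apple-darwin` is an assumed host value chosen to reproduce the example from the commit message.

```python
def apply_march(default_triple: str, arch: str) -> str:
    """Model of -march=: replace only the architecture component."""
    parts = default_triple.split("-")
    parts[0] = arch
    return "-".join(parts)


def apply_mtriple(default_triple: str, triple: str) -> str:
    """Model of -mtriple=: the given triple replaces the default entirely."""
    return triple


HOST_DEFAULT = "x86_64-apple-darwin"  # assumed default triple on the host

# -march= keeps the host's vendor and OS, giving a nonsensical triple.
assert apply_march(HOST_DEFAULT, "amdgpu") == "amdgpu-apple-darwin"
# -mtriple= does not depend on the host default at all.
assert apply_mtriple(HOST_DEFAULT, "amdgcn") == "amdgcn"
```

The second form gives the same result on every build machine, which is why it is the safer choice for tests without a target triple.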
Maciej Gabka
279dfe77da
[TLI][AArch64] Add extra SLEEF mappings and tests (#78140)
This patch adds more scalar-to-vector mappings to the TLI
for the SLEEF vector library.
The added mappings are for the following functions:

 acosh, asinh, cbrt, copysign, cospi
 erf, erfc, expm1, fdim, fma, fmax, fmin
 hypot, ilogb, ldexp, log1p, nextafter, sinpi.

It also brings back accidentally removed tests for sincospi.
2024-01-16 14:51:38 +00:00
Mel Chen
b6e8f6604c
[LV] Skipping all debug instructions when native vplan is enabled (#77413)
The following internal error occurred when using native vplan to
vectorize a program with debug info generation enabled.

Assertion `!isa<DbgInfoIntrinsic>(CI) && "DbgInfoIntrinsic should have been dropped during VPlan construction"' failed.

This patch ignores all debug instructions to fix the error when native
vplan is enabled.
2024-01-16 11:08:10 +08:00
Jonas Paulsson
62b7e35f10
[SystemZ] Don't assert for i128 vectors in getInterleavedMemoryOpCost() (#78009)
This assert does not seem justified given that the LoopVectorizer can
form interleave groups containing i128 elements where the number of
elements per vector is indeed just one.
2024-01-15 17:31:18 +01:00