2615 Commits

Author SHA1 Message Date
Florian Hahn
dd94537b40
[LV] Update call widening decision when scalarzing calls.
collectInstsToScalarize may decide to scalarize a call. If so, we have
to update the widening decision for the call, otherwise the call won't
be scalarized as expected during VPlan construction.

This issue was uncovered by f82543d509.
2024-09-03 14:12:41 +01:00
Simon Pilgrim
6c8746b6e3
[Analysis] getIntrinsicForCallSite - add vectorization support for acos/asin/atan and cosh/sinh/tanh libcalls (#106844)
Followup to #106584 - ensure acos/asin/atan and cosh/sinh/tanh libcalls correctly map to the llvm intrinsic equivalents
2024-09-03 10:05:56 +01:00
Florian Hahn
954ed05c10
[VPlan] Simplify MUL operands at recipe construction.
This moves the logic to create simplified operands using SCEV to MUL
recipe creation. This is needed to match the behavior of the legacy's cost
model. TODOs are to extend to other opcodes and move to a transform.

Note that this also restricts the number of SCEV simplifications we
apply to more precisely match the cases handled by the legacy cost
model.

Fixes https://github.com/llvm/llvm-project/issues/107015.
2024-09-02 21:25:31 +01:00
Florian Hahn
50a02e7c68
[VPlan] Pass intrinsic inst to TTI in VPWidenCallRecipe::computeCost.
Follow-up to 9ccf825, adjust computeCost to also pass IntrinsicInst to
TTI if available, as there are multiple places in TTI which use the
IntrinsicInst.

Fixes https://github.com/llvm/llvm-project/issues/107016.
2024-09-02 20:47:37 +01:00
Florian Hahn
b0de7fa466
[VPlan] Use op from underlying call in computeCost if needed.
This fixes a divergence between legacy and VPlan-based cost model, e.g.
if one of the operands has an first-order recurrence phi as operand.
2024-09-02 14:00:10 +01:00
Nikita Popov
f044564db1
[InstCombine] Make backedge check in op of phi transform more precise (#106075)
The op of phi transform wants to prevent moving an operation across a
backedge, as this may lead to an infinite combine loop.

Currently, this is done using isPotentiallyReachable(). The problem with
that is that all blocks inside a loop are reachable from each other.
This means that the op of phi transform is effectively completely
disabled for code inside loops, even when it's not actually operating on
a loop phi (just a phi that happens to be in a loop).

Fix this by explicitly computing the backedges inside the function
instead. Do this via RPOT, which is a bit more efficient than using
FindFunctionBackedges() (which does it without any pre-computed
analyses).

For irreducible cycles, there may be multiple possible choices of
backedge, and this just picks one of them. This is still sufficient to
prevent combine loops.

This also removes the last use of LoopInfo in InstCombine -- I'll drop
the analysis in a followup.
2024-09-02 09:09:21 +02:00
Florian Hahn
654bb4e9f2
[LV] Don't consider branches leaving loop in collectValuesToIgnore.
Branches exiting the loop will remain regardless, so don't consider them
in collectValuesToIgnore.

This fixes another divergence between legacy and VPlan-based cost model.

Fixes https://github.com/llvm/llvm-project/issues/106780.
2024-09-01 20:35:36 +01:00
Yingwei Zheng
380fa875ab
[InstCombine] Replace all dominated uses of condition with constants (#105510)
This patch replaces all dominated uses of condition with true/false to
improve context-sensitive optimizations. It eliminates a bunch of
branches in llvm-opt-benchmark.

As a side effect, it may introduce new phi nodes in some corner cases.
See the following case:
```
define i1 @test(i1 %cmp, i1 %cond) {
entry:
   br i1 %cond, label %bb1, label %bb2
bb1:
   br i1 %cmp, label %if.then, label %if.else
if.then:
   br %bb2
if.else:
   br %bb2
bb2:
  %res = phi i1 [%cmp, %entry], [%cmp, %if.then], [%cmp, %if.else]
  ret i1 %res
}
```
It will be simplified into:
```
define i1 @test(i1 %cmp, i1 %cond) {
entry:
   br i1 %cond, label %bb1, label %bb2
bb1:
   br i1 %cmp, label %if.then, label %if.else
if.then:
   br %bb2
if.else:
   br %bb2
bb2:
  %res = phi i1 [%cmp, %entry], [true, %if.then], [false, %if.else]
  ret i1 %res
}
```

I am planning to fix this in late pipeline/CGP since this problem exists
before the patch.
2024-09-01 09:49:23 +08:00
Simon Pilgrim
4d412bedcc [LoopVectorize][X86] amdlibm-calls.ll - add missing sinh and f64 test coverage to all functions
Shows failure to vectorise acos/asin/atan and cosh/sinh/tanh libcalls if they don't have a corresponding veclib mapping
2024-08-31 11:48:22 +01:00
Philip Reames
4b553f4916 Regen a bunch of vectorizer tests to avoid naming churn in upcoming review 2024-08-30 10:13:02 -07:00
Simon Pilgrim
d58d105cda
[Analysis] isTriviallyVectorizable - add vectorization support for acos/asin/atan and cosh/sinh/tanh intrinsics (#106584)
Show fallback cases in amdlibm tests where it doesn't have that specific op
2024-08-30 16:49:23 +01:00
Paul Walker
ce5620ba9a
[LLVM][VPlan] Pick more optimal initial value for VPBlend. (#104019)
By choosing an initial value whose mask is only used by the blend we can
remove the need for the mask entirely.
2024-08-30 13:30:23 +01:00
Florian Hahn
f0e34f3818
[VPlan] Don't skip optimizable truncs in planContainsAdditionalSimps.
A optimizable cast can also be removed by VPlan simplifications. Remove
the restriction from planContainsAdditionalSimplifications, as this
causes it to miss relevant simplifications, triggering false positives
for the cost decision verification.

Also adds debug output for printing additional cost-precomputations.

Fixes https://github.com/llvm/llvm-project/issues/106641.
2024-08-30 11:29:30 +01:00
Florian Hahn
c4906588ce
[VPlan] Use skipCostComputation when pre-computing induction costs.
This ensures we skip any instructions identified to be ignored by the
legacy cost model as well. Fixes a divergence between legacy and
VPlan-based cost model.

Fixes https://github.com/llvm/llvm-project/issues/106417.
2024-08-29 21:20:00 +01:00
Simon Pilgrim
81acc84997 [LoopVectorize][X86] amdlibm-calls.ll - add 2/4/8/16 vector widths test checks for fallback to llvm intrinsics
Check for cases where there isn't a amdlib call but it still vectorises the math call
2024-08-29 17:31:55 +01:00
Simon Pilgrim
2f95298727 [LoopVectorize][X86] amdlibm-calls.ll - add additional 2/4/8/16 vector widths test checks
This should cover most amdlibm functions, but still not added every VF combo (e.g. 2f32/16f64 often vectorises to the llvm intrinsic for that vector type)
2024-08-29 14:27:31 +01:00
Simon Pilgrim
c57abc66e2 [LoopVectorize][X86] amdlibm-calls.ll - cleanup test checks for 2/4/8/16 vector widths
This cleans up the existing tests and shows the gaps in the test checks (for instance we're often testing VF4 + VF16 but not VF8 even though amdlibm supports it).
2024-08-29 14:27:31 +01:00
Florian Hahn
0a272d3a17
[LV] Use SCEV to analyze second operand for cost query.
Improve operand analysis using SCEV for cost purposes. This fixes a
divergence between legacy and VPlan-based cost-modeling after
533e6bbd0d34.

Fixes https://github.com/llvm/llvm-project/issues/106248.
2024-08-29 12:08:27 +01:00
Florian Hahn
7912abe149
[LV] Add extra tests with interleave groups and different insert pos.
Add additional test coverage for interleave groups with different insert
positions.
2024-08-28 19:35:31 +01:00
Florian Hahn
4b84288f00
[VPlan] Pass live-ins used as exit values straight to live-out.
Live-ins that are used as exit values don't need to be extracted, they
can be passed through directly. This fixes a crash when trying to
extract from a live-in.

Fixes https://github.com/llvm/llvm-project/issues/106257.
2024-08-28 19:12:05 +01:00
Maciej Gabka
95d2d1cba0
Move stepvector intrinsic out of experimental namespace (#98043)
This patch is moving out stepvector intrinsic from the experimental
namespace.

This intrinsic exists in LLVM for several years now, and is widely used.
2024-08-28 12:48:20 +01:00
Mel Chen
dfde1a7232
[LV][NFC] Update and clean up the test case LoopVectorize/RISCV/inloop-reduction.ll. (#102907) 2024-08-28 17:46:58 +08:00
Florian Hahn
d43a80936d
Revert "[LAA] Remove loop-invariant check added in 234cc40adc61."
This reverts commit a80053322b765eec93951e21db490c55521da2d8.

The new asserts exposed an underlying issue where the expanded bounds
could wrap, causing the parts of the code to incorrectly determine that
accesses do not overlap.

Reproducer below based on @mstorsjo's test case.

opt -passes='print<access-info>'

target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"

define i32 @j(ptr %P, i32 %x, i32 %y) {
entry:
  %gep.P.4 = getelementptr inbounds nuw i8, ptr %P, i32 4
  %gep.P.8 = getelementptr inbounds nuw i8, ptr %P, i32 8
  br label %loop

loop:
  %1 = phi i32 [ %x, %entry ], [ %sel, %loop.latch ]
  %iv = phi i32 [ %y, %entry ], [ %iv.next, %loop.latch ]
  %gep.iv = getelementptr inbounds i64, ptr %gep.P.8, i32 %iv
  %l = load i32, ptr %gep.iv, align 4
  %c.1 = icmp eq i32 %l, 3
  br i1 %c.1, label %loop.latch, label %if.then

if.then:                                          ; preds = %for.body
  store i64 0, ptr %gep.iv, align 4
  %l.2 = load i32, ptr %gep.P.4
  br label %loop.latch

loop.latch:
  %sel = phi i32 [ %l.2, %if.then ], [ %1, %loop ]
  %iv.next = add nsw i32 %iv, 1
  %c.2 = icmp slt i32 %iv.next, %sel
  br i1 %c.2, label %loop, label %exit

exit:
  %res = phi i32 [ %iv.next, %loop.latch ]
  ret i32 %res
}
2024-08-27 11:55:47 +01:00
Florian Hahn
a80053322b
[LAA] Remove loop-invariant check added in 234cc40adc61.
234cc40adc61 introduced a loop-invariance check to limit the
compile-time impact of the newly added checks.

This patch removes the restriction and avoids extra compile-time impact
by sinking the check to exits where we would return an unknown
dependence. This notably reduces the amount the extra checks are
executed while not missing out on any improvements from them.

https://llvm-compile-time-tracker.com/compare.php?from=33e7cd6ff23f6c904314d17c68dc58168fd32d09&to=7c55e66d4f31ce8262b90c119a8e84e1f9515ff1&stat=instructions:u
2024-08-26 10:24:00 +01:00
Florian Hahn
533e6bbd0d
[VPlan] Simplify live-ins if they are SCEVConstant.
The legacy cost model in some parts checks if any of the operands are
constants via SCEV. Update VPlan construction to replace live-ins that
are constants via SCEV with such constants. This means VPlans (and
codegen) reflects what we computing the cost of and removes another case
where the legacy and VPlan cost model diverged.

Fixes https://github.com/llvm/llvm-project/issues/105722.
2024-08-26 09:15:58 +01:00
Florian Hahn
885c4365c1
[VPlan] Skip branches marked as dead in cost precomputation.
Don't consider the cost of branches marked to be skipped in VPlan cost
pre-computation. Those aren't included in the legacy cost, so they
should not be included in the VPlan cast.
2024-08-23 15:58:29 +01:00
Florian Hahn
cb4efe1d07
[VPlan] Don't trigger VF assertion if VPlan has extra simplifications.
There are cases where VPlans contain some simplifications that are very
hard to accurately account for up-front in the legacy cost model. Those
cases are caused by un-simplified inputs, which trigger the assert
ensuring both the legacy and VPlan-based cost model agree on the VF.

To avoid false positives due to missed simplifications in general, only
trigger the assert if the chosen VPlan doesn't contain any additional
simplifications.

Fixes https://github.com/llvm/llvm-project/issues/104714.
Fixes https://github.com/llvm/llvm-project/issues/105713.
2024-08-22 21:38:06 +01:00
Florian Hahn
4e04286d61
[VPlan] Only use selectVectorizationFactor for cross-check (NFCI). (#103033)
Use getBestVF to select VF up-front and only use
selectVectorizationFactor to get the VF legacy VF to check the
vectorization decision matches the VPlan-based cost model.

PR: https://github.com/llvm/llvm-project/pull/103033
2024-08-21 13:09:01 +02:00
Nikita Popov
a105877646
[InstCombine] Remove some of the complexity-based canonicalization (#91185)
The idea behind this canonicalization is that it allows us to handle less
patterns, because we know that some will be canonicalized away. This is
indeed very useful to e.g. know that constants are always on the right.

However, this is only useful if the canonicalization is actually
reliable. This is the case for constants, but not for arguments: Moving
these to the right makes it look like the "more complex" expression is
guaranteed to be on the left, but this is not actually the case in
practice. It fails as soon as you replace the argument with another
instruction.

The end result is that it looks like things correctly work in tests,
while they actually don't. We use the "thwart complexity-based
canonicalization" trick to handle this in tests, but it's often a
challenge for new contributors to get this right, and based on the
regressions this PR originally exposed, we clearly don't get this right
in many cases.

For this reason, I think that it's better to remove this complexity
canonicalization. It will make it much easier to write tests for
commuted cases and make sure that they are handled.
2024-08-21 12:02:54 +02:00
Florian Hahn
99741ac285
[VPlan] Introduce explicit ExtractFromEnd recipes for live-outs. (#100658)
Introduce explicit ExtractFromEnd recipes to extract the final values
for live-outs instead of implicitly extracting in VPLiveOut::fixPhi.

This is a follow-up to the recent changes of modeling extracts for
recurrences and consolidates live-out extract creation for fixed-order
recurrences at a single place: addLiveOutsForFirstOrderRecurrences.

It is also in preparation of replacing VPLiveOut with VPIRInstructions
wrapping the original scalar phis.

PR: https://github.com/llvm/llvm-project/pull/100658
2024-08-21 10:06:44 +02:00
Florian Hahn
b8dccb7d56
[VPlan] Emit note when UserVF > MaxUserVF (NFCI).
As suggested in https://github.com/llvm/llvm-project/pull/103033, add a
remark when the UserVF is ignored due to it being larger than MaxUserVF.

Only changes behavior of diagnostic/debug output.
2024-08-19 12:40:20 +01:00
Florian Hahn
e9e3a183d6
[LV] Don't cost branches and conditions to empty blocks.
Update the legacy cost model skip branches with successors blocks
that are empty or only contain dead instructions, together with their
conditions. Such branches and conditions won't result in any
generated code and will be cleaned up by VPlan transforms.

This fixes a difference between the legacy and VPlan-based cost model.

When running LV in its usual pipeline position, such dead blocks should
already have been cleaned up, but they might be generated manually or by
fuzzers.

Fixes https://github.com/llvm/llvm-project/issues/100591.
2024-08-18 12:51:17 +01:00
Florian Hahn
42555cdba4
[VPlan] Run VPlan optimizations on plans in native path.
Update buildVPlans (used in native path) to also run general VPlan
optimizations in another small step to align both codepaths.
2024-08-15 13:05:51 +01:00
Volodymyr Vasylkun
8320b97ab9
[InstCombine] Fold an unsigned comparison of add nsw X, C with a constant into a signed comparison (#103480)
Given an unsigned integer comparison of `add nsw X, C1` with some
constant `C2` we can fold it into a signed comparison of `X` and `C2 -
C1` under the following conditions:
  * There's a `nsw` flag on the addition
  * `C2` is non-negative
  * `X + C1` is non-negative
  * `C2 - C1` is non-negative
2024-08-14 15:31:19 +01:00
Florian Hahn
3efcc8ec7d
[LV] Add test where diff checks not used when re-trying with RT checks. 2024-08-14 14:19:35 +01:00
Paul Walker
9e318bac5b [LLVM] Regenerate some test outputs for llvm/test/Transforms/LoopVectorize. 2024-08-14 10:59:46 +00:00
Florian Hahn
31f593eb95
[LAA] Also clear DiffChecks in LAI::reset().
DiffChecks will get populated twice when re-trying with runtime checks.
Without clearing it like the regular Checks vector, it will contain some
duplicates and the order the checks are created may not match the order
the checks have been queued when re-trying.
2024-08-14 10:19:29 +01:00
Florian Hahn
1360b9d412
[LV] Add test for diff check creation order.
Add a test where diff checks are generated initial and then re-generated
when re-trying with runtime checks.

At the moment, the order doesn't match the order they are created in, as
the DiffChecks field in LAI isn't cleared as other fields holding
runtime checks.
2024-08-14 10:17:00 +01:00
Madhur Amilkanthwar
b73771cf0f
[AArch64] Increase scatter overhead on Neoverse-V2 (#101296)
This patch increases scatter overhead on Neoverse-V2 to 13. This
benefits s128 kernel from TSVC_2 test suite.
SPEC 17, RAJAPerf, and Sptter are unaffected by this patch.

This patch boosts s128 kernel's performance from TSVC test suite by about
40% as this enables vectorization. Also, handle minor code refactoring
for gather related part.
2024-08-14 10:12:40 +05:30
Nikita Popov
306b9c7b48
[SCEV] Handle more add/addrec mixes in computeConstantDifference() (#101999)
computeConstantDifference() can currently look through addrecs with
identical steps, and then through adds with identical operands (apart
from constants).

However, it fails to handle minor variations, such as two nested add
recs, or an outer add with an inner addrec (rather than the other way
around).

This patch supports these cases by adding a loop over the
simplifications, limited to a small number of iterations. The motivation
is the same as in #101339, to make
computeConstantDifference() powerful enough to replace existing uses of
`dyn_cast<SCEVConstant>(getMinusSCEV())` with it. Though as the IR test
diff shows, other callers may also benefit.
2024-08-13 11:01:39 +02:00
Florian Hahn
2ab910c08c
[LV] Check pointer user are in loop when checking for uniform pointers.
Widening decisions are not set for users outside the loop. Avoid
crashing by only calling isVectorizedMemAccessUse for users in the loop.

Fixes https://github.com/llvm/llvm-project/issues/102934.
2024-08-13 09:23:44 +01:00
Florian Hahn
c7a44ec031
[VPlan] Check successors in VPlan to check if scalar epi required (NFC)
Now that the branches to the scalar epilogue are modeled in VPlan
directly, check the VPlan to see if a scalar epilogue is required.

Preparation for https://github.com/llvm/llvm-project/pull/100658.
2024-08-12 15:33:52 +01:00
Florian Hahn
cd08fadd03
[LV] Include chains feeding inductions in cost precomputation.
Include chain of ops feeding inductions in cost precomputation for
inductions, not just the induction increment. In VPlan, those
instructions will be cleaned up, as both phi and increment are generated
by VPWidenIntOrFpInductionRecipe independently.

Fixes https://github.com/llvm/llvm-project/issues/101337.
2024-08-12 14:45:43 +01:00
Florian Hahn
db0603cb7b
[LV] Only OR unique edges when creating block-in masks.
This removes redundant ORs of matching masks.

Follow-up to f0df4fbd0c7b to reduce the number of redundant ORs for
masks.
2024-08-12 10:17:40 +01:00
Florian Hahn
55d7e59023
[VPlan] Replace hard-coded value number in test with pattern.
Make test more robust w.r.t. future changes.
2024-08-12 09:36:12 +01:00
Florian Hahn
5a42a677aa
[VPlan] Mark VPVectorPointer as only using the first part of the ptr.
VPVectorPointerRecipe only uses the first part of the pointer operand,
so mark it accordingly.

Follow-up suggested as part of
https://github.com/llvm/llvm-project/pull/99808.
2024-08-12 08:46:55 +01:00
Farzon Lotfi
efc6b50d2d
[LoopVectorize][X86][AMDLibm] Add Missing AMD LibM trig vector intrinsics (#101125)
Adding the following linked to their docs:
-
[amd_vrs16_acosf](9c0b67293b/scripts/libalm.def (L221))
-
[amd_vrd2_cosh](9c0b67293b/scripts/libalm.def (L124))
-
[amd_vrs16_tanhf](9c0b67293b/scripts/libalm.def (L224))
2024-08-11 22:11:09 -04:00
Florian Hahn
60680f7181
[LV] Handle SwitchInst in ::isPredicatedInst.
After f0df4fbd0c7b, isPredicatedInst needs to handle SwitchInst as well.
Handle it the same as BranchInst.

This fixes a crash in the newly added test and improves the results for
one of the existing tests in predicate-switch.ll

Should fix https://lab.llvm.org/buildbot/#/builders/113/builds/2099.
2024-08-11 20:56:58 +01:00
Florian Hahn
f0df4fbd0c
[LV] Support generating masks for switch terminators. (#99808)
Update createEdgeMask to created masks where the terminator in Src is a
switch. We need to handle 2 separate cases:

1. Dst is not the default desintation. Dst is reached if any of the
cases with destination == Dst are taken. Join the conditions for each
case where destination == Dst using a logical OR.
2. Dst is the default destination. Dst is reached if none of the cases
with destination != Dst are taken. Join the conditions for each case
where the destination is != Dst using a logical OR and negate it.

Edge masks are created for every destination of cases and/or 
default when requesting a mask where the source is a switch.

Fixes https://github.com/llvm/llvm-project/issues/48188.

PR: https://github.com/llvm/llvm-project/pull/99808
2024-08-11 20:38:36 +02:00
Florian Hahn
5286656609
[LV] Regenerate check lines in preparation for #99808.
Regenerate check lines for test to avoid unrelated changes in
https://github.com/llvm/llvm-project/pull/99808.
2024-08-11 15:06:05 +01:00