626 Commits

Author SHA1 Message Date
Shih-Po Hung
ffcff2f465
[VPlan][NFC] Fix the value name of VECTOR_GEP (#107544)
This patch passes the string `"vector.gep"` to CreateGEP instead of
CreateMul.
2024-09-18 19:22:36 +08:00
Florian Hahn
012dbec604
[VPlan] Handle ForceTargetInstructionCost during precomputeCosts.
Make sure ForceTargetInstructionCost is respected in precomputeCosts.
2024-09-15 10:53:43 +01:00
Florian Hahn
ea83e1c05a
[LV] Assign cost to all interleave members when not interleaving.
At the moment, the full cost of all interleave group members is assigned
to the instruction at the group's insert position, even if the decision
was to not form an interleave group.

This can lead to inaccurate cost estimates, e.g. if the instruction at
the insert position is dead. If the decision is to not vectorize but to
scalarize or scatter/gather, then the cost assigned to the insert position
will be the total cost for all members. In those cases, assign each member
its individual cost, to more closely reflect the choice per instruction.
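
A minimal C sketch of the two assignment strategies, assuming a simplified model in which each group member has a known standalone cost. The enum, function, and parameter names are hypothetical and do not correspond to LoopVectorize's data structures.

```c
#include <stddef.h>

/* Hypothetical widening decisions for the members of an interleave group. */
enum widening_decision { INTERLEAVE, SCALARIZE, GATHER_SCATTER };

/* Fills cost[i] for each of the n group members.
 * INTERLEAVE: the group becomes one wide access, so its cost is charged
 *             once, to the member at the insert position.
 * Otherwise:  each member is lowered on its own and is charged its own
 *             cost, so a dead insert-position member no longer carries
 *             the cost of the live members. */
void assign_member_costs(unsigned *cost, const unsigned *member_cost, size_t n,
                         size_t insert_pos, enum widening_decision d,
                         unsigned group_cost) {
    for (size_t i = 0; i < n; ++i)
        cost[i] = (d == INTERLEAVE) ? (i == insert_pos ? group_cost : 0u)
                                    : member_cost[i];
}
```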

This fixes a divergence between legacy and VPlan-based cost model.

Fixes https://github.com/llvm/llvm-project/issues/108098.
2024-09-11 21:04:34 +01:00
Florian Hahn
a794ee4559
[VPlan] Add VPValue for VF, use it for VPWidenIntOrFpInductionRecipe. (#95305)
Similar to VFxUF, also add a VF VPValue to VPlan and use it to get the
runtime VF in VPWidenIntOrFpInductionRecipe. Code for VF is only
generated if there are users of VF, to avoid unnecessary test changes.

PR: https://github.com/llvm/llvm-project/pull/95305
2024-09-10 10:41:35 +01:00
Florian Hahn
aa158bf402
[LV] Update tests to replace some code with loop varying instructions.
Update some tests containing loop-invariant instructions, where hoisting them
out of the loop changes the vectorization decision. This should preserve
their original spirit when making further improvements.
2024-09-09 14:10:12 +01:00
Florian Hahn
3bd161e98d
[LV] Honor forced scalars in setVectorizedCallDecision.
Similarly to dd94537b4, setVectorizedCallDecision also did not consider
ForcedScalars. This led to VPlans not reflecting the decision by the
legacy cost model (cost computation would use scalar cost, VPlan would
have VPWidenCallRecipe).

To fix this, check if the call has been forced to scalar in
setVectorizedCallDecision.

Note that this requires moving setVectorizedCallDecision after
collectLoopUniforms (which sets ForcedScalars). collectLoopUniforms does
not depend on call decisions and can safely be moved.

Fixes https://github.com/llvm/llvm-project/issues/107051.
2024-09-03 21:06:32 +01:00
Philip Reames
1fbb6b4efc
[LV] Prefer FLT_MIN/MAX for fmin/fmax reductions with ninf (#107141)
Analogous to 2c7786e94a1058bd4f96794a1d4f70dcb86e5cc5, cleanup a case
where the vectorizer is emitting a non-canonical identity value given
the available flags. We use the largest/smallest value during ISel and VP
expansion, but not during vectorization.

Since the fmin/fmax/fminimum/fmaximum intrinsics don't require a start
value, this difference is only visible when masking of inactive lanes is
required.

The primary motivation of this change is simply to remove a difference
between versions of code which reason about the identity value of a
reduction, so that I can remove all but one of them.

In review, it was pointed out that this is actually a functional fix as well.
The old code used inf on a ninf reduction instruction, whose
result is poison! That wasn't the intent of the code.
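
The effect on masked lanes can be illustrated with a small C simulation of one vector iteration (a sketch, not the vectorizer's code; the lane count `VF` and the helper names are hypothetical). Inactive lanes are filled with the identity before the lane-wise reduction. Under `ninf`, an infinite identity would make the result poison, so the identity is the largest finite value for fmin and its negation for fmax. In C terms, note that `FLT_MIN` is the smallest positive normal value, so the fmax identity is spelled `-FLT_MAX`.

```c
#include <float.h>
#include <stdbool.h>

#define VF 4 /* hypothetical vector width */

/* Plain comparisons stand in for fmin/fmax; they match only without NaNs. */
static float min_nonan(float a, float b) { return a < b ? a : b; }
static float max_nonan(float a, float b) { return a > b ? a : b; }

/* Reduce one vector iteration; masked-off lanes contribute `identity`. */
float masked_fmin(const float lanes[VF], const bool active[VF], float identity) {
    float acc = identity;
    for (int i = 0; i < VF; ++i)
        acc = min_nonan(acc, active[i] ? lanes[i] : identity);
    return acc;
}

float masked_fmax(const float lanes[VF], const bool active[VF], float identity) {
    float acc = identity;
    for (int i = 0; i < VF; ++i)
        acc = max_nonan(acc, active[i] ? lanes[i] : identity);
    return acc;
}
```

With `FLT_MAX` and `-FLT_MAX` as identities, an inactive lane never changes the result when the active lanes hold finite values, so the result equals the reduction over the active lanes alone.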
2024-09-03 12:21:54 -07:00
Philip Reames
2c7786e94a
Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770)
This is a follow up to 924907bc6, and is mostly motivated by consistency
but does include one additional optimization. In general, we prefer 0.0
over -0.0 as the identity value for an fadd. We use that value in
several places, but don't in others. So, let's be consistent and use the
same identity (when nsz allows) everywhere.

This creates a bunch of test churn, but due to 924907bc6, most of that
churn doesn't actually indicate a change in codegen. The exception is
that this change enables the use of 0.0 for nsz, but *not* reassoc, fadd
reductions. Or said differently, it allows the neutral value of an
ordered fadd reduction to be 0.0.
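
Why -0.0 is the strict identity, while 0.0 is only acceptable when `nsz` allows the sign of zero to be ignored, can be checked in C under IEEE 754 round-to-nearest: -0.0 + 0.0 yields +0.0, whereas x + -0.0 yields x for every non-NaN x. The helper name is hypothetical.

```c
#include <math.h>
#include <stdbool.h>

/* True if x + identity reproduces x exactly, including the sign of zero. */
bool preserves_value(float x, float identity) {
    float r = x + identity;
    return r == x && (signbit(r) != 0) == (signbit(x) != 0);
}
```

With nsz, the only failing case (x = -0.0 with identity 0.0) no longer matters, which is why 0.0 becomes a valid identity.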
2024-09-03 09:16:37 -07:00
Florian Hahn
dd94537b40
[LV] Update call widening decision when scalarizing calls.
collectInstsToScalarize may decide to scalarize a call. If so, we have
to update the widening decision for the call, otherwise the call won't
be scalarized as expected during VPlan construction.

This issue was uncovered by f82543d509.
2024-09-03 14:12:41 +01:00
Florian Hahn
954ed05c10
[VPlan] Simplify MUL operands at recipe construction.
This moves the logic to create simplified operands using SCEV to MUL
recipe creation. This is needed to match the behavior of the legacy cost
model. TODOs are to extend this to other opcodes and to move it to a transform.

Note that this also restricts the number of SCEV simplifications we
apply to more precisely match the cases handled by the legacy cost
model.

Fixes https://github.com/llvm/llvm-project/issues/107015.
2024-09-02 21:25:31 +01:00
Florian Hahn
50a02e7c68
[VPlan] Pass intrinsic inst to TTI in VPWidenCallRecipe::computeCost.
Follow-up to 9ccf825, adjust computeCost to also pass IntrinsicInst to
TTI if available, as there are multiple places in TTI which use the
IntrinsicInst.

Fixes https://github.com/llvm/llvm-project/issues/107016.
2024-09-02 20:47:37 +01:00
Florian Hahn
b0de7fa466
[VPlan] Use op from underlying call in computeCost if needed.
This fixes a divergence between legacy and VPlan-based cost model, e.g.
if one of the operands has a first-order recurrence phi as operand.
2024-09-02 14:00:10 +01:00
Yingwei Zheng
380fa875ab
[InstCombine] Replace all dominated uses of condition with constants (#105510)
This patch replaces all dominated uses of the condition with true/false to
improve context-sensitive optimizations. It eliminates a bunch of
branches in llvm-opt-benchmark.

As a side effect, it may introduce new phi nodes in some corner cases.
See the following case:
```
define i1 @test(i1 %cmp, i1 %cond) {
entry:
  br i1 %cond, label %bb1, label %bb2
bb1:
  br i1 %cmp, label %if.then, label %if.else
if.then:
  br label %bb2
if.else:
  br label %bb2
bb2:
  %res = phi i1 [%cmp, %entry], [%cmp, %if.then], [%cmp, %if.else]
  ret i1 %res
}
```
It will be simplified into:
```
define i1 @test(i1 %cmp, i1 %cond) {
entry:
  br i1 %cond, label %bb1, label %bb2
bb1:
  br i1 %cmp, label %if.then, label %if.else
if.then:
  br label %bb2
if.else:
  br label %bb2
bb2:
  %res = phi i1 [%cmp, %entry], [true, %if.then], [false, %if.else]
  ret i1 %res
}
```

I am planning to fix this in the late pipeline/CGP, since this problem
already existed before this patch.
2024-09-01 09:49:23 +08:00
Philip Reames
4b553f4916 Regen a bunch of vectorizer tests to avoid naming churn in upcoming review 2024-08-30 10:13:02 -07:00
Paul Walker
ce5620ba9a
[LLVM][VPlan] Pick more optimal initial value for VPBlend. (#104019)
By choosing an initial value whose mask is only used by the blend we can
remove the need for the mask entirely.
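
A C sketch of the idea, assuming the incoming masks of a blend are mutually exclusive with exactly one set per lane; `struct incoming` and `lower_blend` are hypothetical names, not VPlan API. The blend lowers to a chain of selects seeded with one incoming value. The seed's mask is never read, so if no other user needs that mask, it can be dropped.

```c
#include <stdbool.h>
#include <stddef.h>

/* One incoming value of a (single-lane) blend together with its edge mask. */
struct incoming {
    int value;
    bool mask;
};

/* Lowers the blend to a select chain seeded with in[first].value.
 * The seed's mask is not consulted; every other operand needs one select. */
int lower_blend(const struct incoming *in, size_t n, size_t first) {
    int result = in[first].value;
    for (size_t i = 0; i < n; ++i) {
        if (i == first)
            continue;
        result = in[i].mask ? in[i].value : result;
    }
    return result;
}
```

Under the exclusive-and-exhaustive assumption, every choice of seed yields the same result, which is what leaves the choice free to optimize.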
2024-08-30 13:30:23 +01:00
Maciej Gabka
95d2d1cba0
Move stepvector intrinsic out of experimental namespace (#98043)
This patch moves the stepvector intrinsic out of the experimental
namespace.

This intrinsic has existed in LLVM for several years now and is widely used.
2024-08-28 12:48:20 +01:00
Florian Hahn
885c4365c1
[VPlan] Skip branches marked as dead in cost precomputation.
Don't consider the cost of branches marked to be skipped in VPlan cost
pre-computation. Those aren't included in the legacy cost, so they
should not be included in the VPlan cost.
2024-08-23 15:58:29 +01:00
Nikita Popov
a105877646
[InstCombine] Remove some of the complexity-based canonicalization (#91185)
The idea behind this canonicalization is that it allows us to handle fewer
patterns, because we know that some will be canonicalized away. This is
indeed very useful to e.g. know that constants are always on the right.

However, this is only useful if the canonicalization is actually
reliable. This is the case for constants, but not for arguments: Moving
these to the right makes it look like the "more complex" expression is
guaranteed to be on the left, but this is not actually the case in
practice. It fails as soon as you replace the argument with another
instruction.

The end result is that it looks like things correctly work in tests,
while they actually don't. We use the "thwart complexity-based
canonicalization" trick to handle this in tests, but it's often a
challenge for new contributors to get this right, and based on the
regressions this PR originally exposed, we clearly don't get this right
in many cases.

For this reason, I think that it's better to remove this complexity
canonicalization. It will make it much easier to write tests for
commuted cases and make sure that they are handled.
2024-08-21 12:02:54 +02:00
Florian Hahn
42555cdba4
[VPlan] Run VPlan optimizations on plans in native path.
Update buildVPlans (used in native path) to also run general VPlan
optimizations in another small step to align both codepaths.
2024-08-15 13:05:51 +01:00
Paul Walker
9e318bac5b [LLVM] Regenerate some test outputs for llvm/test/Transforms/LoopVectorize. 2024-08-14 10:59:46 +00:00
Madhur Amilkanthwar
b73771cf0f
[AArch64] Increase scatter overhead on Neoverse-V2 (#101296)
This patch increases the scatter overhead on Neoverse-V2 to 13. This
benefits the s128 kernel from the TSVC_2 test suite.
SPEC 17, RAJAPerf, and Sptter are unaffected by this patch.

This patch boosts the performance of the s128 kernel from the TSVC test suite
by about 40%, as it enables vectorization. It also includes minor refactoring
of the gather-related code.
2024-08-14 10:12:40 +05:30
Florian Hahn
5a42a677aa
[VPlan] Mark VPVectorPointer as only using the first part of the ptr.
VPVectorPointerRecipe only uses the first part of the pointer operand,
so mark it accordingly.

Follow-up suggested as part of
https://github.com/llvm/llvm-project/pull/99808.
2024-08-12 08:46:55 +01:00
Matt Arsenault
4f067dc467
TTI: Fix special casing vectorization costs of saturating add/sub (#97463) 2024-08-06 17:33:52 +04:00
Paul Walker
7775a4882d
[LLVM][TTI][SME] Allow optional auto-vectorisation for streaming functions. (#101679)
The command line option enable-scalable-autovec-in-streaming-mode is
used to enable scalable vectors but the same check is missing from
enableScalableVectorization, which is blocking auto-vectorisation.
2024-08-05 11:25:44 +01:00
Florian Hahn
66ce4f771e
[VPlan] Port invalid cost remarks to VPlan. (#99322)
This patch moves the logic to create remarks for instructions with
invalid costs to work on recipes, decoupling it from
selectVectorizationFactor. This is needed to replace the remaining uses
of selectVectorizationFactor with getBestPlan using the VPlan-based cost
model.

The current implementation iterates over all VPlans and their recipes
again, to find recipes with invalid costs, which is more work but will
only be done when remarks for LV are enabled. Once the remaining uses of
selectVectorizationFactor are retired, we can collect VPlans with
invalid costs as part of getBestPlan if we want to optimize the remarks
case a bit, at the cost of adding additional complexity.

PR: https://github.com/llvm/llvm-project/pull/99322
2024-07-27 12:52:12 +01:00
Florian Hahn
72532c9219
[LV] Don't predicate divs with invariant divisor when folding tail (#98904)
When folding the tail, at least one of the lanes must execute
unconditionally. If the divisor is loop-invariant no predication is
needed, as predication would not prevent the divide-by-0 on the executed
lane.
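
The argument can be illustrated with a C simulation of a tail-folded loop. This is a sketch under stated assumptions: `VF` and the function name are hypothetical, inactive lanes use a dividend of 0, and unsigned division is used so that division by zero is the only hazard (signed overflow is out of scope). Lane 0 of every vector iteration is active, so if the invariant divisor were zero, the original scalar loop would trap as well; executing the division on inactive lanes therefore introduces no new trap.

```c
#include <stdbool.h>
#include <stddef.h>

#define VF 4 /* hypothetical vector width */

/* Tail-folded form of: for (i = 0; i < n; i++) out[i] = in[i] / d;
 * Each vector iteration covers VF lanes; lane l is active iff base + l < n.
 * Lane 0 is always active because base < n, so a zero d would already trap
 * in the scalar loop; the division can therefore run on all lanes. */
void div_tail_folded(unsigned *out, const unsigned *in, size_t n, unsigned d) {
    for (size_t base = 0; base < n; base += VF) {
        unsigned q[VF];
        bool active[VF];
        for (size_t l = 0; l < VF; ++l) {
            active[l] = base + l < n;
            unsigned x = active[l] ? in[base + l] : 0u; /* masked load */
            q[l] = x / d;                               /* unpredicated division */
        }
        for (size_t l = 0; l < VF; ++l)
            if (active[l])
                out[base + l] = q[l];                   /* masked store */
    }
}
```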

Depends on https://github.com/llvm/llvm-project/pull/98892.

PR: https://github.com/llvm/llvm-project/pull/98904
2024-07-25 12:21:09 +01:00
Florian Hahn
c8c0b18b5d
[LV] Update tests to not have dead interleave groups.
Update existing tests with dead interleave groups by adding users. This
ensures the tests keep testing what they were intended to test with a
planned change to skip unused instructions in cost computations.
2024-07-21 14:03:40 +01:00
Florian Hahn
710dab6e18
[VPlan] Remove VPPredInstPHIRecipes without users after region merging.
After merging replicate regions, VPPredInstPHIRecipes may become unused.
Remove them directly instead of moving them to the merged region.
2024-07-20 13:21:32 +01:00
Farzon Lotfi
e2f463b5b6
[aarch64] Add hyperbolic and arc trig intrinsic lowering (#98937)
## The change(s)
- `VecFuncs.def`: define intrinsic to sleef/armpl mapping
- `LegalizerHelper.cpp`: add missing `fewerElementsVector` handling for
the new trig intrinsics
- `AArch64ISelLowering.cpp`: Add AArch64 specializations for lowering,
like NEON instructions
- `AArch64LegalizerInfo.cpp`: Legalize the new trig intrinsics. AArch64
has special legalization requirements in `AArch64LegalizerInfo.cpp`. If
we redirect the clang builtin without handling this, we will break the
AArch64 compiler.

## History
This change is part of an implementation of
https://github.com/llvm/llvm-project/issues/87367's investigation on
supporting IEEE math operations as intrinsics, which was discussed in
this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This change adds AArch64 lowering cases for `acos`, `asin`, `atan`, `cosh`,
`sinh`, and `tanh`.

https://github.com/llvm/llvm-project/issues/70079
https://github.com/llvm/llvm-project/issues/70080
https://github.com/llvm/llvm-project/issues/70081
https://github.com/llvm/llvm-project/issues/70083
https://github.com/llvm/llvm-project/issues/70084
https://github.com/llvm/llvm-project/issues/95966

## Why is aarch64 needed
The last step is to redirect the `acos`, `asin`, `atan`, `cosh`, `sinh`,
and `tanh` to emit the intrinsic. We can't emit the intrinsic without
the intrinsics becoming legal for AArch64 in `AArch64LegalizerInfo.cpp`.
2024-07-19 10:18:23 -04:00
Florian Hahn
008df3cf85
[LV] Check isPredInst instead of isScalarWithPred in uniform analysis. (#98892)
Any instruction marked as uniform will result in a uniform
VPReplicateRecipe. If it requires predication, it will be placed in a
replicate region, even if isScalarWithPredication returns false.

Check isPredicatedInst instead of isScalarWithPredication to avoid
generating uniform VPReplicateRecipes placed inside a replicate region.
This fixes an assertion when using scalable VFs.

Fixes https://github.com/llvm/llvm-project/issues/80416. 
Fixes https://github.com/llvm/llvm-project/issues/94328.
Fixes https://github.com/llvm/llvm-project/issues/99625.

PR: https://github.com/llvm/llvm-project/pull/98892
2024-07-19 12:02:25 +01:00
Florian Hahn
270f5e42b8
[LV] Add tests where uniform recipe gets predicated for scalable VFs.
Currently the tests crash due to a VPReplicateRecipe getting predicated
for scalable vectors.

Precommits tests for https://github.com/llvm/llvm-project/pull/98892.

Test cases for
 * https://github.com/llvm/llvm-project/issues/80416 and
 * https://github.com/llvm/llvm-project/issues/94328
2024-07-19 09:21:40 +01:00
Sjoerd Meijer
c5329c827a
[LV][AArch64] Prefer Fixed over Scalable if cost-model is equal (Neoverse V2) (#95819)
For the Neoverse V2 we would like to prefer fixed width over scalable
vectorisation if the cost-model assigns an equal cost to both for certain
loops. This improves 7 kernels from TSVC-2 and several production kernels by
about 2x, and does not affect SPEC2017 INT and FP. This also adds a new TTI
hook that can steer the loop vectorizer to prefer fixed width
vectorization, which can be set per CPU. For now, this is only enabled for the
Neoverse V2.

There are 3 reasons why preferring NEON might be better in the case the
cost-model is a tie and the SVE vector size is the same as NEON (128-bit):
architectural reasons, micro-architecture reasons, and SVE codegen reasons. The
latter will be improved over time, so the more important reasons are the former
two. I.e., the (micro-)architectural reasons are the use of LDP/STP instructions,
which are not available in SVE2, and the avoidance of predication.

For what it is worth: this codegen strategy to generate more NEON is in line
with GCC's codegen strategy, which is actually even more aggressive in
generating NEON when no predication is required. We could be smarter about the
decision making, but this seems to be a good first step in the right direction,
and we can always revise this later (for example make the target hook more
general).
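
The tie-break can be sketched in C as a cost-per-lane comparison. This is a hypothetical model, not the actual TTI hook or LoopVectorize code: each candidate has a total cost and a lane count, scalable candidates are assumed to have vscale = 1 (the 128-bit SVE case described above), and costs are compared by cross-multiplication to avoid division.

```c
#include <stdbool.h>

/* Hypothetical vectorization candidate: total cost for `lanes` elements. */
struct candidate {
    unsigned cost;
    unsigned lanes;
    bool scalable;
};

/* Returns true if a is strictly preferred over b. Costs are compared per
 * lane (a.cost / a.lanes vs b.cost / b.lanes) via cross-multiplication.
 * On an exact tie between a fixed and a scalable candidate, the fixed one
 * wins when prefer_fixed_on_tie is set. */
bool is_more_profitable(struct candidate a, struct candidate b,
                        bool prefer_fixed_on_tie) {
    unsigned long long lhs = (unsigned long long)a.cost * b.lanes;
    unsigned long long rhs = (unsigned long long)b.cost * a.lanes;
    if (lhs == rhs && prefer_fixed_on_tie && a.scalable != b.scalable)
        return !a.scalable;
    return lhs < rhs;
}
```

A strictly cheaper scalable candidate still wins; the preference only decides exact ties.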
2024-07-17 10:46:28 +01:00
Florian Hahn
4469a1e587
[LV] Add missing check lines in vector.ph in tests.
Match all instructions in vector.ph in sve-inductions-unusual-types.ll.

This should help to better show the impact of
https://github.com/llvm/llvm-project/pull/95305.
2024-07-16 10:45:53 +01:00
Dinar Temirbulatov
31d4c97506
[LoopVectorize] LLVM fails to vectorise loops with multi-bool variables (#89226)
This change allows compare instructions in the loop that have multiple
uses, both inside and outside the loop, to be considered.

This change allows the following loop to be vectorised:
```
int foo(float* a, int n) {
  _Bool any = 0;
  _Bool all = 1;
  for (int i = 0; i < n; i++) {
    if (a[i] < 0.0f) {
      any = 1;
    } else {
      all = 0;
    }
  }
  return all ? 1 : any ? 2 : 3;
}
```
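
The loop above is vectorisable because the single compare feeds two boolean reductions. A branch-free C rewrite makes this explicit (a sketch for illustration; `foo_as_reductions` is a hypothetical name): `any` is an OR-reduction and `all` an AND-reduction of the same compare.

```c
#include <stdbool.h>

/* Equivalent to foo(): the compare result has two users, one per reduction. */
int foo_as_reductions(const float *a, int n) {
    bool any = false; /* OR-reduction: some element is negative */
    bool all = true;  /* AND-reduction: every element is negative */
    for (int i = 0; i < n; i++) {
        bool neg = a[i] < 0.0f;
        any = any | neg;
        all = all & neg;
    }
    return all ? 1 : any ? 2 : 3;
}
```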
2024-07-15 20:21:50 +01:00
Florian Hahn
fc9cd3272b
[VPlan] Don't add live-outs for IV phis.
Resume and exit values for inductions are currently still created
outside of VPlan and independent of the induction recipes. Don't add
live-outs for now, as the additional unneeded users can pessimize other
analysis.

Fixes https://github.com/llvm/llvm-project/issues/98660.
2024-07-14 20:49:03 +01:00
Graham Hunter
22a7f6dcc4
Revert "[LV] Autovectorization for the all-in-one histogram intrinsic" (#98493)
Reverts llvm/llvm-project#91458 to deal with post-commit reviewer
requests.
2024-07-11 16:39:30 +01:00
Florian Hahn
9a5a8731e7
[VPlan] Introduce ResumePhi VPInstruction, use to create phi for FOR. (#94760)
This patch introduces a new ResumePhi VPInstruction which creates a phi
in a leaf block of a VPlan. The first use is to create the phi node for
fixed-order recurrence resume values in the scalar preheader.

The VPInstruction takes 2 operands: 1) the incoming value from the
middle block and 2) a default value to be used for all other incoming
blocks.
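
What the resume phi provides for a fixed-order recurrence can be sketched in C. The loop is split into a part standing in for the vector loop (written as a scalar loop for clarity) and a scalar epilogue. The epilogue starts from the phi's incoming value from the middle block (the last recurrence value, `a[vec_n - 1]`) or, when the vector part is skipped, from the default value (the original start value). The function names are hypothetical.

```c
#include <stddef.h>

/* Reference: b[i] = a[i] + a[i - 1], where a[-1] is taken to be init. */
void recurrence_ref(int *b, const int *a, size_t n, int init) {
    int prev = init;
    for (size_t i = 0; i < n; ++i) {
        b[i] = a[i] + prev;
        prev = a[i];
    }
}

/* Same computation split at vec_n (assumes vec_n <= n);
 * vec_n == 0 models skipping the vector loop entirely. */
void recurrence_split(int *b, const int *a, size_t n, size_t vec_n, int init) {
    int prev = init;
    for (size_t i = 0; i < vec_n; ++i) { /* stands in for the vector loop */
        b[i] = a[i] + prev;
        prev = a[i];
    }
    /* Resume value: incoming from the middle block if the vector loop ran,
     * otherwise the default (start) value from any other predecessor. */
    int resume = vec_n > 0 ? a[vec_n - 1] : init;
    for (size_t i = vec_n; i < n; ++i) { /* scalar epilogue */
        b[i] = a[i] + resume;
        resume = a[i];
    }
}
```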

In follow-up changes, it will also be used to create phis for reduction
and induction resume values.

Depends on https://github.com/llvm/llvm-project/pull/92651

PR: https://github.com/llvm/llvm-project/pull/94760
2024-07-11 16:08:04 +01:00
Graham Hunter
1860fd049e
[LV] Autovectorization for the all-in-one histogram intrinsic (#91458)
This patch implements limited loop vectorization support for the 'all-in-one' histogram intrinsic. The feature is disabled by default, and when enabled will only vectorize if there are no other users of values in the gather-modify-scatter sequence.
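
Why the gather-modify-scatter sequence needs special handling can be seen in a C simulation (a sketch; `VF` and the function names are hypothetical, and the intrinsic itself is not modeled): when two lanes of one vector carry the same bucket index, a naive gather, add, scatter loses increments, whereas the scalar loop counts each one. Accounting for such repeated indices is what the all-in-one histogram intrinsic is for.

```c
#include <stddef.h>

#define VF 4 /* hypothetical vector width */

/* Scalar reference: buckets[idx[i]] += 1 for each i. */
void histogram_scalar(unsigned *buckets, const unsigned *idx, size_t n) {
    for (size_t i = 0; i < n; ++i)
        buckets[idx[i]] += 1;
}

/* Naive lowering of one VF-wide chunk. Lanes with equal indices all read
 * the same old count, and the scatter stores the same incremented value,
 * so only one of their increments survives. */
void histogram_naive_chunk(unsigned *buckets, const unsigned idx[VF]) {
    unsigned tmp[VF];
    for (size_t l = 0; l < VF; ++l) tmp[l] = buckets[idx[l]]; /* gather */
    for (size_t l = 0; l < VF; ++l) tmp[l] += 1;              /* modify */
    for (size_t l = 0; l < VF; ++l) buckets[idx[l]] = tmp[l]; /* scatter */
}
```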
2024-07-11 15:33:30 +01:00
Florian Hahn
7346e7cc47
[VPlan] Update HCFG builder after 72937203dd3b to fix leak.
Update buildPlainCFG to re-use the vector and latch VPBBs created as
part of the initial skeleton in 72937203dd3b.

This should fix the leak sanitizer failure discovered by
https://lab.llvm.org/buildbot/#/builders/52/builds/619.
2024-07-09 15:28:43 +01:00
Florian Hahn
99d6c6d936
[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651)
This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. This is done using the recently
introduced VPIRBasicBlock.

Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.

After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.

This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the use of
VPIRBasicBlock.

Depends on https://github.com/llvm/llvm-project/pull/92525

PR: https://github.com/llvm/llvm-project/pull/92651
2024-07-05 10:08:42 +01:00
Florian Hahn
06079233f8
[VPlan] Return std::nullopt early if plans are empty.
Fixes a crash caused by abf5969.
2024-06-27 12:25:59 +01:00
David Green
352a836176
[InstCombine] Canonicalize non-i8 gep of mul to i8 (#96606)
This is a small canonicalization for `gep i32, p, (mul x, C)` -> `gep
i8, p, (mul x, C*4)`, so that the mul can combine both of the constant
multiplications, and we take a small step towards canonicalizing more
geps to i8.

It currently doesn't attempt to check for multiple uses on the mul, but
that should be possible if it sounds better. Let me know what you think
of the idea in general.
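
In C pointer terms, the two forms address the same location; the i8 form multiplies by the byte stride `C*4` once, so that constant can be folded into the existing `mul`. A sketch with hypothetical function names, assuming 4-byte `int32_t` elements as in the `gep i32` case:

```c
#include <stddef.h>
#include <stdint.h>

/* gep i32, p, (mul x, C): the index x*C is implicitly scaled by 4 bytes. */
const int32_t *gep_i32(const int32_t *p, ptrdiff_t x, ptrdiff_t c) {
    return p + x * c;
}

/* gep i8, p, (mul x, C*4): one multiplication by the byte stride. */
const int32_t *gep_i8(const int32_t *p, ptrdiff_t x, ptrdiff_t c) {
    const char *bytes = (const char *)p;
    return (const int32_t *)(bytes + x * (c * (ptrdiff_t)sizeof(int32_t)));
}
```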
2024-06-26 14:25:54 +01:00
Florian Hahn
8681bb8bed
[LV] Add additional test coverage for cost modeling.
Add missing tests uncovered by
https://github.com/llvm/llvm-project/pull/92555.

Includes test for https://github.com/llvm/llvm-project/issues/96294 and
https://github.com/llvm/llvm-project/issues/96328
2024-06-26 10:18:01 +01:00
David Sherwood
ec9ce89a08
[LoopVectorize] Fix build issue caused by #95920 (#96647) 2024-06-25 15:51:32 +01:00
David Sherwood
2dd4167a09
[LoopVectorize][AArch64] Add limited support for scalable vectorisation of i1 types (#95920)
Previously isElementTypeLegalForScalableVector returned false for i1
types, which also prevented vectorisation of loops with i1 reductions.
This is overkill - we only need to disable vectorisation for loads
and/or stores of i1 types. I've added i1 as a legal type, but changed
the cost model to return an invalid cost for loads and stores.
2024-06-25 15:04:24 +01:00
Sander de Smalen
738533c84a
[AArch64] Consider streaming mode in TTI interfaces for vectorization. (#96305)
At the moment, vectorization is only enabled in streaming(-compatible)
mode when enabled through an option. But the interfaces should check
more than just 'hasSVE()', because a function with +sme in streaming
mode should also vectorize with the option enabled.

Additionally, a streaming-compatible function should only be able to use
fixed-length autovec if SVE is available, otherwise the vector code will
be scalarised by the backend.
2024-06-24 11:06:16 +01:00
Sander de Smalen
747f9dacfe [AArch64] NFC: Precommit new RUN lines to test sme-vectorize.ll 2024-06-21 13:29:21 +00:00
Florian Hahn
c07be08df5
[LV] Add tail folding test with scalarized store and wide header mask.
Add additional test with scalarized store which caused crashes with
earlier versions of https://github.com/llvm/llvm-project/pull/92555.
2024-06-20 17:24:59 +01:00
Florian Hahn
3808ba78de
[VPlan] Model middle block via VPIRBasicBlock. (#95816)
Use VPIRBasicBlock to wrap the middle block and implement patching up
branches in predecessors in VPIRBasicBlock::execute. The IR middle block
is only created after skeleton creation. Initially a regular
VPBasicBlock is created, which will later be replaced by a
VPIRBasicBlock once the middle IR basic block has been created.

Note that this slightly changes the order of instructions created in the
middle block; code generated by recipe execution in the middle block
will now be inserted before the terminator (i.e. between the compare
used by the terminator and the terminator itself). The original order
will be restored in https://github.com/llvm/llvm-project/pull/92651.

PR: https://github.com/llvm/llvm-project/pull/95816
2024-06-20 13:42:20 +01:00
Florian Hahn
ffc51b966e
[LV] Remove loads from null from pr73894.ll test.
Load from null is UB, load from pointer arg instead.
2024-06-20 10:57:25 +01:00