3059 Commits

Author SHA1 Message Date
Nikita Popov
330cb03269 [LoadStoreVectorizer] Check for guaranteed-to-transfer (PR52950)
Rather than checking for nounwind in particular, make sure the
instruction is guaranteed to transfer execution, which will also
handle non-willreturn calls correctly.

Fixes https://github.com/llvm/llvm-project/issues/52950.
2022-01-03 10:55:47 +01:00
Florian Hahn
6e0a333f71
[LV] Use Builder.CreateVectorReverse directly. (NFC)
IRBuilder::CreateVectorReverse already handles all cases required by
LoopVectorize. It can be used directly instead of reverseVector.
2022-01-02 19:09:30 +00:00
Kazu Hirata
7e163afd9e Remove redundant void arguments (NFC)
Identified by modernize-redundant-void-arg.
2022-01-02 10:20:19 -08:00
Florian Hahn
b1a333f0fe
[VPlan] Don't consider VPWidenCanonicalIVRecipe phi-like.
VPWidenCanonicalIVRecipe does not create PHI instructions, so it does
not need to be placed in the phi section of a VPBasicBlock.

Also tidies the code so the WidenCanonicalIV recipe and the
compare/lane-masks are created in the header.

Discussed D113223.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D116473
2022-01-02 12:48:17 +00:00
Kazu Hirata
fd4808887e [llvm] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-01 16:18:18 -08:00
Florian Hahn
7305798049
[VPlan] Remove VPWidenPHIRecipe constructor without start value (NFC).
This was suggested as a separate cleanup in recent reviews.
2022-01-01 13:53:48 +00:00
Florian Hahn
e2f1c4c706
[LV] Turn check for unexpected VF into assertion (NFC).
VF should always be non-zero in widenIntOrFpInduction. Turn check into
assertion.
2021-12-31 13:19:03 +00:00
Alexey Bataev
e0efedd2c3 [SLP][NFC]Fix non-determinism in reordering, NFC.
Need to clear CurrentOrder order mask if it is determined that
extractelements form identity order and need to use a vector-like
construct when iterating over ordered entries in the reorderTopToBottom
function.
2021-12-30 13:10:25 -08:00
Florian Hahn
ba9016a030
[LV] Replace redundant tail-fold check with assert (NFC).
The code path can only be reached when folding the tail, so turn the
check into an assertion.
2021-12-29 19:00:41 +01:00
Florian Hahn
9d297c7894
[VPlan] Add prepareToExecute to set up live-ins (NFC).
This patch adds a new prepareToExecute helper to set up live-ins, so
VPTransformState doesn't need to hold values like TripCount.

This also requires making the trip count operand for ActiveLaneMask
explicit in VPlan.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D116320
2021-12-28 17:49:47 +01:00
Sanjay Patel
0edf99950e [Analysis] allow caller to choose signed/unsigned when computing constant range
We should not lose analysis precision if an 'add' has both no-wrap
flags (nsw and nuw) compared to just one or the other.

This patch is modeled on a similar construct that was added with
D59386.

I don't think it is possible to expose a problem with an unsigned
compare because of the way this was coded (nuw is handled first).

InstCombine has an assert that fires with the example from:
https://github.com/llvm/llvm-project/issues/52884
...because it was expecting InstSimplify to handle this kind of
pattern with an smax.

Fixes #52884

Differential Revision: https://reviews.llvm.org/D116322
2021-12-28 09:45:37 -05:00
Florian Hahn
c2275278c6
[VPlan] Add abstract base class for header phi recipes (NFC).
Not all header phis widen the phi, e.g. like the new
VPCanonicalIVPHIRecipe in D113223. To let those recipes also inherit
from a phi-like base class, add a more generic VPHeaderPHIRecipe
abstract base class.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D116304
2021-12-28 15:37:47 +01:00
Florian Hahn
c66286ed59
[LV] Use specific first-order recurrence recipe as arg type (NFC).
Required for further refactoring in D116304.
2021-12-28 10:58:21 +01:00
Florian Hahn
2e630eabd3
[LV] Sink BTC creation to actual use (NFC).
Suggested separately in D116123.
2021-12-27 11:25:46 +01:00
Florian Hahn
511726c64d
[LV] Move getStepVector out of ILV (NFC).
First step to split up induction handling and move it outside ILV.
Used in D116123 and following.
2021-12-26 21:17:26 +01:00
Kazu Hirata
76f0f1cc5c Use {DenseSet,SetVector,SmallPtrSet}::contains (NFC) 2021-12-24 21:43:06 -08:00
Florian Hahn
ede7c2438f
[VPlan] Create header & latch blocks for skeleton up front (NFC).
By creating the header and latch blocks up front and adding blocks and
recipes in between those 2 blocks we ensure that the entry and exits of
the plan remain valid throughout construction.

In order to avoid test changes and keep printing of the plans the same,
we use the new header block instead of creating a new block on the first
iteration of the loop traversing the original loop.

We also fold the latch into its predecessor.

This is a follow up to a post-commit suggestion in D114586.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D115793
2021-12-22 12:44:25 +00:00
Florian Hahn
c83ef407df
[LV] Adjust comment to say the induction is created in header.
Follow-up suggested post-commit for 1a54889f48fa.
2021-12-22 11:56:40 +00:00
Florian Hahn
1a54889f48
[LV] Ensure WidenCanonicalIVRecipe is always created in header (NFC).
The VPWidenCanonicalIVRecipe must always be created in the phi section
of the header block. Use that block as insert point.
2021-12-21 15:14:48 +00:00
Paul Walker
7c68ed8892 [SVE] Reintroduce -scalable-vectorization=preferred as an alias to "on".
Some buildbots still rely on the experimental flag, so let's keep
it until everything has been migrated to the new "on by default"
state.
2021-12-21 12:54:04 +00:00
Kazu Hirata
500c4b68dc [llvm] Construct SmallVector with iterator ranges (NFC) 2021-12-20 23:43:24 -08:00
Sander de Smalen
b1ff20fd35 [LV] Enable scalable vectorization by default for SVE cores.
The availability of SVE should be sufficient to enable scalable
auto-vectorization.

This patch adds a new TTI interface to query the target what style of
vectorization it wants when scalable vectors are available. For other
targets than AArch64, this currently defaults to 'FixedWidthOnly'.

Differential Revision: https://reviews.llvm.org/D115651
2021-12-20 16:23:29 +00:00
Alexey Bataev
ab9078f3d3 [SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy.
Need to check for the number of the unique non-constant values since the
unique values may include several constants.

Differential Revision: https://reviews.llvm.org/D115939
2021-12-20 07:21:20 -08:00
Alexey Bataev
4459a11f4d Revert "[SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy."
This reverts commit fcaf290d0278bb83387e1a1d972c55e08b8c40e3 to fix test
mismatch reported in https://lab.llvm.org/buildbot#builders/117/builds/3531
2021-12-20 07:21:18 -08:00
Florian Hahn
5b362e4c7f
[VPlan] Add Debugloc to VPInstruction.
Upcoming changes require attaching debug locations to VPInstructions,
e.g. adding induction increment recipes in D113223.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D115123
2021-12-20 15:10:41 +00:00
Alexey Bataev
fcaf290d02 [SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy.
Need to check for the number of the unique non-constant values since the
unique values may include several constants.

Differential Revision: https://reviews.llvm.org/D115939
2021-12-20 05:15:01 -08:00
Alexey Bataev
71fe59212c [SLP][NFC]Adjust type in debug output loop.
The ReuseShuffleIndices indeces are integer, not unsigned, need to fix
the type in the debug print loop.
2021-12-17 12:43:01 -08:00
Alexey Bataev
46ad66b817 [SLP][NFC]Use 'llvm::copy' instead of element-by-elemen copying. 2021-12-17 12:07:59 -08:00
Florian Hahn
564d109b35
[LV] Pass VectorHeader block to emitTransformedIndex (NFC).
Pass in the vector header instead of relying on ILV::LoopVectorBody.
This reduces the dependence on state from ILV. Where VPTransformState is
available, State.CFG.PrevBB can be used.
2021-12-17 10:11:16 +00:00
Alexey Bataev
65fc992579 [SLP]Early exit out of the reordering if shuffled/perfect diamond match found.
Need to early exit out of the reordering process if the perfect/shuffled match is found in the operands. Such pattern will result in not profitable reordering because of (false positive) external use of scalars.

Differential Revision: https://reviews.llvm.org/D115811
2021-12-16 11:09:49 -08:00
Florian Hahn
3b35113ff0
[VPlan] Add VPBlockBase::successors() returning an iterator_range (NFC).
This will also be helpful for D115793.
2021-12-16 14:28:50 +00:00
Arthur Eubanks
5a81a60391 [NFC] Remove more calls to getAlignment()
These are deprecated and should be replaced with getAlign().

Some of these asserts don't do anything because Load/Store/AllocaInst never have a 0 align value.
2021-12-15 14:40:57 -08:00
Alexey Bataev
6f2e087631 [SLP]Do not represent splats as node with the reused scalars.
No need to represent splats as a node with the reused scalars, it may
increase the cost (currently pass just ignores extra shuffle cost and it
is still not correct).

Differential Revision: https://reviews.llvm.org/D115800
2021-12-15 06:33:11 -08:00
Alexey Bataev
bd05376986 [SLP]Improve multinode analysis.
Changes the preliminary multinode analysis:
1. Introduced scores for reversed loads/extractelements.
2. Improved shallow score calculation.
3. Lowered the cost of external uses (no need to consider it several times, just ones).
4. The initial lane for analysis is the one with the minimal possible
   reorderings.

These changes in general shall reduce compile time and improve the
reordering in many cases.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D101109
2021-12-14 06:01:52 -08:00
Alexey Bataev
e5b191a433 [SLP]Improve/fix reodering for gather nodes with extractelements/undefs.
If the gather node is a mix of undefvalues and exractelement
instructions, need to take the ordering for such nodes into account too.
It allows to reorder some (sub)trees and remove some extra shuffles,
improving overall vectorization.
Also, outlined common functionality into a separate function.

Differential Revision: https://reviews.llvm.org/D115358
2021-12-13 10:59:38 -08:00
Nikita Popov
432c41ebe9 [SLP] Avoid getPointerElementType() call
Use the load result type instead of the element type of the load
pointer operand.
2021-12-13 15:46:13 +01:00
Evgeniy Brevnov
7002125cff [LV][NFC] Fix debug message to print out resulting clamped VF 2021-12-13 18:54:05 +07:00
Florian Hahn
e90630e5a5
[VPlan] Remove unused createNaryOp (NFC). 2021-12-13 11:11:00 +00:00
Evgeniy Brevnov
2025e0985c [LV] Make sure VF doesn't exceed compile time known TC
For the simple copy loop (see test case) vectorizer selects VF equal to 32 while the loop is known to have 17 iterations only. Such behavior makes no sense to me since such vector loop will never be executed. The only case we may want to select VF large than TC is masked vectoriztion. So I haven't touched that case.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D114528
2021-12-13 13:48:46 +07:00
Florian Hahn
b6a2ddb6c8
[LV] Use info from State in some helper functions (NFC).
This updates several helper functions to use information provided by
VPTransformState instead of ILV directly, to help with the transition
out of ILV.
2021-12-12 20:48:38 +00:00
David Green
fed3041863 [LV][ARM] Improve reduction costmodel for mismatching extension types.
Given a MLA reduction from two different types (say i8 and i16), we were
previously failing to find the reduction pattern, often making us chose
the lower vector factor. This improves that by using the largest of the
two extension types, allowing us to use the larger VF as the type of the
reduction.

As per https://godbolt.org/z/KP549EEYM the backend handles this
valiantly, leading to better performance.

Differential Revision: https://reviews.llvm.org/D115432
2021-12-10 15:40:58 +00:00
Florian Hahn
505ad03c7d
[LV] Remove redundant IV casts using VPlan (NFCI).
This patch simplifies handling of redundant induction casts, by
removing dead cast instructions after initial VPlan construction.
This has the following benefits:

  1. fixes a crash
     (see @test_optimized_cast_induction_feeding_first_order_recurrence)
  2. Simplifies VPWidenIntOrFpInduction to a single-def recipes
  3. Retires recordVectorLoopValueForInductionCast.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D115112
2021-12-10 13:57:03 +00:00
Florian Hahn
acea6e9cfa
[Passes] Only run extra vector passes if loops have been vectorized.
This patch uses a similar trick as in D113947 to only run the extra
passes after vectorization on functions where loops have been
vectorized.

The reason for running the 'extra vector passes' is
simplification/unswitching of the runtime checks created by LV, there
should be no need to run them if nothing got vectorized

To do that, a new dummy analysis ShouldRunExtraVectorPasses has been
added. If loops have been vectorized for a function, LV will cache the
analysis. At the moment it uses MadeCFGChanges as proxy for loop
vectorized, which isn't perfect (it could be too aggressive, e.g.
because no runtime checks have been added), but should be good enough
for now.

The extra passes are now managed by a new FunctionPassManager that
runs its passes only if ShouldRunExtraVectorPasses has been cached.

Without this patch, `-extra-vectorizer-passes` has the following
compile-time impact:

NewPM-O3: +4.86%
NewPM-ReleaseThinLTO: +3.56%
NewPM-ReleaseLTO-g: +7.17%

http://llvm-compile-time-tracker.com/compare.php?from=ead3979a92fc33add4710c4510d6906260dcb4ad&to=c292da649e2c6e88a31e702fdc474727d09c72bc&stat=instructions

With this patch, that gets reduced to

NewPM-O3: +1.43%
NewPM-ReleaseThinLTO: +1.00%
NewPM-ReleaseLTO-g: +1.58%

http://llvm-compile-time-tracker.com/compare.php?from=ead3979a92fc33add4710c4510d6906260dcb4ad&to=e67d86b57810011cf285eb9aa1944781be6096f0&stat=instructions

It is probably still too high to enable by default, but much better.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D115052
2021-12-10 11:42:45 +00:00
Florian Hahn
978883d254
[VPlan] Add InductionDescriptor to VPWidenIntOrFpInduction. (NFC)
This allows easier access to the induction descriptor from VPlan,
without needing to go through Legal. VPReductionPHIRecipe already
contains a RecurrenceDescriptor in a similar fashion.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D115111
2021-12-10 09:55:09 +00:00
Alexey Bataev
19c5cf4167 [SLP]Fix comparator for cmp instruction vectorization.
The comparator for the sort functions should provide strict weak
ordering relation between parameters. Current solution causes compiler
crash with some standard c++ library implementations, because it does
not meet this criteria. Tried to fix it + it improves the iverall
vectorization result.

Differential Revision: https://reviews.llvm.org/D115268
2021-12-09 10:57:57 -08:00
Philip Reames
b24db85c0b [recurrence] Delete dead flag/fmf handling [NFC]
The recurrence lowering code has handling which claims to be about flag intersection, but all the callers pass empty arrays to the arguments.  The sole exception is a caller of a method which has the argument, but no implementation.

I don't know what the intent was here, but it certaintly doesn't actually do anything today.
2021-12-09 10:43:53 -08:00
Florian Hahn
d74a8a78ad
[LV] Mark various functions as const (NFC).
Make sure various accessors do not modify any state, in preparation for
D115111.
2021-12-09 10:51:29 +00:00
Florian Hahn
e9a2944495
[VPlan] Verify plan entry and exit blocks, set correct exit block.
Both the entry and exit blocks of the top-region of a plan must be
VPBasicBlocks. They also must have no predecessors or successors
respectively.

This invariant was broken when splitting a block for sink-after. To fix
the issue, set the exit block of the region *after* sink-after is done.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D114586
2021-12-07 16:26:31 +00:00
Cullen Rhodes
0395e01583 [IR] Split vscale_range interface
Interface is split from:

  std::pair<unsigned, unsigned> getVScaleRangeArgs()

into separate functions for min/max:

  unsigned getVScaleRangeMin();
  Optional<unsigned> getVScaleRangeMax();

Reviewed By: sdesmalen, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D114075
2021-12-07 10:38:26 +00:00
Alexey Bataev
a101a9b64b [SLP]Fix compiler crash when calculating extract cost for undefs.
Need to add an extra check for potential undef values in
computeExtractCost function to avoid compiler crash on casting to
instructon.

Differential Revision: https://reviews.llvm.org/D115162
2021-12-06 10:46:13 -08:00