3258 Commits

Author SHA1 Message Date
serge-sans-paille
7030654296 [iwyu] Handle regressions in libLLVM header include
Running iwyu-diff on LLVM codebase since fa5a4e1b95c8f37796 detected a few
regressions, fixing them.

Differential Revision: https://reviews.llvm.org/D124847
2022-05-04 08:32:38 +02:00
Igor Kirillov
4e5e042d9a [LoopVectorize] Support reductions that store intermediary result
Adds ability to vectorize loops containing a store to a loop-invariant
address as part of a reduction that isn't converted to SSA form due to
lack of aliasing info. Runtime checks are generated to ensure the store
does not alias any other accesses in the loop.

Ordered fadd reductions are not yet supported.

Differential Revision: https://reviews.llvm.org/D110235
2022-05-03 10:12:30 +01:00
David Green
6f81903e89 [LV][SLP] Mark fptosi_sat as vectorizable
This adds fptosi_sat and fptoui_sat to the list of trivially
vectorizable functions, mainly so that the loop vectorizer can vectorize
the instruction. Marking them as trivially vectorizable also allows them
to be SLP vectorized, and Scalarized.

The signature of a fptosi_sat requires two type overrides
(@llvm.fptosi.sat.v2i32.v2f32), unlike other intrinsics that often only
take a single. This patch alters hasVectorInstrinsicOverloadedScalarOpd
to isVectorIntrinsicWithOverloadTypeAtArg, so that it can mark the first
operand of the intrinsic as a overloaded (but not scalar) operand.

Differential Revision: https://reviews.llvm.org/D124358
2022-05-03 09:32:34 +01:00
Alexey Bataev
e74a73782f [SLP][NFC]Minor code changes for better readability, NFC. 2022-05-02 12:58:25 -07:00
Alexey Bataev
7ea03f0b4e [SLP]Improve reductions analysis and emission, part 1.
Currently SLP vectorizer walks through the instructions and selects
3 main classes of values: 1) reduction operations - instructions with same
reduction opcode (add, mul, min/max, etc.), which build the reduction,
2) reduced values - instructions with the same opcodes, but different
from the reduction opcode, 3) extra arguments - all other values,
instructions from the different basic block rather than the root node,
instructions with to many/less uses.

This scheme is not very efficient. It excludes some instructions and all
non-instruction values from the reductions (constants, proficient
gathers), to many possibly reduced values are marked as extra arguments.
Patch improves this process by introducing a bit extended analysis
stage. During this stage, we still try to select 3 classes of the
values: 1) reduction operations - same as before, 2) possibly reduced
values - all instructions from the current block/non-instructions, which
may build a vectorization tree, 3) extra arguments - instructions from
the different basic blocks. Additionally, an extra sorting of the
possibly reduced values occurs to build the scalar sequences which
highly likely will bed vectorized, e.g. loads are grouped by the
distance between them, constants are grouped together, cmp instructions
are sorted by their compare types and predicates, extractelement
instructions are sorted by the vector operand, etc. Also, these groups
are reordered by their length so the longest group is the first in the
list of the possibly reduced values.

The vectorization process tries to emit the reductions for all these
groups. These reductions, remaining non-vectorized possible reduced
values and extra arguments are then combined into the final expression
just like it was before.

Differential Revision: https://reviews.llvm.org/D114171
2022-05-02 12:03:58 -07:00
Florian Hahn
0ef8ca6d88
[VPlan] Do not create VPWidenCall recipes for scalar vector factors.
'Widen' recipe are only used when actual vector values are generated.
Fix tryToWidenCall to do not create VPWidenCallRecipes for scalar vector
factors.

This was exposed by D123720, because the widened recipes are considered
vector users.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D124718
2022-05-02 19:40:33 +01:00
Simon Pilgrim
34f97a3709 [VectorCombine] Merge isa<>/cast<> into dyn_cast<>. NFC.
We want to handle the the assert in VectorCombine so avoid the repeated isa/cast code.
2022-05-01 20:09:10 +01:00
Simon Pilgrim
09761ce295 [SLPVectorizer] Remove weird unicode character from comment. NFCI.
Whatever it was, Visual Assist really didn't like it....
2022-05-01 16:37:21 +01:00
Alexey Bataev
484fcb9888 [SLP][NFC]Fix a comment. 2022-04-29 09:27:13 -07:00
Anna Thomas
205246cb64 [CompileTime] [Passes] Avoid computing unnecessary analyses. NFC
Similar to c515b2f39e77, If there are no loops in the function as seen
through LI, we should avoid computing the remaining expensive analyses
(such as SCEV, BPI).  Reordered the analyses requests and early return
if there are no loops.

The logic of avoiding expensive analyses is applied to LoopVectorizer,
LoopLoadElimination and LoopUnrollPass, i.e. all function passes which operate
on loops.

This is an NFC with compile time improvement.

Differential Revision: https://reviews.llvm.org/D124529
2022-04-29 10:00:06 -04:00
Ricky Zhou
24a133e16f
[LV] Rename CountRoundDown to VectorTripCount (NFC)
The name CountRoundDown is potentially misleading, as the number of
iterations can be rounded up when folding the tail.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D119681
2022-04-29 13:50:00 +01:00
Florian Hahn
e66127e69b
[VPlan] Simplify & adjust code as suggested in D123005.
Improve code as suggested in D123005. Applied separately, because the
comments where made a diff that has not been rebased to current main.
2022-04-29 13:34:54 +01:00
David Green
7047c47918 [VecCombine] Fix sort comparator logic in foldShuffleFromReductions
I think this sort comparator was overly complex, and the windows
expensive check bot agreed, failing as it was not giving a strict weak
ordering. Change it to use the comparison of the mask values as unsigned
integers. This should sort the undef elements to the end whilst keeping
X<Y otherwise.
2022-04-29 09:30:02 +01:00
Florian Hahn
f4e1eaa375
Revert "[VPlan] Remove uneeded needsVectorIV check."
This reverts commit 43842b887e0a7b918bb2d6c9f672025b2c621f8a while I
investigate a buildbot failure.

It also reverts the follow-up commit
2883de05145fc5b4afb99b91f69ebb835af36af5.
2022-04-28 20:16:21 +01:00
David Green
ded8187e35 [VectorCombine] Try to reduce shuffle cost for commutative reduction operands
Given a shuffle feeding a commutative reduction, the lane ordering of
the shuffle will not alter the result. This is also true if there are a
number of operations between the reduction and the shuffle, providing
they only operate lane-wise. This patch searches for cases like that in
Vector Combine, allowing us to check the cost of the shuffle vs an
in-order identity shuffle and replace the order if possible. This only
handles a single shuffle at the moment to keep things simple, and is
able to ignore splats that produce results where every result is the
same.

This is a more powerful version of a combine that already happens in
instrcombine, capable of optimizing more cases by looking through more
instructions and being able to cost the shuffle.

Differential Revision: https://reviews.llvm.org/D123494
2022-04-28 19:46:12 +01:00
Alexey Bataev
75e1cf4a6a [COST]Improve cost model for shuffles in SLP.
Introduced masks where they are not added and improved target dependent
cost models to avoid returning of the incorrect cost results after
adding masks.

Differential Revision: https://reviews.llvm.org/D100486
2022-04-28 10:04:41 -07:00
Florian Hahn
2883de0514
[VPlan] Fix comment formatting from 43842b887e. 2022-04-28 16:31:48 +01:00
Florian Hahn
43842b887e
[VPlan] Remove uneeded needsVectorIV check.
Remove one of the last remaining uses of ::needsVectorIV, preparing for
its removal. Now that usesScalars is available and based on the
information explicit in VPlan, there is no need to use the pre-computed
needsVectorIV.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D123720
2022-04-28 16:27:34 +01:00
Alexey Bataev
9861ca0c23 Revert "[COST]Improve cost model for shuffles in SLP."
This reverts commit 29a470e3804ca216d4e76c88a38086eb61c200f9 to fix
a crash reported in https://reviews.llvm.org/D100486#3479989.
2022-04-28 08:11:56 -07:00
Alexey Bataev
29a470e380 [COST]Improve cost model for shuffles in SLP.
Introduced masks where they are not added and improved target dependent
cost models to avoid returning of the incorrect cost results after
adding masks.

Differential Revision: https://reviews.llvm.org/D100486
2022-04-27 10:56:26 -07:00
Shilei Tian
a6b355dd31 [SLP] Fix a typo that causes redundant assertion and potential segment fault
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D124497
2022-04-27 10:07:59 -04:00
Vasileios Porpodas
fa8a9fea47 Recommit "[SLP][TTI] Refactoring of getShuffleCost Args to work like getArithmeticInstrCost"
This reverts commit 6a9bbd9f20dcd700e28738788bb63a160c6c088c.

Code review: https://reviews.llvm.org/D124202
2022-04-26 14:02:40 -07:00
Vasileios Porpodas
6a9bbd9f20 Revert "[SLP][TTI] Refactoring of getShuffleCost Args to work like getArithmeticInstrCost"
This reverts commit 55ce296d6f217fd0defed2592ff7b74b79b2c1f0.
2022-04-26 11:25:26 -07:00
Vasileios Porpodas
55ce296d6f [SLP][TTI] Refactoring of getShuffleCost Args to work like getArithmeticInstrCost
Before this patch `Args` was used to pass a broadcat's arguments by SLP.
This patch changes this. `Args` is now used for passing the operands of
the shuffle.

Differential Revision: https://reviews.llvm.org/D124202
2022-04-26 11:11:29 -07:00
Valery N Dmitriev
88b9e46fb5 [SLP] Steer for the best chance in tryToVectorize() when rooting with binary ops.
tryToVectorize() method implements one of searching paths for vectorizable tree roots in SLP vectorizer,
specifically for binary and comparison operations. Order of making probes for various scalar pairs
was defined by its implementation: the instruction operands, then climb over one operand if
the instruction is its sole user and then perform same actions for another operand if previous
attempts failed. Problem with this approach is that among these options we can have more than a
single vectorizable tree candidate and it is not necessarily the one that encountered first.
Trying to build vectorizable tree for each possible combination for just evaluation is expensive.
But we already have lookahead heuristics mechanism which we use for finding best pick among
operands of commutative instructions. It calculates cumulative score for candidates in two
consecutive lanes. This patch introduces use of the heuristics for choosing the best pair among
several combinations. We only try one that looks as most promising for vectorization.
Additional benefit is that we reduce total number of vectorization trees built for probes
because we skip those looking non-profitable early.

Reviewed By: Alexey Bataev (ABataev), Vasileios Porpodas (vporpo)
Differential Revision: https://reviews.llvm.org/D124309
2022-04-25 12:25:33 -07:00
David Green
9727c77d58 [NFC] Rename Instrinsic to Intrinsic 2022-04-25 18:13:23 +01:00
Valery N Dmitriev
edf7bed87b [SLP][NFC] Outline lookahead heuristics into a separate helper class.
Minor refactoring to reduce size of functional change D124309:
  look-ahead scoring routines pulled out of VLOperands and formed
  new LookAheadHeuristics helper class.

Reviewed By: Alexey Bataev (ABataev), Vasileios Porpodas (vporpo)
Differential Revision: https://reviews.llvm.org/D124313
2022-04-22 18:59:08 -07:00
Vasileios Porpodas
889588ee97 [SLP] Refactoring isLegalBroadcastLoad() to use ElementCount.
Replacing `unsigned` with `ElementCount` in the argument of `isLegalBroadcastLoad()`.
This helps reduce the diff of a future SLP patch for AArch64.
2022-04-21 10:19:00 -07:00
Florian Hahn
bea69b232f
[VPlan] Initial modeling of middle block in VPlan.
This patch extends the scope of VPlan to also include the exit (aka
middle) block.

For now, the exit block remains empty, but handling of exit values will
subsequently be moved to VPlan, by adding recipes to model exit values
in the exit block.

As a first step, this will allow fixing #51366.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D123457
2022-04-20 19:34:41 +01:00
Vasileios Porpodas
8d4b5e0833 [NFC][SLP] Improved description of getShallowScore() and getScoreAtLevelRec()
Differential Revision: https://reviews.llvm.org/D124027
2022-04-19 12:15:36 -07:00
Florian Hahn
4026b718b8
[VPlan] Remove unused SCEV forward declaration (NFC). 2022-04-19 17:16:17 +02:00
Alexey Bataev
883571928c Revert "[SLP]Improve reductions analysis and emission, part 1."
This reverts commit 0e1f4d4d3cb08ff84df5adc4f5e41d0a2cebc53d to fix
a crash reported in PR54976
2022-04-19 06:17:03 -07:00
Florian Hahn
a65f2730d2
[VPlan] Expand induction step in VPlan pre-header.
This patch moves SCEV expansion of steps used by
VPWidenIntOrFpInductionRecipes to the pre-header using
VPExpandSCEVRecipe. This ensures that those steps are expanded while the
CFG is in a valid state. Previously, SCEV expansion may happen during
vector body code-generation, during which the CFG may be invalid,
causing issues with SCEV expansion.

Depends on D122095.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D122096
2022-04-19 13:06:39 +02:00
Vasileios Porpodas
b1333f03d9 Recommit "[SLP] Support internal users of splat loads"
Code review: https://reviews.llvm.org/D121940

This reverts commit 359dbb0d3daa8295848a09ddd083c79f6851888e.
2022-04-18 15:58:01 -07:00
Vasileios Porpodas
359dbb0d3d Revert "[SLP] Support internal users of splat loads"
This reverts commit f8e1337115623cb879f734940fd9dfeb29a611e7.
2022-04-18 12:12:34 -07:00
Vasileios Porpodas
f8e1337115 [SLP] Support internal users of splat loads
Until now we would only accept a broadcast load pattern if it is only used
by a single vector of instructions.

This patch relaxes this, and allows for the broadcast to have more than one
user vector, as long as all of its uses are internal to the SLP graph and
vectorized.

Differential Revision: https://reviews.llvm.org/D121940
2022-04-18 11:59:44 -07:00
Florian Hahn
73f5d7d0d6
[VPlan] Handle equal address and store ops in onlyFirstLaneDemanded.
With opaque pointers, the stored value and address can be the same.

Previously the code in VPWidenMemoryInstructionRecipe::onlyFirstLaneDemanded
incorrectly considers stores with matching store and pointer operands as
only demanding the first lane, causing a crash.
2022-04-15 22:53:33 +02:00
Johannes Doerfert
1fb415fee9 [AMDGPU][FIX] Proper load-store-vectorizer result with opaque pointers
The original code relied on the fact that we needed a bitcast
instruction (for non constant base objects). With opaque pointers there
might not be a bitcast. Always check if reordering is required instead.

Fixes: https://github.com/llvm/llvm-project/issues/54896

Differential Revision: https://reviews.llvm.org/D123694
2022-04-15 13:42:46 -05:00
Florian Hahn
2c14cdf831
[VPlan] Turn external defs in Value -> VPValue mapping.
This addresses an existing TODO by keeping a mapping of external IR
Value * definitions wrapped in VPValues for use in a VPlan.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D123700
2022-04-14 12:03:09 +02:00
serge-sans-paille
fa5a4e1b95 [iwyu] Handle regressions in libLLVM header include
Running iwyu-diff on LLVM codebase since a96638e50ef5 detected a few
regressions, fixing them.
2022-04-13 20:53:19 +02:00
Alexey Bataev
0e1f4d4d3c [SLP]Improve reductions analysis and emission, part 1.
Currently SLP vectorizer walks through the instructions and selects
3 main classes of values: 1) reduction operations - instructions with same
reduction opcode (add, mul, min/max, etc.), which build the reduction,
2) reduced values - instructions with the same opcodes, but different
from the reduction opcode, 3) extra arguments - all other values,
instructions from the different basic block rather than the root node,
instructions with to many/less uses.

This scheme is not very efficient. It excludes some instructions and all
non-instruction values from the reductions (constants, proficient
gathers), to many possibly reduced values are marked as extra arguments.
Patch improves this process by introducing a bit extended analysis
stage. During this stage, we still try to select 3 classes of the
values: 1) reduction operations - same as before, 2) possibly reduced
values - all instructions from the current block/non-instructions, which
may build a vectorization tree, 3) extra arguments - instructions from
the different basic blocks. Additionally, an extra sorting of the
possibly reduced values occurs to build the scalar sequences which
highly likely will bed vectorized, e.g. loads are grouped by the
distance between them, constants are grouped together, cmp instructions
are sorted by their compare types and predicates, extractelement
instructions are sorted by the vector operand, etc. Also, these groups
are reordered by their length so the longest group is the first in the
list of the possibly reduced values.

The vectorization process tries to emit the reductions for all these
groups. These reductions, remaining non-vectorized possible reduced
values and extra arguments are then combined into the final expression
just like it was before.

Differential Revision: https://reviews.llvm.org/D114171
2022-04-12 17:46:11 -07:00
Muhammad Omair Javaid
42ebfa8269 Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"
This reverts commit 64b6192e812977092242ae34d6eafdcd42fea39d.

This broke LLVM AArch64 buildbot clang-aarch64-sve-vls-2stage:

https://lab.llvm.org/buildbot/#/builders/176/builds/1515

llvm-tblgen crashes after applying this patch.
2022-04-13 04:53:07 +05:00
Florian Hahn
5f1eb74850
[VPlan] Place VPExpandSCEVRecipe in pre-header.
After D121624 models the pre-header in VPlan, VPExpandSCEVRecipes can be
placed there. This ensures SCEV expansion happens before modifying the
CFG during VPlan execution, when CFG is incomplete.

Depends on D121624.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D122095
2022-04-10 10:26:20 +02:00
Florian Hahn
256c6b0ba1
[VPlan] Model pre-header explicitly.
This patch extends the scope of VPlan to also model the pre-header.
The pre-header can be used to place recipes that should be code-gen'd
outside the loop, like SCEV expansion.

Depends on D121623.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121624
2022-04-09 14:19:47 +02:00
Florian Hahn
467dbcd9f1
[LV] Set debug loc after setting insert point.
This fixes the code to actually use the location of the instruction, if
available. Previously, SetInsertPoint would overwrite the insert point
set from the instruction.
2022-04-08 20:34:40 +02:00
Florian Hahn
29fe998eaa
[VPlan] Preserve debug location when creating branch.
Update createEmptyBasicBlock to preserve the debug location of the
previous terminator.
2022-04-08 17:22:53 +02:00
Florian Hahn
4388c979da
[VPlan] Use vector.body as header name in VPlan native path.
This brings the VPlan block naming in line with the naming of the
generated basic blocks.
2022-04-07 10:31:12 +02:00
Jingu Kang
64b6192e81 [AArch64] Set maximum VF with shouldMaximizeVectorBandwidth
Set the maximum VF of AArch64 with 128 / the size of smallest type in loop.

Differential Revision: https://reviews.llvm.org/D118979
2022-04-05 13:16:52 +01:00
Florian Hahn
1ff022e21b
[LV] Add vector.body block to parent loop during skeleton creation.
When creating induction resume values, SCEV queries may rely on
LoopInfo. Make sure vector.body gets added to the loop of the pre-header
during skeleton construction.

%vector.body will be moved to the vector preheader during VPlan
execution.

Fixes #54745.
2022-04-05 11:54:17 +01:00
Florian Hahn
1817c526e1
[VPlan] Update VPInterleavedAccessInfo to use getVectorLoopRegion.
Update VPInterleavedAccessInfo  to use the generic getVectorLoopRegion
helper instead of relying on the entry block being the top-most vector
loop region.
2022-04-04 10:26:39 +01:00