5038 Commits

Author SHA1 Message Date
Alexey Bataev
f9bc00e4bb
[SLP]Initial support for interleaved loads
Adds initial support for interleaved loads, which allows
emission of segmented loads for RISCV RVV.

Vectorizes extra code for RISCV
CFP2006/447.dealII, CFP2006/453.povray,
CFP2017rate/510.parest_r, CFP2017rate/511.povray_r,
CFP2017rate/526.blender_r, CFP2017rate/538.imagick_r, CINT2006/403.gcc,
CINT2006/473.astar, CINT2017rate/502.gcc_r, CINT2017rate/525.x264_r

Reviewers: RKSimon, preames

Reviewed By: preames

Pull Request: https://github.com/llvm/llvm-project/pull/112042
2024-10-14 09:12:33 -04:00
Florian Hahn
ec42778071
[LV] Remove unused type declaration from ILV (NFC). 2024-10-12 20:36:42 +01:00
Alexey Bataev
3ed8acf2f0 [SLP][NFC]Simplify check for external user parent basic block, NFC. 2024-10-11 13:11:16 -07:00
vporpo
31b85c6ead
[SandboxVec][Interval] Implement Interval::comesBefore() (#112026)
This patch implements `Interval::comesBefore(const Interval &Other)`
which returns true if this interval is strictly before Other in program
order. The function asserts that the intervals are disjoint.
2024-10-11 11:51:38 -07:00
vporpo
e8dd95e97b
[SandboxVec][DAG] Extend DAG (#111908)
This patch implements growing the DAG towards the top or bottom. This
does the necessary dependency checks and adds new mem dependencies.
2024-10-11 08:12:29 -07:00
Rahul Joshi
fa789dffb1
[NFC] Rename Intrinsic::getDeclaration to getOrInsertDeclaration (#111752)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
2024-10-11 05:26:03 -07:00
Florian Hahn
65da32c634
[LV] Account for any-of reduction when computing costs of blend phis.
Any-of reductions are narrowed to i1. Update the legacy cost model to
use the correct type when computing the cost of a phi that gets lowered
to selects (BLEND).

This fixes a divergence between legacy and VPlan-based cost models after
36fc291b6ec6d.

Fixes https://github.com/llvm/llvm-project/issues/111874.
2024-10-11 11:27:22 +01:00
David Sherwood
72f339de45
[LoopVectorize] Use predicated version of getSmallConstantMaxTripCount (#109928)
There are a number of places where we call getSmallConstantMaxTripCount
without passing a vector of predicates:

getSmallBestKnownTC
isIndvarOverflowCheckKnownFalse
computeMaxVF
isMoreProfitable

I've changed all of these to now pass in a predicate vector so that
we get the benefit of making better vectorisation choices when we
know the max trip count for loops that require SCEV predicate checks.

I've tried to add tests that cover all the cases affected by these
changes.
2024-10-11 10:10:15 +01:00
Alexey Bataev
4b5018d231 [SLP]Track repeated reduced value as it might be vectorized
Need to track changes with the repeated reduced value, since it might be
vectorized in the next attempt for reduction vectorization, to correctly
generate the code and avoid compiler crash.

Fixes #111887
2024-10-10 13:41:56 -07:00
vporpo
69c0067927
[SandboxVec][DAG] Refactoring: Outline code that looks for mem nodes (#111750) 2024-10-10 13:25:03 -07:00
vporpo
a4916d2005
[SandboxVec][DAG] Refactoring: Move MemPreds from DGNode to MemDGNode (#111897) 2024-10-10 12:42:28 -07:00
Florian Hahn
bb937e276d
[LV] Compute value of escaped induction based on the computed end value. (#110576)
Update fixupIVUsers to compute the value for escaped inductions using
the already computed end value of the induction (EndValue), but
subtracting the step.

This results in slightly simpler codegen, as we avoid computing the full
transformed index at VectorTripCount - 1.

PR: https://github.com/llvm/llvm-project/pull/110576
2024-10-10 20:04:46 +01:00
vporpo
747d8f3fc9
[SandboxVec][DAG] Implement PredIterator (#111604)
This patch implements an iterator for iterating over both use-def and
mem dependencies of MemDGNodes.
2024-10-10 12:01:56 -07:00
Ramkumar Ramachandra
1f919aa778
VectorCombine: lift one-use limitation in foldExtractedCmps (#110902)
There are artificial one-use limitations on foldExtractedCmps. Adjust
the costs to account for multi-use, and strip the one-use matcher,
lifting the limitations.
2024-10-10 14:10:41 +01:00
Piotr Fusik
a7a4daa429
[LV][NFC] Improve readability with bool instead of auto (#111532) 2024-10-10 12:30:18 +02:00
Jorge Gorbe Moya
756ec99c36
[SandboxVec] Re-land "Use sbvec-passes flag to create a pipeline of Region passes after BottomUpVec. (#111223)" (#111772)
https://github.com/llvm/llvm-project/pull/111223 was reverted because of
a build failure with `-DBUILD_SHARED_LIBS=on`.

The Passes component depends on Vectorizer (because PassBuilder needs to
be able to instantiate SandboxVectorizerPass). This resulted in CMake
doing this

1. when it builds lib/libLLVMVectorize.so.20.0git it adds
lib/libLLVMSandboxIR.so.20.0git to the command line, because it's listed
as a dependency (as expected)
2. when it's trying to build lib/libLLVMPasses.so.20.0git it adds
lib/libLLVMVectorize.so.20.0git to the command line, because it's listed
as a dependency (also as expected). But not libLLVMSandboxIR.so.

When SandboxVectorizerPass has its ctors/dtors defined inline, this
caused "undefined reference to vtable" linker errors. This change works
around that by moving ctors/dtors out of line.

Also fix a bazel build problem by adding the new
`llvm/lib/Transforms/Vectorize/SandboxVectorizer/Passes/PassRegistry.def`
as a textual header in the Vectorizer target.
2024-10-09 18:00:17 -07:00
Alexey Bataev
f020bf1526
[SLP]Initial support for non-power-of-2 (but whole reg) vectorization for stores
Allows non-power-of-2 vectorization for stores, but still requires, that
vectorized number of elements forms full vector registers.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/111194
2024-10-09 15:22:44 -04:00
Jorge Gorbe Moya
102c384b57
Revert "[SandboxVectorizer] Use sbvec-passes flag to create a pipeline of Region passes after BottomUpVec." (#111727)
Reverts llvm/llvm-project#111223

It broke one of the build bots:

LLVM Buildbot has detected a new failure on builder flang-aarch64-libcxx
running on linaro-flang-aarch64-libcxx while building llvm at step 5
"build-unified-tree".

Full details are available at:
https://lab.llvm.org/buildbot/#/builders/89/builds/8127
2024-10-09 10:45:23 -07:00
Jorge Gorbe Moya
10ada4ae73
[SandboxVectorizer] Use sbvec-passes flag to create a pipeline of Region passes after BottomUpVec. (#111223)
The main change is that the main SandboxVectorizer pass no longer has a
pipeline of function passes. Now it is a wrapper that creates sandbox IR
from functions before calling BottomUpVec.

BottomUpVec now builds its own RegionPassManager from the `sbvec-passes`
flag, using a PassRegistry.def file. For now, these region passes are
not run (BottomUpVec doesn't create Regions yet), and only a null pass
for testing exists.

This commit also changes the ownership model for sandboxir::PassManager:
instead of having a PassRegistry that owns passes, and PassManagers that
contain non-owning pointers to the passes, now PassManager owns (via
unique pointers) the passes it contains.

PassRegistry is now deleted, and the logic to parse and create a pass
pipeline is now in PassManager::setPassPipeline.
2024-10-09 10:37:05 -07:00
vporpo
ee0e17a4d8
[SandboxVec][DAG] Drop RAR and fix dependency scanning loop (#111715) 2024-10-09 10:29:48 -07:00
David Green
c136d3237a
[VectorCombine] Do not try to operate on OperandBundles. (#111635)
This bails out if we see an intrinsic with an operand bundle on it, to
make sure we don't process the bundles incorrectly.

Fixes #110382.
2024-10-09 16:20:03 +01:00
Florian Hahn
fa3258ecb8
[VPlan] Sink retrieving legacy costs to more specific computeCost impls. (#109708)
Make legacy cost retrieval independent of getInstructionForCost by
sinking it to more specific ::computeCost implementation (specifically
VPInterleaveRecipe::computeCost and VPSingleDefRecipe::computeCost).

Inline getInstructionForCost to VPRecipeBase::cost(), as it is now only
used to decide which recipes to skip during cost computation and when to
apply forced costs.

PR: https://github.com/llvm/llvm-project/pull/109708
2024-10-09 13:58:58 +01:00
Florian Hahn
01cbbc52dc
[VPlan] Request lane 0 for pointer arg in PtrAdd.
After 7f74651, the pointer operand may be replicated of a PtrAdd. Instead
of requesting a single scalar, request lane 0, which correctly handles the
case when there is a scalar-per-lane.

Fixes https://github.com/llvm/llvm-project/issues/111606.
2024-10-09 13:18:54 +01:00
Simon Pilgrim
00c1c589e0 DependencyGraph.cpp - mix MSVC "not all control paths return a value" warning. NFC. 2024-10-09 11:47:43 +01:00
David Sherwood
e080be5ac2
[NFC][LoopVectorize] Clean up some code around getting a context (#111114)
There are several places in LoopVectorize where we do more work
than necessary to obtain a LLVMContext. I've tried to make the
code more efficient.
2024-10-09 09:28:16 +01:00
Vasileios Porpodas
267e852109 [SandboxVec][DAG][NFC] Rename enumerators 2024-10-08 20:01:43 -07:00
vporpo
04a8bffdf7
[SandboxVec][DAG] Build actual dependencies (#111094)
This patch implements actual dependencies checking using BatchAA. This
adds memory dep edges between MemDGNodes.
2024-10-08 16:18:57 -07:00
Vasileios Porpodas
56d2c626f7 [SandboxVec][Interval] Add print() and dump() 2024-10-08 15:27:07 -07:00
Florian Hahn
6fbbe152fa
[VPlan] Introduce VPWidenIntrinsicRecipe to separate from libcall. (#110486)
This patch splits off intrinsic hanlding to a new
VPWidenIntrinsicRecipe. VPWidenIntrinsicRecipes only need access to the
intrinsic ID to widen and the scalar result type (in case the intrinsic
is overloaded on the result type). It does not need access to an
underlying IR call instruction or function.

This means VPWidenIntrinsicRecipe can be created easily without access
to underlying IR.
2024-10-08 22:37:20 +01:00
Alexey Bataev
9f3c55954e
[SLP]Fix loads sorting for loads from diffrent basic blocks
Patch fixes lookup for loads from different basic blocks. Originally,
the code checked is the main key (combined with parent basic block) was
created, but did not include the key into LoadsMap. When the code looked for
the load pointer in LoadsMap, it skipped check for parent basic block
and could mix loads from different basic blocks (but the same underlying
pointer). Currently, it does lead to any issues, since later the code
compares parent basic blocks and sorts loads properly. But it increases
compile time and affects compile time.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/111521
2024-10-08 16:44:16 -04:00
Alexey Bataev
a65a5feb1a
[SLP]Improve masked loads vectorization, attempting gathered loads
If the vector of loads can be vectorized as masked gather and there are
several other masked gather nodes, compiler can try to attempt to check,
if it possible to gather such nodes into big consecutive/strided loads
  node, which provide better performance.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/110151
2024-10-08 16:43:10 -04:00
Florian Hahn
36fc291b6e
[VPlan] Implement VPBlendRecipe::computeCost.
Implement VPBlendRecipe::computeCost. VPBlendRecipe is currently is also
used if only the first lane is used.

This also requires pre-computing costs for forced scalars and
instructions considered profitable to scalarize. For those, the cost
will be computed separately in the legacy cost model. This will also be
needed when implementing VPReplicateRecipe::computeCost.
2024-10-08 21:33:42 +01:00
Florian Hahn
3829fd75c8
[VPlan] Remove redundant getVPSingleValue for VPSingleDefRecipes (NFC). 2024-10-08 20:31:41 +01:00
Simon Pilgrim
d38addf099 Fix MSVC signed/unsigned mismatch warning 2024-10-08 17:36:35 +01:00
Alexey Bataev
45826513ef [SLP][NFC]Fix clang-tidy suggestions, cleanup, NFC. 2024-10-08 08:31:23 -07:00
Florian Hahn
3ec6f805c5
[VPlan] Don't created GEP x, 0 for interleave group pointers.
The GEP with offet 0 is redundant, remove it. This addresses a TODO
from 7f74651837b ((#106431).
2024-10-08 12:08:13 +01:00
Sterling-Augustine
3f5039323c
[SandboxVectorizer][NFC] Remove unused include (#111418) 2024-10-07 11:47:00 -07:00
Sterling-Augustine
93bfa7886b
[SandboxVectorizer] Define SeedBundle: a set of instructions to be vectorized [retry] (#111073)
[Retry 110696 with a proper rebase.]

Seed collection will assemble instructions to be vectorized into
SeedBundles. This data structure is not intended to be used directly,
but will be the basis for load bundles, store bundles, and so on.
2024-10-07 11:20:50 -07:00
David Sherwood
66b282014c
[LoopVectorize] Remove redundant code in emitSCEVChecks (#111132)
There was some code in emitSCEVChecks to update the dominator
tree if LoopBypassBlocks is empty, however there are no tests
that fail when replacing this code with an assert. I built
both SPEC2017 and the LLVM test suite and also didn't see any
build failures. I've removed the code for now and added an
assert to guard this in case anything changes, since it seems
pointless to have code that's impossible to defend.
2024-10-07 07:58:27 +01:00
Florian Hahn
7f74651837
[VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (#106431)
Update VPInterleaveRecipe to always use the pointer to member 0 as
pointer argument. This in many cases helps to remove unneeded index
adjustments and simplifies VPInterleaveRecipe::execute.

In some rare cases, the address of member 0 does not dominate the insert
position of the interleave group. In those cases a PtrAdd VPInstruction
is emitted to compute the address of member 0 based on the address of
the insert position. Alternatively we could hoist the recipe computing
the address of member 0.
2024-10-06 22:53:13 +01:00
Florian Hahn
45b526afa2
[LV] Honor uniform-after-vectorization in setVectorizedCallDecision.
The legacy cost model always computes the cost for uniforms as cost of
VF = 1, but VPWidenCallRecipes would be created, as
setVectorizedCallDecisions would not consider uniform calls.

Fix setVectorizedCallDecision to set to Scalarize, if the call is
uniform-after-vectorization.

This fixes a bug in VPlan construction uncovered by the VPlan-based
cost model.

Fixes https://github.com/llvm/llvm-project/issues/111040.
2024-10-06 10:35:06 +01:00
Florian Hahn
68210c7c26
[VPlan] Only generate first lane for VPPredInstPHI if no others used.
IF only the first lane of the result is used, only generate the first
lane.

Fixes https://github.com/llvm/llvm-project/issues/111042.
2024-10-05 19:15:05 +01:00
Alexey Bataev
7692d106b4 [SLP][NFC]Remove dead code + use nlogn lookups instead of n^2 2024-10-04 15:32:04 -07:00
Alexey Bataev
f74879cf0c
[SLP]Make PHICompare comparator follow weak strict ordering requirement
Reviewers: efriedma-quic

Reviewed By: efriedma-quic

Pull Request: https://github.com/llvm/llvm-project/pull/110529
2024-10-04 14:23:48 -04:00
Alexey Bataev
d991e05452 [SLP]Fix compiler crash on vectorizing gatehrd loads with different types
Need to check not only parents, but also types for compatible loads,
when trying to build the vectorizable sequences.

Fixes crash reported in https://github.com/llvm/llvm-project/pull/107461#issuecomment-2392980214
2024-10-04 08:36:57 -07:00
Han-Kuan Chen
f5815b9903
[SLP] NFC. Set NumOperands directly if VL[0] is IntrinsicInst. (#111103) 2024-10-04 19:38:33 +08:00
Vasileios Porpodas
45582ed240 [SandboxVec][DAG][NFC] Rename isMemDepCandidate() to isMemDepNodeCandidate() 2024-10-03 10:39:10 -07:00
Alexey Bataev
133c1224de [SLP]Fix a crash on accessing element with index -1 for reused mask with PoisonMaskElem
Need to check if the index from the ReuseShuffleIndices mask is not
equal to PoisonMaskElem before trying to access the element by index.
2024-10-03 08:24:05 -07:00
Han-Kuan Chen
5901463ada
[SLP] NFC. BaseIndex is not used for getSameOpcode. (#110948) 2024-10-03 19:58:44 +08:00
Alexey Bataev
c1b911c579 [SLP]Do correct signedness analysis for clustered nodes
Should get the signedness info from the original scalar instructions, if
possible, to correctly generate sext/zext instructions. Also, the
clustered node must be assigned a gather node user info to correctly
estimate its bitwidth/sign.
2024-10-02 12:56:49 -07:00