5102 Commits

Author SHA1 Message Date
vporpo
a461869db3
[SandboxIR][Pass] Implement Analyses class (#113962)
The Analyses class provides a way to pass around commonly used Analyses
to SandboxIR passes throught `runOnFunction()` and `runOnRegion()`
functions.
2024-10-28 18:00:52 -07:00
vporpo
bf4b31ad54
[SandboxVec][Legality] Check Fastmath flags (#113967) 2024-10-28 15:32:20 -07:00
vporpo
5ea694816b
[SandboxVec][Legality] Check opcodes and types (#113741) 2024-10-28 14:05:58 -07:00
Florian Hahn
0d0abb351b
[VPlan] Use ResumePhi to create reduction resume phis. (#110004)
Use VPInstruction::ResumePhi to create phi nodes for reduction resume
values in the scalar preheader, similar to how ResumePhis are used for
first-order recurrence resume values after 9a5a8731e77.

This allows simplifying createAndCollectMergePhiForReduction to only
collect reduction resume phis when vectorizing epilogue loops and adding
extra incoming edges from the main vector loop. Updating phis for the
epilogue vector loops requires special attention, because additional
incoming values from the bypass blocks need to be added.

PR: https://github.com/llvm/llvm-project/pull/110004
2024-10-28 20:14:08 +01:00
Ellis Hoag
6ab26eab4f
Check hasOptSize() in shouldOptimizeForSize() (#112626) 2024-10-28 09:45:03 -07:00
Alexey Bataev
7152bf3bc8 [SLP]Do not create new vector node if scalars fully overlap with the existing one
If the list of scalars vectorized as the part of the same vector node,
no need to generate vector node again, it will be handled as part of
overlapping matching.

Fixes #113810
2024-10-28 06:59:41 -07:00
Florian Hahn
7fe149cdf0
[VPlan] Replace getIRBasicBlock with IRBB in VPIRBB::execute (NFC).
Suggested in https://github.com/llvm/llvm-project/pull/109975. This
makes the function consistent throughout.
2024-10-27 16:22:18 +01:00
Shih-Po Hung
266ff98cba
[LV][VPlan] Use VF VPValue in VPVectorPointerRecipe (#110974)
Refactors VPVectorPointerRecipe to use the VF VPValue to obtain the
runtime VF, similar to #95305.

Since only reverse vector pointers require the runtime VF, the patch
sets VPUnrollPart::PartOpIndex to 1 for vector pointers and 2 for
reverse vector pointers. As a result, the generation of reverse vector
pointers is moved into a separate recipe.
2024-10-26 23:18:50 +08:00
Vasileios Porpodas
1540f772c7 Reapply "[SandboxVec][Legality] Reject non-instructions (#113190)"
This reverts commit eb9f4756bc3daaa4b19f4f46521dc05180814de4.
2024-10-25 12:55:58 -07:00
Vasileios Porpodas
eb9f4756bc Revert "[SandboxVec][Legality] Reject non-instructions (#113190)"
This reverts commit 6c9bbbc818ae8a0d2849dbc1ebd84a220cc27d20.
2024-10-25 12:52:31 -07:00
vporpo
6c9bbbc818
[SandboxVec][Legality] Reject non-instructions (#113190) 2024-10-25 12:47:19 -07:00
Florian Hahn
e724226da7
[VPlan] Return cost of 0 for VPWidenCastRecipe without underlying value.
In some cases, VPWidenCastRecipes are created but not considered in the
legacy cost model, including truncates/extends when evaluating a reduction
in a smaller type. Return 0 for such casts for now, to avoid divergences
between VPlan and legacy cost models.

Fixes https://github.com/llvm/llvm-project/issues/113526.
2024-10-25 21:25:44 +02:00
Florian Hahn
9648271a3c
[LV] Pass flag indicating epilogue is vectorized to executePlan (NFC)
This clarifies the flag, which is now only passed if the epilogue loop
is being vectorized.
2024-10-25 20:39:47 +02:00
Florian Hahn
7b9f988a53
[VPlan] Limit stride replacement to vector region and middle VPBB (NFC).
At the moment this in NFC, but ensures we only replace uses that are
dominated by runtime checks as we model more of the skeleton in VPlan.
2024-10-24 15:08:36 -07:00
Alexey Bataev
e914421d7f [SLP]Do correct signedness analysis for externally used scalars
If the scalars is used externally is in the root node, it may have
incorrect signedness info because of the conflict with the demanded bits
analysis. Need to perform exact signedness analysis and compute it
rather than rely on the precomputed value, which might be incorrect for
alternate zext/sext nodes.

Fixes #113520
2024-10-24 08:59:24 -07:00
Alexey Bataev
d2e7ee77d3 [SLP]Do not check for clustered loads only
Since SLP support "clusterization" of the non-load instructions, the
restriction for reduced values for loads only should be removed to avoid
compiler crash.

Fixes #113516
2024-10-24 08:16:42 -07:00
Alexey Bataev
cb5046da26 [SLP]Do not ignore undefs when trying to replace with "poisonous" shuffles
Need to consider undefs correctly, when trying to replace them with
potentially poisonous values in shuffles. Such elements should not be
silently replaced by poison values, instead complex analysis should be
implemented to see if it is safe to do it.

Fixes #113425
2024-10-24 07:47:23 -07:00
Florian Hahn
ef217a0f6b
[VPlan] Introduce and use getVectorPreheader (NFC).
Introduce a dedicated function to retrieve the vector preheader. This
ensures the correct block is used, even if the skeleton is exetended.
2024-10-23 21:01:52 -07:00
Florian Hahn
2dfb1c664c
[VPlan] Try to hoist Previous (and operands), if sinking fails for FORs. (#108945)
In some cases, Previous (and its operands) can be hoisted. This allows
supporting additional cases where sinking of all users of to FOR fails,
e.g. due having to sink recipes with side-effects.

This fixes a crash where we fail to create a scalar VPlan for a
first-order recurrence, but can create a vector VPlan, because the trunc
instruction of an IV which generates the previous value of the
recurrence has been optimized to a truncated induction recipe, thus
hoisting it to the beginning.

Fixes https://github.com/llvm/llvm-project/issues/106523.

PR: https://github.com/llvm/llvm-project/pull/108945
2024-10-23 13:12:03 -07:00
Alexey Bataev
b65b2b4ab6 [SLP]Expand vector to the whole register size in extracts adjustment
Need to expand the number of elements to the whole register to correctly
process estimation and avoid compiler crash.

Fixes #113462
2024-10-23 12:04:40 -07:00
Alexey Bataev
a3508e0246 [SLP]Small buidlvector only graph should contains scalars from same block
If the graph is small and has single buildvector node, all scalars
instructions must be from the same basic block to prevent compiler
crash.

Fixes #113451
2024-10-23 10:46:38 -07:00
Sterling-Augustine
f1be516223
[SandboxVectorizer] New class to actually collect and manage seeds (#113386)
This relands d91318b643188bb855f115b02af9532f87c787b7, with test-only
changes to make gcc-10 happy.
2024-10-23 10:26:18 -07:00
Florian Hahn
2437784a17
[LV] Replace unreachable by folding into else with assert (NFC).
Simplify code as suggested post-commit in
https://github.com/llvm/llvm-project/pull/110576/.
2024-10-21 21:46:48 -07:00
Kazu Hirata
38fca7b7db
[Vectorize] Simplify code with DenseMap::operator[] (NFC) (#113246) 2024-10-21 21:35:38 -07:00
Elvis Wang
b3edc764f7
[VPlan] Implement VPWidenCastRecipe::computeCost(). (NFCI) (#111339)
This patch implement `VPWidenCastRecipe::computeCost()` and skip cast
recipies in the in-loop reduction.
2024-10-22 12:23:49 +08:00
Florian Hahn
a4819bd46d
[VPlan] Restore case accidentially dropped in 34cdd67c85.
34cdd67c85 accidentially dropped the case for VPWidenIntrinsicSC. Add it
back again. Thanks @Mel-Chen for spotting this.
2024-10-21 20:33:58 -07:00
Florian Hahn
1d9b3222f3
[VPlan] Implement VPWidenSelectRecipe::computeCost.
Implement VPlan-based cost computation for VPWidenSelectRecipe.
2024-10-22 03:10:04 +01:00
Florian Hahn
dac0f7e83e
[VPlan] Add general recipe matcher, replace handwritten ones (NFC)
The new matcher is more flexible and can be used to build matchers for
additional recipe types without unnecessary duplication.
2024-10-21 16:46:45 -07:00
Sterling-Augustine
0de8de1b84
[SandboxVectorizer] revert New class to actually collect and manage s… (#113231)
…eeds (#112979)

This reverts commit d91318b643188bb855f115b02af9532f87c787b7.
2024-10-21 16:00:35 -07:00
Sterling-Augustine
d91318b643
[SandboxVectorizer] New class to actually collect and manage seeds (#112979)
There are many more tests to add, but I would like to get this reviewed
and the details sorted out before it grows too big.
2024-10-21 15:39:51 -07:00
Alexey Bataev
4b1b51ac52 [SLP]Initial non-power-of-2 support (but still whole register) for reductions
Enables initial non-power-of-2 support (but still requires number of
elements, forming whole registers) for reductions.
Enables extra vectorization for
MultiSource/Benchmarks/7zip/7zip-benchmark, CINT2006/464.h264ref and
CFP2017rate/526.blender_r (checked for SSE2)

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/112361
2024-10-21 12:25:39 -07:00
vporpo
54c93aabec
[SandboxVec][Legality] Scaffolding for Legality (#112623)
This patch adds a LegalityResultWithReason class for describing the
reason why legality decided not to vectorize the code.
2024-10-21 09:17:46 -07:00
Alexey Bataev
9e03920cbf [SLP]Ignore root gather node, when searching for reuses
Root gather/buildvector node should be ignored when SLP vectorizer tries
to find matching gather nodes, vectorized earlier. This node is
definitely the last one in the pipeline and it does not have users. It
may cause the compiler crash

Fixes #113143
2024-10-21 09:16:16 -07:00
David Green
17ac10c28f Revert "[SLP]Initial non-power-of-2 support (but still whole register) for reductions"
This reverts commit 7f2e937469a8cec3fe977bf41ad2dfb9b4ce648a as it causes
regressions in the tests it modifies, and undoes what was added in #100653
(which itself was a fix for a previous regression).
2024-10-21 13:37:44 +01:00
Florian Hahn
173907b5d7
[LV] Move logic to check if op is invariant to legacy cost model. (NFC)
This allows the function to be re-used in other places
2024-10-20 17:26:15 -07:00
Florian Hahn
cba5c77a71
[VPlan] Mark unreachable code path when retrieving the scalar PH. (NFCI) 2024-10-19 19:14:21 -07:00
Florian Hahn
2deb3a26fa
[LV] Fixup IV users only once during epilogue vectorization. (NFC)
Induction users only need to be updated when vectorizing the epilogue.
Avoid running fixupIVUsers when vectorizing the main loop during
epilogue vectorization.
2024-10-19 18:11:06 -07:00
Florian Hahn
2a6b09e0d3
[LV] Use type from InsertPos for cost computation of interleave groups.
Previously the legacy cost model would pick the type for the cost
computation depending on the order of the members in the input IR.
This is incompatible with the VPlan-based cost model (independent of
original IR order) and also doesn't match code-gen, which uses the type
of the insert position.

Update the legacy cost model to use the type (and address space) from
the Group's insert position.

This brings the legacy cost model in line with the legacy cost model and
fixes a divergence between both models.

Note that the X86 cost model seems to assign different costs to groups
with i64 and double types. Added a TODO to check.

Fixes https://github.com/llvm/llvm-project/issues/112922.
2024-10-18 19:12:40 -07:00
vporpo
1d09925b4a
[SandboxVec][Scheduler] Boilerplate and initial implementation. (#112449)
This patch implements a ready-list-based scheduler that operates on
DependencyGraph.
It is used by the sandbox vectorizer to test the legality of vectorizing
a group of instrs.

SchedBundle is a helper container, containing all DGNodes that
correspond to the instructions that we are attempting to schedule with
trySchedule(Instrs).
2024-10-18 16:18:43 -07:00
Florian Hahn
c7496cebac
[LV] Use SCEV to check if minimum iteration check is known. (#111310)
Use SCEV to check if the minimum iteration check (TC < Step) is known to
be false.

This is a first step towards addressing
https://github.com/llvm/llvm-project/issues/111098. To catch the exact
case from the issue, we need to do extra work to make sure the wrap
flags on the shl are preserved and used by SCEV.

Note that skeleton creation will be gradually moved to VPlan and this
simplification should be done as VPlan transform eventually. The current
plan is to move skeleton creation to VPlan starting from parts closest
to the parts already created by VPlan, starting with induction resume
value creation (started with
https://github.com/llvm/llvm-project/pull/110577), then memory and SCEV
checks and finally minimum iteration checks.

PR: https://github.com/llvm/llvm-project/pull/111310
2024-10-18 15:22:59 -07:00
Alexey Bataev
709abacdc3 [SLP]Check that operand of abs does not overflow before making it part of minbitwidth transformation
Need to check that the operand of the abs intrinsic can be safely
truncated before making it part of the minbitwidth transformation.

Fixes #112577
2024-10-18 13:56:19 -07:00
Alexey Bataev
e56e9dd8ad [SLP]Fix minbitwidth emission and analysis for freeze instruction
Need to add minbw emission and analysis for freeze instruction to fix
incorrect signedness propagation.

Fixes #112460
2024-10-18 13:36:37 -07:00
Alexey Bataev
f148d5791b
[LV]Initial support for safe distance in predicated DataWithEVL vectorization mode.
Enabled initial support for max safe distance in DataWithEVL mode. If
max safe distance is required, need to emit special code:
CMP = icmp ult AVL, MAX_SAFE_DISTANCE
SAFE_AVL = select CMP, AVL, MAX_SAFE_DISTANCE
EVL = call i32 @llvm.experimental.get.vector.length(i64 SAFE_AVL)

while vectorize the loop in DataWithEVL tail folding mode.

Reviewers: fhahn

Reviewed By: fhahn

Pull Request: https://github.com/llvm/llvm-project/pull/102897
2024-10-18 15:51:49 -04:00
Alexey Bataev
7f2e937469 [SLP]Initial non-power-of-2 support (but still whole register) for reductions
Enables initial non-power-of-2 support (but still requires number of
elements, forming whole registers) for reductions.
Enables extra vectorization for
MultiSource/Benchmarks/7zip/7zip-benchmark, CINT2006/464.h264ref and
CFP2017rate/526.blender_r (checked for SSE2)

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/112361
2024-10-18 12:50:11 -07:00
Florian Hahn
b497010854
[VPlan] Use VPInstruction::Name when assigning names (NFCI).
This slightly improves the printing of VPInstructions. NFC except debug
output.
2024-10-18 05:52:35 +01:00
Florian Hahn
81bbe19383
[VPlan] Add VPSingleDefRecipe::dump() to resolve ambigous lookup (NFC).
This allows calling ::dump() on various sub-classes of VPSingleDefRecipe
directly, as it resolves an ambigous name lookup.

Previously, calling VPWidenRecipe::dump() (and others), would result in

the following errors:
llvm/unittests/Transforms/Vectorize/VPlanTest.cpp:1284:19: error: member 'dump' found in multiple base classes of different types
 1284 |           WidenR->dump();
      |                   ^
llvm/include/../lib/Transforms/Vectorize/VPlanValue.h:434:8: note: member found by ambiguous name lookup
  434 |   void dump() const;
      |        ^
llvm/include/../lib/Transforms/Vectorize/VPlanValue.h:108:8: note: member found by ambiguous name lookup
  108 |   void dump() const;
      |        ^
1 error generated.
2024-10-17 05:31:29 +01:00
Florian Hahn
1b4a173fa4
[LV] Remove unneeded LoopScalarBody member variable. (NFC) 2024-10-17 04:15:47 +01:00
Jim Lin
5e9166e02a [SLP] Remove TTI parameter from vectorizeHorReduction and vectorizeRootInstruction. NFC.
Since TTI is a member variable.
2024-10-17 09:35:22 +08:00
Sterling-Augustine
ec24e23d84
[SandboxVectorizer][NFC] Make SeedContainer dump follow preferred approach (#112634) 2024-10-16 17:06:23 -07:00
Piotr Fusik
e55869ae8a
[LV][NFC] Fix typos (#111971) 2024-10-16 08:30:22 +02:00