23 Commits

Author SHA1 Message Date
Florian Hahn
a5891fa4d2
[VPlan] Initial modeling of VF * UF as VPValue. (#74761)
This patch starts initial modeling of VF * UF in VPlan.
Initially, introduce a dedicated VFxUF VPValue, which is then
populated during VPlan::prepareToExecute. Initially, the VF * UF
applies only to the main vector loop region. Once we extend the
scope of VPlan in the future, we may want to associate different VFxUFs
with different vector loop regions (e.g. the epilogue vector loop)

This allows explicitly parameterizing recipes that rely on the
VF * UF, like the canonical induction increment. At the moment, this
mainly helps to avoid generating some duplicated calls to vscale with
scalable vectors. It should also allow using EVL as induction increments
explicitly in D99750. Referring to VF * UF is also needed in other
places that we plan to migrate to VPlan, like the minimum trip count
check during skeleton creation.

The first version creates the value for VF * UF directly in
prepareToExecute to limit the scope of the patch. A follow-on patch will
model VF * UF computation explicitly in VPlan using recipes.

Moved from Phabricator (https://reviews.llvm.org/D157322)
2023-12-08 18:30:30 +00:00
Florian Hahn
38f8b7cbe4
[LV] Replace value numbers with patterns in tests (NFC).
Replace some hardcoded value numbers in CHECK-LINES to use patterns, to
 make the tests more robust wrt renumbering.
2023-10-16 19:53:44 +01:00
Sergey Kachkov
0a5d52a757
[RISCV][CostModel] Add getCFInstrCost RISC-V implementation (#65599)
This patch implements getCFInstrCost TTI hook that mostly affects
LoopVectorizer decisions. It sets zero cost for PHI nodes and zero
throughput cost for branches (assuming that branches are likely to
be predicted). The implementation is similar to X86/AArch64/PowerPC
targets and reduces loop cost by excluding induction PHIs/loop latch
branches, which in turn leads to selecting smaller vectorization
factor.
2023-09-25 12:26:01 +03:00
Florian Hahn
af635a5547
[VPlan] Model wrap flags directly, remove *NUW opcodes (NFC)
Model wrap flags directly using VPRecipeWithIRFlags and clean up the
duplicated *NUW opcodes.

D157144 will build on this and also model FMFs for VPInstruction.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D157194
2023-08-08 12:12:30 +01:00
Florian Hahn
93c5bae00e
[VPlan] Use printOperands for VPInstruction.
Use the printOperands for printing VPInstruction's operands to be more
in line with other recipes and ensure consistent printing after D15719.

Also removes some stray spaces in print output.
2023-08-08 11:31:21 +01:00
Philip Reames
7cc6b80d9a [RISCV][CostModel] Model vrgather.vv as being quadradic in LMUL
vrgather.vv across multiple vector registers (i.e. LMUL > 1) requires all to all data movement. This includes two conceptual sets of changes:

    For permutes, we were modeling these as being linear in LMUL.
    For reverse, we were modeling them as being fixed cost in LMUL.

Both were wrong, and have been adjusted to O(LMUL^2).  Noticed via code inspection while looking at something else.

Its worth asking whether we should be lowering reverse to something other than a vrgather at high LMULs. That shuffle is quite expensive.  (Future work)

Differential Revision: https://reviews.llvm.org/D152019
2023-07-18 11:52:34 -07:00
Florian Hahn
299f0ff60e
[VPlan] Print IR flags for VPRecipeWithIRFlags.
Now that IR flags are modeled as part of VPRecipeWithIRFlags, include
the flags when printing recipes.

Depends on D150027.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D150029
2023-05-23 20:36:16 +01:00
Florian Hahn
b0e118bd77
[LV] Update tests checking VPlans to use patterns for VPValues.
This makes the tests more robust to changes in value numbering for
VPValues.
2023-04-09 20:32:09 +01:00
Philip Reames
37646a2c28 [RISCV] Account for LMUL in memory op costs
Generally, the cost of a memory op will scale with the number of vector registers accessed. Machines might exist which have a narrow memory access than vector register width, but machines with a wider memory access width than vector register width seem unlikely.

I noticed this because we were preferring wide loads + deinterleaves on examples where the cost of a short gather (actually a strided load) would be better. Touching 8 vector registers instead of doing a 4 element gather is not a good tradeoff.

Differential Revision: https://reviews.llvm.org/D147470
2023-04-05 07:58:56 -07:00
sgokhale
4f9a5447c6 [LV] Reland "Update logic for calculating register usage due to invariants"
Previously, while calculating register usage due to invariants, it was assumed that invariant would always be part of widening
instructions. This resulted in calculating vector register types for vectors which cant be legalized(check the newly added test for more details).

An invariant might not always need a vector register. For e.g., invariant might just be used for iteration check.

This patch checks if the invariant is part of any widening instruction and considers register usage accordingly. Fixes issue 60493

Differential Revision: https://reviews.llvm.org/D143422
2023-02-28 17:32:39 +05:30
sgokhale
3c8ddbde37 Revert "[LV] Update logic for calculating register usage due to invariants"
Observing test failure for llvm/test/Transforms/LoopVectorize/AArch64/reg-usage.ll

This reverts commit d1628266946fdddb44bdad2b3ccf3cd5fc769f42.
2023-02-28 15:46:59 +05:30
sgokhale
d162826694 [LV] Update logic for calculating register usage due to invariants
Previously, while calculating register usage due to invariants, it was assumed that invariant would always be part of widening
instructions. This resulted in calculating vector register types for vectors which cant be legalized(check the newly added test for more details).

An invariant might not always need a vector register. For e.g., invariant might just be used for iteration check.

This patch checks if the invariant is part of any widening instruction and considers register usage accordingly. Fixes issue 60493

Differential Revision: https://reviews.llvm.org/D143422
2023-02-28 11:05:26 +05:30
Luke Lau
15f9cf164c [LV][RISCV] Don't interleave scalable vector loops
It's less clear with scalable vectors than fixed length vectors that
interleaving exposes more ILP, as scalable vectors can be thought of a
sort of hardware form of interleaving, especially with larger LMULs.
This also addresses the unexpected additional unrolling that occurs when
using larger LMULs in the loop vectorizer.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144485
2023-02-22 10:15:11 +00:00
Roman Lebedev
be51fa4580
[NFC] Port all runlines for LoopVectorize pass tests to -passes syntax 2022-12-05 22:17:30 +03:00
Philip Reames
73eacf94e0 [RISCV] Incorporate LMUL into costs for arithmetic and shuffles
This reuses the routine implemented in 0e6f0b7 to implement several existing TODOs. Many of the operations scale linearly with LMUL; this change represents that in the cost model.

Differential Revision: https://reviews.llvm.org/D139039
2022-12-01 10:46:27 -08:00
Florian Hahn
0c5df7cd2f
Recommit "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe."
This reverts commit bf15f1e489aa2f1ac13268c9081a992a8963eb5b.

The updated version fixes a crash by checking the induction kind instead
of the opcode; for integer inductions, the step is always added, but the
opcode might not be set.
2022-11-30 17:04:20 +00:00
Florian Hahn
bf15f1e489
Revert "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe."
This reverts commit 0fa666ecedc3f36471c0fee925d664512e7525a8.

This triggers an assertion during AArch64 stage2 builds. Revert while I
investigate.

See https://lab.llvm.org/buildbot/#/builders/179/builds/4967/steps/11/logs/stdio
2022-11-28 22:43:11 +00:00
Philip Reames
db07d79ab0 [RISCV] Add cost model for integer and float vector arithmetic instructions.
This patch implements getArithmeticInstrCost for RISCV, supports cost
model for integer and float vector arithmetic instructions.

Differential Revision: https://reviews.llvm.org/D133552 (Original patch by jacquesguan.  Subset by me with todos added.)
2022-11-28 09:04:38 -08:00
Florian Hahn
0fa666eced
[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe.
This patch splits off the logic to transform the canonical IV to a
a value for an induction with a different start and step. This
transformation only needs to be done once (independent of VF/UF) and
enables sinking of VPScalarIVStepsRecipe as follow-up.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D133758
2022-11-28 16:32:31 +00:00
Florian Hahn
12f9c7b270
[LV] Update RISCV test missed by bc19b7c3cc16. 2022-07-07 08:51:15 -07:00
Philip Reames
0e29a80fdc [RISCV] Add cost model for reverse shuffle
The majority of the cost appears to be forming the indices vector.

Differential Revision: https://reviews.llvm.org/D127141
2022-06-09 07:21:40 -07:00
Philip Reames
6071de3db6 [RISCV] Autogen a test for ease of update 2022-06-06 12:44:34 -07:00
Liqin.Weng
a84026821b [RISCV] Add test for experimental.vector.reverse
```
void vector_reverse_i64(int *A, int *B, int n) {
  #pragma clang loop vectorize_width(4, scalable)
  for (int i = n-1; i >= 0; i--)
    A[i] = B[i] + 1;
}
```
When option: scalable-vectorization is on (or set #pragma clang loop vectorize_width(elements, scalable)), Reverse Iterators can't loop vectorization as <vscale x elements x elementType>

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125866
2022-05-27 06:30:07 +00:00