56 Commits

Author SHA1 Message Date
Florian Hahn
4eb9838409
[VPlan] Generalize VPValue::isDefinedOutsideLoopRegions.
Update isDefinedOutsideLoopRegions to check if a recipe is defined
outside any region. Split off already approved
https://github.com/llvm/llvm-project/pull/95842 now that this can be
tested separately after landing VPlan-based LICM
https://github.com/llvm/llvm-project/issues/107501
2024-09-20 15:34:00 +01:00
Florian Hahn
a861ed411a
[VPlan] Add initial loop-invariant code motion transform. (#107894)
Add initial transform to move out loop-invariant recipes.

This also helps to fix a divergence between legacy and VPlan-based cost
model due to legacy using ScalarEvolution::isLoopInvariant in some
cases.

Fixes https://github.com/llvm/llvm-project/issues/107501.

PR: https://github.com/llvm/llvm-project/pull/107894
2024-09-20 11:22:03 +01:00
Florian Hahn
ea83e1c05a
[LV] Assign cost to all interleave members when not interleaving.
At the moment, the full cost of all interleave group members is assigned
to the instruction at the group's insert position, even if the decision
was to not form an interleave group.

This can lead to inaccurate cost estimates, e.g. if the instruction at
the insert position is dead. If the decision is to not vectorize but
scalarize or scather/gather, then the cost will be to total cost for all
members. In those cases, assign individual the cost per member, to more
closely reflect to choice per instruction.

This fixes a divergence between legacy and VPlan-based cost model.

Fixes https://github.com/llvm/llvm-project/issues/108098.
2024-09-11 21:04:34 +01:00
Florian Hahn
bb60dd391f
[VPlan] Only use force-target-instruction-cost for recipes with insts.
To match the behavior of the legacy cost model, only apply
-force-target-instruction-cost to recipes with underlying instructions
for now, as only original IR instructions are considered by the legacy
cost model.

This fixes a difference between legacy and VPlan based cost model,
triggering the verification assertion, reported by @JonPsson1.
2024-07-23 21:05:10 +01:00
Florian Hahn
c8c0b18b5d
[LV] Update tests to not have dead interleave groups.
Update existing tests with dead interleave groups by adding users. This
ensures the tests keep testing what they were intended to test with a
planned change to skip unused instructions in cost computations.
2024-07-21 14:03:40 +01:00
Florian Hahn
b8741cc185
[VPlan] Relax assertion retrieving a scalar from VPTransformState::get.
The current assertion VPTransformState::get when retrieving a single
scalar only does not account for cases where a def has multiple users,
some demanding all scalar lanes, some demanding only a single scalar.

For an example, see the modified test case. Relax the assertion by also
allowing requesting scalar lanes only when the Def doesn't have only its
first lane used.

Fixes https://github.com/llvm/llvm-project/issues/88849.
2024-07-19 11:33:57 +01:00
Florian Hahn
17f98baf70
[LV] Add test with users both demanding all lanes and first-lane-only.
Add a test case where scalar steps are used by both a VPReplicateRecipe
(demands all scalar lanes) and a VPInstruction that only demands the
first lane.

Test case for https://github.com/llvm/llvm-project/issues/88849.
2024-07-19 10:29:43 +01:00
Florian Hahn
9a5a8731e7
[VPlan] Introduce ResumePhi VPInstruction, use to create phi for FOR. (#94760)
This patch introduces a new ResumePhi VPInstruction which creates a phi
in a leaf block of a VPlan. The first use is to create the phi node for
fixed-order recurrence resume values in the scalar preheader.

The VPInstruction takes 2 operands: 1) the incoming value from the
middle-block and a default value to be used for all other incoming
blocks.

In follow-up changes, it will also be used to create phis for reduction
and induction resume values.

Depends on https://github.com/llvm/llvm-project/pull/92651

PR: https://github.com/llvm/llvm-project/pull/94760
2024-07-11 16:08:04 +01:00
Florian Hahn
29b8b72117
[LV] Move check if any vector insts will be generated to VPlan. (#96622)
This patch moves the check if any vector instructions will be generated
from getInstructionCost to be based on VPlan. This simplifies
getInstructionCost, is more accurate as we check the final result and
also allows us to exit early once we visit a recipe that generates
vector instructions.

The helper can then be re-used by the VPlan-based cost model to match
the legacy selectVectorizationFactor behavior, this fixing a crash and
paving the way to recommit
https://github.com/llvm/llvm-project/pull/92555.

PR: https://github.com/llvm/llvm-project/pull/96622
2024-07-07 20:08:01 +01:00
Florian Hahn
959ff45bda
[LV] Regenerate test checks for zero_unroll.ll (NFC).
Regenerate test checks to better show impact of
https://github.com/llvm/llvm-project/pull/96622.
2024-07-05 11:37:13 +01:00
Florian Hahn
99d6c6d936
[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651)
This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. This is done using the recently
introduced VPIRBasicBlock.

Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.

After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.

This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the use of
VPIRBasicBlock.

Depends on https://github.com/llvm/llvm-project/pull/92525

PR: https://github.com/llvm/llvm-project/pull/92651
2024-07-05 10:08:42 +01:00
David Green
352a836176
[InstCombine] Canonicalize non-i8 gep of mul to i8 (#96606)
This is a small canonicalization for `gep i32, p, (mul x, C)` -> `gep
i8, p, (mul x, C*4)`, so that the mul can combine both of the constant
multiplications, and we take a small step towards canonicalizing more
geps to i8.

It currently doesn't attempt to check for multiple uses on the mul, but
that should be possible if it sounds better. Let me know what you think
of the idea in general.
2024-06-26 14:25:54 +01:00
Ramkumar Ramachandra
bb0d29a72d
[LV] fix logical error in trunc cost (#91136)
In LoopVectorizationCostModel::getInstructionCost(), when the condition
canTruncateToMinimalBitwidth() is satisfied, for a trunc, the source
type is computed as the smallest type of the source vector and the
destination vector, and the destination type is computed as the largest
type of the instruction and destination type. This is clearly a logical
error, as the original source vector type could be smaller than the
original destination vector type, and the trunc semantics are broken
because we're attempting to widen.

Fixes #47665.
2024-05-24 18:01:58 +01:00
Ramkumar Ramachandra
dc148c9fb8
[LV] add test for #47665, #88802 (#91135) 2024-05-24 10:50:43 +01:00
Nilanjana Basu
c1c5b854ad
[LV] Remove loop trip count threshold for deciding whether to interleave a loop (#67725)
A set of microbenchmarks (https://github.com/llvm/llvm-test-suite/pull/26) showed that loop interleaving can be beneficial for loops with low trip count as well. Loop interleaving count computation is updated accordingly in prior patches while this patch removes the loop trip count threshold for interleaving.
2024-02-05 17:23:58 -08:00
Jonas Paulsson
62b7e35f10
[SystemZ] Don't assert for i128 vectors in getInterleavedMemoryOpCost() (#78009)
This assert does not seem justified given that the LoopVectorizer can
form interleave groups containing i128 elements where the number of
elements per vector is indeed just one.
2024-01-15 17:31:18 +01:00
Craig Topper
7ec4f6094e
[InstCombine] Infer disjoint flag on Or instructions. (#72912)
The disjoint flag was recently added to IR in #72583

We already set it when we turn an add into an or. This patch sets it on Ors that weren't converted from an Add.
2023-12-02 14:11:12 -08:00
Craig Topper
03d4a9d94d
[InstCombine] Set disjoint flag when turning Add into Or. (#72702)
The disjoint flag was recently added to IR in #72583
2023-11-27 12:54:11 -08:00
Nikita Popov
2b7c347c7f [LoopVectorize] Convert test to opaque pointers (NFC)
I'm keeping the bitcast in the input here, because without it
we end up introducing a stride 1 assumption and end up testing
a different case.
2023-06-12 14:49:45 +02:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
Florian Hahn
35af27c30a
[VPlan] Only create extracts for recurrence exits if there are live-outs.
Move the code to collect live-out earlier and only generate extracts for
exit values if there are any live-outs that use them.

Depends on D147472.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147567
2023-04-10 21:08:34 +01:00
Nikita Popov
9ed2f14c87 [AsmParser] Remove typed pointer auto-detection
IR is now always parsed in opaque pointer mode, unless
-opaque-pointers=0 is explicitly given. There is no automatic
detection of typed pointers anymore.

The -opaque-pointers=0 option is added to any remaining IR tests
that haven't been migrated yet.

Differential Revision: https://reviews.llvm.org/D141912
2023-01-18 09:58:32 +01:00
Nikita Popov
2fab927546 [LoopVectorize] Convert some tests to opaque pointers (NFC)
Check lines for some of these tests were regenerated. The difference
is that with opaque pointers SCEVExpander always emits i8 GEPs,
making the address calculation explicit. This is a known problem
that will be solved long term by making all address calculations
explicit.
2023-01-04 17:25:42 +01:00
Nikita Popov
7d7577256b [LoopVectorize] Convert some tests to opaque pointers (NFC) 2022-12-14 15:16:59 +01:00
Bjorn Pettersson
3528e63d89 [test] Remove duplicate RUN lines in Transform tests 2022-12-08 11:47:16 +01:00
Roman Lebedev
be51fa4580
[NFC] Port all runlines for LoopVectorize pass tests to -passes syntax 2022-12-05 22:17:30 +03:00
Nuno Lopes
a30e77b6f6 fix tests for commit 9df0b254d24eca098 2022-07-23 22:32:30 +01:00
Dávid Bolvanský
872f7000fc Revert "[NFCI] Regenerate SROA/LoopVectorize test checks"
This reverts commit 14e3450fb57305aa9ff3e9e60687b458e43835c9.
2022-04-04 01:15:30 +02:00
Dávid Bolvanský
a113a582b1 [NFCI] Regenerate LoopVectorize test checks 2022-04-03 21:56:24 +02:00
Roman Lebedev
2f80ea7f4f
[NFC][LV] Use different braces in debug output
The analysis passes output function name encapsulated in `'` braces,
but LV uses `"`. Harmonizing this may help in creating an update script
for the LV costmodel test checks.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D121105
2022-03-07 19:32:37 +03:00
Philip Reames
e6ad9ef4e7 [instcombine] Canonicalize constant index type to i64 for extractelement/insertelement
The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order.

I'm restricting this to constants as for non-constants, we'd have to decide whether the simplicity was worth extra instructions. For constants, there are no extra instructions.

We chose the canonical type as i64 arbitrarily.  We might consider changing this to something else in the future if we have cause.

Differential Revision: https://reviews.llvm.org/D115387
2021-12-13 16:56:22 -08:00
Philip Reames
1a18de3d0a Autogen a bunch of instcombine and vectorizer tests
Done in advance of D115387.  These are all the ones which my local script could handle, there's a couple more which need manual updates.
2021-12-13 10:41:38 -08:00
Kerry McLaughlin
6f16ee5e14 Revert "[LoopVectorize] Extract the last lane from a uniform store"
This reverts commit 0d748b4d32cbddf58a1ff83f3ff178ec1ad49edc.
This is causing some failures when building Spec2017 with scalable
vectors. Reverting to investigate.
2021-11-10 11:21:19 +00:00
Kerry McLaughlin
0d748b4d32 [LoopVectorize] Extract the last lane from a uniform store
Changes VPReplicateRecipe to extract the last lane from an unconditional,
uniform store instruction. collectLoopUniforms will also add stores to
the list of uniform instructions where Legal->isUniformMemOp is true.

setCostBasedWideningDecision now sets the widening decision for
all uniform memory ops to Scalarize, where previously GatherScatter
may have been chosen for scalable stores.

This fixes an assert ("Cannot yet scalarize uniform stores") in
setCostBasedWideningDecision when we have a loop containing a
uniform i1 store and a scalable VF, which we cannot create a scatter for.

Reviewed By: sdesmalen, david-arm, fhahn

Differential Revision: https://reviews.llvm.org/D112725
2021-11-09 14:43:16 +00:00
Juneyoung Lee
4a8e6ed2f7 [SLP,LV] Use poison constant vector for shufflevector/initial insertelement
This patch makes SLP and LV emit operations with initial vectors set to poison constant instead of undef.
This is a part of efforts for using poison vector instead of undef to represent "doesn't care" vector.
The goal is to make nice shufflevector optimizations valid that is currently incorrect due to the tricky interaction between undef and poison (see https://bugs.llvm.org/show_bug.cgi?id=44185 ).

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D94061
2021-01-06 11:22:50 +09:00
Florian Hahn
0ea3749b3c
[LV] Set up branch from middle block earlier.
Previously the branch from the middle block to the scalar preheader & exit
was being set-up at the end of skeleton creation in completeLoopSkeleton.
Inserting SCEV or runtime checks may result in LCSSA phis being created,
if they are required. Adjusting branches afterwards may break those
PHIs.

To avoid this, we can instead create the branch from the middle block
to the exit after we created the middle block, so we have the final CFG
before potentially adjusting/creating PHIs.

This fixes a crash for the included test case. For the non-crashing
case, this is almost a NFC with respect to the generated code. The
only change is the order of the predecessors of the involved branch
targets.

Note an assertion was moved from LoopVersioning() to
LoopVersioning::versionLoop. Adjusting the branches means loop-simplify
form may be broken before constructing LoopVersioning. But LV only uses
LoopVersioning to annotate the loop instructions with !noalias metadata,
which does not require loop-simplify form.

This is a fix for an existing issue uncovered by D93317.
2020-12-27 18:21:12 +00:00
Evgeniy Brevnov
d0c95808e5 [LV] Unroll factor is expected to be > 0
LV fails with assertion checking that UF > 0. We already set UF to 1 if it is 0 except the case when IC > MaxInterleaveCount. The fix is to set UF to 1 for that case as well.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D87679
2020-10-14 16:48:17 +07:00
Sam Parker
2596da3174 [CostModel] getCFInstrCost in getUserCost.
Have BasicTTI call the base implementation so that both agree on the
default behaviour, which the default being a cost of '1'. This has
required an X86 specific implementation as it seems to be very
reliant on those instructions being free. Changes are also made to
AMDGPU so that their implementations distinguish between cost kinds,
so that the unrolling isn't affected. PowerPC also has its own
implementation to prevent changes to the reg-usage vectorizer test.

The cost model test changes now reflect that ret instructions are not
generally free.

Differential Revision: https://reviews.llvm.org/D79164
2020-06-15 09:28:46 +01:00
Sjoerd Meijer
9529597cf4 Recommit #2: "[LV] Induction Variable does not remain scalar under tail-folding."
This was reverted because of a miscompilation. At closer inspection, the
problem was actually visible in a changed llvm regression test too. This
one-line follow up fix/recommit will splat the IV, which is what we are trying
to avoid if unnecessary in general, if tail-folding is requested even if all
users are scalar instructions after vectorisation. Because with tail-folding,
the splat IV will be used by the predicate of the masked loads/stores
instructions. The previous version omitted this, which caused the
miscompilation. The original commit message was:

If tail-folding of the scalar remainder loop is applied, the primary induction
variable is splat to a vector and used by the masked load/store vector
instructions, thus the IV does not remain scalar. Because we now mark
that the IV does not remain scalar for these cases, we don't emit the vector IV
if it is not used. Thus, the vectoriser produces less dead code.

Thanks to Ayal Zaks for the direction how to fix this.
2020-05-13 13:50:09 +01:00
Benjamin Kramer
f936457f80 Revert "Recommit "[LV] Induction Variable does not remain scalar under tail-folding.""
This reverts commit ae45b4dbe73ffde5fe3119835aa947d5a49635ed. It
causes miscompilations, test case on the mailing list.
2020-05-08 14:49:10 +02:00
Sjoerd Meijer
ae45b4dbe7 Recommit "[LV] Induction Variable does not remain scalar under tail-folding."
With 3 llvm regr tests fixed/updated that I had missed.
2020-05-07 11:52:20 +01:00
Florian Hahn
c491949694 [LV] Pick correct BB as insert point when fixing PHI for FORs.
Currently we fail to pick the right insertion point when
PreviousLastPart of a first-order-recurrence is a PHI node not in the
LoopVectorBody. This can happen when PreviousLastPart is produce in a
predicated block. In that case, we should pick the insertion point in
the BB the PHI is in.

Fixes PR44020.

Reviewers: hsaito, fhahn, Ayal, dorit

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D71071
2019-12-07 19:32:00 +00:00
Eric Christopher
cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher
a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Jonas Paulsson
79f2441eee [SystemZ] Rework getInterleavedMemoryOpCost()
Model this function more closely after the BasicTTIImpl version, with
separate handling of loads and stores. For loads, the set of actually loaded
vectors is checked.

This makes it more readable and just slightly more accurate generally.

Review: Ulrich Weigand
https://reviews.llvm.org/D53071

llvm-svn: 345998
2018-11-02 17:15:36 +00:00
Jonas Paulsson
1f067c94dc [LoopVectorizer] Fix for cost values of memory accesses.
This commit is a combination of two patches:

* "Fix in getScalarizationOverhead()"

   If target returns false in TTI.prefersVectorizedAddressing(), it means the
   address registers will not need to be extracted. Therefore, there should
   be no operands scalarization overhead for a load instruction.

* "Don't pass the instruction pointer from getMemInstScalarizationCost."

   Since VF is always > 1, this is a cost query for an instruction in the
   vectorized loop and it should not be evaluated within the scalar
   context of the instruction.

Review: Ulrich Weigand, Hal Finkel
https://reviews.llvm.org/D52351
https://reviews.llvm.org/D52417

llvm-svn: 345603
2018-10-30 14:34:15 +00:00
Florian Hahn
1086ce2397 [LV] Move InterleaveGroup and InterleavedAccessInfo to VectorUtils.h (NFC)
Move the 2 classes out of LoopVectorize.cpp to make it easier to re-use
them for VPlan outside LoopVectorize.cpp

Reviewers: Ayal, mssimpso, rengolin, dcaballe, mkuper, hsaito, hfinkel, xbolva00

Reviewed By: rengolin, xbolva00

Differential Revision: https://reviews.llvm.org/D49488

llvm-svn: 342027
2018-09-12 08:01:57 +00:00
Hideki Saito
ef380b0fc5 [LV] Fix for PR38110, LV encountered llvm_unreachable()
Summary: truncateToMinimalBitWidths() doesn't handle all Instructions and the worst case is compiler crash via llvm_unreachable(). Fix is to add a case to handle PHINode and changed the worst case to NO-OP (from compiler crash).

Reviewers: sbaranga, mssimpso, hsaito

Reviewed By: hsaito

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49461

llvm-svn: 337861
2018-07-24 22:30:31 +00:00
Ayal Zaks
1f58dda4e4 [LV] Fix PR34248 - recommit D32871 after revert r311304
Original commit r311077 of D32871 was reverted in r311304 due to failures
reported in PR34248.

This recommit fixes PR34248 by restricting the packing of predicated scalars
into vectors only when vectorizing, avoiding doing so when unrolling w/o
vectorizing. Added a test derived from the reproducer of PR34248.

llvm-svn: 311849
2017-08-27 12:55:46 +00:00
Chandler Carruth
bd6dc14230 Revert r311077: [LV] Using VPlan ...
This causes LLVM to assert fail on PPC64 and crash / infloop in other
cases. Filed http://llvm.org/PR34248 with reproducer attached.

llvm-svn: 311304
2017-08-20 23:17:11 +00:00