257 Commits

Author SHA1 Message Date
John Brawn
073e65a8e5
[LoopVectorize] Make needsExtract notice scalarized instructions (#119720)
LoopVectorizationCostModel::needsExtract should recognise instructions
that have been widened by scalarizing as scalar instructions, and thus
not needing an extract when used by later scalarized instructions.

This fixes an incorrect cost calculation in computePredInstDiscount,
where we are adding a scalarization overhead cost when we shouldn't,
though I haven't come up with a test case where it makes a difference.
It will make a difference when the cost model switches to using the cost
kind TCK_CodeSize for optsize, as not doing this causes the test
LoopVectorize/X86/small-size.ll to get worse.
2025-01-02 14:31:36 +00:00
Florian Hahn
2d038caeeb
[VPlan] Remove stray space when printing VPWidenCastRecipe.
printFlags() already takes care of printing a single space if there are
no flags. Remove the extra space when printing a recipe without flags.
2024-12-24 20:23:48 +00:00
Florian Hahn
4ad0fdd163
[VPlan] Remove reverse() of predecessors from VPInstruction::generate.
This was originally done to reduce the diff for the change. Remove it
and update the remaining tests. NFC modulo reordering of incoming
values.

Clean up after https://github.com/llvm/llvm-project/pull/114292.
2024-12-17 20:44:32 +00:00
Florian Hahn
d1dff1dc18
[LV] Remove hard-coded VPValue numbers in test check lines. (NFC)
Make tests independent of VPlan value numbers.
2024-12-12 22:33:00 +00:00
Nikita Popov
f7685af4a5 [InstCombine] Move gep of phi fold into separate function
This makes sure that an early return during this fold doesn't end
up skipping later gep folds.
2024-12-05 15:20:56 +01:00
Nikita Popov
462cb3cd6c
[InstCombine] Infer nusw + nneg -> nuw for getelementptr (#111144)
If the gep is nusw (usually via inbounds) and the offset is
non-negative, we can infer nuw.

Proof: https://alive2.llvm.org/ce/z/ihztLy
2024-12-05 14:36:40 +01:00
Florian Hahn
82821254f5
[LV] Use IVUpdateMayOverflow to set HasNUW. (#111758)
If IVUpdateMayOverflow is false, we proved that the induction increment
cannot overflow in the vector loop. This allows setting NUW in some
cases when folding the tail.

PR: https://github.com/llvm/llvm-project/pull/111758
2024-11-28 10:12:41 +00:00
David Sherwood
aeb88f6778
Fix test failures introduced by PR #113697 (#116941)
Don't match the entire floating point debug output since it's prone to
rounding errors depending upon the target.
2024-11-20 09:10:51 +00:00
David Sherwood
3097c60928
[LoopVectorize][NFC] Rewrite tests to check output of vplan cost model (#113697)
Currently it's very difficult to improve the cost model for tail-folded
loops because as soon as you add a VPInstruction::computeCost function
that adds the costs of instructions such as
VPInstruction::ActiveLaneMask
and VPInstruction::ExplicitVectorLength the assert in
LoopVectorizationPlanner::computeBestVF fails for some tests. This is
because the VF chosen by the legacy cost model doesn't match the vplan
cost model. See PR #90191. This assert is currently making it difficult
to improve the cost model.

Hopefully we will be in a position to remove the assert soon, however
in order to do that we have to fix up a whole bunch of tests that rely
upon the legacy cost model output. I've tried my best to update
these tests to use vplan output instead.

There is still work needed for the VF=1 case because the vplan cost
model is not printed out in this case. I've not attempted to fix those
in this patch.
2024-11-19 08:55:39 +00:00
David Green
7665d3f0df [ARM] Extra MVE reduction test cases. NFC 2024-11-12 10:00:57 +00:00
David Green
ab9178e3e7 [ARM] Add a couple of new MVE reduction tests. NFC
Nowadays we generate add(zext(mul(sext, sext)) with nneg zext and the multi-use
test is awkward to get right. This should help our test coverage with the vplan
cost transition.
2024-11-08 14:32:06 +00:00
Paul Walker
38fffa630e
[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548) 2024-11-06 11:53:33 +00:00
Yingwei Zheng
96b14f2ccb
[Reland][InstCombine] Fix FMF propagation in foldSelectIntoOp (#114499)
Relands #114356. Compared to the last version, this patch only merges
poison-generating/nsz flags from the select to fix LV regression in
`llvm/test/Transforms/PhaseOrdering/AArch64/predicated-reduction.ll`.
2024-11-01 12:22:57 +08:00
David Green
9735c05186
[ValueTracking] Compute KnownFP state from recursive select/phi. (#113686)
Given a recursive phi with select:
 %p = phi [ 0, entry ], [ %sel, loop]
 %sel = select %c, %other, %p

The fp state can be calculated using the knowledge that the select/phi
pair can only be the initial state (0 here) or from %other. This adds a
short-cut into computeKnownFPClass for PHI to detect that the select is
recursive back to the phi, and if so use the state from the other
operand.

This helps to address a regression from #83200.
2024-10-31 07:50:44 +00:00
Florian Hahn
3ec6f805c5
[VPlan] Don't created GEP x, 0 for interleave group pointers.
The GEP with offet 0 is redundant, remove it. This addresses a TODO
from 7f74651837b ((#106431).
2024-10-08 12:08:13 +01:00
Florian Hahn
ea83e1c05a
[LV] Assign cost to all interleave members when not interleaving.
At the moment, the full cost of all interleave group members is assigned
to the instruction at the group's insert position, even if the decision
was to not form an interleave group.

This can lead to inaccurate cost estimates, e.g. if the instruction at
the insert position is dead. If the decision is to not vectorize but
scalarize or scather/gather, then the cost will be to total cost for all
members. In those cases, assign individual the cost per member, to more
closely reflect to choice per instruction.

This fixes a divergence between legacy and VPlan-based cost model.

Fixes https://github.com/llvm/llvm-project/issues/108098.
2024-09-11 21:04:34 +01:00
Philip Reames
2c7786e94a
Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770)
This is a follow up to 924907bc6, and is mostly motivated by consistency
but does include one additional optimization. In general, we prefer 0.0
over -0.0 as the identity value for an fadd. We use that value in
several places, but don't in others. So, let's be consistent and use the
same identity (when nsz allows) everywhere.

This creates a bunch of test churn, but due to 924907bc6, most of that
churn doesn't actually indicate a change in codegen. The exception is
that this change enables the use of 0.0 for nsz, but *not* reasoc, fadd
reductions. Or said differently, it allows the neutral value of an
ordered fadd reduction to be 0.0.
2024-09-03 09:16:37 -07:00
Nikita Popov
a105877646
[InstCombine] Remove some of the complexity-based canonicalization (#91185)
The idea behind this canonicalization is that it allows us to handle less
patterns, because we know that some will be canonicalized away. This is
indeed very useful to e.g. know that constants are always on the right.

However, this is only useful if the canonicalization is actually
reliable. This is the case for constants, but not for arguments: Moving
these to the right makes it look like the "more complex" expression is
guaranteed to be on the left, but this is not actually the case in
practice. It fails as soon as you replace the argument with another
instruction.

The end result is that it looks like things correctly work in tests,
while they actually don't. We use the "thwart complexity-based
canonicalization" trick to handle this in tests, but it's often a
challenge for new contributors to get this right, and based on the
regressions this PR originally exposed, we clearly don't get this right
in many cases.

For this reason, I think that it's better to remove this complexity
canonicalization. It will make it much easier to write tests for
commuted cases and make sure that they are handled.
2024-08-21 12:02:54 +02:00
David Green
dcd246cbde
[ARM] Add scalar add_sat costs. (#100988)
These can usually generate:
 - qadd / qsub for signed i32 scalars
- uqadd16 / qadd16 / uqsub16 / qsub16 with an extend for signed/unsigned
i8/i16
 - Are expanded to an add + cmp + sel otherwise

This can lead to differences in unrolling etc, but should be a better
cost for the instructions.
2024-08-05 18:56:04 +01:00
Florian Hahn
c8c0b18b5d
[LV] Update tests to not have dead interleave groups.
Update existing tests with dead interleave groups by adding users. This
ensures the tests keep testing what they were intended to test with a
planned change to skip unused instructions in cost computations.
2024-07-21 14:03:40 +01:00
Florian Hahn
99d6c6d936
[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651)
This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. This is done using the recently
introduced VPIRBasicBlock.

Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.

After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.

This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the use of
VPIRBasicBlock.

Depends on https://github.com/llvm/llvm-project/pull/92525

PR: https://github.com/llvm/llvm-project/pull/92651
2024-07-05 10:08:42 +01:00
Ramkumar Ramachandra
0f111ba790
LoopInfo: introduce Loop::getLocStr; unify debug output (#93051)
Introduce a Loop::getLocStr stolen from LoopVectorize's static function
getDebugLocString in order to have uniform debug output headers across
LoopVectorize, LoopAccessAnalysis, and LoopDistribute. The motivation
for this change is to have UpdateTestChecks recognize the headers and
automatically generate CHECK lines for debug output, with minimal
special-casing.
2024-06-25 13:12:15 +01:00
Florian Hahn
3808ba78de
[VPlan] Model middle block via VPIRBasicBlock. (#95816)
Use VPIRBasicBlock to wrap the middle block and implement patching up
branches in predecessors in VPIRBasicBlock::execute. The IR middle block
is only created after skeleton creation. Initially a regular
VPBasicBlock is created, which will later be replaced by a
VPIRBasicBlock once the middle IR basic block has been created.

Note that this slightly changes the order of instructions created in the
middle block; code generated by recipe execution in the middle block
will now be inserted before the terminator (and in between the compare
to used by the terminator). The original order will be restored in
https://github.com/llvm/llvm-project/pull/92651.


PR: https://github.com/llvm/llvm-project/pull/95816
2024-06-20 13:42:20 +01:00
David Green
46541a3636 [ARM] Add a extra MVE low-trip-count loop. NFC
This makes use of half floats, which makes the masked stores expensive.
2024-05-23 21:50:47 +01:00
David Green
d486a4c29a [ARM] Ensure extra uses are not dead in tail-folding-counting-down.ll. NFC
This might help keep the test valid if vplan is removing dead intructions.
2024-04-29 15:47:24 +01:00
Michele Scandale
09eb9f1136
[InstCombine] Fix for folding select into floating point binary operators. (#83200)
Folding a `select` into a floating point binary operators can only be
done if the result is preserved for both case. In particular, if the
other operand of the `select` can be a NaN, then the transformation
won't preserve the result value.
2024-03-19 09:47:07 -07:00
Nikita Popov
90ba33099c
[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882)
This patch canonicalizes getelementptr instructions with constant
indices to use the `i8` source element type. This makes it easier for
optimizations to recognize that two GEPs are identical, because they
don't need to see past many different ways to express the same offset.

This is a first step towards
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699.
This is limited to constant GEPs only for now, as they have a clear
canonical form, while we're not yet sure how exactly to deal with
variable indices.

The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives
two representative examples of the kind of optimization improvement we
expect from this change. In the first test SimplifyCFG can now realize
that all switch branches are actually the same. In the second test it
can convert it into simple arithmetic. These are representative of
common optimization failures we see in Rust.

Fixes https://github.com/llvm/llvm-project/issues/69841.
2024-01-24 15:25:29 +01:00
Florian Hahn
241fe83704
[VPlan] Introduce ComputeReductionResult VPInstruction opcode. (#70253)
This patch introduces a new ComputeReductionResult opcode to compute the
final reduction result in the middle block. The code from fixReduction
has been moved to ComputeReductionResult, after some earlier cleanup
changes to model parts of fixReduction explicitly elsewhere as needed.

The recipe may be broken down further in the future.

Note that  the phi nodes to merge the reduction result from the trip 
count check and the middle block, to be used as resume value for the
scalar remainder loop are also generated based on 
ComputeReductionResult.

Once we have a VPValue for the reduction result, this can also be
modeled explicitly and moved out of the recipe.
2024-01-04 22:53:18 +00:00
Florian Hahn
f18536d642
[VPlan] Model address separately. (#72164)
Move vector pointer generation to a separate VPVectorPointerRecipe.
This untangles address computation from the memory recipes future
and is also needed to enable explicit unrolling in VPlan.

https://github.com/llvm/llvm-project/pull/72164
2024-01-01 19:51:15 +00:00
Nikita Popov
d77067d08a
[ValueTracking] Add dominating condition support in computeKnownBits() (#73662)
This adds support for using dominating conditions in computeKnownBits()
when called from InstCombine. The implementation uses a
DomConditionCache, which stores which branches may provide information
that is relevant for a given value.

DomConditionCache is similar to AssumptionCache, but does not try to do
any kind of automatic tracking. Relevant branches have to be explicitly
registered and invalidated values explicitly removed. The necessary
tracking is done inside InstCombine.

The reason why this doesn't just do exactly the same thing as
AssumptionCache is that a lot more transforms touch branches and branch
conditions than assumptions. AssumptionCache is an immutable analysis
and mostly gets away with this because only a handful of places have to
register additional assumptions (mostly as a result of cloning). This is
very much not the case for branches.

This change regresses compile-time by about ~0.2%. It also improves
stage2-O0-g builds by about ~0.2%, which indicates that this change results
in additional optimizations inside clang itself.

Fixes https://github.com/llvm/llvm-project/issues/74242.
2023-12-06 14:17:18 +01:00
Nikita Popov
eecb99c5f6 [Tests] Add disjoint flag to some tests (NFC)
These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
2023-12-05 14:09:36 +01:00
Craig Topper
7ec4f6094e
[InstCombine] Infer disjoint flag on Or instructions. (#72912)
The disjoint flag was recently added to IR in #72583

We already set it when we turn an add into an or. This patch sets it on Ors that weren't converted from an Add.
2023-12-02 14:11:12 -08:00
Nikita Popov
5918f62301
[InstCombine] Infer zext nneg flag (#71534)
Use KnownBits to infer the nneg flag on zext instructions.

Currently we only set nneg when converting sext -> zext, but don't set
it when we have a zext in the first place. If we want to use it in
optimizations, we should make sure the flag inference is consistent.
2023-11-08 09:34:40 +01:00
David Sherwood
07f0e75b53
[LoopVectorize] Fix bug with code to hoist runtime checks (#70937)
There was a silly mistake in the expandBounds function that was using
the wrong type when calling expandCodeFor and always assuming the stride
is 64 bits. I've added the following test to defend this fix:

Transforms/LoopVectorize/ARM/mve-hoist-runtime-checks.ll
2023-11-02 10:02:50 +00:00
Igor Kirillov
b84977bcc1 Rename test to avoid overlapping with debug output 2023-10-19 12:21:31 +00:00
Florian Hahn
f108c6cdc1
[VPlan] Fold (MUL A, 1) -> A as VPlan2VPlan transform.
Add first VPlan-based recipe simplification to fold (MUL A, 1) -> A.
Among other things, this enables additional simplifications after
applying versioned strides, as follow up to D147783.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D159200
2023-09-18 21:45:34 +01:00
Florian Hahn
96e83d3705
[LV] Use IRBuilder to create and optimize middle-block compare.
Split off from D150398 to avoid builder-related diff changes there.
Using IRBuilder to create ICmps simplifies the result if both operands
are constants.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158332
2023-08-29 11:42:18 +01:00
Florian Hahn
707359ecf5
Recommit "[LV] Re-use existing broadcast value for live-ins."
This reverts commit 245ec675a4e41f7ec24dfc998720bffdc46a6c53.

Recommits eea9258648ce with a fix to only erase the instruction from the
first part if it is defined outside the loop. This fixes a
use-after-free error reported.
2023-08-01 15:54:02 +01:00
Nikita Popov
d01aec4c76 [InstCombine] Set dead phi inputs to poison in more cases
Set phi inputs to poison whenever we find a dead edge (either
during initial worklist population or the main InstCombine run),
instead of only doing this for successors of dead blocks.

This means that the phi operand is set to poison even if for
critical edges without an intermediate block.

There are quite a few test changes, because the pattern is fairly
common in vectorizer output, for cases where we know the vectorized
loop will be entered.
2023-08-01 11:53:47 +02:00
Nikita Popov
7c64449e44 [LoopVectorize] Regenerate test checks (NFC)
To reduce spurious diffs in future changes.
2023-08-01 11:30:55 +02:00
Martin Storsjö
245ec675a4 Revert "[LV] Re-use existing broadcast value for live-ins."
This reverts commit eea9258648ce73507f6f85c395de978af659d498.

That commit triggered crashes in the following testcase:

$ cat reduced.c
typedef struct {
  int a[8]
} b;
typedef struct {
  b *c;
  short d
} e;
void f() {
  int g;
  char *h;
  e *i = f;
  short j = i->d;
  int a = i->c->a[0];
  for (;;)
    for (; g < a; g++) {
      *h = j * i->d >> 8;
      h++;
    }
}
$ clang -target aarch64-linux-gnu -w -c -O2 reduced.c
2023-07-25 10:35:41 +03:00
Florian Hahn
eea9258648
[LV] Re-use existing broadcast value for live-ins.
When requesting a vector value for a live-in, we can re-use the
broadcast of the live-in of part 0 for parts > 0.
2023-07-24 11:50:47 +01:00
Nikita Popov
bb3763e497 Revert "[SimplifyCFG] Allow dropping block that only contains ephemeral values"
This reverts commit 20f0c68fd83a0147a8ec1722bd2e848180610288.

https://reviews.llvm.org/D153966#4464594 reports an optimization
regression in Rust.

Additionally this change has caused an unexpected 0.3% compile-time
regression.
2023-06-30 21:24:05 +02:00
Nikita Popov
20f0c68fd8 [SimplifyCFG] Allow dropping block that only contains ephemeral values
Perform the TryToSimplifyUncondBranchFromEmptyBlock() transform if
the block is empty except for ephemeral values. The ephemeral values
will be dropped in that case.

This makes sure that assumes don't block this transforms, as reported
in https://discourse.llvm.org/t/llvm-assume-blocks-optimization/71609.

Differential Revision: https://reviews.llvm.org/D153966
2023-06-30 15:24:01 +02:00
Florian Hahn
d209084720
[VPlan] Replace versioned stride with constant during VPlan opts.
After constructing the initial VPlan, replace VPValues for versioned
strides with their constant counterparts.

Differential Revision: https://reviews.llvm.org/D147783
2023-06-13 08:26:55 +01:00
Nikita Popov
9cf67f6ea0 [LoopVectorize] Convert most tests to opaque pointers (NFC)
The unsized-pointee-crash.ll and zero-sized-pointee-crash.ll tests
have been removed, because these issues are not relevant for opaque
pointers.
2023-06-12 13:10:22 +02:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
Nikita Popov
2ec1d0f427 [InstCombine] Don't reassociate GEPs for loop invariance
Since D146813, LICM will reassociate GEPs to expose hoisting
opportunities itself. Don't perform this transform in InstCombine,
where it is fragile because it depends on an optional LoopInfo
analysis.
2023-04-18 12:17:07 +02:00
Nikita Popov
39ca70392b [LV] Regenerate test checks (NFC) 2023-04-18 11:52:36 +02:00
Philip Reames
78ae870f11 {tests] Rerun autogen to reduce a diff [nfc] 2023-03-31 12:47:08 -07:00