115 Commits

Author SHA1 Message Date
Nikita Popov
9f3d1695eb
[SCEVExpander] Preserve gep nuw during expansion (#102133)
When expanding SCEV adds to geps, transfer the nuw flag to the resulting
gep. (Note that this doesn't apply to IV increment GEPs, which go
through a different code path.)
2024-10-02 11:45:00 +02:00
Florian Hahn
53266f73f0
[VPlan] Run DCE after unrolling.
This cleans up a number of dead recipes after unrolling if only their
first or last parts are used. This simplifies a number of tests.

Fixes https://github.com/llvm/llvm-project/issues/109581.
2024-09-22 22:08:46 +01:00
Florian Hahn
8ec406757c
[VPlan] Implement unrolling as VPlan-to-VPlan transform. (#95842)
This patch implements explicit unrolling by UF  as VPlan transform. In
follow up patches this will allow simplifying VPTransform state (no need
to store unrolled parts) as well as recipe execution (no need to
generate code for multiple parts in an each recipe). It also allows for
more general optimziations (e.g. avoid generating code for recipes that
are uniform-across parts).

It also unifies the logic dealing with unrolled parts in a single place,
rather than spreading it out across multiple places (e.g. VPlan post
processing for header-phi recipes previously.)

In the initial implementation, a number of recipes still take the
unrolled part as additional, optional argument, if their execution
depends on the unrolled part.

The computation for start/step values for scalable inductions changed
slightly. Previously the step would be computed as scalar and then
splatted, now vscale gets splatted and multiplied by the step in a
vector mul.

This has been split off https://github.com/llvm/llvm-project/pull/94339
which also includes changes to simplify VPTransfomState and recipes'
::execute.

The current version mostly leaves existing ::execute untouched and
instead sets VPTransfomState::UF to 1.

A follow-up patch will clean up all references to VPTransformState::UF.

Another follow-up patch will simplify VPTransformState to only store a
single vector value per VPValue.

PR: https://github.com/llvm/llvm-project/pull/95842
2024-09-21 19:47:37 +01:00
Philip Reames
2c7786e94a
Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770)
This is a follow up to 924907bc6, and is mostly motivated by consistency
but does include one additional optimization. In general, we prefer 0.0
over -0.0 as the identity value for an fadd. We use that value in
several places, but don't in others. So, let's be consistent and use the
same identity (when nsz allows) everywhere.

This creates a bunch of test churn, but due to 924907bc6, most of that
churn doesn't actually indicate a change in codegen. The exception is
that this change enables the use of 0.0 for nsz, but *not* reasoc, fadd
reductions. Or said differently, it allows the neutral value of an
ordered fadd reduction to be 0.0.
2024-09-03 09:16:37 -07:00
Nikita Popov
f044564db1
[InstCombine] Make backedge check in op of phi transform more precise (#106075)
The op of phi transform wants to prevent moving an operation across a
backedge, as this may lead to an infinite combine loop.

Currently, this is done using isPotentiallyReachable(). The problem with
that is that all blocks inside a loop are reachable from each other.
This means that the op of phi transform is effectively completely
disabled for code inside loops, even when it's not actually operating on
a loop phi (just a phi that happens to be in a loop).

Fix this by explicitly computing the backedges inside the function
instead. Do this via RPOT, which is a bit more efficient than using
FindFunctionBackedges() (which does it without any pre-computed
analyses).

For irreducible cycles, there may be multiple possible choices of
backedge, and this just picks one of them. This is still sufficient to
prevent combine loops.

This also removes the last use of LoopInfo in InstCombine -- I'll drop
the analysis in a followup.
2024-09-02 09:09:21 +02:00
Nikita Popov
a105877646
[InstCombine] Remove some of the complexity-based canonicalization (#91185)
The idea behind this canonicalization is that it allows us to handle less
patterns, because we know that some will be canonicalized away. This is
indeed very useful to e.g. know that constants are always on the right.

However, this is only useful if the canonicalization is actually
reliable. This is the case for constants, but not for arguments: Moving
these to the right makes it look like the "more complex" expression is
guaranteed to be on the left, but this is not actually the case in
practice. It fails as soon as you replace the argument with another
instruction.

The end result is that it looks like things correctly work in tests,
while they actually don't. We use the "thwart complexity-based
canonicalization" trick to handle this in tests, but it's often a
challenge for new contributors to get this right, and based on the
regressions this PR originally exposed, we clearly don't get this right
in many cases.

For this reason, I think that it's better to remove this complexity
canonicalization. It will make it much easier to write tests for
commuted cases and make sure that they are handled.
2024-08-21 12:02:54 +02:00
Nikita Popov
c3c2370c9a [Tests] Regenerate test checks (NFC) 2024-08-06 12:59:55 +02:00
Florian Hahn
9a5a8731e7
[VPlan] Introduce ResumePhi VPInstruction, use to create phi for FOR. (#94760)
This patch introduces a new ResumePhi VPInstruction which creates a phi
in a leaf block of a VPlan. The first use is to create the phi node for
fixed-order recurrence resume values in the scalar preheader.

The VPInstruction takes 2 operands: 1) the incoming value from the
middle-block and a default value to be used for all other incoming
blocks.

In follow-up changes, it will also be used to create phis for reduction
and induction resume values.

Depends on https://github.com/llvm/llvm-project/pull/92651

PR: https://github.com/llvm/llvm-project/pull/94760
2024-07-11 16:08:04 +01:00
Florian Hahn
99d6c6d936
[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651)
This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. This is done using the recently
introduced VPIRBasicBlock.

Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.

After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.

This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the use of
VPIRBasicBlock.

Depends on https://github.com/llvm/llvm-project/pull/92525

PR: https://github.com/llvm/llvm-project/pull/92651
2024-07-05 10:08:42 +01:00
Noah Goldstein
2632680006 [InstCombine] Canonicalize (gep <not i8> p, (div exact X, C))
If C % sizeof(gep_element_type) is zero, we can canonicalize to `i8` via:
    `(gep i8 p, (div exact X, C / (sizeof(gep_element_type))))`

Closes #96898
2024-07-01 22:22:35 +08:00
David Green
352a836176
[InstCombine] Canonicalize non-i8 gep of mul to i8 (#96606)
This is a small canonicalization for `gep i32, p, (mul x, C)` -> `gep
i8, p, (mul x, C*4)`, so that the mul can combine both of the constant
multiplications, and we take a small step towards canonicalizing more
geps to i8.

It currently doesn't attempt to check for multiple uses on the mul, but
that should be possible if it sounds better. Let me know what you think
of the idea in general.
2024-06-26 14:25:54 +01:00
Florian Hahn
3808ba78de
[VPlan] Model middle block via VPIRBasicBlock. (#95816)
Use VPIRBasicBlock to wrap the middle block and implement patching up
branches in predecessors in VPIRBasicBlock::execute. The IR middle block
is only created after skeleton creation. Initially a regular
VPBasicBlock is created, which will later be replaced by a
VPIRBasicBlock once the middle IR basic block has been created.

Note that this slightly changes the order of instructions created in the
middle block; code generated by recipe execution in the middle block
will now be inserted before the terminator (and in between the compare
to used by the terminator). The original order will be restored in
https://github.com/llvm/llvm-project/pull/92651.


PR: https://github.com/llvm/llvm-project/pull/95816
2024-06-20 13:42:20 +01:00
Florian Hahn
05e1b5340b
[VPlan] Model FOR resume value extraction in VPlan. (#93396)
This patch uses the ExtractFromEnd VPInstruction opcode
to extract the value of a FOR to be used as resume value for the ph in
the scalar loop.

It adds a new live-out that temporarily wraps the FOR phi in the scalar
loop. fixFixedOrderRecurrence will process live outs for fixed order
recurrence phis by creating a new phi node in the scalar preheader, 
using the generated value for the live-out as incoming value from the
middle block and the original start value as incoming value for the
other edge. Creation of the phi in the preheader, as well as updating
the phi in the scalar loop will also be moved to VPlan in the future,
eventually retiring fixFixedOrderRecurrence

Depends on https://github.com/llvm/llvm-project/pull/93395

PR: https://github.com/llvm/llvm-project/pull/93396
2024-06-05 11:18:06 +01:00
Nikita Popov
7c0d52ca91
[ValueTracking] Support dominating known bits condition in and/or (#74728)
This extends computeKnownBits() support for dominating conditions to
also handle and/or conditions. We'll look through either and or or
depending on which edge we're considering.

This change is mainly for the sake of completeness, so we don't start
missing optimizations if SimplifyCFG decides to merge some branches.
2024-02-08 09:47:49 +01:00
Nikita Popov
90ba33099c
[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882)
This patch canonicalizes getelementptr instructions with constant
indices to use the `i8` source element type. This makes it easier for
optimizations to recognize that two GEPs are identical, because they
don't need to see past many different ways to express the same offset.

This is a first step towards
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699.
This is limited to constant GEPs only for now, as they have a clear
canonical form, while we're not yet sure how exactly to deal with
variable indices.

The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives
two representative examples of the kind of optimization improvement we
expect from this change. In the first test SimplifyCFG can now realize
that all switch branches are actually the same. In the second test it
can convert it into simple arithmetic. These are representative of
common optimization failures we see in Rust.

Fixes https://github.com/llvm/llvm-project/issues/69841.
2024-01-24 15:25:29 +01:00
Yingwei Zheng
6681650025
[InstCombine] Revert the signed icmp -> unsigned icmp canonicalization when folding icmp Pred min|max(X, Y), Z (#76685)
This patch tries to flip the signedness of predicates when folding an
unsigned icmp with a signed min/max. It will enable more optimizations
as we canonicalizes a signed icmp into an unsigned icmp when both
operands are known to have the same sign.
Fixes #76672.

Compile-time impact:
http://llvm-compile-time-tracker.com/compare.php?from=949ec83eaf6fa6dbffb94c2ea9c0a4d5efdbd239&to=2deca1aea8a4e13609bab72c522a97d424f0fc2d&stat=instructions:u


|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|-0.00%|+0.01%|+0.05%|-0.12%|-0.01%|-0.03%|-0.00%|

NOTE: We can flip the signedness of predicate if both operands are
negative. But I don't see the benefit of handling these cases.
2024-01-05 14:39:16 +08:00
Florian Hahn
f18536d642
[VPlan] Model address separately. (#72164)
Move vector pointer generation to a separate VPVectorPointerRecipe.
This untangles address computation from the memory recipes future
and is also needed to enable explicit unrolling in VPlan.

https://github.com/llvm/llvm-project/pull/72164
2024-01-01 19:51:15 +00:00
Nikita Popov
d77067d08a
[ValueTracking] Add dominating condition support in computeKnownBits() (#73662)
This adds support for using dominating conditions in computeKnownBits()
when called from InstCombine. The implementation uses a
DomConditionCache, which stores which branches may provide information
that is relevant for a given value.

DomConditionCache is similar to AssumptionCache, but does not try to do
any kind of automatic tracking. Relevant branches have to be explicitly
registered and invalidated values explicitly removed. The necessary
tracking is done inside InstCombine.

The reason why this doesn't just do exactly the same thing as
AssumptionCache is that a lot more transforms touch branches and branch
conditions than assumptions. AssumptionCache is an immutable analysis
and mostly gets away with this because only a handful of places have to
register additional assumptions (mostly as a result of cloning). This is
very much not the case for branches.

This change regresses compile-time by about ~0.2%. It also improves
stage2-O0-g builds by about ~0.2%, which indicates that this change results
in additional optimizations inside clang itself.

Fixes https://github.com/llvm/llvm-project/issues/74242.
2023-12-06 14:17:18 +01:00
Nikita Popov
f0faff8b9b [LoopVectorize] Regenerate test checks (NFC) 2023-11-28 15:50:27 +01:00
Craig Topper
03d4a9d94d
[InstCombine] Set disjoint flag when turning Add into Or. (#72702)
The disjoint flag was recently added to IR in #72583
2023-11-27 12:54:11 -08:00
Philip Reames
3f2ed812f0
[InstCombine] Infer nneg on zext when forming from non-negative sext (#70706)
Builds on #67982 which recently introduced the nneg flag on a zext
instruction. InstCombine is one of our largest canonicalizers of zext
from non-negative sext instructions, so set the flag there.
2023-10-30 12:09:43 -07:00
Dmitriy Smirnov
e13bed4c5f [PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP
This patch tries to canonicalise add + gep to gep + gep.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D155688
2023-10-06 12:29:06 +01:00
Florian Hahn
96e83d3705
[LV] Use IRBuilder to create and optimize middle-block compare.
Split off from D150398 to avoid builder-related diff changes there.
Using IRBuilder to create ICmps simplifies the result if both operands
are constants.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158332
2023-08-29 11:42:18 +01:00
Florian Hahn
707359ecf5
Recommit "[LV] Re-use existing broadcast value for live-ins."
This reverts commit 245ec675a4e41f7ec24dfc998720bffdc46a6c53.

Recommits eea9258648ce with a fix to only erase the instruction from the
first part if it is defined outside the loop. This fixes a
use-after-free error reported.
2023-08-01 15:54:02 +01:00
Nikita Popov
d01aec4c76 [InstCombine] Set dead phi inputs to poison in more cases
Set phi inputs to poison whenever we find a dead edge (either
during initial worklist population or the main InstCombine run),
instead of only doing this for successors of dead blocks.

This means that the phi operand is set to poison even if for
critical edges without an intermediate block.

There are quite a few test changes, because the pattern is fairly
common in vectorizer output, for cases where we know the vectorized
loop will be entered.
2023-08-01 11:53:47 +02:00
Martin Storsjö
245ec675a4 Revert "[LV] Re-use existing broadcast value for live-ins."
This reverts commit eea9258648ce73507f6f85c395de978af659d498.

That commit triggered crashes in the following testcase:

$ cat reduced.c
typedef struct {
  int a[8]
} b;
typedef struct {
  b *c;
  short d
} e;
void f() {
  int g;
  char *h;
  e *i = f;
  short j = i->d;
  int a = i->c->a[0];
  for (;;)
    for (; g < a; g++) {
      *h = j * i->d >> 8;
      h++;
    }
}
$ clang -target aarch64-linux-gnu -w -c -O2 reduced.c
2023-07-25 10:35:41 +03:00
Florian Hahn
eea9258648
[LV] Re-use existing broadcast value for live-ins.
When requesting a vector value for a live-in, we can re-use the
broadcast of the live-in of part 0 for parts > 0.
2023-07-24 11:50:47 +01:00
Nikita Popov
745cfa3449 [InstCombine] Compute known bits for multi-use add/sub
We were failing to set the known bits for add/sub in the multi-use
case, resulting in odd behavioral differences depending on the
number of uses. Noticed while adding a consistency assertion.

The test changes are essentially a revert to the state before
d6498ab. These changes are not really desirable, but if we don't
want them, that needs to be handled as part of the heuristic for
demanded constant shrinking, not by artifically suppressing the
known bits in one specific case.
2023-05-17 17:50:00 +02:00
Noah Goldstein
d840391401 [ValueTracking] Add logic for isKnownNonZero(smin/smax X, Y)
For `smin` if either `X` or `Y` is negative, the result is non-zero.
For `smax` if either `X` or `Y` is strictly positive, the result is
non-zero.

For both if `X != 0` and `Y != 0` the result is non-zero.

Alive2 Link:
    https://alive2.llvm.org/ce/z/7yvbgN
    https://alive2.llvm.org/ce/z/zizbvq

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D149417
2023-04-30 10:06:46 -05:00
Florian Hahn
35af27c30a
[VPlan] Only create extracts for recurrence exits if there are live-outs.
Move the code to collect live-out earlier and only generate extracts for
exit values if there are any live-outs that use them.

Depends on D147472.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147567
2023-04-10 21:08:34 +01:00
Paul Walker
eae26b6640 [IRBuilder] Use canonical i64 type for insertelement index used by vector splats.
Instcombine prefers this canonical form (see getPreferredVectorIndex),
as does IRBuilder when passing the index as an integer so we may as
well use the prefered form from creation.

NOTE: All test changes are mechanical with nothing else expected
beyond a change of index type from i32 to i64.

Differential Revision: https://reviews.llvm.org/D140983
2023-01-11 14:08:06 +00:00
Florian Hahn
68469a80cb
[LV] Disable runtime unrolling for vectorized loops.
This patch adds metadata to disable runtime unrolling to the vectorized
loop. If runtime unrolling/interleaving is considered profitable, LV
will interleave the loop directly. There should be no need to perform
runtime unrolling at a later stage.

Note that we already add metadata to disable runtime unrolling to the
scalar loop after vectorization.

The additional unrolling unnecessarily increases code size and compile
time. In addition to that we have several bug reports of unncessary
runtime unrolling for vectorized loops, e.g. PR40961

Compile-time improvements:

  NewPM-O3: -1.04%
  NewPM-ReleaseThinLTO: -0.59%
  NewPM-ReleaseLTO-g: -0.97%

https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:u

Fixes #40306.

Reviewed By: lebedev.ri, nikic

Differential Revision: https://reviews.llvm.org/D115261
2023-01-06 10:56:17 +00:00
Roman Lebedev
4def99e642
[InstCombine] Try to fold not into cmp iff other users of cmp are freely invertible
There is still some such patterns that require collaboration
of folds to handle,that we don't currently do.
2022-12-19 00:24:28 +03:00
Nikita Popov
5b40015063 [LoopVectorize] Convert some tests to opaque pointers (NFC)
For these tests update_test_checks.py had to be rerun.
2022-12-14 15:27:31 +01:00
Roman Lebedev
be51fa4580
[NFC] Port all runlines for LoopVectorize pass tests to -passes syntax 2022-12-05 22:17:30 +03:00
Simon Pilgrim
09cb9fdef9 [InstCombine] Fold ult(add(x,-1),c) -> ule(x,c) iff x != 0 (PR57635)
Alive2: https://alive2.llvm.org/ce/z/sZ6wwS

As detailed on Issue #57635 and #37628 - for unsigned comparisons, we can compare prior to a decrement iff the value is known never to be zero.

Differential Revision: https://reviews.llvm.org/D134172
2022-09-20 16:44:41 +01:00
Sanjay Patel
d6498abc24 [InstCombine] remove multi-use add demanded constant fold
This was originally part of D133788. There are no visible
regressions. All of the diffs show a large unsigned constant
becoming a small negative constant. This should be better
for analysis (and slightly less compile-time) and codegen.
2022-09-18 14:23:43 -04:00
Philip Reames
4c4c0d2c06 [LV] Use safe-divisor lowering for fixed vectors if profitable
This extends the safe-divisor widening scheme recently added for scalable vectors to handle fixed vectors as well.

Differential Revision: https://reviews.llvm.org/D132591
2022-09-08 09:15:54 -07:00
Florian Hahn
422cf99161
[VPlan] Only generate single instr for loads uniform across all parts.
VPReplicateRecipe::isUniform actually means uniform-per-parts, hence a
scalar instruction is generated per-part.

This is a potential alternative D132892. For now the current patch only
catches cases where the address is trivially invariant (defined outside
VPlan), while D132892 catches any address that is considered invariant
by SCEV AFAICT.

It should be possible to hoist fully invariant recipes feeding loads out
of the vector loop region as well, but in practice LICM should do that
already.

This version of the patch artificially limits this to loads to make it
easier to compare, but this restriction should be easily liftable.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133019
2022-09-08 14:27:58 +01:00
Philip Reames
b20104f644 [LV] Update a test which appears to have been editted without regen [nfc] 2022-08-24 11:05:49 -07:00
Nuno Lopes
9df0b254d2 [NFC] Switch a few uses of undef to poison as placeholders for unreachable code 2022-07-23 21:50:11 +01:00
Philip Reames
be25f52fec [LV] Autogen several tests for ease of update in upcoming change 2022-07-20 07:17:51 -07:00
Nikita Popov
356d47ccb9 [ValueTracking] Handle and/or on RHS of isImpliedCondition()
isImpliedCondition() currently handles and/or on the LHS, but not
on the RHS, resulting in asymmetric behavior. This patch adds two
new implication rules:

 * LHS ==> (RHS1 || RHS2) if LHS ==> RHS1 or LHS ==> RHS2
 * LHS ==> !(RHS1 && RHS2) if LHS ==> !RHS1 or LHS ==> !RHS2

Differential Revision: https://reviews.llvm.org/D125551
2022-05-16 16:30:26 +02:00
Nikita Popov
0c00dbb975 [LoopVectorize] Regenerate test checks (NFC) 2022-05-13 16:41:48 +02:00
Dávid Bolvanský
872f7000fc Revert "[NFCI] Regenerate SROA/LoopVectorize test checks"
This reverts commit 14e3450fb57305aa9ff3e9e60687b458e43835c9.
2022-04-04 01:15:30 +02:00
Dávid Bolvanský
a113a582b1 [NFCI] Regenerate LoopVectorize test checks 2022-04-03 21:56:24 +02:00
Andrew Wei
0af3e6a22d [InstCombine] Sink instructions with multiple users in a successor block.
This patch tries to sink instructions when they are only used in a successor block.

This is a further enhancement patch based on Anna's commit:
D109700, which allows sinking an instruction having multiple uses in a single user.

In this patch, sink instructions with multiple users in a single successor block will be supported.
It could fix a known issue from rust:
  https://github.com/rust-lang/rust/issues/51346#issuecomment-394443610

Reviewed By: nikic, reames

Differential Revision: https://reviews.llvm.org/D121585
2022-03-18 11:53:45 +08:00
Florian Hahn
95f76bff1c
[LV] Create & use VPScalarIVSteps for all scalar users.
This patch is a follow-up to D115953. It updates optimizeInductions
to also introduce new VPScalarIVStepsRecipes if an IV has both vector
and scalar uses.

It updates all uses that only need scalar values to use the newly
created recipe for the scalar steps.

This completes untangling of VPWidenIntOrFpInductionRecipe
code-generation. Now the recipe *only* creates the widened vector
values, as it says on the tin.

The code to genereate IR has been moved directly to
VPWidenIntOrFpInductionRecipe::execute.

Note that the recipe has been updated to hold a reference to
ScalarEvolution, which is needed to expand the step, until we can place
the corresponding SCEV expansion in the pre-header.

Depends on D120827.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D120828
2022-03-13 17:15:24 +00:00
Nikita Popov
26748bb15a [InstCombine] Slightly relax one-use check in abs canonicalization
Treat the icmp and sub symmetrically, and require that one of them
has one use, not the icmp in particular. This could be further
relaxed in the abs (but not nabs) case to not check one-use at
all.
2022-03-01 15:06:41 +01:00
Nikita Popov
7c080e4649 [LoopVectorize] Regenerate test checks (NFC) 2022-03-01 15:01:14 +01:00