986 Commits

Author SHA1 Message Date
Piotr Fusik
1a44a53cd5
[LSR][NFC] Use range-based for (#113889) 2024-11-05 07:11:23 +01:00
Kazu Hirata
94f9cbbe49
[Scalar] Remove unused includes (NFC) (#114645)
Identified with misc-include-cleaner.
2024-11-02 08:32:26 -07:00
Youngsuk Kim
caa32e6d6f
[llvm][LSR] Fix where invariant on ScaledReg & Scale is violated (#112576)
Comments attached to the `ScaledReg` field of `struct Formula` explains
that, `ScaledReg` must be non-null when `Scale` is non-zero.

This fixes up a code path where this invariant is violated. Also, add an
assert to ensure this invariant holds true.

Without this patch, compiler aborts with the attached test case.

Fixes #76504
2024-10-17 10:47:44 -04:00
Orlando Cazalet-Hyams
7506872afc
[DebugInfo][LSR] Fix assertion failure salvaging IV with offset > 64 bits wide (#110979)
Fixes #110494
2024-10-03 11:47:08 +01:00
Mehdi Amini
6c7a3f80e7
Fix LLVM_ENABLE_ABI_BREAKING_CHECKS macro check: use #if instead of #ifdef (#110938)
This macros is always defined: either 0 or 1. The correct pattern is to
use #if.

Re-apply #110185 with more fixes for debug build with the ABI breaking
checks disabled.
2024-10-03 01:24:14 +02:00
Sergey Kachkov
1f2a634c44 Reland "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107380)
Motivating example: https://godbolt.org/z/eb97zrxhx
Here we have 2 induction variables in the loop: one is corresponding to
i variable (add rdx, 4), the other - to res (add rax, 2). The second
induction variable can be removed by rewriteLoopExitValues() method
(final value of res at loop exit is unroll_iter * -2); however, this
doesn't happen because we have duplicated LCSSA phi nodes at loop exit:
```
; Preheader:
for.body.preheader.new:                           ; preds = %for.body.preheader
  %unroll_iter = and i64 %N, -4
  br label %for.body

; Loop:
for.body:                                         ; preds = %for.body, %for.body.preheader.new
  %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ]
  %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ]
  %inc.3 = add nuw i64 %i.07, 4
  %lsr.iv.next = add nsw i64 %lsr.iv, -2
  %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3
  br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7

; Exit blocks
for.end.loopexit.unr-lcssa.loopexit:              ; preds = %for.body
  %inc.3.lcssa = phi i64 [ %inc.3, %for.body ]
  %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ]
  %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ]
  br label %for.end.loopexit.unr-lcssa
```
rewriteLoopExitValues requires %lsr.iv.next value to have only 2 uses:
one in LCSSA phi node, the other - in induction phi node. Here we have 3
uses of this value because of duplicated lcssa nodes, so the transform
doesn't apply and leads to an extra add operation inside the loop. The
proposed solution is to accumulate inserted instructions that will
require LCSSA form update into SetVector and then call
formLCSSAForInstructions for this SetVector once, so the same
instructions don't process twice.

Reland fixes the issue with preserve-lcssa.ll test: it fails in the situation
when x86_64-unknown-linux-gnu target is unavailable in opt. The changes are
moved into separate duplicated-phis.ll test with explicit x86 target requirement
to fix bots which are not building this target.
2024-09-09 16:14:51 +03:00
dyung
2bf551e600
Revert "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107666)
Reverts llvm/llvm-project#107380

Change is causing the test preserve-lcssa.ll to fail on at least 2 build
bots:
- https://lab.llvm.org/buildbot/#/builders/190/builds/5231
- https://lab.llvm.org/buildbot/#/builders/161/builds/1855
2024-09-06 19:54:26 -07:00
Sergey Kachkov
2cb4d1b1bd
[LSR] Do not create duplicated PHI nodes while preserving LCSSA form (#107380)
Motivating example: https://godbolt.org/z/eb97zrxhx
Here we have 2 induction variables in the loop: one is corresponding to
i variable (add rdx, 4), the other - to res (add rax, 2). The second
induction variable can be removed by rewriteLoopExitValues() method
(final value of res at loop exit is unroll_iter * -2); however, this
doesn't happen because we have duplicated LCSSA phi nodes at loop exit:
```
; Preheader:
for.body.preheader.new:                           ; preds = %for.body.preheader
  %unroll_iter = and i64 %N, -4
  br label %for.body

; Loop:
for.body:                                         ; preds = %for.body, %for.body.preheader.new
  %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ]
  %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ]
  %inc.3 = add nuw i64 %i.07, 4
  %lsr.iv.next = add nsw i64 %lsr.iv, -2
  %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3
  br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7

; Exit blocks
for.end.loopexit.unr-lcssa.loopexit:              ; preds = %for.body
  %inc.3.lcssa = phi i64 [ %inc.3, %for.body ]
  %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ]
  %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ]
  br label %for.end.loopexit.unr-lcssa
```
rewriteLoopExitValues requires %lsr.iv.next value to have only 2 uses:
one in LCSSA phi node, the other - in induction phi node. Here we have 3
uses of this value because of duplicated lcssa nodes, so the transform
doesn't apply and leads to an extra add operation inside the loop. The
proposed solution is to accumulate inserted instructions that will
require LCSSA form update into SetVector and then call
formLCSSAForInstructions for this SetVector once, so the same
instructions don't process twice.
2024-09-06 18:39:47 +03:00
Nikita Popov
7660981402 [LSR] Use computeConstantDifference()
This API is faster than getMinusSCEV() and a SCEVConstant cast.
2024-08-28 12:20:59 +02:00
Philip Reames
27a62ec72a
[LSR] Split the -lsr-term-fold transformation into it's own pass (#104234)
This transformation doesn't actually use any of the internal state of
LSR and recomputes all information from SCEV.  Splitting it out makes
it easier to test.
    
Note that long term I would like to write a version of this transform
which *is* integrated with LSR's solver, but if that happens, we'll
just delete the extra pass.
    
Integration wise, I switched from using TTI to using a pass configuration
variable.  This seems slightly more idiomatic, and means we don't run
the extra logic on any target other than RISCV.
2024-08-17 18:34:23 -07:00
Benjamin Maxwell
7fad04e94b
[LSR] Fix matching vscale immediates (#100080)
Somewhat confusingly a `SCEVMulExpr` is a `SCEVNAryExpr`, so can have
> 2 operands. Previously, the vscale immediate matching did not check
the number of operands of the `SCEVMulExpr`, so would ignore any
operands after the first two.

This led to incorrect codegen (and results) for ArmSME in IREE
(https://github.com/iree-org/iree), which sometimes addresses things
that are a `vscale * vscale` multiple away. The test added with this
change shows an example reduced from IREE. The second write should
be offset from the first `16 * vscale * vscale` (* 4 bytes), however,
previously LSR dropped the second vscale and instead offset the write by
`#4, mul vl`, which is an offset of `16 * vscale` (* 4 bytes).
2024-07-24 10:06:34 +01:00
Shan Huang
d83d09facd
[DebugInfo][LoopStrengthReduce] Fix missing debug location updates (#97519)
Fix #97510 .

Note that, for the new phi instruction `NewPH`, which replaces the old
phi `PH` and the cast `ShadowUse`, I choose to propagate the debug
location of `PH` to it, because the cast is eliminated according to the
optimization semantics.
2024-07-15 09:44:18 +08:00
Kazu Hirata
2f55e55101
[Transforms] Use range-based for loops (NFC) (#98725) 2024-07-14 13:44:50 -07:00
Graham Hunter
4311b14e9c
[LSR] Recognize vscale-relative immediates (#88124)
Extends LoopStrengthReduce to recognize immediates multiplied by vscale, and query the current target for whether they are legal offsets for memory operations or adds.
2024-07-01 09:23:31 +01:00
Nikita Popov
2d209d964a
[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...

`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
2024-06-27 16:38:15 +02:00
David Green
c7308d405d
[LSR][AArch64] Optimize chain generation based on legal addressing modes (#94453)
LSR will generate chains of related instructions with a known increment
between them. With SVE, in the case of the test case, this can include
increments like 'vscale * 16 + 8'. The idea of this patch is if we have
a '+8' increment already calculated in the chain, we can generate a
(legal) '+ vscale*16' addressing mode from it, allowing us to use the
'[x16, #1, mul vl]' addressing mode instructions.

In order to do this we keep track of the known 'bases' when generating
chains in GenerateIVChain, checking for each if the accumulated
increment expression from the base neatly folds into a legal addressing
mode. If they do not we fall back to the existing LeftOverExpr, whether
it is legal or not.

This is mostly orthogonal to #88124, dealing with the generation of
chains as opposed to rest of LSR. The existing vscale addressing mode
work has greatly helped compared to the last time I looked at this,
allowing us to check that the addressing modes are indeed legal.
2024-06-10 20:35:33 +01:00
Alex Bradbury
5a20141539
[LSR] Provide TTI hook to enable dropping solutions deemed to be unprofitable (#89924)
<https://reviews.llvm.org/D126043> introduced a flag to drop solutions
if deemed unprofitable. As noted there, introducing a TTI hook enables
backends to individually opt into this behaviour.

This will be used by #89927.
2024-06-05 14:40:58 +02:00
Philip Reames
baca93fc83 [LSR] Tweak debug output to always print initial cost 2024-05-14 13:34:20 -07:00
Graham Hunter
2e8d815596
[TTI] Support scalable offsets in getScalingFactorCost (#88113)
Part of the work to support vscale-relative immediates in LSR.
2024-05-10 11:22:11 +01:00
Stephen Tozer
ffd08c7759
[RemoveDIs][NFC] Rename DPValue -> DbgVariableRecord (#85216)
This is the major rename patch that prior patches have built towards.
The DPValue class is being renamed to DbgVariableRecord, which reflects
the updated terminology for the "final" implementation of the RemoveDI
feature. This is a pure string substitution + clang-format patch. The
only manual component of this patch was determining where to perform
these string substitutions: `DPValue` and `DPV` are almost exclusively
used for DbgRecords, *except* for:

- llvm/lib/target, where 'DP' is used to mean double-precision, and so
appears as part of .td files and in variable names. NB: There is a
single existing use of `DPValue` here that refers to debug info, which
I've manually updated.
- llvm/tools/gold, where 'LDPV' is used as a prefix for symbol
visibility enums.

Outside of these places, I've applied several basic string
substitutions, with the intent that they only affect DbgRecord-related
identifiers; I've checked them as I went through to verify this, with
reasonable confidence that there are no unintended changes that slipped
through the cracks. The substitutions applied are all case-sensitive,
and are applied in the order shown:

```
  DPValue -> DbgVariableRecord
  DPVal -> DbgVarRec
  DPV -> DVR
```

Following the previous rename patches, it should be the case that there
are no instances of any of these strings that are meant to refer to the
general case of DbgRecords, or anything other than the DPValue class.
The idea behind this patch is therefore that pure string substitution is
correct in all cases as long as these assumptions hold.
2024-03-19 20:07:07 +00:00
Stephen Tozer
2e865353ed
[RemoveDIs][NFC] Move DPValue::filter -> filterDbgVars (#85208)
This patch changes DPValue::filter to be a non-member method
filterDbgVars. There are two reasons for this: firstly, the name of
DPValue is about to change to DbgVariableRecord, which will result in
every `for` loop that uses DPValue::filter to require a line break. This
is a small thing, but it makes the rename patch more difficult to
review, and is just generally more awkward for what is a fairly common
loop. Secondly, the intent is to later break up the DPValue class into
subclasses, at which point it would be better to have a non-member
function that allows template arguments for the cases we want to filter
with greater specificity.
2024-03-14 12:19:15 +00:00
Nikita Popov
beba307c5b [LSR] Clear SCEVExpander before deleting phi nodes
Fixes https://github.com/llvm/llvm-project/issues/84709.
2024-03-12 16:24:10 +01:00
Stephen Tozer
15f3f446c5
[RemoveDIs][NFC] Rename common interface functions for DPValues->DbgRecords (#84793)
As part of the effort to rename the DbgRecord classes, this patch
renames the widely-used functions that operate on DbgRecords but refer
to DbgValues or DPValues in their names to refer to DbgRecords instead;
all such functions are defined in one of `BasicBlock.h`,
`Instruction.h`, and `DebugProgramInstruction.h`.

This patch explicitly does not change the names of any comments or
variables, except for where they use the exact name of one of the
renamed functions. The reason for this is reviewability; this patch can
be trivially examined to determine that the only changes are direct
string substitutions and any results from clang-format responding to the
changed line lengths. Future patches will cover renaming variables and
comments, and then renaming the classes themselves.
2024-03-12 14:53:13 +00:00
Jeremy Morse
2fe81edef6 [NFC][RemoveDIs] Insert instruction using iterators in Transforms/
As part of the RemoveDIs project we need LLVM to insert instructions using
iterators wherever possible, so that the iterators can carry a bit of
debug-info. This commit implements some of that by updating the contents of
llvm/lib/Transforms/Utils to always use iterator-versions of instruction
constructors.

There are two general flavours of update:
 * Almost all call-sites just call getIterator on an instruction
 * Several make use of an existing iterator (scenarios where the code is
   actually significant for debug-info)
The underlying logic is that any call to getFirstInsertionPt or similar
APIs that identify the start of a block need to have that iterator passed
directly to the insertion function, without being converted to a bare
Instruction pointer along the way.

Noteworthy changes:
 * FindInsertedValue now takes an optional iterator rather than an
   instruction pointer, as we need to always insert with iterators,
 * I've added a few iterator-taking versions of some value-tracking and
   DomTree methods -- they just unwrap the iterator. These are purely
   convenience methods to avoid extra syntax in some passes.
 * A few calls to getNextNode become std::next instead (to keep in the
   theme of using iterators for positions),
 * SeparateConstOffsetFromGEP has it's insertion-position field changed.
   Noteworthy because it's not a purely localised spelling change.

All this should be NFC.
2024-03-05 15:12:22 +00:00
Patrick O'Neill
8d6e867eb2
[LSR][term-fold] Ensure the simple recurrence is from the current loop (#83085)
If the phi node found by matchSimpleRecurrence is not from the current
loop, then isAlmostDeadIV panics. With this patch we bail out early.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

---------

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2024-03-04 16:40:40 -08:00
Nikita Popov
72763521c3 [LSR] Clear SCEVExpander before calling DeleteDeadPHIs
To avoid an assertion failure when an AssertingVH is removed,
as reported in:
https://github.com/llvm/llvm-project/pull/82362#issuecomment-1960067147

Also remove an unnecessary use of SCEVExpanderCleaner.
2024-02-22 22:50:26 +01:00
Orlando Cazalet-Hyams
ababa96475
[RemoveDIs][NFC] Introduce DbgRecord base class [1/3] (#78252)
Patch 1 of 3 to add llvm.dbg.label support to the RemoveDIs project. The
patch stack adds a new base class

    -> 1. Add DbgRecord base class for DPValue and the not-yet-added
          DPLabel class.
       2. Add the DPLabel class.
       3. Enable dbg.label conversion and add support to passes.

Patches 1 and 2 are NFC.

In the near future we also will rename DPValue to DbgVariableRecord and
DPLabel to DbgLabelRecord, at which point we'll overhaul the function
names too. The name DPLabel keeps things consistent for now.
2024-02-20 16:00:55 +00:00
Philip Reames
28865da374
[LSR][term-fold] Adjust expansion budget based on trip count (#80304)
Follow up to https://github.com/llvm/llvm-project/pull/74747

This change extends the previously added fixed expansion threshold by
scaling down the cost allowed for an expansion for a loop with either a
small known trip count or a profile which indicates the trip count is
likely small. The goal here is to improve code generation for a loop
nest where the outer loop has a high trip count, and the inner loop runs
only a handful of iterations.

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2024-02-02 08:57:26 -08:00
Philip Reames
f264da4322
[lsr][term-fold] Restrict transform to low cost expansions (#74747)
This is a follow up to an item I noted in my submission comment for
e947f95. I don't have a real world example where this is triggering
unprofitably, but avoiding the transform when we estimate the loop to be
short running from profiling seems quite reasonable. It's also now come
up as a possibility in a regression twice in two days, so I'd like to
get this in to close out the possibility if nothing else.

The original review dropped the threshold for short trip count loops. I
will return to that in a separate review if this lands.
2024-01-31 14:48:20 -08:00
Craig Topper
3dea0aa8f4
[LSR] Fix incorrect comment. NFC (#79207) 2024-01-23 17:57:34 -08:00
Stephen Tozer
7c53e9f667
[RemoveDIs][DebugInfo] Add support for DPValues to LoopStrengthReduce (#78706)
This patch trivially extends support for DbgValueInst recovery to
DPValues in LoopStrengthReduce; they are handled identically, so this is
mostly done by reusing the DbgValueInst code (using templates or
auto-parameter lambdas to reduce actual code duplication).
2024-01-22 18:59:19 +00:00
Philip Reames
f5dd70c582
[LSR] Require non-zero step when considering wrap around for term folding (#77809)
The term folding logic needs to prove that the induction variable does
not cycle through the same set of values so that testing for the value
of the IV on the exiting iteration is guaranteed to trigger only on that
iteration. The prior code checked the no-self-wrap property on the IV,
but this is insufficient as a zero step is trivially no-self-wrap per
SCEV's definition but does repeat the same series of values.

In the current form, this has the effect of basically disabling lsr's
term-folding for all non-constant strides. This is still a net
improvement as we've disabled term-folding entirely, so being able to
enable it for constant strides is still a net improvement.

As future work, there's two SCEV weakness worth investigating.

First sext (or i32 %a, 1) to i64 does not return true for
isKnownNonZero. This is because we check only the unsigned range in that
query. We could either do query pushdown, or check the signed range as
well. I tried the second locally and it has very broad impact - i.e. we
have a bunch of missing optimizations here.

Second, zext (or i32 %a, 1) to i64 as the increment to the IV in
expensive_expand_short_tc causes the addrec to no longer be provably
no-self-wrap. I didn't investigate this so it might be necessary, but
the loop structure is such that I find this result surprising.
2024-01-11 10:07:17 -08:00
Craig Topper
d8ddcae547 [LSR] Fix typo in debug message where backspace escape was used instead of new line. 2023-12-24 10:35:27 -08:00
Joshua Cao
bd382032f6
[BBUtils][NFC] Delete SplitLandingPadPredecessors with DT (#73406)
Function is marked for deprecation. There is only one consumer which is
converted to use DomTreeUpdater.
2023-12-02 11:33:43 -08:00
Simon Pilgrim
3246a32d3f Fix MSVC "not all control paths return a value" warning. NFC. 2023-11-30 10:07:01 +00:00
Philip Reames
e947f95337 [LSR][TTI][RISCV] Enable terminator folding for RISC-V
If looking for a miscompile revert candidate, look here!

The transform being enabled prefers comparing to a loop invariant
exit value for a secondary IV over using an otherwise dead primary
IV.  This increases register pressure (by requiring the exit value
to be live through the loop), but reduces the number of instructions
within the loop by one.

On RISC-V which has a large number of scalar registers, this is
generally a profitable transform.  We loose the ability to use a beqz
on what is typically a count down IV, and pay the cost of computing
the exit value on the secondary IV in the loop preheader, but save
an add or sub in the loop body.  For anything except an extremely
short running loop, or one with extreme register pressure, this is
profitable.  On spec2017, we see a 0.42% geomean improvement in
dynamic icount, with no individual workload regressing by more than
0.25%.

Code size wise, we trade a (possibly compressible) beqz and a (possibly
compressible) addi for a uncompressible beq.  We also add instructions
in the preheader.  Net result is a slight regression overall, but
neutral or better inside the loop.

Previous versions of this transform had numerous cornercase correctness
bugs.  All of them ones I can spot by inspection have been fixed, and I
have run this through all of spec2017, but there may be further issues
lurking.  Adding uses to an IV is a fraught thing to do given poison
semantics, so this transform is somewhat inherently risky.

This patch is a reworked version of D134893 by @eop.  That patch has
been abandoned since May, so I picked it up, reworked it a bit, and
am landing it.
2023-11-29 12:04:06 -08:00
Nikita Popov
e408f70524 [LSR] Avoid use of ConstantExpr::getCast() (NFC)
Use the constant folding API instead, which must succeed as we're
working on a ConstantInt.
2023-11-02 09:48:04 +01:00
David Green
54e5de08d4
[ARM][LSR] Exclude uses outside the loop when favoring postinc. (#67090)
Extra uses for variables outside the loop can mess with the generation
of postinc variables. This patch alters the collection of loop invariant
fixups in LSR when the target is optimizing for PostInc, to exclude the
collection of these extra uses. It is expected that the variable can be
rematerialized, which will lead to a more optimal sequence of
instructions in the loop.
2023-09-25 10:09:36 +01:00
Nikita Popov
d35e5afc87 [LSR] Simplify type check for opaque pointers (NFC)
For pointer types, checking the address space is the same as
type equality now, so we no longer need the special case.
2023-09-22 10:23:04 +02:00
Bjorn Pettersson
db456dc6ba [LSR] Drop support for typed pointers
The opaque pointers are already "canonicalized". So remove the
redundant/obsolete code.
2023-09-07 16:37:45 +02:00
Nuno Lopes
d8e2821170 [LSR] Use poison instead of undef as placeholder [NFC]
This value is patcher afterwards, and only used temporarily during dbg info construction
2023-07-23 15:57:21 +01:00
Nikita Popov
ddb46abd3c [LSR] Don't consider users of constant outside loop
In CollectLoopInvariantFixupsAndFormulae(), LSR looks at users
outside the loop. E.g. if we have an addrec based on %base, and
%base is also used outside the loop, then we have to keep it in a
register anyway, which may make it more profitable to use
%base + %idx style addressing.

This reasoning doesn't hold up when the base is a constant, because
the constant can be rematerialized. The lsr-memcpy.ll test regressed
when enabling opaque pointers, because inttoptr (i64 6442450944 to ptr)
now also has a use outside the loop (previously it didn't due to a
pointer type difference), and that extra "use" results in worse use
of addressing modes in the loop. However, the use outside the loop
actually gets rematerialized, so the alleged register saving does
not occur.

The same reasoning also applies to other types of constants, such
as global variable references.

Differential Revision: https://reviews.llvm.org/D155073
2023-07-13 12:22:38 +02:00
Florian Hahn
7f5b15ad15
[LSR] Move normalization check to normalizeForPostIncUse.
Move the logic added in 3a57152d85e1 to normalizeForPostIncUse to catch
additional un-invertable cases. This fixes another mis-compile pointed
out by @peixin in D153004.
2023-07-04 11:56:51 +01:00
Florian Hahn
3a57152d85
[LSR] Return nullptr from getExpr if the result isn't invertible.
getExpr is missing a check to make sure the result is invertible.
This can lead to incorrect results, so return nullptr in those cases
like in other places in IVUsers.

Fixes #62660.

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D153202
2023-06-22 19:10:48 +01:00
Luke Lau
438cc10b8e [IR] Add getAccessType to Instruction
There are multiple places in the code where the type of memory being accessed from an instruction needs to be obtained, including an upcoming patch to improve GEP cost modeling. This deduplicates the logic between them. It's not strictly NFC as EarlyCSE/LoopStrengthReduce may catch more intrinsics now.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D150583
2023-06-21 16:17:25 +01:00
Florian Hahn
dae5cd73cb
Recommit "[LSR] Consider post-inc form when creating extends/truncates."
This reverts the revert commit 1797ab36efc9c90c921cd725831f8c3f6a7125a2.

The recommitted version now checks the PostIncLoopSets for all fixups
and returns nullptr if the result doesn't match for all fixups.
2023-06-19 17:57:06 +01:00
Florian Hahn
1797ab36ef
Revert "[LSR] Consider post-inc form when creating extends/truncates."
This reverts commit abfeda5af329b5889db709ff74506e20e0b569e9.
and fe19036e1266d2a90b44725c82b898134906e4c3.

The added assertion triggers during clang bootstrap builds. Revert while
I investigate.
2023-06-17 17:58:41 +01:00
Florian Hahn
abfeda5af3
[LSR] Consider post-inc form when creating extends/truncates.
GenerateTruncates at the moment creates extends/truncates for post-inc
uses of normalized expressions. For example, if an add rec of the form
{1,+,-1} is used outside the loop, the normalized form will use {1,+,-1}
instead of {0,+,-1}. When naively sign-extending the normalized
expression, it will get extended incorrectly to {1,+,-1} for the wider
type, if the backedge-taken count of the loop is 1.

To address this, the patch updates GenerateTruncates to check if the
LSRUse contains any fixups with PostIncLoops. If that's the case, first
de-normalize the expression, then perform the extend/truncate, then
normalize again.

There may be other places where similar checks are needed and the helper
can be generalized for those cases. I'd not be surprised if other subtle
mis-compiles are caused by this.

Fixes #38847.
Fixes #58039.
Fixes #62852.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D153004
2023-06-17 09:58:37 +01:00
Nikita Popov
143ed21b26 Revert "[LCSSA] Remove unused ScalarEvolution argument (NFC)"
This reverts commit 5362a0d859d8e96b3f7c0437b7866e17a818a4f7.

In preparation for reverting a dependent revision.
2023-06-05 16:45:38 +02:00
Nikita Popov
d5c56c5162 [SCEVExpander] Remember phi nodes inserted by LCSSA construction
SCEVExpander keeps track of all instructions it inserted. However,
it currently misses some phi nodes created during LCSSA construction.
Fix this by collecting these into another argument.

This also removes the IRBuilder argument, which was added for
essentially the same purpose, but only handles the root LCSSA nodes,
not those inserted by SSAUpdater.

This was reported as a regression on D149344, but the reduced test
case also reproduces without it.

Differential Revision: https://reviews.llvm.org/D150681
2023-05-25 09:34:19 +02:00