1022 Commits

Author SHA1 Message Date
Florian Hahn
f8734a5e10
[SCEV] Introduce SCEVUse, use it instead of const SCEV * (NFCI). (#91961)
This patch introduces SCEVUse, which is a tagged pointer containing the
used const SCEV *, plus extra bits to store NUW/NSW flags that are only
valid at the specific use.
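
The tagged-pointer idea can be sketched in standard C++. This is only an illustration under stated assumptions, not the actual SCEVUse implementation: `Expr`, `ExprUse`, and the `UseFlags` values are hypothetical names, and the sketch assumes the pointee is aligned to at least 4 bytes so that the two low pointer bits are free.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an expression node. alignas(4) guarantees that
// the two low bits of any Expr* are zero, leaving room for flags.
struct alignas(4) Expr {
  int Id;
};

// Hypothetical flag values, named after the NUW/NSW flags in the text.
enum UseFlags : std::uintptr_t { FlagNone = 0, FlagNUW = 1, FlagNSW = 2 };

// A pointer to a const Expr plus two use-specific flag bits packed into the
// pointer's low bits.
class ExprUse {
  std::uintptr_t Bits;
  static constexpr std::uintptr_t FlagMask = 3;

public:
  ExprUse(const Expr *E, std::uintptr_t Flags = FlagNone)
      : Bits(reinterpret_cast<std::uintptr_t>(E) | Flags) {
    assert((reinterpret_cast<std::uintptr_t>(E) & FlagMask) == 0 &&
           "pointer is not sufficiently aligned");
    assert((Flags & ~FlagMask) == 0 && "unknown flag bits");
  }

  // Masks the flag bits off to recover the original pointer.
  const Expr *getPointer() const {
    return reinterpret_cast<const Expr *>(Bits & ~FlagMask);
  }
  bool hasNUW() const { return (Bits & FlagNUW) != 0; }
  bool hasNSW() const { return (Bits & FlagNSW) != 0; }
};
```

Two `ExprUse` values can point at the same `Expr` while carrying different flags, which is the property the commit message describes: the flags belong to the use, not to the shared expression.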

This was suggested by @nikic as an alternative
to https://github.com/llvm/llvm-project/pull/90742.

This patch just updates most SCEV infrastructure to operate on SCEVUse
instead of const SCEV *. It does not introduce any code that makes use
of the use-specific flags yet; I'll share that in follow-ups.

Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=ee34eb6edccdebc2a752ffecdde5faae6b0d5593&to=5a7727d7819414d2acbc5b6ab740f0fc2363e842&stat=instructions%3Au
2026-03-13 16:23:06 +00:00
Alexis Engelke
94da4039cb
[Analysis][NFC] Drop use of BranchInst (#186374)
Largely straightforward replacement.
2026-03-13 13:42:19 +00:00
Florian Hahn
e8908215de
[LSR] Support SCEVPtrToAddr in SCEVDbgValueBuilder.
Allow SCEVPtrToAddr as cast in assertion in SCEVDbgValueBuilder.
SCEVPtrToAddr is handled similarly to SCEVPtrToInt.

Fixes a crash with debug info after bd40d1de9c9ee, which started to
generate ptrtoaddr instead of ptrtoint expressions.
2026-02-07 14:02:45 +00:00
Austin Jiang
e6cdfb75ac
Fix typos and spelling errors across codebase (#156270)
Corrected various spelling mistakes such as 'occurred', 'receiver',
'initialized', 'length', and others in comments, variable names,
function names, and documentation throughout the project. These
changes improve code readability and maintain consistency in naming
and documentation.

Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
2026-01-13 11:52:46 -05:00
Rahul Joshi
7d96b39c4f
[NFC][LLVM] Adopt ListSeparator/interleaved in more places (#172909)
Adopt `ListSeparator` and `interleaved` in various places instead of
manual code to print separators between loop iterations.
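
As a rough illustration of the pattern being replaced, the sketch below uses only the standard library. `Separator` and `joinInts` are hypothetical names, not LLVM's `ListSeparator` class; the sketch only mirrors the idea of an object that yields nothing on first use and the separator afterwards.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: yields "" on first use and the separator afterwards,
// so the loop body needs no "is this the first iteration?" bookkeeping.
class Separator {
  bool First = true;
  std::string Sep;

public:
  explicit Separator(std::string S = ", ") : Sep(std::move(S)) {}

  const std::string &next() {
    static const std::string Empty;
    if (First) {
      First = false;
      return Empty;
    }
    return Sep;
  }
};

// Prints the values separated by ", " without a manual first-element check.
std::string joinInts(const std::vector<int> &Values) {
  std::ostringstream OS;
  Separator LS;
  for (int V : Values)
    OS << LS.next() << V;
  return OS.str();
}
```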
2026-01-12 12:18:01 -08:00
Nikita Popov
8fd85ba9e6
[LLVM] Temporarily allow implicit truncation in some places
Split out from https://github.com/llvm/llvm-project/pull/171456.

This explicitly allows implicit truncation in a number of places,
prior to switching the default. This limits the scope of the
initial change.
2026-01-05 09:52:57 +01:00
Ramkumar Ramachandra
85fafd5db0
[SCEVExp] Get DL from SE, strip constructor arg (NFC) (#171823)
2025-12-11 14:26:47 +00:00
Nikita Popov
6960b633ee
[LSR] Use getSigned() for negated immediate
2025-12-09 16:19:36 +01:00
John Brawn
ccd4e7b1ed
[LSR] Make OptimizeLoopTermCond able to handle some non-cmp conditions (#165590)
Currently OptimizeLoopTermCond can only convert a cmp instruction to
using a postincrement induction variable, which means it can't handle
predicated loops where the termination condition comes from
get_active_lane_mask. Relax this restriction so that we can handle any
kind of instruction, though only if it's the instruction immediately
before the branch (except for possibly an extractelement).
2025-12-03 15:28:46 +00:00
John Brawn
2ad71745cd
[LSR] Insert the transformed IV increment in the user block (#169515)
Currently we try to hoist the transformed IV increment instruction to
the header block to help with generation of postincrement instructions,
but this only works if the user instruction is also in the header. We
should instead be trying to insert it in the same block as the user.
2025-12-02 17:15:00 +00:00
John Brawn
53e7443e0c
[LSR] Don't count conditional loads/store as enabling pre/post-index (#159573)
When a load/store is conditionally executed in a loop it isn't a
candidate for pre/post-index addressing, as the increment of the address
would only happen on those loop iterations where the load/store is
executed.

Detect this and only discount the AddRec cost when the load/store is
unconditional.
2025-10-30 13:53:15 +00:00
John Brawn
8fab81121e
[LSR] Add an addressing mode that considers all addressing modes (#158110)
The way that loop strength reduction works is that the target has to
upfront decide whether it wants its addressing to be preindex,
postindex, or neither. This choice affects:
* Which potential solutions we generate
* Whether we consider a pre/post index load/store as costing an AddRec
or not.

None of these choices are a good fit for either AArch64 or ARM, where
both preindex and postindex addressing are typically free:
* If we pick None then we count pre/post index addressing as costing one
addrec more than is correct so we don't pick them when we should.
* If we pick PreIndexed or PostIndexed then we get the correct cost for
that addressing type, but still get it wrong for the other and also
exclude potential solutions using offset addressing that could have less
cost.

This patch adds an "all" addressing mode that causes all potential
solutions to be generated and counts both pre and postindex as having
AddRecCost of zero. Unfortunately this reveals problems elsewhere in how
we calculate the cost of things that need to be fixed before we can make
use of it.
2025-09-16 11:46:54 +01:00
Kazu Hirata
8b8b0f197f
[Scalar] Remove an unnecessary cast (NFC) (#150474)
getOperand() already returns Value *.
2025-07-24 15:50:00 -07:00
Nikita Popov
5f531827a4
[LSR] Do not consider uses in lifetime intrinsics (#149492)
We should ignore uses of pointers in lifetime intrinsics, as these are
not actually materialized in the final code, so don't affect register
pressure or anything else LSR needs to model.
    
Handling these only results in peculiar rewrites where additional
intermediate GEPs are introduced.
2025-07-18 16:13:00 +02:00
Jeremy Morse
c9d8b68676
[DebugInfo] Suppress lots of users of DbgValueInst (#149476)
This is another prune of dead code -- we never generate debug intrinsics
nowadays, therefore there's no need for these codepaths to run.

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2025-07-18 11:31:52 +01:00
Jeremy Morse
57a5f9c47e
[DebugInfo][RemoveDIs] Suppress getNextNonDebugInfoInstruction (#144383)
There are no longer debug-info instructions, thus we don't need this
skipping. Hooray!
2025-07-15 15:34:10 +01:00
John Brawn
f8c2c4f161
[LSR] Account for hardware loop instructions (#147958)
A hardware loop instruction combines a subtract, compare with zero, and
branch. We currently account for the compare and branch being combined
into one in Cost::RateFormula, as part of more general handling for
compare-branch-zero, but don't account for the subtract, leading to
suboptimal decisions in some cases.

Fix this in Cost::RateRegister by noticing when we have such a subtract
and discounting the AddRecCost in such a case.
2025-07-14 16:48:54 +01:00
Shan Huang
089106fdfb
[DebugInfo][LoopStrengthReduce] Salvage the debug value of the dead cmp instruction (#147241)
Fix #147238
2025-07-14 09:45:37 +08:00
Ramkumar Ramachandra
b7059ebafe
[LSR] Strip dead code (NFC) (#146109)
Nested AddRec is already rejected by the handling in pushSCEV().
2025-07-03 13:37:08 +01:00
Ramkumar Ramachandra
04cd0f2702
[LSR] Clean up code using SCEVPatternMatch (NFC) (#145556)
2025-06-28 11:41:53 +01:00
Jeremy Morse
9eb0020555
[DebugInfo][RemoveDIs] Remove a swathe of debug-intrinsic code (#144389)
Seeing how we can't generate any debug intrinsics any more: delete a
variety of codepaths where they're handled. For the most part these are
plain deletions, in others I've tweaked comments to remain coherent, or
added a type to (what was) type-generic-lambdas.

This isn't all the DbgInfoIntrinsic call sites but it's most of the
simple scenarios.

Co-authored-by: Nikita Popov <github@npopov.com>
2025-06-17 15:55:14 +01:00
John Brawn
a54712c8ec
[LSR] Make canHoistIVInc allow non-integer types (#143707)
canHoistIVInc was made to only allow integer types to avoid a crash in
isIndexedLoadLegal/isIndexedStoreLegal due to them failing an assertion
in getValueType (or rather in MVT::getVT which gets called from that)
when passed a struct type. Adjusting these functions to pass
AllowUnknown=true to getValueType means we don't get an assertion
failure (MVT::Other is returned which TLI->isIndexedLoadLegal should
then return false for), meaning we can remove this check for integer
type.
2025-06-16 15:23:40 +01:00
Kazu Hirata
f3867f900f
[llvm] Use *Map::try_emplace (NFC) (#143321)
- try_emplace(Key) is shorter than insert(std::make_pair(Key, 0)).
- try_emplace performs value initialization without value parameters.
- We overwrite values on successful insertion anyway.
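
A minimal sketch of the three points above, using `std::unordered_map` (C++17) as a stand-in for LLVM's map types. `getOrAssignId` is a hypothetical helper for illustration and is not taken from the patch.

```cpp
#include <string>
#include <unordered_map>

// Returns the id for Name, assigning the next free id on first sight.
unsigned getOrAssignId(std::unordered_map<std::string, unsigned> &Ids,
                       const std::string &Name) {
  // try_emplace(Key) value-initializes the mapped unsigned to 0 when the key
  // is new, and leaves an existing entry untouched.
  auto [It, Inserted] = Ids.try_emplace(Name);
  if (Inserted)
    // The placeholder 0 is overwritten on successful insertion anyway, so
    // there was no need to spell it out via insert(std::make_pair(Name, 0)).
    It->second = static_cast<unsigned>(Ids.size());
  return It->second;
}
```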
2025-06-08 16:18:46 -07:00
Kazu Hirata
89308de4b0
[llvm] Value-initialize values with *Map::try_emplace (NFC) (#141522)
try_emplace value-initializes values, so we do not need to pass
nullptr to try_emplace when the value types are raw pointers or
std::unique_ptr<T>.
2025-05-26 15:13:02 -07:00
Florian Hahn
bc0c4db5d9
[SCEV] Add dedicated AffineAddRec matcher + loop matchers (NFC). (#141141)
Add dedicated m_scev_AffineAddRec matcher with 
complementing m_Loop() and m_SpecificLoop matchers.

PR: https://github.com/llvm/llvm-project/pull/141141
2025-05-25 08:40:31 +01:00
Kazu Hirata
0ef8ef66cc
[Transforms] Remove unused includes (NFC) (#141357)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-05-24 09:37:43 -07:00
Kazu Hirata
fe6290ef5b
[llvm] Use *Map::try_emplace (NFC) (#140843)
try_emplace can default-construct values, so we do not need to do so
on our own.  Plus, try_emplace(Key) is much shorter than
insert(std::make_pair(Key, Value())).
2025-05-21 01:11:01 -07:00
Ramkumar Ramachandra
61d3ad963c
[SCEVPatternMatch] Introduce m_scev_AffineAddRec (#140377)
Introduce m_scev_AffineAddRec to match affine AddRecs, a class_match for
SCEVConstant, and demonstrate their utility in LSR and SCEV. While at
it, rename m_Specific to m_scev_Specific for clarity.
2025-05-19 12:02:07 +01:00
Jon Chesterfield
9c60431b67
[NFC] Add a specialization of DenseMapInfo for SmallVector (#140380)
Equivalent to the three existing uses I found which were all pointers.
Implementing the general pattern so SmallVector<int> etc will work as
well.

Added to the SmallVector.h header as opposed to DenseMapInfo.h following
the StringRef.h and SmallBitVector.h prior art.

Noticed while writing an unrelated patch which currently wants a map
from small vectors to other things and cleaner to generalise than add
another specialisation to said patch.
2025-05-17 19:13:30 +01:00
Sergei Barannikov
cedeef6707
[LSR] Replace casts with an equivalent std::as_const (NFC) (#138980)
The casts / `std::as_const` are used here to select `const` overload of
`begin()`/`end()` so that the type of the returned iterator matches the
type of `J`, which is `const_iterator`.
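
The overload selection can be shown with a standard container. In this sketch `hasEarlierDuplicate` is a hypothetical function, not code from LSR; it assumes C++17 for `std::as_const`.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

// Returns true if any element before J compares equal to the element at J.
bool hasEarlierDuplicate(std::vector<int> &Values,
                         std::vector<int>::const_iterator J) {
  // Values is non-const, so Values.begin() would return a plain iterator.
  // std::as_const selects the const overload of begin(), so I has the same
  // type as J.
  for (auto I = std::as_const(Values).begin(); I != J; ++I)
    if (*I == *J)
      return true;
  return false;
}

// Compile-time check that the const overload is the one being selected.
static_assert(
    std::is_same_v<
        decltype(std::as_const(std::declval<std::vector<int> &>()).begin()),
        std::vector<int>::const_iterator>,
    "std::as_const should yield a const_iterator");
```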
2025-05-08 13:36:37 +03:00
David Green
98b6f8dc69
[CostModel] Remove optional from InstructionCost::getValue() (#135596)
InstructionCost is already an optional value, containing an Invalid
state that can be checked with isValid(). There is little point in
returning another optional from getValue(). Most uses do not make use of
it being a std::optional, dereferencing the value directly (either
isValid has been checked previously or the Cost is assumed to be valid).
The one case that does in AMDGPU used value_or which has been replaced
by an isValid() check.
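
The shape of the change can be sketched with a simplified cost type. `Cost` and `valueOr` below are hypothetical and are not LLVM's `InstructionCost`; in particular, the assertion inside `getValue()` is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplified cost type: a value plus a valid/invalid state, so
// the type is already "optional" on its own.
class Cost {
  std::int64_t Value = 0;
  bool Valid = true;

public:
  Cost() = default;
  Cost(std::int64_t V) : Value(V) {}

  static Cost getInvalid() {
    Cost C;
    C.Valid = false;
    return C;
  }

  bool isValid() const { return Valid; }

  // Returns the raw value rather than another optional; callers are expected
  // to have checked isValid() first (assumption: enforced by an assert here).
  std::int64_t getValue() const {
    assert(isValid() && "getValue() called on an invalid cost");
    return Value;
  }
};

// What a former value_or(Default) call on the optional looks like when
// written as an isValid() check instead.
std::int64_t valueOr(const Cost &C, std::int64_t Default) {
  return C.isValid() ? C.getValue() : Default;
}
```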
2025-04-23 07:46:27 +01:00
Kazu Hirata
b01e25deba
[llvm] Call hash_combine_range with ranges (NFC) (#136511)
2025-04-20 16:36:03 -07:00
Kazu Hirata
0dcc201ac4
[Transforms] Use *Set::insert_range (NFC) (#132056)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch replaces:

  Dest.insert(Src.begin(), Src.end());

with:

  Dest.insert_range(Src);

This patch does not touch custom begin like succ_begin for now.
2025-03-19 15:35:01 -07:00
Jeremy Morse
34b139594a
[NFC][DebugInfo] Switch more call-sites to using iterator-insertion (#124283)
To finalise the "RemoveDIs" work removing debug intrinsics, we're
updating call sites that insert instructions to use iterators instead.
This set of changes are those where it's not immediately obvious that
just calling getIterator to fetch an iterator is correct, and one or two
places where more than one line needs to change.

Overall the same rule holds though: iterators generated for the start of
a block such as getFirstNonPHIIt need to be passed into insert/move
methods without being unwrapped/rewrapped, everything else can use
getIterator.
2025-01-27 16:44:14 +00:00
Jeremy Morse
e14962a39c
[NFC][DebugInfo] Use iterators for instruction insertion in more places (#124291)
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators.
This patch changes some more complex call-sites, those crossing file
boundaries and where I've had to perform some minor rewrites.
2025-01-27 15:25:17 +00:00
Jeremy Morse
8e70273509
[NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583)
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago, but to ensure some type safety we'd like to
have all calls to moveBefore use iterators.

This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.

We'll eventually deprecate and remove the instruction-pointer
insertBefore, but not before adding concise documentation of what
considerations are needed (very few).
2025-01-24 10:53:11 +00:00
Piotr Fusik
1a44a53cd5
[LSR][NFC] Use range-based for (#113889)
2024-11-05 07:11:23 +01:00
Kazu Hirata
94f9cbbe49
[Scalar] Remove unused includes (NFC) (#114645)
Identified with misc-include-cleaner.
2024-11-02 08:32:26 -07:00
Youngsuk Kim
caa32e6d6f
[llvm][LSR] Fix where invariant on ScaledReg & Scale is violated (#112576)
Comments attached to the `ScaledReg` field of `struct Formula` explain
that `ScaledReg` must be non-null when `Scale` is non-zero.

This fixes up a code path where this invariant is violated. Also, add an
assert to ensure this invariant holds true.
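
The invariant itself is small enough to state as code. The sketch below is hypothetical: `Reg`, `FormulaSketch`, and `clearScaledReg` are illustrative names, and resetting `Scale` together with `ScaledReg` is one assumed way of keeping the invariant, not necessarily what the patch does.

```cpp
#include <cstdint>

// Hypothetical stand-in for a register expression.
struct Reg {
  int Id;
};

// Hypothetical reduced version of a formula with a scaled register.
struct FormulaSketch {
  const Reg *ScaledReg = nullptr;
  std::int64_t Scale = 0;

  // Invariant from the text: ScaledReg must be non-null when Scale is
  // non-zero.
  bool isConsistent() const { return Scale == 0 || ScaledReg != nullptr; }

  // Dropping the scaled register also resets Scale, so the invariant keeps
  // holding (an assumption of this sketch).
  void clearScaledReg() {
    ScaledReg = nullptr;
    Scale = 0;
  }
};
```

An assertion on `isConsistent()` at the points where a formula is modified would catch a violating code path early instead of aborting later.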

Without this patch, the compiler aborts with the attached test case.

Fixes #76504
2024-10-17 10:47:44 -04:00
Orlando Cazalet-Hyams
7506872afc
[DebugInfo][LSR] Fix assertion failure salvaging IV with offset > 64 bits wide (#110979)
Fixes #110494
2024-10-03 11:47:08 +01:00
Mehdi Amini
6c7a3f80e7
Fix LLVM_ENABLE_ABI_BREAKING_CHECKS macro check: use #if instead of #ifdef (#110938)
This macro is always defined: either 0 or 1. The correct pattern is to
use #if.
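
The difference is easy to demonstrate with a macro that is defined as 0. `SKETCH_ENABLE_CHECKS` is a hypothetical macro used only for this illustration.

```cpp
// Hypothetical macro that, like the one in the text, is always defined as
// either 0 or 1. Here it is "disabled".
#define SKETCH_ENABLE_CHECKS 0

// Wrong pattern: #ifdef is true whenever the macro is defined, even as 0.
constexpr bool checksEnabledViaIfdef() {
#ifdef SKETCH_ENABLE_CHECKS
  return true;
#else
  return false;
#endif
}

// Correct pattern: #if tests the macro's value.
constexpr bool checksEnabledViaIf() {
#if SKETCH_ENABLE_CHECKS
  return true;
#else
  return false;
#endif
}

static_assert(checksEnabledViaIfdef(), "#ifdef ignores the value 0");
static_assert(!checksEnabledViaIf(), "#if honours the value 0");
```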

Re-apply #110185 with more fixes for debug build with the ABI breaking
checks disabled.
2024-10-03 01:24:14 +02:00
Sergey Kachkov
1f2a634c44
Reland "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107380)
Motivating example: https://godbolt.org/z/eb97zrxhx
Here we have 2 induction variables in the loop: one corresponds to the
i variable (add rdx, 4), the other to res (add rax, 2). The second
induction variable can be removed by rewriteLoopExitValues() method
(final value of res at loop exit is unroll_iter * -2); however, this
doesn't happen because we have duplicated LCSSA phi nodes at loop exit:
```
; Preheader:
for.body.preheader.new:                           ; preds = %for.body.preheader
  %unroll_iter = and i64 %N, -4
  br label %for.body

; Loop:
for.body:                                         ; preds = %for.body, %for.body.preheader.new
  %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ]
  %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ]
  %inc.3 = add nuw i64 %i.07, 4
  %lsr.iv.next = add nsw i64 %lsr.iv, -2
  %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3
  br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7

; Exit blocks
for.end.loopexit.unr-lcssa.loopexit:              ; preds = %for.body
  %inc.3.lcssa = phi i64 [ %inc.3, %for.body ]
  %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ]
  %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ]
  br label %for.end.loopexit.unr-lcssa
```
rewriteLoopExitValues requires the %lsr.iv.next value to have only 2 uses:
one in the LCSSA phi node, the other in the induction phi node. Here we have 3
uses of this value because of the duplicated LCSSA nodes, so the transform
doesn't apply, which leaves an extra add operation inside the loop. The
proposed solution is to accumulate the inserted instructions that will
require an LCSSA form update into a SetVector and then call
formLCSSAForInstructions on this SetVector once, so the same
instructions aren't processed twice.

Reland fixes the issue with preserve-lcssa.ll test: it fails in the situation
when x86_64-unknown-linux-gnu target is unavailable in opt. The changes are
moved into separate duplicated-phis.ll test with explicit x86 target requirement
to fix bots which are not building this target.
2024-09-09 16:14:51 +03:00
dyung
2bf551e600
Revert "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107666)
Reverts llvm/llvm-project#107380

Change is causing the test preserve-lcssa.ll to fail on at least 2 build
bots:
- https://lab.llvm.org/buildbot/#/builders/190/builds/5231
- https://lab.llvm.org/buildbot/#/builders/161/builds/1855
2024-09-06 19:54:26 -07:00
Sergey Kachkov
2cb4d1b1bd
[LSR] Do not create duplicated PHI nodes while preserving LCSSA form (#107380)
Motivating example: https://godbolt.org/z/eb97zrxhx
Here we have 2 induction variables in the loop: one corresponds to the
i variable (add rdx, 4), the other to res (add rax, 2). The second
induction variable can be removed by rewriteLoopExitValues() method
(final value of res at loop exit is unroll_iter * -2); however, this
doesn't happen because we have duplicated LCSSA phi nodes at loop exit:
```
; Preheader:
for.body.preheader.new:                           ; preds = %for.body.preheader
  %unroll_iter = and i64 %N, -4
  br label %for.body

; Loop:
for.body:                                         ; preds = %for.body, %for.body.preheader.new
  %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ]
  %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ]
  %inc.3 = add nuw i64 %i.07, 4
  %lsr.iv.next = add nsw i64 %lsr.iv, -2
  %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3
  br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7

; Exit blocks
for.end.loopexit.unr-lcssa.loopexit:              ; preds = %for.body
  %inc.3.lcssa = phi i64 [ %inc.3, %for.body ]
  %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ]
  %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ]
  br label %for.end.loopexit.unr-lcssa
```
rewriteLoopExitValues requires the %lsr.iv.next value to have only 2 uses:
one in the LCSSA phi node, the other in the induction phi node. Here we have 3
uses of this value because of the duplicated LCSSA nodes, so the transform
doesn't apply, which leaves an extra add operation inside the loop. The
proposed solution is to accumulate the inserted instructions that will
require an LCSSA form update into a SetVector and then call
formLCSSAForInstructions on this SetVector once, so the same
instructions aren't processed twice.
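
The accumulate-then-process-once approach can be sketched without LLVM. `OrderedSet` is a hypothetical minimal stand-in for an insertion-ordered set such as SetVector, and plain integer ids stand in for instructions; the single fix-up call is left to the caller.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical minimal insertion-ordered set: remembers first-insertion
// order and silently ignores repeated insertions.
class OrderedSet {
  std::vector<int> Order;
  std::unordered_set<int> Seen;

public:
  bool insert(int V) {
    if (!Seen.insert(V).second)
      return false; // already recorded
    Order.push_back(V);
    return true;
  }
  const std::vector<int> &items() const { return Order; }
};

// Accumulates every inserted id, then returns the deduplicated worklist so
// that one fix-up call can handle each entry exactly once.
std::vector<int> collectForSingleUpdate(const std::vector<int> &InsertedIds) {
  OrderedSet Pending;
  for (int Id : InsertedIds)
    Pending.insert(Id);
  return Pending.items();
}
```

Handling each entry once is what avoids creating a second, duplicate phi for the same value.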
2024-09-06 18:39:47 +03:00
Nikita Popov
7660981402
[LSR] Use computeConstantDifference()
This API is faster than getMinusSCEV() and a SCEVConstant cast.
2024-08-28 12:20:59 +02:00
Philip Reames
27a62ec72a
[LSR] Split the -lsr-term-fold transformation into its own pass (#104234)
This transformation doesn't actually use any of the internal state of
LSR and recomputes all information from SCEV.  Splitting it out makes
it easier to test.
    
Note that long term I would like to write a version of this transform
which *is* integrated with LSR's solver, but if that happens, we'll
just delete the extra pass.
    
Integration wise, I switched from using TTI to using a pass configuration
variable.  This seems slightly more idiomatic, and means we don't run
the extra logic on any target other than RISCV.
2024-08-17 18:34:23 -07:00
Benjamin Maxwell
7fad04e94b
[LSR] Fix matching vscale immediates (#100080)
Somewhat confusingly a `SCEVMulExpr` is a `SCEVNAryExpr`, so can have
> 2 operands. Previously, the vscale immediate matching did not check
the number of operands of the `SCEVMulExpr`, so would ignore any
operands after the first two.

This led to incorrect codegen (and results) for ArmSME in IREE
(https://github.com/iree-org/iree), which sometimes addresses things
that are a `vscale * vscale` multiple away. The test added with this
change shows an example reduced from IREE. The second write should
be offset from the first `16 * vscale * vscale` (* 4 bytes), however,
previously LSR dropped the second vscale and instead offset the write by
`#4, mul vl`, which is an offset of `16 * vscale` (* 4 bytes).
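
The missing check can be illustrated on a toy expression type. `Operand`, `MulExpr`, and `matchVScaleImmediate` are hypothetical and much simpler than the SCEV classes; the sketch only shows why an n-ary multiply needs its operand count checked before it is treated as `C * vscale`.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical operand: either a constant or the symbolic value vscale.
struct Operand {
  bool IsVScale;
  std::int64_t Constant; // meaningful only when IsVScale is false
};

// Hypothetical n-ary multiply: the product of all operands.
using MulExpr = std::vector<Operand>;

// Returns C if Mul is exactly `C * vscale`, otherwise std::nullopt.
std::optional<std::int64_t> matchVScaleImmediate(const MulExpr &Mul) {
  // Without this operand-count check, `16 * vscale * vscale` would be
  // misread as `16 * vscale` because the trailing operand is never examined.
  if (Mul.size() != 2)
    return std::nullopt;
  if (!Mul[0].IsVScale && Mul[1].IsVScale)
    return Mul[0].Constant;
  return std::nullopt;
}
```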
2024-07-24 10:06:34 +01:00
Shan Huang
d83d09facd
[DebugInfo][LoopStrengthReduce] Fix missing debug location updates (#97519)
Fix #97510.

Note that, for the new phi instruction `NewPH`, which replaces the old
phi `PH` and the cast `ShadowUse`, I choose to propagate the debug
location of `PH` to it, because the cast is eliminated according to the
optimization semantics.
2024-07-15 09:44:18 +08:00
Kazu Hirata
2f55e55101
[Transforms] Use range-based for loops (NFC) (#98725)
2024-07-14 13:44:50 -07:00
Graham Hunter
4311b14e9c
[LSR] Recognize vscale-relative immediates (#88124)
Extends LoopStrengthReduce to recognize immediates multiplied by vscale, and query the current target for whether they are legal offsets for memory operations or adds.
2024-07-01 09:23:31 +01:00