462 Commits

Author SHA1 Message Date
Mingjie Xu
227edfb2f4
[CodeGenPrepare][NFC] Reland: Update the dominator tree instead of rebuilding it (#179040)
The original differential revision is https://reviews.llvm.org/D153638

Reverted in
f5b5a30858
because of causing a clang crash.

This patch relands it with the crash fixed. Call `DTU->flush()` in each
iteration of `while (MadeChange)`
loop, flush all awaiting BasicBlocks deletion, and prevent iterator
invalidation.
2026-03-31 09:01:11 +08:00
nataliakokoromyti
fa0071baab
[CodeGenPrepare] Fix infinite loop with same-type bitcasts (#176694)
OptimizeNoopCopyExpression was sinking same-type bitcasts (e.g. bitcast
i32 to i32) which would then be reintroduced by optimizePhiType, causing
an infinite loop.

Fix by adding a check (PhiTy == ConvertTy) in optimizePhiType to skip
the conversion when types are already identical.

Fixes #176688.
2026-01-22 09:29:08 +01:00
Aiden Grossman
e2d7cd685d
[IR] Make dead_on_return attribute optionally sized
This patch makes the dead_on_return parameter attribute optionally require a number
of bytes to be passed in to specify the number of bytes known to be dead
upon function return/unwind. This is aimed at enabling annotating the
this pointer in C++ destructors with dead_on_return in clang. We need
this to handle cases like the following:

```
struct X {
  int n;
  ~X() {
    this[n].n = 0;
  }
};
void f() {
  X xs[] = {42, -1};
}
```

Where we only certain that sizeof(X) bytes are dead upon return of ~X.
Otherwise DSE would be able to eliminate the store in ~X which would not
be correct.

This patch only does the wiring within IR. Future patches will make
clang emit correct sizing information and update DSE to only delete
stores to objects marked dead_on_return that are provably in bounds of
the number of bytes specified to be dead_on_return.

Reviewers: nikic, alinas, antoniofrighetto

Pull Request: https://github.com/llvm/llvm-project/pull/171712
2026-01-21 08:22:05 -08:00
David Green
a4975a8089
[CGP][AArch64] Do not sink instructions that might read/write memory. (#176182)
The test case's call instruction was being sank past the point where the
memory
it accessed was valid. Add a check that CGP does not try to sink
instruction that
might be invalid to move.

Fixes #176095
2026-01-18 14:18:25 +08:00
Nikita Popov
7f6afc499f [CGP] Use getSigned() for scale during address sinking
The scale is a signed quantity.

This avoids an assertion failure with github.com/llvm/llvm-project/pull/171456.
2026-01-06 17:23:53 +01:00
willmafh
2eb8ee137f
[NFC] Delete unnecessary apostrophe at the end of its (#173974) 2026-01-04 20:02:40 +08:00
Mingjie Xu
fac9472593
[IR] Reland Optimize PHINode::removeIncomingValue() and PHINode::removeIncomingValueIf() to use the swapping strategy. (#174274)
Reland #171963, #172639 and #173444, they are reverted in
86b9f90b9574b3a7d15d28a91f6316459dcfa046 because of introducing
non-determinism in compiles.
The non-determinism has been fixed in
9b8addffa70cee5b2acc5454712d9cf78ce45710.
2026-01-04 09:24:53 +08:00
Walter Lee
86b9f90b95
Revert 159f1c048e08a8780d92858cfc80e723c90235e3 (#173893)
This causes non-determinism in compiles.

From nikic: "FYI the non-determinism is also visible on
llvm-opt-benchmark. Maybe repeatedly running test cases from
299446d99f
could reproduce the issue..."

Also revert dependent 796fafeff92fe5d2d20594859e92607116e30a16 and
e135447bda617125688b71d33480d131d1076a72.
2025-12-29 20:23:13 -05:00
Mingjie Xu
159f1c048e
[IR] Optimize PHINode::removeIncomingValue() by swapping removed incoming value with the last incoming value. (#171963)
Current implementation uses `std::copy` to shift all incoming values
after the removed index. This patch optimizes
`PHINode::removeIncomingValue()` by replacing the linear shift of
incoming values with a swap-with-last strategy.

After this change, the relative order of incoming values after removal
is not preserved.

This improves compile-time for PHI nodes with many predecessors.

Depends:
https://github.com/llvm/llvm-project/pull/171955
https://github.com/llvm/llvm-project/pull/171956
https://github.com/llvm/llvm-project/pull/171960
https://github.com/llvm/llvm-project/pull/171962
2025-12-17 19:44:01 +08:00
Yingwei Zheng
59e601a3d5
[CodeGenPrepare] Don't simplify incomplete expression tree in AddrModeCombine (#164628)
Since new select/phi instructions may construct loops, the expression
tree to be simplified may still be incomplete (i.e., it may contain
select with dummy values or phi without incoming values). This patch
removes the call to simplifyInstruction for now, as it doesn't break
existing tests.

Original PR: https://reviews.llvm.org/D36073
Fix the crash reported in
https://github.com/llvm/llvm-project/pull/163453#issuecomment-3429922732.
2025-10-25 16:47:32 +08:00
paperchalice
3c2dae6919
[test][Transforms] Remove unsafe-fp-math uses part 1 (NFC) (#164742)
Post cleanup for #164534.
2025-10-23 11:39:11 +08:00
Nikita Popov
573ca36753
[IR] Replace alignment argument with attribute on masked intrinsics (#163802)
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.

This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)

It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
2025-10-20 08:50:09 +00:00
Vladimir Radosavljevic
be7f85168d
[CGP] Fix missing sign extension for base offset in optimizeMemoryInst (#161377)
If we have integers larger than 64-bit we need to explicitly sign extend
them, otherwise we will get wrong zero extended values.
2025-10-10 10:52:52 +00:00
Craig Topper
4be1099607
[RISCV] Improve fixed vector handling in isCtpopFast. (#158380)
Previously we considered fixed vectors fast if Zvbb or Zbb is
enabled. Zbb only helps if the vector type will end up being
scalarized.
2025-09-16 09:47:09 -07:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Paul Walker
be4a739a7f [LLVM][CDP] Move AArch64 test into AArch64 directory. 2025-08-05 11:04:59 +00:00
Paul Walker
94d374ab6c
[LLVM][CGP] Allow finer control for sinking compares. (#151366)
Compare sinking is selectable based on the result of
hasMultipleConditionRegisters. This function is too coarse grained by
not taking into account the differences between scalar and vector
compares. This PR extends the interface to take an EVT to allow finer
control.
    
The new interface is used by AArch64 to disable sinking of scalable
vector compares, but with isProfitableToSinkOperands updated to maintain
the cases that are specifically tested.
2025-08-05 11:43:41 +01:00
Yingwei Zheng
2d0ca09305
[CodeGenPrepare] Make sure that AddOffset is also a loop invariant (#150625)
Closes https://github.com/llvm/llvm-project/issues/150611.
2025-07-26 00:23:56 +08:00
Philip Reames
48ef55ce3e [CGP] Update tests to use autogen scripts, and refresh check lines
Reducing manual update work required for an upcoming change.
2025-07-03 11:36:33 -07:00
Evgenii Kudriashov
5ffdd9480d
[CodeGenPrepare] Filter out unrecreatable addresses from memory optimization (#143566)
Follow up on #139303
2025-06-28 23:30:03 +02:00
Philip Reames
0ef27186c9 [tests] Additional coverage for gather/scatter address optimizations 2025-06-26 11:50:57 -07:00
Florian Hahn
dde30a4731
[CGP] Bail out if (Base|Scaled)Reg does not dominate insert point. (#142949)
(Base|Scaled)Reg may not dominate the chosen insert point, if there are
multiple uses of the address. Bail out if that's the case, otherwise we
will generate invalid IR.

In some cases, we could probably adjust the insert point or hoist the
(Base|Scaled)Reg.

Fixes https://github.com/llvm/llvm-project/issues/142830.

PR: https://github.com/llvm/llvm-project/pull/142949
2025-06-06 12:38:30 +01:00
weiguozhi
59c6d70ed8
[CodeGenPrepare] Make sure instruction get from SunkAddrs is before MemoryInst (#139303)
Function optimizeBlock may do optimizations on a block for multiple
times. In the first iteration of the loop, MemoryInst1 may generate a
sunk instruction and store it into SunkAddrs. In the second iteration of
the loop, MemoryInst2 may use the same address and then it can reuse the
sunk instruction stored in SunkAddrs, but MemoryInst2 may be before
MemoryInst1 and the corresponding sunk instruction. In order to avoid
use before def error, we need to find appropriate insert position for the
 sunk instruction.

Fixes #138208.
2025-05-15 09:27:25 -07:00
Orlando Cazalet-Hyams
60d0bc1fae
Propagate DebugLocs on phis in BreakCriticalEdges (#133492)
The pull request discusses whether this change is needed or not. We leant
towards "it can't hurt" on the basis that it's at worst slightly unecessary
(but not incorret).

The motivation for the patch came from reviewing code duplication sites to
update for Key Instructions, finding this, trying to generate a test case and
seeing the DebugLocs aren't propagated.
2025-05-08 13:01:48 +01:00
Sergei Barannikov
5080a0251f
[CodeGenPrepare] Unfold slow ctpop when used in power-of-two test (#102731)
DAG combiner already does this transformation, but in some cases it does
not have a chance because either CodeGenPrepare or SelectionDAGBuilder
move icmp to a different basic block.

https://alive2.llvm.org/ce/z/ARzh99

Fixes #94829

Pull Request: https://github.com/llvm/llvm-project/pull/102731
2025-04-23 08:54:10 +03:00
Nikita Popov
20507a9e95
[Verifier][CGP] Allow integer argument to dbg_declare (#134803)
Relaxes the newly added verifier rule to also allow an integer argument
to dbg_declare, which is interpreted as a pointer. Adjust CGP to deal with
it gracefully.

Fixes https://github.com/llvm/llvm-project/issues/134523.
Alternative to https://github.com/llvm/llvm-project/pull/134601.
2025-04-10 12:29:56 +02:00
Jeremy Morse
792a6f8119
[RemoveDIs] Remove "try-debuginfo-iterators..." test flags (#130298)
These date back to when the non-intrinsic format of variable locations
was still being tested and was behind a compile-time flag, so not all
builds / bots would correctly run them. The solution at the time, to get
at least some test coverage, was to have tests opt-in to non-intrinsic
debug-info if it was built into LLVM.

Nowadays, non-intrinsic format is the default and has been on for more
than a year, there's no need for this flag to exist.

(I've downgraded the flag from "try" to explicitly requesting
non-intrinsic format in some places, so that we can deal with tests that
are explicitly about non-intrinsic format in their own commit).
2025-03-14 15:50:49 +00:00
Mingming Liu
5399782508
[IR] Generalize Function's {set,get}SectionPrefix to GlobalObjects, the base class of {Function, GlobalVariable, IFunc} (#125757)
This is a split of https://github.com/llvm/llvm-project/pull/125756
2025-02-06 14:51:13 -08:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
Stephen Tozer
822f74a911
[Clang] Cleanup docs and comments relating to -fextend-variable-liveness (#124767)
This patch contains a number of changes relating to the above flag;
primarily it updates comment references to the old flag names,
"-fextend-lifetimes" and "-fextend-this-ptr" to refer to the new names,
"-fextend-variable-liveness[={all,this}]". These changes are all NFC.

This patch also removes the explicit -fextend-this-ptr-liveness flag
alias, and shortens the help-text for the main flag; these are both
changes that were meant to be applied in the initial PR (#110000), but
due to some user-error on my part they were not included in the merged
commit.
2025-01-28 18:25:32 +00:00
David Sherwood
346185c42c
[AArch64] Improve codegen of vectorised early exit loops (#119534)
Once PR #112138 lands we are able to start vectorising more loops
that have uncountable early exits. The typical loop structure
looks like this:

vector.body:
  ...
  %pred = icmp eq <2 x ptr> %wide.load, %broadcast.splat
  ...
  %or.reduc = tail call i1 @llvm.vector.reduce.or.v2i1(<2 x i1> %pred)
  %iv.cmp = icmp eq i64 %index.next, 4
  %exit.cond = or i1 %or.reduc, %iv.cmp
  br i1 %exit.cond, label %middle.split, label %vector.body

middle.split:
  br i1 %or.reduc, label %found, label %notfound

found:
  ret i64 1

notfound:
  ret i64 0

The problem with this is that %or.reduc is kept live after the loop,
and since this is a boolean it typically requires making a copy of
the condition code register. For AArch64 this requires an additional
cset instruction, which is quite expensive for a typical find loop
that only contains 6 or 7 instructions.

This patch attempts to improve the codegen by sinking the reduction
out of the loop to the location of it's user. It's a lot cheaper to
keep the predicate alive if the type is legal and has lots of
registers for it. There is a potential downside in that a little
more work is required after the loop, but I believe this is worth
it since we are likely to spend most of our time in the loop.
2025-01-06 13:17:14 +00:00
Yingwei Zheng
6568ceb9fa
[CodeGenPrepare] Drop nsw flags in optimizeLoadExt (#118180)
Alive2: https://alive2.llvm.org/ce/z/pMcD7q
Closes https://github.com/llvm/llvm-project/issues/118172.
2024-12-01 11:25:31 +08:00
Lee Wei
1ca64c5fb7
[llvm] Remove br i1 undef from some regression tests [NFC] (#115691)
This PR aims to remove undefined behavior from tests under the directory
`llvm/transforms/CodegenPrepare, ConstantHoisting, Coroutines` etc.
2024-11-11 12:56:31 +00:00
Paul Walker
38fffa630e
[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548) 2024-11-06 11:53:33 +00:00
goldsteinn
1e072ae289
[CGP] [CodeGenPrepare] Folding urem with loop invariant value plus offset (#104724)
This extends the existing fold:

```
for(i = Start; i < End; ++i)
   Rem = (i nuw+- IncrLoopInvariant) u% RemAmtLoopInvariant;
```
 ->
```
Rem = (Start nuw+- IncrLoopInvariant) % RemAmtLoopInvariant;
for(i = Start; i < End; ++i, ++rem)
   Rem = rem == RemAmtLoopInvariant ? 0 : Rem;
```

To work with a non-zero `IncrLoopInvariant`.

This is a common usage in cases such as:

```
for(i = 0; i < N; ++i)
    if ((i + 1) % X) == 0)
        do_something_occasionally_but_not_first_iter();
```

Alive2 w/ i4/unrolled 6x (needs to be ran locally due to timeout):
https://alive2.llvm.org/ce/z/6tgyN3

Exhaust proof over all uint8_t combinations in C++:
https://godbolt.org/z/WYa561388
2024-10-31 09:14:33 -05:00
Antonio Frighetto
d79c4c1119 [CGP] Regenerate revert-constant-ptr-propagation-on-calls.ll test (NFC)
Multiple buildbots were previously failing.
2024-09-02 09:55:43 +02:00
Antonio Frighetto
e4e0dfb0c2 [CGP] Undo constant propagation of pointers across calls
It may be profitable to revert SCCP propagation of C++ static values,
if such constants are pointers, in order to avoid redundant pointer
computation, since the method returning the constant is non-removable.
2024-09-02 09:33:23 +02:00
Antonio Frighetto
ed6d9f6d2a [CGP] Introduce test for PR102926 (NFC) 2024-09-02 09:33:23 +02:00
Stephen Tozer
3d08ade7bd
[ExtendLifetimes] Implement llvm.fake.use to extend variable lifetimes (#86149)
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:

https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850

This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).

Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
2024-08-29 17:53:32 +01:00
Simon Pilgrim
f673882323
[X86] Allow speculative BSR/BSF instructions on targets with CMOV (#102885)
Currently targets without LZCNT/TZCNT won't speculate with BSR/BSF instructions in case they have a zero value input, meaning we always insert a test+branch for the zero-input case.

This patch proposes we allow speculation if the target has CMOV, and perform a branchless select instead to handle the zero input case. This will predominately help x86-64 targets where we haven't set any particular cpu target. We already always perform BSR/BSF instructions if we were lowering a CTLZ/CTTZ_ZERO_UNDEF instruction.
2024-08-22 11:11:00 +01:00
Sjoerd Meijer
70e8c982d0
[AArch64] Bail out for scalable vecs in areExtractShuffleVectors (#105484)
The added test triggers the following assert in `areExtractShuffleVectors`
that is called from `shouldSinkOperands`:

Assertion `(!isScalable() || isZero()) && "Request for a fixed element count on a scalable object"' failed.

I don't think scalable types can be extract shuffles, so bail early if
this is the case.
2024-08-21 15:27:09 +01:00
Noah Goldstein
e4c67ba67e Recommit "[CodeGenPrepare] Folding urem with loop invariant value"
Was missing remainder on `Start` value.

Also changed logic as as nikic suggested (getting loop from `PN`
instead of `Rem`). The prior impl increased the complexity of the code
and made debugging it more difficult.

Closes #104877
2024-08-20 09:17:49 -07:00
Noah Goldstein
9b25ad818c [CodeGenPrepare][X86] Add tests for fixing urem transform; NFC 2024-08-20 09:17:49 -07:00
Noah Goldstein
731ae694a3 Revert "[CodeGenPrepare] Folding urem with loop invariant value"
This reverts commit c64ce8bf283120fd145a57d0e61f9697f719139d.

Seems to be causing stage2 failures on buildbots. Reverting while I
investigate.
2024-08-18 20:36:35 -07:00
Noah Goldstein
c64ce8bf28 [CodeGenPrepare] Folding urem with loop invariant value
```
for(i = Start; i < End; ++i)
   Rem = (i nuw+ IncrLoopInvariant) u% RemAmtLoopInvariant;
```
 ->
```
Rem = (Start nuw+ IncrLoopInvariant) % RemAmtLoopInvariant;
for(i = Start; i < End; ++i, ++rem)
   Rem = rem == RemAmtLoopInvariant ? 0 : Rem;
```

In its current state, only if `IncrLoopInvariant` and `Start` both
being zero.

Alive2 seemed unable to prove this (see:
https://alive2.llvm.org/ce/z/ATGDp3 which is clearly wrong but still
checks out...) so wrote an exhaustive test here:
https://godbolt.org/z/WYa561388

Closes #96625
2024-08-18 15:58:24 -07:00
Noah Goldstein
f16125a13c [CodeGenPrepare][X86] Add tests for folding urem with loop invariant value; NFC 2024-08-18 15:58:24 -07:00
David Green
0e124537aa
[AArch64] Sink operands to fmuladd. (#102297)
A fmuladd can be treated as a fma when sinking operands to the
intrinsic, similar to D126234.

Addresses a small part of #102195
2024-08-09 11:48:37 +01:00
Fangrui Song
8ea31db272 [CodeGenPrepare] Use MapVector to stabilize iteration order
DenseMap iteration order is not guaranteed to be deterministic.

Without the change,
llvm/test/Transforms/CodeGenPrepare/X86/statepoint-relocate.ll would
fail when `combineHashValue` changes (#95970).

Fixes: dba7329ebb0dbe1fabb3faaedfd31da3b8bd611d
2024-06-18 17:19:51 -07:00
Stephen Tozer
094572701d
[RemoveDIs] Print IR with debug records by default (#91724)
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records. 

If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.

For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
2024-06-14 15:07:27 +01:00
Nikita Popov
d10b76552f
[ConstantFold] Remove notional over-indexing fold (#93697)
The data-layout independent constant folding currently has some rather
gnarly code for canonicalizing GEP indices to reduce "notional
overindexing", and then infers inbounds based on that canonicalization.

Now that we canonicalize to i8 GEPs, this canonicalization is
essentially useless, as we'll discard it as soon as the GEP hits the
data-layout aware constant folder anyway. As such, I'd like to remove
this code entirely.

This shouldn't have any impact on optimization capabilities.
2024-05-30 08:36:44 +02:00