453 Commits

Author SHA1 Message Date
Yingwei Zheng
59e601a3d5
[CodeGenPrepare] Don't simplify incomplete expression tree in AddrModeCombine (#164628)
Since new select/phi instructions may construct loops, the expression
tree to be simplified may still be incomplete (i.e., it may contain
select with dummy values or phi without incoming values). This patch
removes the call to simplifyInstruction for now, as it doesn't break
existing tests.

Original PR: https://reviews.llvm.org/D36073
Fix the crash reported in
https://github.com/llvm/llvm-project/pull/163453#issuecomment-3429922732.
2025-10-25 16:47:32 +08:00
paperchalice
3c2dae6919
[test][Transforms] Remove unsafe-fp-math uses part 1 (NFC) (#164742)
Post cleanup for #164534.
2025-10-23 11:39:11 +08:00
Nikita Popov
573ca36753
[IR] Replace alignment argument with attribute on masked intrinsics (#163802)
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.

This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)

It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
2025-10-20 08:50:09 +00:00
Vladimir Radosavljevic
be7f85168d
[CGP] Fix missing sign extension for base offset in optimizeMemoryInst (#161377)
If we have integers larger than 64-bit we need to explicitly sign extend
them, otherwise we will get wrong zero extended values.
2025-10-10 10:52:52 +00:00
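A minimal C++ sketch of why the kind of extension matters for the fix in be7f85168d above. It widens a 32-bit offset to 64 bits as a small-scale stand-in for the wider-than-64-bit case named in the commit message; the function names are illustrative and are not LLVM APIs.

```cpp
#include <cstdint>

// Zero extension: the upper bits are filled with zeros, so a negative
// offset turns into a large positive value.
constexpr std::uint64_t widen_zext(std::int32_t offset) {
    return static_cast<std::uint64_t>(static_cast<std::uint32_t>(offset));
}

// Sign extension: the sign bit is replicated into the upper bits, so the
// numeric value of the offset is preserved.
constexpr std::uint64_t widen_sext(std::int32_t offset) {
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(offset));
}
```

For an offset of -8, `widen_zext` yields 0x00000000FFFFFFF8 while `widen_sext` yields 0xFFFFFFFFFFFFFFF8; the two agree for non-negative offsets.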
Craig Topper
4be1099607
[RISCV] Improve fixed vector handling in isCtpopFast. (#158380)
Previously we considered fixed vectors fast if Zvbb or Zbb is
enabled. Zbb only helps if the vector type will end up being
scalarized.
2025-09-16 09:47:09 -07:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Paul Walker
be4a739a7f [LLVM][CDP] Move AArch64 test into AArch64 directory. 2025-08-05 11:04:59 +00:00
Paul Walker
94d374ab6c
[LLVM][CGP] Allow finer control for sinking compares. (#151366)
Compare sinking is selectable based on the result of
hasMultipleConditionRegisters. This function is too coarse grained by
not taking into account the differences between scalar and vector
compares. This PR extends the interface to take an EVT to allow finer
control.
    
The new interface is used by AArch64 to disable sinking of scalable
vector compares, but with isProfitableToSinkOperands updated to maintain
the cases that are specifically tested.
2025-08-05 11:43:41 +01:00
Yingwei Zheng
2d0ca09305
[CodeGenPrepare] Make sure that AddOffset is also a loop invariant (#150625)
Closes https://github.com/llvm/llvm-project/issues/150611.
2025-07-26 00:23:56 +08:00
Philip Reames
48ef55ce3e [CGP] Update tests to use autogen scripts, and refresh check lines
Reducing manual update work required for an upcoming change.
2025-07-03 11:36:33 -07:00
Evgenii Kudriashov
5ffdd9480d
[CodeGenPrepare] Filter out unrecreatable addresses from memory optimization (#143566)
Follow up on #139303
2025-06-28 23:30:03 +02:00
Philip Reames
0ef27186c9 [tests] Additional coverage for gather/scatter address optimizations 2025-06-26 11:50:57 -07:00
Florian Hahn
dde30a4731
[CGP] Bail out if (Base|Scaled)Reg does not dominate insert point. (#142949)
(Base|Scaled)Reg may not dominate the chosen insert point, if there are
multiple uses of the address. Bail out if that's the case, otherwise we
will generate invalid IR.

In some cases, we could probably adjust the insert point or hoist the
(Base|Scaled)Reg.

Fixes https://github.com/llvm/llvm-project/issues/142830.

PR: https://github.com/llvm/llvm-project/pull/142949
2025-06-06 12:38:30 +01:00
weiguozhi
59c6d70ed8
[CodeGenPrepare] Make sure instruction get from SunkAddrs is before MemoryInst (#139303)
Function optimizeBlock may do optimizations on a block for multiple
times. In the first iteration of the loop, MemoryInst1 may generate a
sunk instruction and store it into SunkAddrs. In the second iteration of
the loop, MemoryInst2 may use the same address and then it can reuse the
sunk instruction stored in SunkAddrs, but MemoryInst2 may be before
MemoryInst1 and the corresponding sunk instruction. To avoid a
use-before-def error, we need to find an appropriate insert position for
the sunk instruction.

Fixes #138208.
2025-05-15 09:27:25 -07:00
Orlando Cazalet-Hyams
60d0bc1fae
Propagate DebugLocs on phis in BreakCriticalEdges (#133492)
The pull request discusses whether this change is needed or not. We leant
towards "it can't hurt" on the basis that it's at worst slightly unnecessary
(but not incorrect).

The motivation for the patch came from reviewing code duplication sites to
update for Key Instructions, finding this, trying to generate a test case and
seeing the DebugLocs aren't propagated.
2025-05-08 13:01:48 +01:00
Sergei Barannikov
5080a0251f
[CodeGenPrepare] Unfold slow ctpop when used in power-of-two test (#102731)
DAG combiner already does this transformation, but in some cases it does
not have a chance because either CodeGenPrepare or SelectionDAGBuilder
move icmp to a different basic block.

https://alive2.llvm.org/ce/z/ARzh99

Fixes #94829

Pull Request: https://github.com/llvm/llvm-project/pull/102731
2025-04-23 08:54:10 +03:00
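A hedged C++ sketch of the kind of rewrite that 5080a0251f above describes: a power-of-two test phrased with ctpop can be expressed with plain bit operations instead. The two identities below are the standard ones; which exact form the patch emits is an assumption here, since the commit message does not spell it out. All function names are illustrative.

```cpp
#include <cstdint>

// Reference population count, written as a loop so the sketch does not
// depend on compiler builtins.
constexpr unsigned popcount32(std::uint32_t x) {
    unsigned n = 0;
    for (; x != 0; x &= x - 1)  // clears the lowest set bit each step
        ++n;
    return n;
}

// ctpop(x) u< 2  <=>  (x & (x - 1)) == 0   (zero or a single set bit)
constexpr bool at_most_one_bit(std::uint32_t x) {
    return (x & (x - 1)) == 0;
}

// ctpop(x) == 1  <=>  (x ^ (x - 1)) u> (x - 1)   (exactly one set bit)
constexpr bool exactly_one_bit(std::uint32_t x) {
    return (x ^ (x - 1)) > (x - 1);
}

// Compares both identities against the reference for all x below limit.
constexpr bool identities_hold(std::uint32_t limit) {
    for (std::uint32_t x = 0; x < limit; ++x) {
        if (at_most_one_bit(x) != (popcount32(x) < 2)) return false;
        if (exactly_one_bit(x) != (popcount32(x) == 1)) return false;
    }
    return true;
}
```

The zero input is the interesting edge: `x - 1` wraps to all ones, so `at_most_one_bit(0)` is true while `exactly_one_bit(0)` is false.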
Nikita Popov
20507a9e95
[Verifier][CGP] Allow integer argument to dbg_declare (#134803)
Relaxes the newly added verifier rule to also allow an integer argument
to dbg_declare, which is interpreted as a pointer. Adjust CGP to deal with
it gracefully.

Fixes https://github.com/llvm/llvm-project/issues/134523.
Alternative to https://github.com/llvm/llvm-project/pull/134601.
2025-04-10 12:29:56 +02:00
Jeremy Morse
792a6f8119
[RemoveDIs] Remove "try-debuginfo-iterators..." test flags (#130298)
These date back to when the non-intrinsic format of variable locations
was still being tested and was behind a compile-time flag, so not all
builds / bots would correctly run them. The solution at the time, to get
at least some test coverage, was to have tests opt-in to non-intrinsic
debug-info if it was built into LLVM.

Nowadays, non-intrinsic format is the default and has been on for more
than a year, so there's no need for this flag to exist.

(I've downgraded the flag from "try" to explicitly requesting
non-intrinsic format in some places, so that we can deal with tests that
are explicitly about non-intrinsic format in their own commit).
2025-03-14 15:50:49 +00:00
Mingming Liu
5399782508
[IR] Generalize Function's {set,get}SectionPrefix to GlobalObjects, the base class of {Function, GlobalVariable, IFunc} (#125757)
This is a split of https://github.com/llvm/llvm-project/pull/125756
2025-02-06 14:51:13 -08:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
Stephen Tozer
822f74a911
[Clang] Cleanup docs and comments relating to -fextend-variable-liveness (#124767)
This patch contains a number of changes relating to the above flag;
primarily it updates comment references to the old flag names,
"-fextend-lifetimes" and "-fextend-this-ptr" to refer to the new names,
"-fextend-variable-liveness[={all,this}]". These changes are all NFC.

This patch also removes the explicit -fextend-this-ptr-liveness flag
alias, and shortens the help-text for the main flag; these are both
changes that were meant to be applied in the initial PR (#110000), but
due to some user-error on my part they were not included in the merged
commit.
2025-01-28 18:25:32 +00:00
David Sherwood
346185c42c
[AArch64] Improve codegen of vectorised early exit loops (#119534)
Once PR #112138 lands we are able to start vectorising more loops
that have uncountable early exits. The typical loop structure
looks like this:

vector.body:
  ...
  %pred = icmp eq <2 x ptr> %wide.load, %broadcast.splat
  ...
  %or.reduc = tail call i1 @llvm.vector.reduce.or.v2i1(<2 x i1> %pred)
  %iv.cmp = icmp eq i64 %index.next, 4
  %exit.cond = or i1 %or.reduc, %iv.cmp
  br i1 %exit.cond, label %middle.split, label %vector.body

middle.split:
  br i1 %or.reduc, label %found, label %notfound

found:
  ret i64 1

notfound:
  ret i64 0

The problem with this is that %or.reduc is kept live after the loop,
and since this is a boolean it typically requires making a copy of
the condition code register. For AArch64 this requires an additional
cset instruction, which is quite expensive for a typical find loop
that only contains 6 or 7 instructions.

This patch attempts to improve the codegen by sinking the reduction
out of the loop to the location of its user. It's a lot cheaper to
keep the predicate alive if the type is legal and has lots of
registers for it. There is a potential downside in that a little
more work is required after the loop, but I believe this is worth
it since we are likely to spend most of our time in the loop.
2025-01-06 13:17:14 +00:00
Yingwei Zheng
6568ceb9fa
[CodeGenPrepare] Drop nsw flags in optimizeLoadExt (#118180)
Alive2: https://alive2.llvm.org/ce/z/pMcD7q
Closes https://github.com/llvm/llvm-project/issues/118172.
2024-12-01 11:25:31 +08:00
Lee Wei
1ca64c5fb7
[llvm] Remove br i1 undef from some regression tests [NFC] (#115691)
This PR aims to remove undefined behavior from tests under the directory
`llvm/transforms/CodegenPrepare, ConstantHoisting, Coroutines` etc.
2024-11-11 12:56:31 +00:00
Paul Walker
38fffa630e
[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548) 2024-11-06 11:53:33 +00:00
goldsteinn
1e072ae289
[CGP] [CodeGenPrepare] Folding urem with loop invariant value plus offset (#104724)
This extends the existing fold:

```
for(i = Start; i < End; ++i)
   Rem = (i nuw+- IncrLoopInvariant) u% RemAmtLoopInvariant;
```
 ->
```
Rem = (Start nuw+- IncrLoopInvariant) % RemAmtLoopInvariant;
for(i = Start; i < End; ++i, ++Rem)
   Rem = Rem == RemAmtLoopInvariant ? 0 : Rem;
```

To work with a non-zero `IncrLoopInvariant`.

This is a common usage in cases such as:

```
for(i = 0; i < N; ++i)
    if (((i + 1) % X) == 0)
        do_something_occasionally_but_not_first_iter();
```

Alive2 w/ i4/unrolled 6x (needs to be run locally due to timeout):
https://alive2.llvm.org/ce/z/6tgyN3

Exhaust proof over all uint8_t combinations in C++:
https://godbolt.org/z/WYa561388
2024-10-31 09:14:33 -05:00
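The urem fold from 1e072ae289 above can be sketched in C++ as two loops that should produce the same sequence of remainders. This is an illustration under stated assumptions, not the CodeGenPrepare implementation: the function names are made up, and the inputs are assumed small enough that `i + incr` never wraps (the `nuw` condition in the commit message).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Original shape: one unsigned remainder per iteration.
std::vector<std::uint32_t> rem_with_urem(std::uint32_t start, std::uint32_t end,
                                         std::uint32_t incr, std::uint32_t rem_amt) {
    assert(rem_amt != 0 && "remainder by zero is undefined");
    std::vector<std::uint32_t> out;
    for (std::uint32_t i = start; i < end; ++i)
        out.push_back((i + incr) % rem_amt);
    return out;
}

// Folded shape: a single remainder before the loop, then a counter that is
// incremented each iteration and reset to zero when it reaches rem_amt.
std::vector<std::uint32_t> rem_with_counter(std::uint32_t start, std::uint32_t end,
                                            std::uint32_t incr, std::uint32_t rem_amt) {
    assert(rem_amt != 0 && "remainder by zero is undefined");
    std::vector<std::uint32_t> out;
    std::uint32_t rem = (start + incr) % rem_amt;
    for (std::uint32_t i = start; i < end; ++i) {
        out.push_back(rem);
        ++rem;
        if (rem == rem_amt)
            rem = 0;
    }
    return out;
}
```

With `start = 0`, `incr = 1` and `rem_amt = 3`, both loops yield 1, 2, 0, 1, 2, 0 for the first six iterations, which matches the `(i + 1) % X` usage pattern quoted in the commit message.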
Antonio Frighetto
d79c4c1119 [CGP] Regenerate revert-constant-ptr-propagation-on-calls.ll test (NFC)
Multiple buildbots were previously failing.
2024-09-02 09:55:43 +02:00
Antonio Frighetto
e4e0dfb0c2 [CGP] Undo constant propagation of pointers across calls
It may be profitable to revert SCCP propagation of C++ static values,
if such constants are pointers, in order to avoid redundant pointer
computation, since the method returning the constant is non-removable.
2024-09-02 09:33:23 +02:00
Antonio Frighetto
ed6d9f6d2a [CGP] Introduce test for PR102926 (NFC) 2024-09-02 09:33:23 +02:00
Stephen Tozer
3d08ade7bd
[ExtendLifetimes] Implement llvm.fake.use to extend variable lifetimes (#86149)
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:

https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850

This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).

Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
2024-08-29 17:53:32 +01:00
Simon Pilgrim
f673882323
[X86] Allow speculative BSR/BSF instructions on targets with CMOV (#102885)
Currently targets without LZCNT/TZCNT won't speculate with BSR/BSF instructions in case they have a zero value input, meaning we always insert a test+branch for the zero-input case.

This patch proposes we allow speculation if the target has CMOV, and perform a branchless select instead to handle the zero input case. This will predominately help x86-64 targets where we haven't set any particular cpu target. We already always perform BSR/BSF instructions if we were lowering a CTLZ/CTTZ_ZERO_UNDEF instruction.
2024-08-22 11:11:00 +01:00
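A small C++ model of the idea in f673882323 above: run the bit scan unconditionally and then select a defined result for the zero input, rather than branching around the scan. `bsf_model` and the `kUnusable` sentinel are hypothetical stand-ins for the instruction and for its unusable result on zero; whether a compiler turns the final select into CMOV depends on the target and is not guaranteed by this sketch.

```cpp
#include <cstdint>

// Stand-in for the value BSF leaves behind on a zero input, which must
// never be used as a result.
constexpr unsigned kUnusable = 0xBADu;

// Hypothetical model of BSF: index of the lowest set bit, or the sentinel
// when no bit is set.
constexpr unsigned bsf_model(std::uint32_t x) {
    for (unsigned n = 0; n < 32; ++n)
        if ((x >> n) & 1u)
            return n;
    return kUnusable;
}

// cttz with a defined result for zero: the scan is speculated (always
// executed), and a select picks the bit width when the input was zero.
constexpr unsigned cttz32(std::uint32_t x) {
    const unsigned speculated = bsf_model(x);
    return x == 0 ? 32u : speculated;
}
```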
Sjoerd Meijer
70e8c982d0
[AArch64] Bail out for scalable vecs in areExtractShuffleVectors (#105484)
The added test triggers the following assert in `areExtractShuffleVectors`
that is called from `shouldSinkOperands`:

Assertion `(!isScalable() || isZero()) && "Request for a fixed element count on a scalable object"' failed.

I don't think scalable types can be extract shuffles, so bail early if
this is the case.
2024-08-21 15:27:09 +01:00
Noah Goldstein
e4c67ba67e Recommit "[CodeGenPrepare] Folding urem with loop invariant value"
Was missing remainder on `Start` value.

Also changed logic as nikic suggested (getting loop from `PN`
instead of `Rem`). The prior impl increased the complexity of the code
and made debugging it more difficult.

Closes #104877
2024-08-20 09:17:49 -07:00
Noah Goldstein
9b25ad818c [CodeGenPrepare][X86] Add tests for fixing urem transform; NFC 2024-08-20 09:17:49 -07:00
Noah Goldstein
731ae694a3 Revert "[CodeGenPrepare] Folding urem with loop invariant value"
This reverts commit c64ce8bf283120fd145a57d0e61f9697f719139d.

Seems to be causing stage2 failures on buildbots. Reverting while I
investigate.
2024-08-18 20:36:35 -07:00
Noah Goldstein
c64ce8bf28 [CodeGenPrepare] Folding urem with loop invariant value
```
for(i = Start; i < End; ++i)
   Rem = (i nuw+ IncrLoopInvariant) u% RemAmtLoopInvariant;
```
 ->
```
Rem = (Start nuw+ IncrLoopInvariant) % RemAmtLoopInvariant;
for(i = Start; i < End; ++i, ++Rem)
   Rem = Rem == RemAmtLoopInvariant ? 0 : Rem;
```

In its current state, the fold only applies if `IncrLoopInvariant` and
`Start` are both zero.

Alive2 seemed unable to prove this (see:
https://alive2.llvm.org/ce/z/ATGDp3 which is clearly wrong but still
checks out...) so wrote an exhaustive test here:
https://godbolt.org/z/WYa561388

Closes #96625
2024-08-18 15:58:24 -07:00
Noah Goldstein
f16125a13c [CodeGenPrepare][X86] Add tests for folding urem with loop invariant value; NFC 2024-08-18 15:58:24 -07:00
David Green
0e124537aa
[AArch64] Sink operands to fmuladd. (#102297)
A fmuladd can be treated as a fma when sinking operands to the
intrinsic, similar to D126234.

Addresses a small part of #102195
2024-08-09 11:48:37 +01:00
Fangrui Song
8ea31db272 [CodeGenPrepare] Use MapVector to stabilize iteration order
DenseMap iteration order is not guaranteed to be deterministic.

Without the change,
llvm/test/Transforms/CodeGenPrepare/X86/statepoint-relocate.ll would
fail when `combineHashValue` changes (#95970).

Fixes: dba7329ebb0dbe1fabb3faaedfd31da3b8bd611d
2024-06-18 17:19:51 -07:00
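To illustrate the point of 8ea31db272 above, here is a minimal insertion-ordered map in standard C++. It is a hypothetical stand-in for the idea behind `llvm::MapVector`, not its actual implementation or interface: a hash map provides lookup, a vector records insertion order, and iteration walks the vector, so the visiting order cannot change when the hash function does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical OrderedMap: lookup via a hash index, iteration via a vector.
class OrderedMap {
public:
    // Inserts key -> value if the key is new; returns true when inserted.
    bool insert(const std::string &key, int value) {
        auto result = index_.emplace(key, entries_.size());
        if (!result.second)
            return false;  // key already present, keep the first value
        entries_.emplace_back(key, value);
        return true;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int *find(const std::string &key) const {
        auto it = index_.find(key);
        if (it == index_.end())
            return nullptr;
        assert(it->second < entries_.size());
        return &entries_[it->second].second;
    }

    // Keys in insertion order, independent of hash values.
    std::vector<std::string> keys() const {
        std::vector<std::string> out;
        for (const auto &entry : entries_)
            out.push_back(entry.first);
        return out;
    }

private:
    std::unordered_map<std::string, std::size_t> index_;
    std::vector<std::pair<std::string, int>> entries_;
};
```

Iterating `index_` directly would visit keys in an order that depends on the hash function, which is the kind of instability the commit message attributes to `DenseMap`.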
Stephen Tozer
094572701d
[RemoveDIs] Print IR with debug records by default (#91724)
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records. 

If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.

For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
2024-06-14 15:07:27 +01:00
Nikita Popov
d10b76552f
[ConstantFold] Remove notional over-indexing fold (#93697)
The data-layout independent constant folding currently has some rather
gnarly code for canonicalizing GEP indices to reduce "notional
overindexing", and then infers inbounds based on that canonicalization.

Now that we canonicalize to i8 GEPs, this canonicalization is
essentially useless, as we'll discard it as soon as the GEP hits the
data-layout aware constant folder anyway. As such, I'd like to remove
this code entirely.

This shouldn't have any impact on optimization capabilities.
2024-05-30 08:36:44 +02:00
wanglei
9d4f7f44b6 [test][LoongArch] Add -mattr=+d option. NFC
Because most of the tests assume target-abi=`lp64d`, adding the
corresponding feature is reasonable.

rg -l loongarch -g '!*.s' | xargs sed -i '/mtriple=loongarch/ {/-mattr=/!{/target-abi/! s/mtriple=loongarch.. /&-mattr=+d /}}'
2024-05-14 20:23:04 +08:00
Yingwei Zheng
ab12bba0aa
[CGP] Drop poison-generating flags after hoisting (#90382)
See the following case:
```
define i8 @src1(i8 %x) {
entry:
  %cmp = icmp eq i8 %x, -1
  br i1 %cmp, label %exit, label %if.then

if.then:
  %inc = add nuw nsw i8 %x, 1
  br label %exit

exit:
  %retval = phi i8 [ %inc, %if.then ], [ -1, %entry ]
  ret i8 %retval
}

define i8 @tgt1(i8 %x) {
entry:
  %inc = add nuw nsw i8 %x, 1
  %0 = icmp eq i8 %inc, 0
  br i1 %0, label %exit, label %if.then

if.then:                                          ; preds = %entry
  br label %exit

exit:                                             ; preds = %if.then, %entry
  %retval = phi i8 [ %inc, %if.then ], [ -1, %entry ]
  ret i8 %retval
}
```
`optimizeBranch` converts `icmp eq X, -1` into cmp to zero on RISC-V and
hoists the add into the entry block. Poison-generating flags should be
dropped as they no longer hold.

Proof: https://alive2.llvm.org/ce/z/sP7mvK
Fixes https://github.com/llvm/llvm-project/issues/90380
2024-04-29 15:51:49 +08:00
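The i8 example in ab12bba0aa above can be checked exhaustively with a short C++ sketch. It models the wrapping add with `uint8_t` arithmetic; the function names are illustrative. The point it tries to show is that the rewritten compare is only equivalent with a wrapping add, and that the one input where `nuw` would be violated is exactly the input the original branch kept away from the add.

```cpp
#include <cstdint>

// Original branch condition: icmp eq i8 %x, -1
constexpr bool cmp_original(std::uint8_t x) { return x == 0xFF; }

// Rewritten condition after hoisting: icmp eq i8 (add %x, 1), 0.
// The cast models an i8 add that wraps, i.e. one without nuw/nsw.
constexpr bool cmp_rewritten(std::uint8_t x) {
    return static_cast<std::uint8_t>(x + 1) == 0;
}

// True when `add nuw i8 %x, 1` would wrap in the unsigned sense, which
// would make the result poison.
constexpr bool add1_violates_nuw(std::uint8_t x) { return x == 0xFF; }

// Checks that both conditions agree on every i8 value.
constexpr bool rewrite_is_exact() {
    for (unsigned v = 0; v < 256; ++v)
        if (cmp_original(static_cast<std::uint8_t>(v)) !=
            cmp_rewritten(static_cast<std::uint8_t>(v)))
            return false;
    return true;
}
```

In `@src1` the add only runs in `if.then`, where `%x` is known not to be -1. In `@tgt1` it runs in the entry block for every input, including `%x == -1`, so a retained `nuw` would turn the branch condition into poison on that input.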
Alex Bradbury
1c8410a67d
[CodeGenPrepare] Preserve flags (such as nsw/nuw) in SinkCast (#89904)
As demonstrated in the test change, when deciding to sink a trunc we
were losing its flags. This patch moves to cloning the original
instruction instead.
2024-04-25 15:05:07 +01:00
Alex Bradbury
2554a85c03 [CodeGenPrepare][test] Add test for sinking of truncs demonstrating nsw/nuw are dropped
SinkCast creates a new cast with the same type and inputs, which drops
the nsw/nuw flags.

Reviewed as part of <https://github.com/llvm/llvm-project/pull/89904>
but split out so I can land the test separately.
2024-04-25 15:01:55 +01:00
Matthias Braun
652bcf685c
CodeGenPrepare: Add support for llvm.threadlocal.address address-mode sinking (#87844)
Depending on the TLSMode many thread-local accesses on x86 can be
expressed by adding a %fs: segment register to an addressing mode. Even
if there are multiple users of a `llvm.threadlocal.address` intrinsic it
is generally not worth sharing the value in a register but instead fold
the %fs access into multiple addressing modes.

Hence this changes CodeGenPrepare to duplicate the
`llvm.threadlocal.address` intrinsic as necessary.

Introduces a new `TargetLowering::addressingModeSupportsTLS` callback
that allows targets to indicate whether TLS accesses can be part of an
addressing mode.

This is fixing a performance problem, as this folding of TLS-accesses
into multiple addressing modes happened naturally before the
introduction of the `llvm.threadlocal.address` intrinsic, but regressed
due to `SelectionDAG` keeping things in registers when accessed across
basic blocks, so CodeGenPrepare needs to duplicate to mitigate this. We
see a ~0.5% recovery in a codebase with heavy TLS usage (HHVM).

This fixes most of #87437
2024-04-17 12:48:02 -07:00
wanglei
8e4b0890a6
[LoongArch] Return true from shouldConsiderGEPOffsetSplit (#88371)
Skipping GEP splits can block important optimizations; for example, it
can prevent the element indices / member offsets from being (partially)
folded into load/store instruction immediates.
2024-04-15 09:01:04 +08:00
wanglei
5fc8a190b3 [LoongArch] Pre commit test for #88371. NFC 2024-04-12 17:57:28 +08:00
Yingwei Zheng
38a44bdc93
[CodeGenPrepare] Reverse the canonicalization of isInf/isNanOrInf (#81572)
In commit
2b582440c1,
we canonicalize the isInf/isNanOrInf idiom into fabs+fcmp for better
analysis/codegen (See also the discussion in
https://github.com/llvm/llvm-project/pull/76338).

This patch reverses the fabs+fcmp to `is.fpclass`. If the `is.fpclass`
is not supported by the target, it will be expanded by TLI.

Fixes the regression introduced by
2b582440c1
and
https://github.com/llvm/llvm-project/pull/80414#issuecomment-1936374206.
2024-03-18 18:27:45 +08:00
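A C++ sketch of the two equivalent forms that 38a44bdc93 above converts between. The `fabs` + compare functions model the canonical `fabs+fcmp` idiom, and the `std::isinf` / `std::isfinite` functions stand in for an `is.fpclass` class test; the exact compare predicates used in the IR are an assumption here, and the function names are illustrative. The sketch assumes default floating-point semantics (no fast-math).

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// isInf as fabs + ordered compare: false for NaN, true for +/-inf.
bool is_inf_fabs_fcmp(double x) {
    return std::fabs(x) == std::numeric_limits<double>::infinity();
}

// isInf as a class test.
bool is_inf_class_test(double x) { return std::isinf(x); }

// isNanOrInf as fabs + unordered-or-equal compare: true for NaN or +/-inf.
bool is_nan_or_inf_fabs_fcmp(double x) {
    const double a = std::fabs(x);
    return std::isnan(a) || a == std::numeric_limits<double>::infinity();
}

// isNanOrInf as a class test.
bool is_nan_or_inf_class_test(double x) { return !std::isfinite(x); }

// Sanity check that both forms agree on a given input.
void check_forms_agree(double x) {
    assert(is_inf_fabs_fcmp(x) == is_inf_class_test(x));
    assert(is_nan_or_inf_fabs_fcmp(x) == is_nan_or_inf_class_test(x));
}
```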
Stephen Tozer
d128448efd Revert "Reapply "[RemoveDIs] Print non-intrinsic debug info in textual IR output (#79281)""
Reverted due to some test failures on some buildbots.

https://lab.llvm.org/buildbot/#/builders/67/builds/14669

This reverts commit aa436493ab7ad4cf323b0189c15c59ac9dc293c7.
2024-02-27 10:17:24 +00:00