1101 Commits

Author SHA1 Message Date
Florian Hahn
700b77b5e5
[InstCombine] Don't sink if it would require dropping deref assumptions. (#166945)
Currently sinking assumes in instcombine drops assumes if they would
prevent sinking. Removing dereferenceable assumptions earlier on can
inhibit vectorization of early-exit loops in practice.

Special-case deferenceable assumptions so that they block sinking. This
can be combined with a separate change to drop dereferencebale
assumptions after vectorization: https://clang.godbolt.org/z/jGqcx3sbs

PR: https://github.com/llvm/llvm-project/pull/166945
2025-11-09 21:08:46 +00:00
Andreas Jonson
b9ea93cd5c
[InstCombine] Fold operation into select, when one operand is zext of select's condition (#166816)
Proof https://alive2.llvm.org/ce/z/oCQyTG
2025-11-08 15:22:32 +01:00
Gábor Spaits
628d53aba5
[InstCombine] Enable FoldOpIntoSelect and foldOpIntoPhi when the Op's other parameter is non-const (#166102)
This patch enables `FoldOpIntoSelect` and `foldOpIntoPhi` for the cases
when Op's second parameter is a non-constant.
It doesn't seem to bring significant improvements, but the compile
time impact is neglegable.
2025-11-05 10:04:32 +01:00
Nikita Popov
de9e18dc75
[InstCombine] Handle ptrtoaddr in gep of pointer sub fold (#164818)
This extends the `ptradd x, ptrtoint(y) - ptrtoint(x)` to `y`
InstCombine fold to support ptrtoaddr. In the case where x and y have
the same underlying object, this is handled by InstSimplify already. If
the underlying object may differ, the replacement can only be performed
if provenance does not matter.

For pointers with non-address bits we need to be careful here, because
the pattern will return a pointer with the non-address bits of x and the
address bits of y. As such, uses in ptrtoaddr are safe to replace, but
uses in ptrtoint are not. Whether uses in icmp are safe to replace
depends on the outcome of the pending discussion on icmp semantics (I'll
adjust this in https://github.com/llvm/llvm-project/pull/163936 if/when
that lands).
2025-10-27 09:16:48 +01:00
Benjamin Maxwell
8ebec3d551
[InstCombine] Constant fold binops through vector.insert (#164624)
This patch improves constant folding through `llvm.vector.insert`. It
does not change anything for fixed-length vectors (which can already be
folded to ConstantVectors for these cases), but folds scalable vectors
that otherwise would not be folded.

These folds preserve the destination vector (which could be undef or
poison), giving targets more freedom in lowering the operations.
2025-10-24 09:03:40 +01:00
Benjamin Maxwell
c80495c1b0
[InstCombine] Allow folding cross-lane operations into PHIs/selects (#164388)
Previously, cross-lane operations were disallowed here, but they are
only problematic if the `select` condition is a vector, as the input of
the operation is not simply one of the arms of the phi/select.
2025-10-23 09:27:57 +01:00
Nikita Popov
af4367a6db
[InstCombine] Skip foldFBinOpOfIntCastsFromSign for vector ops (#162804)
Converting a vector float op into a vector int op may be non-profitable,
especially for targets where the float op for a given type is legal, but
the integer op is not.

We could of course also try to address this via a reverse transform in
the backend, but I don't think it's worth the bother, given that vectors
were never the intended use case for this transform in the first place.

Fixes https://github.com/llvm/llvm-project/issues/162749.
2025-10-13 10:13:34 +02:00
Mircea Trofin
8aa49974df
[NFC][InstCombine] Make use of unknown profile info clear in the API name (#162766)
Making the choice more clear from the API name, otherwise it'd be very easy for one to just "not bother" with the `MDFrom`​, especially since it is optional and follows the optional `Name`​ - but this time we'd have a harder time detecting it's effectivelly dropped metadata.
2025-10-10 09:33:45 -07:00
Cullen Rhodes
100db53856
[InstCombine][nfc] Remove dead invoke inst check in foldOpIntoPhi (#161871)
There's a check above the pred block is terminated with an unconditional
branch, so this code is unreachable.
2025-10-08 09:44:57 +01:00
Rahul Joshi
9ebf1e9175
[NFC][InstCombine] Fix namespace usage in InstCombine (#161902) 2025-10-04 05:28:24 -07:00
Nicolai Hähnle
11a4b2d950
Cleanup the LLVM exported symbols namespace (#161240)
There's a pattern throughout LLVM of cl::opts being exported. That in
itself is probably a bit unfortunate, but what's especially bad about it
is that a lot of those symbols are in the global namespace. Move them
into the llvm namespace.

While doing this, I noticed some other variables in the global namespace
and moved them as well.
2025-10-01 15:32:07 -07:00
Yingwei Zheng
73d9974c91
[InstCombine] Avoid self-replacing in getUndefReplacement (#161500)
Self-replacing has a different meaning in InstCombine. It will replace
all uses with poison.
Closes https://github.com/llvm/llvm-project/issues/161492.
2025-10-01 22:02:30 +08:00
mikael-nilsson-arm
69586331e8
[InstCombine] Opt phi(freeze(undef), C) -> phi(C, C) (#161181)
Try to choose a value for freeze that enables the PHI to be replaced
with its input constants if they are equal.
2025-10-01 10:13:42 +02:00
Alan Zhao
045e09f22b
[InstCombine] Set !prof metadata on Selects identified by add.ll test (#158743)
These select instructions are created from non-branching instructions,
so their branch weights are unknown.

Tracking issue: #147390
2025-09-29 20:37:06 +00:00
Alan Zhao
b5afe416c7
[InstCombine] Preserve profile data with select instructions and binary operators (#158375)
Tracking issue: #147390
2025-09-15 20:12:08 +00:00
Nikita Popov
a301e1a895
[InstCombine] Split GEPs with multiple non-zero offsets (#151333)
Split GEPs that have more than one non-zero offset into two GEPs. This
is in preparation for the ptradd migration, which can only represent
such GEPs.

This also enables CSE and LICM of the common base.
2025-09-10 16:51:58 +02:00
Hongyu Chen
75b0c89e62
[InstCombine][VectorCombine][NFC] Unify uses of lossless inverse cast (#156597)
This patch addresses
https://github.com/llvm/llvm-project/pull/155216#discussion_r2297724663.
This patch adds a helper function to put the inverse cast on constants,
with cast flags preserved(optional).
Follow-up patches will add trunc/ext handling on VectorCombine and flags
preservation on InstCombine.
2025-09-08 13:30:06 +00:00
Nikita Popov
349523e26b
[InstCombine] Merge constant offset geps across variable geps (#156326)
Fold:

    %gep1 = ptradd %p, C1
    %gep2 = ptradd %gep1, %x
    %res = ptradd %gep2, C2

To:

    %gep = ptradd %gep, %x
    %res = ptradd %gep, C1+C2

An alternative to this would be to generally canonicalize constant
offset GEPs to the right. I found the results of doing that somewhat
mixed, so I'm going for this more obviously beneficial change for now.

Proof for flag preservation on reassociation:
https://alive2.llvm.org/ce/z/gmpAMg
2025-09-03 10:47:32 +02:00
Kazu Hirata
a36a4019d3
[InstCombine] Remove unnecessary casts (NFC) (#156394)
These variables are already non const.
2025-09-01 23:12:07 -07:00
Nikita Popov
055bfc0271
[InstCombine] Strip leading zero indices from GEP (#155415)
GEPs are often in the form `gep [N x %T], ptr %p, i64 0, i64 %idx`.
Canonicalize these to `gep %T, ptr %p, i64 %idx`.

This enables transforms that only support one GEP index to work and
improves CSE.

Various transforms were recently hardened to make sure they still work
without the leading index.
2025-09-01 09:58:11 +02:00
Nikita Popov
f8f6965cee
[InstCombine] Allow freezing multiple operands (#154336)
InstCombine tries to convert `freeze(inst(op))` to `inst(freeze(op))`.
Currently, this is limited to the case where a single operand needs to
be frozen, and all other operands are guaranteed non-poison.

This patch allows the transform even if multiple operands need to be
frozen. The existing limitation makes sure that we do not increase the
total number of freezes, but it also means that that we may fail to
eliminate freezes (via poison flag dropping) and may prevent
optimizations (as analysis generally can't look past freeze). Overall, I
believe that aggressively pushing freezes upwards is more beneficial
than harmful.

This is the middle-end version of #145939 in DAGCombine (which is
currently reverted for SDAG-specific reasons).
2025-08-25 12:58:39 +02:00
Nikita Popov
09dc08b707 [InstCombine] Handle repeated users in foldOpIntoPhi()
If the phi is used multiple times in the same user, it will appear
multiple times in users(), in which case make_early_inc_range()
is insufficient to prevent iterator invalidation.

Fixes the issue reported at:
https://github.com/llvm/llvm-project/pull/151115#issuecomment-3141542852
2025-08-01 11:07:06 +02:00
Nikita Popov
d4d2d7d785
[InstCombine] Preserve nuw in canonicalizeGEPOfConstGEPI8() (#151533)
Proof: https://alive2.llvm.org/ce/z/4j8U3f
2025-08-01 09:40:03 +02:00
Nikita Popov
a71909156e
[InstCombine] Set flags when canonicalizing GEP indices (#151516)
When truncating set nsw/nuw based on nusw/nuw. When extending, use zext
nneg if nusw+nuw.

Proof: https://alive2.llvm.org/ce/z/JA2Yzr
2025-07-31 15:58:04 +02:00
Nikita Popov
16d73839b1
[InstCombine] Support folding intrinsics into phis (#151115)
Call foldOpIntoPhi() for speculatable intrinsics. We already do this for
FoldOpIntoSelect().

Among other things, this partially subsumes
https://github.com/llvm/llvm-project/pull/149858.
2025-07-31 12:32:37 +02:00
Nikita Popov
385fe30ee0
[InstCombine] Strip trailing zero GEP indices (#151338)
Zero indices at the end do not change the GEP offset and can be removed.

(Doing the same at the start requires adjusting the source element
type.)
2025-07-30 17:55:00 +02:00
Nikita Popov
8a09adc22a
[InstCombine] Split GEPs with multiple variable indices (#137297)
Split GEPs that have more than one variable index into two. This is in
preparation for the ptradd migration, which will not support multi-index
GEPs.

This also enables the split off part to be CSEd and LICMed.
2025-07-30 12:54:06 +02:00
Jeremy Morse
c9ceb9b75f
[DebugInfo] Remove intrinsic-flavours of findDbgUsers (#149816)
This is one of the final remaining debug-intrinsic specific codepaths
out there, and pieces of cross-LLVM infrastructure to do with debug
intrinsics.
2025-07-21 17:49:25 +01:00
Nikita Popov
a216702406
[InstCombine] Merge one-use GEP offsets during expansion (#147263)
When expanding a GEP chain, if there is a chain of one-use GEPs followed
by a multi-use GEP, rewrite the multi-use GEP to include the one-use
GEPs offsets.

This means the offsets from the one-use GEPs can be reused by the offset
expansion without additional cost (from computing them again with a
different reassociation).
2025-07-21 10:49:38 +02:00
Jeremy Morse
c9d8b68676
[DebugInfo] Suppress lots of users of DbgValueInst (#149476)
This is another prune of dead code -- we never generate debug intrinsics
nowadays, therefore there's no need for these codepaths to run.

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2025-07-18 11:31:52 +01:00
Jeremy Morse
2a1869b981
[DebugInfo] Shave even more users of DbgVariableIntrinsic from LLVM (#149136)
At this stage I'm just opportunistically deleting any code using
debug-intrinsic types, largely adjacent to calls to findDbgUsers. I'll
get to deleting that in probably one or more two commits.
2025-07-18 08:25:10 +01:00
Jeremy Morse
5328c732a4
[DebugInfo] Strip more debug-intrinsic code from local utils (#149037)
SROA and a few other facilities use generic-lambdas and some overloaded
functions to deal with both intrinsics and debug-records at the same time.
As part of stripping out intrinsic support, delete a swathe of this code
from things in the Utils directory.

This is a large diff, but is mostly about removing functions that were
duplicated during the migration to debug records. I've taken a few
opportunities to replace comments about "intrinsics" with "records",
and replace generic lambdas with plain lambdas (I believe this makes
it more readable).

All of this is chipping away at intrinsic-specific code until we get to
removing parts of findDbgUsers, which is the final boss -- we can't
remove that until almost everything else is gone.
2025-07-16 14:13:53 +01:00
Cullen Rhodes
7392a546bb
[InstCombine] Treat identical operands as one in pushFreezeToPreventPoisonFromPropagating (#145348)
To push a freeze through an instruction, only one operand may produce
poison. However, this currently fails for identical operands which are
treated as separate. This patch fixes this by treating them as a single
operand.
2025-07-16 13:35:40 +01:00
Jeremy Morse
5b8c15c6e7
[DebugInfo] Remove getPrevNonDebugInstruction (#148859)
With the advent of intrinsic-less debug-info, we no longer need to
scatter calls to getPrevNonDebugInstruction around the codebase. Remove
most of them -- there are one or two that have the "SkipPseudoOp" flag
turned on, however they don't seem to be in positions where skipping
anything would be reasonable.
2025-07-16 11:41:32 +01:00
Nikita Popov
2dc44b3a7b
[InstCombine] Fix multi-use handling for multi-GEP rewrite (#146689)
If we're expanding offsets for a chain of GEPs in RewriteGEPs mode, we
should also rewrite GEPs that have one-use themselves, but are kept
alive by a multi-use GEP later in the chain.

For the sake of simplicity, I've changed this to just skip the one-use
condition entirely (which will perform an unnecessary rewrite of a no
longer used GEP, but shouldn't otherwise matter).
2025-07-02 16:45:27 +02:00
Ricardo Jesus
84c849e85b
[InstCombine] Combine interleaved recurrences. (#143878)
Combine sequences such as:
```llvm
  %pn1 = phi [init1, %BB1], [%op1, %BB2]
  %pn2 = phi [init2, %BB1], [%op2, %BB2]
  %op1 = binop %pn1, constant1
  %op2 = binop %pn2, constant2
  %rdx = binop %op1, %op2
```
Into:
```llvm
  %phi_combined = phi [init_combined, %BB1], [%op_combined, %BB2]
  %rdx_combined = binop %phi_combined, constant_combined
```

This allows us to simplify interleaved reductions, for example as
introduced by the loop vectorizer.

The anecdotal example for this is the loop below:
```c
float foo() {
  float q = 1.f;
  for (int i = 0; i < 1000; ++i)
    q *= .99f;
  return q;
}
```
Which currently gets lowered explicitly such as (on AArch64,
interleaved by four):
```gas
.LBB0_1:
  fmul    v0.4s, v0.4s, v1.4s
  fmul    v2.4s, v2.4s, v1.4s
  fmul    v3.4s, v3.4s, v1.4s
  fmul    v4.4s, v4.4s, v1.4s
  subs    w8, w8, #32
  b.ne    .LBB0_1
```
But with this patch lowers trivially:
```gas
foo:
  mov     w8, #5028
  movk    w8, #14389, lsl #16
  fmov    s0, w8
  ret
```
2025-07-01 09:54:38 +01:00
Kazu Hirata
2a35414e98
[Transforms] Use range-based for loops (NFC) (#145252)
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-06-25 10:08:26 -07:00
Philip Reames
a84891698a
[instcombine] Scalarize operands of vector geps if possible (#145402)
If we have a gep with vector indices which were splats (either constants
or shuffles), prefer the scalar form of the index. If all operands are
scalarizable, then prefer a scalar gep with splat following.

This does loose some information about undef/poison lanes, but I'm not
sure that's significant versus the number of downstream transformations
which get confused by having to manual scalarize operands.
2025-06-24 09:48:00 -07:00
Jameson Nash
940ff110d7
[InstCombine] fix hwasan mistake in "remove dead loads" (#145057)
Detected by CI after #143958.
2025-06-20 12:22:59 -04:00
Jameson Nash
96ab74bf17
[InstCombine] remove undef loads, such as memcpy from undef (#143958)
Extend `isAllocSiteRemovable` to be able to check if the ModRef info
indicates the alloca is only Ref or only Mod, and be able to remove it
accordingly. It seemed that there were a surprising number of
benchmarks with this pattern which weren't getting optimized previously
(due to MemorySSA walk limits). There were somewhat more existing tests
than I'd like to have modified which were simply doing exactly this
pattern (and thus relying on undef memory). Claude code contributed the
new tests (and found an important typo that I'd made).

This implements the discussion in
https://github.com/llvm/llvm-project/pull/143782#discussion_r2142720376.
2025-06-20 10:32:31 -04:00
Philip Reames
b5aaf9d988
[InstCombine] Implement vp.reverse reordering/elimination through binop/unop (#143963)
This simply copies the structure of the vector.reverse patterns from
just above, and reimplements them for the vp.reverse intrinsics when the
mask is all ones and the EVLs exactly match.

Its unfortunate that we have three different ways to represent a reverse
(shuffle, vector.reverse, and vp.reverse) but I don't see an obvious way
to remove any them because the semantics are slightly different.

This significantly improves vectorization in TSVC_2's s112 and s1112
loops when using EVL tail folding.
2025-06-18 08:53:45 -07:00
Jeremy Morse
9eb0020555
[DebugInfo][RemoveDIs] Remove a swathe of debug-intrinsic code (#144389)
Seeing how we can't generate any debug intrinsics any more: delete a
variety of codepaths where they're handled. For the most part these are
plain deletions, in others I've tweaked comments to remain coherent, or
added a type to (what was) type-generic-lambdas.

This isn't all the DbgInfoIntrinsic call sites but it's most of the
simple scenarios.

Co-authored-by: Nikita Popov <github@npopov.com>
2025-06-17 15:55:14 +01:00
Konstantin Bogdanov
07fa6d1d90
[InstCombine] Avoid folding select(umin(X, Y), X) with min/max values in false arm (#143020)
Fixes https://github.com/llvm/llvm-project/issues/139050.

This patch adds a check to avoid folding min/max reduction into select, which may block loop vectorization.

The issue is that the following snippet:
```
declare i8 @llvm.umin.i8(i8, i8)

define i8 @masked_min_fold_bug(i8 %acc, i8 %val, i8 %mask) {
; CHECK-LABEL: @masked_min_fold_bug(
; CHECK:       %cond = icmp eq i8 %mask, 0
; CHECK:       %masked_val = select i1 %cond, i8 %val, i8 255
; CHECK:       call i8 @llvm.umin.i8(i8 %acc, i8 %masked_val)
;
  %cond = icmp eq i8 %mask, 0
  %masked_val = select i1 %cond, i8 %val, i8 255
  %res = call i8 @llvm.umin.i8(i8 %acc, i8 %masked_val)
  ret i8 %res
}
```

is being optimized to the following code, which can not be vectorized
later.
```
declare i8 @llvm.umin.i8(i8, i8) #0

define i8 @masked_min_fold_bug(i8 %acc, i8 %val, i8 %mask) {
  %cond = icmp eq i8 %mask, 0
  %1 = call i8 @llvm.umin.i8(i8 %acc, i8 %val)
  %res = select i1 %cond, i8 %1, i8 %acc
  ret i8 %res
}

attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
```

Expected:
```
declare i8 @llvm.umin.i8(i8, i8) #0

define i8 @masked_min_fold_bug(i8 %acc, i8 %val, i8 %mask) {
  %cond = icmp eq i8 %mask, 0
  %masked_val = select i1 %cond, i8 %val, i8 -1
  %res = call i8 @llvm.umin.i8(i8 %acc, i8 %masked_val)
  ret i8 %res
}

attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
```

https://godbolt.org/z/cYMheKE5r
2025-06-14 14:32:54 +08:00
Nikita Popov
2019553a0b [InstCombine] Extract EmitGEPOffsets() helper (NFC)
Extract a reusable helper for emitting a sum of multiple GEP
offsets.
2025-06-13 12:34:18 +02:00
Stephen Tozer
a08a831515
[DLCov][NFC] Propagate annotated DebugLocs through transformations (#138047)
Part of the coverage-tracking feature, following #107279.

In order for DebugLoc coverage testing to work, we firstly have to set
annotations for intentionally-empty DebugLocs, and secondly we have to
ensure that we do not drop these annotations as we propagate DebugLocs
throughout compilation. As the annotations exist as part of the DebugLoc
class, and not the underlying DILocation, they will not survive a
DebugLoc->DILocation->DebugLoc roundtrip. Therefore this patch modifies
a number of places in the compiler to propagate DebugLocs directly
rather than via the underlying DILocation. This has no effect on the
output of normal builds; it only ensures that during coverage builds, we
do not drop incorrectly annotations and therefore create false
positives.

The bulk of these changes are in replacing
DILocation::getMergedLocation(s) with a DebugLoc equivalent, and in
changing the IRBuilder to store a DebugLoc directly rather than storing
DILocations in its general Metadata array. We also use a new function,
`DebugLoc::orElse`, which selects the "best" DebugLoc out of a pair
(valid location > annotated > empty), preferring the current DebugLoc on
a tie - this encapsulates the existing behaviour at a few sites where we
_may_ assign a DebugLoc to an existing instruction, while extending the
logic to handle annotation DebugLocs at the same time.
2025-06-12 14:06:27 +01:00
Acthinks Yang
ca4dfca5c7
[InstCombine] Relax guard against FP min/max in select fold (#143144)
FCmp's commutativity predicates do not work with min/max semantics

Closes #142711
2025-06-07 14:26:43 +08:00
Ramkumar Ramachandra
b40e4ceaa6
[ValueTracking] Make Depth last default arg (NFC) (#142384)
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing a easily-invalidable KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
2025-06-03 17:12:24 +01:00
Luke Lau
9262e37d8c
[InstCombine] Fold shuffled intrinsic operands with constant operands (#141300)
We currently pull shuffles through binops and intrinsics, which is an
important canonical form for VectorCombine to be able to scalarize
vector sequences. But while binops can be folded with a constant
operand, intrinsics currently require all operands to be shufflevectors.

This extends intrinsic folding to be in line with regular binops by
reusing the constant "unshuffling" logic.

As far as I can tell the list of currently folded intrinsics don't
require any special UB handling.

This change in combination with #138095 and #137823 fixes the following
C:

```c
void max(int *x, int *y, int n) {
  for (int i = 0; i < n; i++)
    x[i] += *y > 42 ? *y : 42;
}
```

Not using the splatted vector form on RISC-V with `-O3 -march=rva23u64`:

```asm
	vmv.s.x	v8, a4
	li	a4, 42
	vmax.vx	v10, v8, a4
	vrgather.vi	v8, v10, 0
.LBB0_9:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
	vl2re32.v	v10, (a5)
	vadd.vv	v10, v10, v8
	vs2r.v	v10, (a5)
```

i.e., it now generates

```asm
        li	a6, 42
        max	a6, a4, a6
.LBB0_9:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
	vl2re32.v	v8, (a5)
	vadd.vx	v8, v8, a6
	vs2r.v	v8, (a5)
```
2025-05-28 10:57:08 +01:00
Luke Lau
8f9549c414
[InstCombine] Refactor fixed and scalable binop shuffle combine. NFCI (#141287)
This extracts the logic that works out the "unshuffled" constant when
pulling shuffle vectors out of binary ops, so the same combine can be
generic over fixed and scalable vectors.

The plan is to reuse this helper to do the same canonicalization on
intrinsics too.
2025-05-24 15:16:16 +01:00
Luke Lau
4b4699a13c
[InstCombine] Don't cover up poison elements for shifts when folding shuffles thru binops (#141303)
As noted in the TODO, we don't need to cover up the poison elements
placed in the unused lanes for shifts, since it's not UB unlike div/rem.

New poison elements are only introduced in cases like

ShMask = <1,1,2,2> and C = <5,5,6,6> --> NewC = <poison,5,6,poison>

And the resulting shuffle won't use the poison lanes.
2025-05-24 13:47:18 +01:00