9439 Commits

Kerry McLaughlin
c34cba0413
[AArch64][SME] Lower aarch64.sme.cnts* to vscale when in streaming mode (#154305)
In streaming mode, both the @llvm.aarch64.sme.cnts* and @llvm.aarch64.sve.cnt*
intrinsics are equivalent. For SVE, cnt* is lowered in instCombineIntrinsic
to @llvm.vscale(). This patch lowers the SME intrinsics similarly when in
streaming mode.
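A minimal sketch of the streaming-mode fold, assuming a cntsd call in an always-streaming function (function name and attribute usage are illustrative, not from the patch):

```llvm
define i64 @cntsd_streaming() "aarch64_pstate_sm_enabled" {
  %n = call i64 @llvm.aarch64.sme.cntsd()
  ret i64 %n
}
; => roughly, after the fold:
;   %vscale = call i64 @llvm.vscale.i64()
;   %n = shl nuw nsw i64 %vscale, 1   ; cntsd == vscale * 2 when streaming
```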
2025-08-20 09:48:36 +01:00
AZero13
f94290cbff
[ValueTracking][GlobalISel] UCMP and SCMP cannot create undef or poison (#154404)
They cannot create poison or undef, the same as in IR. They can only
produce -1, 0, or 1.

Alive2: https://alive2.llvm.org/ce/z/--Jd78
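A hedged IR-level illustration of the consequence (not taken from the patch itself): a freeze of a cmp-intrinsic result is redundant once its operands are known non-poison.

```llvm
define i8 @scmp_no_freeze(i32 %a, i32 %b) {
  %fa = freeze i32 %a
  %fb = freeze i32 %b
  %c = call i8 @llvm.scmp.i8.i32(i32 %fa, i32 %fb)
  ; redundant: scmp of non-poison operands only yields -1, 0, or 1
  %f = freeze i8 %c
  ret i8 %f
}
```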
2025-08-20 08:41:27 +09:00
David Green
a7df02f83c
[InstCombine] Make strlen optimization more resilient to different gep types. (#153623)
This makes the optimization in optimizeStringLength for strlen(gep
@glob, %x) -> sub endof@glob, %x a little more resilient, and maybe a
bit more correct for geps with non-array types.
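A hedged sketch of the shape being optimized (global, string contents, and offset bounds are illustrative):

```llvm
@s = constant [12 x i8] c"hello world\00"

define i64 @strlen_gep(i64 %x) {
  ; gep with a non-array source type, offsetting into @s
  %p = getelementptr inbounds i8, ptr @s, i64 %x
  %len = call i64 @strlen(ptr %p)
  ret i64 %len   ; => sub i64 11, %x (string length minus the offset)
}

declare i64 @strlen(ptr)
```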
2025-08-19 10:37:17 +01:00
zGoldthorpe
82caa251d4
[InstCombine] Fold integer unpack/repack patterns through ZExt (#153583)
This patch explicitly enables the InstCombiner to fold integer
unpack/repack patterns such as

```llvm
define i64 @src_combine(i32 %lower, i32 %upper) {
  %base = zext i32 %lower to i64

  %u.0 = and i32 %upper, u0xff
  %z.0 = zext i32 %u.0 to i64
  %s.0 = shl i64 %z.0, 32
  %o.0 = or i64 %base, %s.0

  %r.1 = lshr i32 %upper, 8
  %u.1 = and i32 %r.1, u0xff
  %z.1 = zext i32 %u.1 to i64
  %s.1 = shl i64 %z.1, 40
  %o.1 = or i64 %o.0, %s.1

  %r.2 = lshr i32 %upper, 16
  %u.2 = and i32 %r.2, u0xff
  %z.2 = zext i32 %u.2 to i64
  %s.2 = shl i64 %z.2, 48
  %o.2 = or i64 %o.1, %s.2

  %r.3 = lshr i32 %upper, 24
  %u.3 = and i32 %r.3, u0xff
  %z.3 = zext i32 %u.3 to i64
  %s.3 = shl i64 %z.3, 56
  %o.3 = or i64 %o.2, %s.3

  ret i64 %o.3
}
; =>
define i64 @tgt_combine(i32 %lower, i32 %upper) {
  %base = zext i32 %lower to i64
  %upper.zext = zext i32 %upper to i64
  %s.0 = shl nuw i64 %upper.zext, 32
  %o.3 = or disjoint i64 %s.0, %base
  ret i64 %o.3
}
```

Alive2 proofs: [YAy7ny](https://alive2.llvm.org/ce/z/YAy7ny)
2025-08-15 12:48:32 -06:00
Vedant Paranjape
44df9826f3
[InstCombine] Propagate invariant.load metadata across unpacked loads (#152186)
For loads of aggregate types, InstCombine unpacks the load into loads of
the individual elements, but it does not preserve the invariant.load
metadata. This patch fixes that: it looks for the metadata on the parent
load and attaches it to the unpacked loads.

```
%struct.double2 = type { double, double }
%struct.double1 = type { double }

define %struct.double2 @func1(ptr %a) {
  %1 = load %struct.double2, ptr %a, align 16, !invariant.load !1
  ret %struct.double2 %1
}

!1 = !{}
```
Reproducer: https://godbolt.org/z/hcY8MMvYh
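A hedged sketch of the unpacked result (element offsets and alignments are illustrative): each scalar load now carries the metadata.

```llvm
%struct.double2 = type { double, double }

define %struct.double2 @func1.unpacked(ptr %a) {
  %lo = load double, ptr %a, align 16, !invariant.load !1
  %hi.ptr = getelementptr inbounds i8, ptr %a, i64 8
  %hi = load double, ptr %hi.ptr, align 8, !invariant.load !1
  %agg.0 = insertvalue %struct.double2 poison, double %lo, 0
  %agg.1 = insertvalue %struct.double2 %agg.0, double %hi, 1
  ret %struct.double2 %agg.1
}

!1 = !{}
```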
2025-08-14 10:08:26 -07:00
Pavel Skripkin
30144226a4
[llvm] [InstCombine] fold "icmp eq (X + (V - 1)) & -V, X" to "icmp eq (and X, V - 1), 0" (#152851)
This fold optimizes 

```llvm
define i1 @src(i32 %num, i32 %val) {
  %mask = add i32 %val, -1
  %neg = sub nsw i32 0, %val

  %num.biased = add i32 %num, %mask
  %_2.sroa.0.0 = and i32 %num.biased, %neg
  %_0 = icmp eq i32 %_2.sroa.0.0, %num
  ret i1 %_0
}
```
to
```llvm
define i1 @tgt(i32 %num, i32 %val) {
  %mask = add i32 %val, -1
  %tmp = and i32 %num, %mask
  %ret = icmp eq i32 %tmp, 0
  ret i1 %ret
}
```

For power-of-two `val`.

Observed in real life in the following code

```rust
pub fn is_aligned(num: usize) -> bool {
    num.next_multiple_of(1 << 12) == num
}
```
which verifies that num is aligned to 4096.

Alive2 proof https://alive2.llvm.org/ce/z/QisECm
2025-08-14 10:23:03 +03:00
Macsen Casaus
6fa13d79cf
[InstCombine] Add additional known bits info for self multiply (#151202)
Closes #151043 
https://alive2.llvm.org/ce/z/JyMSk8
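One hedged illustration of a known-bits fact for self-multiplies (squares are 0 or 1 mod 4, so bit 1 is always clear; whether this particular bit comes from this patch or from earlier logic is not checked here):

```llvm
define i32 @square_bit1(i32 %x) {
  %sq = mul i32 %x, %x     ; both operands are the same value
  %bit1 = and i32 %sq, 2   ; bit 1 of x*x is known zero
  ret i32 %bit1            ; => ret i32 0
}
```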
2025-08-11 10:52:04 +02:00
Szymon Piotr Milczek
fd41700962
[InstCombine] visitShuffleVectorInst assert with vector of pointers fix. (#152341)
In visitShuffleVectorInst there's an if block that's meant to turn
shufflevector followed by bitcast into extractelement where possible.

It assumes that there will never be bitcasts performed on vectors of ptr
as such operations are almost always illegal, and ptrtoint instructions
should be used instead.

There is, however, an edge case where a bitcast instruction can be
performed on a vector of type `<1 x ptr>` to turn it into type `ptr`.

In this edge case, the code initializes the variable `VecBitWidth` to 0.
Then, when iterating over users that are bitcasts, an attempt is made to
create a vector of size 0, which triggers an assert.

This commit changes the initialization of `VecBitWidth` to use the
datalayout to find the size of the vector, instead of the
getPrimitiveSizeInBits method, which returns 0 for ptr and vectors of ptr.
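A hedged reproducer sketch of the edge case (names illustrative): a `<1 x ptr>` shuffle whose user bitcasts it to `ptr`, the one legal pointer-vector bitcast.

```llvm
define ptr @shuffle_v1ptr(<2 x ptr> %v) {
  %s = shufflevector <2 x ptr> %v, <2 x ptr> poison, <1 x i32> zeroinitializer
  ; legal: a single-element pointer vector may be bitcast to a pointer
  %p = bitcast <1 x ptr> %s to ptr
  ret ptr %p
}
```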
2025-08-08 15:23:02 +02:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
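The shape of the change on a marker, as a sketch:

```llvm
define void @marker_shape() {
  %a = alloca [16 x i8]
  ; before: call void @llvm.lifetime.start.p0(i64 16, ptr %a)
  ; after: the size is implied by the alloca
  call void @llvm.lifetime.start.p0(ptr %a)
  call void @llvm.lifetime.end.p0(ptr %a)
  ret void
}
```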
2025-08-08 11:09:34 +02:00
Antonio Frighetto
e977b28c37
[InstCombine] Match intrinsic recurrences when known to be hoisted
For value-accumulating recurrences of kind:
```
  %umax.acc = phi i8 [ %umax, %backedge ], [ %a, %entry ]
  %umax = call i8 @llvm.umax.i8(i8 %umax.acc, i8 %b)
```
The binary intrinsic may be simplified into an intrinsic taking the init
value and the other operand, if the latter is loop-invariant:
```
  %umax = call i8 @llvm.umax.i8(i8 %a, i8 %b)
```

Proofs: https://alive2.llvm.org/ce/z/ea2cVC.

Fixes: https://github.com/llvm/llvm-project/issues/145875.
2025-08-08 09:31:50 +02:00
Pedro Lobo
2bbc614713
[InstCombine] Support offsets in memset to load forwarding (#151924)
Adds support for load offsets when performing `memset` load forwarding.
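A hedged sketch of the newly handled case (offset and fill value illustrative):

```llvm
define i32 @load_from_memset(ptr %p) {
  call void @llvm.memset.p0.i64(ptr %p, i8 1, i64 16, i1 false)
  ; the load starts 4 bytes into the filled region
  %g = getelementptr inbounds i8, ptr %p, i64 4
  %v = load i32, ptr %g, align 4
  ret i32 %v   ; => ret i32 16843009 (0x01010101)
}
```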
2025-08-05 17:09:06 +01:00
Stanislav Mekhanoshin
a153e83e41
[AMDGPU] gfx1250 v_wmma_scale[16]_f32_16x16x128_f8f6f4 codegen (#152036) 2025-08-04 19:16:34 -07:00
Paul Walker
1406058cba
[LLVM][InstCombine] Extend masked_gather's demanded elt analysis. (#151732)
Add support for other Constant types for the mask operand.
2025-08-04 14:05:04 +01:00
Paul Walker
fb4a8f67b9
[LLVM][InstCombine] foldICmpEquality: Compare APInt values rather than addresses. (#151726) 2025-08-04 13:54:44 +01:00
Abhishek Kaushik
30728eb26b
[Reland][ValueTracking] Improve Bitcast handling to match SDAG (#145223)
Fixes #125228

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-04 14:51:03 +05:30
Nikita Popov
86727fe9a1
[IR] Allow poison argument to lifetime markers (#151148)
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.

It's worth noting that this does not require any conservative
assumptions, lifetimes with poison arguments can simply be skipped.

Fixes https://github.com/llvm/llvm-project/issues/151119.
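A sketch of what is now accepted (note the size argument still existed at this commit; it was removed a few days later in #150248, above):

```llvm
define void @rauw_residue() {
  ; valid after this change, and simply skipped by consumers
  call void @llvm.lifetime.start.p0(i64 8, ptr poison)
  ret void
}
```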
2025-08-04 10:02:04 +02:00
David Green
d9971be83e
[InstCombine] Make foldCmpLoadFromIndexedGlobal more resilient to non-array geps. (#150639)
My understanding is that gep [n x i8] and gep i8 can be treated
equivalently - the array type conveys no extra information and could be
removed. This goes through foldCmpLoadFromIndexedGlobal and tries to
make it work for non-array gep types, so long as the index type still
matches the array being loaded.
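A hedged sketch of the now-handled shape (global contents and predicate illustrative):

```llvm
@g = constant [4 x i16] [i16 1, i16 3, i16 5, i16 9]

define i1 @cmp_global_load(i64 %i) {
  ; gep with element type i16 rather than the array type [4 x i16]
  %p = getelementptr inbounds i16, ptr @g, i64 %i
  %v = load i16, ptr %p, align 2
  %c = icmp slt i16 %v, 4
  ret i1 %c   ; => icmp ult i64 %i, 2 (only the first two elements are < 4)
}
```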
2025-08-03 10:19:42 +01:00
Nikita Popov
09dc08b707 [InstCombine] Handle repeated users in foldOpIntoPhi()
If the phi is used multiple times in the same user, it will appear
multiple times in users(), in which case make_early_inc_range()
is insufficient to prevent iterator invalidation.

Fixes the issue reported at:
https://github.com/llvm/llvm-project/pull/151115#issuecomment-3141542852
2025-08-01 11:07:06 +02:00
Kerry McLaughlin
e170676351
[InstCombine] Combine extractelement from a vector_extract at index 0 (#151491)
Extracting any element from a subvector starting at index 0 is
equivalent to extracting from the original vector, i.e.
  extract_elt(vector_extract(x, 0), y) -> extract_elt(x, y)
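A minimal sketch (types illustrative):

```llvm
define i32 @extract_lane(<8 x i32> %x) {
  %sub = call <4 x i32> @llvm.vector.extract.v4i32.v8i32(<8 x i32> %x, i64 0)
  %e = extractelement <4 x i32> %sub, i64 2
  ret i32 %e   ; => extractelement <8 x i32> %x, i64 2
}
```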
2025-08-01 09:54:43 +01:00
Nikita Popov
d4d2d7d785
[InstCombine] Preserve nuw in canonicalizeGEPOfConstGEPI8() (#151533)
Proof: https://alive2.llvm.org/ce/z/4j8U3f
2025-08-01 09:40:03 +02:00
Nikita Popov
a71909156e
[InstCombine] Set flags when canonicalizing GEP indices (#151516)
When truncating, set nsw/nuw based on nusw/nuw. When extending, use zext
nneg if nusw+nuw.

Proof: https://alive2.llvm.org/ce/z/JA2Yzr
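A hedged sketch of the extension case, assuming a 64-bit pointer width:

```llvm
define ptr @widen_index(ptr %p, i32 %i) {
  %g = getelementptr nusw nuw i8, ptr %p, i32 %i
  ret ptr %g
}
; => the i32 index is canonicalized to the pointer width:
;   %i.ext = zext nneg i32 %i to i64
;   %g = getelementptr nusw nuw i8, ptr %p, i64 %i.ext
```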
2025-07-31 15:58:04 +02:00
Nikita Popov
16d73839b1
[InstCombine] Support folding intrinsics into phis (#151115)
Call foldOpIntoPhi() for speculatable intrinsics. We already do this for
FoldOpIntoSelect().

Among other things, this partially subsumes
https://github.com/llvm/llvm-project/pull/149858.
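A hedged sketch with a speculatable intrinsic (llvm.abs) over a phi of constants:

```llvm
define i32 @abs_of_phi(i1 %c) {
entry:
  br i1 %c, label %a, label %b
a:
  br label %merge
b:
  br label %merge
merge:
  %p = phi i32 [ 8, %a ], [ -4, %b ]
  %r = call i32 @llvm.abs.i32(i32 %p, i1 false)
  ret i32 %r   ; => %r = phi i32 [ 8, %a ], [ 4, %b ]
}
```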
2025-07-31 12:32:37 +02:00
zGoldthorpe
71d6762309
[InstCombine] Added pattern for recognising the construction of packed integers. (#147414)
This patch extends the instruction combiner to simplify the construction
of a packed scalar integer from a vector type, such as:
```llvm
target datalayout = "e"

define i32 @src(<4 x i8> %v) {
  %v.0 = extractelement <4 x i8> %v, i32 0
  %z.0 = zext i8 %v.0 to i32

  %v.1 = extractelement <4 x i8> %v, i32 1
  %z.1 = zext i8 %v.1 to i32
  %s.1 = shl i32 %z.1, 8
  %x.1 = or i32 %z.0, %s.1

  %v.2 = extractelement <4 x i8> %v, i32 2
  %z.2 = zext i8 %v.2 to i32
  %s.2 = shl i32 %z.2, 16
  %x.2 = or i32 %x.1, %s.2

  %v.3 = extractelement <4 x i8> %v, i32 3
  %z.3 = zext i8 %v.3 to i32
  %s.3 = shl i32 %z.3, 24
  %x.3 = or i32 %x.2, %s.3

  ret i32 %x.3
}

; ===============

define i32 @tgt(<4 x i8> %v) {
  %x.3 = bitcast <4 x i8> %v to i32
  ret i32 %x.3
}
```

Alive2 proofs (little-endian):
[YKdMeg](https://alive2.llvm.org/ce/z/YKdMeg)
Alive2 proofs (big-endian):
[vU6iKc](https://alive2.llvm.org/ce/z/vU6iKc)
2025-07-30 10:58:49 -06:00
Nikita Popov
385fe30ee0
[InstCombine] Strip trailing zero GEP indices (#151338)
Zero indices at the end do not change the GEP offset and can be removed.

(Doing the same at the start requires adjusting the source element
type.)
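A minimal sketch (struct type illustrative):

```llvm
%pair = type { i32, [4 x i32] }

define ptr @trailing_zero(ptr %p, i64 %i) {
  ; the trailing zero index does not change the offset
  %g = getelementptr %pair, ptr %p, i64 %i, i32 1, i64 0
  ret ptr %g   ; => getelementptr %pair, ptr %p, i64 %i, i32 1
}
```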
2025-07-30 17:55:00 +02:00
Nikita Popov
36961202fb [InstSimplify] Regenerate test checks (NFC)
Change the function name so that UTC works properly. Also move the
test into the InstCombine directory, as that's the pass that's
actually being tested.
2025-07-30 15:14:46 +02:00
Nikita Popov
730d05b0eb [InstCombine] Avoid tmp var conflicts in test (NFC) 2025-07-30 14:51:01 +02:00
Nikita Popov
3e5f1a66bd [InstCombine] Generate test checks (NFC) 2025-07-30 14:47:39 +02:00
Nikita Popov
8a09adc22a
[InstCombine] Split GEPs with multiple variable indices (#137297)
Split GEPs that have more than one variable index into two. This is in
preparation for the ptradd migration, which will not support multi-index
GEPs.

This also enables the split off part to be CSEd and LICMed.
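A minimal sketch of the split (types illustrative):

```llvm
define ptr @two_var_indices(ptr %p, i64 %i, i64 %j) {
  %g = getelementptr [16 x i32], ptr %p, i64 %i, i64 %j
  ret ptr %g
}
; =>
;   %g.split = getelementptr [16 x i32], ptr %p, i64 %i
;   %g = getelementptr i32, ptr %g.split, i64 %j
```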
2025-07-30 12:54:06 +02:00
Antonio Frighetto
11a959bf98 [InstCombine] Introduce test for PR149858 (NFC) 2025-07-29 11:44:00 +02:00
Adar Dagan
1afb42bc10
[InstCombine] Let shrinkSplatShuffle act on vectors of different lengths (#148593)
shrinkSplatShuffle in InstCombine would only move truncs up through
shuffles if those shuffles' inputs had the exact same type as their
output. This PR weakens this constraint to only requiring that the
scalar types of the input and output match.
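A hedged sketch of a case this enables (a widening splat; lane counts illustrative):

```llvm
define <8 x i8> @trunc_of_splat(<4 x i32> %v) {
  %splat = shufflevector <4 x i32> %v, <4 x i32> poison, <8 x i32> zeroinitializer
  %t = trunc <8 x i32> %splat to <8 x i8>
  ret <8 x i8> %t
}
; => truncate the narrower input first, then splat:
;   %t.src = trunc <4 x i32> %v to <4 x i8>
;   %t = shufflevector <4 x i8> %t.src, <4 x i8> poison, <8 x i32> zeroinitializer
```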
2025-07-28 13:00:37 +02:00
Pedro Lobo
d9952a7a5f
[InstCombine] Propagate neg nsw when folding abs(-x) to abs(x) (#150460)
We can propagate the nsw in the neg to abs, as `-x` is only poison if x
== INT_MIN.
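A sketch of the fold as described (hedged):

```llvm
define i32 @abs_of_neg(i32 %x) {
  ; nsw on the neg rules out %x == INT_MIN reaching abs non-poisoned
  %neg = sub nsw i32 0, %x
  %abs = call i32 @llvm.abs.i32(i32 %neg, i1 false)
  ret i32 %abs   ; => call i32 @llvm.abs.i32(i32 %x, i1 true)
}
```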
2025-07-24 21:45:37 +01:00
Nikita Popov
be6bed4dc6 [InstCombine] Remove instructions before+after unreachable at same time
There is no need to first remove the instructions before and then
the ones after in two different worklist iterations. We don't need
to worry about change reporting here, as the functions do that
themselves.

This avoids the issue in #150338, but not really in a principled
way. It's possible that we will have to allow poison arguments
to lifetime.start/lifetime.end again if this turns out to be a
recurring problem.
2025-07-24 11:10:22 +02:00
Nikita Popov
f0f3194e19
[InstCombine] Fold icmp of gep chains (#146714)
This extends https://github.com/llvm/llvm-project/pull/144065 to the
general case of an icmp between two GEP chains that have a common base.
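A hedged sketch of the general case (element types and offsets illustrative):

```llvm
define i1 @cmp_gep_chains(ptr %base, i64 %i, i64 %j) {
  %a = getelementptr inbounds i32, ptr %base, i64 %i
  %a2 = getelementptr inbounds i32, ptr %a, i64 2
  %b = getelementptr inbounds i32, ptr %base, i64 %j
  %c = icmp eq ptr %a2, %b
  ret i1 %c   ; => compare the accumulated offsets: (i + 2) == j
}
```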
2025-07-23 17:08:34 +02:00
Nikita Popov
1e24b53534
[InstCombine] Add limit for expansion of gep chains (#147065)
When converting gep subtraction / comparison to offset subtraction /
comparison, avoid expanding very long multi-use gep chains.
2025-07-23 09:47:53 +02:00
Changpeng Fang
d6094370cb
AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (#149684)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-21 10:09:42 -07:00
Nikita Popov
92c55a315e
[IR] Only allow lifetime.start/end on allocas (#149310)
lifetime.start and lifetime.end are primarily intended for use on
allocas, to enable stack coloring and other liveness optimizations. This
is necessary because all (static) allocas are hoisted into the entry
block, so lifetime markers are the only way to convey the actual
lifetimes.

However, lifetime.start and lifetime.end are currently *allowed* to be
used on non-alloca pointers. We don't actually do this in practice, but
just the mere fact that this is possible breaks the core purpose of the
lifetime markers, which is stack coloring of allocas. Stack coloring can
only work correctly if all lifetime markers for an alloca are
analyzable.

* If a lifetime marker may operate on multiple allocas via a select/phi,
we don't know which lifetime actually starts/ends and handle it
incorrectly (https://github.com/llvm/llvm-project/issues/104776).
* Stack coloring operates on the assumption that all lifetime markers
are visible, and not, for example, hidden behind a function call or
escaped pointer. It's not possible to change this, as part of the
purpose of lifetime markers is that they work even in the presence of
escaped pointers, where simple use analysis is insufficient.

I don't think there is any way to have coherent semantics for lifetime
markers on allocas, while also permitting them on arbitrary pointer
values.

This PR restricts lifetimes to operate on allocas only. As a followup, I
will also drop the size argument, which is superfluous if we always
operate on an alloca. (This change also renders various code that handles
lifetime markers on non-allocas dead. I plan to clean up that kind of
code after dropping the size argument as well.) A sketch of the new rule
follows the list below.

In practice, I've only found a few places that currently produce
lifetimes on non-allocas:

* CoroEarly replaces the promise alloca with the result of an intrinsic,
which will later be replaced back with an alloca. I think this is the
only place where there is some legitimate loss of functionality, but I
don't think this is particularly important (I don't think we'd expect
the promise in a coroutine to admit useful lifetime optimization.)
* SafeStack moves unsafe allocas onto a separate frame. We can safely
drop lifetimes here, as SafeStack performs its own stack coloring.
* Similar for AddressSanitizer, it also moves allocas into separate
memory.
* LSR sometimes replaces the lifetime argument with a GEP chain of the
alloca (where the offsets ultimately cancel out). This is just
unnecessary. (Fixed separately in
https://github.com/llvm/llvm-project/pull/149492.)
* InferAddrSpaces sometimes makes lifetimes operate on an addrspacecast
of an alloca. I don't think this is necessary.
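As sketched here (the size argument still existed at this commit):

```llvm
define void @lifetime_rules(ptr %unknown) {
  %a = alloca i64
  call void @llvm.lifetime.start.p0(i64 8, ptr %a)   ; still valid
  call void @llvm.lifetime.end.p0(i64 8, ptr %a)
  ; call void @llvm.lifetime.start.p0(i64 8, ptr %unknown) ; now rejected
  ret void
}
```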
2025-07-21 15:04:50 +02:00
Nikita Popov
a216702406
[InstCombine] Merge one-use GEP offsets during expansion (#147263)
When expanding a GEP chain, if there is a chain of one-use GEPs followed
by a multi-use GEP, rewrite the multi-use GEP to include the one-use
GEPs offsets.

This means the offsets from the one-use GEPs can be reused by the offset
expansion without additional cost (from computing them again with a
different reassociation).
2025-07-21 10:49:38 +02:00
kissholic
baf2953097
Optimize fptrunc(x)>=C1 --> x>=C2 (#99475)
Fix https://github.com/llvm/llvm-project/issues/85265#issue-2186848949
2025-07-19 17:52:06 +09:00
Prabhu Rajasekaran
921c6dbeca
[llvm] Introduce callee_type metadata
Introduce `callee_type` metadata, which will be attached to indirect
call instructions.

The `callee_type` metadata will be used to generate `.callgraph` section
described in this RFC:
https://lists.llvm.org/pipermail/llvm-dev/2021-July/151739.html

Reviewers: morehouse, petrhosek, nikic, ilovepi

Reviewed By: nikic, ilovepi

Pull Request: https://github.com/llvm/llvm-project/pull/87573
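A hypothetical sketch of the metadata shape on an indirect call; the exact node layout is an assumption modeled on CFI-style generalized type ids, not taken from the patch:

```llvm
define i32 @dispatch(ptr %fp, i32 %arg) {
  %r = call i32 %fp(i32 %arg), !callee_type !0   ; hypothetical shape
  ret i32 %r
}

!0 = !{!1}
!1 = !{i64 0, !"_ZTSFiiE.generalized"}
```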
2025-07-18 14:40:54 -07:00
Ryan Buchner
1f1fd07c32
[InstCombine] Optimize (select %x, op(%x), 0) to op(%x) for operations where op(0) == 0 (#147605)
Currently this optimization only occurs for `mul`, but this generalizes
it to any operation that has a fixed point at `0` (i.e. `op(0) == 0`).

There is similar logic within `EarlyCSE` pass, but that is stricter in
terms of `poison` propagation so will not optimize for many operations.

Alive2 Proofs:
`and`:
https://alive2.llvm.org/ce/z/RraasX ; base-case
https://alive2.llvm.org/ce/z/gzfFTX ; commuted-case
https://alive2.llvm.org/ce/z/63XaoX ; compare against undef
https://alive2.llvm.org/ce/z/MVRVNd ; select undef
https://alive2.llvm.org/ce/z/2bsoYG ; vector
https://alive2.llvm.org/ce/z/xByeX- ; vector compare against undef
https://alive2.llvm.org/ce/z/zNdzmZ ; vector select undef

`fshl`:
https://alive2.llvm.org/ce/z/U3_PG3 ; base-case
https://alive2.llvm.org/ce/z/BWCnxT ; compare against undef
https://alive2.llvm.org/ce/z/8HGAE_ ; select undef
; vector times out

`fshr`:
https://alive2.llvm.org/ce/z/o6F47G ; base-case
https://alive2.llvm.org/ce/z/fVnBXy ; compare against undef
https://alive2.llvm.org/ce/z/suymYJ ; select undef
; vector times out

`umin`:
https://alive2.llvm.org/ce/z/GGMqf6 ; base-case
https://alive2.llvm.org/ce/z/6cx5-k ; commuted-case
https://alive2.llvm.org/ce/z/W5d9tz ; compare against undef
https://alive2.llvm.org/ce/z/nKbaUn ; select undef
https://alive2.llvm.org/ce/z/gxEGqc ; vector
https://alive2.llvm.org/ce/z/_SDpi_ ; vector compare against undef

`sdiv`:
https://alive2.llvm.org/ce/z/5XGs3q

`srem`:
https://alive2.llvm.org/ce/z/vXAnQM

`udiv`:
https://alive2.llvm.org/ce/z/e6_8Ug

`urem`:
https://alive2.llvm.org/ce/z/VmM2SL

`shl`:
https://alive2.llvm.org/ce/z/aCZr3u ; Argument with range
https://alive2.llvm.org/ce/z/YgDy8C ; Instruction with known bits
https://alive2.llvm.org/ce/z/6pIxR6 ; Constant

`lshr`:
https://alive2.llvm.org/ce/z/WCCBej

`ashr`:
https://alive2.llvm.org/ce/z/egV4TR

---------

Co-authored-by: Ryan Buchner <rbuchner@ventanamicro.com>
Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
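A minimal sketch with `and` (hedged; operation and constants illustrative):

```llvm
define i32 @select_and(i32 %x) {
  %op = and i32 %x, 15   ; and(0, 15) == 0, so 0 is a fixed point
  %is0 = icmp eq i32 %x, 0
  %r = select i1 %is0, i32 0, i32 %op
  ret i32 %r   ; => ret i32 %op
}
```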
2025-07-16 19:42:41 -07:00
Cullen Rhodes
7392a546bb
[InstCombine] Treat identical operands as one in pushFreezeToPreventPoisonFromPropagating (#145348)
To push a freeze through an instruction, only one operand may produce
poison. However, this currently fails for identical operands, which are
treated as separate. This patch fixes this by treating them as a single
operand.
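A minimal sketch (hedged):

```llvm
define i32 @freeze_square(i32 %y) {
  %sq = mul i32 %y, %y   ; the two operands are one and the same value
  %f = freeze i32 %sq
  ret i32 %f
}
; => the freeze can now be pushed through:
;   %y.fr = freeze i32 %y
;   %sq = mul i32 %y.fr, %y.fr
```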
2025-07-16 13:35:40 +01:00
Pierre van Houtryve
f223411e2e
[InstCombine]PtrReplacer: Correctly handle select with unavailable operands (#148829)
The testcase I added previously failed because a SelectInst with invalid
operands was created (one side `addrspace(4)`, the other
`addrspace(5)`).
PointerReplacer needs to dig deeper if the true and/or false
instructions of the select are not available.

Fixes SWDEV-542957
2025-07-16 09:32:05 +02:00
Ross Kirsling
b1a93cfc32
[InstCombine] foldOpIntoPhi should apply to icmp with non-constant operand (#147676)
Alive2: https://alive2.llvm.org/ce/z/4MeCzA
Fixes #146263.
2025-07-16 10:03:25 +09:00
Ahmed Bougacha
77bcab835a
[InstCombine] Combine ptrauth intrin. callee into same-key bundle. (#94707)
Try to optimize a call to the result of a ptrauth intrinsic, potentially
into the ptrauth call bundle:
    call(ptrauth.resign(p)), ["ptrauth"()] ->  call p, ["ptrauth"()]
    call(ptrauth.sign(p)),   ["ptrauth"()] ->  call p

as long as the key/discriminator are the same in sign and auth-bundle,
and we don't change the key in the bundle (to a potentially-invalid
key.)

Generating a plain call to a raw unauthenticated pointer is generally
undesirable, but if we ended up seeing a naked ptrauth.sign in the first
place, we already have suspicious code. Unauthenticated calls are also
easier to spot than naked signs, so let the indirect call shine.


Note that there is an arguably unsafe extension to this, where we don't
bother checking that the key in bundle and intrinsic are the same (and
also allow folding away an auth into a bundle.)

This can end up generating calls with a bundle that has an invalid key
(which an informed frontend wouldn't have otherwise done), which can be
problematic. The C that generates that is straightforward but arguably
unreasonable. That wouldn't be an issue if we were to bite the bullet
and make these fully AArch64-specific, allowing key knowledge to be
embedded here.
2025-07-15 14:39:53 -07:00
Ahmed Bougacha
42d2ae1034
[InstCombine] Combine ptrauth constant callee into bundle. (#94706)
Try to optimize a call to a ptrauth constant, into its ptrauth bundle:
    call(ptrauth(f)), ["ptrauth"()] ->  call f
as long as the key/discriminator are the same in constant and bundle.
2025-07-15 13:37:07 -07:00
Florian Hahn
08a8e1c6b6
[InstCombine] Move extends across identity shuffles. (#146901)
Add a new fold to instcombine to move SExt/ZExt across identity
shuffles, applying the cast after the shuffle. This sinks extends and
can enable more general additional folding of both shuffles (and
related instructions) and extends. If backends prefer to split things up
and do the casts first, the extends can be hoisted again, for example in
VectorCombine.

A larger example is included in the load_i32_zext_to_v4i32 test. The wider
extend is easier to compute an accurate cost for and targets (like
AArch64) can lower a single wider extend more efficiently than multiple
separate extends.

This is a generalization of a VectorCombine version
(https://github.com/llvm/llvm-project/pull/141109) as suggested by
@preames.

PR: https://github.com/llvm/llvm-project/pull/146901
2025-07-14 21:01:03 +01:00
Benjamin Maxwell
43a9ec2ecd
[AArch64][SME] Instcombine llvm.aarch64.sme.in.streaming.mode() (#147930)
This can fold away in functions with known streaming modes.
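A minimal sketch for an always-streaming function (attribute usage illustrative):

```llvm
define i1 @always_streaming() "aarch64_pstate_sm_enabled" {
  %s = call i1 @llvm.aarch64.sme.in.streaming.mode()
  ret i1 %s   ; => ret i1 true
}
```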
2025-07-13 13:20:20 +01:00
Marius Kamp
9544bb5c29
[InstCombine] Fold umul.overflow(x, c1) | (x*c1 > c2) to x > c2/c1 (#147327)
The motivation of this pattern is to check whether the product of a
variable and a constant would be mathematically (i.e., as integer
numbers instead of bit vectors) greater than a given constant bound. The
pattern appears to occur when compiling several Rust projects (it seems
to originate from the `smallvec` crate but I have not checked this
further).

Unless `c1` is `0`, we can transform this pattern into `x > c2/c1` with
all operations working on unsigned integers. Due to undefined behavior
when an element of a non-splat vector is `0`, the transform is only
implemented for scalars and splat vectors.

Alive proof: https://alive2.llvm.org/ce/z/LawTkm

Closes #142674
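A hedged sketch of the pattern (constants illustrative; here c2/c1 is 100 udiv 24 = 4):

```llvm
define i1 @mul_bound(i64 %x) {
  %mul = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %x, i64 24)
  %prod = extractvalue { i64, i1 } %mul, 0
  %ov = extractvalue { i64, i1 } %mul, 1
  %big = icmp ugt i64 %prod, 100
  %r = or i1 %ov, %big
  ret i1 %r   ; => icmp ugt i64 %x, 4
}
```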
2025-07-11 10:52:13 +02:00
Nikita Popov
889854bef1
[InstCombine] Avoid unprofitable add with remainder transform (#147319)
If C1 is 1 and we're working with a power of two divisor, this will end
up replacing the `and` for the remainder with a multiply and a longer
dependency chain.

Fixes https://github.com/llvm/llvm-project/issues/147176.
2025-07-08 14:29:31 +02:00
Jeffrey Byrnes
0da9aacf48
[InstCombine] Extend bitmask mul combine to handle independent operands (#142503)
This extends https://github.com/llvm/llvm-project/pull/136013 to capture
cases where the combinable bitmask muls are nested under multiple
or-disjoints.

This PR is meant for commits starting at
8c403c912046505ffc10378560c2fc48f214af6a

```
op1 = or-disjoint mul(and (X, C1), D), reg1
op2 = or-disjoint mul(and (X, C2), D), reg2
out = or-disjoint op1, op2

->

temp1 = or-disjoint reg1, reg2
out = or-disjoint mul(and (X, (C1 + C2)), D), temp1
```


Case1: https://alive2.llvm.org/ce/z/dHApyV
Case2: https://alive2.llvm.org/ce/z/Jz-Nag
Case3: https://alive2.llvm.org/ce/z/3xBnEV
2025-07-07 13:50:42 -07:00