1165 Commits

Author SHA1 Message Date
Peter Collingbourne
75bb30ddbf
Move {load,store}(llvm.protected.field.ptr) lowering to InstCombine.
The previous position of llvm.protected.field.ptr lowering for loads
and stores was problematic as it not only inhibited optimizations such
as DSE (as stores to a llvm.protected.field.ptr were not considered to
must-alias stores to the non-protected.field pointer) but also required
changes to other optimization passes to avoid transformations that would
reduce PFP coverage.

Address this by moving the load/store part of the lowering to
InstCombine, where it will run earlier than the PFP-breaking and
AA-relying transformations. The deactivation symbol, null comparison
and EmuPAC parts of the lowering remain in PreISelLowering.

Now that the transformation inhibitions are no longer needed, remove them
(i.e. partially revert #151649, and revert #182976).

This change resulted in a 2.4% reduction in Fleetbench .text size and
the following improvements to PFP performance overhead for BM_PROTO_Arena
on various microarchitectures:

                    before   after
  Apple M2 Ultra     3.5%    3.3%
  Google Axion C4A   3.3%    2.9%
  Google Axion N4A   2.7%    2.2%

Reviewers: fmayer, nikic, vitalybuka

Reviewed By: fmayer

Pull Request: https://github.com/llvm/llvm-project/pull/186548
2026-04-06 17:47:24 -07:00
Yunbo Ni
017b9f9c7a
[InstCombine] Fix crash in foldReversedIntrinsicOperands for struct-return intrinsics (#186339)
Fixes #186334

Similar to #176556, add the missing result type check in
`foldReversedIntrinsicOperands()`. This prevents `CreateVectorReverse()`
from being applied to struct-returning intrinsics.
2026-03-13 10:28:21 +00:00
tudinhh
36de257076
[InstCombine] Handle fixed-width results in get_active_lane_mask fold (#185317)
The optimization introduced in #183329 incorrectly assumed that any
extraction from a scalable active lane mask used a scalable index. When
the result of a `llvm.vector.extract` is a fixed-width vector, the index
should not be multiplied by vscale.

This PR adds a check to ensure the index is only scaled by VScaleMin
when the return type of the extraction is a scalable vector, not
fixed-width.

Fixes #185271
2026-03-09 12:51:47 +01:00
Nikolas Klauser
ce79fb3712
[InstCombine] Always fold nonnull assumptions into operand bundles (#169923)
Fixes #168688
2026-03-02 13:58:08 +01:00
Matt Arsenault
82a1905c4b
InstCombine: Pass SimplifyQuery through SimplifyDemandedFPClass (#184096)
2026-03-02 12:00:25 +00:00
Nikita Popov
3ad43f2d1c
[LangRef] Clarify nsz semantics (#180906)
The current LangRef wording says that the sign of a zero argument or
result is "insignificant", but it is not clear what this means.

Alive2 models this as non-deterministically flipping the zero sign bits
for both inputs and outputs.

This PR proposes to specify this flag as non-deterministically flipping
inputs only. A consequence of this is that fabs is guaranteed to have an
unset sign bit even if the input is zero (that is,
https://alive2.llvm.org/ce/z/irCftQ is no longer a valid transform), and
that the copysign result sign only depends on the second operand (that
is, https://alive2.llvm.org/ce/z/VnHdfh is no longer a valid transform).

These rules are still more liberal than we'd really like them to be, but
at least avoid some of the issues with nsz.

This is based on the discussion in:
https://discourse.llvm.org/t/rfc-clarify-the-behavior-of-fp-operations-on-bit-strings-with-nsz-flag/85981
2026-03-02 10:53:07 +01:00
Kerry McLaughlin
10b48e41e7
[InstCombine] Combine extract from get_active_lane_mask where all lanes inactive (#183329)
When extracting a subvector from the result of a get_active_lane_mask, return
a constant zero vector if it can be proven that all lanes will be inactive.

For example, the result of the extract below will be a subvector where
every lane is inactive if X & Y are const, and `Y * VScale >= X`:
  vector.extract(get.active.lane.mask(Start, X), Y)
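
A minimal sketch of the reasoning, assuming the usual reading of the mask
(lane `i` is active iff `base + i < n`) and ignoring overflow of `base + i`.
The helper names `activeLaneMask`, `extractLanes` and `allInactive` are
hypothetical models, not LLVM APIs, and `first` stands for the already scaled
lane offset (`Y * VScale` in the notation above):

```cpp
#include <cstdint>
#include <vector>

// Model of get.active.lane.mask: lane i is active iff base + i < n.
// Assumes base + i does not overflow uint64_t (fine for small inputs).
std::vector<bool> activeLaneMask(uint64_t base, uint64_t n, unsigned lanes) {
  std::vector<bool> mask(lanes);
  for (unsigned i = 0; i < lanes; ++i)
    mask[i] = (base + i) < n;
  return mask;
}

// Model of vector.extract: copy `len` lanes starting at lane `first`.
std::vector<bool> extractLanes(const std::vector<bool> &v, unsigned first,
                               unsigned len) {
  return std::vector<bool>(v.begin() + first, v.begin() + first + len);
}

// True when every lane of the (sub)vector is inactive.
bool allInactive(const std::vector<bool> &v) {
  for (bool lane : v)
    if (lane)
      return false;
  return true;
}
```

With `base == 0`, any extract whose first lane is at or beyond `n` only sees
inactive lanes, which is the case the fold replaces with a zero vector.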
2026-02-27 09:46:49 +00:00
Nathiyaa Sengodan
4fec4df12a
[InstCombine] Fold min/max of two subtracts with common RHS (#183240)
Fold: minmax(sub X, Z , sub Y, Z) -> sub minmax(X, Y), Z

When both sub instructions have no-wrap flags and share the same RHS
operand, we can fold:

  smin (sub nsw X, Z), (sub nsw Y, Z) -> sub nsw (smin X, Y), Z
  smax (sub nsw X, Z), (sub nsw Y, Z) -> sub nsw (smax X, Y), Z
  umin (sub nuw X, Z), (sub nuw Y, Z) -> sub nuw (umin X, Y), Z
  umax (sub nuw X, Z), (sub nuw Y, Z) -> sub nuw (umax X, Y), Z

This is valid because subtraction by a common value preserves relative
ordering when no signed/unsigned overflow occurs.

Proof: https://alive2.llvm.org/ce/z/n9gwj2  
Closes https://github.com/llvm/llvm-project/issues/167059
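
As a quick sanity check alongside the Alive2 proof, the identity can be
brute-forced on 8-bit values. This is only an illustrative sketch: the
function names are hypothetical, and the no-wrap flags are modelled by
skipping inputs whose subtraction would leave the 8-bit range.

```cpp
#include <algorithm>

// Wrap an int to the signed 8-bit range, modelling i8 arithmetic.
int wrapI8(int v) { return ((v + 128) & 0xFF) - 128; }

// smin(X - Z, Y - Z) == smin(X, Y) - Z for all i8 inputs where neither
// subtraction wraps (the "nsw" precondition).
bool sminSubFoldHolds() {
  for (int x = -128; x <= 127; ++x)
    for (int y = -128; y <= 127; ++y)
      for (int z = -128; z <= 127; ++z) {
        int xz = x - z, yz = y - z; // computed in a wider type
        bool nsw = xz >= -128 && xz <= 127 && yz >= -128 && yz <= 127;
        if (!nsw)
          continue;
        if (std::min(xz, yz) != std::min(x, y) - z)
          return false;
      }
  return true;
}

// umin(X - Z, Y - Z) == umin(X, Y) - Z for all u8 inputs where neither
// subtraction wraps (the "nuw" precondition: X >= Z and Y >= Z).
bool uminSubFoldHolds() {
  for (int x = 0; x <= 255; ++x)
    for (int y = 0; y <= 255; ++y)
      for (int z = 0; z <= std::min(x, y); ++z)
        if (std::min(x - z, y - z) != std::min(x, y) - z)
          return false;
  return true;
}

// Same comparison with wrapping i8 arithmetic and no flag check.
bool sminFoldHoldsWithWrap(int x, int y, int z) {
  int lhs = std::min(wrapI8(x - z), wrapI8(y - z));
  int rhs = wrapI8(std::min(x, y) - z);
  return lhs == rhs;
}
```

`sminFoldHoldsWithWrap(-128, 0, 1)` is false, which shows why the flags are
required: `-128 - 1` wraps to `127` and reorders the two operands.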
2026-02-26 01:21:28 +08:00
Hongxu Xu
8d3e6e709c
[InstCombine] Transform splat before n x i1 for vec.reduce.add (#182213)
```llvm
define i1 @src(i1 %0) {
  %2 = insertelement <8 x i1> poison, i1 %0, i32 0
  %3 = shufflevector <8 x i1> %2, <8 x i1> poison, <8 x i32> zeroinitializer
  %4 = tail call i1 @llvm.vector.reduce.add.v8i1(<8 x i1> %3)
  ret i1 %4
}

define i1 @tgt(i1 %0) {
  ret i1 0
}
```

alive2: https://alive2.llvm.org/ce/z/vejxot

The fold of `vector_reduce_add(<n x i1>)` to
`Trunc(ctpop(bitcast <n x i1> to in))` interferes with the fold of
`vector_reduce_add(<splat>)` to `mul`, so I exchanged their order.

Relevant PR: #161020
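
To see why the example above folds to `0`, here is a small standalone model
of the arithmetic (not LLVM code; `reduceAddSplat` and `mulForm` are
hypothetical names). It sums `lanes` copies of `x` modulo `2^bits` and
compares that with the multiplication form:

```cpp
#include <cstdint>

// Mask that keeps the low `bits` bits, modelling iN wraparound.
uint64_t lowBitsMask(unsigned bits) {
  return bits >= 64 ? ~0ull : ((1ull << bits) - 1);
}

// Reference: add `lanes` copies of x, wrapping to `bits` bits each step.
uint64_t reduceAddSplat(uint64_t x, unsigned lanes, unsigned bits) {
  uint64_t sum = 0;
  for (unsigned i = 0; i < lanes; ++i)
    sum = (sum + x) & lowBitsMask(bits);
  return sum;
}

// Folded form: a single multiplication by the lane count.
uint64_t mulForm(uint64_t x, unsigned lanes, unsigned bits) {
  return (x * lanes) & lowBitsMask(bits);
}
```

For `bits == 1` and `lanes == 8`, `mulForm` is `8 * x mod 2`, which is `0`
for either value of `x`, matching `ret i1 0` in `@tgt`.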
2026-02-21 15:03:25 +00:00
Nikita Popov
0a740668a4
[InstCombine] Support minimumnum/maximumnum (#180529)
Support minimumnum/maximumnum intrinsics in various existing
minnum/maxnum/minimum/maximum folds.

The test coverage has been copied from minnum/maxnum.

Proofs: https://alive2.llvm.org/ce/z/YMlLwO
Proofs that time out: https://alive2.llvm.org/ce/z/dJN8wj
2026-02-09 15:55:59 +00:00
Nikolas Klauser
6dbdfd824a
[InstCombine] Drop nonnull assumes if the pointer is already known to be nonnull (#180434)
2026-02-09 13:13:32 +01:00
Nikita Popov
a654a27fcd
[InstCombine] Fold min/max(fpext x, C) to fpext(min/max(x, fptrunc C)) (#179968)
Fold `min/max(fpext x, C)` to `fpext(min/max(x, fptrunc C))` in cases
where the truncation of the constant is lossless.

This helps eliminate fpext/fptrunc pairs around min/max and addresses
the regression from https://github.com/llvm/llvm-project/pull/177988.

Proof: https://alive2.llvm.org/ce/z/y_Bcdd
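
The role of the "lossless" condition can be illustrated with plain `float`
and `double`. This is a sketch only: `std::fmin` stands in for the min
intrinsics, and `truncIsLossless`, `wideMin` and `narrowMin` are hypothetical
helper names.

```cpp
#include <cmath>
#include <limits>

// True when converting c to float and back reproduces c exactly.
// Out-of-range values and NaN are rejected up front.
bool truncIsLossless(double c) {
  if (!(std::fabs(c) <= std::numeric_limits<float>::max()))
    return false;
  return static_cast<double>(static_cast<float>(c)) == c;
}

// Original form: min(fpext x, C), computed in double.
double wideMin(float x, double c) {
  return std::fmin(static_cast<double>(x), c);
}

// Folded form: fpext(min(x, fptrunc C)), computed in float.
double narrowMin(float x, double c) {
  return static_cast<double>(std::fmin(x, static_cast<float>(c)));
}
```

With `C = 0.5` both forms agree. With `C = 0.1`, which is not exactly
representable as a `float`, `wideMin(0.1f, 0.1)` and `narrowMin(0.1f, 0.1)`
differ, so the fold has to be skipped.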
2026-02-09 09:13:26 +00:00
Nikita Popov
531430b614
[InstCombine] Relax one-use check for min/max(fpext x, fpext y) to fpext(min/max(x, y)) fold (#180164)
If only one of the operands is one-use, the total number of fpexts stays the
same, but the min/max is performed on a narrowed type. Additionally, the
fpext may fold with a following fptrunc.
2026-02-09 09:34:17 +01:00
Snehasish Kumar
7449d32d7e
[InstCombine][profcheck] Fix profile metadata propagation for umax in InstCombine (#179332)
Select instructions created from the expansion of a umax intrinsic do
not have profile data even though the function may have profile data.
This is because PGO instrumentation does not support intrinsics.

Assisted-by: gemini
2026-02-05 21:09:06 -08:00
Nikolas Klauser
3064291c9f
Reapply "[InstCombine] Always fold alignment assumptions into operand bundles (#177597)" (#179497)
Truncating at 32 bits is now avoided by removing a cast to `unsigned`.
This would also break at 64 bits (with the pointer size > 64 bit), but I
don't think LLVM supports such a thing.

This reverts commit bc7315749d6d16d0f162f816b3ec0ef7169615f2.
2026-02-03 19:30:48 +01:00
Nico Weber
bc7315749d
Revert "[InstCombine] Always fold alignment assumptions into operand bundles (#177597)"
This reverts commit b74e1bca6d77b3de5c05822d1631006ce2a30cc6.
Makes clang assert:
https://github.com/llvm/llvm-project/pull/177597#issuecomment-3824553291
2026-01-30 11:32:46 -05:00
Matt Arsenault
909041e480
InstCombine: Check one use before trying to simplify copysign sign (#178251)
Fixes #178245
2026-01-27 17:21:44 +00:00
Matt Arsenault
10b539f13e
InstCombine: Try SimplifyDemandedBits on copysign signs (#177942)
2026-01-26 18:43:13 +01:00
Nikolas Klauser
b74e1bca6d
[InstCombine] Always fold alignment assumptions into operand bundles (#177597)
2026-01-23 16:54:17 +01:00
Dan Blackwell
c63a744f3f
[CodeGen][InstCombine][Sanitizers] Emit lifetimes when compiling with memtag-stack (#177130)
Currently we do not emit lifetimes by default when compiling with
memtag-stack - which means we don't catch use-after-scope (when
compiling without optimization).

This patch fixes that by mirroring ASan, HWASan and MSan, and always
emitting lifetime markers. The patch is based on the changes made in
aeca569.

rdar://163713381
2026-01-22 14:22:44 +00:00
Yingwei Zheng
9696c8bd62
[InstCombine] Bail out on intrinsics with struct return types (#176556)
After https://github.com/llvm/llvm-project/pull/174835, overflow
intrinsics can be vectorized. But `foldShuffledIntrinsicOperands`
doesn't support shuffling vectors inside the struct return value.

Closes https://github.com/llvm/llvm-project/issues/176548.
2026-01-17 12:04:37 +00:00
Gábor Spaits
3424447645
[InstCombine] Remove unnecessary type equality check when creating zext or trunc (NFC) (#175947)
This came up during discussions under PR #161101.
2026-01-14 15:50:04 +01:00
Henry
c6f6efba3b
[NFC] Implicit container copy cleanup (#174702)
A set of cleanups for redundant implicit container copies, fixed with
const references or move semantics.

e8996cb24 [AMDGPU] replace copy with const reference (NFC)
25ceecee8 [-Wunsafe-buffer-usage] Replace vector copy with reference
(NFC)
e1f5254e0 [AMDGPU] Replace copy with move semantics (NFC)
8261250d7 [InstCombine] Replace vector copy with move semantic (NFC)
749bb21de [CommandLine] Avoid vector copy for const argument (NFC)
b89526f90 [LoongArch] Remove unnecessary vector copy (NFC)
6b22bcf56 [TextAPI] Replace map copy with const reference (NFC)
a121519d8 [BlockExtract] Avoid copy semantic for ctor (NFC)
3034d3063 [LifetimeSafety] Avoid map copy for dump methods (NFC)

---------

Co-authored-by: sfu <afwbu8tp6@mozmail.com>
2026-01-14 11:16:32 +01:00
Aryan Kadole
362b653c69
[InstCombine] Fold Minimum over trailing or leading zeros (#173768)
Add support for
`umin(clz(x), clz(y)) => clz(x | y)`
`umin(ctz(x), ctz(y)) => ctz(x | y)`

[C++ source](https://godbolt.org/z/E8abbjT7G)
[alive proof](https://alive2.llvm.org/ce/z/mh94_n)

Fixes #173691
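
Both identities can be checked exhaustively on 8-bit values. The sketch
below uses hand-written `clz8`/`ctz8` helpers (hypothetical names) that
return 8 for a zero input, which is an assumption about how the zero case is
treated here:

```cpp
#include <algorithm>
#include <cstdint>

// Count leading zeros of an 8-bit value; returns 8 for zero.
unsigned clz8(uint8_t v) {
  unsigned n = 0;
  for (int bit = 7; bit >= 0 && ((v >> bit) & 1) == 0; --bit)
    ++n;
  return n;
}

// Count trailing zeros of an 8-bit value; returns 8 for zero.
unsigned ctz8(uint8_t v) {
  unsigned n = 0;
  for (int bit = 0; bit < 8 && ((v >> bit) & 1) == 0; ++bit)
    ++n;
  return n;
}

// Checks umin(clz(x), clz(y)) == clz(x | y) and the ctz analogue
// for every pair of 8-bit inputs.
bool uminCountZerosFoldHolds() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y) {
      uint8_t a = static_cast<uint8_t>(x);
      uint8_t b = static_cast<uint8_t>(y);
      uint8_t combined = static_cast<uint8_t>(a | b);
      if (std::min(clz8(a), clz8(b)) != clz8(combined))
        return false;
      if (std::min(ctz8(a), ctz8(b)) != ctz8(combined))
        return false;
    }
  return true;
}
```

The intuition is that the highest (or lowest) set bit of `x | y` is whichever
of the two inputs reaches further, so the smaller zero count wins.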
2026-01-11 20:56:52 +08:00
Kshitij Paranjape
2daf321660
[InstCombine] Add support for Instruction combining of hyperbolic functions (#173730)
Fixes llvm/llvm-project#173706
2026-01-09 00:06:58 -05:00
Valeriy Savchenko
55eaa6c27b
[InstCombine][AArch64] Lower NEON shift intrinsics when possible (#172465)
2026-01-07 07:46:22 +00:00
Benjamin Maxwell
49e601a3a2
[InstCombine] Don't fold struct-ret intrinsics into vector selects (#173062)
Folding struct-ret intrinsics like `@llvm.sincos.v4f32` into selects
with vector conditions is invalid (the result must be a vector).
2025-12-20 09:51:35 +00:00
Matt Arsenault
6e47d4ef45
Reapply "InstCombine: Fold ldexp with constant exponent to fmul" (#171895) (#171977)
2025-12-12 12:55:55 +01:00
Matt Arsenault
757c5b3bc7
Revert "InstCombine: Fold ldexp with constant exponent to fmul" (#171895)
Reverts llvm/llvm-project#171731

Fails on a libc test
2025-12-11 21:12:59 +00:00
Matt Arsenault
5eb2ec2179
InstCombine: Fold ldexp with constant exponent to fmul (#171731)
If we can represent this with an fmul, prefer it as a canonical
form. More optimizations understand fmul, and it allows contraction
to fma.
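
A rough illustration of the equivalence in plain C++, assuming `2^C` is
itself representable in the floating-point type (the "if we can represent
this" condition). `ldexpAsFMul` and `sameAsLdexp` are hypothetical names, and
the exact legality checks used by the patch are not shown here:

```cpp
#include <cmath>

// Rewrite ldexp(x, C) as x * 2^C. The scale is built once; in IR it
// would be a constant operand of the fmul.
double ldexpAsFMul(double x, int exponent) {
  const double scale = std::ldexp(1.0, exponent);
  return x * scale;
}

// Compare the multiplication form against the library ldexp.
bool sameAsLdexp(double x, int exponent) {
  return std::ldexp(x, exponent) == ldexpAsFMul(x, exponent);
}
```

For example, `ldexpAsFMul(1.5, 3)` multiplies by `8.0` and yields `12.0`,
the same value `std::ldexp(1.5, 3)` produces.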
2025-12-11 19:20:45 +01:00
valadaptive
7f2bbba60d
[AArch64][ARM] Optimize more tbl/tbx calls into shufflevector (#169748)
Resolves #169701.

This PR extends the existing InstCombine operation which folds `tbl1`
intrinsics to `shufflevector` if the mask operand is constant. Before
this change, it only handled 64-bit `tbl1` intrinsics with no
out-of-bounds indices. I've extended it to support both 64-bit and
128-bit vectors, and it now handles the full range of `tbl1`-`tbl4` and
`tbx1`-`tbx4`, as long as at most two of the input operands are actually
indexed into.

For the purposes of `tbl`, we need a dummy vector of zeroes if there are
any out-of-bounds indices, and for the purposes of `tbx`, we use the
"fallback" operand. Both of those take up an operand for the purposes of
`shufflevector`.

This works a lot like https://github.com/llvm/llvm-project/pull/169110,
with some added complexity because we need to handle multiple operands.
I raised a couple questions in that PR that still need to be answered:
- Is it correct to check `IsA<UndefValue>` for each mask index, and set
the output mask index to -1 if so? This is later folded to a poison
value, and I'm not sure about the subtle differences between poison and
undef and when you can substitute one for the other. As I mentioned in
#169110, the existing x86 pass (`simplifyX86vpermilvar`) already behaves
this way when it comes to undef.
- How can I write an Alive2 proof for this? It's very hard to find good
documentation or tutorials about Alive2.

As with #169110, most of the regression test cases were generated using
Claude. Everything else was written by me.
2025-12-09 16:11:26 +00:00
Nikita Popov
7b652195d7
[IR] Add ImplicitTrunc argument to ConstantInt::get() (#170865)
Add an ImplicitTrunc argument to ConstantInt::get(), which allows
controlling whether implicit truncation of the value is permitted.
    
This argument currently defaults to true, but will be switched to false
in the future to guard against signed/unsigned confusion, similar to
what has already happened for APInt.
    
The argument gives an opt-out for cases where the truncation is
intended. The patch contains one illustrative example where this
happens.
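
One way to picture what "implicit truncation" means is a range check on the
value before it is narrowed. The sketch below is a standalone model, not the
`ConstantInt` implementation; `fitsUnsigned` and `fitsSigned` are
hypothetical helpers and `bits` is assumed to be at least 1:

```cpp
#include <cstdint>

// True when v is representable as an unsigned integer of `bits` bits.
bool fitsUnsigned(uint64_t v, unsigned bits) {
  return bits >= 64 || (v >> bits) == 0;
}

// True when v is representable as a signed integer of `bits` bits.
bool fitsSigned(int64_t v, unsigned bits) {
  if (bits >= 64)
    return true;
  const int64_t lo = -(int64_t(1) << (bits - 1));
  const int64_t hi = (int64_t(1) << (bits - 1)) - 1;
  return v >= lo && v <= hi;
}
```

The signed/unsigned confusion shows up with a value such as `-1`: it fits an
8-bit signed constant, but the same bit pattern read as an unsigned 64-bit
value does not fit 8 bits and would need truncation.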
2025-12-08 08:42:59 +01:00
David Green
f741851731
Revert "[AArch64][ARM] Move ARM-specific InstCombine transforms into Transforms/Utils (#169589)"
This reverts commit 1c32b6f51ccaaf9c65be11d7dca9e5a476cddb5a due to failures on
BUILD_SHARED_LIBS builds.
2025-12-02 11:46:50 +00:00
valadaptive
1c32b6f51c
[AArch64][ARM] Move ARM-specific InstCombine transforms into Transforms/Utils (#169589)
Back when `TargetTransformInfo::instCombineIntrinsic` was added in
https://reviews.llvm.org/D81728, several transforms common to both ARM
and AArch64 were kept in the non-target-specific `InstCombineCalls.cpp`
so they could be shared between the two targets.

I want to extend the transform of the `tbl` intrinsics into static
`shufflevector`s in a similar manner to
https://github.com/llvm/llvm-project/pull/169110 (right now it only
works with a 64-bit `tbl1`, but `shufflevector` should allow it to work
with up to 2 operands, and it can definitely work with 128-bit vectors).
I think separating out the transform into a TTI hook is a prerequisite.

~~I'm not happy about creating an entirely new module for this and
having to wire it up through CMake and everything, but I'm not sure
about the alternatives. If any maintainers can think of a cleaner way of
doing this, I'm very open to it.~~

I've moved the transforms into
`Transforms/Utils/ARMCommonInstCombineIntrinsic.cpp`, which is a lot
simpler.
2025-12-02 11:17:12 +00:00
Luke Lau
bb9449d5bb
[InstCombine] Fold @llvm.experimental.get.vector.length when cnt <= max_lanes (#169293)
On RISC-V, some loops that the loop vectorizer vectorizes pre-LTO may
turn out to have the exact trip count exposed after LTO, see #164762.

If the trip count is small enough we can fold away the
@llvm.experimental.get.vector.length intrinsic based on this corollary
from the LangRef:

> If %cnt is less than or equal to %max_lanes, the return value is equal
to %cnt.

This on its own doesn't remove the @llvm.experimental.get.vector.length
in #164762 since we also need to teach computeKnownBits about
@llvm.experimental.get.vector.length and the sub recurrence, but this PR
is a starting point.

I've added this in InstCombine rather than InstSimplify since we may
need to insert a truncation (@llvm.experimental.get.vector.length can
take an i64 %cnt argument, the result is always i32).

Note that there was something similar done in VPlan in #167647 for when
the loop vectorizer knows the trip count.
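
The LangRef corollary quoted above can be sketched as a tiny decision
function. This is a model under assumptions, not the InstCombine code:
`tryFoldGetVectorLength` is a hypothetical name, and `maxLanes` stands for a
value the count is known not to exceed.

```cpp
#include <cstdint>

// If cnt <= maxLanes the intrinsic returns cnt, so the call can be
// replaced by cnt truncated to the i32 result type. Returns false when
// the corollary does not apply and the call must be kept.
bool tryFoldGetVectorLength(uint64_t cnt, uint32_t maxLanes,
                            uint32_t &result) {
  if (cnt > maxLanes)
    return false;
  // cnt <= maxLanes <= UINT32_MAX, so the narrowing loses nothing.
  result = static_cast<uint32_t>(cnt);
  return true;
}
```

The explicit narrowing mirrors the reason given above for doing this in
InstCombine: an `i64` count needs a truncation to match the `i32` result.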
2025-11-27 07:16:03 +00:00
Peter Collingbourne
d2379effe9
Add deactivation symbol operand to ConstantPtrAuth.
Deactivation symbol operands are supported in the code generator by
building on the previously added support for IRELATIVE relocations.

Reviewers: ojhunt, fmayer, ahmedbougacha, nikic, efriedma-quic

Reviewed By: fmayer

Pull Request: https://github.com/llvm/llvm-project/pull/133537
2025-11-26 12:39:40 -08:00
Peter Collingbourne
6227eb90da
Add IR and codegen support for deactivation symbols.
Deactivation symbols are a mechanism for allowing object files to disable
specific instructions in other object files at link time. The initial use
case is for pointer field protection.

For more information, see the RFC:
https://discourse.llvm.org/t/rfc-deactivation-symbols/85556

Reviewers: ojhunt, nikic, fmayer, arsenm, ahmedbougacha

Reviewed By: fmayer

Pull Request: https://github.com/llvm/llvm-project/pull/133536
2025-11-26 12:37:09 -08:00
Daniel Thornburgh
c9ff2df8c3
[IR] "modular-format" attribute for functions using format strings (#147429)
A new InstCombine transform uses this attribute to rewrite calls to a
modular version of the implementation along with llvm.reloc.none
relocations against aspects of the implementation needed by the call.

This change only adds support for the 'float' aspect, but it also builds
the structure needed for others.

See issue #146159
2025-11-11 11:52:56 -08:00
Nikita Popov
7900e63fbb
[InstCombine] Support ptrtoaddr when converting to align assume bundle
ptrtoaddr can be treated the same way as ptrtoint here.
2025-10-28 12:02:47 +01:00
Mihail Mihov
6034ab3d98
[InstCombine] Add CTLZ -> CTTZ simplification (#164733)
This PR adds the simplification `ctlz(~x & (x - 1)) -> bitwidth -
cttz(x, false)` ([Alive2](https://alive2.llvm.org/ce/z/vVDRCu)).

Closes issue #164436
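
The identity can also be checked by brute force on 16-bit values. In this
sketch the helper names are hypothetical, and both counts return 16 for a
zero input, matching the `false` (zero is not poison) argument above:

```cpp
#include <cstdint>

// Count leading zeros of a 16-bit value; returns 16 for zero.
unsigned countLeadingZeros16(uint16_t v) {
  unsigned n = 0;
  for (int bit = 15; bit >= 0 && ((v >> bit) & 1) == 0; --bit)
    ++n;
  return n;
}

// Count trailing zeros of a 16-bit value; returns 16 for zero.
unsigned countTrailingZeros16(uint16_t v) {
  unsigned n = 0;
  for (int bit = 0; bit < 16 && ((v >> bit) & 1) == 0; ++bit)
    ++n;
  return n;
}

// Checks ctlz(~x & (x - 1)) == 16 - cttz(x) for every 16-bit x.
bool ctlzToCttzFoldHolds() {
  for (unsigned raw = 0; raw <= 0xFFFFu; ++raw) {
    uint16_t x = static_cast<uint16_t>(raw);
    // ~x & (x - 1) sets exactly the bits below the lowest set bit of x.
    uint16_t mask = static_cast<uint16_t>(~x & (x - 1));
    if (countLeadingZeros16(mask) != 16u - countTrailingZeros16(x))
      return false;
  }
  return true;
}
```

The mask has `cttz(x)` low bits set and nothing else, so its leading-zero
count is the bit width minus that number.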
2025-10-25 00:40:11 +08:00
Benjamin Maxwell
c80495c1b0
[InstCombine] Allow folding cross-lane operations into PHIs/selects (#164388)
Previously, cross-lane operations were disallowed here, but they are
only problematic if the `select` condition is a vector, as the input of
the operation is not simply one of the arms of the phi/select.
2025-10-23 09:27:57 +01:00
Nikita Popov
573ca36753
[IR] Replace alignment argument with attribute on masked intrinsics (#163802)
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.

This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)

It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
2025-10-20 08:50:09 +00:00
Ramkumar Ramachandra
36b543ab20
[InstComb] Handle undef in simplifyMasked(Store|Scatter) (#161825)
2025-10-03 16:52:48 +01:00
Gábor Spaits
d29798767c
[InstCombine] Transform vector.reduce.add and splat into multiplication (#161020)
Fixes #160066

Whenever we have a vector with all the same elements, created with
`insertelement` and `shufflevector`, and we sum the vector, we have a
multiplication.
2025-09-29 00:06:20 +02:00
Axel Sorenson
dee28f9555
[InstCombine] Rotate transformation port from SelectionDAG to InstCombine (#160628)
The rotate transformation from
72c04bb882/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (L10312-L10337)
has no middle-end equivalent in InstCombine. The following is a port of
that transformation to InstCombine.

---------

Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
2025-09-26 21:04:52 -07:00
Florian Hahn
d45a135918
[InstCombine] Remove redundant align 1 assumptions. (#160695)
It seems like we have a bunch of align 1 assumptions in practice and
unless I am missing something they should not add any value.

See https://github.com/dtcxzyw/llvm-opt-benchmark/pull/2861/files

PR: https://github.com/llvm/llvm-project/pull/160695
2025-09-25 18:00:18 +01:00
Ramkumar Ramachandra
7fb3a91418
[PatternMatch] Introduce match functor (NFC) (#159386)
A common idiom is the usage of the PatternMatch match function within a
functional algorithm like all_of. Introduce a match functor to shorten
this idiom.

Co-authored-by: Luke Lau <luke@igalia.com>
2025-09-17 21:04:33 +01:00
Matthew Devereau
ead4f3e271
[InstCombine] Canonicalize active lane mask params (#158065)
Rewrite active lane mask intrinsics to begin their range from 0 when
both parameters are constant integers.
2025-09-12 16:35:58 +01:00
Florian Hahn
93a1470a97
[InstCombine] Remove redundant alignment assumptions. (#123348)
Use known bits to remove redundant alignment assumptions.

Libc++ now adds alignment assumptions for std::vector::begin() and
std::vector::end(), so I expect we will see quite a bit more assumptions
in C++ [1]. Try to clean up some redundant ones to start with.

[1] https://github.com/llvm/llvm-project/pull/108961

PR: https://github.com/llvm/llvm-project/pull/123348
2025-09-12 13:45:36 +01:00
Hongyu Chen
75b0c89e62
[InstCombine][VectorCombine][NFC] Unify uses of lossless inverse cast (#156597)
This patch addresses
https://github.com/llvm/llvm-project/pull/155216#discussion_r2297724663.
This patch adds a helper function to put the inverse cast on constants,
with cast flags preserved (optional).
Follow-up patches will add trunc/ext handling on VectorCombine and flags
preservation on InstCombine.
2025-09-08 13:30:06 +00:00