40745 Commits

Author SHA1 Message Date
Ramkumar Ramachandra
86482dffba
[VPlan] Use m_Broadcast to improve a match (NFC) (#153607) 2025-08-14 18:10:58 +01:00
Vedant Paranjape
44df9826f3
[InstCombine] Propagate invariant.load metadata across unpacked loads (#152186)
For loads that operate on aggregate type, instcombine unpacks the loads.
It does not preserve the invariant.load metadata. This patch fixes that,
it looks for the metadata in the parent load and attaches the metadata
to the unpacked loads.

```
%struct.double2 = type { double, double }
%struct.double1 = type { double }

define %struct.double2 @func1(ptr %a) {
  %1 = load %struct.double2, ptr %a, align 16, !invariant.load !1
  ret %struct.double2 %1
}

!1 = !{}
```
Reproducer: https://godbolt.org/z/hcY8MMvYh
2025-08-14 10:08:26 -07:00
Alexey Bataev
ca4ebf9517
[SLP]Support LShr as base for copyable elements
Added support for LShr instructions as base for copyable elements. Also,
added simple analysis for best base instruction selection, if multiple
candidates are available.

Reviewers: hiraditya, RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/153393
2025-08-14 12:35:28 -04:00
Alexey Bataev
d57ab276b6 [SLP] Recalculate cleared deps for potential control schedule data nodes
Need to recalculate the dependencies for all potential control data
schedule nodes to prevent compiler crash.

Fixes #153571
2025-08-14 09:00:42 -07:00
Florian Hahn
177f27d220
[VPlan] Add incoming_[blocks,values] iterators to VPPhiAccessors (NFC) (#138472)
Add 3 new iterator ranges to VPPhiAccessors

* incoming_values(): returns a range over the incoming
  values of a phi 
* incoming_blocks(): returns a range over the incoming 
  blocks of a phi
* incoming_values_and_blocks: returns a range over pairs of
   incoming values and blocks.

Depends on https://github.com/llvm/llvm-project/pull/124838.

PR: https://github.com/llvm/llvm-project/pull/138472
2025-08-14 16:47:04 +01:00
Theodoros Theodoridis
d15b7a83a7
[llvm][LICM] Limit multi-use BOAssociation to FP and Vector (#149829)
Limit the re-association of BOps with multiple users to FP and Vector
arithmetic.
2025-08-14 11:56:55 +01:00
Elvis Wang
01fac67e2a
[TTI] Add cost kind to getAddressComputationCost(). NFC. (#153342)
This patch add cost kind to `getAddressComputationCost()` for #149955.

Note that this patch also remove all the default value in `getAddressComputationCost()`.
2025-08-14 16:01:44 +08:00
Pavel Skripkin
30144226a4
[llvm] [InstCombine] fold "icmp eq (X + (V - 1)) & -V, X" to "icmp eq (and X, V - 1), 0" (#152851)
This fold optimizes 

```llvm
define i1 @src(i32 %num, i32 %val) {
  %mask = add i32 %val, -1
  %neg = sub nsw i32 0, %val

  %num.biased = add i32 %num, %mask
  %_2.sroa.0.0 = and i32 %num.biased, %neg
  %_0 = icmp eq i32 %_2.sroa.0.0, %num
  ret i1 %_0
}
```
to
```llvm
define i1 @tgt(i32 %num, i32 %val) {
  %mask = add i32 %val, -1
  %tmp = and i32 %num, %mask
  %ret = icmp eq i32 %tmp, 0
  ret i1 %ret
}
```

For power-of-two `val`.

Observed in real life for following code

```rust
pub fn is_aligned(num: usize) -> bool {
    num.next_multiple_of(1 << 12) == num
}
```
which verifies that num is aligned to 4096.

Alive2 proof https://alive2.llvm.org/ce/z/QisECm
2025-08-14 10:23:03 +03:00
Luke Lau
af06835483
[VPlan] Use parameter packs to avoid unary/binary/ternary matchers. NFC (#152272)
Instead of defining unary/binary/ternary/4ary overloads of each matcher,
we can use parameter packs to support arbitrary numbers of operands.

This allows us to remove the explicit N-ary definitions for each
matcher.

We need to rewrite Recipe_match's constructor to use a parameter pack
too, otherwise we end up with ambiguous overloads.
2025-08-14 11:55:55 +08:00
Florian Hahn
9400490a3c
[LV] Remove unused ILV state (NFC).
Remove unused member variables from InnerLoopVectorizer.
2025-08-13 21:28:50 +01:00
Kazu Hirata
1f04b15c56
[Vectorize] Remove a redundant call to std::unique_ptr<T>::get (NFC) (#153359) 2025-08-13 10:37:31 -07:00
Arnold Schwaighofer
9871adc54a
[coro] [async] Make sure to reprocess non-split async functions (#153419)
We do this to inline the coro.end.async's tail called function into the
non-split async coroutine.

rdar://136296219
2025-08-13 10:10:35 -07:00
Sumanth Gundapaneni
77044f944c
[SeparateConstOffsetFromGEP] Decompose constant xor operand if possible (#150438)
Try to transform XOR(A, B+C) in to XOR(A,C) + B where XOR(A,C) is part
of base for memory operations.
This transformation can map these Xors in to better addressing mode and
eventually decompose them in to geps.
2025-08-13 11:49:44 -05:00
Alexey Bataev
dd5ba694bd [SLP]Recalculate deps for potential control-dependent schedule data
After clearing the dependencies in copyable data, need to recalculate
dependencies for the original ScheduleData, if it can be marked as
control dependent.

Fixes #153289
2025-08-13 08:18:26 -07:00
Ramkumar Ramachandra
d107c29fef
[VPlan] Strip unused CanonicalIVTy arg (NFC) (#153418) 2025-08-13 15:53:56 +01:00
Orlando Cazalet-Hyams
d13341db26
[RemoveDIs][NFC] Remove getAssignmentMarkers (#153214)
getAssignmentMarkers was for debug intrinsics. getDVRAssignmentMarkers
is used for DbgRecords.
2025-08-13 10:56:19 +01:00
Florian Hahn
48bfaa4c06
[VPlan] Replace VPBB for vector.ph during skeleton creation (NFC)
Shift replacement of regular VPBB for vector.ph with the VPIRBB wrapping
the created IR block directly to skeleton creation, to be consistent
with how the scalar preheader is handled.
2025-08-13 08:30:18 +01:00
Thurston Dang
cf002847a4
Revert "[msan] Improve packed multiply-add instrumentation" (#153343)
Reverts llvm/llvm-project#152941

Buildbot breakage:
https://lab.llvm.org/buildbot/#/builders/66/builds/17843
2025-08-12 21:32:07 -07:00
Luke Lau
9217b6ab2e
[VPlan] Enforce that there is only ever one header mask. NFC (#152489)
We almost only ever have one header mask, except with the data tail
folding style, i.e. with VPInstruction::ActiveLaneMask.

All we need to do is to make sure to erase the old header icmp based
header mask when replacing it.
2025-08-13 02:39:04 +00:00
Thurston Dang
ba603b5e4d
[msan] Improve packed multiply-add instrumentation (#152941)
The current instrumentation has false positives: if there is a single uninitialized bit in any of the operands, the entire output is poisoned. This does not take into account that multiplying an uninitialized value with zero results in an initialized zero value.

This step allows elements that are zero to clear the corresponding shadow during the multiplication step. The horizontal add step and accumulation step (if any) are modeled using bitwise OR.

Future work can apply this improved handler to the AVX512 equivalent intrinsics (x86_avx512_pmaddw_d_512, x86_avx512_pmaddubs_w_512.) and AVX VNNI intrinsics.
2025-08-12 19:13:48 -07:00
Mircea Trofin
374cbfd327
[licm] clone MD_prof when hoisting conditional branch (#152232)
The profiling - related metadata information for the hoisted conditional branch should be copied from the original branch, not from the current terminator of the block it's hoisted to.

The patch adds a way to disable the fix just so we can do an ablation test, after which the flag will be removed. The same flag will be reused for other similar fixes.

(This was identified through `profcheck` (see Issue #147390), and this PR addresses most of the test failures (when running under profcheck) under `Transforms/LICM`.)
2025-08-13 02:01:00 +02:00
Florian Hahn
8cdab07aaa
Reapply "[VPlan] Remove trivial dead VPPhi cycles."
This reverts commit 1c7c8e3ad39957285524ff116d9a6aec0d9b62f9.

Recommit with a fix for the verifier error caused for EVL recipes.

Extra test coverage added in 6f939da60e.
2025-08-12 22:09:30 +01:00
Leon Clark
e2bbd6d287
[VectorCombine][AMDGPU] Narrow Phi of Shuffles. (#140188)
Attempt to narrow a phi of shufflevector instructions where the two
incoming values have the same operands but different masks.

Related to #128938.

---------

Co-authored-by: Leon Clark <leoclark@amd.com>
2025-08-12 18:45:11 +01:00
Farzon Lotfi
544562ebc2
[DirectX] Remove lifetime intrinsics and run Dead Store Elimination (#152636)
fixes #151764

This fix has two parts first we track all lifetime intrinsics and if
they are users of an alloca of a target extention like dx.RawBuffer then
we eliminate those memory intrinsics when we visit the alloca.

We do step one to allow us to use the Dead Store Elimination Pass. This
removes the alloca and simplifies the use of the target extention back
to using just the global. That keeps things in a form the
DXILBitcodeWriter is expecting.

Obviously to pull this off we needed to bring back the legacy pass
manager plumbing for the DSE pass and hook it up into the DirectX
backend.

The net impact of this change is that DML shader pass rate went from
89.72% (4268 successful compilations) to 90.98% (4328 successful
compilations).
2025-08-12 12:42:08 -04:00
Florian Hahn
424258947e
[VPlan] Materialize VF and VFxUF using VPInstructions. (#152879)
Materialize VF and VFxUF computation using VPInstruction
instead of directly creating IR.

This is one of the last few steps needed to model the full vector
skeleton in VPlan.

This is mostly NFC, although in some cases we remove some unused
computations.

PR: https://github.com/llvm/llvm-project/pull/152879
2025-08-12 14:13:13 +01:00
Leon Clark
9115bef8ee
[VectorCombine] Shrink loads used in shufflevector rebroadcasts. (#153138)
Reopen #128938.

Attempt to shrink the size of vector loads where only some of the
incoming lanes are used for rebroadcasts in shufflevector instructions.

---------

Co-authored-by: Leon Clark <leoclark@amd.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-12 14:08:37 +01:00
Andreas Jonson
1840106ddf
[SCCP] Add support for trunc nuw range. (#152990)
proof: https://alive2.llvm.org/ce/z/_7PVxq
2025-08-12 13:48:55 +02:00
Nikita Popov
ab323eb0c6 [SCCP][PredicateInfo] Do not predicate argument of lifetime intrinsic
Replacing the argument with a no-op bitcast violates a verifier
constraint, even if only temporarily. Any replacement based on it
would result in a violation even after the copy has been removed.

Fixes https://github.com/llvm/llvm-project/issues/153013.
2025-08-12 12:56:08 +02:00
David Sherwood
8140779a9a
[LV] Improve accuracy of branch weights in epilogue iteration check block (#152980)
When one of the vector loops (main or epilogue) is scalable and the
other isn't, we can use the estimated value of vscale to improve the
accuracy.
2025-08-12 10:37:47 +01:00
Sam Tebbs
0bfa1718af
[LV] Create in-loop sub reductions (#147026)
This PR allows the loop vectorizer to handle in-loop sub reductions by
forming a normal in-loop add reduction with a negated input.

Stacked PRs:
1. -> https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/147302
4. https://github.com/llvm/llvm-project/pull/147513
2025-08-12 10:22:41 +01:00
Jie Fu
2fc1b3dd9f [MemorySanitizer] Fix an unused-variable warning (NFC)
/llvm-project/llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:2752:22:
 error: unused variable 'ParamType' [-Werror,-Wunused-variable]
    FixedVectorType *ParamType =
                     ^
1 error generated.
2025-08-12 07:51:53 +08:00
Thurston Dang
ef5022745c
[NFCI][msan] Refactor into 'horizontalReduce' (#152961)
The functionality is used by two helper functions, and will be used even more in the future (e.g.,
https://github.com/llvm/llvm-project/pull/152941).
2025-08-11 15:48:20 -07:00
Florian Hahn
1c7c8e3ad3
Revert "[VPlan] Remove trivial dead VPPhi cycles."
This reverts commit 1f17bb133f4f49942a1e0245291811ca3c99a7d2.

This seems to be breaking some RISCV bots, reverting for now
https://lab.llvm.org/buildbot/#/builders/210/builds/1266
2025-08-11 22:05:30 +01:00
Florian Hahn
1f17bb133f
[VPlan] Remove trivial dead VPPhi cycles.
Update removeDeadRecipes to remove trivial dead VPPhi cycles.

Should effectively be NFC end-to-end.
2025-08-11 21:29:49 +01:00
XChy
df75b4b942
Revert "[DFAJumpThreading] Prevent pass from using too much memory." (#153075)
Reverts llvm/llvm-project#145482
2025-08-12 04:26:47 +08:00
Alexey Bataev
2d7b55a028
[SLP]Initial support for copyable elements
Adds initial support for copyable elements, both schedulable and
non-schedulable.
Adds support only for add for now, other opcodes will added in future.
Still some cases are not handled, e.g. stores do not include this,
because currently do not check for copyable elements.

Reviewers: hiraditya, RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/147366
2025-08-11 09:41:19 -04:00
Alexey Bataev
67af2f6c5c [SLP]Initial FMAD support (#149102)
Added initial check for potential fmad conversion in reductions and
operands vectorization.

Added the check for instruction to fix #152683

Skipped the code for reduction to avoid regressions.
2025-08-11 05:53:55 -07:00
Weibo He
13cd725857
[CoroSplit] Remove lifetime marker checks for subranges of allocas (#152886)
#150248 starts to drop size argument of lifetime markers. Then lifetime
markers cannot refer to subrange of allocas and we can remove this
check.
2025-08-11 13:02:07 +02:00
Andreas Jonson
330a589450
[PredicateInfo] Handle trunc nuw i1 condition. (#152988)
proof: https://alive2.llvm.org/ce/z/mxtn4L
2025-08-11 13:00:54 +02:00
Luke Lau
aea82a780a
[VPlan] Remove some getCanonicalIV() uses. NFC (#152969)
A lot of time getCanonicalIV() is used to get the canonical IV type,
e.g. to instantiate a VPTypeAnalysis or to get the LLVMContext.

However VPTypeAnalysis has a constructor that takes the VPlan directly
and there's a method on VPlan to get the LLVMContext directly, so use
those instead where possible.

This lets us remove a constructor on VPTypeAnalysis.

Also remove an unused LLVMContext argument in UnrollState whilst we're
here.
2025-08-11 18:12:05 +08:00
Luke Lau
acb86fb9e0
[TTI] Consistently pass the pointer type to getAddressComputationCost. NFCI (#152657)
In some places we were passing the type of value being accessed, in
other cases we were passing the type of the pointer for the access.

The most "involved" user is
LoopVectorizationCostModel::getMemInstScalarizationCost, which is the
only call site that passes in the SCEV, and it passes along the pointer
type.

This changes call sites to consistently pass the pointer type, and
renames the arguments to clarify this.

No target actually checks the contents of the type passed, only to see
if it's a vector or not, so this shouldn't have an effect.
2025-08-11 18:00:12 +08:00
Ramkumar Ramachandra
95c525b1db
[VPlan] Preserve nusw on VectorEndPointer (#151558)
In createInterleaveGroups, get the nusw in addition to inbounds from the
existing GEP, and set them on the VPVectorEndPointerRecipe.
2025-08-11 10:38:25 +01:00
David Sherwood
aba0ce10c7
[LV] Add new line to interleaving disabled message (#152722) 2025-08-11 09:53:20 +01:00
David Sherwood
9181a7e294
[LV] Fix branch weights in epilogue min iteration check block (#152534)
I've changed how we construct the EpilogueVectorizerEpilogueLoop and
EpilogueVectorizerMainLoop classes so that we construct the parent class
with an additional boolean parameter indicating whether we're
vectorising the main or epilogue loop. The
InnerLoopAndEpilogueVectorizer class uses this new argument in
combination with the EpilogueLoopVectorizationInfo struct to set the
right UF and VF values. This then allows EpilogueVectorizerEpilogueLoop
to access the correct values of VF and UF for the main loop, which are
required when setting branch weights in the minimum iteration check
block.
2025-08-11 09:52:54 +01:00
Elvis Wang
37fe7a9933
[LV] Generate scalar xor for VPInstruction::Not if possible. (#152628)
`VPInstruction::Not` which will generate xor instruction is widely used
for the exit condition. This patch make `VPInstruction::Not` generate
scalar `xor` if possible.

This can help reducing the (splat true) in the `xor` and make `xor` be
scalar.
2025-08-11 16:35:21 +08:00
Yingwei Zheng
84b31581f8
Revert "[PatternMatch] Add m_[Shift]OrSelf matchers." (#152953)
Reverts llvm/llvm-project#152924
According to
f67668b586,
it is not an NFC change.
2025-08-11 09:35:16 +02:00
hanbeom
a750fcb52b
[GVN] Check IndirectBr in Predecessor Terminators (#151188)
Critical edges with an IndirectBr terminator cannot be split. 
Add a check it to prevent assertion failures.

Fixes: #150229
2025-08-11 09:25:52 +02:00
Nikita Popov
35bad229c1
[PredicateInfo] Use bitcast instead of ssa.copy (#151174)
PredicateInfo needs some no-op to which the predicate can be attached.
Currently this is an ssa.copy intrinsic. This PR replaces it with a
no-op bitcast.
    
Using a bitcast is more efficient because we don't have the overhead of
an overloaded intrinsic. It also makes things slightly simpler overall.
2025-08-11 09:25:01 +02:00
David Green
6ca6d45b29
[VectorCombine] Use hasOneUser in shuffle-to-identity fold (#152675)
We need to check that the node is part of the graph being converted, so
will not contain external uses when transformed.
2025-08-11 07:45:15 +01:00
Mel Chen
6db3776f9b
[LV][EVL] Simplify EVL recipe transformation by using a single EVL mask. nfc (#152479)
The EVL mask is always defined as `icmp ult (step-vector, EVL)`, so we
only need to generate it once per plan in the header. Then, we replace
all uses of the header mask with the EVL mask, and recursively optimize
the users of EVL mask into EVL recipes. This way, the transformation to
EVL recipes can be done with just a single loop.
2025-08-11 11:09:01 +08:00