40664 Commits

Author SHA1 Message Date
Thurston Dang
ef5022745c
[NFCI][msan] Refactor into 'horizontalReduce' (#152961)
The functionality is used by two helper functions, and will be used even more in the future (e.g.,
https://github.com/llvm/llvm-project/pull/152941).
2025-08-11 15:48:20 -07:00
Florian Hahn
1c7c8e3ad3
Revert "[VPlan] Remove trivial dead VPPhi cycles."
This reverts commit 1f17bb133f4f49942a1e0245291811ca3c99a7d2.

This seems to be breaking some RISC-V bots; reverting for now.
https://lab.llvm.org/buildbot/#/builders/210/builds/1266
2025-08-11 22:05:30 +01:00
Florian Hahn
1f17bb133f
[VPlan] Remove trivial dead VPPhi cycles.
Update removeDeadRecipes to remove trivial dead VPPhi cycles.

Should effectively be NFC end-to-end.
2025-08-11 21:29:49 +01:00
XChy
df75b4b942
Revert "[DFAJumpThreading] Prevent pass from using too much memory." (#153075)
Reverts llvm/llvm-project#145482
2025-08-12 04:26:47 +08:00
Alexey Bataev
2d7b55a028
[SLP]Initial support for copyable elements
Adds initial support for copyable elements, both schedulable and
non-schedulable.
Adds support only for add for now; other opcodes will be added in the future.
Some cases are still not handled; e.g., stores do not include this,
because they currently do not check for copyable elements.

Reviewers: hiraditya, RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/147366
2025-08-11 09:41:19 -04:00
Alexey Bataev
67af2f6c5c
[SLP]Initial FMAD support (#149102)
Added initial check for potential fmad conversion in reductions and
operands vectorization.

Added the check for instruction to fix #152683

Skipped the code for reduction to avoid regressions.
2025-08-11 05:53:55 -07:00
Weibo He
13cd725857
[CoroSplit] Remove lifetime marker checks for subranges of allocas (#152886)
#150248 drops the size argument of lifetime markers, so lifetime
markers can no longer refer to a subrange of an alloca, and this check
can be removed.
2025-08-11 13:02:07 +02:00
Andreas Jonson
330a589450
[PredicateInfo] Handle trunc nuw i1 condition. (#152988)
proof: https://alive2.llvm.org/ce/z/mxtn4L
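
The linked proof covers the general case. As a rough illustration (not the PredicateInfo code), the fact being used is that `trunc nuw` to `i1` is only non-poison when the source is 0 or 1, so a true condition implies the source equals 1. A minimal exhaustive check over `i8`, modeling poison as `std::nullopt` (an assumption of this sketch; the helper names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Models `trunc nuw i8 %x to i1`: the result is poison (std::nullopt) when
// dropping the high bits would change the unsigned value, i.e. when x > 1.
std::optional<bool> truncNuwToI1(uint8_t x) {
  if (x > 1)
    return std::nullopt;  // nuw violated -> poison
  return x == 1;
}

// For every i8 value: a non-poison `true` implies x == 1 and a non-poison
// `false` implies x == 0.
bool truncNuwImpliesEquality() {
  for (unsigned v = 0; v <= 0xFF; ++v) {
    std::optional<bool> t = truncNuwToI1(static_cast<uint8_t>(v));
    if (!t)
      continue;  // poison: no constraint to check
    if (*t && v != 1)
      return false;
    if (!*t && v != 0)
      return false;
  }
  return true;
}
```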
2025-08-11 13:00:54 +02:00
Luke Lau
aea82a780a
[VPlan] Remove some getCanonicalIV() uses. NFC (#152969)
A lot of the time, getCanonicalIV() is used to get the canonical IV type,
e.g. to instantiate a VPTypeAnalysis or to get the LLVMContext.

However, VPTypeAnalysis has a constructor that takes the VPlan directly,
and there's a method on VPlan to get the LLVMContext directly, so use
those instead where possible.

This lets us remove a constructor on VPTypeAnalysis.

Also remove an unused LLVMContext argument in UnrollState whilst we're
here.
2025-08-11 18:12:05 +08:00
Luke Lau
acb86fb9e0
[TTI] Consistently pass the pointer type to getAddressComputationCost. NFCI (#152657)
In some places we were passing the type of value being accessed, in
other cases we were passing the type of the pointer for the access.

The most "involved" user is
LoopVectorizationCostModel::getMemInstScalarizationCost, which is the
only call site that passes in the SCEV, and it passes along the pointer
type.

This changes call sites to consistently pass the pointer type, and
renames the arguments to clarify this.

No target actually inspects the type passed beyond checking whether it
is a vector, so this shouldn't have an effect.
2025-08-11 18:00:12 +08:00
Ramkumar Ramachandra
95c525b1db
[VPlan] Preserve nusw on VectorEndPointer (#151558)
In createInterleaveGroups, get the nusw in addition to inbounds from the
existing GEP, and set them on the VPVectorEndPointerRecipe.
2025-08-11 10:38:25 +01:00
David Sherwood
aba0ce10c7
[LV] Add new line to interleaving disabled message (#152722)
2025-08-11 09:53:20 +01:00
David Sherwood
9181a7e294
[LV] Fix branch weights in epilogue min iteration check block (#152534)
I've changed how we construct the EpilogueVectorizerEpilogueLoop and
EpilogueVectorizerMainLoop classes so that we construct the parent class
with an additional boolean parameter indicating whether we're
vectorising the main or epilogue loop. The
InnerLoopAndEpilogueVectorizer class uses this new argument in
combination with the EpilogueLoopVectorizationInfo struct to set the
right UF and VF values. This then allows EpilogueVectorizerEpilogueLoop
to access the correct values of VF and UF for the main loop, which are
required when setting branch weights in the minimum iteration check
block.
2025-08-11 09:52:54 +01:00
Elvis Wang
37fe7a9933
[LV] Generate scalar xor for VPInstruction::Not if possible. (#152628)
`VPInstruction::Not`, which generates an `xor` instruction, is widely
used for the exit condition. This patch makes `VPInstruction::Not`
generate a scalar `xor` if possible.

This can help eliminate the (splat true) operand of the `xor` and keep
the `xor` scalar.
2025-08-11 16:35:21 +08:00
Yingwei Zheng
84b31581f8
Revert "[PatternMatch] Add m_[Shift]OrSelf matchers." (#152953)
Reverts llvm/llvm-project#152924
According to f67668b586, it is not an NFC change.
2025-08-11 09:35:16 +02:00
hanbeom
a750fcb52b
[GVN] Check IndirectBr in Predecessor Terminators (#151188)
Critical edges with an IndirectBr terminator cannot be split.
Add a check for it to prevent assertion failures.

Fixes: #150229
2025-08-11 09:25:52 +02:00
Nikita Popov
35bad229c1
[PredicateInfo] Use bitcast instead of ssa.copy (#151174)
PredicateInfo needs some no-op to which the predicate can be attached.
Currently this is an ssa.copy intrinsic. This PR replaces it with a
no-op bitcast.
    
Using a bitcast is more efficient because we don't have the overhead of
an overloaded intrinsic. It also makes things slightly simpler overall.
2025-08-11 09:25:01 +02:00
David Green
6ca6d45b29
[VectorCombine] Use hasOneUser in shuffle-to-identity fold (#152675)
We need to check that the node is part of the graph being converted, so
that it will not have external uses when transformed.
2025-08-11 07:45:15 +01:00
Mel Chen
6db3776f9b
[LV][EVL] Simplify EVL recipe transformation by using a single EVL mask. nfc (#152479)
The EVL mask is always defined as `icmp ult (step-vector, EVL)`, so we
only need to generate it once per plan in the header. Then, we replace
all uses of the header mask with the EVL mask, and recursively optimize
the users of EVL mask into EVL recipes. This way, the transformation to
EVL recipes can be done with just a single loop.
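
Per lane, the mask `icmp ult (step-vector, EVL)` is just a comparison of the lane index with EVL. A scalar C++ model of it, for illustration only (`evlMask` is a hypothetical helper, not a VPlan API), assuming EVL never exceeds VF:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of the EVL mask `icmp ult (step-vector, EVL)`:
// lane i is active iff its index is below the explicit vector length.
std::vector<bool> evlMask(std::size_t vf, std::size_t evl) {
  assert(evl <= vf && "EVL is assumed not to exceed VF");
  std::vector<bool> mask(vf);
  for (std::size_t i = 0; i < vf; ++i)
    mask[i] = i < evl;  // lane i of step-vector compared against EVL
  return mask;
}
```

Because the mask depends only on EVL, computing it once in the header and reusing it for every masked recipe is sufficient.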
2025-08-11 11:09:01 +08:00
Yingwei Zheng
1c499351d6
[PatternMatch] Add m_[Shift]OrSelf matchers. (#152924)
Address the comment
https://github.com/llvm/llvm-project/pull/147414/files#r2228612726.
As they are usually used to match integer packing patterns, it is enough
to handle constant shamts.
2025-08-11 09:58:16 +08:00
Florian Hahn
86813aa786
[VPlan] Add dedicated user for resume phi with epilogue vectorization.
Epilogue vectorization currently relies on the resume phi for the
canonical induction being always available, which is why VPPhi are
considered to have side-effects, to prevent their removal.

This patch adds a new ResumeForEpilogue opcode to mark the resume phi as
used for epilogue vectorization. This allows treating VPPhis in general
as not having side-effects, enabling removal of unused VPPhis.
2025-08-10 21:21:16 +01:00
David Green
cfe190979e
Revert "[SLP]Initial FMAD support (#149102)"
This reverts commit 0fffb9f9ed81f4c2084b8fe040c88b60bb6c372a due to major
performance regressions.
2025-08-10 15:16:01 +01:00
weiguozhi
5e87792200
[LoopInfo] Pointer to stack object may not be loop invariant in a coroutine function (#149936)
A coroutine function may be split into a ramp function and a resume
function, which have different stack frames. A pointer to a stack object
may therefore have different addresses depending on where it is used, so
it is not loop invariant.

It temporarily fixes https://github.com/llvm/llvm-project/issues/149604.
2025-08-09 14:20:19 -07:00
Florian Hahn
06fd0f9d65
[VPlan] Move initial skeleton construction earlier (NFC). (#150848)
Split up the not clearly named prepareForVectorization transform into
buildVPlan0, which adds the vector preheader, middle and scalar
preheader blocks, as well as the canonical induction recipes and sets
the trip count. The new transform is run directly after building the
plain CFG VPlan initially.

The remaining code handling early exits and adding the branch in the
middle block is renamed to handleEarlyExitsAndAddMiddleCheck and still
runs at the original position.

With the code movement, we only have to add the skeleton once to the
initial VPlan, and cloning will take care of the rest. It will also
enable moving other construction steps to work directly on VPlan0, like
adding resume phis.

PR: https://github.com/llvm/llvm-project/pull/150848
2025-08-09 20:54:42 +01:00
Alexey Bataev
0fffb9f9ed
[SLP]Initial FMAD support (#149102)
Added initial check for potential fmad conversion in reductions and
operands vectorization.

Added the check for instruction to fix #152683
2025-08-08 10:30:23 -07:00
Alexander Richardson
3a4b351ba1
[IR] Introduce the ptrtoaddr instruction
This introduces a new `ptrtoaddr` instruction which is similar to
`ptrtoint` but has two differences:

1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance
2) `ptrtoaddr` only extracts (and then extends/truncates) the low
   index-width bits of the pointer

For most architectures, difference 2) does not matter since index (address)
width and pointer representation width are the same, but this does make a
difference for architectures that have pointers that aren't just plain
integer addresses such as AMDGPU fat pointers or CHERI capabilities.
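
To make difference 2) concrete, here is a plain C++ model (not LLVM code) of a hypothetical 128-bit pointer whose low 64 bits hold the address, loosely in the spirit of a CHERI capability. The `FatPointer` layout and the `ptrToAddr` helper are assumptions for illustration; provenance (difference 1) is not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit pointer representation: the low 64 bits hold the
// address, the high 64 bits hold non-address metadata.
struct FatPointer {
  uint64_t low;   // address bits
  uint64_t high;  // metadata (e.g. bounds, permissions)
};

// Keeps only the low `width` bits of `v` (width in [1, 64]).
uint64_t lowBits(uint64_t v, unsigned width) {
  return width == 64 ? v : (v & ((uint64_t{1} << width) - 1));
}

// Sketch of ptrtoaddr: take the low index-width bits of the representation,
// then zero-extend or truncate to the result width. The metadata half never
// contributes, unlike a ptrtoint of the full representation.
uint64_t ptrToAddr(const FatPointer &p, unsigned indexWidth,
                   unsigned resultWidth) {
  assert(indexWidth >= 1 && indexWidth <= 64);
  assert(resultWidth >= 1 && resultWidth <= 64);
  uint64_t addr = lowBits(p.low, indexWidth);  // bits above are already zero
  return lowBits(addr, resultWidth);           // no-op for zext, masks for trunc
}
```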

This commit introduces textual and bitcode IR support as well as basic code
generation, but optimization passes do not handle the new instruction yet
so it may result in worse code than using ptrtoint. Follow-up changes will
update capture tracking, etc. for the new instruction.

RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54

Reviewed By: nikic

Pull Request: https://github.com/llvm/llvm-project/pull/139357
2025-08-08 10:12:39 -07:00
Drew Kersnar
90e8c8e718
[InferAlignment] Propagate alignment between loads/stores of the same base pointer (#145733)
We can derive and upgrade alignment for loads/stores using other
well-aligned loads/stores. This optimization does a single forward pass through
each basic block and uses loads/stores (the alignment and the offset) to
derive the best possible alignment for a base pointer, caching the
result. If it encounters another load/store based on that pointer, it
tries to upgrade the alignment. The optimization must be a forward pass within a basic
block because control flow and exception throwing can impact alignment guarantees.
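
The offset arithmetic behind this can be sketched in plain C++. This is a simplified model, not the InferAlignment code: `Access`, `propagate`, `alignAtOffset`, and integer IDs for base pointers are hypothetical. Since alignments are powers of two, an access at `base + offset` with alignment A implies the base is aligned to min(A, largest power of two dividing offset), and the same formula gives the alignment a later access inherits from the base.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Largest power of two dividing `offset`; an offset of 0 imposes no limit,
// so a large power of two is returned instead.
uint64_t offsetAlign(int64_t offset) {
  if (offset == 0)
    return uint64_t{1} << 62;
  uint64_t u = static_cast<uint64_t>(offset);
  return u & (~u + 1);  // isolate the lowest set bit
}

// Works in both directions: base alignment from an aligned access, and
// access alignment from an aligned base.
uint64_t alignAtOffset(uint64_t align, int64_t offset) {
  return std::min(align, offsetAlign(offset));
}

// Hypothetical access record: base pointer ID, constant offset, and the
// currently known alignment.
struct Access {
  int base;
  int64_t offset;
  uint64_t align;
};

// One forward pass over the accesses of a single basic block, caching the
// best alignment derived so far for each base pointer.
void propagate(std::vector<Access> &block) {
  std::map<int, uint64_t> best;
  for (Access &a : block) {
    auto it = best.find(a.base);
    if (it != best.end())
      a.align = std::max(a.align, alignAtOffset(it->second, a.offset));
    uint64_t &cached = best[a.base];  // 0 if this base was not seen yet
    cached = std::max(cached, alignAtOffset(a.align, a.offset));
  }
}
```

For example, a 16-aligned access at `base + 16` lets a later access at `base + 4` be upgraded from align 1 to align 4.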

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2025-08-08 12:05:29 -05:00
Alexey Bataev
0419b459be
Revert "[SLP]Initial FMAD support (#149102)"
This reverts commit 0bcf45ea3458ba79eb4257afcfd6af954292c9ce to fix the
regressions reported in https://github.com/llvm/llvm-project/issues/152683
2025-08-08 09:17:59 -07:00
Szymon Piotr Milczek
fd41700962
[InstCombine] visitShuffleVectorInst assert with vector of pointers fix. (#152341)
In visitShuffleVectorInst there's an if block that's meant to turn
shufflevector followed by bitcast into extractelement where possible.

It assumes that bitcasts will never be performed on vectors of ptr, as
such operations are almost always illegal and ptrtoint instructions
should be used instead.

There is, however, an edge case where a bitcast instruction can be
performed on a vector of type `<1 x ptr>` to turn it into type `ptr`.

In this edge case, the code initializes the variable `VecBitWidth` to 0.
Then, when iterating over users that are bitcasts, an attempt is made to
create a vector of size 0, which triggers an assert.

This commit changes the initialization of `VecBitWidth` to use the
datalayout to find the size of the vector instead of the
getPrimitiveSizeInBits method, which returns 0 for ptr and vectors of ptr.
2025-08-08 15:23:02 +02:00
Mel Chen
ab7281d896
[VPlan] Update naming in VPInterleaveRecipe constructor. nfc (#152472)
2025-08-08 20:17:10 +08:00
Florian Hahn
82d633e9ff
[VPlan] Materialize vector trip count using VPInstructions. (#151925)
Materialize the vector trip count computation using VPInstruction
instead of directly creating IR. This is one of the last few steps
needed to model the full vector skeleton in VPlan. It also simplifies
vector-trip count computations for scalable vectors, as we can re-use
the UF x VF computation.
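
As a rough reference for what is being materialized, the vector trip count is commonly computed as `TC - (TC urem (VF * UF))`, i.e. the largest multiple of the step that does not exceed the trip count. A scalar sketch under that assumption, ignoring tail folding and the adjustment made when a scalar epilogue must always run (`vectorTripCount` is a hypothetical name):

```cpp
#include <cassert>
#include <cstdint>

// Largest multiple of VF * UF not exceeding the trip count TC.
// For scalable vectors, `vf` stands for vscale * (known minimum VF).
uint64_t vectorTripCount(uint64_t tc, uint64_t vf, uint64_t uf) {
  uint64_t step = vf * uf;  // the shared UF x VF value
  assert(step > 0 && "step must be non-zero");
  uint64_t rem = tc % step;  // iterations left for the scalar loop
  return tc - rem;
}
```

Reusing the `step` value is what makes the scalable case cheaper: VF x UF only has to be computed once.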

PR: https://github.com/llvm/llvm-project/pull/151925
2025-08-08 11:44:32 +01:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Antonio Frighetto
e977b28c37
[InstCombine] Match intrinsic recurrences when known to be hoisted
For value-accumulating recurrences of the form:
```
  %umax.acc = phi i8 [ %umax, %backedge ], [ %a, %entry ]
  %umax = call i8 @llvm.umax.i8(i8 %umax.acc, i8 %b)
```
The binary intrinsic may be simplified into an intrinsic with the initial
value and the other operand, if the latter is loop-invariant:
```
  %umax = call i8 @llvm.umax.i8(i8 %a, i8 %b)
```
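
The fold relies on umax being idempotent: once `%b` has been folded in, folding it in again changes nothing, so the value computed in every iteration equals `umax(%a, %b)`. An exhaustive C++ check over `i8` (a sketch, not the InstCombine code; loop invariance of `%b` is modeled by passing a fixed value, and the helper names are hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Original recurrence: %umax.acc starts at %a and every iteration computes
// %umax = umax(%umax.acc, %b); returns %umax after `iterations` iterations.
uint8_t recurrence(uint8_t a, uint8_t b, unsigned iterations) {
  assert(iterations >= 1 && "%umax is only defined once the body has run");
  uint8_t acc = a;
  for (unsigned i = 0; i < iterations; ++i)
    acc = std::max(acc, b);  // unsigned compare on uint8_t, i.e. umax
  return acc;
}

// Simplified form produced by the fold: umax(%a, %b).
uint8_t simplified(uint8_t a, uint8_t b) { return std::max(a, b); }

// Checks that both forms agree for all i8 inputs and a few trip counts.
bool umaxFoldHolds() {
  for (unsigned a = 0; a <= 0xFF; ++a)
    for (unsigned b = 0; b <= 0xFF; ++b)
      for (unsigned n = 1; n <= 3; ++n)
        if (recurrence(a, b, n) != simplified(a, b))
          return false;
  return true;
}
```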

Proofs: https://alive2.llvm.org/ce/z/ea2cVC.

Fixes: https://github.com/llvm/llvm-project/issues/145875.
2025-08-08 09:31:50 +02:00
Bushev Dmitry
b5902924b2
[DFAJumpThreading] Prevent pass from using too much memory. (#145482)
The 'dfa-max-num-paths' limit, which controls the number of enumerated
paths, was not checked inside getPathsFromStateDefMap. This could lead
to large memory consumption for sufficiently complex switch statements.
2025-08-07 20:15:42 +03:00
Alexey Bataev
0bcf45ea34
[SLP]Initial FMAD support (#149102)
Added initial check for potential fmad conversion in reductions and
operands vectorization.
2025-08-07 09:51:43 -04:00
Nikita Popov
dbfc3ed690
[TypeSanitizer] Use alloca size for lifetime markers (#152154)
Split out from https://github.com/llvm/llvm-project/pull/150248:

Use the size of the alloca instead of the size passed to the lifetime
intrinsic.

As a bonus, this handles dynamic allocas correctly (see the added test)
instead of doing a memset with size -1...
2025-08-07 14:39:32 +02:00
Florian Hahn
95c32bf2d4
[VPlan] Return invalid cost if any skeleton block has invalid costs. (#151940)
We need to reject plans that contain recipes with invalid costs. LICM
can move recipes with invalid costs out of the loop region, which then
get missed by the main cost computation.

Extend the logic that checks recipes for invalid costs, which currently
covers only the middle block, to include all skeleton blocks.

Fixes https://github.com/llvm/llvm-project/issues/144358 
Fixes https://github.com/llvm/llvm-project/issues/151664

PR: https://github.com/llvm/llvm-project/pull/151940
2025-08-07 10:45:27 +01:00
Matt Arsenault
1110e2ff9f
InlineFunction: Split inlining into predicate and apply functions (#134213)
This is to support a new inline function reduction in llvm-reduce,
which should pre-filter callsites that are not eligible for inlining.

This code was mostly structured as a match and apply, with a few
exceptions. The ugliest piece is for propagating and verifying
compatible getGC and personalities. Also, the EHPad and the convergence
token to use are now collected and cached in InlineFunctionInfo.

I was initially confused by the split between the checks performed here
and isInlineViable, so better document how this system is supposed to
work. It turns out this split does make sense, in that isInlineViable
checks if it's possible based on the callee content, while whether the
inline ultimately happens depends on the callsite context. I think more
renames of these functions would help, and isInlineViable should
probably move out of InlineCost to be with these transforms.
2025-08-07 16:13:36 +09:00
Florian Hahn
a485e0eae0
[VPlan] Retrieve vector TC for epilogue from resume phi (NFC).
Instead of relying on getOrCreateVectorTripCount to initialize
EPI.VectorTripCount, delay initialization until after we have retrieved
the resume phi, and get the trip count from there. This makes the code
independent of legacy vector trip count creation.
2025-08-07 07:52:35 +01:00
Luke Lau
df8da2ff83
[VPlan] Support VPWidenPointerInductionRecipes with EVL tail folding (#152110)
Now that VPWidenPointerInductionRecipes are modelled in VPlan in
#148274, we can support them in EVL tail folding.

We need to replace their VFxUF operand with EVL as the increment is not
guaranteed to always be VF on the penultimate iteration, and UF is
always 1 with EVL tail folding.

We also need to move the creation of the backedge value to the latch so
that EVL dominates it.

With this we will no longer fail to convert a VPlan to EVL tail folding,
so adjust tryAddExplicitVectorLength to account for this. This brings us
to 99.4% of all vector loops vectorized on SPEC CPU 2017 with tail
folding vs no tail folding.

The test in only-compute-cost-for-vplan-vfs.ll previously relied on
widened pointer inductions with EVL tail folding to end up in a scenario
with no vector VPlans, so this also replaces it with an unvectorizable
fixed-order recurrence test from
first-order-recurrence-multiply-recurrences.ll that also gets discarded.
2025-08-07 10:54:24 +08:00
Ramkumar Ramachandra
092388171f
[VPlan] Introduce m_[Specific]ICmp matcher (#151540)
2025-08-06 20:35:35 +01:00
Florian Hahn
25d1285eec
[VPlan] Replace single-entry VPPhis with their incoming values.
Replace trivial, single-entry VPPhis with their incoming values.
2025-08-06 20:03:31 +01:00
Alexey Bataev
4784ce9ebc
[SLP][NFC]Check an external user before trying to address it in debug dump, NFC
2025-08-06 08:58:16 -07:00
Yussur Mustafa Oraji
ded1f3ec96
[TSan] Add option to ignore capturing behavior when instrumenting (#148156)
While not needed for most applications, some tools such as
[MUST](https://www.i12.rwth-aachen.de/cms/i12/forschung/forschungsschwerpunkte/lehrstuhl-fuer-hochleistungsrechnen/~nrbe/must/)
depend on the instrumentation being present.
MUST uses the ThreadSanitizer annotation interface to detect data races
in MPI programs, where the capture tracking is detrimental as it has no
bearing on MPI data races, leading to missed races.
2025-08-06 15:47:33 +02:00
Florian Hahn
e80e7e717e
[VPlan] Use scalar VPPhi instead of VPWidenPHIRecipe in createPlainCFG. (#150847)
The initial VPlan closely reflects the original scalar loop, so using
VPWidenPHIRecipe here is premature. Widened phi recipes should only be
introduced together with other widened recipes.

PR: https://github.com/llvm/llvm-project/pull/150847
2025-08-06 14:43:03 +01:00
Florian Hahn
777c320e6c
[VPlan] Address comments missed in #142309.
Address additional comments from
https://github.com/llvm/llvm-project/pull/142309.
2025-08-06 11:52:08 +01:00
Andrew Rogers
a3c386d241
[llvm] annotate recently added interfaces for DLL export (#152179)
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates symbols that were recently
added to LLVM and fixes incorrectly annotated symbols.

## Background
This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).

## Overview

The bulk of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed by formatting with `git clang-format`.

The following manual adjustments were also applied after running IDS:
- Add `LLVM_EXPORT_TEMPLATE` and `LLVM_TEMPLATE_ABI` annotations to
explicitly instantiated instances of `llvm::object::SFrameParser`.

## Validation

On Windows 11:
```
cmake -B build -S llvm -G Ninja -DLLVM_ENABLE_PROJECTS="llvm;clang;clang-tools-extra;lldb;lld" -DLLVM_OPTIMIZED_TABLEGEN=ON -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_BUILD_LLVM_DYLIB_VIS=ON -DLLVM_LINK_LLVM_DYLIB=ON -DLLVM_BUILD_TESTS=ON -DCLANG_LINK_CLANG_DYLIB=OFF -DCMAKE_BUILD_TYPE=Release
ninja -C build
```
2025-08-05 23:12:07 -07:00
Mircea Trofin
f675483905
[profcheck] Annotate select instructions (#152171)
For `select`, we don't have the equivalent of the branch probability analysis to offer defaults, so we make up our own and allow their overriding with flags.

Issue #147390
2025-08-06 02:48:50 +02:00
Florian Hahn
d478502a42
[VPlan] Ensure that IV resume phi for epilogue is always first. (NFCI)
Update handling of canonical IV resume phi for the epilogue loop to make
sure the resume phi for the canonical IV is always the first phi in the
scalar preheader.

This makes it easier to retrieve it in preparePlanForEpilogueVectorLoop.

For now, we keep an assert to make sure we use the same resume phi as
before. This will be removed in the future.
2025-08-05 21:06:41 +01:00
Florian Hahn
47258ca470
[VPlan] Use VPPhi instead of dyn_cast + opcode check in isPhi (NFC).
2025-08-05 19:20:12 +01:00