7991 Commits

Author SHA1 Message Date
Ryotaro Kasuga
2330fd2f73
[LoopPeel] Add new option to peeling loops to convert PHI into IV (#121104)
LoopPeel currently considers PHI nodes that become loop invariants
through peeling. However, in some cases, peeling transforms PHI nodes
into induction variables (IVs), potentially enabling further
optimizations such as loop vectorization. For example:

```c
// TSVC s292
int im = N-1;
for (int i=0; i<N; i++) {
  a[i] = b[i] + b[im];
  im = i;
}
```

In this case, peeling one iteration converts `im` into an IV, allowing
it to be handled by the loop vectorizer.
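
As a hand-written illustration of that claim (not the pass's actual output), the s292 loop can be peeled once by hand. The function names below are hypothetical and `std::vector<float>` stands in for the TSVC arrays. After peeling, `im` is always `i - 1`, so it behaves like an ordinary induction variable:

```cpp
#include <cassert>
#include <vector>

// Original shape: `im` is N-1 on the first iteration and i-1 afterwards,
// so it is a loop-carried PHI rather than a simple function of `i`.
std::vector<float> s292Original(const std::vector<float> &b) {
  const int n = static_cast<int>(b.size());
  std::vector<float> a(b.size());
  int im = n - 1;
  for (int i = 0; i < n; ++i) {
    a[i] = b[i] + b[im];
    im = i;
  }
  return a;
}

// Hand-peeled shape: the first iteration runs ahead of the loop, after
// which `im` equals `i - 1` on every remaining iteration.
std::vector<float> s292Peeled(const std::vector<float> &b) {
  const int n = static_cast<int>(b.size());
  std::vector<float> a(b.size());
  if (n == 0)
    return a;
  a[0] = b[0] + b[n - 1]; // peeled iteration, im == n - 1
  for (int i = 1; i < n; ++i) {
    const int im = i - 1; // now an induction variable
    a[i] = b[i] + b[im];
  }
  return a;
}

// Small usage check: both shapes must produce the same array.
void checkPeelingPreservesResult(const std::vector<float> &b) {
  assert(s292Original(b) == s292Peeled(b));
}
```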

This patch adds a new feature that peels loops when doing so converts
PHIs into IVs. At the moment this feature is disabled by default.

Enabling it allows the above example to be vectorized. I have measured on
neoverse-v2 and observed a speedup of more than 60% (options: `-O3
-ffast-math -mcpu=neoverse-v2 -mllvm -enable-peeling-for-iv`).

This PR is taken over from #94900
Related #81851
2025-08-20 13:44:56 +00:00
Orlando Cazalet-Hyams
6c9352530a
[RemoveDIs][NFC] Clean up BasicBlockUtils now intrinsics are gone (#154326)
A couple of minor readability changes now that we're not supporting both
intrinsics and records.
2025-08-20 10:03:44 +01:00
Stephen Tozer
5cedb01487
[Debugify] Fix compile error in tracking coverage build
Forward-fixes a compile error in bc216b057d (#150212) in specific build
configurations, due to a missing const_cast.
2025-08-19 11:18:42 +01:00
David Green
a7df02f83c
[InstCombine] Make strlen optimization more resilient to different gep types. (#153623)
This makes the optimization in optimizeStringLength for strlen(gep
@glob, %x) -> sub endof@glob, %x a little more resilient, and maybe a
bit more correct for geps with non-array types.
2025-08-19 10:37:17 +01:00
Andreas Jonson
1b60236200
[SimplifyCFG] Avoid redundant calls in gather. (NFC) (#154133)
Split out from https://github.com/llvm/llvm-project/pull/154007 as it
showed compile time improvements

NFC as there need to be at least two icmps that are part of the chain.
2025-08-18 18:45:52 +02:00
Kazu Hirata
07eb7b7692
[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068)
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>.  Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:

  template <typename PointeeType, unsigned N>
  class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N> {};

We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
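
The "redirection" described above is a partial specialization that inherits from the pointer set. A minimal standalone sketch of that pattern follows; `ToySet` and `ToyPtrSet` are hypothetical stand-ins built on `std::set`, not the LLVM classes, and they make no attempt to model small-size optimization:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <type_traits>

// Hypothetical pointer-only set (stand-in for a pointer-specialized container).
template <typename PtrT, unsigned N> class ToyPtrSet {
  std::set<PtrT> Items;

public:
  bool insert(PtrT P) { return Items.insert(P).second; }
  bool contains(PtrT P) const { return Items.count(P) != 0; }
  std::size_t size() const { return Items.size(); }
};

// Hypothetical general-purpose set.
template <typename T, unsigned N> class ToySet {
  std::set<T> Items;

public:
  bool insert(const T &V) { return Items.insert(V).second; }
  bool contains(const T &V) const { return Items.count(V) != 0; }
  std::size_t size() const { return Items.size(); }
};

// The redirection: for pointer element types, ToySet is just ToyPtrSet.
template <typename PointeeT, unsigned N>
class ToySet<PointeeT *, N> : public ToyPtrSet<PointeeT *, N> {};

static_assert(std::is_base_of<ToyPtrSet<int *, 4>, ToySet<int *, 4>>::value,
              "pointer element types are redirected to the pointer set");

// Usage: both spellings behave the same for pointers, which is why naming
// the pointer set directly loses nothing.
void toySetDemo() {
  int X = 0;
  ToySet<int *, 4> Redirected;
  ToyPtrSet<int *, 4> Direct;
  assert(Redirected.insert(&X) && Direct.insert(&X));
  assert(Redirected.size() == Direct.size());
}
```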
2025-08-18 07:01:29 -07:00
Arne Stenkrona
ea2f5395b1
[SimplifyCFG] Avoid threading for loop headers (#151142)
Updates SimplifyCFG to avoid jump threading through loop headers if
-keep-loops is requested. Canonical loop form requires a loop header
that dominates all blocks in the loop. If we thread through a header, we
risk breaking its domination of the loop. This change avoids this issue
by conservatively avoiding threading through headers entirely.

Fixes: https://github.com/llvm/llvm-project/issues/151144
2025-08-18 09:46:55 +00:00
Kazu Hirata
cbf5af9668
[llvm] Remove unused includes (NFC) (#154051)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-08-17 23:46:35 -07:00
Andreas Jonson
5ae8a9b8ce
[SimplifyCfg] Handle trunc nuw i1 condition in Equality comparison. (#153051)
proof: https://alive2.llvm.org/ce/z/WVt4-F
2025-08-17 09:53:40 +02:00
Matt Arsenault
3e5d8a1439
Reapply "RuntimeLibcalls: Generate table of libcall name lengths (#153… (#153864)
This reverts commit 334e9bf2dd01fbbfe785624c0de477b725cde6f2.

Check if llvm-nm exists before building the benchmark.
2025-08-16 09:53:50 +09:00
gulfemsavrun
334e9bf2dd
Revert "RuntimeLibcalls: Generate table of libcall name lengths (#153… (#153864)
…210)"

This reverts commit 9a14b1d254a43dc0d4445c3ffa3d393bca007ba3.

Revert "RuntimeLibcalls: Return StringRef for libcall names (#153209)"

This reverts commit cb1228fbd535b8f9fe78505a15292b0ba23b17de.

Revert "TableGen: Emit statically generated hash table for runtime
libcalls (#150192)"

This reverts commit 769a9058c8d04fc920994f6a5bbb03c8a4fbcd05.

Reverted three changes because of a CMake error while building llvm-nm
as reported in the following PR:
https://github.com/llvm/llvm-project/pull/150192#issuecomment-3192223073
2025-08-15 13:32:27 -07:00
zGoldthorpe
a8d25683ee
[PatternMatch] Allow m_ConstantInt to match integer splats (#153692)
When matching integers, `m_ConstantInt` is a convenient alternative to
`m_APInt` for matching unsigned 64-bit integers, allowing one to
simplify

```cpp
const APInt *IntC;
if (match(V, m_APInt(IntC))) {
  if (IntC->ule(UINT64_MAX)) {
    uint64_t Int = IntC->getZExtValue();
    // ...
  }
}
```
to
```cpp
uint64_t Int;
if (match(V, m_ConstantInt(Int))) {
  // ...
}
```

However, this simplification is only valid if `V` has a scalar type.
Specifically, `m_APInt` also matches integer splats, but `m_ConstantInt`
does not.

This patch ensures that the matching behaviour of `m_ConstantInt`
parallels that of `m_APInt`, and also incorporates it in some obvious
places.
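
The scalar-versus-splat distinction can be modelled without any LLVM headers. The sketch below is a toy under stated assumptions: `ToyIntConstant`, `matchScalarOnly`, and `matchScalarOrSplat` are hypothetical names, and a single lane is taken to mean "scalar". It only illustrates why a scalar-only matcher and a splat-aware matcher disagree on vector inputs:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for an integer constant (not an LLVM type).
// One lane models a scalar; several lanes model a vector constant.
struct ToyIntConstant {
  std::vector<uint64_t> Lanes;
};

// Scalar-only matching: succeeds only for a single-lane value.
std::optional<uint64_t> matchScalarOnly(const ToyIntConstant &C) {
  if (C.Lanes.size() != 1)
    return std::nullopt;
  return C.Lanes.front();
}

// Splat-aware matching: also succeeds when every lane holds the same value.
std::optional<uint64_t> matchScalarOrSplat(const ToyIntConstant &C) {
  assert(!C.Lanes.empty() && "a constant needs at least one lane");
  for (uint64_t Lane : C.Lanes)
    if (Lane != C.Lanes.front())
      return std::nullopt;
  return C.Lanes.front();
}
```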
2025-08-15 10:43:54 -06:00
Stephen Tozer
bc216b057d
[Debugify] Improve reduction of debugify coverage build output (#150212)
In current DebugLoc coverage builds, the output for any reasonably large
build can become very large if any missing DebugLocs are present; this
happens because single errors in LLVM may result in many errors being
reported in the output report. The main cause of this is that the empty
locations attached to instructions may be propagated to other
instructions in later passes, which will each be reported as new errors.
This patch prevents this by adding an "unknown" annotation to
instructions after reporting them once, ensuring that any other
DebugLocs copied or derived from the original empty location will not be
marked as new errors.

As a separate but related change, this patch updates the report
generation script to deduplicate results using the recorded stacktrace
if they are available, instead of the pass+instruction combination. This
reduces the size of the reduction, but makes the reduction highly
reliable, as the stacktrace allows us to very precisely identify when
two bugs have originated from the same place.
2025-08-15 14:01:04 +01:00
Matt Arsenault
cb1228fbd5
RuntimeLibcalls: Return StringRef for libcall names (#153209)
Does not yet fully propagate this down into the TargetLowering
uses, many of which are relying on null checks on the returned
value.
2025-08-15 09:55:39 +09:00
Orlando Cazalet-Hyams
d13341db26
[RemoveDIs][NFC] Remove getAssignmentMarkers (#153214)
getAssignmentMarkers was for debug intrinsics. getDVRAssignmentMarkers
is used for DbgRecords.
2025-08-13 10:56:19 +01:00
Andreas Jonson
1840106ddf
[SCCP] Add support for trunc nuw range. (#152990)
proof: https://alive2.llvm.org/ce/z/_7PVxq
2025-08-12 13:48:55 +02:00
Nikita Popov
ab323eb0c6
[SCCP][PredicateInfo] Do not predicate argument of lifetime intrinsic
Replacing the argument with a no-op bitcast violates a verifier
constraint, even if only temporarily. Any replacement based on it
would result in a violation even after the copy has been removed.

Fixes https://github.com/llvm/llvm-project/issues/153013.
2025-08-12 12:56:08 +02:00
Sam Tebbs
0bfa1718af
[LV] Create in-loop sub reductions (#147026)
This PR allows the loop vectorizer to handle in-loop sub reductions by
forming a normal in-loop add reduction with a negated input.
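
The underlying identity is `acc - x == acc + (-x)`. A rough scalar model of it is sketched below; the function names and the chunk width `vf` are hypothetical, the inner loop merely stands in for a vector add-reduce, and overflow on negation is ignored:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference: a plain scalar sub reduction.
int64_t subReductionScalar(int64_t start, const std::vector<int64_t> &xs) {
  int64_t acc = start;
  for (int64_t x : xs)
    acc -= x;
  return acc;
}

// Same result, expressed as an in-loop add reduction over negated inputs.
int64_t subReductionInLoopAdd(int64_t start, const std::vector<int64_t> &xs,
                              std::size_t vf = 4) {
  assert(vf > 0 && "chunk width must be positive");
  int64_t acc = start;
  for (std::size_t i = 0; i < xs.size(); i += vf) {
    const std::size_t end = std::min(i + vf, xs.size());
    int64_t chunkSum = 0; // stands in for a vector add-reduce of the chunk
    for (std::size_t j = i; j < end; ++j)
      chunkSum += -xs[j]; // negated input
    acc += chunkSum;      // scalar accumulator updated inside the loop
  }
  return acc;
}
```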

Stacked PRs:
1. -> https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/147302
4. https://github.com/llvm/llvm-project/pull/147513
2025-08-12 10:22:41 +01:00
Andreas Jonson
330a589450
[PredicateInfo] Handle trunc nuw i1 condition. (#152988)
proof: https://alive2.llvm.org/ce/z/mxtn4L
2025-08-11 13:00:54 +02:00
hanbeom
a750fcb52b
[GVN] Check IndirectBr in Predecessor Terminators (#151188)
Critical edges with an IndirectBr terminator cannot be split.
Add a check for it to prevent assertion failures.

Fixes: #150229
2025-08-11 09:25:52 +02:00
Nikita Popov
35bad229c1
[PredicateInfo] Use bitcast instead of ssa.copy (#151174)
PredicateInfo needs some no-op to which the predicate can be attached.
Currently this is an ssa.copy intrinsic. This PR replaces it with a
no-op bitcast.
Using a bitcast is more efficient because we don't have the overhead of
an overloaded intrinsic. It also makes things slightly simpler overall.
2025-08-11 09:25:01 +02:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Matt Arsenault
1110e2ff9f
InlineFunction: Split inlining into predicate and apply functions (#134213)
This is to support a new inline function reduction in llvm-reduce,
which should pre-filter callsites that are not eligible for inlining.

This code was mostly structured as a match and apply, with a few
exceptions. The ugliest piece is for propagating and verifying
compatible getGC and personalities. Also, collection of EHPad and the
convergence token to use are now cached in InlineFunctionInfo.

I was initially confused by the split between the checks performed here
and isInlineViable, so better document how this system is supposed to
work. It turns out this split does make sense, in that isInlineViable
checks if it's possible based on the callee content and the ultimate
inline depends on the callsite context. I think more renames of these
functions would help, and isInlineViable should probably move out of
InlineCost to be with these transforms.
2025-08-07 16:13:36 +09:00
Mircea Trofin
f675483905
[profcheck] Annotate select instructions (#152171)
For `select`, we don't have the equivalent of the branch probability analysis to offer defaults, so we make up our own and allow their overriding with flags.

Issue #147390
2025-08-06 02:48:50 +02:00
Kazu Hirata
908ef45606
[Utils] Fix a warning
This patch fixes:

  llvm/lib/Transforms/Utils/SplitModuleByCategory.cpp:321:14: error:
  moving a temporary object prevents copy elision
  [-Werror,-Wpessimizing-move]
2025-08-05 07:24:10 -07:00
Maksim Sabianin
3f59a22711
[offload][SYCL] Add Module splitting by categories. (#131347)
This patch adds Module splitting by categories. The splitting algorithm
is a necessary step in the SYCL compilation pipeline. It could also be
reused for other heterogeneous targets.

The previous attempt was at #119713. In this patch there is no
dependency in `TransformUtils` on "IPO" or on "Printing Passes". Module
splitting is now self-contained and doesn't introduce linking issues.
2025-08-05 14:04:59 +00:00
Kazu Hirata
35dd88918f
[llvm] Use llvm::iterator_range::empty (NFC) (#151905)
2025-08-04 07:40:46 -07:00
Andreas Jonson
c6fd3d32c3
[SimplifyCfg] Add nneg to zext for switch to table conversion (#147180)
2025-08-04 16:18:05 +02:00
Nikita Popov
e833bb0991
[Local] Do not pass Root to replaceDominatedUsesWith (NFC)
Capture it in the lambdas instead.
2025-08-04 14:22:17 +02:00
Nikita Popov
86727fe9a1
[IR] Allow poison argument to lifetime markers (#151148)
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.

It's worth noting that this does not require any conservative
assumptions: lifetimes with poison arguments can simply be skipped.

Fixes https://github.com/llvm/llvm-project/issues/151119.
2025-08-04 10:02:04 +02:00
Mircea Trofin
9a60841dc4
[PGO][profcheck] ignore explicitly cold functions (#151778)
There is a case when branch profile metadata is OK to miss, namely, cold functions. The goal of the RFC (see the referenced issue) is to avoid accidental omission (and, at a later date, corruption) of profile metadata. However, asking cold functions to have all their conditional branches marked with "0" probabilities would be overdoing it. We can just ask cold functions to have an explicit 0 entry count.

This patch:
- injects an entry count for functions, unless they have one (synthetic or not)
- if the entry count is 0, doesn't inject, nor does it verify the rest of the metadata
- at verification, if the entry count is missing, it reports an error

Issue #147390
2025-08-04 03:53:49 +02:00
Joel E. Denny
37e03b56b8
Revert "[PGO] Add llvm.loop.estimated_trip_count metadata" (#151585)
Reverts llvm/llvm-project#148758

[As requested.](https://github.com/llvm/llvm-project/pull/148758#pullrequestreview-3076627201)
2025-07-31 15:56:31 -04:00
Joel E. Denny
a85c725952
Revert "[Utils] Fix a warning"
This reverts commit 3a18fe33f0763cd9276c99c276448412100f6270.

So that we can revert PR #148758.
2025-07-31 15:54:01 -04:00
Kazu Hirata
3a18fe33f0
[Utils] Fix a warning
This patch fixes:

  llvm/lib/Transforms/Utils/LoopUtils.cpp:818:28: error: unused
  function 'operator<<' [-Werror,-Wunused-function]
2025-07-31 11:24:33 -07:00
Joel E. Denny
f7b65011de
[PGO] Add llvm.loop.estimated_trip_count metadata (#148758)
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.

An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
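
For orientation, here is a rough sketch of how a trip count might be estimated from a latch branch's two weights. It assumes the estimate is the backedge weight divided by the exit weight, rounded to nearest, plus one; that formula and the name `estimateTripCount` are assumptions made for illustration, not a statement of what this patch implements:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: estimate a loop's trip count from the weights on its
// latch branch. Assumed formula: round(backedgeWeight / exitWeight) + 1.
std::optional<uint64_t> estimateTripCount(uint32_t backedgeWeight,
                                          uint32_t exitWeight) {
  // With no recorded exits there is nothing to divide by, so give up.
  if (exitWeight == 0)
    return std::nullopt;
  // Widen to 64 bits so the rounding term cannot overflow.
  const uint64_t back = backedgeWeight;
  const uint64_t exit = exitWeight;
  const uint64_t backedgesPerExit = (back + exit / 2) / exit;
  const uint64_t tripCount = backedgesPerExit + 1;
  assert(tripCount >= 1 && "an entered loop runs its body at least once");
  return tripCount;
}
```

Under that assumption, weights of 99 (backedge) and 1 (exit) describe a loop that typically runs 100 iterations per entry.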
2025-07-31 12:28:25 -04:00
Florian Hahn
99d70e09a9
[SCEV] Allow adds of constants in tryToReuseLCSSAPhi. (#150693)
Update the logic added in
https://github.com/llvm/llvm-project/pull/147824 to also allow adds of
constants. There are a number of cases where this can help remove
redundant phis and replace some computation with a ptrtoint (which
likely is free in the backend).

PR: https://github.com/llvm/llvm-project/pull/150693
2025-07-31 16:33:25 +01:00
LU-JOHN
a757f23404
[SimplifyCFG] Extend jump-threading to allow live local defs (#135079)
Extend jump-threading to allow local defs that are live outside of the
threaded block. Allow threading to destinations where the local defs are
not live.

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2025-07-31 09:44:14 -04:00
Nikita Popov
fa6965f722
[SCCP] Extract PredicateInfo handling into separate method (NFC)
2025-07-29 16:36:33 +02:00
Ellis Hoag
819f020b28
Use F.hasOptSize() instead of checking optsize directly (#147348)
2025-07-28 08:38:52 -07:00
Florian Hahn
f9f68af4b8
[SCEV] Make sure LCSSA is preserved when re-using phi if needed.
If we insert a new add instruction, it may introduce a new use outside
the loop that contains the phi node we re-use. Use fixupLCSSAFormFor to
fix LCSSA form, if needed.

This fixes a crash reported in
https://github.com/llvm/llvm-project/pull/147824#issuecomment-3124670997.
2025-07-28 16:24:46 +01:00
Florian Hahn
e21ee41be4
[SCEV] Try to re-use pointer LCSSA phis when expanding SCEVs. (#147824)
Generalize the code added in
https://github.com/llvm/llvm-project/pull/147214 to also support
re-using pointer LCSSA phis when expanding SCEVs with AddRecs.

A common source of integer AddRecs with pointer bases are runtime checks
emitted by LV based on the distance between 2 pointer AddRecs.

This improves codegen in some cases when vectorizing and prevents
regressions with https://github.com/llvm/llvm-project/pull/142309, which
turns some phis into single-entry ones, which SCEV will look through
now (and expand the whole AddRec), whereas before it would have to treat
the LCSSA phi as SCEVUnknown.

Compile-time impact neutral:
https://llvm-compile-time-tracker.com/compare.php?from=fd5fc76c91538871771be2c3be2ca3a5f2dcac31&to=ca5fc2b3d8e6efc09f1624a17fdbfbe909f14eb4&stat=instructions:u

PR: https://github.com/llvm/llvm-project/pull/147824
2025-07-25 15:29:40 +01:00
Kazu Hirata
3e53d4d386
[llvm] Remove unused includes (NFC) (#150265)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-07-23 15:18:46 -07:00
Mircea Trofin
df2d2d125b
[PGO] Add ProfileInjector and ProfileVerifier passes (#147388)
Adding 2 passes, one to inject `MD_prof` and one to check its presence. A subsequent patch will add these (similar to debugify) to `opt` (and, eventually, a variant of this, to `llc`)

Tracking issue: #147390
2025-07-23 21:34:58 +02:00
Nikita Popov
bdd638a897
[Local] Remove handling for lifetime intrinsic on non-alloca (NFC)
After #149310 this is guaranteed to be an alloca.
2025-07-23 14:21:22 +02:00
Nikita Popov
b59aaf7da7
[Sanitizers] Remove handling for lifetimes on non-alloca insts (NFC) (#149994)
After #149310 the pointer argument of lifetime.start/lifetime.end is
guaranteed to be an alloca, so we don't need to go through
findAllocaForValue() anymore, and don't have to have special handling
for the case where it fails.
2025-07-23 09:48:32 +02:00
Nikita Popov
307256ecbd
[GVNSink] Do not sink lifetimes of different allocas (#149818)
This was always undesirable, and after #149310 it is illegal and will
result in a verifier error.

Fix this by moving SimplifyCFG's check for this into
canReplaceOperandWithVariable(), so it's shared with GVNSink.
2025-07-22 09:44:03 +02:00
Jeremy Morse
c9ceb9b75f
[DebugInfo] Remove intrinsic-flavours of findDbgUsers (#149816)
This is one of the final remaining debug-intrinsic specific codepaths
out there, and pieces of cross-LLVM infrastructure to do with debug
intrinsics.
2025-07-21 17:49:25 +01:00
Yingwei Zheng
9e587ce6f0
[SCCP] Simplify [us]cmp(X, Y) into X - Y (#144717)
If the difference between [us]cmp's operands is not greater than 1, we
can simplify it into `X - Y`.
Alive2: https://alive2.llvm.org/ce/z/JS55so
llvm-opt-benchmark diff:
https://github.com/dtcxzyw/llvm-opt-benchmark/pull/2464/files
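
A small scalar check of the signed case is sketched below. The helper names are hypothetical, and 32-bit inputs are widened so the subtraction cannot overflow. A three-way compare only ever yields -1, 0, or 1, so it coincides with `X - Y` exactly when the difference is already in that range:

```cpp
#include <cassert>
#include <cstdint>

// Three-way signed compare: -1 if x < y, 0 if equal, 1 if x > y.
int64_t threeWayCompare(int32_t x, int32_t y) { return (x > y) - (x < y); }

// The rewritten form. It is only a valid replacement when the difference
// is known to lie in [-1, 1], which the assert documents.
int64_t compareAsSub(int32_t x, int32_t y) {
  const int64_t diff = static_cast<int64_t>(x) - static_cast<int64_t>(y);
  assert(diff >= -1 && diff <= 1 && "rewrite requires |x - y| <= 1");
  return diff;
}
```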
2025-07-20 15:01:44 +08:00
Prabhu Rajasekaran
921c6dbeca
[llvm] Introduce callee_type metadata
Introduce `callee_type` metadata which will be attached to the indirect
call instructions.

The `callee_type` metadata will be used to generate `.callgraph` section
described in this RFC:
https://lists.llvm.org/pipermail/llvm-dev/2021-July/151739.html

Reviewers: morehouse, petrhosek, nikic, ilovepi

Reviewed By: nikic, ilovepi

Pull Request: https://github.com/llvm/llvm-project/pull/87573
2025-07-18 14:40:54 -07:00
Florian Hahn
004c67ea25
[LV] Vectorize maxnum/minnum w/o fast-math flags. (#148239)
Update LV to vectorize maxnum/minnum reductions without fast-math flags,
by adding an extra check in the loop if any inputs to maxnum/minnum are
NaN, due to maxnum/minnum behavior w.r.t. signaling NaNs. Signed zeros
are already handled consistently by maxnum/minnum.

If any input is NaN,
 * exit the vector loop,
 * compute the reduction result up to the vector iteration that contained
   NaN inputs and
 * resume in the scalar loop

New recurrence kinds are added for reductions using maxnum/minnum
without fast-math flags.
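
The early-exit scheme listed above can be modelled in scalar code. This is a sketch only: the function names and the chunk width `vf` are hypothetical, `std::fmax` stands in for maxnum on quiet NaNs, and signaling-NaN semantics are not modelled:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference: scalar max reduction, where std::fmax ignores a NaN operand.
float maxReduceScalar(const std::vector<float> &xs, float start) {
  float acc = start;
  for (float x : xs)
    acc = std::fmax(acc, x);
  return acc;
}

// Chunked model of the vector loop with a NaN check per chunk.
float maxReduceWithNaNExit(const std::vector<float> &xs, float start,
                           std::size_t vf = 4) {
  assert(vf > 0 && "chunk width must be positive");
  float acc = start;
  std::size_t i = 0;
  for (; i + vf <= xs.size(); i += vf) {
    bool anyNaN = false;
    for (std::size_t j = 0; j < vf; ++j)
      anyNaN = anyNaN || std::isnan(xs[i + j]);
    if (anyNaN)
      break; // exit the "vector" loop before consuming this chunk
    float chunkMax = xs[i];
    for (std::size_t j = 1; j < vf; ++j)
      chunkMax = xs[i + j] > chunkMax ? xs[i + j] : chunkMax; // NaN-free here
    acc = std::fmax(acc, chunkMax);
  }
  // Resume in the scalar loop from the first unprocessed element.
  for (; i < xs.size(); ++i)
    acc = std::fmax(acc, xs[i]);
  return acc;
}
```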

PR: https://github.com/llvm/llvm-project/pull/148239
2025-07-18 21:58:19 +01:00