14306 Commits

Author SHA1 Message Date
Justin Fargnoli
58de8f2c25
[Inliner] Add option (default off) to inline all calls regardless of the cost (#152365)
Add a default off option to the inline cost calculation to always inline
all viable calls regardless of the cost/benefit and cost/threshold
calculations.

For performance reasons, some users require that all calls be inlined.
Rather than forcing them to adjust the inlining threshold to an
arbitrarily high value, offer an option to inline all calls.
2025-08-18 17:48:49 +00:00
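
The [Inliner] entry above describes an option that bypasses the cost/threshold comparison for viable calls. The following is a minimal standalone sketch of that decision, not code from the patch: the struct, the function, and the flag name `InlineAllViableCalls` are all hypothetical.

```cpp
#include <cstdint>

// Hypothetical, simplified model of an inline decision. None of these names
// are taken from LLVM.
struct CallSiteInfo {
  bool Viable;       // false when inlining is not possible at all
  int64_t Cost;      // estimated cost of inlining this call
  int64_t Threshold; // budget the cost is normally compared against
};

// Returns true if the call should be inlined. With InlineAllViableCalls set,
// the cost/threshold comparison is skipped, but viability is still required.
bool shouldInline(const CallSiteInfo &CS, bool InlineAllViableCalls) {
  if (!CS.Viable)
    return false;
  if (InlineAllViableCalls)
    return true;
  return CS.Cost < CS.Threshold;
}
```

In this model a call with cost 1000 and threshold 225 is rejected by default and accepted once the flag is set, while a non-viable call is rejected either way.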
Panagiotis Karouzakis
c2e7fad446
[DemandedBits] Support non-constant shift amounts (#148880)
This patch adds support for the shift operators to handle non-constant
shift operands.

ashr proof --> https://alive2.llvm.org/ce/z/EN-siK
lshr proof --> https://alive2.llvm.org/ce/z/eeGzyB
shl proof --> https://alive2.llvm.org/ce/z/dpvbkq
2025-08-19 01:11:16 +08:00
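
To illustrate the idea behind the [DemandedBits] entry above, here is a standalone sketch for `shl` only. It assumes the shift amount is known to lie in a range `[MinShift, MaxShift]` and conservatively unions the demanded bits over every possible amount. The function name and the range-based formulation are assumptions for illustration; the patch itself may compute this differently.

```cpp
#include <cstdint>

// For R = X << S on 32-bit values, result bit i comes from operand bit i - S.
// If only the bits in DemandedResult matter and S may be any value in
// [MinShift, MaxShift], then the operand bits that can matter are the union
// of (DemandedResult >> S) over all those shift amounts.
uint32_t demandedOperandBitsForShl(uint32_t DemandedResult, unsigned MinShift,
                                   unsigned MaxShift) {
  uint32_t Demanded = 0;
  // Shift amounts of 32 or more are not meaningful for a 32-bit shl.
  for (unsigned S = MinShift; S <= MaxShift && S < 32; ++S)
    Demanded |= DemandedResult >> S;
  return Demanded;
}
```

With a constant shift the loop runs once and reduces to a plain right shift of the demanded mask.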
Kazu Hirata
07eb7b7692
[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068)
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>.  Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:

  template <typename PointeeType, unsigned N>
  class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N> {};

We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
2025-08-18 07:01:29 -07:00
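
The "redirection" mentioned in the SmallSet entry above is a partial specialization for pointer element types. A toy version using standard containers shows the mechanism; `ToySet` and `ToyPtrSet` are stand-ins invented for this sketch and are not the LLVM classes.

```cpp
#include <set>
#include <type_traits>
#include <unordered_set>

// Toy stand-in for a pointer-optimized set.
template <typename PtrT> class ToyPtrSet : public std::unordered_set<PtrT> {};

// Toy stand-in for the general-purpose set.
template <typename T> class ToySet : public std::set<T> {};

// Partial specialization: for pointer element types, ToySet silently becomes
// a ToyPtrSet. This is the kind of redirection the commit message refers to.
template <typename PointeeType>
class ToySet<PointeeType *> : public ToyPtrSet<PointeeType *> {};

static_assert(std::is_base_of<ToyPtrSet<int *>, ToySet<int *>>::value,
              "pointer element types are redirected");
static_assert(!std::is_base_of<ToyPtrSet<int>, ToySet<int>>::value,
              "non-pointer element types are not redirected");
```

Because the redirect is invisible at the use site, spelling the pointer set type directly makes the chosen container explicit, which is the readability argument the entry makes.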
Andreas Jonson
0561ff6a12
[LVI] Add support for trunc nuw range. (#154021)
Proof: https://alive2.llvm.org/ce/z/a5Yjb8
2025-08-17 20:24:09 +02:00
Florian Hahn
36be0bba2a
[SCEV] Check if predicate is known false for predicated AddRecs. (#151134)
Similarly to https://github.com/llvm/llvm-project/pull/131538, we can
also try and check if a predicate is known to wrap given the backedge
taken count.

For now, this just checks directly when we try to create predicated
AddRecs. This both helps to avoid spending compile-time on optimizations
where we know the predicate is false, and can also help to allow
additional vectorization (e.g. by deciding to scalarize memory accesses
when otherwise we would try to create a predicated AddRec with a
predicate that's always false).

The initial version is quite restricted, but can be extended in
follow-ups to cover more cases.

PR: https://github.com/llvm/llvm-project/pull/151134
2025-08-15 09:30:25 +01:00
Jasmine Tang
10d9e7b1b7
Reapply "[WebAssembly] Constant fold wasm.dot" (#153070)
In #149619, for the test `@dot_follow_modulo_spec_2`, constant folding
the addition of two i32 values of 1073741824 overflows from 2^31 to
-2^31 = -2147483648, which triggers the UB sanitizer. This PR reapplies
the previous PR, explicitly casting the addition operands to int64_t
before performing the addition, and then producing an int32 number via
`Constant *C = get(cast<IntegerType>(Ty->getScalarType()), V, isSigned)`
2025-08-14 18:52:35 -07:00
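
The overflow described in the wasm.dot reapply entry above can be reproduced and avoided outside LLVM. The sketch below assumes the usual semantics of a signed 16-bit dot product (multiply lanes pairwise, add adjacent pairs, wrap to 32 bits); the helper names are hypothetical and this is not the folding code from the patch.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Wrap a 64-bit value to 32-bit two's complement without relying on signed
// overflow. The conversion to uint32_t is modular and well defined.
int32_t wrapToInt32(int64_t V) {
  uint32_t Bits = static_cast<uint32_t>(V);
  int32_t Result;
  std::memcpy(&Result, &Bits, sizeof(Result));
  return Result;
}

// Signed dot product of eight 16-bit lanes into four 32-bit lanes. The sum is
// formed in int64_t first, so adding two products of 2^30 cannot overflow.
std::array<int32_t, 4> dotI16x8(const std::array<int16_t, 8> &A,
                                const std::array<int16_t, 8> &B) {
  std::array<int32_t, 4> Out{};
  for (int I = 0; I < 4; ++I) {
    int64_t Lo = static_cast<int64_t>(A[2 * I]) * B[2 * I];
    int64_t Hi = static_cast<int64_t>(A[2 * I + 1]) * B[2 * I + 1];
    Out[I] = wrapToInt32(Lo + Hi);
  }
  return Out;
}
```

With both inputs holding -32768 in the first two lanes, each product is 1073741824 and the wrapped sum is -2147483648, which matches the value quoted in the commit message.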
joaosaffran
d56fa96524
[DirectX] Add Range Overlap validation (#152229)
As part of the Root Signature Spec, we need to validate that Root
Signatures do not define overlapping ranges.
Closes: https://github.com/llvm/llvm-project/issues/126645

---------

Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
Co-authored-by: Joao Saffran <{ID}+{username}@users.noreply.github.com>
Co-authored-by: Joao Saffran <jderezende@microsoft.com>
2025-08-14 18:40:11 -04:00
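
A range-overlap check of the kind the [DirectX] entry above describes can be sketched with a sort followed by a scan of neighbours. The `BindingRange` struct and its fields are hypothetical, and a real root signature validator would likely have to take more attributes into account than the register space used here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of a bound register range (bounds are inclusive).
struct BindingRange {
  uint32_t Space;
  uint32_t LowerBound;
  uint32_t UpperBound;
};

// Returns true if any two ranges in the same space share a register.
// After sorting by (Space, LowerBound), any overlapping pair implies that
// some pair of neighbours overlaps, so one linear scan is enough.
bool hasOverlap(std::vector<BindingRange> Ranges) {
  std::sort(Ranges.begin(), Ranges.end(),
            [](const BindingRange &A, const BindingRange &B) {
              if (A.Space != B.Space)
                return A.Space < B.Space;
              return A.LowerBound < B.LowerBound;
            });
  for (std::size_t I = 1; I < Ranges.size(); ++I) {
    const BindingRange &Prev = Ranges[I - 1];
    const BindingRange &Cur = Ranges[I];
    if (Prev.Space == Cur.Space && Cur.LowerBound <= Prev.UpperBound)
      return true;
  }
  return false;
}
```

Ranges `[0, 3]` and `[3, 5]` in the same space overlap on register 3, whereas the same bounds in different spaces do not.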
Michael Berg
334a046a3c
[LoopDist] Consider reads and writes together for runtime checks (#145623)
Emit safety guards for ptr accesses when cross partition loads exist
which have a corresponding store to the same address in a different
partition. This will emit the necessary ptr checks for these accesses.

The test case was obtained from SuperTest, which SiFive runs regularly.
We enabled LoopDistribution by default in our downstream compiler, and
this change was part of that enablement.
2025-08-14 12:50:17 -07:00
Elvis Wang
01fac67e2a
[TTI] Add cost kind to getAddressComputationCost(). NFC. (#153342)
This patch adds cost kind to `getAddressComputationCost()` for #149955.

Note that this patch also removes all the default values in `getAddressComputationCost()`.
2025-08-14 16:01:44 +08:00
Ryotaro Kasuga
bf6796fa8f
[DA] Extract duplicated logic from exactSIVtest and exactRDIVtest (NFC) (#152712)
This patch refactors `exactSIVtest` and `exactRDIVtest` by consolidating
duplicated logic into a single function. Same as #152688, the main goal
is to improve code maintainability, since extra validation logic (as
written in TODO comments) may be necessary.
2025-08-13 17:45:28 +09:00
Ryotaro Kasuga
bce0f9d2bf
[DA] Extract duplicated logic from gcdMIVtest (NFCI) (#152688)
This patch refactors `gcdMIVtest` by consolidating duplicated logic into
a single function. The main goal of this change is to improve code
maintainability rather than readability, especially since we may need to
revise this logic for correctness (as noted in the added TODO comments).

I hope this patch is NFC, but I've also added several new assertions,
which may cause some previously passing cases to fail.
2025-08-13 15:07:50 +09:00
Sam Tebbs
0bfa1718af
[LV] Create in-loop sub reductions (#147026)
This PR allows the loop vectorizer to handle in-loop sub reductions by
forming a normal in-loop add reduction with a negated input.

Stacked PRs:
1. -> https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/147302
4. https://github.com/llvm/llvm-project/pull/147513
2025-08-12 10:22:41 +01:00
Helena Kotas
5165a6c197
[HLSL] Update DXIL resource metadata code to support resource arrays (#152254)
Closes #145422
2025-08-11 14:55:54 -07:00
Luke Lau
acb86fb9e0
[TTI] Consistently pass the pointer type to getAddressComputationCost. NFCI (#152657)
In some places we were passing the type of value being accessed, in
other cases we were passing the type of the pointer for the access.

The most "involved" user is
LoopVectorizationCostModel::getMemInstScalarizationCost, which is the
only call site that passes in the SCEV, and it passes along the pointer
type.

This changes call sites to consistently pass the pointer type, and
renames the arguments to clarify this.

No target actually inspects the type passed beyond checking whether it
is a vector, so this shouldn't have an effect.
2025-08-11 18:00:12 +08:00
Sushant Gokhale
e8918c318e
[SCEV] Consider non-volatile memory intrinsics as not having side-effect for forward progress (#150916)
For the attached test:
Before the loop-idiom pass, we have a store into the inner loop which is
considered simple and one that does not have any side effects on the
loop. Post loop-idiom pass, we get a memset into the outer loop that is
considered to introduce side effects on the loop. This changes the
backedge taken count before and after the pass and hence, the crash with
verify-scev.

We try to consider non-volatile memory intrinsics as not having
side-effect for forward progress to fix the issue.

Fixes #149377
2025-08-11 00:24:50 -07:00
Yingwei Zheng
2242e28671
[Analysis] Remove an unreachable check. NFC. (#152874)
Binops never produce pointer values.
2025-08-10 14:43:40 +08:00
weiguozhi
5e87792200
[LoopInfo] Pointer to stack object may not be loop invariant in a coroutine function (#149936)
A coroutine function may be split into a ramp function and a resume
function, which have different stack frames. A pointer to a stack object
may therefore have different addresses depending on where it is used, so
it is not loop invariant.

It temporarily fixes https://github.com/llvm/llvm-project/issues/149604.
2025-08-09 14:20:19 -07:00
Alexander Richardson
3cf7262876
[CaptureTracking] Handle ptrtoaddr
Unlike ptrtoint, ptrtoaddr does not capture provenance, only the address.
Note: As defined by the LangRef, we always treat `ptrtoaddr` as a
location-independent address capture since it is a direct inspection of the
pointer address.

Reviewed By: nikic

Pull Request: https://github.com/llvm/llvm-project/pull/152221
2025-08-08 14:22:42 -07:00
Alexander Richardson
3a4b351ba1
[IR] Introduce the ptrtoaddr instruction
This introduces a new `ptrtoaddr` instruction which is similar to
`ptrtoint` but has two differences:

1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance
2) `ptrtoaddr` only extracts (and then extends/truncates) the low
   index-width bits of the pointer

For most architectures, difference 2) does not matter since index (address)
width and pointer representation width are the same, but this does make a
difference for architectures that have pointers that aren't just plain
integer addresses such as AMDGPU fat pointers or CHERI capabilities.

This commit introduces textual and bitcode IR support as well as basic code
generation, but optimization passes do not handle the new instruction yet
so it may result in worse code than using ptrtoint. Follow-up changes will
update capture tracking, etc. for the new instruction.

RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54

Reviewed By: nikic

Pull Request: https://github.com/llvm/llvm-project/pull/139357
2025-08-08 10:12:39 -07:00
Ivan R. Ivanov
7c141e2118
[ValueTracking] Add missing check for two-value PN recurrence matching (#152700)
When InstTy is a type like IntrinsicInst which can have a variable
number of arguments, we can encounter a case where Operation will have
fewer than two arguments and error at the getOperand() calls.

Fixes: https://github.com/llvm/llvm-project/issues/152725.
2025-08-08 17:39:24 +02:00
Ryotaro Kasuga
bd39ae6125
[Delinearization] Add function for fixed size array without relying on GEP (#145050)
The existing functions `getIndexExpressionsFromGEP` and
`tryDelinearizeFixedSizeImpl` provide functionality to delinearize
memory accesses for fixed size array. They use the GEP source element
type in their optimization heuristics. However, driving optimization
heuristics based on GEP type information is not allowed.

This patch introduces new functions `findFixedSizeArrayDimensions` and
`delinearizeFixedSizeArray` to delinearize a fixed size array without
using the type information in GEP. The new function
`findFixedSizeArrayDimensions` infers the size of each dimension of the
array based on the value to be added to the address as induction
variables are incremented. `delinearizeFixedSizeArray` attempts to
restore the subscripts of each dimension based on the estimated array
size.

This is an initial implementation that may not cover all cases, but is
intended to replace the existing function in the future.

Related:
- https://discourse.llvm.org/t/enabling-loop-interchange/82589/4
- https://github.com/llvm/llvm-project/pull/124911#issuecomment-2962499501
2025-08-08 19:08:14 +09:00
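
As a numeric analogue of what the [Delinearization] entry above means by restoring subscripts from estimated dimension sizes, the sketch below recovers `[i][j][k]` from a flat element offset. It works on plain integers rather than on SCEV expressions, and the function name is hypothetical; the functions added by the patch operate symbolically.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Recover the subscripts of a fixed-size array access from a flat element
// offset. InnerDims holds the sizes of all dimensions except the outermost,
// e.g. {D1, D2} for A[*][D1][D2]. Returns an empty vector if any size is 0.
std::vector<uint64_t> delinearizeOffset(uint64_t Offset,
                                        const std::vector<uint64_t> &InnerDims) {
  std::vector<uint64_t> Subscripts(InnerDims.size() + 1, 0);
  // Peel off the innermost dimension first with remainder and quotient.
  for (std::size_t I = InnerDims.size(); I > 0; --I) {
    uint64_t Dim = InnerDims[I - 1];
    if (Dim == 0)
      return {};
    Subscripts[I] = Offset % Dim;
    Offset /= Dim;
  }
  Subscripts[0] = Offset; // whatever remains indexes the outermost dimension
  return Subscripts;
}
```

For `A[*][4][8]`, the offset `(2 * 4 + 3) * 8 + 5 = 93` maps back to the subscripts 2, 3 and 5.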
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Ryotaro Kasuga
05dd957cda
[DA] Fix the check between Subscript and Size after delinearization (#151326)
Delinearization provides two values: the size of the array, and the
subscript of the access. DA checks their validity (`0 <= subscript <
size`), with some special handling. In particular, to ensure `subscript
< size`, it calculates the maximum value of `subscript - size` and checks
if it is negative. There was an issue in this process: when `subscript -
size` is expressed in an affine form like `init + step * i`, the value
in the last iteration (`init + step * (num_iterations - 1)`) was
assumed to be the maximum value. This assumption is incorrect in the
following cases:

- When `step` is negative
- When the AddRec wraps

This patch introduces extra checks to ensure the sign of `step` and
verify the existence of nsw/nuw flags.

Also, `isKnownNonNegative(S - smax(1, Size))` was used as a regular
check, which is incorrect when `Size` is negative. This patch also
replaces it with `isKnownNonNegative(S - Size)`, although it's still
unclear whether using `isKnownNonNegative` is appropriate in the first
place.

Fix #150604
2025-08-08 10:58:13 +09:00
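
The two failure cases listed in the [DA] entry above (negative `step`, wrapping AddRec) can be seen in a small integer model. This sketch treats the recurrence as 32-bit and evaluates it in 64 bits so that wrapping is detectable; the function name is hypothetical, and the patch itself checks the sign of `step` and the nsw/nuw flags rather than evaluating anything numerically.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Maximum of Init + Step * I for I in [0, NumIterations - 1], modelling a
// 32-bit recurrence. Returns nullopt if there are no iterations or if the
// recurrence would wrap in 32 bits, since no bound is claimed in that case.
std::optional<int32_t> maxOfAffine(int32_t Init, int32_t Step,
                                   uint32_t NumIterations) {
  if (NumIterations == 0)
    return std::nullopt;
  // 64-bit arithmetic cannot overflow for 32-bit inputs.
  const int64_t Last = static_cast<int64_t>(Init) +
                       static_cast<int64_t>(Step) *
                           static_cast<int64_t>(NumIterations - 1);
  if (Last < std::numeric_limits<int32_t>::min() ||
      Last > std::numeric_limits<int32_t>::max())
    return std::nullopt;
  // The last iteration is the maximum only when the step is non-negative;
  // with a negative step the first iteration is the maximum.
  return Step >= 0 ? static_cast<int32_t>(Last) : Init;
}
```

For `init = 3`, `step = -1` and five iterations, the last-iteration value is -1 but the true maximum is 3, which is exactly the kind of case the old assumption got wrong.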
Ramkumar Ramachandra
edeee824f0
Reland [VectorUtils] Trivially vectorize ldexp, [l]lround (#152476)
Changes: The original patch, landed as 1336675, was reverted due to a
bug in LoopVectorize resulting in a crash. The bug has now been fixed by
95c32bf ([VPlan] Return invalid cost if any skeleton block has invalid
costs), and this reland is identical to the original patch.
2025-08-07 12:07:29 +01:00
Michael Kruse
04196ba01a
[DA][NFC] clang-format DependenceAnalysis (#151505)
To avoid noise in PRs such as in #146383.
2025-08-07 11:44:25 +02:00
Andrew Lazarev
f61526971f
Revert "[WebAssembly] Constant fold wasm.dot" (#152382)
Reverts llvm/llvm-project#149619

It breaks ubsan bot:
https://lab.llvm.org/buildbot/#/builders/25/builds/10523

Earlier today the failure was hidden by another breakage that is fixed
now.
2025-08-06 15:16:19 -07:00
Jasmine Tang
9c6bb18040
[WebAssembly] Constant fold wasm.dot (#149619)
Constant fold wasm.dot of constant vectors/splats.

Test case added in
`llvm/test/Transforms/InstSimplify/ConstProp/WebAssembly/dot.ll`

Related to https://github.com/llvm/llvm-project/issues/55933
2025-08-05 15:22:37 -07:00
Pedro Lobo
2bbc614713
[InstCombine] Support offsets in memset to load forwarding (#151924)
Adds support for load offsets when performing `memset` load forwarding.
2025-08-05 17:09:06 +01:00
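
The reasoning behind forwarding a `memset` to a load at an offset, as in the [InstCombine] entry above, is that every byte of the filled region holds the same value, so a load from anywhere inside it sees that byte repeated. The standalone sketch below checks this against real memory; both helper names are hypothetical and it says nothing about how the patch performs the transform.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// The value a 4-byte load sees when every loaded byte equals Fill.
uint32_t splatByte(uint8_t Fill) { return 0x01010101u * Fill; }

// Reference behaviour: fill a 16-byte buffer, then load 4 bytes at Offset.
// Returns nullopt if the load would run past the filled region.
std::optional<uint32_t> loadAfterMemset(uint8_t Fill, std::size_t Offset) {
  unsigned char Buf[16];
  uint32_t Value;
  if (Offset > sizeof(Buf) - sizeof(Value))
    return std::nullopt;
  std::memset(Buf, Fill, sizeof(Buf));
  std::memcpy(&Value, Buf + Offset, sizeof(Value));
  return Value;
}
```

Any in-bounds offset yields the same splat value, which is why the offset does not prevent the forwarding as long as the load stays inside the region written by the `memset`.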
Nikita Popov
c1b387e23d
[MemoryLocation] Compute lifetime size from alloca size (#151982)
Split out from #150248:

Since #150944 the size passed to lifetime.start/end is considered
meaningless. The lifetime always applies to the whole alloca.

This adjusts MemoryLocation to determine the MemoryLocation size from
the alloca size, instead of using the argument.
2025-08-05 10:47:07 +02:00
Nikita Popov
ba099c516d
[StackLifetime] Remove handling for lifetime size mismatch (#151965)
Split out from #150248:

Since #150944 the size passed to lifetime.start/end is considered
meaningless. The lifetime always applies to the whole alloca.

Accordingly remove handling for size mismatch in the StackLifetime
analysis.
2025-08-05 09:19:10 +02:00
Nikita Popov
4b5b36e5c4
[GVN] Avoid creating lifetime of non-alloca
There is a larger problem here in that we should not be performing
arbitrary pointer replacements for assumes. This is handled for
branches, but assume goes through a different code path.

Fixes https://github.com/llvm/llvm-project/issues/151785.
2025-08-04 12:06:40 +02:00
Abhishek Kaushik
30728eb26b
[Reland][ValueTracking] Improve Bitcast handling to match SDAG (#145223)
Fixes #125228

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-04 14:51:03 +05:30
Nikita Popov
86727fe9a1
[IR] Allow poison argument to lifetime markers (#151148)
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.

It's worth noting that this does not require any conservative
assumptions, lifetimes with poison arguments can simply be skipped.

Fixes https://github.com/llvm/llvm-project/issues/151119.
2025-08-04 10:02:04 +02:00
Florian Hahn
2ae996cbbe
[LAA] Support assumptions in evaluatePtrAddRecAtMaxBTCWillNotWrap (#147047)
This patch extends the logic added in
https://github.com/llvm/llvm-project/pull/128061 to support
dereferenceability information from assumptions as well.

Unfortunately both assumption cache and the dominator tree need to be
threaded through multiple layers to make them available where needed.

PR: https://github.com/llvm/llvm-project/pull/147047
2025-08-01 14:18:07 +01:00
Lewis Crawford
5146917407
[ConstantFolding] Fix incorrect nvvm_round folding (#151563)
The `nvvm_round` intrinsic should round to the nearest even number in
the case of ties. It lowers to PTX `cvt.rni`, which will "round to
nearest integer, choosing even integer if source is equidistant between
two integers", so it matches the semantics of `rint` (and not `round` as
the name suggests).
2025-08-01 10:31:43 +01:00
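
The difference between the two rounding behaviours named in the `nvvm_round` entry above shows up only on ties. The sketch below uses the standard library, assuming the default floating-point rounding mode (round to nearest, ties to even) is in effect; the wrapper names are hypothetical.

```cpp
#include <cmath>

// Ties go to the even neighbour, like rint in the default rounding mode.
double roundTiesToEven(double X) { return std::rint(X); }

// Ties go away from zero, like round.
double roundTiesAwayFromZero(double X) { return std::round(X); }
```

For 2.5 the first function gives 2.0 and the second gives 3.0, while for 3.5 both give 4.0. A constant folder that used the second behaviour for an operation with the first semantics would therefore be wrong on inputs such as 0.5 and 2.5.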
Muhammad Omair Javaid
176d54aa33
Revert "[VectorUtils] Trivially vectorize ldexp, [l]lround (#145545)"
This reverts commit 13366759c3b9db9366659d870cc73c938422b020.

This broke various LLVM testsuite buildbots for AArch64 SVE, but the
problem got masked because relevant buildbots were already failing
due to other breakage.

It has broken llvm-test-suite test:
gfortran-regression-compile-regression__vect__pr106253_f.test

https://lab.llvm.org/buildbot/#/builders/4/builds/8164
https://lab.llvm.org/buildbot/#/builders/17/builds/9858
https://lab.llvm.org/buildbot/#/builders/41/builds/8067
https://lab.llvm.org/buildbot/#/builders/143/builds/9607
2025-08-01 01:24:52 +05:00
Joel E. Denny
37e03b56b8
Revert "[PGO] Add llvm.loop.estimated_trip_count metadata" (#151585)
Reverts llvm/llvm-project#148758

[As requested.](https://github.com/llvm/llvm-project/pull/148758#pullrequestreview-3076627201)
2025-07-31 15:56:31 -04:00
Joel E. Denny
f7b65011de
[PGO] Add llvm.loop.estimated_trip_count metadata (#148758)
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.

An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
2025-07-31 12:28:25 -04:00
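
The fallback described in the [PGO] entry above (use the metadata value if present, otherwise re-estimate from branch weights) can be modelled as follows. This is a sketch under assumptions: the function names are hypothetical, the metadata is reduced to an optional integer, and the estimate formula (backedges taken per loop exit, rounded to nearest, plus one) is one plausible choice rather than a statement of what LLVM computes.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Estimate a trip count from latch branch weights. Returns nullopt when the
// exit weight is zero, i.e. when the profile never saw the loop exit.
std::optional<uint64_t> estimateFromBranchWeights(uint64_t BackedgeWeight,
                                                  uint64_t ExitWeight) {
  if (ExitWeight == 0)
    return std::nullopt;
  // Backedges taken per exit, rounded to nearest without overflowing.
  uint64_t Quotient = BackedgeWeight / ExitWeight;
  uint64_t Remainder = BackedgeWeight % ExitWeight;
  if (Remainder >= ExitWeight - Remainder)
    ++Quotient;
  // Add the final iteration that leaves the loop, saturating at the maximum.
  if (Quotient == std::numeric_limits<uint64_t>::max())
    return Quotient;
  return Quotient + 1;
}

// MetadataValue models the new metadata: nullopt means the metadata has no
// value, in which case the branch weights are consulted again.
std::optional<uint64_t> lookupEstimatedTripCount(
    std::optional<uint64_t> MetadataValue, uint64_t BackedgeWeight,
    uint64_t ExitWeight) {
  if (MetadataValue)
    return MetadataValue;
  return estimateFromBranchWeights(BackedgeWeight, ExitWeight);
}
```

With weights of 99 for the backedge and 1 for the exit, the estimate is 100 iterations, and a stored metadata value always takes precedence over that estimate.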
Justin Bogner
3f066f5fcf
[HLSL][DirectX] Extract HLSLBinding out of DXILResource. NFC (#150633)
We extract the binding logic out of the DXILResource analysis passes into the
FrontendHLSL library. This will allow us to use this logic for resource and
root signature bindings in both the DirectX backend and the HLSL frontend.
2025-07-31 08:35:47 -07:00
Nathan Gauër
67273393b1
[VectorCombine][TTI] Prevent extract/ins rewrite to GEP (#150216)
Using GEP to index into a vector is not disallowed, but it is not
recommended. The SPIR-V backend needs to generate structured access into
types, which is impossible with an untyped GEP instruction unless we add
more info to the IR. Finding a solution is a work in progress, but in the
meantime, we'd like to reduce the number of failures.

Preventing this optimization from rewriting extract/insert
instructions into a GEP helps us lower more code to SPIR-V. This change
should be OK as it's only active when targeting SPIR-V and disables a
non-recommended transformation.

Related to #145002
2025-07-31 14:14:00 +02:00
Florian Hahn
ab9b23c446
[SCEV] Use pattern match to check ZExt(Add()). (NFC)
Follow-up to
https://github.com/llvm/llvm-project/pull/151227#pullrequestreview-3074670031
to check the inner expression is an Add before calling getTruncateExpr.

Adds a new matcher that just matches and captures SCEVAddExpr, to
support matching a SCEVAddExpr with arbitrary number of operands.
2025-07-31 12:47:14 +01:00
Mel Chen
6752415ce8
[VectorUtils] Simplify the code by new function InterleaveGroup::isFull. nfc (#151112)
2025-07-31 16:02:53 +08:00
Florian Hahn
d74d841b65
[SCEV] Try to push the op into ZExt: A + zext (-A + B) -> zext (B) (#151227)
Try to push the constant operand into a ZExt:
A + zext (-A + B) -> zext (B), if trunc (A) + -A + B does not
unsigned-wrap.

The actual code supports ZExts with arbitrary number of arguments, hence
the getAddExpr in the return.

This helps SCEV reasoning in some cases, commonly when adding an offset
to a zero-extended SCEV that subtracts the same offset.

Note that this is restricted to cases where we can fold away an operand
of the inner Add. This is needed to avoid bad interactions with patterns
when forming ZExts, which try to push the ZExt to add operands.

https://alive2.llvm.org/ce/z/q7d303

PR: https://github.com/llvm/llvm-project/pull/151227
2025-07-30 21:10:57 +01:00
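
The fold in the entry above, `A + zext (-A + B) -> zext (B)`, and its no-unsigned-wrap precondition can be checked exhaustively on small integers. The sketch below uses 8 bits for the narrow type and 32 bits for the wide type, and additionally assumes that `A` itself fits in the narrow type; all function names are hypothetical and nothing here uses SCEV.

```cpp
#include <cstdint>

// Left-hand side: A + zext(-A + B), with the inner add done in 8 bits.
uint32_t lhs(uint8_t A, uint8_t B) {
  uint8_t Inner = static_cast<uint8_t>(B - A); // (-A + B) modulo 2^8
  return static_cast<uint32_t>(A) + static_cast<uint32_t>(Inner);
}

// Right-hand side: zext(B).
uint32_t rhs(uint8_t B) { return B; }

// Precondition: trunc(A) + (-A + B) does not unsigned-wrap in 8 bits.
bool foldIsValid(uint8_t A, uint8_t B) {
  uint8_t Inner = static_cast<uint8_t>(B - A);
  return static_cast<unsigned>(A) + static_cast<unsigned>(Inner) <= 0xFFu;
}

// Checks every 8-bit pair: whenever the precondition holds, both sides agree.
bool foldHoldsForAllInputs() {
  for (unsigned A = 0; A <= 0xFF; ++A)
    for (unsigned B = 0; B <= 0xFF; ++B) {
      uint8_t NarrowA = static_cast<uint8_t>(A);
      uint8_t NarrowB = static_cast<uint8_t>(B);
      if (foldIsValid(NarrowA, NarrowB) &&
          lhs(NarrowA, NarrowB) != rhs(NarrowB))
        return false;
    }
  return true;
}
```

With `A = 5` and `B = 10` the precondition holds and both sides are 10. With `A = 5` and `B = 3` the inner add wraps to 254, the precondition fails, and the left-hand side is 259 rather than 3, so the fold must not fire.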
Lewis Crawford
c5327b935b
[ConstantFolding] Fix typo in GetNVVMDenormMode (#151297)
Fix typo in function name of GetNVVMDenormMode
(Denrom vs Denorm).
2025-07-30 10:48:09 +01:00
Abhinav Garg
f527b319e3
[Uniformity Analysis] Fix print method to dump uniformity info (#151130)
2025-07-30 10:57:57 +05:30
Ramkumar Ramachandra
13366759c3
[VectorUtils] Trivially vectorize ldexp, [l]lround (#145545)
2025-07-29 19:23:09 +01:00
Paul Walker
1528ddbe76
[ConstantFolding][SVE] Do not fold fcmp of denormal without known mode. (#150614)
This is a follow on to
https://github.com/llvm/llvm-project/pull/115407 that introduced code
which bypasses the splat handling for scalable vectors. To maintain
existing tests I have moved the early return until after the splat
handling so all vector types are treated equally.
2025-07-29 12:37:59 +01:00
David Sherwood
6fbc397964
[IR] Add new CreateVectorInterleave interface (#150931)
This PR adds a new interface to IRBuilder called CreateVectorInterleave,
which can be used to create vector.interleave intrinsics of factors 2-8.

For convenience I have also moved getInterleaveIntrinsicID and
getDeinterleaveIntrinsicID from VectorUtils.cpp to Intrinsics.cpp where
it can be used by IRBuilder.
2025-07-29 08:47:07 +01:00
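
For readers unfamiliar with what a vector interleave of factor N produces, the sketch below shows the element ordering on plain `std::vector` values. It is a model of the operation's result only: it does not use IRBuilder, and the function name is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Interleave N equally sized inputs: output element I * N + K is element I of
// input K. For two inputs A and B this gives A0, B0, A1, B1, and so on.
std::vector<int> interleave(const std::vector<std::vector<int>> &Inputs) {
  std::vector<int> Out;
  if (Inputs.empty())
    return Out;
  const std::size_t Len = Inputs.front().size();
  for (const std::vector<int> &In : Inputs)
    if (In.size() != Len)
      throw std::invalid_argument("all inputs must have the same length");
  Out.reserve(Len * Inputs.size());
  for (std::size_t I = 0; I < Len; ++I)
    for (const std::vector<int> &In : Inputs)
      Out.push_back(In[I]);
  return Out;
}
```

Interleaving `{1, 2}`, `{10, 20}` and `{100, 200}` yields `1, 10, 100, 2, 20, 200`.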
Shoreshen
a5deb59dfe
[AMDGPU] Add NoaliasAddrSpace to AAMDnodes (#149247)
This is the follow-up PR to
https://github.com/llvm/llvm-project/pull/136553, which calculates
NoaliasAddrSpace.

This PR carries the calculated info into MIR by adding it to AAMDnodes.
2025-07-29 10:10:06 +08:00
Kazu Hirata
c7cd1d0ae3
[Analysis] Remove an unnecessary cast (NFC) (#150838)
getOpcode() already returns Instruction::CastOps.
2025-07-27 10:43:30 -07:00