40664 Commits

Author SHA1 Message Date
Luke Lau
7250b66240
[VPlan] Create AVL as a phi from TC -> 0 with EVL tail folding (#151481)
This implements the first half of #151459, by changing the AVL so it's
no longer computed as `trip-count - EVL-based IV`, but instead a
separate scalar phi that is decremented by EVL each iteration.

This shortens the dependency chain for computing the AVL and should
eventually allow us to convert the branch condition to `branch-count
avl-next, 0`.

`simplifyBranchConditionForVFAndUF` had to be updated to prevent a
regression because this introduces a VPPhi in the header block.
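
A minimal scalar sketch of the two ways of computing the AVL, not the actual VPlan recipes. `computeEVL` is a hypothetical stand-in that returns `min(AVL, VF)`; a real target may pick other legal values. Both loops produce the same EVL sequence, but in the second the AVL no longer depends on the IV.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the target's EVL computation (assumption:
// min(AVL, VF); real hardware may choose differently).
static uint64_t computeEVL(uint64_t AVL, uint64_t VF) {
  return std::min(AVL, VF);
}

// Old form: the AVL is recomputed every iteration as trip-count - EVL-based IV.
std::vector<uint64_t> evlsFromIV(uint64_t TC, uint64_t VF) {
  assert(VF > 0 && "vectorization factor must be non-zero");
  std::vector<uint64_t> EVLs;
  for (uint64_t IV = 0; IV < TC;) {
    uint64_t AVL = TC - IV; // depends on the IV update chain
    uint64_t EVL = computeEVL(AVL, VF);
    EVLs.push_back(EVL);
    IV += EVL;
  }
  return EVLs;
}

// New form: the AVL is its own phi running from TC down to 0.
std::vector<uint64_t> evlsFromAVLPhi(uint64_t TC, uint64_t VF) {
  assert(VF > 0 && "vectorization factor must be non-zero");
  std::vector<uint64_t> EVLs;
  for (uint64_t AVL = TC; AVL != 0;) {
    uint64_t EVL = computeEVL(AVL, VF);
    EVLs.push_back(EVL);
    AVL -= EVL; // avl-next; the loop could exit on avl-next == 0
  }
  return EVLs;
}
```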
2025-08-01 11:00:05 +08:00
Joel E. Denny
37e03b56b8
Revert "[PGO] Add llvm.loop.estimated_trip_count metadata" (#151585)
Reverts llvm/llvm-project#148758

[As requested.](https://github.com/llvm/llvm-project/pull/148758#pullrequestreview-3076627201)
2025-07-31 15:56:31 -04:00
Joel E. Denny
a85c725952 Revert "[Utils] Fix a warning"
This reverts commit 3a18fe33f0763cd9276c99c276448412100f6270.

So that we can revert PR #148758.
2025-07-31 15:54:01 -04:00
shuffle2
7b5a44c605
[hwasan] Add hwasan-all-globals option (#149621)
hwasan-globals does not instrument globals with custom sections, because
existing code may use `__start_`/`__stop_` symbols to iterate over
globals in a way that will cause hwasan assertions.

Introduce new hwasan-all-globals option, which instruments all
user-defined globals (but not those globals which are generated by the
hwasan instrumentation itself), including those with custom sections.

fixes #142442
2025-07-31 11:38:42 -07:00
Kazu Hirata
3a18fe33f0 [Utils] Fix a warning
This patch fixes:

  llvm/lib/Transforms/Utils/LoopUtils.cpp:818:28: error: unused
  function 'operator<<' [-Werror,-Wunused-function]
2025-07-31 11:24:33 -07:00
Joel E. Denny
f7b65011de
[PGO] Add llvm.loop.estimated_trip_count metadata (#148758)
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.

An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
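
The fallback described above can be sketched roughly as follows. `LoopProfile` and both function names are hypothetical, and the rounded-division formula (backedge-taken count plus one) is an assumption about how a trip count might be derived from branch weights, not a copy of the LLVM implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of the profile data attached to one loop.
struct LoopProfile {
  std::optional<uint64_t> EstimatedTripCountMD; // empty: metadata has no value
  uint64_t BackedgeWeight;                      // latch branch weight, stay in loop
  uint64_t ExitWeight;                          // latch branch weight, leave loop
};

// Assumed estimate: rounded backedge-taken count, plus one for the final trip.
std::optional<uint64_t> estimateFromBranchWeights(uint64_t BackedgeWeight,
                                                  uint64_t ExitWeight) {
  if (ExitWeight == 0)
    return std::nullopt; // the loop never exits in the profile
  uint64_t BackedgeTaken = (BackedgeWeight + ExitWeight / 2) / ExitWeight;
  return BackedgeTaken + 1;
}

// Prefer the recorded metadata value; if it is absent, try again from the
// loop's current branch weights.
std::optional<uint64_t> estimatedTripCount(const LoopProfile &L) {
  if (L.EstimatedTripCountMD)
    return L.EstimatedTripCountMD;
  return estimateFromBranchWeights(L.BackedgeWeight, L.ExitWeight);
}
```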
2025-07-31 12:28:25 -04:00
David Green
8f968fe3ec
[AggressiveInstCombine] Make cttz fold more resilient to non-array geps (#150896)
Similar to #150639, this fixes the AggressiveInstCombine fold that
converts tables to cttz instructions when the gep types are not array
types, i.e. `gep i16 @glob, i64 %idx` instead of
`gep [64 x i16] @glob, i64 0, i64 %idx`.
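
For context, the source-level idiom behind such table lookups is typically a de Bruijn multiply-and-shift. The sketch below is an illustration of that idiom only; the function names are hypothetical, and the table is built at run time to keep the example short, whereas code the fold targets would use a constant global table. Whether a given compiler emits IR that the fold recognises is not claimed here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Builds the 32-entry lookup table for the de Bruijn constant 0x077CB531.
static std::array<uint8_t, 32> buildCttzTable() {
  std::array<uint8_t, 32> Table{};
  for (uint32_t I = 0; I < 32; ++I)
    Table[((uint32_t{1} << I) * 0x077CB531u) >> 27] = static_cast<uint8_t>(I);
  return Table;
}

// Table-based count-trailing-zeros. X must be non-zero.
unsigned tableCttz(uint32_t X) {
  assert(X != 0 && "cttz of zero has no table entry");
  static const std::array<uint8_t, 32> Table = buildCttzTable();
  uint32_t LowBit = X & (0u - X);             // isolate the lowest set bit
  return Table[(LowBit * 0x077CB531u) >> 27]; // top 5 bits index the table
}
```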
2025-07-31 16:53:55 +01:00
Florian Hahn
99d70e09a9
[SCEV] Allow adds of constants in tryToReuseLCSSAPhi. (#150693)
Update the logic added in
https://github.com/llvm/llvm-project/pull/147824 to also allow adds of
constants. There are a number of cases where this can help remove
redundant phis and replace some computation with a ptrtoint (which
likely is free in the backend).

PR: https://github.com/llvm/llvm-project/pull/150693
2025-07-31 16:33:25 +01:00
Luke Lau
08c5944222
[VPlan] Fix header phi VPInstruction verification. NFC (#151472)
Noticed this when checking the invariant that all phis in the header
block must be header phis. I think there's a missing set of parentheses
here, since otherwise it only performs the cast<VPInstruction> when
RecipeI isn't a VPInstruction.
2025-07-31 23:09:20 +08:00
Nikita Popov
a71909156e
[InstCombine] Set flags when canonicalizing GEP indices (#151516)
When truncating, set nsw/nuw based on nusw/nuw. When extending, use zext
nneg if nusw+nuw.

Proof: https://alive2.llvm.org/ce/z/JA2Yzr
2025-07-31 15:58:04 +02:00
LU-JOHN
a757f23404
[SimplifyCFG] Extend jump-threading to allow live local defs (#135079)
Extend jump-threading to allow local defs that are live outside of the
threaded block. Allow threading to destinations where the local defs are
not live.

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2025-07-31 09:44:14 -04:00
Samuel Tebbs
339b0a1d74 [LV][NFCI] Format fcc419b05f62 2025-07-31 14:37:59 +01:00
Samuel Tebbs
fcc419b05f [LV][NFCI] Swap reduction recipe operand order
https://github.com/llvm/llvm-project/pull/147026 will enable sub
reductions, which require that the phi value is the first operand since
they aren't commutative. This re-orders the operands when executing
reductions, which actually matches other existing code in
VPReductionRecipe::execute.
2025-07-31 14:35:10 +01:00
Nathan Gauër
67273393b1
[VectorCombine][TTI] Prevent extract/ins rewrite to GEP (#150216)
Using GEP to index into a vector is not disallowed, but not recommended.
The SPIR-V backend needs to generate structured access into types, which
is impossible with an untyped GEP instruction unless we add more info to
the IR. Finding a solution is a work-in-progress, but in the meantime,
we'd like to reduce the number of failures.

Preventing this optimization from rewriting extract/insert
instructions into a GEP helps us lower more code to SPIR-V. This change
should be OK as it's only active when targeting SPIR-V and disables a
non-recommended transformation.

Related to #145002
2025-07-31 14:14:00 +02:00
Ramkumar Ramachandra
b7d00b827e
[VPlan] Uniformly use VPlanPatternMatch in transforms (NFC) (#151488) 2025-07-31 12:01:40 +01:00
Ramkumar Ramachandra
20f6ec4b29
[VPlan] Make VPBuilder APIs uniformly take ArrayRef (NFC) (#151484) 2025-07-31 11:33:04 +01:00
Nikita Popov
16d73839b1
[InstCombine] Support folding intrinsics into phis (#151115)
Call foldOpIntoPhi() for speculatable intrinsics. We already do this for
FoldOpIntoSelect().

Among other things, this partially subsumes
https://github.com/llvm/llvm-project/pull/149858.
2025-07-31 12:32:37 +02:00
Mel Chen
6752415ce8
[VectorUtils] Simplify the code by new function InterleaveGroup::isFull. nfc (#151112) 2025-07-31 16:02:53 +08:00
Benjamin Maxwell
cd16c706ba
[IRCE] Use function_ref<> instead of optional<function_ref<>> (NFC) (#151308)
llvm::function_ref<> is nullable.
2025-07-31 07:56:05 +01:00
Peter Collingbourne
ff38981a58
LTO: Redesign the CFI !aliases metadata.
With the current aliases metadata we lose information about which groups
of aliases survive symbol resolution. This causes various problems such
as #150075 where symbol resolution breaks the link between alias groups.

In this redesign of the aliases metadata, we stop representing the
individual aliases in !aliases. Instead, the individual aliases are
represented in !cfi.functions in the same way as functions, and the
alias groups (i.e. groups of symbols with the same address) are stored
in !aliases. At symbol resolution time, we filter out all non-prevailing
members of !aliases; the resulting set is used by LowerTypeTests to
recreate the aliases.

With this change it is now possible for a jump table entry to refer
to an alias in one of the ThinLTO object files (e.g. if a function is
non-prevailing but its alias is prevailing), so instead of deleting them,
rename them with the ".cfi" suffix.

Fixes #150070.

Fixes #150075.

Reviewers: teresajohnson, vitalybuka

Reviewed By: vitalybuka

Pull Request: https://github.com/llvm/llvm-project/pull/150690
2025-07-30 14:04:11 -07:00
zGoldthorpe
71d6762309
[InstCombine] Added pattern for recognising the construction of packed integers. (#147414)
This patch extends the instruction combiner to simplify the construction
of a packed scalar integer from a vector type, such as:
```llvm
target datalayout = "e"

define i32 @src(<4 x i8> %v) {
  %v.0 = extractelement <4 x i8> %v, i32 0
  %z.0 = zext i8 %v.0 to i32

  %v.1 = extractelement <4 x i8> %v, i32 1
  %z.1 = zext i8 %v.1 to i32
  %s.1 = shl i32 %z.1, 8
  %x.1 = or i32 %z.0, %s.1

  %v.2 = extractelement <4 x i8> %v, i32 2
  %z.2 = zext i8 %v.2 to i32
  %s.2 = shl i32 %z.2, 16
  %x.2 = or i32 %x.1, %s.2

  %v.3 = extractelement <4 x i8> %v, i32 3
  %z.3 = zext i8 %v.3 to i32
  %s.3 = shl i32 %z.3, 24
  %x.3 = or i32 %x.2, %s.3

  ret i32 %x.3
}

; ===============

define i32 @tgt(<4 x i8> %v) {
  %x.3 = bitcast <4 x i8> %v to i32
  ret i32 %x.3
}
```

Alive2 proofs (little-endian):
[YKdMeg](https://alive2.llvm.org/ce/z/YKdMeg)
Alive2 proofs (big-endian):
[vU6iKc](https://alive2.llvm.org/ce/z/vU6iKc)
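
A source-level analogue of the two IR functions above, as a hedged sketch with hypothetical function names. `packByShifts` mirrors `@src`; `packByBitcast` mirrors `@tgt` and agrees with it only on a little-endian host, matching the `target datalayout = "e"` line in the IR.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Mirrors @src: zero-extend each lane, shift it into place, OR the parts.
uint32_t packByShifts(const std::array<uint8_t, 4> &V) {
  uint32_t X = 0;
  for (unsigned I = 0; I < 4; ++I)
    X |= static_cast<uint32_t>(V[I]) << (8 * I);
  return X;
}

// Mirrors @tgt: reinterpret the four bytes as a single 32-bit integer.
// The result equals packByShifts(V) only on little-endian hosts.
uint32_t packByBitcast(const std::array<uint8_t, 4> &V) {
  uint32_t X;
  std::memcpy(&X, V.data(), sizeof X);
  return X;
}
```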
2025-07-30 10:58:49 -06:00
Nikita Popov
385fe30ee0
[InstCombine] Strip trailing zero GEP indices (#151338)
Zero indices at the end do not change the GEP offset and can be removed.

(Doing the same at the start requires adjusting the source element
type.)
2025-07-30 17:55:00 +02:00
Nikita Popov
2672719a09
[InstCombine] Don't handle non-canonical index type in icmp of load fold (#151346)
We should just bail out and wait for it to be canonicalized. The current
implementation could emit a trunc without actually performing the
transform.
2025-07-30 17:52:08 +02:00
Thurston Dang
56944e606a
[msan] Approximately handle AVX Galois Field Affine Transformation (#150794)
e.g.,
      <16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8>, <16 x i8>, i8)
      <32 x i8> @llvm.x86.vgf2p8affineqb.256(<32 x i8>, <32 x i8>, i8)
      <64 x i8> @llvm.x86.vgf2p8affineqb.512(<64 x i8>, <64 x i8>, i8)
       Out                                    A          x          b
where A and x are packed matrices, b is a vector, Out = A * x + b in
GF(2)

Multiplication in GF(2) is equivalent to bitwise AND. However, the
matrix computation also includes a parity calculation.

For the bitwise AND of bits V1 and V2, the exact shadow is:
Out_Shadow = (V1_Shadow & V2_Shadow) | (V1 & V2_Shadow) | (V1_Shadow &
V2)

We approximate the shadow of gf2p8affine using:
  Out_Shadow =   _mm512_gf2p8affine_epi64_epi8(x_Shadow, A_shadow, 0)
               | _mm512_gf2p8affine_epi64_epi8(x, A_shadow, 0)
               | _mm512_gf2p8affine_epi64_epi8(x_Shadow, A, 0)
               | _mm512_set1_epi8(b_Shadow)

This approximation has false negatives: if an intermediate dot-product
contains an even number of 1's, the parity is 0.

It has no false positives.
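
The exact AND shadow rule above, written out for 8-bit values as a small sketch (the function name is hypothetical; a shadow bit of 1 means "uninitialized"). An initialized 0 bit in either operand forces the result bit to 0, so that result bit is initialized even if the other operand is poisoned.

```cpp
#include <cstdint>

// Out_Shadow = (V1_Shadow & V2_Shadow) | (V1 & V2_Shadow) | (V1_Shadow & V2)
uint8_t andShadow(uint8_t V1, uint8_t V1Shadow, uint8_t V2, uint8_t V2Shadow) {
  return static_cast<uint8_t>((V1Shadow & V2Shadow) | (V1 & V2Shadow) |
                              (V1Shadow & V2));
}
```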

Updates the test from https://github.com/llvm/llvm-project/pull/149258
2025-07-30 08:06:50 -07:00
Kazu Hirata
8f9b01884d
[Coroutines] Remove a redundant call to std::unique_ptr<T>::get (NFC) (#151284) 2025-07-30 07:30:37 -07:00
Nikita Popov
8a09adc22a
[InstCombine] Split GEPs with multiple variable indices (#137297)
Split GEPs that have more than one variable index into two. This is in
preparation for the ptradd migration, which will not support multi-index
GEPs.

This also enables the split off part to be CSEd and LICMed.
2025-07-30 12:54:06 +02:00
Shih-Po Hung
cc8c941e17
[VPlan] Convert EVL loops to variable-length stepping after dissolution (#147222)
Loop regions require fixed-length steps and rounded-up trip counts, but
after dissolution creates explicit control flow, EVL loops can leverage
variable-length stepping with original trip counts.

This patch adds a post-dissolution transform pass to convert EVL loops
from fixed-length to variable-length stepping.
2025-07-30 16:50:57 +08:00
Luke Lau
b663e563cc
[VPlan] Fix header masks in EVL tail folding (#150202)
With EVL tail folding, the EVL may not always be VF on the
second-to-last iteration.

Recipes that have been converted to VP intrinsics via optimizeMaskToEVL
account for this, but recipes that are left behind will still use the
old header mask which may end up having a different vector length.

This is effectively the same as #95368, and fixes this by converting
header masks from icmp ule wide-canonical-iv, backedge-trip-count ->
icmp ult step-vector, evl. Without it, recipes that fall through
optimizeMaskToEVL may use the wrong vector length, e.g. in #150074 and
#149981.
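
A scalar sketch of the two header mask forms, with hypothetical function names. As an assumed example, take VF = 4, a trip count of 6, IV = 0, and a target that returns EVL = 3 on the second-to-last iteration: the old mask enables all four lanes, while the EVL-based mask enables only three.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Old header mask: icmp ule wide-canonical-iv, backedge-trip-count.
std::vector<bool> maskFromIV(uint64_t IV, uint64_t TC, unsigned VF) {
  assert(TC >= 1 && "backedge-trip-count is TC - 1");
  std::vector<bool> Mask(VF);
  for (unsigned Lane = 0; Lane < VF; ++Lane)
    Mask[Lane] = IV + Lane <= TC - 1;
  return Mask;
}

// New header mask: icmp ult step-vector, evl.
std::vector<bool> maskFromEVL(uint64_t EVL, unsigned VF) {
  std::vector<bool> Mask(VF);
  for (unsigned Lane = 0; Lane < VF; ++Lane)
    Mask[Lane] = Lane < EVL;
  return Mask;
}
```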

We really need to split off optimizeMaskToEVL into
VPlanTransforms::optimize and move transformRecipestoEVLRecipes into
tryToBuildVPlanWithVPRecipes, so we don't mix up what is needed for
correctness and what is needed to optimize away the mask computations.
We should be able to still generate a correct albeit suboptimal VPlan
without running optimizeMaskToEVL. I've added a TODO for this, which I
think we can do after #148274.

Fixes #150197
2025-07-30 11:31:04 +08:00
Florian Hahn
55f9eccee9
[LV] Revert back to use Loop::isLoopInvariant in isPredicatedInst. (#150828)
This partially reverts https://github.com/llvm/llvm-project/pull/140744,
restoring the original TheLoop->isLoopInvariant check instead of the more
powerful Legal->isInvariant, which uses SCEV.

Using Legal->isInvariant causes a miscompile, because SCEV can prove that the stored value
is loop-invariant, which in turn converts the store to a uniform store.
But in VPlan, we aren't yet able to determine that the stored value is
loop-invariant, so we extract the last lane, which is incorrect, because
it does not account for the mask of the store.

Restoring the original code is a safe fix and avoids this subtle
divergence.

Fixes https://github.com/llvm/llvm-project/issues/149347.

PR: https://github.com/llvm/llvm-project/pull/150828
2025-07-29 20:32:31 +01:00
Nikita Popov
1a974527bb [NewGVN] Slightly clean up the predicate swap handling (NFC)
I found the naming here confusing. This is not something generic
for intrinsics, it's specifically about predicates, and serves to
remember a previous swap choice.
2025-07-29 17:19:24 +02:00
Teresa Johnson
d4562a1991
[MemProf] Use DenseMap for call map (NFC) (#151161)
There is no reason to use std::map for the call maps maintained for
function clones during function clone assignment, as we don't iterate
over them and don't need deterministic ordering, so use the more
efficient DenseMap.
2025-07-29 08:18:31 -07:00
Nikita Popov
fa6965f722 [SCCP] Extract PredicateInfo handling into separate method (NFC) 2025-07-29 16:36:33 +02:00
Nikita Popov
74001beded [DSE] Use MemoryLocation API to get lifetime.end size (NFC) 2025-07-29 15:46:49 +02:00
Paul Walker
3ede2decbe
[LLVM][LV] Improve UF calculation for vscale based scalar loops. (#146102)
Update getSmallConstantTripCount() to return scalable ElementCount
values that are used to accurately determine the maximum value for UF,
namely:

  TripCount / VF ==> (X * VScale) / (Y * VScale) ==> X / Y

This improves the chances of being able to remove the scalar loop and
also fixes an issue where a UF=2 is chosen for a scalar loop with
exactly VF(= X * VScale) iterations.
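
The cancellation can be sketched with a hypothetical `Count` type standing in for a possibly scalable element count (this is not LLVM's `ElementCount` API). When the trip count and VF are both multiples of vscale, the number of vector iterations is known without knowing vscale, so a loop with exactly VF iterations yields 1 and leaves no room for UF=2.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in: the runtime value is Min when fixed, or
// Min * vscale when Scalable is set.
struct Count {
  uint64_t Min;
  bool Scalable;
};

// TripCount / VF, valid when both share the same scalability so that any
// vscale factor cancels out.
uint64_t vectorIterations(Count TripCount, Count VF) {
  assert(TripCount.Scalable == VF.Scalable && "vscale must cancel");
  assert(VF.Min != 0 && "VF must be non-zero");
  return TripCount.Min / VF.Min;
}
```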
2025-07-29 12:49:38 +01:00
Nikita Popov
ef51514c38
[FunctionAttrs] Don't bail out on unknown calls (#150958)
When inferring attributes, we should not bail out early on unknown calls
(such as virtual calls), as we may still have call-site attributes that
can be used for inference.

Fixes https://github.com/llvm/llvm-project/issues/150817.
2025-07-29 11:45:31 +02:00
David Sherwood
6fbc397964
[IR] Add new CreateVectorInterleave interface (#150931)
This PR adds a new interface to IRBuilder called CreateVectorInterleave,
which can be used to create vector.interleave intrinsics of factors 2-8.

For convenience I have also moved getInterleaveIntrinsicID and
getDeinterleaveIntrinsicID from VectorUtils.cpp to Intrinsics.cpp where
it can be used by IRBuilder.
2025-07-29 08:47:07 +01:00
Kazu Hirata
255bba0136 [memprof] Fix a warning
This patch fixes:

  llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:4771:9:
  error: non-void lambda does not return a value in all control paths
  [-Werror,-Wreturn-type]
2025-07-28 19:35:02 -07:00
Teresa Johnson
f3761ab340
Reapply "[MemProf] Ensure all callsite clones are assigned a function clone" (#150856) (#151055)
This reverts commit 314e22bcab2b0f3d208708431a14215058f0718f, reapplying
PR150735 with a fix for the unstable iteration order exposed by the new
tests (PR151039).
2025-07-28 17:04:45 -07:00
Alex Voicu
6bcff9eb13
[HIPSTDPAR] Add handling for math builtins (#140158)
When compiling in `--hipstdpar` mode, the builtins corresponding to the
standard library might end up in code that is expected to execute on the
accelerator (e.g. by using the `std::` prefixed functions from
`<cmath>`). We do not have uniform handling for this in AMDGPU, and the
errors that obtain are quite arcane. Furthermore, the user-space changes
required to work around this tend to be rather intrusive.

This patch adds an additional `--hipstdpar` specific pass which forwards
to the run time component of HIPSTDPAR the intrinsics / libcalls which
result from the use of the math builtins, and which are not properly
handled. In the long run we will want to stop relying on this and handle
things in the compiler, but it is going to be a rather lengthy journey,
which makes this medium term escape hatch necessary.

The paired change in the run time component is here
<https://github.com/ROCm/rocThrust/pull/551>.
2025-07-28 22:29:31 +01:00
Teresa Johnson
ced3b90738
[MemProf] Change map to vector to avoid unstable iteration (#151039)
We iterate over a std::map indexed by FuncInfo, which is a pair of a
pointer and a clone number. In the ThinLTO case, this isn't an issue as
the function pointer always points to the same FunctionSummary object.
However, for regular LTO, this is a pointer to a Function object, which
is different for each clone. This will lead to unstable iteration order.

This was exposed in a test case added for PR150735, which added a new
instance of iteration over this map.

Since these function clones are added and numbered sequentially, change
this to a vector indexed by clone number, which points to a structure
containing the clone FuncInfo and the call map (the old map's key and
value, respectively).
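
A minimal sketch of the resulting data structure, using hypothetical stand-in types (`Func`, `CallMap`, `CloneEntry`) rather than the real MemProf classes. Because entries are appended in clone-number order, iteration order no longer depends on pointer values, even when every clone has its own distinct function object as in the regular LTO case.

```cpp
#include <map>
#include <memory>
#include <utility>
#include <vector>

struct Func {};                                // stand-in for a Function object
using FuncInfo = std::pair<Func *, unsigned>;  // (function, clone number)
using CallMap = std::map<int, int>;            // placeholder for the call map

// One vector slot per clone: the old map's key and value side by side.
struct CloneEntry {
  FuncInfo Info;
  CallMap Calls;
};

// Creates NumClones clones, each with a distinct Func object, and returns
// the clone numbers in the order the vector is iterated.
std::vector<unsigned> iterationOrder(unsigned NumClones) {
  std::vector<std::unique_ptr<Func>> Storage;
  std::vector<CloneEntry> Clones;
  for (unsigned I = 0; I < NumClones; ++I) {
    Storage.push_back(std::make_unique<Func>());
    Clones.push_back(CloneEntry{{Storage.back().get(), I}, {}});
  }
  std::vector<unsigned> Order;
  for (const CloneEntry &E : Clones)
    Order.push_back(E.Info.second); // always 0, 1, 2, ...
  return Order;
}
```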
2025-07-28 14:20:49 -07:00
Florian Hahn
c93d166c58
[VPlan] Simplify (MUL %x, 0) -> 0.
Simplify trivial multiplies.
https://alive2.llvm.org/ce/z/DabRkA
2025-07-28 21:50:57 +01:00
Ellis Hoag
819f020b28
Use F.hasOptSize() instead of checking optsize directly (#147348) 2025-07-28 08:38:52 -07:00
Florian Hahn
f9f68af4b8
[SCEV] Make sure LCSSA is preserved when re-using phi if needed.
If we insert a new add instruction, it may introduce a new use outside
the loop that contains the phi node we re-use. Use fixupLCSSAFormFor to
fix LCSSA form, if needed.

This fixes a crash reported in
https://github.com/llvm/llvm-project/pull/147824#issuecomment-3124670997.
2025-07-28 16:24:46 +01:00
Luke Lau
92d09245d6
[VPlan] Fall back to scalar epilogue if possible when EVL isn't legal (#150908)
When enabling predicated vectorization by default on RISC-V, there's a
bunch of performance regressions on llvm-test-suite's LoopInterleaving
microbenchmarks:
https://lnt.lukelau.me/db_default/v4/nts/788?show_delta=yes&show_previous=yes&show_stddev=yes&show_mad=yes&show_all=yes&show_all_samples=yes&show_sample_counts=yes&show_small_diff=yes&num_comparison_runs=0&test_filter=&test_min_value_filter=&aggregation_fn=min&MW_confidence_lv=0.05&compare_to=791&baseline=730&submit=Update

Most of these regressions stem from the interleave_count pragma, which
causes EVL tail folding interleaving to be unsupported (since we don't
support unrolling with EVL).

Currently if DataWithEVL isn't legal we fall back to DataWithoutLaneMask
as the tail folding style, but this is very slow on RISC-V.

The order of performance roughly is something like:

DataWithEVL > None (scalar-epilogue) > Data[WithoutLaneMask]

So this patch tries to prevent the regressions by falling back to a
scalar epilogue where possible, i.e. the existing vectorization we have
today. Note that we may still need to fall back to DataWithoutLaneMask,
e.g. if the trip count is low or if it's forced by
-prefer-predicate-over-epilogue=predicate-dont-vectorize.
2025-07-28 20:10:36 +08:00
Florian Hahn
2f2df751d4
[LV] Use SCEV::getElementCount in selectEpilogueVectorizationFactor. (#150018)
Follow-up to https://github.com/llvm/llvm-project/pull/149789 to use
getElementCount to compute the remaining iterations in
selectEpilogueVectorizationFactor.

PR: https://github.com/llvm/llvm-project/pull/150018
2025-07-28 12:12:27 +01:00
Adar Dagan
1afb42bc10
[InstCombine] Let shrinkSplatShuffle act on vectors of different lengths (#148593)
shrinkSplatShuffle in InstCombine would only move truncs up through
shuffles if those shuffles' inputs had the exact same type as their
output. This PR weakens this constraint to only require that the
scalar types of the input and output match.
2025-07-28 13:00:37 +02:00
Madhur Amilkanthwar
90de4a4ac9
[LoopFusion] Fix sink instructions (#147501)
If we have instructions in the second loop's preheader which can be sunk, we
should also adjust PHI nodes to receive values from the fused loop's latch block.

Fixes #128600
2025-07-28 12:08:43 +05:30
Teresa Johnson
314e22bcab
Revert "[MemProf] Ensure all callsite clones are assigned a function clone" (#150856)
Reverts llvm/llvm-project#150735 due to bot failures that I need to
investigate
2025-07-27 15:55:22 -07:00
Teresa Johnson
0f2484a740
[MemProf] Ensure all callsite clones are assigned a function clone (#150735)
Fix a bug in function assignment where we were not assigning all
callsite clones to a function clone. This led to incorrect call updates
because multiple callsite clones could look like they were assigned to
the same function clone.

Add in a stat and debug message to help identify and debug cases where
this is still happening.
2025-07-27 11:48:30 -07:00
Florian Hahn
f8b1c7333f
[VPlan] Add getContext helper to VPlan (NFC). 2025-07-27 18:53:53 +01:00