41823 Commits

Author SHA1 Message Date
Nikita Popov
a22d1b5d43
[ConstantInt] Add ImplicitTrunc parameter to getSigned() (NFC) (#172875)
For consistency with `ConstantInt::get()`, add an ImplicitTrunc
parameter to `ConstantInt::getSigned()` as well. It currently defaults
to true and will be flipped to false in the future (by #171456).
2025-12-19 09:48:26 +01:00
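The ImplicitTrunc work in the `getSigned()` commit above (and in the related "avoid implicit truncation" commits in this log) comes down to a range check: does a 64-bit value fit the target width without dropping bits? The following is a minimal, self-contained sketch of that check. `fitsUnsigned` and `fitsSigned` are hypothetical helper names chosen for illustration, not LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not LLVM API): they model the range checks a
// non-truncating constant constructor would have to perform.

// True if V is representable as an unsigned integer of the given width.
bool fitsUnsigned(uint64_t V, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 64 && "width out of range");
  return Bits == 64 || V < (uint64_t{1} << Bits);
}

// True if V is representable as a two's-complement integer of the given width.
bool fitsSigned(int64_t V, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 64 && "width out of range");
  if (Bits == 64)
    return true;
  const int64_t Min = -(int64_t{1} << (Bits - 1));
  const int64_t Max = (int64_t{1} << (Bits - 1)) - 1;
  return V >= Min && V <= Max;
}
```

Under these assumptions, -1 fits an 8-bit type when treated as signed, but the same bit pattern passed as a `uint64_t` does not fit without truncation. That is presumably why negative values are routed through a signed entry point.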
Alex MacLean
a40f444265
[NVPTX] Add support for barrier.cta.red.* instructions (#172541)
This change adds full support for the ptx `barrier.cta.red` instruction,
following the same conventions as are already used for
`barrier.cta.sync` and `barrier.cta.arrive`.

In addition this MR removes the following intrinsics which are no longer
needed:
* llvm.nvvm.barrier0.popc -->
  llvm.nvvm.barrier.cta.red.popc.aligned.all(0, c)
* llvm.nvvm.barrier0.and -->
  llvm.nvvm.barrier.cta.red.and.aligned.all(0, z)
* llvm.nvvm.barrier0.or -->
  llvm.nvvm.barrier.cta.red.or.aligned.all(0, z)
2025-12-18 18:06:27 -08:00
Stefan Schmidt
759fb0a224
[llvm][LLD][COFF] Add fat-lto-object support for COFF targets (#165529)
This adds support for FatLTO to COFF targets in clang and lld.

The changes are adapted from
610fc5cbcc
and
14e3bec8fc
but much smaller because it just needed the COFF-specific parts wired
in, and I tried my best to adapt the pre-existing ELF tests for the COFF
version.

My main goal is to be able to use this for shipping pre-built
https://github.com/XboxDev/nxdk container images someday, which uses the
`i386-pc-win32` target.
2025-12-18 22:53:25 +02:00
Mel Chen
f196b1d66f
[VPlan] Extract reverse operation for reverse accesses (#146525)
This patch introduces VPInstruction::Reverse and extracts the reverse
operations of loaded/stored values from reverse memory accesses. This
extraction facilitates future support for permutation elimination within
VPlan.
2025-12-18 14:57:48 +00:00
Nikita Popov
e957c81750
[InstCombine] Use getSigned() for negative number in shift transform
Fixes the issue reported at:
https://github.com/llvm/llvm-project/pull/171456#issuecomment-3668263635
2025-12-18 11:00:39 +01:00
Simon Pilgrim
24d9550b27
[VectorCombine] foldShuffleOfBinops - if both operands are the same don't duplicate the total new cost (#172719)
If we're shuffling/concatenating the same operands then ensure we don't
duplicate the total cost, ensure we reuse the final shuffle and
recognise that we reduce the total instruction count (so fold even when
NewCost == OldCost, not just NewCost < OldCost).
2025-12-18 07:03:06 +00:00
Florian Hahn
9cc1585b13
[VPlan] Add VPBlockUtils::transferSuccessors (NFCI).
Add a new helper to transfer successors to a new, unconnected VPBB.
Helps to simplify existing code, and prepare for upcoming changes.
2025-12-17 22:48:22 +00:00
Florian Hahn
bab0dc4d48
Reapply "[LV] Mark checks as never succeeding for high cost cutoff."
Reapply 8a115b6934a90441 with an update to tests handling remarks.

The patch now directly emits a clear remark when we bail out
due to the memory check threshold.

Original message:
When GeneratedRTChecks::create bails out due to exceeding the cost
threshold, no runtime checks are generated and we must not proceed
assuming checks have been generated.

Mark the checks as never succeeding, to make sure we don't try to
vectorize assuming the runtime checks hold. This fixes a case where we
previously incorrectly vectorized assuming runtime checks had been
generated when forcing vectorization via metadata.

Fixes the mis-compile mentioned in
https://github.com/llvm/llvm-project/pull/166247#issuecomment-3631471588
2025-12-17 20:21:49 +00:00
Miloš Poletanović
44a52ea8be
[InstCombine] Fix unsafe PHINode cast and simplify logic in PointerReplacer (#172332)
Fixes #171883.

Basically, if the operand of the phi is an Instruction but it's not
available, the
[condition](1847a4efae/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp (L300))
would just break, and when we reach the
[deferral check](1847a4efae/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp (L313)),
execution would continue even though there is a non-Instruction operand,
leading to a crash in the
[subsequent processing loop](1847a4efae/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp (L320)).
2025-12-17 12:07:40 +00:00
Nikita Popov
dea9ec84a4
[SLSR] Allow implicit truncation for element size
Ideally we'd reject too large types in the IR verifier, but for now
we should follow the usual sext-or-trunc GEP semantics here.
2025-12-17 12:56:23 +01:00
Florian Hahn
eb0c7e752f
[VPlan] Replace BranchOnCount with Compare + BranchOnCond (NFC). (#172181)
Expand BranchOnCount to BranchOnCond + ICmp in convertToConcreteRecipes
to simplify codegen.

PR: https://github.com/llvm/llvm-project/pull/172181
2025-12-16 19:19:31 +00:00
Ramkumar Ramachandra
1c6e5b2d04
[LV] Improve code using VPlan::get{ConstantInt,True} (NFC) (#172471)
2025-12-16 13:03:43 +00:00
Nikita Popov
447c96363a
[SimplifyLibCalls] Avoid implicit truncation in convertStrToInt()
This addresses two implicit truncation issues:
 * For the signed case, pass AsSigned.
 * For the negated unsigned case, truncate explicitly for clarity.
2025-12-16 09:33:47 +01:00
Luke Lau
67d0e21a62
Reapply "[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846)" (#172261)
This reapplies #171846 with a test case and fix for a legacy cost-model
mismatch assertion.

In the previous version of the patch, we only considered the plan to
contain simplifications when it had a VPBlendRecipe and VF.isScalar()
was true.

However for some VPlans we may have a blend with only the first lane
used:

    BLEND ir<%phi> = ir<%foo.res> ir<%bar.res>/ir<%c>
    CLONE ir<%gep> = getelementptr ir<%p>, ir<%phi>
    vp<%5> = vector-pointer ir<%gep>

And in the legacy cost model we cost a blend as a phi if it's uniform:

    // If we know that this instruction will remain uniform, check the cost of
    // the scalar version.
    if (isUniformAfterVectorization(I, VF))
      VF = ElementCount::getFixed(1);

So this replaces the VF.isScalar() check with
vputils::onlyFirstLaneUsed, which matches how the VPlan cost model
mirrored the legacy model beforehand.

A VPInstruction::Select will also emit a scalar select for a vector VF
if only the first lane is used, so this also updates
VPBlendRecipe::computeCost to reflect that too.
2025-12-16 06:30:54 +00:00
Elvis Wang
1eba2cbe72
[LV] Convert uniform-address unmasked scatters to scalar store. (#166114)
This patch optimizes vector scatters that have a uniform (single-scalar)
address by replacing them with "extract-last-lane + scalar store" when
the scatter is unmasked.

Notes:

- The legacy cost model can scalarize a store if both the address and
the value are uniform. In VPlan we materialize the stored value via
ExtractLastLane, so only the address must be uniform.
- Some of the loops won't be vectorized any more, since no vector
instructions will be generated.
2025-12-16 12:24:22 +08:00
Florian Hahn
83eea87a36
[VPlan] Create header phis once, after constructing VPlan0 (NFC). (#168291)
Together with https://github.com/llvm/llvm-project/pull/168289 &
https://github.com/llvm/llvm-project/pull/166099 we can construct header
phis once up front, after creating VPlan0, as the
induction/reduction/first-order-recurrence classification applies across
all VFs.

Depends on https://github.com/llvm/llvm-project/pull/168289 &
https://github.com/llvm/llvm-project/pull/166099 

PR: https://github.com/llvm/llvm-project/pull/168291
2025-12-15 22:12:10 +00:00
Florian Hahn
dbb4f5c2dd
[VPlan] Set VF scale factor in tryToCreatePartialReduction (NFCI).
Split off unrelated change from approved
https://github.com/llvm/llvm-project/pull/168291/ to land separately as
suggested.
2025-12-15 21:18:07 +00:00
Teresa Johnson
4b78647754
[MemProf] Add CalleeGUIDs from profile to existing VP metadata (#171495)
Previously, we only synthesized VP metadata with the callee GUIDs from
the memprof profile if no VP metadata already existed (i.e. from PGO).
With this change we will add in any that are not already in the VP
metadata, also with count 1.
2025-12-15 12:19:56 -08:00
Matt Arsenault
cbb2aa9b2d
InstCombine: Replace some isa<FPMathOperator> with dyn_cast (#172356)
This isa and get flag pattern is essentially an abstracted
isa and dyn_cast, so make this more direct.
2025-12-15 20:10:29 +00:00
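The `isa` to `dyn_cast` cleanup above replaces a "test the type, then cast" pair with a single checked cast. The sketch below shows the same idea in standard C++, using `dynamic_cast` as a stand-in for LLVM's casting templates. The `Value`, `FPMathOp`, and `IntOp` types and the `NoNaNs` flag are simplified, hypothetical models and not the real LLVM classes.

```cpp
#include <cassert>

// Hypothetical, simplified class hierarchy standing in for LLVM's.
struct Value {
  virtual ~Value() = default;
};

struct FPMathOp : Value {
  explicit FPMathOp(bool N) : NoNaNs(N) {}
  bool NoNaNs;
};

struct IntOp : Value {};

// Two-step form: check the dynamic type, then cast again to read the flag.
bool hasNoNaNsTwoStep(const Value &V) {
  if (dynamic_cast<const FPMathOp *>(&V) != nullptr)
    return static_cast<const FPMathOp &>(V).NoNaNs;
  return false;
}

// One-step form: a single checked cast yields the typed pointer or nullptr.
bool hasNoNaNsOneStep(const Value &V) {
  if (const auto *FP = dynamic_cast<const FPMathOp *>(&V))
    return FP->NoNaNs;
  return false;
}

// Debug-build check that both forms give the same answer for V.
void checkFormsAgree(const Value &V) {
  assert(hasNoNaNsTwoStep(V) == hasNoNaNsOneStep(V));
}
```

Both functions return the same result for every input. The one-step form simply avoids naming the type twice.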
Nicolai Hähnle
88bd56597c
VectorCombine: Improve the insert/extract fold in the narrowing case (#168820)
Keeping the extracted element in a natural position in the narrowed
vector has two beneficial effects:

1. It makes the narrowing shuffles cheaper (at least on AMDGPU), which
allows the insert/extract fold to trigger.
2. It makes the narrowing shuffles in a chain of extract/insert
compatible, which allows foldLengthChangingShuffles to successfully
recognize a chain that can be folded.

There are minor X86 test changes that look reasonable to me. The IR
change for AVX2 in
llvm/test/Transforms/VectorCombine/X86/extract-insert-poison.ll
doesn't change the assembly generated by
`llc -mtriple=x86_64-- -mattr=AVX2` at all.
2025-12-15 11:25:51 -08:00
Teresa Johnson
e3c621c50b
[ThinLTO][MemProf] Add option to override max ICP with larger number (#171652)
Adds an option -module-summary-max-indirect-edges, and wiring into the
ICP logic that collects promotion candidates from VP metadata, to
support a larger number of promotion candidates for use in building the
ThinLTO summary. Also use this in the MemProf ThinLTO backend handling
where we perform memprof ICP during cloning.

The new option, essentially off by default, can be used to override the
value of -icp-max-prom, which is checked internally in ICP, with a
larger max value when collecting candidates from the VP metadata.

For MemProf in particular, where we synthesize new VP metadata targets
from allocation contexts, which may not be all that frequent, we need to
be able to include a larger set of these targets in the summary in order
to correctly handle indirect calls in the contexts. Otherwise we will
not set up the callsite graph edges correctly.
2025-12-15 10:16:06 -08:00
Alexey Bataev
b988555812
[SLP]Check if the extractelement is part of other buildvector node before marking for erasing
We need to check whether the extractelement instruction is part of another
buildvector node before marking it for deletion; otherwise the compiler
may reuse the deleted instruction.

Fixes #172221
2025-12-15 09:54:05 -08:00
Matt Arsenault
463c9f08be
InstCombine: Stop using m_c_BinOp for non-commutative ops (#172327)
The previous flow tried both m_BinOp and m_c_BinOp for noncommutative
ops. Seems to have worked out OK though, since there are no test
changes.
2025-12-15 17:57:53 +01:00
Nikita Popov
015ab4e2e4
[Reassociate] Allow implicit truncation when converting adds to mul
It's okay if the number of adds overflows. Explicitly allow implicit
truncation.
2025-12-15 15:44:03 +01:00
Nikita Popov
42a47bf18a
[WPD] Avoid implicit truncation when creating full set
Use the bit mask for the type instead of `~0`, so that we don't
rely on implicit truncation of the top bits.
2025-12-15 15:44:03 +01:00
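The "bit mask for the type instead of `~0`" idea in the WPD commit above can be illustrated with a small standalone function. This is a sketch under the assumption that widths range from 1 to 64 bits. `allOnesForWidth` is a hypothetical name, not something taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// All-ones value for an integer of the given width, with no bits set above
// that width. Passing ~0 instead would rely on the consumer silently
// discarding the upper bits.
uint64_t allOnesForWidth(unsigned Bits) {
  assert(Bits >= 1 && Bits <= 64 && "width out of range");
  // Shifting a 64-bit value by 64 is undefined, so handle that case first.
  return Bits == 64 ? ~uint64_t{0} : (uint64_t{1} << Bits) - 1;
}
```

For an 8-bit type this yields `0xFF`, which already fits the type, so no truncation is needed when the constant is created.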
Nikita Popov
818c9138f9
[SimplifyCFG] Use getSigned() for signed value
Base is a signed quantity derived via getSExtValue(), so we should
use getSigned().
2025-12-15 15:44:03 +01:00
Bala_Bhuvan_Varma
0b2fe07e6b
[VectorCombine] Prevent redundant cost computation for repeated operand pairs in foldShuffleOfIntrinsics (#171965)
This PR resolves [#170867](https://github.com/llvm/llvm-project/issues/170867).

The existing code recomputes the cost of creating a shuffle instruction
even for repeated intrinsic operand pairs. This inflates newCost, so the
fold may be rejected as unprofitable.

With this change, the newCost calculation skips an operand pair that has
already been counted, and the transformed code reuses the shuffle
instruction already created for a repeated operand pair.
2025-12-15 14:42:41 +00:00
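The cost deduplication described in the foldShuffleOfIntrinsics commit above can be sketched as "charge each distinct operand pair once". The code below is an illustrative model only: operand pairs are plain integer pairs rather than LLVM values, every shuffle is assumed to have the same cost, and `totalShuffleCost` is a hypothetical name.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Stand-in for a pair of operands; the real code works on LLVM values.
using OperandPair = std::pair<int, int>;

// Sum a per-shuffle cost over the operand pairs, counting each distinct
// pair only once, since a repeated pair can reuse the shuffle already built.
int totalShuffleCost(const std::vector<OperandPair> &Pairs,
                     int CostPerShuffle) {
  assert(CostPerShuffle >= 0 && "cost must be non-negative");
  std::set<OperandPair> Seen;
  int Cost = 0;
  for (const OperandPair &P : Pairs)
    if (Seen.insert(P).second) // true only the first time P is inserted
      Cost += CostPerShuffle;
  return Cost;
}
```

With three operand pairs of which two are identical, the total is two shuffles' worth of cost rather than three, which makes the fold more likely to be judged profitable.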
int-zjt
72f3995363
[CodeExtractor] Optimize PHI incoming value removal using removeIncomingValueIf() (NFC) (#171956)
2025-12-15 20:00:54 +08:00
int-zjt
c9c46a0820
[CloneFunction] Optimize PHI incoming value removal using reverse iteration (NFC) (#171955)
2025-12-15 20:00:25 +08:00
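The CloneFunction commit above has no description beyond its title, so the following only illustrates the general "remove by index while iterating in reverse" idiom on a `std::vector`. It does not use the PHINode API, and `removeIncomingFrom` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Return a copy of IncomingBlocks with every entry equal to DeadBlock removed.
// Indices are visited from the back, so erasing index I only shifts elements
// that have already been visited and never skips an unvisited one.
std::vector<int> removeIncomingFrom(std::vector<int> IncomingBlocks,
                                    int DeadBlock) {
  for (std::size_t I = IncomingBlocks.size(); I-- > 0;)
    if (IncomingBlocks[I] == DeadBlock)
      IncomingBlocks.erase(IncomingBlocks.begin() +
                           static_cast<std::ptrdiff_t>(I));
  // Postcondition: no entry for DeadBlock remains.
  assert(std::count(IncomingBlocks.begin(), IncomingBlocks.end(),
                    DeadBlock) == 0);
  return IncomingBlocks;
}
```

A forward loop that erases at index `I` and then increments `I` would skip the element that slid into the vacated slot. Walking backwards avoids that without extra bookkeeping.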
Ramkumar Ramachandra
0636225b93
[VPlan] Directly unroll VectorPointerRecipe (#168886)
In an effort to get rid of VPUnrollPartAccessor and directly unroll
recipes, start by directly unrolling VectorPointerRecipe, allowing for
VPlan-based simplifications and simplification of the corresponding
execute.
2025-12-15 10:54:06 +00:00
Florian Hahn
bcbbe2c2bc
[VPlan] Pass backedge value directly to FOR and reduction phis (NFC).
Pass backedge values directly to VPFirstOrderRecurrencePHIRecipe and
VPReductionPHIRecipe, as they must be provided and available.

Split off from https://github.com/llvm/llvm-project/pull/168291.
2025-12-14 20:59:22 +00:00
Florian Hahn
53cf22f3a1
[VPlan] Simplify live-ins early using SCEV. (#155304)
Use SCEV to simplify all live-ins during VPlan0 construction. This
enables us to remove special SCEV queries when constructing
VPWidenRecipes and improves results in some cases.

This leads to simplifications in a number of cases in real-world
applications (~250 files changed across LLVM, SPEC, ffmpeg)

PR: https://github.com/llvm/llvm-project/pull/155304
2025-12-14 20:15:05 +00:00
int-zjt
fd95803a35
[LoopRotate] Simplify PHINode::removeIncomingValue usage (NFC) (#171958)
2025-12-14 09:43:52 +08:00
Luke Lau
4ea8157773
Revert "[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846)"
This reverts commit fd5f53aa9b21060063484fc6c346316a34a6464c.

It's triggering legacy cost model assertions reported in
https://github.com/llvm/llvm-project/pull/171846#issuecomment-3647640019
2025-12-13 20:05:34 +08:00
Nicolai Hähnle
54ae1222ef
VectorCombine: Fold chains of shuffles fed by length-changing shuffles (#168819)
Such chains can arise from folding insert/extract chains.
2025-12-12 13:53:03 -08:00
Florian Hahn
e6e3f94b5c
[VPlan] Re-add clarifying comment regarding part to extract. (NFC)
Re-add and emphasize comment regarding extracting from the last part, as
suggested post-commit in https://github.com/llvm/llvm-project/pull/171145.
2025-12-12 21:51:33 +00:00
Florian Hahn
333ee931df
[LV] Update stale comment after 4e05d702f02a. (NFC)
Address post-commit suggestion, update stale comment after 4e05d702f.
2025-12-12 21:36:56 +00:00
Florian Hahn
0171e881b5
[VPlan] Strip stray whitespace when printing VPWidenIntOrFpInduction.
printFlags takes care of inserting the needed spaces, so remove the
stray extra whitespace.
2025-12-12 21:28:50 +00:00
Alireza Torabian
9bc38df587
[LoopFusion] Simplifying the legality checks (#171889)
Since loop fusion currently only supports adjacent loops, the checks in
this pass can be simplified. By removing the
`isControlFlowEquivalent` check, this patch fixes multiple issues
including #166560, #166535, #165031, #80301 and #168263.

Now only the sequential/adjacent candidates are collected in the same
list. This patch is the implementation of approach 2 discussed in post
#171207.
2025-12-12 15:09:34 -05:00
Seraphimt
112a6126ef
Fixes non-functional changes found by static analyzer (#171197)
As per @arsenm's instructions, I've separated the non-functional
changes from https://github.com/llvm/llvm-project/pull/169958.
Afterwards I'll tackle the functional ones one by one. I hope I did
everything right this time.

Full descriptions in the article:
https://pvs-studio.com/en/blog/posts/cpp/1318/
3. Array overrun is possible.
The PVS-Studio warning: V557 Array overrun is possible. The value of
'regIdx' index could reach 31. VEAsmParser.cpp 696
10. Excessive check.
The PVS-Studio warning: V547 Expression 'IsLeaf' is always false.
PPCInstrInfo.cpp 419
11. Doubling the same check.
The PVS-Studio warning: V581 The conditional expressions of the 'if'
statements situated alongside each other are identical. Check lines:
5820, 5823. PPCInstrInfo.cpp 5823
15. Excessive check.
The PVS-Studio warning: V547 Expression 'i != e' is always true.
MachineFunction.cpp 1444
17. Excessive assignment.
The PVS-Studio warning: V1048 The 'FirstOp' variable was assigned the
same value. MachineInstr.cpp 1995
18. Excessive check.
The PVS-Studio warning: V547 Expression 'AllSame' is always true.
SimplifyCFG.cpp 1914
19. Excessive check.
The PVS-Studio warning: V547 Expression 'AbbrevDecl' is always true.
LVDWARFReader.cpp 398
2025-12-12 20:03:02 +01:00
Craig Topper
ef21740781
[LoopPeel] Check for onlyAccessesInaccessibleMemory instead of llvm.assume in peelToTurnInvariantLoadsDereferenceable. (#171910)
onlyAccessesInaccessibleMemory can't alias with a load. This allows us
to ignore more intrinsics than llvm.assume.

Follow up from #171547
2025-12-12 10:45:41 -08:00
Mircea Trofin
ff3dcd06a9
[GlobalOpt][profcheck] Mark as unknown the branch weights of global shrunk to boolean (#171530)
2025-12-12 08:34:11 -08:00
Matt Arsenault
6e47d4ef45
Reapply "InstCombine: Fold ldexp with constant exponent to fmul" (#171895) (#171977)
2025-12-12 12:55:55 +01:00
Nikita Popov
89c37fee25
[WPD] Use getSigned() for offset
This offset is a signed int64_t which can take negative values.
2025-12-12 11:15:44 +01:00
Peter Collingbourne
b0d3405578
SROA: Recognize llvm.protected.field.ptr intrinsics.
When an alloc slice's users include llvm.protected.field.ptr intrinsics
and their discriminators are consistent, drop the intrinsics in order
to avoid unnecessary pointer sign and auth operations.

Reviewers: nikic

Reviewed By: nikic

Pull Request: https://github.com/llvm/llvm-project/pull/151650
2025-12-11 18:22:05 -08:00
Florian Hahn
65deac0872
[VPlan] Remove vector type checking in inferScalarType (NFC).
inferScalarTypeForRecipe always infers a scalar type, so BaseTy must be
a scalar type. Remove unneeded cast.
2025-12-11 22:10:31 +00:00
Florian Hahn
4e05d702f0
[LV] Always include middle block cost in isOutsideLoopWorkProfitable. (#171102)
Always include the cost of the middle block in
isOutsideLoopWorkProfitable. This addresses the TODO from
https://github.com/llvm/llvm-project/pull/168949 and removes the
temporary restriction.

isOutsideLoopWorkProfitable already scales the cost outside loops
according to the expected trip counts.

In practice this increases the minimum iteration threshold in a few
cases. On a large IR corpus based on C/C++ workloads, ~50 out of 179450
vector loops have their thresholds increased slightly.


PR: https://github.com/llvm/llvm-project/pull/171102
2025-12-11 21:41:47 +00:00
Matt Arsenault
757c5b3bc7
Revert "InstCombine: Fold ldexp with constant exponent to fmul" (#171895)
Reverts llvm/llvm-project#171731

Fails on a libc test
2025-12-11 21:12:59 +00:00
Teresa Johnson
75cd29b6d6
[MemProf] Add option to emit full call context for matched allocations (#170516)
Add the -memprof-print-matched-alloc-stack option to enable emitting the
full allocation call context (of stack ids) for each matched allocation
reported by -memprof-print-match-info. Noop when the latter is not
enabled.
2025-12-11 10:43:53 -08:00
Matt Arsenault
5eb2ec2179
InstCombine: Fold ldexp with constant exponent to fmul (#171731)
If we can represent this with an fmul, prefer it as the canonical
form. More optimizations understand fmul, and it allows contraction to
fma.
2025-12-11 19:20:45 +01:00
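The ldexp-to-fmul fold above rests on the identity ldexp(x, n) = x * 2^n. The sketch below checks that identity in plain C++ for exponents where 2^n is itself a normal double. The exact conditions the InstCombine patch uses are not stated in the commit message, so the range check here is an assumption, and `ldexpAsFMul` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Compute ldexp(X, Exp) as a multiplication by the constant 2^Exp.
// Assumption: Exp is restricted so that 2^Exp is a normal double; the real
// transform may use different or additional conditions.
double ldexpAsFMul(double X, int Exp) {
  assert(Exp >= -1022 && Exp <= 1023 && "2^Exp must be a normal double");
  const double Scale = std::ldexp(1.0, Exp); // exactly 2^Exp
  return X * Scale;
}
```

For example, `ldexpAsFMul(1.5, 3)` and `std::ldexp(1.5, 3)` both give 12.0. Once the operation is an ordinary multiply, it can take part in later transforms such as contraction into a fused multiply-add, which is the motivation the commit message gives.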