39366 Commits

Author SHA1 Message Date
Florian Hahn
2bdc1a1337
[LV] Use frozen start value for FindLastIV if needed. (#132691)
FindLastIV introduces multiple uses of the start value, where in the
original source there was only a single use, when the epilogue is
vectorized.

Each use of undef may produce a different result, so introducing
multiple uses can produce incorrect results when the input is
undef/poison.

If the start value may be undef or poison, freeze it and use the frozen
value, which will be the same at all uses.

See the following scenarios in Alive2:
* Both main and epilogue vector loops execute, go to exit block: https://alive2.llvm.org/ce/z/_TSvRr
* Both main and epilogue vector loops execute, go to scalar loop: https://alive2.llvm.org/ce/z/CsPj5v
* Only epilogue vector loop executes, go to exit block: https://alive2.llvm.org/ce/z/5XqkNV
* Only epilogue vector loop executes, go to scalar loop: https://alive2.llvm.org/ce/z/JUpqRN

The latter two show that the resume phi must be frozen, which means we cannot freeze
in the preheader. We could move the freeze to the main iteration count check, but
that would be a bit fragile to find, and other transforms can sink the freeze if needed.


Depends on https://github.com/llvm/llvm-project/pull/132689
and https://github.com/llvm/llvm-project/pull/132690.

Fixes https://github.com/llvm/llvm-project/issues/126836

PR: https://github.com/llvm/llvm-project/pull/132691
2025-04-04 11:48:01 +01:00
Florian Hahn
a4573ee38d
[LoopUnroll] UnrollRuntimeMultiExit takes precedence over TTI. (#134259)
Update UnrollRuntimeLoopRemainder to always give priority to the
UnrollRuntimeMultiExit option, if provided.

After ad9da92cf6f7357 (https://github.com/llvm/llvm-project/pull/124462),
we would ignore the option if the backend indicates multi-exit is profitable.
This means it cannot be used to disable runtime unrolling.

To be consistent with canProfitablyRuntimeUnrollMultiExitLoop, always
respect the option.

This surfaced while discussing https://github.com/llvm/llvm-project/pull/131998.

PR: https://github.com/llvm/llvm-project/pull/134259
2025-04-04 10:16:50 +01:00
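The precedence rule described in a4573ee38d (#134259) can be sketched as follows. This is a standalone model, not the code in UnrollRuntimeLoopRemainder: `ExplicitOption`, `allowRuntimeUnrollMultiExit`, and the two boolean inputs are hypothetical names, and the assumption is that the option is only consulted when it was actually given.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a command-line option that may or may not have
// been given explicitly; std::nullopt means "not provided".
using ExplicitOption = std::optional<bool>;

// Decide whether runtime unrolling of a multi-exit loop is allowed.
// An explicitly provided option always wins. The target hook and the generic
// profitability heuristic are consulted only when the option is absent.
bool allowRuntimeUnrollMultiExit(ExplicitOption Option,
                                 bool TargetPrefersMultiExit,
                                 bool HeuristicSaysProfitable) {
  if (Option.has_value())
    return *Option;
  return TargetPrefersMultiExit || HeuristicSaysProfitable;
}
```

With this ordering, passing the option as `false` disables the transform even when the target reports multi-exit unrolling as profitable, which is the behaviour the commit message asks for.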
Tobias Stadler
1302610f03
[MergeFunc] Fix crash caused by bitcasting ArrayType (#133259)
createCast in MergeFunctions did not consider ArrayTypes, which results
in the creation of a bitcast between ArrayTypes in the thunk function,
leading to an assertion failure in the provided test case.

The version of createCast in GlobalMergeFunctions does handle
ArrayTypes, so this common code has been factored out into the
IRBuilder.
2025-04-04 10:16:40 +01:00
Mircea Trofin
4532512f6c
[ctxprof] Move MoveSymbolGUID to address dependency issues (#134334)
See PR #134192
2025-04-03 19:02:46 -07:00
Mircea Trofin
2146826169
[ctxprof] Support for "move" semantics for the contextual root (#134192)
This PR finishes what PR #133992 started.
2025-04-03 18:36:45 -07:00
Alex MacLean
ba0a52a04b
[InferAS] Support getAssumedAddrSpace for Arguments for NVPTX (#133991) 2025-04-03 16:47:36 -07:00
Florian Hahn
cdff7f0b6e
[LV] Retrieve middle VPBB via scalar ph to fix epilogue resumephis (NFC)
If ScalarPH has predecessors, we may need to update its reduction resume
values. If there is a middle block, it must be the first predecessor.
Note that the first predecessor may not be the middle block, if the
middle block doesn't branch to the scalar preheader. In that case,
fixReductionScalarResumeWhenVectorizingEpilog will be a no-op.

In preparation for https://github.com/llvm/llvm-project/pull/106748.
2025-04-03 21:46:48 +01:00
Mircea Trofin
61768b3528
[ctxprof] Don't import roots elsewhere (#134012)
Block a context root from being imported by its callers. 

Suppose that happened. Its caller - usually a message pump - inlines its copy of the root. Then it (the root) and whatever it calls will be the non-contextually optimized callee versions.
2025-04-03 13:21:39 -07:00
Alexey Bataev
daab7d0807
[SLP]Initial support for (masked)loads + compress and (masked)interleaved
Added initial support for (masked)loads + compress and
(masked)interleaved loads.

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/132099
2025-04-03 13:17:40 -07:00
Alexey Bataev
7c4013d591
Revert "[SLP]Initial support for (masked)loads + compress and (masked)interleaved"
This reverts commit 0bec0f5c059af5f920fe22ecda469b666b5971b0 to fix
a crash reported in https://lab.llvm.org/buildbot/#/builders/143/builds/6668.
2025-04-03 12:58:49 -07:00
Alexey Bataev
0bec0f5c05
[SLP]Initial support for (masked)loads + compress and (masked)interleaved
Added initial support for (masked)loads + compress and
(masked)interleaved loads.

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/132099
2025-04-03 13:21:22 -04:00
Matt Arsenault
a54736afd5
CloneFunction: Do not delete blocks with address taken (#134209)
If a block with a single predecessor also had its address taken,
it was getting deleted in this post-inline cleanup step. This would
result in the blockaddress in the resulting function getting deleted
and replaced with inttoptr 1.

This fixes one bug required to permit inlining of functions with blockaddress 
uses.

At the moment this is not testable (at least without an annoyingly complex
unit test),  and is a pre-bug fix for future patches. Functions with
blockaddress uses are rejected in isInlineViable, so we don't get this far
with the current InlineFunction uses (some of the existing cases seem to
reproduce this part of the rejection logic, like PartialInliner). This
will be tested in a pending llvm-reduce change.

Prerequisite for #38908
2025-04-03 23:52:25 +07:00
gbMattN
61ef286506
Fix signed/unsigned mismatch warning (#134255) 2025-04-03 15:56:33 +01:00
gbMattN
59074a3760
[ASan] Add metadata to renamed instructions so ASan doesn't use the incorrect name (#119387)

Clang needs variables to be represented with unique names. This means
that if a variable shadows another, it's given a different name
internally to ensure it has a unique name. If ASan tries to use this
name when printing an error, it will print the modified unique name,
rather than the variable's source code name.

Fixes #47326
2025-04-03 15:27:14 +01:00
Ramkumar Ramachandra
6bbdc70066
[LV] Use getCallWideningDecision in more places (NFC) (#134236) 2025-04-03 14:53:19 +01:00
Camsyn
ecc35456d7
[Utils] Fix incorrect LCSSA PHI nodes when splitting critical edges with MergeIdenticalEdges (#131744)
This PR fixes incorrect LCSSA PHI node generation when splitting
critical edges with both
`PreserveLCSSA` and `MergeIdenticalEdges` enabled. The bug caused PHI
nodes in the split block
to miss predecessors when multiple identical edges were merged.
2025-04-03 12:02:03 +02:00
Yingwei Zheng
73e1710a4d
[SimplifyCFG] Remove unused variable. NFC. (#134211) 2025-04-03 15:22:51 +08:00
Ryotaro Kasuga
91f3965be4
[LoopInterchange] Fix the vectorizable check for a loop (#133667)
In the profitability check for vectorization, the dependency matrix was
not handled correctly. This can result in a wrong decision: it may
say "this loop can be vectorized" when in fact it cannot. The root cause
of this is that the check returns early when it finds '=' or 'I'
in the dependency matrix. To make sure that we can actually vectorize
the loop, we need to check all the rows of the matrix. This patch fixes
the process of checking whether we can vectorize the loop or not. Now it
won't make a wrong decision for a loop that cannot be vectorized.

Related: #131130
2025-04-03 16:21:19 +09:00
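The "check all the rows" requirement from 91f3965be4 (#133667) can be illustrated with a small standalone sketch. `DepMatrix` and `canVectorize` are illustrative names here, not necessarily those used in LoopInterchange, and treating '=' and 'I' as the only entries that permit vectorization is an assumption drawn from the message above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Each row describes one dependence; column LoopId holds the direction
// entry for that loop ('=', 'I', '<', '>', '*', ...).
using DepMatrix = std::vector<std::string>;

// Returns true only if every row has '=' or 'I' in the column of the
// given loop. No verdict is reached before all rows have been inspected.
bool canVectorize(const DepMatrix &Matrix, unsigned LoopId) {
  for (const std::string &Row : Matrix) {
    assert(LoopId < Row.size() && "LoopId out of range");
    char Dir = Row[LoopId];
    if (Dir != '=' && Dir != 'I')
      return false; // a single offending row is enough to reject the loop
  }
  return true;
}
```

For a matrix with rows `"=="` and `"<="`, a check that stops at the first row would accept loop 0, while the version above rejects it because of the `'<'` in the second row.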
Yingwei Zheng
b6c0ce0bb6
[IR][NFC] Use SwitchInst::defaultDestUnreachable (#134199) 2025-04-03 14:47:47 +08:00
Snehasish Kumar
7f2abe8fd1
Revert "[Metadata] Preserve MD_prof when merging instructions when one is missing." (#134200)
Reverts llvm/llvm-project#132433

I suspect this change caused a failure in the bolt build bot.
https://lab.llvm.org/buildbot/#/builders/113/builds/6621

```
!9185 = !{!"branch_weights", i32 3912, i32 802}
Wrong number of operands
!9185 = !{!"branch_weights", i32 3912, i32 802}
fatal error: error in backend: Broken module found, compilation aborted!
```
2025-04-02 22:11:17 -07:00
Mircea Trofin
d59b2c4def
[ctxprof][nfc] Make computeImportForFunction a member of ModuleImportsManager (#134011) 2025-04-02 18:18:17 -07:00
Mircea Trofin
02467f9e21
[ctxprof] Option to move a whole tree to its own module (#133992)
Modules may contain a mix of functions that participate or don't participate in callgraphs covered by a contextual profile. We have so far been importing all the functions under a context root into the module defining that root, but if the other functions there are covered by flat profiles, the result is difficult to reason about.

This patch allows moving everything under a context root (and that root) into its own module. For now, we expect a module with a filename matching the GUID of the function to be present in the set of modules known by the linker. This mechanism can be improved in a later patch.

Subsequent patches will handle implementing "move" instead of "import" semantics for the root function (because we want to make sure only one version of the root exists - so the optimizations we perform are actually the ones being observed at runtime).
2025-04-02 18:15:48 -07:00
Matt Arsenault
7559c64c5e
CloneModule: Map global initializers after mapping the function (#134082) 2025-04-03 07:17:12 +07:00
Florian Hahn
380defd4b3
[VPlan] Update VPInterleaveRecipe to take debug loc directly as arg (NFC) 2025-04-02 22:46:38 +01:00
Florian Hahn
4b67c53e20
[VPlan] Use recipe debug loc instead of instr DLs in more cases (NFC)
Update both VPInterleaveRecipe and VPReplicateRecipe codegen to use
debug location directly from the recipe, not the underlying instruction.
This removes another dependency on underlying instructions.
2025-04-02 21:51:17 +01:00
vporpo
a1b0b4997e
[SandboxVec][NFC] Replace std::regex with llvm::Regex (#134110) 2025-04-02 13:46:56 -07:00
Krzysztof Drewniak
554859c736
[TTI] Make isLegalMasked{Load,Store} take an address space (#134006)
In order to facilitate targets that only support masked loads/stores
on certain address spaces (AMDGPU will support them in an upcoming
patch, but only for address space 7), add an AddressSpace parameter
to isLegalMaskedLoad and isLegalMaskedStore.
2025-04-02 15:38:10 -05:00
Florian Hahn
3bdf9a0880
[EquivalenceClasses] Use SmallVector for deterministic iteration order. (#134075)
Currently iterators over EquivalenceClasses will iterate over std::set,
which guarantees the order specified by the comparator. Unfortunately in
many cases, EquivalenceClasses are used with pointers, so iterating over
std::set of pointers will not be deterministic across runs.

There are multiple places that explicitly try to sort the equivalence
classes before using them to try to get a deterministic order
(LowerTypeTests, SplitModule), but there are others that do not at the
moment and this can result at least in non-deterministic value naming in
Float2Int.

This patch updates EquivalenceClasses to keep track of all members via an
extra SmallVector and removes code from LowerTypeTests and SplitModule
to sort the classes before processing.

Overall it looks like compile-time slightly decreases in most cases, but
close to noise:

https://llvm-compile-time-tracker.com/compare.php?from=7d441d9892295a6eb8aaf481e1715f039f6f224f&to=b0c2ac67a88d3ef86987e2f82115ea0170675a17&stat=instructions

PR: https://github.com/llvm/llvm-project/pull/134075
2025-04-02 20:27:43 +01:00
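The idea behind 3bdf9a0880 (#134075) can be sketched with a toy union-find that keeps a pointer-keyed map for lookups and a separate vector for iteration. This is a simplified model using standard containers, not the actual EquivalenceClasses implementation; `OrderedClasses` and its members are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy union-find keyed by pointer that also records members in the order
// they were first inserted.
template <typename T> class OrderedClasses {
  std::map<const T *, const T *> Parent; // used for lookup only, never iterated
  std::vector<const T *> Members;        // insertion order, used for iteration

public:
  // Adds V as a singleton class if it has not been seen before.
  void insert(const T *V) {
    if (Parent.emplace(V, V).second)
      Members.push_back(V);
  }

  // Returns the leader of V's class, inserting V first if necessary.
  const T *findLeader(const T *V) {
    insert(V);
    while (Parent[V] != V)
      V = Parent[V];
    return V;
  }

  // Merges the classes of A and B; A's leader becomes the common leader.
  void unionSets(const T *A, const T *B) {
    const T *LA = findLeader(A), *LB = findLeader(B);
    if (LA != LB)
      Parent[LB] = LA;
  }

  // Members in insertion order, independent of pointer values.
  const std::vector<const T *> &members() const { return Members; }
};
```

Iterating `members()` yields the same order on every run, because it depends only on the sequence of insertions and not on where the objects happen to be allocated.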
Alexey Bataev
843ef77dc2
[SLP]Update mapping between values and their matching entries upon selection
Need to update the mapping between gathered values and their matching
entries, if the list of the entries is updated and only some of them are
selected for final shuffling.

Fixes #134085
2025-04-02 11:59:32 -07:00
Snehasish Kumar
c18994c7cd
[Metadata] Preserve MD_prof when merging instructions when one is missing. (#132433)
Preserve branch weight metadata when merging instructions if one of the
instructions is missing metadata. This is similar in behaviour to what
we do today for other types of metadata such as mmra, memprof and
callsite metadata.
2025-04-02 11:13:45 -06:00
Snehasish Kumar
dde0be9d97
[Metadata] Handle memprof, callsite merging when one is missing. (#132106)
For memprof and callsite metadata we want to pick one deterministically
and keep that even if one of them may be missing.
2025-04-02 11:10:02 -06:00
Alexey Bataev
48a4b14cb6
[SLP]Fix whole vector registers calculations for compares
Need to check that the calculated number of the elements is not larger
than the original number of scalars to prevent a compiler crash.

Fixes #134013
2025-04-02 07:26:40 -07:00
Yingwei Zheng
65ed35393c
[IR] Add helper CmpPredicate::dropSameSign (#134071)
Address review comment
https://github.com/llvm/llvm-project/pull/133711#discussion_r2024519641
2025-04-02 22:25:01 +08:00
Han-Kuan Chen
5bbcc765cc
[SLP][REVEC] getNumElements should not be used as VF when REVEC is enabled. (#134031) 2025-04-02 19:04:07 +08:00
Luke Lau
8107b430ed
[VPlan] Simplify select c, x, x -> x (#133731)
As noted in 1a9358c090d0507be21c5e9b2d97a23ef1de8ab0, some
simplifications can produce a redundant select where the true and false
operands are the same, which this patch removes.

The is_fpclass test was changed so the condition wasn't made dead.
2025-04-02 10:26:48 +01:00
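The fold from 8107b430ed (#133731) is small enough to show on a toy value type. The `Value` struct and `simplifySelect` below are hypothetical and do not reflect VPlan's recipe classes; the sketch only demonstrates the rule `select c, x, x -> x`.

```cpp
#include <cassert>

// Hypothetical miniature value representation, not VPlan's actual classes.
struct Value {
  enum Kind { Leaf, Select } K = Leaf;
  // Only meaningful when K == Select.
  const Value *Cond = nullptr, *TrueV = nullptr, *FalseV = nullptr;
};

// select c, x, x  ->  x. Any other value is returned unchanged.
const Value *simplifySelect(const Value *V) {
  if (V->K == Value::Select && V->TrueV == V->FalseV)
    return V->TrueV;
  return V;
}
```

Once the select is replaced by `x`, the condition `c` may lose its last user, which is consistent with the note above that a test had to be adjusted so its condition was not made dead.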
Ryotaro Kasuga
528e408b94
[LoopInterchange] Add an option to control the cost heuristics applied (#133664)
LoopInterchange has several heuristic functions to determine if
exchanging two loops is profitable or not. Whether or not to use each
heuristic and the order in which to use them were fixed, but #125830
allows them to be changed internally at will. This patch adds a new
option to control them via the compiler option.

The previous patch also added an option to prioritize the vectorization
heuristic. This patch also removes it to avoid conflicts between it and
the newly introduced one, e.g., both
`-loop-interchange-prioritize-vectorization=1` and
`-loop-interchange-profitabilities='cache,vectorization'` are specified.
2025-04-02 15:41:40 +09:00
Alexey Bataev
0e3049c562
[SLP]Support revectorization of the previously vectorized scalars
If a scalar instruction is marked for vectorization in the tree,
it cannot, in general, be vectorized as part of another node in the same
tree. This may prevent some potentially profitable vectorization
opportunities, since some nodes end up being buildvector/gather nodes,
which add to the total cost.
This patch allows revectorization of the previously vectorized scalars.

Reviewers: hiraditya, RKSimon

Reviewed By: RKSimon, hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/133091
2025-04-01 14:30:06 -04:00
Matt Arsenault
7e25b24073
IRNormalizer: Replace cl::opts with pass parameters (#133874)
Not sure why the "fold-all" option naming didn't match the
variable "FoldPreOutputs", but I've preserved the difference.

More annoyingly, the pass name "normalize" does not match the pass
name IRNormalizer and should probably be fixed one way or the other.

Also the existing test coverage for the flags is lacking. I've added
a test that shows they parse, but we should have tests that they
do something.
2025-04-01 23:27:20 +07:00
Jeremy Morse
1ebc308bba
[DebugInfo][RemoveDIs] Remove debug-intrinsic printing cmdline options (#131855)
During the transition from debug intrinsics to debug records, we used
several different command line options to customise handling: the
printing of debug records to bitcode and textual IR could be independent of
how the debug-info was represented inside a module, and whether the
autoupgrader ran could be customised. This was all valuable during
development, but now that totally removing debug intrinsics is coming
up, this patch removes those options in favour of a single flag
(experimental-debuginfo-iterators), which enables autoupgrade, in-memory
debug records, and debug record printing to bitcode and textual IR.

We need to do this ahead of removing the
experimental-debuginfo-iterators flag, to reduce the amount of
test-juggling that happens at that time.

There are quite a number of weird test behaviours related to this --
some of which I simply delete in this commit. Things like
print-non-instruction-debug-info.ll , the test suite now checks for
debug records in all tests, and we don't want to check we can print as
intrinsics. Or the update_test_checks tests -- these are duplicated with
write-experimental-debuginfo=false to ensure file writing for intrinsics
is correct, but that's something we're imminently going to delete.

A short survey of curious test changes:
* free-intrinsics.ll: we don't need to test that debug-info is a zero
cost intrinsic, because we won't be using intrinsics in the future.
* undef-dbg-val.ll: apparently we pinned this to non-RemoveDIs in-memory
mode while we sorted something out; it works now either way.
* salvage-cast-debug-info.ll: was testing that intrinsics-in-memory get
salvaged, which isn't necessary now.
* localize-constexpr-debuginfo.ll: was producing "dead metadata"
intrinsics for optimised-out variable values, dbg-records takes the
(correct) representation of poison/undef as an operand. Looks like we
didn't update this in the past to avoid spurious test differences.
* Transforms/Scalarizer/dbginfo.ll: this test was explicitly testing
that debug-info affected codegen, and we deferred updating the tests
until now. This is just one of those silent gnochange issues that get
fixed by RemoveDIs.

Finally: I've added a bitcode test, dbg-intrinsics-autoupgrade.ll.bc,
that checks we can autoupgrade debug intrinsics that are in bitcode into
the new debug records.
2025-04-01 14:27:11 +01:00
Florian Hahn
9e5bfbf77d
[EquivalenceClasses] Update member_begin to take ECValue (NFC).
Remove a level of indirection and update code to use range-based for
loops.
2025-04-01 09:28:46 +01:00
Florian Hahn
64d493f987
[EquivalenceClasses] Return ECValue directly from insert (NFC).
Removes a redundant lookup in the mapping.
2025-04-01 08:45:46 +01:00
Ningning Shi(史宁宁)
6b647de031
[NFC] Remove the unused hasMinSize() (#133838)
The 'hasOptSize()' is 'hasFnAttribute(Attribute::OptimizeForSize) ||
hasMinSize()', so we don't need another 'hasMinSize()'.
2025-04-01 15:23:34 +08:00
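The redundancy removed in 6b647de031 (#133838) follows directly from the definition quoted in the message. A standalone sketch, using a hypothetical `ToyFunction` with two boolean attributes in place of the real function attributes:

```cpp
#include <cassert>

// Hypothetical model of a function carrying two size-related attributes.
struct ToyFunction {
  bool OptimizeForSize = false;
  bool MinSize = false;

  bool hasMinSize() const { return MinSize; }
  // Mirrors the definition quoted in the commit message:
  // hasFnAttribute(Attribute::OptimizeForSize) || hasMinSize().
  bool hasOptSize() const { return OptimizeForSize || hasMinSize(); }
};

// True if "hasOptSize() || hasMinSize()" gives the same answer as
// "hasOptSize()" alone for the given function.
bool extraMinSizeCheckIsRedundant(const ToyFunction &F) {
  return (F.hasOptSize() || F.hasMinSize()) == F.hasOptSize();
}
```

Under this definition the helper returns true for all four attribute combinations, so dropping the extra `hasMinSize()` call does not change behaviour.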
Alexey Bataev
cf6a452cc7
[SLP]Fix same/alternate analysis in split node analysis for compares
getSameOpcode in some cases may consider 2 compares as having same
opcode, even though previously they were considered as alternate. This may
happen because getSameOpcode loses info about previous instructions
and their states. Need to use the isAlternateInstruction function instead
for the correct analysis.

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/133769
2025-03-31 19:33:40 -04:00
Luke Lau
6afe5e5d1a
[LV][EVL] Peek through combination tail-folded + predicated masks (#133430)
If a recipe was predicated and tail folded at the same time, it will
have a mask like

    EMIT vp<%header-mask> = icmp ule canonical-iv, backedge-tc
    EMIT vp<%mask> = logical-and vp<%header-mask>, vp<%pred-mask>

When converting to an EVL recipe, if the mask isn't exactly just the
header-mask we copy the whole logical-and.
We can remove this redundant logical-and (because it's now covered by
EVL) and just use vp<%pred-mask> instead.

This lets us remove the widened canonical IV in more places.
2025-03-31 21:28:39 +01:00
Luke Lau
b739a3cb65
[VPlan] Add m_Deferred. NFC (#133736)
This copies over the implementation of m_Deferred which allows matching
values that were bound in the pattern, and uses it for the (X && Y) ||
(X && !Y) -> X simplification.
2025-03-31 21:01:28 +01:00
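What a deferred matcher buys in b739a3cb65 (#133736) can be shown on a toy expression tree: X and Y are bound while matching the left operand, and the right operand must then reuse those same values. The `Expr` type and `matchRedundantOr` are hypothetical, written by hand rather than with VPlan's pattern-match combinators, and only one operand order is handled.

```cpp
#include <cassert>

// Hypothetical boolean expression node. And/Or use LHS and RHS, Not uses
// LHS only, and Leaf uses neither. Operands of non-leaf nodes are assumed
// to be non-null.
struct Expr {
  enum Kind { Leaf, And, Or, Not } K = Leaf;
  const Expr *LHS = nullptr, *RHS = nullptr;
};

// Matches (X && Y) || (X && !Y) and returns X, or nullptr if E does not
// have exactly that shape.
const Expr *matchRedundantOr(const Expr *E) {
  if (E->K != Expr::Or)
    return nullptr;
  const Expr *L = E->LHS, *R = E->RHS;
  if (L->K != Expr::And || R->K != Expr::And)
    return nullptr;
  const Expr *X = L->LHS, *Y = L->RHS; // bind X and Y from the left operand
  if (R->LHS != X)
    return nullptr; // "deferred" check: must be the X bound above
  const Expr *N = R->RHS;
  if (N->K != Expr::Not || N->LHS != Y)
    return nullptr; // "deferred" check: must negate the Y bound above
  return X;
}
```

The rewrite is sound because (X && Y) || (X && !Y) equals X && (Y || !Y), which is X.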
Alexey Bataev
bfd8cc0a3e
[SLP]Fix a check for the whole register use
Need to check the value type, not the return type, of the instructions,
when doing the analysis for the whole register use to prevent a compiler
crash.

Fixes #133751
2025-03-31 10:52:12 -07:00
Rahul Joshi
74b7abf154
[IRBuilder] Add new overload for CreateIntrinsic (#131942)
Add a new `CreateIntrinsic` overload with no `Types`, useful for
creating calls to non-overloaded intrinsics that don't need additional
mangling.
2025-03-31 08:10:34 -07:00
Alexey Bataev
78777a204a
[LV]Split store-load forward distance analysis from other checks, NFC (#121156)
The patch splits the store-load forwarding distance analysis from other
dependency analysis in LAA. Currently it supports only power-of-2
distances; the split is required to support non-power-of-2 distances in the future.

Part of #100755
2025-03-31 07:28:44 -04:00
Florian Hahn
809f857d2c
[VPlan] Support early-exit loops in optimizeForVFAndUF. (#131539)
Update optimizeForVFAndUF to support early-exit loops by handling
BranchOnCond(Or(..., CanonicalIV == TripCount)) via SCEV.

PR: https://github.com/llvm/llvm-project/pull/131539
2025-03-31 07:55:48 +01:00
Kazu Hirata
2fc08d4c31
[Vectorize] Use DenseMap::insert_range (NFC) (#133656) 2025-03-30 22:57:45 -07:00