38786 Commits

Author SHA1 Message Date
Teresa Johnson
edb7f6c0da
[MemProf] Add more assertion checking to the edge removal helper (#125017)
Check a few unexpected cases (edge already removed, edge not in its
caller or callee edge lists).
2025-01-29 19:23:35 -08:00
Teresa Johnson
6c3bf34114
[MemProf] Fix summary identification for imported locals (#124659)
When we apply cloning decisions in the ThinLTO backend, we need to find
the corresponding summary for each function in the IR, and in some cases
for callee functions. This is complicated when the function was a
promoted local, in which case the GUID was formed from the hash of the
original source file prepended to the function name. Those functions
can be identified by the fact that they were given a ".llvm." suffix
during promotion.

We previously didn't do this correctly for promoted locals imported from
other modules, as we only tried the current module source name. This led
to crashes, in particular when the current module also had an local
function of the same original name. In particular, we were attempting to
iterate through the wrong summary's callsites, and there were fewer than
in the actual function so we accessed data off the end (in a release
build with assertion checking off - with assertion checking on we double
check the stack ids and that would have failed). Even if we hadn't
crashed or hit an assert, we could have applied the wrong cloning
decisions, leading to unsats at link time.

Luckily, function importing attaches thinlto_src_file metadata
containing the original source file name to all imported functions. It
normally doesn't do this by default, however, it always does if MemProf
context disambiguation is enabled. Therefore, we can just look to see if
the function contains this metadata and if so use it to recreate the
original GUID.

A similar issue can occur when looking for the ValueInfo / GUID of
a direct tail call to see if we synthesized a callsite record for a
missing tail call frame. In that case, the callee function may be a
declaration, if we imported its caller but not the callee function
definition. Because imported declarations don't get the thinlto_src_file
metadata, we instead look at its caller (which works because this
happens very early in the backend before any inlining).
2025-01-29 18:22:14 -08:00
Alex MacLean
de7438e472
[NVPTX] Auto-Upgrade some nvvm.annotations to attributes (#119261)
Add a new AutoUpgrade function to convert some legacy nvvm.annotations
metadata to function level attributes. These attributes are quicker to
look-up so improve compile time and are more idiomatic than using
metadata which should not include required information that changes the
meaning of the program.

Currently supported annotations are:

- !"kernel" -> ptx_kernel calling convention
- !"align" -> alignstack parameter attributes (return not yet supported)
2025-01-29 16:27:27 -08:00
vporpo
e094c0fa67
[SandboxVec][Legality] Don't vectorize when instructions repeat (#124479)
This patch adds a legality check that checks for repeated instrs in a
bundle and won't vectorize if such pattern is found.
2025-01-29 15:54:15 -08:00
Kazu Hirata
774b12c4a0
[memprof] Initialize AllocInfoIter and CallSitesIter (NFC) (#124972)
This patch initializes AllocInfoIter and CallSitesIter to their
respective end().  I'm doing this not because I'm worried about
uninitialized iterators, but because the resulting code looks shorter
and makes it clear which data structure each iterator is associated
with.
2025-01-29 14:31:00 -08:00
Teresa Johnson
8a86e6aefe
[MemProf] Constify a couple of methods used during cloning (#124994)
This also helps ensure we don't inadvartently create map entries
by forcing use of at() instead of operator[].
2025-01-29 14:18:11 -08:00
Simon Pilgrim
5921295dca
Revert "[SLP] getSpillCost - fully populate IntrinsicCostAttributes to improve cost analysis." (#124962)
Reverts llvm/llvm-project#124129 as its currently causing a regression at #124499 - avoids the regression until a proper fix can be added to getSpillCost
2025-01-29 22:17:53 +00:00
Joel E. Denny
18f8106f31
[KernelInfo] Implement new LLVM IR pass for GPU code analysis (#102944)
This patch implements an LLVM IR pass, named kernel-info, that reports
various statistics for codes compiled for GPUs. The ultimate goal of
these statistics to help identify bad code patterns and ways to mitigate
them. The pass operates at the LLVM IR level so that it can, in theory,
support any LLVM-based compiler for programming languages supporting
GPUs. It has been tested so far with LLVM IR generated by Clang for
OpenMP offload codes targeting NVIDIA GPUs and AMD GPUs.

By default, the pass runs at the end of LTO, and options like
``-Rpass=kernel-info`` enable its remarks. Example `opt` and `clang`
command lines appear in `llvm/docs/KernelInfo.rst`. Remarks include
summary statistics (e.g., total size of static allocas) and individual
occurrences (e.g., source location of each alloca). Examples of its
output appear in tests in `llvm/test/Analysis/KernelInfo`.
2025-01-29 12:40:19 -05:00
Nikita Popov
8a43d0e873 [Attributor] Check correct IRPosition in AANoCapture::isImpliedByIR()
This case is intended to check the callee argument, not the call-site.

Fixes an issue introduced in #123181.
2025-01-29 17:34:10 +01:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
Alexey Bataev
4a1a697427
[SLP][NFC]Unify ScalarToTreeEntries and MultiNodeScalars, NFC
Currently, SLP has 2 distinct storages to manage mapping between
vectorized instructions and their corresponding vectorized TreeEntry
nodes. It leads to inefficient lookup for the matching TreeEntries and
makes it harder to correctly track instructions, associated with
multiple nodes.
There is a plan to extend this support for instructions, that require
scheduling, to allow support for copyable elements. Merging
ScalarToTreeEntry and MultiNodeScalars will allow reduce maintenance of
the feature

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/124914
2025-01-29 09:05:54 -05:00
Yingwei Zheng
cf37ae5cae
[InstCombine] Add one-use check when folding fabs over selects (#122270)
Fixes multi-use issue introduced by
https://github.com/llvm/llvm-project/pull/86390.
It allows the folding of `fabs (select Cond, TrueC, FalseC)` to avoid performance regression in ocio
2025-01-29 21:44:59 +08:00
Florian Hahn
2b55ef187c
[VPlan] Add helper to run VPlan passes, verify after run (NFC). (#123640)
Add new runPass helpers to run a VPlan transformation. This makes it
easier to add additional checks/functionality for each transform run. In
this patch, an option is added to run the verifier after each VPlan
transform.

Follow-ups will use the same helper to also support printing VPlans
after each transform.

Note that the verifier at the moment requires there to be a canonical IV
and vector loop region, so the final lowering transforms aren't run via
runPass yet.

PR: https://github.com/llvm/llvm-project/pull/123640
2025-01-29 10:50:01 +00:00
Ryotaro Kasuga
690f251063
[LoopInterchange] Handle LE and GE correctly (#124901)
LoopInterchange have converted `DVEntry::LE` and `DVEntry::GE` in
direction vectors to '<' and '>' respectively. This handling is
incorrect because the information about the '=' it lost. This leads to
miscompilation in some cases. To resolve this issue, convert them to '*'
instead.

Resolve #123920
2025-01-29 19:30:54 +09:00
Ryotaro Kasuga
89e767f127
[LoopIdiom] Move up atomic checks for memcpy/memmove (NFC) (#124535)
This patch moves up the checks that verify if it is legal to replace the
atomic load/store with memcpy. Currently these checks are done after we
determine to convert the load/store to memcpy/memmove, which makes the
logic a bit confusing.

This patch is a prelude to #50892
2025-01-29 19:21:45 +09:00
Fangrui Song
4c7aa6f983 [msan] Fix -Wunused-variable in non-assertion builds after #124421 2025-01-28 20:20:25 -08:00
Thurston Dang
fdadef9be3
[msan] Handle x86_avx512_(min|max)_p[sd]_512 intrinsics (#124421)
The AVX/SSE variants are already handled heuristically (maybeHandleSimpleNomemIntrinsic via handleUnknownIntrinsic), but the AVX512 variants contain an additional parameter (the rounding method) which fails to match heuristically. This patch generalizes maybeHandleSimpleNomemIntrinsic to allow additional flags (ignored by MSan) and explicitly call it to handle AVX512 min/max ps/pd intrinsics.

It also updates the test added in https://github.com/llvm/llvm-project/pull/123980
2025-01-28 19:12:44 -08:00
vporpo
79cbad188a
[SandboxVec] Clear Context's state within runOnFunction() (#124842)
`sandboxir::Context` is defined at a pass-level scope with the
`SandboxVectorizerPass` class because the function pass manager `FPM`
object depends on it, and that is in pass-level scope to avoid
recreating the pass pipeline every single time `runOnFunction()` is
called.

This means that the Context's state lives on across function passes. The
problem is twofold:
(i) the LLVM IR to Sandbox IR map can grow very large including objects
from different functions, which is of no use to the vectorizer, as it's
a function-level pass.
(ii) this can result in stale data in the LLVM IR to Sandbox IR object
map, as other passes may delete LLVM IR objects.

To fix both issues this patch introduces a `Context::clear()` function
that clears the `LLVMValueToValueMap`.
2025-01-28 18:28:08 -08:00
Thurston Dang
4a426079d6
[msan] Use horizontal add to compute shadow for horizontal sub (#124835)
This improves the horizontal sub handling (from
https://github.com/llvm/llvm-project/pull/124159), by always using
horizontal add for the shadow, as recommended by Vitaly.

Fixes https://github.com/llvm/llvm-project/issues/124662
2025-01-28 14:56:05 -08:00
Florian Hahn
6338bde568
[VPlan] Use cast<VPRecipeBase> in verifier (NFC).
All users of VPValue must be a VPRecipeBase, use cast.
2025-01-28 21:01:02 +00:00
Thurston Dang
7bd9c780e3
[msan][NFCI] Generalize handleIntrinsicByApplyingToShadow to allow alternative intrinsic for shadows (#124831)
https://github.com/llvm/llvm-project/pull/124159 uses
handleIntrinsicByApplyingToShadow for horizontal add/sub, but Vitaly
recommends always using the add version to avoid false negatives for
fully uninitialized data
(https://github.com/llvm/llvm-project/issues/124662).

This patch lays the groundwork by generalizing
handleIntrinsicByApplyingToShadow to allow using a different intrinsic
(of the same type as the original intrinsic) for the shadow. Planned
work will apply it to horizontal sub.
2025-01-28 12:35:07 -08:00
Thurston Dang
063db51cd4 Reapply "[msan] Add handlers for AVX masked load/store intrinsics (#123857)"
This reverts commit b9d301cc7e4fe4c442ec15169686fa4a18f5cdfc i.e.,
relands db79fb2a91df31a07f312f8e061936927ac5c506.

I had mistakenly thought this caused a buildbot breakage (the actual
culprit was my other patch,
https://github.com/llvm/llvm-project/pull/123980, which landed at the
same time) and thus had reverted it even though AFAIK it is not broken.
2025-01-28 18:11:44 +00:00
Alexey Bataev
947d8ebbf3
[SLP]Unify getNumberOfParts use
Adds getNumberOfParts and uses it instead of similar code across code
base, fixes analysis of non-vectorizable types in
computeMinimumValueSizes.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/124774
2025-01-28 12:16:44 -05:00
Ramkumar Ramachandra
d76ea250c8
Reland [InstCombine] Teach foldSelectOpOp about samesign (#124320)
Changes: There was a serious bug in the previous patch, leading to a
miscompile. See #122723 for the miscompile report from Alexander, and
the follow-up investigation by Nikita. The patch has since been
reworked, and now includes the testcase from the miscompile.

Follow up on 4a0d53a (PatternMatch: migrate to CmpPredicate) to get rid
of one of the FIXMEs it introduced by replacing a predicate comparison
with CmpPredicate::getMatching.

Co-authored-by: Nikita Popov <npopov@redhat.com>
2025-01-28 16:53:01 +00:00
Thurston Dang
ef92e6b99f
[BoundsChecking] Update ubsantrap to use GuardKind (#124613)
This change makes it consistent with other uses of ubsantrap.

This also updates the tests. Notably, BoundsChecking/runtimes.ll had guard=3 which passed only because the method of calculating the parameter (`IRB.GetInsertBlock()->getParent()->size()`) happened to give the same answer.
2025-01-28 08:52:31 -08:00
Alexey Bataev
a1ab5b4c87 [SLP]Check the MainOp matches the requirements for the instructions
Need to include MainOp into the analysis of the instructions in
getSameOpcode to be sure that it is checked for the requirements to
prevent crashes during further analysis.
2025-01-28 06:00:52 -08:00
Alexey Bataev
1d5fbe83c3 [SLP]Adjust NumberOfParts value for adjusted number of buildvector scalars
Need to adjust NumParts value, when GatheredScalars scalars are adjusted
after extractelements analysis, to fix compiler crash
2025-01-28 05:45:13 -08:00
Jeremy Morse
304a99091c [NFC][DebugInfo] Use iterators for insertion at some final callsites
These are the callsites that have materialised in the last three weeks
since I last built with deprecation warnings.
2025-01-28 11:37:11 +00:00
Nicholas Guy
cdea38f91a
Reland "[LoopVectorizer] Add support for chaining partial reductions #120272" (#124282)
Change `getScaledReduction` to take an existing vector, rather than
creating and returning a new one each call.
Rename `getScaledReduction` to `getScaledReductions` to more accurately
reflect what it's now doing.

---------

Co-authored-by: Karlo Basioli <68535415+basioli-k@users.noreply.github.com>
2025-01-28 10:40:35 +00:00
David Sherwood
0f61558b97
[LoopVectorize][NFC] Remove unused variable in addUsersInExitBlocks (#124553)
We were allocating a VPTypeAnalysis object on the stack,
but never using it for anything.
2025-01-28 09:11:27 +00:00
vporpo
334a1cdbfa
[SandboxIR] createFunction() should always create a function (#124665)
This patch removes the assertion that checks for an existing function.
If one exists it will remove it and create a new one. This helps remove
a crash when a function declaration object already exists and we are
about to create a SandboxIR object for the definition.
2025-01-27 20:16:30 -08:00
Thurston Dang
fa9ac62d02
[ubsan] Parse and use <cutoffs[0,1,2]=70000;cutoffs[5,6,8]=90000> in LowerAllowCheckPass (#124211)
This adds and utilizes a cutoffs parameter for LowerAllowCheckPass, via the Options parameter (introduced in https://github.com/llvm/llvm-project/pull/122994).

Future work will connect -fsanitize-skip-hot-cutoff (introduced patch in https://github.com/llvm/llvm-project/pull/121619) in the clang frontend to the cutoffs parameter used here.
2025-01-27 20:08:53 -08:00
Han-Kuan Chen
08d14e10ca
[SLP] Fix CommonMask will be transformed into an incorrect mask if createShuffle is called multiple times. (#124244)
We have two types of mask in SLP: a scalar mask and a vector mask.
When vectorizing four i32 additions into <4 x i32>, SLP creates a mask
of length 4.
When vectorizing four <2 x i32> additions into <8 x i32>, SLP also
creates a mask of length 4.
We refer to the first case as a scalar mask (because the mask element
represents a scalar, i32), and the second case as a vector mask (because
the mask element represents a vector, <4 x i32>).
At some point, we must convert the scalar mask into a vector mask
(otherwise, calling TTI cost functions or IRBuilderBase functions may
yield incorrect results).
Since both ShuffleCostEstimator and ShuffleInstructionBuilder can modify
the CommonMask, we have decided to perform the mask transformation only
within createShuffle. However, we do not store the transformed result,
as createShuffle may be called multiple times.
2025-01-28 12:02:37 +08:00
Joseph Huber
760a786d15
[Clang] Prevent mlink-builtin-bitcode from internalizing the RPC client (#118661)
Summary:
Currently, we only use `-mlink-builtin-bitcode` for non-LTO NVIDIA
compiliations. This has the problem that it will internalize the RPC
client symbol which needs to be visible to the host. To counteract that,
I put `retain` on it, but this also prevents optimizations on the global
itself, so the passes we have that remove the symbol don't work on
OpenMP anymore. This patch does the dumbest solution, adding a special
string check for it in clang. Not the best solution, the runner up would
be to have a clang attribute for `externally_initialized` because those
can't be internalized, but that might have some unfortunate
side-effects. Alternatively we could make NVIDIA compilations do LTO all
the time, but that would affect some users and it's harder than I
thought.
2025-01-27 19:30:59 -06:00
Florian Hahn
713482fccf [VPlan] Use State.get to extract lane mask for BranchOnMask.
Simplifies the code slightly and avoids redundant extracts/broadcasts if
the operand is live-in or already scalar.
2025-01-27 21:35:36 +00:00
Florian Hahn
ad9da92cf6
[LoopUnroll] Add RuntimeUnrollMultiExit to loop unroll options (NFC) (#124462)
Add an extra knob to RuntimeUnrollMultiExit to let backends control
whether to allow multi-exit unrolling on a per-loop basis.

This gives backends more fine-grained control on deciding if multi-exit
unrolling is profitable for a given loop and uarch. Similar to
4226e0a0c75.

PR: https://github.com/llvm/llvm-project/pull/124462
2025-01-27 21:20:04 +00:00
Jeremy Morse
285009f202 [NFC][DebugInfo] Rewrite more call-sites to insert with iterators (#124288)
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators. The
call-sites updated in this patch are those where the dyn_cast_or_null cast
utility doesn't compose well with iterator insertion. It can distinguish
between nullptr and a "present" (non-null) Instruction pointer, but not
between a legal and illegal instruction iterator. This can lead to
end-iterator dereferences and thus crashes.

We can improve this in the future (as parent-pointers can now be accessed
from ilist nodes), but for the moment, add explicit tests for end()
iterators at the five call sites affected by this.
2025-01-27 20:30:45 +00:00
Kazu Hirata
e0c5a8553d
[memprof] Migrate away from PointerUnion::dyn_cast (NFC) (#124505)
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses cast
because we know which alternative to expect in the ternary expression.
2025-01-27 10:35:37 -08:00
Jeremy Morse
34b139594a
[NFC][DebugInfo] Switch more call-sites to using iterator-insertion (#124283)
To finalise the "RemoveDIs" work removing debug intrinsics, we're
updating call sites that insert instructions to use iterators instead.
This set of changes are those where it's not immediately obvious that
just calling getIterator to fetch an iterator is correct, and one or two
places where more than one line needs to change.

Overall the same rule holds though: iterators generated for the start of
a block such as getFirstNonPHIIt need to be passed into insert/move
methods without being unwrapped/rewrapped, everything else can use
getIterator.
2025-01-27 16:44:14 +00:00
Jeremy Morse
81d18ad864
[NFC][DebugInfo] Make some block-start-position methods return iterators (#124287)
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators. A
number of these (such as getFirstNonPHIOrDbg) are sufficiently
infrequently used that we can just replace the pointer-returning version
with an iterator-returning version, hopefully without much/any
disruption.

Thus this patch has getFirstNonPHIOrDbg and
getFirstNonPHIOrDbgOrLifetime return an iterator, and updates all
call-sites. There are no concerns about the iterators returned being
converted to Instruction*'s and losing the debug-info bit: because the
methods skip debug intrinsics, the iterator head bit is always false
anyway.
2025-01-27 16:27:54 +00:00
Florian Hahn
09a29fcc8d
[VPlan] Don't collect live-ins in collectUsersInExitBlocks. (NFC) (#123819)
Live-ins don't need to be handled, other than adding to the exit phi
recipe. Do that early and assert that otherwise the exit value is
defined in the vector loop region.

This should enable simply skipping other exit values that do not need
further fixing, e.g. if handling the exit value from the early exit
directly in handleUncountableEarlyExit.

PR: https://github.com/llvm/llvm-project/pull/123819
2025-01-27 16:12:07 +00:00
Nikita Popov
212f344b84 [InstCombine] Handle constant expression result in tryFactorization()
If IRBuilder folds the result to a constant expression, don't try
to set nowrap flags on it.

Fixes https://github.com/llvm/llvm-project/issues/124526.
2025-01-27 16:25:37 +01:00
Jeremy Morse
e14962a39c
[NFC][DebugInfo] Use iterators for instruction insertion in more places (#124291)
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators.
This patch changes some more complex call-sites, those crossing file
boundaries and where I've had to perform some minor rewrites.
2025-01-27 15:25:17 +00:00
Alexey Bataev
f1d5e70a00 [SLP][NFC]Do not check poison values for corresponding vectorized entries
No need to check poison values if they have been vectorized and/or mark
them as vectorized, it should work only for instructions.
2025-01-27 06:38:23 -08:00
Thurston Dang
b9d301cc7e Revert "[msan] Add handlers for AVX masked load/store intrinsics (#123857)"
This reverts commit db79fb2a91df31a07f312f8e061936927ac5c506.

Reason: buildbot breakage
(https://lab.llvm.org/buildbot/#/builders/144/builds/16636/steps/6/logs/FAIL__LLVM__avx512-intrinsics-upgrade_ll)
2025-01-27 01:10:35 +00:00
Thurston Dang
db79fb2a91
[msan] Add handlers for AVX masked load/store intrinsics (#123857)
This patch adds explicit support for AVX masked load/store intrinsics,
largely by applying the intrinsics to the shadows (but subtly different
to handleIntrinsicByApplyingToShadow()).

We do not reuse the handleMaskedLoad/Store functions. The key challenge
is that the LLVM masked intrinsics require a vector of booleans, while
AVX masked intrinsics use the MSBs of a vector of integers.
X86InstCombineIntrinsic.cpp::simplifyX86MaskedLoad mentions that the x86
backend does not know how to efficiently convert from a vector of
booleans back into the AVX mask format; therefore, they (and we) do not
reduce AVX masked intrinsics into LLVM masked intrinsics.
2025-01-26 15:40:55 -08:00
Vasileios Porpodas
1c4341d176 [SandboxVec][DAG] Fix interval check without Node
This patch moves the check of whether a node exists before the check of
whether it is contained in the interval.
2025-01-26 11:54:09 -08:00
Florian Hahn
1395cd015f [VPlan] Support multi-exit loops in HCFG builder.
Update HCFG construction to support multi-exit loops. If there is no
unique exit block, map the middle block of the initial plan to the exit
block from the latch.

This further unifies HCFG construction and prepares for use to also
build an initial VPlan (VPlan0) for inner loops.

Effectively NFC as this isn't used on the default code path yet.
2025-01-25 21:55:15 +00:00
Fangrui Song
2131115be5
[InstCombine] Drop Range attribute when simplifying 'fshl' based on demanded bits (#124429)
When simplifying operands based on demanded bits, the return value range
of llvm.fshl might change. Keeping the Range attribute might cause
llvm.fshl to generate a poison and lead to miscompile. Drop the Range
attribute similar to `dropPosonGeneratingFlags` elsewhere.

Fix #124387
2025-01-25 13:35:11 -08:00
Vasileios Porpodas
b178c2d63e [SandboxVec][DAG] Fix trim schedule
Fix trimSchedule by skipping instructions without a DAG Node.
2025-01-25 09:42:14 -08:00