30975 Commits

Author SHA1 Message Date
Alex Bradbury
8fcb1263f4
[PreISelIntrinsicLowering] Produce a memset_pattern16 libcall for llvm.experimental.memset.pattern when available (#120420)
This is to enable a transition of LoopIdiomRecognize to selecting the
llvm.experimental.memset.pattern intrinsic as requested in #118632 (as
opposed to supporting selection of the libcall or the intrinsic). As
such, although it _is_ a TODO to add costing considerations on whether
to lower to the libcall (when available) or expand directly, lacking
such logic is helpful at this stage in order to minimise any unexpected
code gen changes in this transition.
2025-01-30 07:12:53 +00:00
vporpo
e094c0fa67
[SandboxVec][Legality] Don't vectorize when instructions repeat (#124479)
This patch adds a legality check that checks for repeated instrs in a
bundle and won't vectorize if such pattern is found.
2025-01-29 15:54:15 -08:00
Simon Pilgrim
5921295dca
Revert "[SLP] getSpillCost - fully populate IntrinsicCostAttributes to improve cost analysis." (#124962)
Reverts llvm/llvm-project#124129 as its currently causing a regression at #124499 - avoids the regression until a proper fix can be added to getSpillCost
2025-01-29 22:17:53 +00:00
Mikhail Gudim
1822462e2a
[InstCombine][VectorCombine][NFC] Move a test from InstCombine to (#124948)
VectorCombine

Since the transformation which is the subject of the
'fold-binop-of-reductions.ll` test will be in VectorCombine move the
test there.
2025-01-29 13:54:38 -05:00
Teresa Johnson
ae6d5dd58b
[MemProf] Prune unneeded non-cold contexts (#124823)
We can take advantage of the fact that we subsequently only clone cold
allocation contexts, since not cold behavior is the default, and
significantly reduce the amount of metadata (and later ThinLTO summary
and MemProfContextDisambiguation graph nodes) by pruning unnecessary not
cold contexts when building metadata from the trie.

Specifically, we only need to keep notcold contexts that overlap the
longest with cold allocations, to know how deeply to clone those
contexts to expose the cold allocation behavior.

For a large target this reduced ThinLTO bitcode object sizes by about
35%. It reduced the ThinLTO indexing time by about half and the peak
ThinLTO indexing memory by about 20%.
2025-01-29 10:38:31 -08:00
Simon Pilgrim
88e00141f8 [PhaseOrdering][X86] Add additional hadd/hsub test coverage
Add v16i16 coverage and "reverse order hadd/hsub" tests
2025-01-29 17:25:54 +00:00
Nikita Popov
8a43d0e873 [Attributor] Check correct IRPosition in AANoCapture::isImpliedByIR()
This case is intended to check the callee argument, not the call-site.

Fixes an issue introduced in #123181.
2025-01-29 17:34:10 +01:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
Yingwei Zheng
f226cabbb1
[ValueTracking] Handle nonnull attributes at callsite (#124908)
Alive2: https://alive2.llvm.org/ce/z/yJfskv
Closes https://github.com/llvm/llvm-project/issues/124540.
2025-01-29 23:14:36 +08:00
Yingwei Zheng
cf37ae5cae
[InstCombine] Add one-use check when folding fabs over selects (#122270)
Fixes multi-use issue introduced by
https://github.com/llvm/llvm-project/pull/86390.
It allows the folding of `fabs (select Cond, TrueC, FalseC)` to avoid performance regression in ocio
2025-01-29 21:44:59 +08:00
Ryotaro Kasuga
690f251063
[LoopInterchange] Handle LE and GE correctly (#124901)
LoopInterchange have converted `DVEntry::LE` and `DVEntry::GE` in
direction vectors to '<' and '>' respectively. This handling is
incorrect because the information about the '=' it lost. This leads to
miscompilation in some cases. To resolve this issue, convert them to '*'
instead.

Resolve #123920
2025-01-29 19:30:54 +09:00
Simon Pilgrim
89ca3e72ca
[CostModel][X86] Reduce worst case v8i16/v16i8 SSE2 shuffle costs (#124789)
These were based off instruction count, not throughput - we can probably improve these further, but these throughput numbers match the worse expanded shuffles we see in the vector-shuffle-128-v* codegen tests.
2025-01-29 10:23:09 +00:00
David Sherwood
776ef9d1be
[LoopVectorize][NFC] Regenerate some early exit test CHECK lines (#124900) 2025-01-29 09:48:55 +00:00
David Sherwood
c836b8956d
[LoopVectorize][NFC] Disable output for tests that don't need it (#124747)
There are a lot of tests that do not depend upon the IR output
for validation, relying instead on the debug output. For these
tests we can add the -disable-output command line argument.
2025-01-29 08:09:50 +00:00
natedal
b0924ed64e
[AggressiveInstCombine] Add tests for memchr inline threshold (NFC) (#121711)
This adds a test checking that, if length=2, memchr is called. This is
undesirable as it would be faster to directly compare the two array
elements with the target element, rather than calling the external
memchr function.
2025-01-29 14:06:04 +08:00
Stephen Tozer
822f74a911
[Clang] Cleanup docs and comments relating to -fextend-variable-liveness (#124767)
This patch contains a number of changes relating to the above flag;
primarily it updates comment references to the old flag names,
"-fextend-lifetimes" and "-fextend-this-ptr" to refer to the new names,
"-fextend-variable-liveness[={all,this}]". These changes are all NFC.

This patch also removes the explicit -fextend-this-ptr-liveness flag
alias, and shortens the help-text for the main flag; these are both
changes that were meant to be applied in the initial PR (#110000), but
due to some user-error on my part they were not included in the merged
commit.
2025-01-28 18:25:32 +00:00
Alexey Bataev
947d8ebbf3
[SLP]Unify getNumberOfParts use
Adds getNumberOfParts and uses it instead of similar code across code
base, fixes analysis of non-vectorizable types in
computeMinimumValueSizes.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/124774
2025-01-28 12:16:44 -05:00
Ramkumar Ramachandra
d76ea250c8
Reland [InstCombine] Teach foldSelectOpOp about samesign (#124320)
Changes: There was a serious bug in the previous patch, leading to a
miscompile. See #122723 for the miscompile report from Alexander, and
the follow-up investigation by Nikita. The patch has since been
reworked, and now includes the testcase from the miscompile.

Follow up on 4a0d53a (PatternMatch: migrate to CmpPredicate) to get rid
of one of the FIXMEs it introduced by replacing a predicate comparison
with CmpPredicate::getMatching.

Co-authored-by: Nikita Popov <npopov@redhat.com>
2025-01-28 16:53:01 +00:00
Florian Hahn
3007f31e74 [LoopUnroll] Add AArch64 tests for multi-exit loop unrolling.
Test coverage to https://github.com/llvm/llvm-project/pull/124751.
2025-01-28 14:25:27 +00:00
Alexey Bataev
a1ab5b4c87 [SLP]Check the MainOp matches the requirements for the instructions
Need to include MainOp into the analysis of the instructions in
getSameOpcode to be sure that it is checked for the requirements to
prevent crashes during further analysis.
2025-01-28 06:00:52 -08:00
Alexey Bataev
1d5fbe83c3 [SLP]Adjust NumberOfParts value for adjusted number of buildvector scalars
Need to adjust NumParts value, when GatheredScalars scalars are adjusted
after extractelements analysis, to fix compiler crash
2025-01-28 05:45:13 -08:00
Nicholas Guy
cdea38f91a
Reland "[LoopVectorizer] Add support for chaining partial reductions #120272" (#124282)
Change `getScaledReduction` to take an existing vector, rather than
creating and returning a new one each call.
Rename `getScaledReduction` to `getScaledReductions` to more accurately
reflect what it's now doing.

---------

Co-authored-by: Karlo Basioli <68535415+basioli-k@users.noreply.github.com>
2025-01-28 10:40:35 +00:00
Thurston Dang
fa9ac62d02
[ubsan] Parse and use <cutoffs[0,1,2]=70000;cutoffs[5,6,8]=90000> in LowerAllowCheckPass (#124211)
This adds and utilizes a cutoffs parameter for LowerAllowCheckPass, via the Options parameter (introduced in https://github.com/llvm/llvm-project/pull/122994).

Future work will connect -fsanitize-skip-hot-cutoff (introduced patch in https://github.com/llvm/llvm-project/pull/121619) in the clang frontend to the cutoffs parameter used here.
2025-01-27 20:08:53 -08:00
Han-Kuan Chen
08d14e10ca
[SLP] Fix CommonMask will be transformed into an incorrect mask if createShuffle is called multiple times. (#124244)
We have two types of mask in SLP: a scalar mask and a vector mask.
When vectorizing four i32 additions into <4 x i32>, SLP creates a mask
of length 4.
When vectorizing four <2 x i32> additions into <8 x i32>, SLP also
creates a mask of length 4.
We refer to the first case as a scalar mask (because the mask element
represents a scalar, i32), and the second case as a vector mask (because
the mask element represents a vector, <4 x i32>).
At some point, we must convert the scalar mask into a vector mask
(otherwise, calling TTI cost functions or IRBuilderBase functions may
yield incorrect results).
Since both ShuffleCostEstimator and ShuffleInstructionBuilder can modify
the CommonMask, we have decided to perform the mask transformation only
within createShuffle. However, we do not store the transformed result,
as createShuffle may be called multiple times.
2025-01-28 12:02:37 +08:00
Florian Hahn
713482fccf [VPlan] Use State.get to extract lane mask for BranchOnMask.
Simplifies the code slightly and avoids redundant extracts/broadcasts if
the operand is live-in or already scalar.
2025-01-27 21:35:36 +00:00
Florian Hahn
09a29fcc8d
[VPlan] Don't collect live-ins in collectUsersInExitBlocks. (NFC) (#123819)
Live-ins don't need to be handled, other than adding to the exit phi
recipe. Do that early and assert that otherwise the exit value is
defined in the vector loop region.

This should enable simply skipping other exit values that do not need
further fixing, e.g. if handling the exit value from the early exit
directly in handleUncountableEarlyExit.

PR: https://github.com/llvm/llvm-project/pull/123819
2025-01-27 16:12:07 +00:00
Simon Pilgrim
1bb784a748 [LowerMatrixIntrinsics] multiply-minimal.ll - use -passes="..." to allow DOS to correctly evaluate the RUN command
Necessary for running update_test_checks.py on windows
2025-01-27 16:05:29 +00:00
Simon Pilgrim
ad2b2aa50b [PhaseOrdering] vector-trunc.ll - use -passes="default<O2>" to allow DOS to correctly evaluate the RUN command
Necessary for running update_test_checks.py on windows
2025-01-27 16:05:29 +00:00
Simon Pilgrim
178f47143a
[CostModel][X86] getShuffleCost - shuffles with only one defined element are always cheap (#124412)
If we're just moving a single element around inside a 128-bit lane (probably as an alternative to extracting it), we can assume this is cheap as a single PSRLDQ/PSHUFD/SHUFPS.

I've got the horrid feeling we're moving towards matching all SSE shuffle patterns inside the cost model, but I'm going to do my best to avoid this for now :|
2025-01-27 15:56:22 +00:00
Nikita Popov
212f344b84 [InstCombine] Handle constant expression result in tryFactorization()
If IRBuilder folds the result to a constant expression, don't try
to set nowrap flags on it.

Fixes https://github.com/llvm/llvm-project/issues/124526.
2025-01-27 16:25:37 +01:00
Ramkumar Ramachandra
3a4376b8f9
LAA: handle 0 return from getPtrStride correctly (#124539)
getPtrStride returns 0 when the PtrScev is loop-invariant, and this is
not an erroneous value: it returns std::nullopt to communicate that it
was not able to find a valid pointer stride. In analyzeLoop, we call
getPtrStride with a value_or(0) conflating the zero return value with
std::nullopt. Fix this, handling loop-invariant loads correctly.
2025-01-27 14:21:14 +00:00
David Sherwood
b7286dbef9
Reland "[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop #96752" (#123616)
The last attempt failed a sanitiser build because we were
creating a reference to a null Predicates pointer in
isDereferenceableAndAlignedInLoop. This was exposed by
the unit test IsDerefReadOnlyLoop in
unittests/Analysis/LoadsTest.cpp. I fixed this by falling
back on getConstantMaxBackedgeTakenCount if Predicates is
null - see line 316 in llvm/lib/Analysis/Loads.cpp. There
are no other changes.
2025-01-27 11:59:38 +00:00
Andreas Jonson
f8ab91f74f [LVI][CVP] Add test for trunc bittest. (NFC) 2025-01-26 16:55:05 +01:00
Teresa Johnson
2af819fa3d
[MemProf] Add test for hot hints (#124394)
The change in PR124219 required removing one of the tests added for
-memprof-use-hot-hints, since we no longer label any contexts as hot in
metadata, so add a new test that checks the hot attribute instead.
2025-01-26 07:53:53 -08:00
Simon Pilgrim
dec47b76f4
[CostModel][X86] Update baseline CTTZ/CTLZ costs for x86_64 (#124312)
Followup to #123623 - now that the CMOV has been removed, the throughput has improved, reducing the benefit of vectorization on pre-x86-64-v3 CPUs
2025-01-26 14:43:51 +00:00
Florian Hahn
81d38da65e [LV] Add more tests for narrowing interleave groups for AArch64.
Add additional tests for
https://github.com/llvm/llvm-project/pull/106441.
2025-01-26 13:52:18 +00:00
Fangrui Song
2131115be5
[InstCombine] Drop Range attribute when simplifying 'fshl' based on demanded bits (#124429)
When simplifying operands based on demanded bits, the return value range
of llvm.fshl might change. Keeping the Range attribute might cause
llvm.fshl to generate a poison and lead to miscompile. Drop the Range
attribute similar to `dropPosonGeneratingFlags` elsewhere.

Fix #124387
2025-01-25 13:35:11 -08:00
Fangrui Song
89f2fee9f8 [InstCombine] Add test for incorrect retention of Range attribute in fshl 2025-01-25 13:17:15 -08:00
Alexey Bataev
5e65f43041 [SLP][NFC]Add a test, producing serie of extrtactelements, building non-extendable tree 2025-01-25 11:50:14 -08:00
Florian Hahn
6383a12e3b
[VPlan] Refactor HCFG builder to preserve original vector latch (NFC).
Update HCFG builder to preserve the original latch block of the initial
VPlan, ensuring there is always a latch.

It also skips creating the BranchOnCond for the latch of the top-level
loop, instead of removing it later. Exiting via the latch is controlled
by later recipes.

This further unifies HCFG construction and prepares for use to also
build an initial VPlan (VPlan0) for inner loops.
2025-01-25 13:32:01 +00:00
David Green
52bffdf9f5
[IPSCCP][FuncSpec] Protect against metadata access from call args. (#124284)
Fixes an issue reported from #114964, where metadata arguments were
attempted to be accessed as constants.
2025-01-25 10:59:50 +00:00
Alex MacLean
07ed8187ac
[OpenMP] Replace nvvm.annotation usage with kernel calling conventions (#122320)
Specifying a kernel with the `ptx_kernel` or `amdgpu_kernel` calling
convention is a more idiomatic and compile-time performant than using
the `nvvm.annoation !"kernel"` metadata.

Transition OMPIRBuilder to use calling conventions for PTX kernels and
no longer emit `nvvm.annoation`. Update OpenMPOpt to work with kernels
specified via calling convention as well as metadata. Update OpenMP
tests to use the calling conventions.
2025-01-24 16:56:10 -08:00
Teresa Johnson
c725a95e08
[MemProf] Convert Hot contexts to NotCold early (#124219)
While we convert hot contexts to notcold contexts during the cloning
step, their existence was greatly limiting the context trimming
performed when we add the MemProf profile to the IR. To address this,
any hot contexts are converted to notcold contexts immediately after
first checking for unambiguous allocation types, and before checking it
again and before adding metadata while performing context trimming.

Note that hot hints are now disabled by default, however, this avoids
adding unnecessary overhead if they are re-enabled.
2025-01-24 15:58:13 -08:00
vporpo
6409799bdc
[SandboxVec][Legality] Pack from different BBs (#124363)
When the inputs of the pack come from different BBs we need to make sure
we emit the pack instructions at the correct place.
2025-01-24 15:39:37 -08:00
vporpo
ac75d32280
[SandboxVec][VecUtils] Filter out instructions not in BB in VecUtils:getLowest() (#124360)
This patch changes the functionality of `VecUtils::getLowest(Vals, BB)`
such that it filters out any instructions in `Vals` that are not in BB.
This is useful when Vals contains instructions from different BBs,
because in that case we are only interested in one BB.
2025-01-24 14:52:57 -08:00
Teresa Johnson
ae8b560899
[MemProf] Disable hot hints by default (#124338)
By default we were marking some contexts as hot, and adding hot hints to
unambiguously hot allocations. However, there is not yet support for
cloning to expose hot allocation contexts, and none is planned for the
forseeable future.

While we convert hot contexts to notcold contexts during the cloning
step, their existence was greatly limiting the context trimming
performed when we add the MemProf profile to the IR. This change simply
disables the generation of hot contexts / hints by default, as few
allocations were unambiguously hot.

A subsequent change will address the issue when hot hints are optionally
enabled. See PR124219 for details.

This change resulted in significant overhead reductions for a large
target:
~48% reduction in the per-module ThinLTO bitcode summary sizes
~72% reduction in the distributed ThinLTO bitcode combined summary sizes
~68% reduction in thin link time
~34% reduction in thin link peak memory
2025-01-24 13:06:11 -08:00
Alexandros Lamprineas
1b1270f30b
[FMV][GlobalOpt] Enable static resolution of non-FMV callers. (#124314)
The undetectable FMV features predres and ls64 have been removed,
therefore the optimization is now re-enabled. The llvm testsuite
Graviton4 bots are expected to remain green.
2025-01-24 19:48:40 +00:00
Stephen Long
ab976a1712
PreISelIntrinsicLowering: Lower llvm.exp/llvm.exp2 to a loop if scalable vec arg (#117568) 2025-01-24 14:02:06 -05:00
Simon Pilgrim
a12d7e4b61
[SLP] getVectorCallCosts - don't provide scalar argument data for vector IntrinsicCostAttributes (#124254)
getVectorCallCosts determines the cost of a vector intrinsic, based off
an existing scalar intrinsic call - but we were including the scalar
argument data to the IntrinsicCostAttributes, which meant that not only
was the cost calculation not type-only based, it was making incorrect
assumptions about constant values etc.

This also exposed an issue that x86 relied on fallback calculations for
funnel shift costs - this is great when we have the argument data as
that improves the accuracy of uniform shift amounts etc., but meant that
type-only costs would default to Cost=2 for all custom lowered funnel
shifts, which was far too cheap.

This is the reverse of #124129 where we weren't including argument data
when we could.

Fixes #63980
2025-01-24 15:13:13 +00:00
Simon Pilgrim
625e0a40f1 [SLP][X86] Add missing SSE2/SSE4 checks from vector rotate tests 2025-01-24 10:12:19 +00:00