30961 Commits

Author SHA1 Message Date
Sam Tebbs
8be3fc0f5c [AArch64] Disallow vscale x 1 partial reductions (#125252)
We don't want to allow partial reductions resulting in a vscale x 1 type
as we can't lower it in the backend.

(cherry picked from commit c7995a6905f2320f280013454676f992a8c6f89f)
2025-02-05 14:37:39 +00:00
Stephen Tozer
822f74a911
[Clang] Cleanup docs and comments relating to -fextend-variable-liveness (#124767)
This patch contains a number of changes relating to the above flag;
primarily it updates comment references to the old flag names,
"-fextend-lifetimes" and "-fextend-this-ptr" to refer to the new names,
"-fextend-variable-liveness[={all,this}]". These changes are all NFC.

This patch also removes the explicit -fextend-this-ptr-liveness flag
alias, and shortens the help-text for the main flag; these are both
changes that were meant to be applied in the initial PR (#110000), but
due to some user-error on my part they were not included in the merged
commit.
2025-01-28 18:25:32 +00:00
Alexey Bataev
947d8ebbf3
[SLP]Unify getNumberOfParts use
Adds getNumberOfParts and uses it instead of similar code across code
base, fixes analysis of non-vectorizable types in
computeMinimumValueSizes.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/124774
2025-01-28 12:16:44 -05:00
Ramkumar Ramachandra
d76ea250c8
Reland [InstCombine] Teach foldSelectOpOp about samesign (#124320)
Changes: There was a serious bug in the previous patch, leading to a
miscompile. See #122723 for the miscompile report from Alexander, and
the follow-up investigation by Nikita. The patch has since been
reworked, and now includes the testcase from the miscompile.

Follow up on 4a0d53a (PatternMatch: migrate to CmpPredicate) to get rid
of one of the FIXMEs it introduced by replacing a predicate comparison
with CmpPredicate::getMatching.

Co-authored-by: Nikita Popov <npopov@redhat.com>
2025-01-28 16:53:01 +00:00
Florian Hahn
3007f31e74 [LoopUnroll] Add AArch64 tests for multi-exit loop unrolling.
Test coverage to https://github.com/llvm/llvm-project/pull/124751.
2025-01-28 14:25:27 +00:00
Alexey Bataev
a1ab5b4c87 [SLP]Check the MainOp matches the requirements for the instructions
Need to include MainOp into the analysis of the instructions in
getSameOpcode to be sure that it is checked for the requirements to
prevent crashes during further analysis.
2025-01-28 06:00:52 -08:00
Alexey Bataev
1d5fbe83c3 [SLP]Adjust NumberOfParts value for adjusted number of buildvector scalars
Need to adjust NumParts value, when GatheredScalars scalars are adjusted
after extractelements analysis, to fix compiler crash
2025-01-28 05:45:13 -08:00
Nicholas Guy
cdea38f91a
Reland "[LoopVectorizer] Add support for chaining partial reductions #120272" (#124282)
Change `getScaledReduction` to take an existing vector, rather than
creating and returning a new one each call.
Rename `getScaledReduction` to `getScaledReductions` to more accurately
reflect what it's now doing.

---------

Co-authored-by: Karlo Basioli <68535415+basioli-k@users.noreply.github.com>
2025-01-28 10:40:35 +00:00
Thurston Dang
fa9ac62d02
[ubsan] Parse and use <cutoffs[0,1,2]=70000;cutoffs[5,6,8]=90000> in LowerAllowCheckPass (#124211)
This adds and utilizes a cutoffs parameter for LowerAllowCheckPass, via the Options parameter (introduced in https://github.com/llvm/llvm-project/pull/122994).

Future work will connect -fsanitize-skip-hot-cutoff (introduced patch in https://github.com/llvm/llvm-project/pull/121619) in the clang frontend to the cutoffs parameter used here.
2025-01-27 20:08:53 -08:00
Han-Kuan Chen
08d14e10ca
[SLP] Fix CommonMask will be transformed into an incorrect mask if createShuffle is called multiple times. (#124244)
We have two types of mask in SLP: a scalar mask and a vector mask.
When vectorizing four i32 additions into <4 x i32>, SLP creates a mask
of length 4.
When vectorizing four <2 x i32> additions into <8 x i32>, SLP also
creates a mask of length 4.
We refer to the first case as a scalar mask (because the mask element
represents a scalar, i32), and the second case as a vector mask (because
the mask element represents a vector, <4 x i32>).
At some point, we must convert the scalar mask into a vector mask
(otherwise, calling TTI cost functions or IRBuilderBase functions may
yield incorrect results).
Since both ShuffleCostEstimator and ShuffleInstructionBuilder can modify
the CommonMask, we have decided to perform the mask transformation only
within createShuffle. However, we do not store the transformed result,
as createShuffle may be called multiple times.
2025-01-28 12:02:37 +08:00
Florian Hahn
713482fccf [VPlan] Use State.get to extract lane mask for BranchOnMask.
Simplifies the code slightly and avoids redundant extracts/broadcasts if
the operand is live-in or already scalar.
2025-01-27 21:35:36 +00:00
Florian Hahn
09a29fcc8d
[VPlan] Don't collect live-ins in collectUsersInExitBlocks. (NFC) (#123819)
Live-ins don't need to be handled, other than adding to the exit phi
recipe. Do that early and assert that otherwise the exit value is
defined in the vector loop region.

This should enable simply skipping other exit values that do not need
further fixing, e.g. if handling the exit value from the early exit
directly in handleUncountableEarlyExit.

PR: https://github.com/llvm/llvm-project/pull/123819
2025-01-27 16:12:07 +00:00
Simon Pilgrim
1bb784a748 [LowerMatrixIntrinsics] multiply-minimal.ll - use -passes="..." to allow DOS to correctly evaluate the RUN command
Necessary for running update_test_checks.py on windows
2025-01-27 16:05:29 +00:00
Simon Pilgrim
ad2b2aa50b [PhaseOrdering] vector-trunc.ll - use -passes="default<O2>" to allow DOS to correctly evaluate the RUN command
Necessary for running update_test_checks.py on windows
2025-01-27 16:05:29 +00:00
Simon Pilgrim
178f47143a
[CostModel][X86] getShuffleCost - shuffles with only one defined element are always cheap (#124412)
If we're just moving a single element around inside a 128-bit lane (probably as an alternative to extracting it), we can assume this is cheap as a single PSRLDQ/PSHUFD/SHUFPS.

I've got the horrid feeling we're moving towards matching all SSE shuffle patterns inside the cost model, but I'm going to do my best to avoid this for now :|
2025-01-27 15:56:22 +00:00
Nikita Popov
212f344b84 [InstCombine] Handle constant expression result in tryFactorization()
If IRBuilder folds the result to a constant expression, don't try
to set nowrap flags on it.

Fixes https://github.com/llvm/llvm-project/issues/124526.
2025-01-27 16:25:37 +01:00
Ramkumar Ramachandra
3a4376b8f9
LAA: handle 0 return from getPtrStride correctly (#124539)
getPtrStride returns 0 when the PtrScev is loop-invariant, and this is
not an erroneous value: it returns std::nullopt to communicate that it
was not able to find a valid pointer stride. In analyzeLoop, we call
getPtrStride with a value_or(0) conflating the zero return value with
std::nullopt. Fix this, handling loop-invariant loads correctly.
2025-01-27 14:21:14 +00:00
David Sherwood
b7286dbef9
Reland "[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop #96752" (#123616)
The last attempt failed a sanitiser build because we were
creating a reference to a null Predicates pointer in
isDereferenceableAndAlignedInLoop. This was exposed by
the unit test IsDerefReadOnlyLoop in
unittests/Analysis/LoadsTest.cpp. I fixed this by falling
back on getConstantMaxBackedgeTakenCount if Predicates is
null - see line 316 in llvm/lib/Analysis/Loads.cpp. There
are no other changes.
2025-01-27 11:59:38 +00:00
Andreas Jonson
f8ab91f74f [LVI][CVP] Add test for trunc bittest. (NFC) 2025-01-26 16:55:05 +01:00
Teresa Johnson
2af819fa3d
[MemProf] Add test for hot hints (#124394)
The change in PR124219 required removing one of the tests added for
-memprof-use-hot-hints, since we no longer label any contexts as hot in
metadata, so add a new test that checks the hot attribute instead.
2025-01-26 07:53:53 -08:00
Simon Pilgrim
dec47b76f4
[CostModel][X86] Update baseline CTTZ/CTLZ costs for x86_64 (#124312)
Followup to #123623 - now that the CMOV has been removed, the throughput has improved, reducing the benefit of vectorization on pre-x86-64-v3 CPUs
2025-01-26 14:43:51 +00:00
Florian Hahn
81d38da65e [LV] Add more tests for narrowing interleave groups for AArch64.
Add additional tests for
https://github.com/llvm/llvm-project/pull/106441.
2025-01-26 13:52:18 +00:00
Fangrui Song
2131115be5
[InstCombine] Drop Range attribute when simplifying 'fshl' based on demanded bits (#124429)
When simplifying operands based on demanded bits, the return value range
of llvm.fshl might change. Keeping the Range attribute might cause
llvm.fshl to generate a poison and lead to miscompile. Drop the Range
attribute similar to `dropPosonGeneratingFlags` elsewhere.

Fix #124387
2025-01-25 13:35:11 -08:00
Fangrui Song
89f2fee9f8 [InstCombine] Add test for incorrect retention of Range attribute in fshl 2025-01-25 13:17:15 -08:00
Alexey Bataev
5e65f43041 [SLP][NFC]Add a test, producing serie of extrtactelements, building non-extendable tree 2025-01-25 11:50:14 -08:00
Florian Hahn
6383a12e3b
[VPlan] Refactor HCFG builder to preserve original vector latch (NFC).
Update HCFG builder to preserve the original latch block of the initial
VPlan, ensuring there is always a latch.

It also skips creating the BranchOnCond for the latch of the top-level
loop, instead of removing it later. Exiting via the latch is controlled
by later recipes.

This further unifies HCFG construction and prepares for use to also
build an initial VPlan (VPlan0) for inner loops.
2025-01-25 13:32:01 +00:00
David Green
52bffdf9f5
[IPSCCP][FuncSpec] Protect against metadata access from call args. (#124284)
Fixes an issue reported from #114964, where metadata arguments were
attempted to be accessed as constants.
2025-01-25 10:59:50 +00:00
Alex MacLean
07ed8187ac
[OpenMP] Replace nvvm.annotation usage with kernel calling conventions (#122320)
Specifying a kernel with the `ptx_kernel` or `amdgpu_kernel` calling
convention is a more idiomatic and compile-time performant than using
the `nvvm.annoation !"kernel"` metadata.

Transition OMPIRBuilder to use calling conventions for PTX kernels and
no longer emit `nvvm.annoation`. Update OpenMPOpt to work with kernels
specified via calling convention as well as metadata. Update OpenMP
tests to use the calling conventions.
2025-01-24 16:56:10 -08:00
Teresa Johnson
c725a95e08
[MemProf] Convert Hot contexts to NotCold early (#124219)
While we convert hot contexts to notcold contexts during the cloning
step, their existence was greatly limiting the context trimming
performed when we add the MemProf profile to the IR. To address this,
any hot contexts are converted to notcold contexts immediately after
first checking for unambiguous allocation types, and before checking it
again and before adding metadata while performing context trimming.

Note that hot hints are now disabled by default, however, this avoids
adding unnecessary overhead if they are re-enabled.
2025-01-24 15:58:13 -08:00
vporpo
6409799bdc
[SandboxVec][Legality] Pack from different BBs (#124363)
When the inputs of the pack come from different BBs we need to make sure
we emit the pack instructions at the correct place.
2025-01-24 15:39:37 -08:00
vporpo
ac75d32280
[SandboxVec][VecUtils] Filter out instructions not in BB in VecUtils:getLowest() (#124360)
This patch changes the functionality of `VecUtils::getLowest(Vals, BB)`
such that it filters out any instructions in `Vals` that are not in BB.
This is useful when Vals contains instructions from different BBs,
because in that case we are only interested in one BB.
2025-01-24 14:52:57 -08:00
Teresa Johnson
ae8b560899
[MemProf] Disable hot hints by default (#124338)
By default we were marking some contexts as hot, and adding hot hints to
unambiguously hot allocations. However, there is not yet support for
cloning to expose hot allocation contexts, and none is planned for the
forseeable future.

While we convert hot contexts to notcold contexts during the cloning
step, their existence was greatly limiting the context trimming
performed when we add the MemProf profile to the IR. This change simply
disables the generation of hot contexts / hints by default, as few
allocations were unambiguously hot.

A subsequent change will address the issue when hot hints are optionally
enabled. See PR124219 for details.

This change resulted in significant overhead reductions for a large
target:
~48% reduction in the per-module ThinLTO bitcode summary sizes
~72% reduction in the distributed ThinLTO bitcode combined summary sizes
~68% reduction in thin link time
~34% reduction in thin link peak memory
2025-01-24 13:06:11 -08:00
Alexandros Lamprineas
1b1270f30b
[FMV][GlobalOpt] Enable static resolution of non-FMV callers. (#124314)
The undetectable FMV features predres and ls64 have been removed,
therefore the optimization is now re-enabled. The llvm testsuite
Graviton4 bots are expected to remain green.
2025-01-24 19:48:40 +00:00
Stephen Long
ab976a1712
PreISelIntrinsicLowering: Lower llvm.exp/llvm.exp2 to a loop if scalable vec arg (#117568) 2025-01-24 14:02:06 -05:00
Simon Pilgrim
a12d7e4b61
[SLP] getVectorCallCosts - don't provide scalar argument data for vector IntrinsicCostAttributes (#124254)
getVectorCallCosts determines the cost of a vector intrinsic, based off
an existing scalar intrinsic call - but we were including the scalar
argument data to the IntrinsicCostAttributes, which meant that not only
was the cost calculation not type-only based, it was making incorrect
assumptions about constant values etc.

This also exposed an issue that x86 relied on fallback calculations for
funnel shift costs - this is great when we have the argument data as
that improves the accuracy of uniform shift amounts etc., but meant that
type-only costs would default to Cost=2 for all custom lowered funnel
shifts, which was far too cheap.

This is the reverse of #124129 where we weren't including argument data
when we could.

Fixes #63980
2025-01-24 15:13:13 +00:00
Simon Pilgrim
625e0a40f1 [SLP][X86] Add missing SSE2/SSE4 checks from vector rotate tests 2025-01-24 10:12:19 +00:00
Elvis Wang
aff1242b8e
[LV] Align debug location of the widen-phi to the original phi. (#120338)
This patch align the debug location of the widen-phi to the debug
location of original phi.

Split from: #120054
2025-01-24 17:49:54 +08:00
Simon Pilgrim
7746596713 [SLP][X86] Add VBMI2 coverage for funnel shift tests
VBMI2 CPUs actually have vector funnel shift instruction support
2025-01-24 09:47:40 +00:00
vporpo
d2234ca163
[SandboxVec][BottomUpVec] Fix packing when PHIs are present (#124206)
Before this patch we might have emitted pack instructions in between PHI
nodes. This patch fixes it by fixing the insert point of the new packs.
2025-01-23 16:29:01 -08:00
Alexander Kornienko
788318484d
Revert "[InstCombine] Teach foldSelectOpOp about samesign" (#124123)
Reverts llvm/llvm-project#122723 due to a miscompilation

See
https://github.com/llvm/llvm-project/pull/122723#issuecomment-2608777844
for details and the test case.
2025-01-24 01:08:10 +01:00
Min-Yih Hsu
bc74a1edbe
[IA] Generalize the support for power-of-two (de)interleave intrinsics (#123863)
Previously, AArch64 used pattern matching to support
llvm.vector.(de)interleave of 2 and 4; RISC-V only supported
(de)interleave of 2.

This patch consolidates the logics in these two targets by factoring out
the common factor calculations into the InterleaveAccess Pass.
2025-01-23 15:27:51 -08:00
vporpo
c7053ac202
[SandboxVec][BottomUpVec] Disable crossing BBs (#124039)
Crossing BBs is not currently supported by the structures of the
vectorizer. This patch fixes instances where this was happening,
including:
- a walk of use-def operands that updates the UnscheduledSuccs counter,
- the dead instruction removal is now done per BB,
- the scheduler, which will reject bundles that cross BBs.
2025-01-23 15:08:13 -08:00
David Blaikie
42043c423f Reapply "Verifier: Add check for DICompositeType elements being null"
This remove some erroneous debug info from tests that should address the
test failures that showed up when the this was previously committed.

This reverts commit 6716ce8b641f0e42e2343e1694ee578b027be0c4.
2025-01-23 22:29:30 +00:00
Vitaly Buka
0e213834df
Revert "[LoopVectorizer] Add support for chaining partial reductions (#120272)" (#124198)
Introduced stack buffer overflow, see #120272.

`getScaledReduction` can return empty vector, and there is not check for
that.

This reverts commit c9b7303b9b18129c4ee6b56aaa2a0a9f59be2d09.
This reverts commit caf0540b91b0fee31353dc7049ae836e0f814cff.
2025-01-23 14:00:33 -08:00
David Green
775d0f36f7
[GVN] Handle scalable vectors with the same size in VNCoercion (#123984)
This allows us to forward to a load even if the types do not match
(nxv4i32 vs nxv2i64 for example). Scalable types are allowed in
canCoerceMustAliasedValueToLoad so long as the size (minelts *
scalarsize) is the same, and some follow-on code is adjusted to make
sure it handles scalable sizes correctly. Methods like
analyzeLoadFromClobberingWrite and analyzeLoadFromClobberingStore still
do nothing for scalable vectors, as Offsets and mismatching types are
not supported.
2025-01-23 18:43:50 +00:00
David Green
bec4c7f5f7
[InstCombine] Unpack scalable struct loads/stores. (#123986)
This teaches unpackLoadToAggregate and unpackStoreToAggregate to unpack
scalable structs to individual loads/stores with insertvalues /
extractvalues. The gep used for the offsets uses an i8 ptradd as opposed
to a struct gep, as the geps for scalable structs are not supported and
we canonicalize to i8.
2025-01-23 18:04:27 +00:00
David Green
3d72619d75 [InstCombine] Add a test for splitting scalable structs. NFC 2025-01-23 17:36:06 +00:00
David Green
6d4e72abb8 [GVN] Add extra vscale tests with different types. NFC 2025-01-23 17:36:06 +00:00
Nicholas Guy
caf0540b91
[LoopVectorizer] Add support for chaining partial reductions (#120272)
Chaining partial reductions, where multiple partial reductions share an
accumulator, allow for more values to be combined together as part of
the reduction without discarding the semantics of the partial reduction
itself.
2025-01-23 17:24:57 +00:00
Simon Pilgrim
d8cd8d56ea
[SLP] getSpillCost - fully populate IntrinsicCostAttributes to improve cost analysis. (#124129)
We were only constructing the IntrinsicCostAttributes with the arg type info, and not the args themselves, preventing more detailed cost analysis (constant / uniform args etc.)

Just pass the whole IntrinsicInst to the constructor and let it resolve everything it can.

Noticed while having yet another attempt at #63980
2025-01-23 16:57:13 +00:00