30816 Commits

Author SHA1 Message Date
Sam Tebbs
795e35a653
Reland "[LoopVectorizer] Add support for partial reductions" with non-phi operand fix. (#121744)
This relands the reverted #120721 with a fix for cases where neither
reduction operand is the reduction phi. Only
63114239cc8d26225a0ef9920baacfc7cc00fc58 and
63114239cc8d26225a0ef9920baacfc7cc00fc58 are new on top of the reverted
PR.

---------

Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>
2025-01-13 11:20:35 +00:00
Florian Hahn
8df64ed777 [LV] Don't consider IV increments uniform if exit value is used outside.
In some cases, there might be a chain of uniform instructions producing
the exit value. To generate correct code in all cases, consider the IV
increment not uniform, if there are users outside the loop.

Instead, let VPlan narrow the IV, if possible, using the logic from
3ff1d01985752.

Test case from #122602 verified with Alive2:
    https://alive2.llvm.org/ce/z/bA4EGj

Fixes https://github.com/llvm/llvm-project/issues/122496.
Fixes https://github.com/llvm/llvm-project/issues/122602.
2025-01-12 22:03:21 +00:00
Florian Hahn
f5a35a31bf [LV] Add test cases with incorrect IV live-outs.
Add test cases for https://github.com/llvm/llvm-project/issues/122496
and https://github.com/llvm/llvm-project/issues/122602.
2025-01-12 20:55:20 +00:00
Florian Hahn
3ff1d01985 Recommit "[VPlan] Try to narrow wide and replicating recipes to uniform recipes."
This reverts commit 0ebb3ac7c92c4c1c44e7f3d17832d75ec5a42a67.

Re-applies commit with typos fixed.
2025-01-12 20:10:28 +00:00
Florian Hahn
0ebb3ac7c9 Revert "[VPlan] Try to narrow wide and replicating recipes to uniform recipes."
This reverts commit 1afba19913253dda865a8e57b37b9f4dabead1ac.

Typo breaking the build
2025-01-12 19:37:45 +00:00
Florian Hahn
1afba19913 [VPlan] Try to narrow wide and replicating recipes to uniform recipes.
Use the existing VPlan-based analysis to identify recipes that only have
their first lane demanded and transform them to uniform replicate
recipes. This simplifies the generated code in some places and prepares
for fixing https://github.com/llvm/llvm-project/issues/122496.
2025-01-12 19:32:01 +00:00
Ruhung
4f7dc1b55a
[InstCombine] Fold (add (add A, 1), (sext (icmp ne A, 0))) to call umax(A, 1) (#122491)
Transform (add (add A, 1), (sext (icmp ne A, 0))) into call umax(A, 1).

Fixes #121853.

Alive2: https://alive2.llvm.org/ce/z/TweTan
2025-01-12 16:51:58 +01:00
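The fold in 4f7dc1b55a can be sanity-checked by brute force. The sketch below is not taken from the patch; it models the IR expression with modular arithmetic on a small bit width, and the helper names `lhs` and `umax_fold_holds` are made up for illustration.

```python
def lhs(a: int, bits: int) -> int:
    """Evaluate (add (add A, 1), (sext (icmp ne A, 0))) modulo 2**bits."""
    mask = (1 << bits) - 1
    # sext of an i1 'true' is all-ones (-1); 'false' stays 0.
    sext_cmp = mask if a != 0 else 0
    return (a + 1 + sext_cmp) & mask


def umax_fold_holds(bits: int = 8) -> bool:
    """Check that the expression equals umax(A, 1) for every unsigned A."""
    return all(lhs(a, bits) == max(a, 1) for a in range(1 << bits))
```

For A == 0 the compare contributes 0 and the result is 1; for any other A the +1 and -1 cancel, which is exactly umax(A, 1).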
goldsteinn
17ef436e3d
[ValueTracking] Take into account whether zero is poison when computing CR for ct{t,l}z (#122548) 2025-01-11 15:11:11 -06:00
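A small illustration of why the poison flag matters for the result range in 17ef436e3d: `cttz` and `ctlz` return the bit width for a zero input, so excluding zero shrinks the upper bound by one. This is an assumption-labeled sketch, not the ValueTracking code; `count_range` is a hypothetical helper.

```python
def cttz(x: int, bits: int) -> int:
    """Count trailing zeros; a zero input yields the full bit width."""
    return bits if x == 0 else (x & -x).bit_length() - 1


def ctlz(x: int, bits: int) -> int:
    """Count leading zeros; a zero input yields the full bit width."""
    return bits - x.bit_length()


def count_range(fn, bits: int, zero_is_poison: bool):
    """Return (min, max) of fn over all inputs that are not poison."""
    first = 1 if zero_is_poison else 0
    results = [fn(x, bits) for x in range(first, 1 << bits)]
    return min(results), max(results)
```

With 8-bit inputs the result range is 0 to 8 inclusive when zero is a defined input, and 0 to 7 when a zero input is poison.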
goldsteinn
cc995ad064
[InstSimplify] Simplifying (xor (sub C_Mask, X), C_Mask) -> X (#122552)
- **[InstSimplify] Add tests for simplifying `(xor (sub C_Mask, X),
C_Mask)`; NFC**
- **[InstSimplify] Simplifying `(xor (sub C_Mask, X), C_Mask)` -> `X`**

Helps address regressions with folding `clz(Pow2)`.

Proof: https://alive2.llvm.org/ce/z/zGwUBp
2025-01-11 15:10:42 -06:00
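A brute-force check of the identity in cc995ad064. The log does not spell out the preconditions, so this sketch assumes that `C_Mask` is a low-bit mask (2**k - 1) and that the subtraction does not wrap (X <= C_Mask); both are assumptions, and `xor_sub_fold_holds` is a hypothetical name.

```python
def xor_sub_fold_holds(bits: int = 8) -> bool:
    """Check (C_Mask - X) ^ C_Mask == X for all low-bit masks up to `bits` wide."""
    for k in range(bits + 1):
        c_mask = (1 << k) - 1            # assumed form: 0, 1, 3, 7, ...
        for x in range(c_mask + 1):      # assumed: X <= C_Mask, so no wrap
            if ((c_mask - x) ^ c_mask) != x:
                return False
    return True
```

Subtracting from an all-ones mask never borrows, so `C_Mask - X` equals `C_Mask ^ X`, and the second xor undoes the first. With a constant that is not a mask the identity can fail: `(5 - 2) ^ 5` is 6, not 2.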
Amr Hesham
642e493d4d
[InstCombine] Convert fshl(x, 0, y) to shl(x, and(y, BitWidth - 1)) when BitWidth is pow2 (#122362)
Convert `fshl(x, 0, y)` to `shl(x, and(y, BitWidth - 1))` when BitWidth
is pow2

Alive2 proof: https://alive2.llvm.org/ce/z/3oTEop
Fixes: #122235
2025-01-11 11:48:05 +01:00
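The rewrite in 642e493d4d can be checked exhaustively on a small type. The sketch models `fshl` as a shift of the concatenated operands with the amount taken modulo the bit width; `shl_form` and `fshl_fold_holds` are hypothetical helper names, not code from the patch.

```python
def fshl(a: int, b: int, c: int, bits: int) -> int:
    """Funnel shift left: take the high half of (a:b) << (c mod bits)."""
    mask = (1 << bits) - 1
    concat = ((a & mask) << bits) | (b & mask)
    return ((concat << (c % bits)) >> bits) & mask


def shl_form(x: int, y: int, bits: int) -> int:
    """The replacement: shl(x, and(y, bits - 1)), truncated to `bits`."""
    mask = (1 << bits) - 1
    return (x << (y & (bits - 1))) & mask


def fshl_fold_holds(bits: int = 8) -> bool:
    """Compare both forms for every x and y of the given width."""
    values = range(1 << bits)
    return all(fshl(x, 0, y, bits) == shl_form(x, y, bits)
               for x in values for y in values)
```

The power-of-two requirement is what makes `y & (bits - 1)` equal to `y mod bits`. With a width of 6, for example, `fshl(1, 0, 2, 6)` is 4 while `shl_form(1, 2, 6)` is 1.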
Veera
2d5f07c828
[InstCombine] Fold X udiv Y to X lshr cttz(Y) if Y is a power of 2 (#121386)
Fixes #115767

This PR folds `X udiv Y` to `X lshr cttz(Y)` if Y is a power of two
since bitwise operations are faster than division.

Proof: https://alive2.llvm.org/ce/z/qHmLta
2025-01-11 13:56:13 +08:00
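The equivalence behind 2d5f07c828 is easy to confirm exhaustively for a small width. This is an illustrative check only; `cttz` here is a plain Python helper for non-zero inputs, not the LLVM intrinsic.

```python
def cttz(y: int) -> int:
    """Count trailing zeros of a positive integer."""
    return (y & -y).bit_length() - 1


def udiv_fold_holds(bits: int = 8) -> bool:
    """Check X udiv Y == X lshr cttz(Y) for every power-of-two Y."""
    for k in range(bits):
        y = 1 << k
        for x in range(1 << bits):
            if x // y != x >> cttz(y):
                return False
    return True
```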
Mircea Trofin
6329355860
[ctxprof] Move test serialization to yaml (#122545)
We have a textual representation of contextual profiles, mainly for test scenarios. This patch moves that representation from JSON to YAML. YAML is more succinct and readable (some of the .ll tests should be illustrative). In addition, JSON is parse-able by the YAML reader.

A subsequent patch will address deserialization.

(thanks, @kazutakahirata, for showing me how to use the llvm YAML reader/writer APIs, which I incorrectly thought to be more low-level than the JSON ones!)
2025-01-10 18:04:25 -08:00
Florian Hahn
44058e5b5f [LV] Precommit tests for #106441.
Tests for https://github.com/llvm/llvm-project/pull/106441
from https://github.com/llvm/llvm-project/issues/82936.
2025-01-10 18:49:44 +00:00
Alexey Bataev
681c83a2f9 [SLP]Fix mask generation after cost estimation
When estimating the cost of entry shuffles for buildvectors, we need to
rebuild the original mask, not the generated submask used for subregister
analysis.

Fixes #122430
2025-01-10 09:32:35 -08:00
Alex MacLean
59ced72bc2
[ValueTracking] Add rotate idiom to haveNoCommonBitsSet special cases (#122165)
An occasional idiom for rotation is "(A << B) + (A >> (BitWidth - B))".
Currently this is not well handled on targets with native
funnel-shift/rotate support. Add a special case to haveNoCommonBitsSet
to ensure that the addition is converted to a disjoint or in InstCombine
so during instruction selection the idiom can be converted to an
efficient rotation implementation.

Proof: https://alive2.llvm.org/ce/z/WdCZsN
2025-01-10 09:17:44 -08:00
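The property that 59ced72bc2 relies on is that the two halves of the rotate idiom never share a set bit, so the add behaves like an or. The check below is a sketch under the assumption that B ranges over 1 to BitWidth - 1 (B == 0 would shift right by the full width); the helper names are hypothetical.

```python
def rotate_parts(a: int, b: int, bits: int):
    """Return the two operands of (A << B) + (A >> (BitWidth - B))."""
    mask = (1 << bits) - 1
    hi = (a << b) & mask       # low b bits are zero
    lo = a >> (bits - b)       # only the low b bits can be set
    return hi, lo


def rotate_idiom_is_disjoint(bits: int = 8) -> bool:
    """Check that the parts share no bits and that add equals or."""
    mask = (1 << bits) - 1
    for a in range(1 << bits):
        for b in range(1, bits):
            hi, lo = rotate_parts(a, b, bits)
            if (hi & lo) != 0:
                return False
            if ((hi + lo) & mask) != (hi | lo):
                return False
    return True
```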
Alexey Bataev
3c9c94a24f Revert "[SLP]Fix mask generation after cost estimation"
This reverts commit 547ba9730bf05df3383150f730a689f2c8336206 to fix
the buildbot failures reported in
https://lab.llvm.org/buildbot/#/builders/123/builds/11370, https://lab.llvm.org/buildbot/#/builders/133/builds/9492
2025-01-10 08:46:42 -08:00
Alexey Bataev
547ba9730b [SLP]Fix mask generation after cost estimation
When estimating the cost of entry shuffles for buildvectors, we need to
rebuild the original mask, not the generated submask used for subregister
analysis.

Fixes #122430
2025-01-10 08:17:56 -08:00
Alexey Bataev
920c58916a [SLP][NFC]Add a test with the mask translate after buildvector shuffle cost estimation 2025-01-10 08:12:03 -08:00
Nikita Popov
eeac0ffaf4 Revert "[MachineLICM] Use RegisterClassInfo::getRegPressureSetLimit (#119826)"
This reverts commit b4e17d4a314ed87ff6b40b4b05397d4b25b6636a.

This causes a large compile-time regression.
2025-01-10 09:05:06 +01:00
Teresa Johnson
3055e86c71
[MemProf] Disable cloning of callsites in recursive cycles by default (#122354)
This disables the support added in PR121985 by default while we
investigate a compile time crash.
2025-01-09 12:01:43 -08:00
vporpo
6312beef78
[SandboxVec][BottomUpVec] Use SeedCollector and slice seeds (#120826)
With this patch we switch from the temporary dummy seeds to actual seeds
provided by the seed collector.
The seeds get sliced and each slice is used as the starting point for
vectorization.
2025-01-09 11:53:48 -08:00
Alexey Bataev
5ff36748cf [SLP]Fix mask processing for reused gathered scalars
Need to sync the mask between cost and actual emission to avoid bugs in
mask calculation

Fixes #122324
2025-01-09 11:24:48 -08:00
Florian Hahn
b6cda338ab
[Loads] Also consider getPointerAlignment when checking assumptions. (#120916)
Also use getPointerAlignment when trying to use alignment and
dereferenceable assumptions. This catches cases where dereferenceable is
known via the assumption but alignment is known via getPointerAlignment
(e.g. via argument attribute or align of 1).

PR: https://github.com/llvm/llvm-project/pull/120916
2025-01-09 18:19:39 +00:00
Mikhail Gudim
c87ef146e1
[InstCombine][NFC] Precommit a test for folding a binary op of reductions. (#121568) 2025-01-09 12:15:20 -05:00
Pengcheng Wang
b4e17d4a31
[MachineLICM] Use RegisterClassInfo::getRegPressureSetLimit (#119826)
`RegisterClassInfo::getRegPressureSetLimit` is a wrapper of
`TargetRegisterInfo::getRegPressureSetLimit` with some logic to
adjust the limit by removing reserved registers.

It seems that we shouldn't use
`TargetRegisterInfo::getRegPressureSetLimit`
directly, just as the comment "This limit must be adjusted
dynamically for reserved registers" says.

Separate from https://github.com/llvm/llvm-project/pull/118787
2025-01-09 21:05:52 +08:00
Florian Hahn
b0697dc1de
[LV] Only check isVectorizableEarlyExitLoop with multiple exits. (#121994)
Currently we emit early-exit related debug messages/remarks even when
there is a single exit. Update to only check isVectorizableEarlyExitLoop
if there isn't a single exit block.

PR: https://github.com/llvm/llvm-project/pull/121994
2025-01-09 12:05:19 +00:00
Nikita Popov
dcdf44aca7
[InstCombine] Remove foldSelectICmpEq() fold (#122098)
This fold matches complex patterns for which we have no proof of
real-world relevance, and it does not actually handle the originally
motivating cases from https://github.com/llvm/llvm-project/issues/71792
either.

In https://github.com/llvm/llvm-project/pull/121708 and
https://github.com/llvm/llvm-project/pull/121753 we have handled some
simpler variants by extending existing folds.

I propose to remove this code until we have evidence that it is useful
for something.
2025-01-09 12:33:01 +01:00
Benjamin Maxwell
f88ef1bd1b
[LV] Teach LoopVectorizationLegality about struct vector calls (#119221)
This is a split-off from #109833 and only adds code relating to checking
if a struct-returning call can be vectorized.

This initial patch only allows the case where all users of the struct
return are `extractvalue` operations that can be widened.

```
%call = tail call { float, float } @foo(float %in_val)
%extract_a = extractvalue { float, float } %call, 0
%extract_b = extractvalue { float, float } %call, 1
```

Note: The tests require the VFABI changes from #119000 to pass.
2025-01-09 09:27:29 +00:00
Yingwei Zheng
b8337dc4b2
[InstCombine] Handle commuted patterns in foldBinOpShiftWithShift (#122126)
Closes https://github.com/llvm/llvm-project/issues/121775.
2025-01-09 14:36:17 +08:00
David Green
676c641718
[VectorCombine] Use getInstructionCost to cost Shuffle. (#122068)
This allows it to produce a more accurate cost for the shuffle, using
the more accurate calls to getShuffleCost in getInstructionCost. It
helps fix some of the regressions from vector combine a little while
ago, now that we have better subvector extract costs.
2025-01-08 20:48:40 +00:00
Andreas Jonson
d4182f1b56
[InstCombine] move foldAndOrOfICmpsOfAndWithPow2 into foldLogOpOfMaskedICmps (#121970) 2025-01-08 18:04:38 +01:00
Simon Pilgrim
322ff42315 [PhaseOrdering][AArch64] block_scaling_decompr_8bit.ll - use -passes="default<O3>" to allow DOS to correctly evaluate the RUN command
Necessary for running update_test_checks.py on windows
2025-01-08 15:07:21 +00:00
Alexey Bataev
1160994602 [SLP]Fix a crash for very long GEP chains
Need to check if the GEP bases are equal and return false early. Also,
need to return false if the lookup is too deep, considering bases equal
too. Fixes a crash in the assertion.
2025-01-08 06:47:41 -08:00
David Green
a8dab1aa03
[AArch64] Add a subvector extract cost. (#121472)
These can generally be emitted using an ext instruction or mov from the
high half. The high-half extracts can be free depending on the users,
but that is not handled here, just the basic costs. It originally
included all subvector extracts, but that was toned down to just
half-vector extracts to try and help the mid end not break up high/low
extracts without having the SLP vectorizer create a mess using other
shuffles.
2025-01-08 08:13:07 +00:00
Luke Lau
f0d5104c94
[VPlan] Handle some VPInstructions in may{Read,Write}FromMemory (#120058)
This just copies the same conservative definition from mayWriteToMemory,
and enables more VPInstructions to be hoisted out in LICM.

I think this should give more accurate costs, and I was able to build
llvm-test-suite without the legacy-vplan cost model assertion going off.
2025-01-08 15:17:26 +08:00
Alex MacLean
4583f6d344
[NVPTX] Switch front-ends and tests to ptx_kernel cc (#120806)
The `ptx_kernel` calling convention is a more idiomatic and standard way
of specifying an NVPTX kernel than using the metadata, which is not
supposed to change the meaning of the program. Further, checking the
calling convention is significantly faster than traversing the metadata,
improving compile time.

This change updates the clang and mlir frontends as well as the
NVPTXCtorDtorLowering pass to emit kernels using the calling convention.
In addition, this updates all NVPTX unit tests to use the calling
convention as well.
2025-01-07 18:24:50 -08:00
Vyacheslav Klochkov
9184c42869
[LoadStoreVectorizer] Postprocess and merge equivalence classes (#121861)
This patch introduces a new method:

void Vectorizer::mergeEquivalenceClasses(EquivalenceClassMap &EQClasses)
const;

The method is called at the end of
Vectorizer::collectEquivalenceClasses() and is needed to merge
equivalence classes that differ only by their underlying objects (UO1
and UO2), where UO1 is the 1-level-indirection underlying base of UO2.
This situation arises due to the limited lookup depth used during the
search for underlying bases with llvm::getUnderlyingObject(ptr).

Using any fixed lookup depth can result in the creation of multiple
equivalence classes that differ only by 1-level-indirection bases.

The new approach merges equivalence classes if they have adjacent bases
(1-level indirection). If a series of equivalence classes forms a ladder
of 1-level indirections, they are all merged into a single
equivalence class. This provides more opportunities for the load-store
vectorizer to generate better vectors.

---------

Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>
2025-01-07 17:17:26 -08:00
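The merging idea described in 9184c42869 can be pictured with a toy model. This is only a sketch and not the LoadStoreVectorizer implementation: classes are keyed by a plain string standing in for the underlying object, and `one_level_base` is a hypothetical map from an object to the object one indirection level below it.

```python
def merge_equivalence_classes(classes: dict, one_level_base: dict) -> dict:
    """Merge classes whose underlying objects are linked by 1-level steps.

    classes:        underlying object -> list of accesses (toy stand-in)
    one_level_base: underlying object -> its 1-level-indirection base
    """
    def root(obj):
        # Walk down the ladder while the adjacent base has its own class.
        seen = set()
        while (obj in one_level_base
               and one_level_base[obj] in classes
               and obj not in seen):
            seen.add(obj)
            obj = one_level_base[obj]
        return obj

    merged = {}
    for obj, accesses in classes.items():
        merged.setdefault(root(obj), []).extend(accesses)
    return merged


# A three-rung ladder (p.gep2 -> p.gep1 -> p) plus an unrelated object q.
example_classes = {
    "p": ["ld0"], "p.gep1": ["ld1"], "p.gep2": ["ld2"], "q": ["ld3"],
}
example_bases = {"p.gep1": "p", "p.gep2": "p.gep1"}
example_merged = merge_equivalence_classes(example_classes, example_bases)
```

In the example, the three classes on the ladder collapse into the class for `p`, while `q` stays separate.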
Teresa Johnson
b8ad6fb066
[MemProf] Allow cloning of callsites in recursive cycles (#121985)
Optionally (by default) no longer mark callsite nodes as Recursive,
which means they would be automatically skipped during cloning. This was
too conservative as it prevents cloning of any callsite that showed up
in any recursive cycle, even for non-recursive contexts.

While this will enable partial cloning of recursive contexts, the
recursive calls themselves will not be updated to call the correct
clone, possibly leading to some unnecessary but benign cloning and
affecting bytes hinted reporting. To prevent this, optional support
looks for recursive cycles in contexts during cloning and removes
those contexts from cloning. This requires some additional runtime
overhead, so is disabled by default for now.

Support for correct cloning of recursive cycles is WIP.
2025-01-07 17:00:46 -08:00
Florian Hahn
0eaa69eb23
[VPlan] Handle VPExpandSCEVRecipe in isUniformAfterVectorization.
VPExpandSCEVRecipes must be placed in the entry and are always uniform.
This fixes a crash by always identifying them as uniform, even if the
main vector loop region has been removed.

Fixes https://github.com/llvm/llvm-project/issues/121897.
2025-01-07 21:35:09 +00:00
Florian Hahn
ea14bdb035
[LV] Add test showing debug output for loops with uncountable BTCs.
Currently we print an early-exit related debug message, even
though there's no early exit.
2025-01-07 20:27:30 +00:00
goldsteinn
6192fafe9c
[InstSimplify] Use multi-op replacement when simplify select (#121708)
- **[InstSimplify] Refactor `simplifyWithOpsReplaced` to allow multiple
replacements; NFC**
- **[InstSimplify] Use multi-op replacement when simplify `select`**

In the case of `select X | Y == 0 :...` or `select X & Y == -1 : ...`
we can do more simplifications by trying to replace both `X` and `Y`
with the respective constant at once.

Handles some cases for https://github.com/llvm/llvm-project/pull/121672
more generically.
2025-01-07 11:42:01 -06:00
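The reason both operands can be replaced at once in 6192fafe9c is that each condition pins down X and Y individually. The exhaustive check below illustrates that fact on a small width; it is not the InstSimplify code, and `both_operands_pinned` is a hypothetical name.

```python
from itertools import product


def both_operands_pinned(bits: int = 4) -> bool:
    """Check that (X | Y) == 0 forces X == Y == 0, and that
    (X & Y) == -1 forces both X and Y to be all-ones."""
    all_ones = (1 << bits) - 1   # the unsigned encoding of -1
    for x, y in product(range(1 << bits), repeat=2):
        if (x | y) == 0 and not (x == 0 and y == 0):
            return False
        if (x & y) == all_ones and not (x == all_ones and y == all_ones):
            return False
    return True
```

Inside the true arm of such a select, any expression over X and Y can therefore be evaluated with both replaced by the constant, which exposes simplifications that a single replacement would miss.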
Andreas Jonson
15d3e4afd6 [InstCombine] Test for two types of bittests (NFC) 2025-01-07 18:34:31 +01:00
Florian Mayer
ef391dbc29
[LV] Drop incorrect inbounds for reverse vector pointer when folding tail (#120730)
When folding the tail, we may compute an address that we don't compute
in the original scalar loop, and it may not be inbounds. Drop inbounds
in that case.
2025-01-07 06:14:01 -08:00
Yingwei Zheng
4e066b6be4
[PatternMatch] Match commuted patterns in Signum_match (#121911)
Closes https://github.com/llvm/llvm-project/issues/121776.
2025-01-07 21:31:48 +08:00
Lewis Crawford
a629d9e102
[NVPTX] Constant-folding for f2i, d2ui, f2ll etc. (#118965)
Add constant-folding support for the NVVM intrinsics for converting
float/double to signed/unsigned int32/int64 types, including all
rounding-modes and ftz modifiers.
2025-01-07 13:17:36 +00:00
Yingwei Zheng
882df05435
[InstCombine] Fold (A | B) ^ (A & C) --> A ? ~C : B (#121906)
Closes https://github.com/llvm/llvm-project/issues/121773.
2025-01-07 20:50:35 +08:00
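The identity in 882df05435 holds independently for each bit, so it can be checked both in select form on single bits and in bitwise form on wider values. This is an illustrative sketch; reading `A ? ~C : B` per bit for wider integers is an assumption made here for the check, and the helper names are hypothetical.

```python
from itertools import product


def select_form_holds() -> bool:
    """On single bits: (A | B) ^ (A & C) == (not C if A else B)."""
    for a, b, c in product((0, 1), repeat=3):
        expected = (1 - c) if a else b
        if ((a | b) ^ (a & c)) != expected:
            return False
    return True


def bitwise_form_holds(bits: int = 4) -> bool:
    """On wider values: each result bit is ~C where A is set, else B."""
    mask = (1 << bits) - 1
    for a, b, c in product(range(1 << bits), repeat=3):
        lhs = (a | b) ^ (a & c)
        rhs = (a & ~c & mask) | (~a & b & mask)
        if lhs != rhs:
            return False
    return True
```

Where A is 1 the expression reduces to `1 ^ C`, which is `~C`; where A is 0 it reduces to `B ^ 0`, which is `B`.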
Simon Pilgrim
5a7dfb4659 [CostModel][X86] Attempt to match v4f32 shuffles that map to MOVSS/INSERTPS instruction
improveShuffleKindFromMask matches this as a SK_InsertSubvector of a v1f32 (which legalises to f32) into a v4f32 base vector, making it easy to recognise. MOVSS is limited to index0.
2025-01-07 11:31:44 +00:00
Nikita Popov
63d4e0fb66 [InstCombine] Compute result directly on APInts
If the bitwidth is 2 and we add two 1s, the result may overflow.
This is fine in terms of correctness, but triggers the APInt ctor
assertion. Fix this by performing the calculation directly on APInts.

Fixes the issue reported in:
https://github.com/llvm/llvm-project/pull/114539#issuecomment-2574845003
2025-01-07 12:13:19 +01:00
Simon Pilgrim
4a42658c1b [VectorCombine][X86] shuffle-of-cmps.ll - tweak shuf_fcmp_oeq_v4i32 shuffle to be not so cheap
An upcoming patch will recognise this as a cheap INSERTPS shuffle - alter the shuffle to ensure the 2 x FCMP is still cheaper on SSE4 targets
2025-01-07 11:07:48 +00:00
Yingwei Zheng
231d113c7e
[InstCombine] Handle commuted patterns in foldSelectWithSRem (#121896)
Closes https://github.com/llvm/llvm-project/issues/121771.
2025-01-07 17:09:58 +08:00