38497 Commits

Author SHA1 Message Date
Yingwei Zheng
eafbab6fac
[EntryExitInstrumenter][AArch64][RISCV][LoongArch] Pass __builtin_return_address(0) into _mcount (#121107)
On RISC-V, AArch64, and LoongArch, the `_mcount` function takes
`__builtin_return_address(0)` as an argument since
`__builtin_return_address(1)` is not available on these platforms. This
patch fixes the argument passing to match the behavior of glibc/gcc.

Closes https://github.com/llvm/llvm-project/issues/121103.
2025-01-01 15:02:08 +08:00
Florian Hahn
b06a45c66f
[VPlan] Add all blocks to outer loop if present during ::execute (NFCI).
This ensures that all blocks created during VPlan execution are properly
added to an enclosing loop, if present.

Split off from https://github.com/llvm/llvm-project/pull/108378 and also
needed once more of the skeleton blocks are created directly via VPlan.

This also allows removing the custom logic for early-exit loop
vectorization added as part of
https://github.com/llvm/llvm-project/pull/117008.
2024-12-31 19:34:34 +00:00
Simon Pilgrim
b195bb87e1 [VectorCombine] scalarizeLoadExtract - consistently use LoadInst and ExtractElementInst specific operand getters. NFC
Noticed while investigating the hung builds reported after af83093933ca73bc82c33130f8bda9f1ae54aae2
2024-12-31 14:42:39 +00:00
Florian Hahn
ddef380cd6
[VPlan] Move simplifyRecipe(s) definitions up to allow re-use (NFC)
Move definitions to allow easy reuse in
https://github.com/llvm/llvm-project/pull/108378.
2024-12-31 13:23:19 +00:00
Muhammad Omair Javaid
332d2647ff Revert "[LV]: Teach LV to recursively (de)interleave. (#89018)"
This reverts commit ccfe0de0e1e37ed369c9bf89dd0188ba0afb2e9a.

This breaks LLVM build on AArch64 SVE Linux buildbots
https://lab.llvm.org/buildbot/#/builders/143/builds/4462
https://lab.llvm.org/buildbot/#/builders/17/builds/4902
https://lab.llvm.org/buildbot/#/builders/4/builds/4399
https://lab.llvm.org/buildbot/#/builders/41/builds/4299
2024-12-31 03:12:24 +05:00
Simon Pilgrim
d5a96eb125 Revert af83093933ca73bc82c33130f8bda9f1ae54aae2 "[VectorCombine] eraseInstruction - ensure we reattempt to fold other users of an erased instruction's operands"
Reports of hung builds, but I don't have time to investigate at the moment.
2024-12-30 21:20:56 +00:00
Simon Pilgrim
af83093933 [VectorCombine] eraseInstruction - ensure we reattempt to fold other users of an erased instruction's operands
As we're reducing the use count of the operands, it's more likely that they will now fold, as their folds were previously prevented by an m_OneUse check or by the cost of retaining the extra instruction being too high.

This is necessary for some upcoming patches, although the only change so far is instruction ordering as it allows some SSE folds of 256/512-bit with 128-bit subvectors to occur earlier in foldShuffleToIdentity as the subvector concats are free.

Pulled out of #120984
2024-12-30 17:52:42 +00:00
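Illustrative sketch (not VectorCombine's code) of the requeue-on-erase idea in af83093933, modeled in Python on a toy IR where `operands` and `users` are plain dictionaries keyed by hypothetical instruction names. The real pass works on `llvm::Instruction` and its own worklist. Note that this change was reverted in d5a96eb125.

```python
from typing import Deque, Dict, List, Set


def erase_instruction(inst: str,
                      operands: Dict[str, List[str]],
                      users: Dict[str, Set[str]],
                      worklist: Deque[str]) -> None:
    """Erase `inst` (assumed to have no users) and requeue affected instructions."""
    for op in operands.pop(inst, []):
        op_users = users.get(op, set())
        op_users.discard(inst)        # the operand just lost a use...
        worklist.append(op)           # ...so it may now fold itself,
        worklist.extend(op_users)     # and so may its remaining users (e.g. one-use checks)
    users.pop(inst, None)
```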
Florian Hahn
16d19aaedf
[VPlan] Manage created blocks directly in VPlan. (NFC) (#120918)
This patch changes the way blocks are managed by VPlan. Previously all
blocks reachable from entry would be cleaned up when a VPlan is
destroyed. With this patch, each VPlan keeps track of blocks created for
it in a list and this list is then used to delete all blocks in the list
when the VPlan is destroyed. To do so, block creation is funneled
through helpers directly in VPlan.

The main advantage of doing so is it simplifies CFG transformations, as
those do not have to take care of deleting any blocks, just adjusting
the CFG. This helps to simplify
https://github.com/llvm/llvm-project/pull/108378 and
https://github.com/llvm/llvm-project/pull/106748.

This also simplifies handling of 'immutable' blocks a VPlan holds
references to, which at the moment only include the scalar header block.

PR: https://github.com/llvm/llvm-project/pull/120918
2024-12-30 12:08:12 +00:00
Florian Hahn
7f3428d3ed
[VPlan] Compute induction end values in VPlan. (#112145)
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as a follow-up.

Depends on https://github.com/llvm/llvm-project/pull/110004 and
https://github.com/llvm/llvm-project/pull/109975.

PR: https://github.com/llvm/llvm-project/pull/112145
2024-12-29 19:05:08 +00:00
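A minimal sketch of the arithmetic behind a derived IV, assuming a simple integer induction: its value after `index` iterations is `start + index * step`, so the end (resume) value is that expression evaluated at the vector trip count. The Python helper names are hypothetical; in VPlan the commit computes the value with `createDerivedIV`.

```python
def derived_iv(start: int, step: int, index: int) -> int:
    """Value of an integer IV with the given start/step after `index` iterations."""
    return start + index * step


def induction_end_value(start: int, step: int, vector_trip_count: int) -> int:
    """End value the scalar loop would resume from after the vector loop."""
    return derived_iv(start, step, vector_trip_count)


def simulate(start: int, step: int, iterations: int) -> int:
    """Reference check: step the IV one iteration at a time."""
    iv = start
    for _ in range(iterations):
        iv += step
    return iv
```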
Simon Pilgrim
f2f02b21cd [VectorCombine] foldShuffleOfBinops - only accept exact matching cmp predicates
m_SpecificCmp allowed equivalent predicate+flags which don't necessarily work after being folded from "shuffle (cmpop), (cmpop)" into "cmpop (shuffle), (shuffle)"

Fixes #121110
2024-12-28 09:21:31 +00:00
Fangrui Song
edc42b2dc1 [SLP] Migrate away from PointerUnion::get
2024-12-27 21:01:09 -08:00
Zequan Wu
4d8f9594b2 Revert "Reland "[LoopVectorizer] Add support for partial reductions" (#120721)"
This reverts commit c858bf620c3ab2a4db53e84b9365b553c3ad1aa6 as it causes an optimization crash at -O2; see https://github.com/llvm/llvm-project/pull/120721#issuecomment-2563192057
2024-12-27 11:51:54 -08:00
Florian Hahn
8caeb2e0c2
[VPlan] Always create initial blocks in constructor (NFC).
Update C++ unit tests to use VPlanTestBase to construct the initial VPlan,
using a constructor that creates the VP blocks directly.

Split off from and in preparation for
https://github.com/llvm/llvm-project/pull/120918.
2024-12-27 17:43:22 +00:00
Alexey Bataev
07ba457525 [SLP][NFC]Add dump of combined entries, where applicable
2024-12-27 07:56:10 -08:00
Hassnaa Hamdi
ccfe0de0e1
[LV]: Teach LV to recursively (de)interleave. (#89018)
Currently the only available intrinsics are ld2/st2, which don't support interleaving factors > 2.
This patch teaches the LV to use ld2/st2 recursively to support higher
interleaving factors.
2024-12-27 12:42:07 +00:00
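A minimal sketch of the recursive idea on plain Python lists, assuming a power-of-two interleave factor. `deinterleave2` stands in for a factor-2 deinterleave such as ld2 performs, and both function names are hypothetical. Member k of a factor-F group lands in the even half when k is even and in the odd half when k is odd, so recursing on each half with factor F/2 and zipping the results restores member order. (This commit was later reverted in 332d2647ff.)

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def deinterleave2(vec: Sequence[T]) -> Tuple[List[T], List[T]]:
    """Factor-2 deinterleave (stand-in for ld2): even lanes and odd lanes."""
    return list(vec[0::2]), list(vec[1::2])


def deinterleave(vec: Sequence[T], factor: int) -> List[List[T]]:
    """Split `vec` into `factor` members using only factor-2 deinterleaves.

    Assumes len(vec) is a multiple of `factor`.
    """
    if factor < 1 or factor & (factor - 1):
        raise ValueError("factor must be a power of two")
    if factor == 1:
        return [list(vec)]
    evens, odds = deinterleave2(vec)
    even_members = deinterleave(evens, factor // 2)  # members 0, 2, 4, ...
    odd_members = deinterleave(odds, factor // 2)    # members 1, 3, 5, ...
    result: List[List[T]] = []
    for even, odd in zip(even_members, odd_members):
        result.extend([even, odd])
    return result
```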
Elvis Wang
47e1c87a61
[VPlan] Set debug location for VPReduction/VPWidenIntrinsicRecipe. (#120054)
This patch adds the missing debug locations for
VPReduction/VPWidenIntrinsicRecipe.
2024-12-27 10:37:21 +08:00
Florian Hahn
2dfe1b4042
[VPlan] Remove stray space when printing reverse vector pointer.
printFlags() takes care of printing the required space; remove the extra
space printed between flags and operands.
2024-12-26 21:26:17 +00:00
Alexey Bataev
889215a30e [SLP]Followup fix for the poisonous logical op in reductions
If the VectorizedTree may still generate a poisonous value but is not
the original operand of the reduction op, we need to check whether Res is
still the operand in order to generate correct code.

Fixes #114905
2024-12-26 05:11:26 -08:00
DaPorkchop_
cea738bc9a
[SimplifyCFG] Replace unreachable switch lookup table holes with poison (#94990)
As discussed in #94468, this causes switch lookup table entries which
are unreachable to be poison instead of filling them with a value from
one of the reachable cases.

---------

Co-authored-by: DianQK <dianqk@dianqk.net>
2024-12-26 07:47:26 +08:00
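Illustrative sketch, assuming a switch whose case values are small integers mapped to constants: the dense table spans the minimum to maximum case value, and a key with no case is a hole. If the default destination is reachable, a hole must hold the default value; if it is unreachable, no execution can read that slot, so (per this commit) it can be poison instead of a copy of some reachable case's value. `build_lookup_table` and the string sentinel are hypothetical stand-ins for SimplifyCFG's C++ lookup-table builder.

```python
from typing import Dict, List, Union

POISON = "poison"  # sentinel for a poison table entry


def build_lookup_table(cases: Dict[int, int], default_value: int,
                       default_reachable: bool) -> List[Union[int, str]]:
    """Dense table indexed by (key - min case value)."""
    lo, hi = min(cases), max(cases)
    table: List[Union[int, str]] = []
    for key in range(lo, hi + 1):
        if key in cases:
            table.append(cases[key])
        elif default_reachable:
            table.append(default_value)  # the hole is a real path to the default
        else:
            table.append(POISON)         # the hole can never be looked up
    return table
```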
Usman Nadeem
5fb57131b7
[DFAJumpThreading] Don't bail early after encountering unpredictable values (#119774)
After #96127 landed, mshockwave reported that the pass was no longer
threading SPEC2006/perlbench.

After 96127 we started bailing out in `getStateDefMap` and rejecting the
transformation because one of the unpredictable values was coming from
inside the loop. There was no fundamental change in that function except
that we started calling `Loop->contains(IncomingBB)` instead of
`LoopBBs.count(IncomingBB)`. After some analysis I came to the
conclusion that even before 96127 we would reject the transformation if
we provided large enough limits on the path traversal (large enough so
that LoopBBs contained blocks corresponding to that unpredictable
value).

In this patch I changed `getStateDefMap` to not terminate early on
finding an unpredictable value. This is because
`getPathsFromStateDefMap` later checks that the
final list of paths only has predictable values. As a result, we can now
partially thread functions like `negative6` in the tests that have some
predictable paths.

This patch does not really have any compile-time impact on the test
suite without `-dfa-early-exit-heuristic=false` (early exit is enabled
by default).

Change-Id: Ie1633b370ed4a0eda8dea52650b40f6f66ef49a3
2024-12-25 01:29:01 -08:00
LiqinWeng
b5f0ec80d5
[VPlan] Remove redundant printing final in VPlan::execute (#121048)
Printing the final VPlan multiple times causes problems when testing ir-bb blocks.
2024-12-25 10:11:02 +08:00
Alexey Bataev
07d284d4eb
[SLP]Add cost estimation for gather node reshuffling
Adds cost estimation for the variants of the permutations of the scalar
values, used in gather nodes. Currently, SLP just unconditionally emits
shuffles for the reused buildvectors, but in some cases it is better to leave
them as buildvectors rather than shuffles, if the cost of such
buildvectors is lower.

X86, AVX512, -O3+LTO
Metric: size..text

Program                                                                        size..text
                                                                               results     results0    diff
                 test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   912998.00   913238.00  0.0%
 test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test   203070.00   203102.00  0.0%
     test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1396320.00  1396448.00  0.0%
      test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1396320.00  1396448.00  0.0%
                       test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   309790.00   309678.00 -0.0%
      test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12477607.00 12470807.00 -0.1%

CINT2006/445.gobmk - extra code vectorized
MiBench/consumer-lame - small variations
CFP2017speed/638.imagick_s
CFP2017rate/538.imagick_r - extra vectorized code
Benchmarks/Bullet - extra code vectorized
CFP2017rate/526.blender_r - extra vector code

RISC-V, sifive-p670, -O3+LTO
CFP2006/433.milc - regressions, should be fixed by https://github.com/llvm/llvm-project/pull/115173
CFP2006/453.povray - extra vectorized code
CFP2017rate/508.namd_r - better vector code
CFP2017rate/510.parest_r - extra vectorized code
SPEC/CFP2017rate - extra/better vector code
CFP2017rate/526.blender_r - extra vectorized code
CFP2017rate/538.imagick_r - extra vectorized code
CINT2006/403.gcc - extra vectorized code
CINT2006/445.gobmk - extra vectorized code
CINT2006/464.h264ref - extra vectorized code
CINT2006/483.xalancbmk - small variations
CINT2017rate/525.x264_r - better vectorization

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/115201
2024-12-24 15:35:29 -05:00
Florian Hahn
2d038caeeb
[VPlan] Remove stray space when printing VPWidenCastRecipe.
printFlags() already takes care of printing a single space if there are
no flags. Remove the extra space when printing a recipe without flags.
2024-12-24 20:23:48 +00:00
Alexey Bataev
852feea820 [SLP]Propagate AssumptionCache where possible
2024-12-24 09:20:26 -08:00
Alexey Bataev
0d6cb0ae9d [SLP]Fix strict weak ordering criterion in comparators
Fixes #121019
2024-12-24 08:13:57 -08:00
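The commit does not say which property the SLP comparators violated. As a generic aid, here is a brute-force Python check of the strict weak ordering requirements that `std::sort` and similar algorithms place on a comparator: irreflexivity, asymmetry, transitivity, and transitivity of incomparability. The helper name is hypothetical, and the cubic loop makes it practical only for small sample inputs.

```python
from itertools import product
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def is_strict_weak_ordering(less: Callable[[T, T], bool], items: Sequence[T]) -> bool:
    """Check the strict weak ordering axioms for `less` over all sample triples."""
    def equiv(a: T, b: T) -> bool:
        return not less(a, b) and not less(b, a)

    for a in items:
        if less(a, a):                                       # irreflexivity
            return False
    for a, b in product(items, repeat=2):
        if less(a, b) and less(b, a):                        # asymmetry
            return False
    for a, b, c in product(items, repeat=3):
        if less(a, b) and less(b, c) and not less(a, c):     # transitivity
            return False
        if equiv(a, b) and equiv(b, c) and not equiv(a, c):  # incomparability is transitive
            return False
    return True
```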
Alexey Bataev
f0f8dab712 [SLP]Check if the first reduced value requires freeze/swap, if it may be too poisonous
If several reduced values are combined and the first reduced value is
just the original reduced value of the bool logical op, it must be frozen
to prevent the propagation of the poison value.

Fixes #114905
2024-12-24 07:40:35 -08:00
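A minimal Python model of the i1 poison rules involved, assuming a string sentinel for poison. `select a, b, false` (a logical and) yields false whenever `a` is false, even if `b` is poison, while a bitwise `and`, as produced by a vectorized reduction, is poison if either operand is. `freeze` turns poison into some fixed value, which is why freezing a possibly-poison value restores correctness. The functions mirror IR instructions but are hypothetical helpers, not LLVM APIs.

```python
from typing import Union

POISON = "poison"
Bool = Union[bool, str]  # an i1 value: True, False, or POISON


def select(cond: Bool, t: Bool, f: Bool) -> Bool:
    """i1 select: poison only if the condition is poison."""
    if cond == POISON:
        return POISON
    return t if cond else f


def bit_and(a: Bool, b: Bool) -> Bool:
    """i1 `and`: poison if either operand is poison."""
    if a == POISON or b == POISON:
        return POISON
    return bool(a and b)


def freeze(v: Bool, arbitrary: bool = False) -> bool:
    """freeze: poison becomes some fixed (arbitrary) value; other values pass through."""
    return arbitrary if v == POISON else bool(v)
```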
Sam Tebbs
c858bf620c
Reland "[LoopVectorizer] Add support for partial reductions" (#120721)
This re-lands the reverted #92418 

When the VF is small enough so that dividing the VF by the scaling
factor results in 1, the reduction phi execution thinks the VF is scalar
and sets the reduction's output as a scalar value, tripping assertions
expecting a vector value. The latest commit in this PR fixes that by
using `State.VF` in the scalar check, rather than the divided VF.

---------

Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>
2024-12-24 12:08:17 +00:00
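A sketch of partial-reduction semantics, assuming lane `i` of the input is added into accumulator lane `i % len(acc)`. LLVM's `llvm.experimental.vector.partial.reduce.add` does not fix a particular lane assignment, so this is one valid choice. The accumulator has VF / scaling-factor lanes, and when that quotient is 1 it is still a one-lane vector rather than a scalar, which is the case the relanded patch handles by checking `State.VF`. Helper names are hypothetical.

```python
from typing import List, Sequence


def accumulator_width(vf: int, scale_factor: int) -> int:
    """Number of accumulator lanes for a partial reduction."""
    return vf // scale_factor


def partial_reduce_add(acc: Sequence[int], values: Sequence[int]) -> List[int]:
    """Fold len(values) lanes into len(acc) lanes; len(values) must be a multiple."""
    width = len(acc)
    if width == 0 or len(values) % width != 0:
        raise ValueError("len(values) must be a non-zero multiple of len(acc)")
    out = list(acc)  # stays a list (vector) even when width == 1
    for i, v in enumerate(values):
        out[i % width] += v  # one possible lane assignment
    return out
```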
Alexey Bataev
030829a7e5 [SLP]Drop samesign flag if the vector node has reduced bitwidth
If the operands of the icmp instructions have a reduced bitwidth after
MinBitwidth analysis, the samesign flag must be dropped to preserve the
correctness of the transformation.

Fixes #120823
2024-12-23 16:55:11 -08:00
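A small worked example of why the flag must go, assuming an i32 compare narrowed to i8 because both operands fit in 8 unsigned bits: 200 and 100 are both non-negative as i32, so `samesign` holds, but as i8 the value 200 (0xC8) has its sign bit set and 100 does not, so keeping the flag on the narrowed `icmp` would make it poison. The unsigned comparison itself is unchanged by the narrowing. Helper names are hypothetical.

```python
def sign_bit(value: int, bits: int) -> int:
    """Top bit of `value` viewed as a `bits`-wide two's-complement integer."""
    return (value >> (bits - 1)) & 1


def samesign_holds(a: int, b: int, bits: int) -> bool:
    """True if `a` and `b` have the same sign bit at the given width."""
    return sign_bit(a, bits) == sign_bit(b, bits)
```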
Benjamin Maxwell
9ab5474e56
[LV] Rename ToVectorTy to toVectorTy (NFC) (#120404)
This is for consistency with other helpers (and also follows the LLVM
naming conventions).
2024-12-23 23:33:44 +00:00
Florian Hahn
c7a777322d
[VPlan] Replace else-if dyn_cast with cast (NFC).
The recipes handled here are either VPWidenIntrinsic or VPWidenCast, so
replace the else-if dyn_cast with a single else + cast.
2024-12-23 19:46:22 +00:00
Simon Pilgrim
e3f8c229f5 [VectorCombine] foldInsExtVectorToShuffle - inserting into a poison base vector can be modelled as a single src shuffle
We already canonicalized an undef base vector to the RHS to improve further folding; this extends that to improve the shuffle cost estimate of the single-source shuffle.
2024-12-23 15:49:17 +00:00
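A sketch of the mask construction, assuming the fold turns `insertelement(base, extractelement(src, ext_idx), ins_idx)` into a shuffle with the usual two-operand mask encoding (indices below `width` read the first operand, the rest read the second, -1 is a poison lane). With a non-poison base the shuffle reads both base and src; with a poison base every untouched lane is poison anyway, so the mask references src alone and can be costed as a single-source shuffle. The function name and operand order are illustrative assumptions, not VectorCombine's code.

```python
from typing import List

POISON = -1  # mask value for a poison lane


def ins_ext_to_shuffle_mask(width: int, ins_idx: int, ext_idx: int,
                            base_is_poison: bool) -> List[int]:
    """Mask for insertelement(base, extractelement(src, ext_idx), ins_idx)."""
    if base_is_poison:
        # shuffle(src, poison, mask): only lane ins_idx is defined.
        mask = [POISON] * width
        mask[ins_idx] = ext_idx
    else:
        # shuffle(base, src, mask): keep base lanes, take one lane from src.
        mask = list(range(width))
        mask[ins_idx] = width + ext_idx
    return mask
```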
Simon Pilgrim
29c89d7265
[VectorCombine] foldShuffleOfShuffles - fold "shuffle (shuffle x, y, m1), (shuffle y, x, m2)" -> "shuffle x, y, m3" (#120959)
foldShuffleOfShuffles currently only folds unary shuffles to ensure we don't end up with a merged shuffle with more than 2 sources, but this prevented cases where both shuffles were sharing sources.

This patch generalizes the merge process to find up to 2 sources as it merges with the inner shuffles. It also moves the undef/poison handling stages into the merge loop.

Fixes #120764
2024-12-23 14:56:15 +00:00
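A Python sketch of the merge, assuming the outer and both inner shuffles share the same element count `width` and that sources are identified by hypothetical names. Each result lane is traced through the inner masks back to a (source, lane) pair, at most two distinct sources are allowed, and the combined mask is rebuilt against those sources. This is an illustration of the technique, not the VectorCombine implementation.

```python
from typing import List, Optional, Sequence, Tuple

POISON = -1  # mask value for a poison lane
Shuffle = Tuple[str, str, Sequence[int]]  # (first source, second source, mask)


def fold_shuffle_of_shuffles(outer_mask: Sequence[int], inner0: Shuffle,
                             inner1: Shuffle, width: int
                             ) -> Optional[Tuple[List[str], List[int]]]:
    """Merge shuffle(inner0, inner1, outer_mask) into one shuffle of <= 2 sources."""
    lanes: List[Optional[Tuple[str, int]]] = []
    for idx in outer_mask:
        if idx == POISON:
            lanes.append(None)
            continue
        src0, src1, mask = (inner0, inner1)[idx // width]
        inner_idx = mask[idx % width]
        if inner_idx == POISON:
            lanes.append(None)
            continue
        lanes.append((src0 if inner_idx < width else src1, inner_idx % width))
    sources: List[str] = []
    for lane in lanes:
        if lane is not None and lane[0] not in sources:
            sources.append(lane[0])
    if len(sources) > 2:
        return None  # would need a 3-source shuffle, so give up
    new_mask = [POISON if lane is None else sources.index(lane[0]) * width + lane[1]
                for lane in lanes]
    return sources, new_mask
```

For `shuffle (shuffle x, y, [0,5,2,7]), (shuffle y, x, [4,1,6,3])` with outer mask `[0,1,6,7]`, the sketch yields a single `shuffle x, y, [0,5,2,7]`.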
Han-Kuan Chen
11676da808
[SLP] Normalize debug messages for newTreeEntry. (#119514)
Debug messages should follow newTreeEntry.
Make ExtractValueInst and ExtractElementInst use setOperand directly.
2024-12-23 21:42:02 +08:00
Haopeng Liu
8daba2c13d
Skip negative length while inferring initializes attr (#120874)
Bail out on a negative length while inferring the initializes attr. Otherwise it
causes an assertion error:
`Attribute 'initializes' does not support unordered ranges`
2024-12-22 19:01:52 -08:00
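Illustrative sketch, assuming memory accesses are given as hypothetical (offset, length) pairs and that the attribute needs sorted, non-overlapping half-open ranges with start < end. A negative length would produce end < start, so it is skipped rather than turned into a malformed range. This sketch also skips zero-length accesses, an extra assumption not stated in the commit; the helper names are not from the LLVM pass.

```python
from typing import Iterable, List, Optional, Tuple

Range = Tuple[int, int]  # half-open byte range [start, end)


def access_range(offset: int, length: int) -> Optional[Range]:
    """Byte range written by one access, or None if it cannot form a valid range."""
    if length <= 0:
        # Negative lengths would give end < start (an unordered range).
        # Zero-length accesses are skipped too in this sketch (assumption).
        return None
    return (offset, offset + length)


def initialized_ranges(accesses: Iterable[Tuple[int, int]]) -> List[Range]:
    """Sorted, non-overlapping ranges covering all valid (offset, length) accesses."""
    ranges = sorted(r for r in (access_range(o, n) for o, n in accesses) if r is not None)
    merged: List[Range] = []
    for start, end in ranges:
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```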
LiqinWeng
b1fab4f849
[LV][VPlan] Initialize the variable 'VPID' of the createEVLRecipe (#120926)
Resolve the compilation error caused by a merge issue with #119510.
2024-12-23 09:23:22 +08:00
LiqinWeng
8a51471d83
[LV][VPlan] Extract the implementation of transform Recipe to EVLRecipe into a small function. NFC (#119510)
2024-12-23 08:28:19 +08:00
Simon Pilgrim
bf873aa3ec [VectorCombine] foldShuffleToIdentity - add debug message for match
Helps with debugging by showing that the fold found the match.
2024-12-22 17:21:44 +00:00
Simon Pilgrim
f96337e04e [VectorCombine] foldConcatOfBoolMasks - add debug message for match + cost-comparison
Helps with debugging by showing that the fold found the match and the old + new costs, to indicate whether the fold was/wasn't profitable.
2024-12-22 16:21:02 +00:00
Florian Hahn
e1833e3a7e
[VPlan] Simplify redundant VPDerivedIVRecipe (NFC).
Split DerivedIV simplification off from
https://github.com/llvm/llvm-project/pull/112145 and use it to remove the
need for extra checks in createScalarIVSteps. This required an extra
simplification run after IV transforms.
2024-12-22 09:39:19 +00:00
LiqinWeng
86fa35ce7e
[LV][VPlan] Use opcode to retrieve the VPID of the CallRecipe, rather than underlying instruction (#120816)
This patch may cause the flags in the CallRecipe to be lost after the EVL
transformation; this is addressed in #119847.
2024-12-22 10:28:20 +08:00
Florian Hahn
9b496deb90
[VPlan] Set and use debug location for VPPredInstPHIRecipe.
Update the recipe to always set its debug location and use it during IR
generation.
2024-12-21 21:57:47 +00:00
GrumpyPigSkin
f7ba2bdf86
[LLVM][SLSR] Add a debug counter (#119981)
Added debug counter and test for SLSR.

Fixes: https://github.com/llvm/llvm-project/issues/119770
2024-12-21 12:37:44 -05:00
Florian Hahn
bb86c5dd4d
[VPlan] Use inferScalarType in VPInstruction::ResumePhi codegen (NFC).
Use VPlan-based type analysis to retrieve type of phi node. Also adds
missing type inference for ResumePhi and ComputeReductionResult opcodes.
2024-12-21 15:55:21 +00:00
vporpo
7a38445ee2
[SandboxVec][DAG] Register move instr callback (#120146)
This patch implements the move instruction notifier for the DAG.
Whenever an instruction moves, the notifier updates the DAG accordingly.
2024-12-20 23:10:24 -08:00
Kazu Hirata
adf0c817f3
[memprof] Undrift MemProf profile even when some frames are missing (#120500)
This patch makes the MemProf undrifting process a little more lenient.
Consider an inlined call hierarchy:

  foo -> bar -> ::new

If bar tail-calls ::new, the profile appears to indicate that foo
directly calls ::new.  This is a problem because the perceived call
hierarchy in the profile looks different from what we can obtain from
the inline stack in the IR.

Recall that undrifting works by constructing and comparing a list of
direct calls from the profile and that from the IR.  This patch
modifies the construction of the latter.  Specifically, if foo calls
bar in the IR, but bar is missing from the profile, we pretend that foo
directly calls some heap allocation function.  We apply this
transformation only in the inline stack leading to some heap
allocation function.
2024-12-20 15:40:08 -08:00
Owen Anderson
bc8fa9c443
Revert "SimplifyLibCalls: Use default globals address space when building new global strings. (#118729)" (#119616)
This reverts commit cfa582e8aaa791b52110791f5e6504121aaf62bf.
2024-12-21 09:33:39 +13:00
Teresa Johnson
c7451ffcb9
[MemProf] Supporting hinting mostly-cold allocations after cloning (#120633)
Optionally, unconditionally hint allocations as cold or not cold during
the cloning step if the percentage of bytes allocated is at least that
of the given threshold. This is similar to PR120301, which supports this
during matching, but enables the same behavior during cloning, to reduce
the false positives that can be addressed by cloning at the cost of
carrying the additional size metadata/summary.
2024-12-20 11:27:54 -08:00
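A sketch of the threshold test, assuming each profiled context of an allocation contributes (bytes allocated, is_cold) and the threshold is a percentage. If at least that share of bytes is cold the allocation is hinted cold; if at least that share is not cold it is hinted not cold; otherwise the decision is left to cloning. The function name and string results are hypothetical, not MemProf's option names or data structures.

```python
from typing import Sequence, Tuple


def hint_allocation(contexts: Sequence[Tuple[int, bool]], threshold_percent: float) -> str:
    """contexts: (bytes allocated, is_cold) per profiled allocation context."""
    total = sum(size for size, _ in contexts)
    if total == 0:
        return "ambiguous"
    cold = sum(size for size, is_cold in contexts if is_cold)
    if cold * 100 >= threshold_percent * total:
        return "cold"
    if (total - cold) * 100 >= threshold_percent * total:
        return "notcold"
    return "ambiguous"  # neither side reaches the threshold: leave it to cloning
```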
Thurston Dang
5bb650345d
Remove -bounds-checking-unique-traps (replace with -fno-sanitize-merge=local-bounds) (#120682)
#120613 removed -ubsan-unique-traps and replaced it with
-fno-sanitize-merge (introduced in #120511), which allows fine-grained
control of which UBSan checks to prevent merging. This analogous patch
removes -bounds-checking-unique-traps, and allows it to be controlled via
-fno-sanitize-merge=local-bounds.

Most of this patch is simply plumbing through the compiler flags into
the bounds checking pass.

Note: this patch subtly changes -fsanitize-merge (the default) to also
include -fsanitize-merge=local-bounds. This is different from the
previous behavior, where -fsanitize-merge (or the old
-ubsan-unique-traps) did not affect local-bounds (requiring the separate
-bounds-checking-unique-traps). However, we argue that the new behavior
is more intuitive.

Removing -bounds-checking-unique-traps and merging its functionality
into -fsanitize-merge breaks backwards compatibility; we hope that this
is acceptable since '-mllvm -bounds-checking-unique-traps' was an
experimental flag.
2024-12-20 10:07:44 -08:00
Simon Pilgrim
82b5bda42c [VectorCombine] Add "VC: Erasing" debug message to help the log show when dead WorkList instructions are erased.
2024-12-20 17:59:14 +00:00
Simon Pilgrim
e3157d3f0d [VectorCombine] foldBitcastShuffle - add debug message for match + cost-comparison
Helps with debugging by showing that the fold found the match and the old + new costs, to indicate whether the fold was/wasn't profitable.
2024-12-20 17:59:13 +00:00