35632 Commits

Author SHA1 Message Date
Alexey Bataev
ef7f6aca14 [SLP][NFC]Add some extra checks/reorganize the code to improve compile time, NFC. 2024-02-01 10:53:39 -08:00
Nikita Popov
62ae7d976f [LoopUnroll] Fix missing sign extension
For integers larger than 64-bit, this would zero-extend a -1
value, instead of sign-extending it.

Fixes https://github.com/llvm/llvm-project/issues/80289.
2024-02-01 16:08:25 +01:00
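The class of bug fixed in 62ae7d976f can be illustrated outside of LLVM. The sketch below is an analogy, not the LoopUnroll code: it models a 128-bit integer as two 64-bit halves (the `Wide128` struct and all function names are hypothetical) and shows that widening a 64-bit -1 by zero-extension does not yield -1 at the wider width, while sign-extension does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit value modeled as two 64-bit halves.
struct Wide128 {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Zero-extension: the upper half is always zero.
Wide128 zeroExtend(std::int64_t v) {
  return {static_cast<std::uint64_t>(v), 0};
}

// Sign-extension: the upper half replicates the sign bit.
Wide128 signExtend(std::int64_t v) {
  return {static_cast<std::uint64_t>(v),
          v < 0 ? ~std::uint64_t{0} : std::uint64_t{0}};
}

// True when every bit is set, i.e. the value is -1 in two's complement.
bool isAllOnes(const Wide128 &w) {
  return w.lo == ~std::uint64_t{0} && w.hi == ~std::uint64_t{0};
}

// Small self-check showing the difference between the two widenings.
void checkWidening() {
  assert(isAllOnes(signExtend(-1)));   // still -1 at 128 bits
  assert(!isAllOnes(zeroExtend(-1)));  // becomes 2^64 - 1 instead
}
```

With zero-extension the widened value is 2^64 - 1, which is the kind of silent change the commit message describes for integers wider than 64 bits.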
Alexey Bataev
15295d0135 [SLP][NFC]Introduce and use computeCommonAlignment function, NFC. 2024-02-01 06:13:39 -08:00
Florian Hahn
da437330be
[SCEVExp] Keep NUW/NSW if both original inc and isomorphic inc agree. (#79512)
We are replacing with a wider increment. If both OrigInc and
IsomorphicInc are NUW/NSW, then we can preserve them on the wider
increment; the narrower IsomorphicInc would wrap before the wider
OrigInc, so the replacement won't make IsomorphicInc's uses more
poisonous.

PR: https://github.com/llvm/llvm-project/pull/79512
2024-02-01 11:01:29 +00:00
Philip Reames
f264da4322
[lsr][term-fold] Restrict transform to low cost expansions (#74747)
This is a follow up to an item I noted in my submission comment for
e947f95. I don't have a real world example where this is triggering
unprofitably, but avoiding the transform when we estimate the loop to be
short running from profiling seems quite reasonable. It's also now come
up as a possibility in a regression twice in two days, so I'd like to
get this in to close out the possibility if nothing else.

The original review dropped the threshold for short trip count loops. I
will return to that in a separate review if this lands.
2024-01-31 14:48:20 -08:00
Nikita Popov
4f32f5d572
[AA][JumpThreading] Don't use DomTree for AA in JumpThreading (#79294)
JumpThreading may perform AA queries while the dominator tree is not up
to date, which may result in miscompilations.

Fix this by adding a new AAQI option to disable the use of the dominator
tree in BasicAA.

Fixes https://github.com/llvm/llvm-project/issues/79175.
2024-01-31 15:23:53 +01:00
Florian Hahn
cec24f0d7e
[VPlan] Update stale test after 9536a6286, fix formatting. 2024-01-31 13:45:38 +00:00
Florian Hahn
9536a6286e
[VPlan] Preserve original induction order when creating scalar steps.
Update createScalarIVSteps to take an insert point as parameter. This
ensures that the inserted scalar steps are in the same order as the
recipes they replace (vs in reverse order as currently). This helps to
reduce the diff for follow-up changes.
2024-01-31 13:31:28 +00:00
Yingwei Zheng
817d0cb485
[InstCombine] Simplify commutative compares of symmetric pairs (#80134)
Fixes #78038.
2024-01-31 21:21:27 +08:00
Nikita Popov
cb6240d247 [BDCE] Also drop poison-generating metadata
The comment was incorrect: !range also applies to calls, and we
do need to drop it in some cases.
2024-01-31 12:22:58 +01:00
Yingwei Zheng
50e80e06d1
[ValueTracking] Merge cannotBeOrderedLessThanZeroImpl into computeKnownFPClass (#76360)
This patch merges the logic of `cannotBeOrderedLessThanZeroImpl` into
`computeKnownFPClass` to improve the signbit inference.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2024-01-31 18:26:50 +08:00
Nikita Popov
b210cbbd0e [BDCE] Fix clearing of poison-generating flags
If the demanded bits of an instruction are full, we don't have to
recurse to its users, but we may still have to clear flags on the
instruction itself.

Fixes https://github.com/llvm/llvm-project/issues/80113.
2024-01-31 11:24:13 +01:00
Yingwei Zheng
f292f90bc2
[InstCombine] Fold select with signbit idiom into fabs (#76342)
This patch folds:
```
((bitcast X to int) <s 0 ? -X : X) -> fabs(X)
((bitcast X to int) >s -1 ? X : -X) -> fabs(X)
((bitcast X to int) <s 0 ? X : -X) -> -fabs(X)
((bitcast X to int) >s -1 ? -X : X) -> -fabs(X)
```
Alive2: https://alive2.llvm.org/ce/z/rGepow
2024-01-31 15:42:09 +08:00
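The first pattern folded by f292f90bc2 can be checked at the source level. This is a minimal sketch, assuming a 32-bit IEEE-754 `float`; the helper names (`bitsOf`, `selectIdiom`, `matchesFabs`) are hypothetical and not part of InstCombine. It compares the select idiom against `std::fabs` bit for bit.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::int32_t),
              "sketch assumes a 32-bit float");

// Reinterpret the bits of a float as a signed 32-bit integer (the "bitcast").
std::int32_t bitsOf(float x) {
  std::int32_t i;
  std::memcpy(&i, &x, sizeof i);
  return i;
}

// ((bitcast X to int) <s 0 ? -X : X)
float selectIdiom(float x) { return bitsOf(x) < 0 ? -x : x; }

// True when the select idiom and std::fabs agree bit for bit.
bool matchesFabs(float x) {
  return bitsOf(selectIdiom(x)) == bitsOf(std::fabs(x));
}

// Small self-check over ordinary values, signed zeros and infinities.
void checkFabsIdiom() {
  assert(matchesFabs(1.5f));
  assert(matchesFabs(-1.5f));
  assert(matchesFabs(-0.0f));
  assert(matchesFabs(-INFINITY));
}
```

The integer comparison is what makes the idiom equivalent to `fabs`: it tests the sign bit directly, so `-0.0f` is handled, whereas a floating-point `x < 0.0f` test would leave `-0.0f` unchanged.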
Yingwei Zheng
f2816ff60c
[InstCombine] Simplify and/or by replacing operands with constants (#77231)
This patch tries to simplify `X | Y` by replacing occurrences of `Y` in
`X` with 0. Similarly, it tries to simplify `X & Y` by replacing
occurrences of `Y` in `X` with -1.

Alive2: https://alive2.llvm.org/ce/z/cNjDTR
Note: As the current implementation is too conservative in the one-use
checks, I cannot remove other existing hard-coded simplifications if
they involve more than two instructions (e.g., `A & ~(A ^ B) --> A &
B`).

Compile-time impact:
http://llvm-compile-time-tracker.com/compare.php?from=a085402ef54379758e6c996dbaedfcb92ad222b5&to=9d655c6685865ffce0ad336fed81228f3071bd03&stat=instructions%3Au

|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|+0.01%|-0.00%|+0.00%|-0.02%|+0.01%|+0.02%|-0.01%|

Fixes #76554.
2024-01-31 14:30:55 +08:00
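The rewrite described in f2816ff60c can be sanity-checked on plain integers. The sketch below assumes an `X` built purely from bitwise operations (here `X = A ^ Y`, chosen for illustration); the exact conditions under which the patch applies are not stated in the message, and all names are hypothetical. It also checks the `A & ~(A ^ B) --> A & B` example quoted in the message.

```cpp
#include <cassert>
#include <cstdint>

using Bits = std::uint32_t;

// Illustrative X that contains an occurrence of Y: X = A ^ Y.
Bits makeX(Bits a, Bits y) { return a ^ y; }

// X | Y is unchanged when Y inside X is replaced with 0.
bool orRuleHolds(Bits a, Bits y) {
  return (makeX(a, y) | y) == (makeX(a, 0) | y);
}

// X & Y is unchanged when Y inside X is replaced with -1 (all ones).
bool andRuleHolds(Bits a, Bits y) {
  return (makeX(a, y) & y) == (makeX(a, ~Bits{0}) & y);
}

// The example from the commit message: A & ~(A ^ B) == A & B.
bool commitExampleHolds(Bits a, Bits b) {
  return (a & ~(a ^ b)) == (a & b);
}

// Exhaustive check over all pairs of 8-bit values.
bool holdsForAllBytes() {
  for (Bits a = 0; a < 256; ++a)
    for (Bits y = 0; y < 256; ++y)
      if (!orRuleHolds(a, y) || !andRuleHolds(a, y) ||
          !commitExampleHolds(a, y))
        return false;
  return true;
}

void checkAndOrRules() { assert(holdsForAllBytes()); }
```

The intuition is per bit: in `X | Y`, any bit where `Y` is 1 is already 1 in the result, so only bits where `Y` is 0 matter, and there `Y` can be treated as 0 inside `X`. The `&` case is the mirror image with all ones.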
Yingwei Zheng
a034e65e97
[CVP] Check whether the default case is reachable (#79993)
This patch eliminates unreachable default cases using context-sensitive
range information.
2024-01-31 13:11:10 +08:00
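A source-level shape of the situation targeted by a034e65e97 might look like the sketch below. It is only an illustration: the function `classify` and its return values are hypothetical, and no claim is made about what any particular compiler version emits for it. The guard restricts `x` to the range [0, 4), and the cases cover that whole range, so the default case cannot be reached.

```cpp
#include <cassert>

// The early return limits x to 0..3 by the time the switch is reached.
int classify(unsigned x) {
  if (x >= 4)
    return -1;          // out of range, handled before the switch
  switch (x) {
  case 0: return 10;
  case 1: return 11;
  case 2: return 12;
  case 3: return 13;
  default: return -2;   // unreachable given the guard above
  }
}

// Self-check: the default value is never observed for any input tried.
void checkDefaultUnreachable() {
  for (unsigned x = 0; x < 1000; ++x)
    assert(classify(x) != -2);
}
```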
Fangrui Song
9b91c54d9b
[msan] Unpoison indirect outputs for userspace using memset for large operands (#79924)
Modify #77393 to clear shadow memory using `llvm.memset.*` when the size
is large, similar to `shouldUseBZeroPlusStoresToInitialize` in clang for
`-ftrivial-auto-var-init=`. The intrinsic, if lowered to libcall, will
use the msan interceptor.

The instruction selector lowers a `StoreInst` to multiple stores, not
utilizing `memset`. When the size is large (e.g.
`store { [100 x i32] } zeroinitializer, ptr %12, align 1`), the
generated code will be long (and `CodeGenPrepare::optimizeInst` will
even crash for a huge size).

```
// Test stack size
#include <array>
template <class T>
void DoNotOptimize(const T& var) { // deprecated by https://github.com/google/benchmark/pull/1493
  asm volatile("" : "+m"(const_cast<T&>(var)));
}

int main() {
  using LargeArray = std::array<int, 1000000>;
  auto large_stack = []() { DoNotOptimize(LargeArray()); };
  /////// CodeGenPrepare::optimizeInst triggers an assertion failure when creating an integer type with a bit width>2**23
  large_stack();
}
```
2024-01-30 13:45:47 -08:00
Alexey Bataev
285bc69846 [SLP]Fix PR80027: Fix costs processing for minbitwidth types.
Need to switch the types, the destination is first in getCastInstrCost
function.
2024-01-30 10:32:55 -08:00
Alexey Bataev
976374d982 [SLP][NFC]Use MutableArrayRef instead of SmallVectorImpl&, NFC. 2024-01-30 06:21:47 -08:00
ampandey-1995
67f0a6917c
[ASan][AMDGPU] Fix Assertion Failure. (#79795)
Fixes the assertion failure `(i >= FTy->getNumParams() || FTy->getParamType(i) ==
Args[i]->getType()) && "Calling a function with a bad signature!"`. The
`llvm.memcpy` call intercepted by the ASan instrumentation pass is replaced by
ASan's own `__asan_memcpy` implementation. The second argument of
`llvm.memcpy` accepts a ptr to addrspace(4), so `__asan_memcpy` also has to
follow the ptr to addrspace(4) convention.

---------

Co-authored-by: Amit Pandey <amit.pandey@amd.com>
2024-01-30 12:31:40 +05:30
Nilanjana Basu
c492eb6b28
[LV] Update interleaving count computation when scalar epilogue loop needs to run at least once (#79651)
Update loop interleaving count computation to address loops that require at least one scalar iteration in the epilogue loop. For this case, the available trip count for interleaving the loop is one less.
2024-01-29 13:41:15 -08:00
Alexey Bataev
8d89dd4a58 [SLP]Fix PR79743: Check that all users are demoted before trying to
demote the tree entry.

Need to check if all user nodes are marked for demotion before demoting
the node. Otherwise, some data info might be lost after vectorization.
2024-01-29 10:51:20 -08:00
Antonio Frighetto
20737825c9 [BDCE] Handle multi-use binary ops upon demanded bits
Simplify multi-use `and`/`or`/`xor` when they
do not affect the demanded bits being considered.

Fixes: https://github.com/llvm/llvm-project/issues/78596.

Proofs: https://alive2.llvm.org/ce/z/EjuWHa.
2024-01-29 19:03:24 +01:00
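The idea behind 20737825c9 can be shown with ordinary integers. In this sketch (function names hypothetical, constants chosen for illustration) a user only demands the low 8 bits of a value, and the `or` only touches bits 8..15, so for that user the `or` can be bypassed.

```cpp
#include <cassert>
#include <cstdint>

// A user that demands only the low 8 bits of (x | 0xFF00).
std::uint32_t lowByteWithOr(std::uint32_t x) {
  return (x | 0xFF00u) & 0xFFu;
}

// The same user reading x directly: the `or` did not affect bits 0..7.
std::uint32_t lowByteWithoutOr(std::uint32_t x) {
  return x & 0xFFu;
}

// Self-check over a range of inputs.
void checkDemandedBits() {
  for (std::uint32_t x = 0; x < 70000; ++x)
    assert(lowByteWithOr(x) == lowByteWithoutOr(x));
}
```

Other users of the same `or` may still need its full result, which is why the multi-use case has to be decided per user rather than by deleting the instruction.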
Florian Hahn
743946e8ef
[VPlan] Replace VPRecipeOrVPValue with VP2VP recipe simplification. (#76090)
Move simplification of VPBlendRecipes from early VPlan construction to
VPlan-to-VPlan based recipe simplification. This simplifies initial
construction.

Note that some in-loop reduction tests are failing at the moment, due to
the reduction predicate being created after the reduction recipe. I will
provide a patch for that soon.

PR: https://github.com/llvm/llvm-project/pull/76090
2024-01-29 09:52:05 +00:00
Kazu Hirata
fc15731183 [Transforms] Use a range-based for loop (NFC) 2024-01-28 18:03:35 -08:00
Florian Hahn
2d0d65b3ba
[VPlan] Create edge masks up front in all cases where needed. (NFC)
Similarly to how block masks are created up front and later only
retrieved, also make sure masks are created in cases where edge masks are
needed, i.e. blend recipes.

Creating block-in masks for all blocks in the loop also ensures edge
masks for all relevant edges have been created. Later, the new
getEdgeMask can be used to look up cached edge masks.

This makes sure edge masks are available in all cases for
https://github.com/llvm/llvm-project/pull/76090.
2024-01-28 21:20:18 +00:00
Florian Hahn
1b37e8087e
[VPlan] use getVPValueOrAddLiveIn in VPlan::duplicate.
Instead of creating live-ins manually, use getOrAddLiveIn which
automatically takes care of adding them to VPLiveInsToFree. Also use it
to create the VPValue for the trip-count. This fixes a leak:
https://lab.llvm.org/buildbot/#/builders/168/builds/18308/steps/10/logs/stdio
2024-01-28 12:39:39 +00:00
Kazu Hirata
687136e7cd [Transforms] Use a range-based for loop (NFC) 2024-01-27 22:20:25 -08:00
Florian Hahn
7c03d5d41d
[VPlan] Use unique_ptr to clean up duplicated plan. 2024-01-27 20:51:55 +00:00
Kazu Hirata
f1cee6b0ba [Transforms] Use a range-based for loop (NFC) 2024-01-27 09:32:19 -08:00
Florian Hahn
ec402a2e53
[VPlan] Implement cloning of VPlans. (#73158)
This patch implements cloning for VPlans and recipes. Cloning is used in
the epilogue vectorization path, to clone the VPlan for the main vector
loop. This means we won't re-use a VPlan when executing the VPlan for
the epilogue vector loop, which in turn will enable us to perform
optimizations based on UF & VF.
2024-01-27 13:30:52 +00:00
Mikhail Gudim
701ec45f2f
[InstCombine] Fix a comment. (#79422) 2024-01-26 23:10:19 -05:00
Craig Topper
55c6d91034
[InstCombine] Preserve nuw/nsw/exact flags when transforming (C shift (A add nuw C1)) --> ((C shift C1) shift A). (#79490)
If we weren't shifting out any non-zero bits or changing the sign before the transform, we
shouldn't be after.

Alive2: https://alive2.llvm.org/ce/z/mB-rWz
2024-01-26 11:33:53 -08:00
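For the left-shift case, the transform in 55c6d91034 and the reasoning about flags can be checked numerically. This is a sketch on 32-bit unsigned values with hypothetical helper names; `shlDropsBits` stands in for "shifts out non-zero bits", which is what a `nuw` flag on `shl` rules out.

```cpp
#include <cassert>
#include <cstdint>

// C << (A + C1), the form before the transform.
std::uint32_t shlBefore(std::uint32_t c, unsigned a, unsigned c1) {
  assert(a + c1 < 32 && "shift amount must stay below the bit width");
  return c << (a + c1);
}

// (C << C1) << A, the form after the transform.
std::uint32_t shlAfter(std::uint32_t c, unsigned a, unsigned c1) {
  assert(a + c1 < 32 && "shift amount must stay below the bit width");
  return (c << c1) << a;
}

// True when shifting v left by amt discards at least one set bit.
bool shlDropsBits(std::uint32_t v, unsigned amt) {
  assert(amt < 32);
  return ((v << amt) >> amt) != v;
}

// If the single shift drops no bits, neither of the two smaller shifts does.
void checkShiftFlags() {
  const std::uint32_t c = 1;
  const unsigned a = 3, c1 = 4;
  assert(shlBefore(c, a, c1) == shlAfter(c, a, c1));
  assert(!shlDropsBits(c, a + c1));
  assert(!shlDropsBits(c, c1));
  assert(!shlDropsBits(c << c1, a));
}
```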
Krzysztof Drewniak
63fe80fb18
[SeparateConstOffsetFromGEP] Handle or disjoint flags (#76997)
This commit extends separate-const-offset-from-gep to look at the
newly-added `disjoint` flag on `or` instructions so as to preserve
additional opportunities for optimization.

The tests were pre-committed in #76972.
2024-01-26 09:56:06 -06:00
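The reason the `disjoint` flag matters for 63fe80fb18 is that an `or` whose operands share no set bits computes the same value as an add, so its constant operand can be treated like a constant offset. A minimal sketch with hypothetical names and an illustrative constant:

```cpp
#include <cassert>
#include <cstdint>

// True when a and b have no set bits in common.
bool isDisjoint(std::uint64_t a, std::uint64_t b) { return (a & b) == 0; }

// Offset computed with `or`: the low two bits of (index << 2) are zero,
// so or-ing in 3 cannot collide with any set bit.
std::uint64_t offsetViaOr(std::uint64_t index) { return (index << 2) | 3u; }

// The same offset computed with `add`.
std::uint64_t offsetViaAdd(std::uint64_t index) { return (index << 2) + 3u; }

// Self-check: whenever the operands are disjoint, or and add agree.
void checkOrDisjoint() {
  for (std::uint64_t i = 0; i < 1000; ++i) {
    assert(isDisjoint(i << 2, 3u));
    assert(offsetViaOr(i) == offsetViaAdd(i));
  }
}
```

When the operands are not disjoint the equivalence fails, for example `1 | 1` is 1 while `1 + 1` is 2, which is why the flag has to be present before the `or` is handled this way.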
David Sherwood
962fbafecf
[LoopVectorize] Refine runtime memory check costs when there is an outer loop (#76034)
When we generate runtime memory checks for an inner loop it's
possible that these checks are invariant in the outer loop and
so will get hoisted out. In such cases, the effective cost of
the checks should reduce to reflect the outer loop trip count.

This fixes a 25% performance regression introduced by commit

49b0e6dcc296792b577ae8f0f674e61a0929b99d

when building the SPEC2017 x264 benchmark with PGO, where we
decided the inner loop trip count wasn't high enough to warrant
the (incorrect) high cost of the runtime checks. Also, when
runtime memory checks consist entirely of diff checks these are
likely to be outer loop invariant.
2024-01-26 14:43:48 +00:00
Florian Hahn
731c2049a4
[VPlan] Relax IV user assertion after 0ab539f for epilogue vec.
After 0ab539fd6748adf2f638e10514dd9419597d8863, the canonical IV in the
epilogue vector loop may be used by a trunc. Relax the corresponding
assert.

This should fix some build-bot failures, including
    https://lab.llvm.org/buildbot/#/builders/187/builds/14113
    https://lab.llvm.org/buildbot/#/builders/98/builds/32350
    https://lab.llvm.org/buildbot/#/builders/239/builds/5473
2024-01-26 13:19:25 +00:00
lifengxiang1025
6ccb06a7ab
[MemProf] Fix assert when direct recursion exists (#78264)
Fix an assert in `MemProfContextDisambiguation::applyImport` when direct
recursion exists.
2024-01-26 20:55:44 +08:00
Graham Hunter
d4c0171423
[LV] Fix handling of interleaving linear args (#78725)
Currently when interleaving vector calls with linear arguments,
the Part is ignored and all vector calls use the initial value
from the first lane of the current iteration.

Fix this to extract from the correct part of the linear vector.
2024-01-26 11:30:35 +00:00
Florian Hahn
0ab539fd67
[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. (#78113)
Add a new recipe to model scalar cast instructions, without relying on
an underlying instruction.

This allows creating scalar casts, without relying on an underlying
instruction (like the current VPReplicateRecipe). The new recipe is 
used to explicitly model both truncating the induction step and the
VPDerivedIVRecipe, thus simplifying both the recipe and code
needed to introduce it.

Truncating VPWidenIntOrFpInductionRecipes should also be modeled using
the new recipe, as follow-up.

PR: https://github.com/llvm/llvm-project/pull/78113
2024-01-26 11:13:05 +00:00
Kazu Hirata
d7ff7c3d18 [Transforms] Use llvm::pred_size and llvm::pred_successors (NFC) 2024-01-25 18:17:20 -08:00
Enna1
e0ade45991
[MemProf][NFC] Rename DefaultShadowGranularity to DefaultMemGranulari… (#79412)
…ty in instrumentation code, be consistent with runtime

In runtime code, the size of the memory block mapped to a single shadow
location is called MEM_GRANULARITY.
In instrumentation code, the size of the memory block mapped to a single
shadow location is called DefaultShadowGranularity.
In fact, SHADOW_GRANULARITY is 8 (1 << SHADOW_SCALE), and
MEM_GRANULARITY is 64.
The name DefaultShadowGranularity in instrumentation code is a bit
misleading, so this patch renames DefaultShadowGranularity to
DefaultMemGranularity, to be consistent with the runtime.
2024-01-26 10:04:48 +08:00
Jeremy Morse
19b65a9c02
[DebugInfo][RemoveDIs] Add a DPValue implementation for instcombine sinking (#77930)
In instcombine, when we sink an instruction into a successor block, we try
to clone and salvage all the variable assignments that use that Value. This
is a behaviour that's (IMO) flawed, but there are important use cases where
we want to avoid regressions, thus we're implementing this for the
non-instruction debug-info representation.

This patch refactors the dbg.value sinking code into its own function, and
installs a parallel implementation for DPValues, the non-instruction
debug-info container. This is mostly identical to the dbg.value
implementation, except that we don't have an easy-to-access ordering
between DPValues, and have to jump through extra hoops to establish one in
the (rare) cases where that ordering is required.

The test added represents a common use-case in LLVM where these behaviours
are important: a loop has been completely optimised away, leaving several
dbg.values in a row referring to an instruction that's going to sink. The
dbg.values should sink in both dbg.value and RemoveDIs mode, and
additionally only the last assignment should sink.
2024-01-25 23:28:56 +00:00
Florian Hahn
d88e3658ce
[SCEVExp] Move logic to replace congruent IV increments to helper (NFC).
Move logic to replace congruent IV increments to helper function, to
reduce the indentation by using early returns. This is in preparation
for a follow-up patch.
2024-01-25 21:40:31 +00:00
Alexey Bataev
92ae2ca12b [SLP][NFC]Improve BottomToTop reordering of orders for multi-iteration
attempts, NFC.

If several iterations of reordering of orders are required, a
different algorithm needs to be used.
2024-01-25 13:04:01 -08:00
Kazu Hirata
28a2b85602
[DeadStoreElimination] Use SmallSetVector (NFC) (#79410)
The use of SmallSetVector saves 0.58% of heap allocations during the
compilation of a large preprocessed file, namely X86ISelLowering.cpp,
for the X86 target.  During the experiment, the final size of ToCheck
was 8 or less 88% of the time.
2024-01-25 11:01:11 -08:00
Jeremy Morse
a19629dae7 Reapply 215b8f1e252, reverted in c3f7fb1421e
Turns out I was using DbgMarker::getDbgValueRange rather than the helper
utility in Instruction::getDbgValueRange, which checks for null-ness.
Original commit message follows.

[DebugInfo][RemoveDIs] Convert debug-info modes when loading bitcode (#78967)

As part of eliminating debug-intrinsics in LLVM, we'll shortly be
pushing the conversion from "old" dbg.value mode to "new" DPValue mode
out from when the pass manager runs, to when modules are loaded. This
patch adds that conversion process and some (temporary) options to
llvm-lto{,2} to help test it.

Specifically: now whenever we load a bitcode module, consider a flag of
whether to "upgrade" it into the new debug-info mode, and if we're
lazily materializing functions then do that lazily too. Doing this
exposes an error in the IRLinker/materializer handling of DPValues,
where we need to transfer the debug-info format flag correctly, and in
ValueMapper we need to remap the Values that DPValues point at.

I've added some test coverage in the modified tests; these will be
exercised by our llvm-new-debug-iterators buildbot.

This upgrading of debug-info won't be happening for the llvm18 release;
instead we'll turn it on after the branch date, then push the boundary
of where "new" debug-info starts and ends down into the existing
debug-info upgrade path over the course of the next release.
2024-01-25 18:37:13 +00:00
Alexey Bataev
6fe21bc1da [SLP]Fix PR79229: Do not erase extractelement, if it is used in
a multiregister node.

If the node can span several registers and the same
extractelement instruction is used in several parts, it may be required
to keep such an extractelement instruction to avoid a compiler crash.
2024-01-25 06:20:53 -08:00
Jeremy Morse
c3f7fb1421 Revert "[DebugInfo][RemoveDIs] Convert debug-info modes when loading bitcode (#78967)"
This reverts commit 215b8f1e252b4f30cf1b734faa370c0ac4b88659.

Numerous builders exploded from this X_X, for example

  https://lab.llvm.org/buildbot/#/builders/46/builds/62657
2024-01-25 14:18:31 +00:00
John Brawn
a04d4a03f7
[LoopFlatten] Use loop versioning when overflow can't be disproven (#78576)
Implement the TODO in loop flattening to version the loop when we can't
prove that the trip count calculation won't overflow.
2024-01-25 13:57:19 +00:00
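Loop versioning of the kind described in a04d4a03f7 can be sketched by hand. The code below is an illustration under stated assumptions, not the pass's output: `sumVersioned` is a hypothetical function over a row-major n-by-m array, and the overflow test is written with a division rather than whatever check the pass emits. A runtime check selects the flattened loop when `n * m` fits in 32 bits and falls back to the original nested loop otherwise.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Sum an n-by-m row-major array, choosing between two loop versions.
std::uint64_t sumVersioned(const std::vector<std::uint32_t> &data,
                           std::uint32_t n, std::uint32_t m) {
  assert(data.size() == static_cast<std::uint64_t>(n) * m);
  const std::uint32_t maxU32 = std::numeric_limits<std::uint32_t>::max();
  // Runtime check: would the 32-bit trip count n * m overflow?
  const bool overflows = m != 0 && n > maxU32 / m;
  std::uint64_t sum = 0;
  if (!overflows) {
    // Flattened version: a single loop over n * m elements.
    const std::uint32_t total = n * m;
    for (std::uint32_t k = 0; k < total; ++k)
      sum += data[k];
  } else {
    // Original version: nested loops, kept for the overflowing case.
    for (std::uint32_t i = 0; i < n; ++i)
      for (std::uint32_t j = 0; j < m; ++j)
        sum += data[static_cast<std::uint64_t>(i) * m + j];
  }
  return sum;
}
```

Both versions compute the same result. The check only decides which one is safe to run, which is the trade-off versioning makes: some code growth in exchange for applying the transform when overflow cannot be ruled out at compile time.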
Jeremy Morse
215b8f1e25
[DebugInfo][RemoveDIs] Convert debug-info modes when loading bitcode (#78967)
As part of eliminating debug-intrinsics in LLVM, we'll shortly be
pushing the conversion from "old" dbg.value mode to "new" DPValue mode
out from when the pass manager runs, to when modules are loaded. This
patch adds that conversion process and some (temporary) options to
llvm-lto{,2} to help test it.

Specifically: now whenever we load a bitcode module, consider a flag of
whether to "upgrade" it into the new debug-info mode, and if we're
lazily materializing functions then do that lazily too. Doing this
exposes an error in the IRLinker/materializer handling of DPValues,
where we need to transfer the debug-info format flag correctly, and in
ValueMapper we need to remap the Values that DPValues point at.

I've added some test coverage in the modified tests; these will be
exercised by our llvm-new-debug-iterators buildbot.

This upgrading of debug-info won't be happening for the llvm18 release;
instead we'll turn it on after the branch date, then push the boundary
of where "new" debug-info starts and ends down into the existing
debug-info upgrade path over the course of the next release.
2024-01-25 13:27:40 +00:00
Florian Hahn
a04f615291
[LV] Check for innermost loop instead of EnableVPlanNativePath in CM.
Replace EnableVPlanNativePath checks in the cost-model by assertions
that the code is only called for innermost loops. This ensures that the
cost model isn't used in the VPlanNativePath, which is only used for
outer-loop vectorization.

Even with EnableVPlanNativePath, inner loops are processed by the
inner loop vectorization path, not the native path, so checking for
EnableVPlanNativePath may impact decisions for inner loops and can
cause crashes, like in the attached test case.
2024-01-25 12:49:52 +00:00