38904 Commits

Author SHA1 Message Date
AZero13
ffd2633061
[InstCombine] Fold mul (shr exact (X, N)), 2^N + 1 -> add (X , shr exact (X, N)) (#112407)
Alive2 Proofs:
https://alive2.llvm.org/ce/z/aJnxyp
https://alive2.llvm.org/ce/z/dyeGEv
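
The arithmetic behind this fold can be checked by brute force. The sketch below is not taken from the patch: the helper names are hypothetical and it models only a logical shift on unsigned 32-bit values. It relies on the fact that an exact shift discards no set bits, so `(X >> N) * 2^N` gives back `X`.

```cpp
#include <cassert>
#include <cstdint>

// Left-hand side of the fold: (X >> N) * (2^N + 1), in wrapping 32-bit arithmetic.
uint32_t mulForm(uint32_t X, unsigned N) {
  assert(N < 32 && "shift amount must be in range");
  return (X >> N) * ((uint32_t{1} << N) + 1u);
}

// Right-hand side of the fold: X + (X >> N).
uint32_t addForm(uint32_t X, unsigned N) {
  assert(N < 32 && "shift amount must be in range");
  return X + (X >> N);
}

// "exact" is modelled as: the low N bits of X are zero, so the shift loses nothing.
bool isExactShift(uint32_t X, unsigned N) {
  assert(N < 32 && "shift amount must be in range");
  return (X & ((uint32_t{1} << N) - 1u)) == 0;
}

// Compares both forms for every 16-bit input and every shift amount below 16,
// skipping the combinations where the shift would not be exact.
bool foldHoldsForSmallInputs() {
  for (unsigned N = 0; N < 16; ++N)
    for (uint32_t X = 0; X <= 0xFFFFu; ++X)
      if (isExactShift(X, N) && mulForm(X, N) != addForm(X, N))
        return false;
  return true;
}
```

For `X = 24, N = 3` both forms give 27. For `X = 25, N = 3` the shift is not exact and the two forms differ (27 versus 28), which is why the fold needs the `exact` flag.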
2025-02-13 14:25:09 +08:00
Thurston Dang
df07121d54
[hwasan][NFCI] Rename ClRandomSkipRate to ClRandomKeepRate (#126990)
The meaning of ClRandomSkipRate was inverted in
https://github.com/llvm/llvm-project/pull/88070 but the variable name
was not changed. This patch fixes it to avoid confusion.

Additionally, it elaborates the flag description to mention the
interaction between the random keep rate and hotness cutoff.
2025-02-12 18:43:00 -08:00
Thurston Dang
51d8255203
[msan] Handle Arm NEON saturating extract and narrow (#125742)
This handles NEON saturating extract and narrow (Intrinsic::aarch64_neon_{sqxtn, sqxtun, uqxtn}) by (ab)using handleShadowOr() to perform the shadow cast. Previously, these were unknown intrinsics handled suboptimally by visitInstruction.

Updates the tests from https://github.com/llvm/llvm-project/pull/125288 and https://github.com/llvm/llvm-project/pull/125140
2025-02-12 16:22:49 -08:00
vporpo
1c207f1b6e
[SandboxVec][DAG] Fix DAG when old interval is mem free (#126983)
This patch fixes a bug in `DependencyGraph::extend()` when the old
interval contains no memory instructions. When this is the case we
should do a full dependency scan of the new interval.
2025-02-12 15:06:30 -08:00
vporpo
31cb807537
[SandboxVec][BottomUpVec] Fix diamond shuffle with multiple vector inputs (#126965)
When the operand comes from multiple inputs then we need additional
packing code. When the operands are scalar then we can use a single
InsertElementInst. But when the operands are vectors then we need a
chain of ExtractElementInst and InsertElementInst instructions to insert
the vector value into the destination vector. This is what this patch
implements.
2025-02-12 14:33:05 -08:00
Thurston Dang
0d95631a3a
[msan] Handle llvm.[us]cmp (starship operator) (#125804)
Apply handleShadowOr to llvm.[us]cmp. Previously, llvm.[us]cmp was correctly handled heuristically when each parameter type is the same as the return type (e.g., `call i8 @llvm.ucmp.i8.i8(i8 %x, i8 %y)`) but handled incorrectly by visitInstruction when the return type is different (e.g., `call i8 @llvm.ucmp.i8.i62(i62 %x, i62 %y)`, `call <4 x i8> @llvm.ucmp.v4i8.v4i32(<4 x i32> %x, <4 x i32> %y)`).

Updates the tests from https://github.com/llvm/llvm-project/pull/125790
2025-02-12 13:38:45 -08:00
Thurston Dang
e9e6ba6a5e
[msan] Handle single-parameter Arm NEON vector convert intrinsics (#126136)
This handles the following llvm.aarch64.neon intrinsics, which were suboptimally handled by visitInstruction:
- fcvtas, fcvtau
- fcvtms, fcvtmu
- fcvtns, fcvtnu
- fcvtps, fcvtpu
- fcvtzs, fcvtzu

The old instrumentation checked that the shadow of every element of the input vector was fully initialized, and aborted otherwise. The new instrumentation propagates the shadow: for each element of the output, the shadow is initialized iff the corresponding element of the input is *fully* initialized (since these are floating-point to integer conversions).

Updates the tests from https://github.com/llvm/llvm-project/pull/126095
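
A minimal model of the propagation rule described above, assuming one 32-bit shadow word per lane where a set bit means "uninitialized". The type and function names are hypothetical, and marking a poisoned output lane as all-ones is an assumption of this sketch, not something the commit message states.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical shadow model: one 32-bit word per lane, set bit = uninitialized.
using Shadow4 = std::array<uint32_t, 4>;

// An output lane is clean only if the matching input lane is fully initialized.
// Any uninitialized input bit poisons the whole output lane, because every bit
// of a floating-point value can influence the converted integer.
Shadow4 propagateConvertShadow(const Shadow4 &In) {
  Shadow4 Out{};
  for (std::size_t I = 0; I < In.size(); ++I)
    Out[I] = (In[I] != 0) ? 0xFFFFFFFFu : 0u;
  return Out;
}
```

With input shadows `{0, 1, 0x80000000, 0xFFFFFFFF}`, only the first output lane stays clean. The old behaviour described above would instead have aborted on this input.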
2025-02-12 13:20:22 -08:00
Vasileios Porpodas
e75e61728e [SandboxVec] Fix warnings introduced by 7a7f9190d03e 2025-02-12 12:43:24 -08:00
vporpo
7a7f9190d0
[SandboxVec][Legality] Fix mask on diamond reuse with shuffle (#126963)
This patch fixes a bug in the creation of shuffle masks when vectorizing
vectors in case of a diamond reuse with shuffle. The mask needs to
enumerate all elements of a vector, not treat the original vector value
as a single element. That is: if vectorizing two <2 x float> vectors
into a <4 x float> the mask needs to have 4 indices, not just 2.
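
The difference between a per-value mask and a per-lane mask can be sketched as follows. `expandMask` is a hypothetical helper written for illustration, not the function changed by the patch. It takes a mask whose indices name whole vector values and expands it to one index per scalar lane.

```cpp
#include <cassert>
#include <vector>

// Expands a mask over whole vector values into a mask over scalar lanes.
// ValueMask[i] names which input value lands in slot i; each value carries
// ElmsPerVec lanes, so every entry expands to ElmsPerVec consecutive indices.
std::vector<int> expandMask(const std::vector<int> &ValueMask,
                            unsigned ElmsPerVec) {
  assert(ElmsPerVec > 0 && "a vector value needs at least one lane");
  std::vector<int> LaneMask;
  LaneMask.reserve(ValueMask.size() * ElmsPerVec);
  for (int ValueIdx : ValueMask) {
    assert(ValueIdx >= 0 && "this sketch does not model undefined mask entries");
    for (unsigned E = 0; E < ElmsPerVec; ++E)
      LaneMask.push_back(ValueIdx * static_cast<int>(ElmsPerVec) +
                         static_cast<int>(E));
  }
  return LaneMask;
}
```

For two `<2 x float>` values, the value-level mask `{0, 1}` expands to `{0, 1, 2, 3}`, and the swapped mask `{1, 0}` expands to `{2, 3, 0, 1}`. Either way the result has 4 indices, matching the `<4 x float>` destination.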
2025-02-12 12:29:09 -08:00
vporpo
6d7a84d72b
[SandboxVec][Scheduler] Fix top of schedule (#126820)
This patch fixes the way the top-of-schedule variable gets set and
updated. Before this patch it used to get updated whenever we scheduled
a bundle, which is wrong, as the top-of-schedule needs to be maintained
across scheduling attempts.

It should get reset only when we clear the schedule or when we destroy
the current schedule and re-schedule.
2025-02-12 11:52:01 -08:00
Harald van Dijk
23209eb1d9 Revert "[DebugInfo] Update DIBuilder insertion to take InsertPosition (#126059)"
This reverts commit 3ec9f7494b31f2fe51d5ed0e07adcf4b7199def6.
2025-02-12 17:50:39 +00:00
Harald van Dijk
3ec9f7494b
[DebugInfo] Update DIBuilder insertion to take InsertPosition (#126059)
After #124287 updated several functions to return iterators rather than
Instruction *, it was no longer straightforward to pass their result to
DIBuilder. This commit updates DIBuilder methods to accept an
InsertPosition instead, so that they can be called with an iterator
(preferred), an Instruction * (with a deprecation warning), or a
BasicBlock *. This commit also updates the existing calls to the
DIBuilder methods to pass in iterators.
2025-02-12 17:38:59 +00:00
Alexey Bataev
bb3d789dfe [SLP][NFC]Improve dump of the ScheduleData, NFC 2025-02-12 06:51:30 -08:00
Alexey Bataev
e1935a2b15 Revert "[SLP][NFC]Improve dump of the ScheduleData, NFC"
This reverts commit 108e6bca693e5f44d2d17da5a6e06203a0290de7 to fix
error revealed by buildbots https://lab.llvm.org/buildbot/#/builders/159/builds/15888.
2025-02-12 06:34:27 -08:00
Alexey Bataev
108e6bca69 [SLP][NFC]Improve dump of the ScheduleData, NFC 2025-02-12 06:25:04 -08:00
David Sherwood
3e62321ed9
[LoopVectorize] Make collectInLoopReductions more efficient (#126769)
We call collectInLoopReductions in multiple places asking
the same question with exactly the same answer. For
example, this was being called from a loop in
calculateRegisterUsage and this patch hoists the call out
to above the loop. In addition I've changed
collectInLoopReductions so that it bails out if we've
already built up a list.
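
The two changes described above, hoisting the call out of the loop and bailing out when the list already exists, follow a common pattern. The sketch below is a stand-in, not the LoopVectorize code: `ReductionCache`, `sumInLoop`, and the "even ids are in-loop" rule are all invented for illustration.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cost model that records which reductions are in-loop.
class ReductionCache {
public:
  explicit ReductionCache(std::vector<int> Ids) : Reductions(std::move(Ids)) {}

  void collectInLoopReductions() {
    if (!InLoop.empty())
      return; // The list was already built; the answer cannot change.
    ++NumScans;
    for (int Id : Reductions)
      if (Id % 2 == 0) // Placeholder rule: even ids count as in-loop.
        InLoop.insert(Id);
  }

  const std::set<int> &inLoop() const { return InLoop; }
  unsigned numScans() const { return NumScans; }

private:
  std::vector<int> Reductions;
  std::set<int> InLoop;
  unsigned NumScans = 0;
};

// The query is issued once above the loop instead of once per iteration.
unsigned sumInLoop(ReductionCache &C, unsigned NumIterations) {
  C.collectInLoopReductions();
  unsigned Total = 0;
  for (unsigned I = 0; I < NumIterations; ++I)
    Total += static_cast<unsigned>(C.inLoop().size());
  return Total;
}
```

Calling `collectInLoopReductions()` again after `sumInLoop` leaves `numScans()` at 1, since the early return skips the rescan.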
2025-02-12 14:05:34 +00:00
Jie Fu
a0fbc19ad6 [MemorySanitizer] Silence an unused-variable warning (NFC)
/llvm-project/llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:2622:22:
 error: unused variable 'ReturnType' [-Werror,-Wunused-variable]
    FixedVectorType *ReturnType = cast<FixedVectorType>(I.getType());
                     ^
1 error generated.
2025-02-12 11:32:51 +08:00
Thurston Dang
bfbe5319a8
[msan] Add handlePairwiseShadowOrIntrinsic and use it to handle Arm NEON pairwise add (#126008)
This patch adds a function, handlePairwiseShadowOrIntrinsic that ORs
pairs of adjacent shadow values; this is suitable for propagating shadow
for 1- or 2-vector intrinsics that combine adjacent fields. It then
applies handlePairwiseShadowOrIntrinsic to Arm NEON pairwise add:
llvm.aarch64.neon.{addhn, raddhn} (currently incorrectly handled) and
llvm.aarch64.neon.{saddlp, uaddlp} (currently suboptimally handled).

Updates the tests from https://github.com/llvm/llvm-project/pull/125820.
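
The pairwise OR can be modelled on plain arrays. This is a sketch under simplifying assumptions: `pairwiseShadowOr` is a hypothetical name, each field has an 8-bit shadow, and output shadows keep the input width (the real intrinsics may widen or narrow elements).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// ORs the shadows of adjacent fields. With two inputs, the fields of B follow
// the fields of A, as in a two-vector pairwise operation.
std::vector<uint8_t> pairwiseShadowOr(const std::vector<uint8_t> &A,
                                      const std::vector<uint8_t> &B = {}) {
  std::vector<uint8_t> All(A);
  All.insert(All.end(), B.begin(), B.end());
  assert(All.size() % 2 == 0 && "pairwise combination needs an even field count");

  std::vector<uint8_t> Out;
  Out.reserve(All.size() / 2);
  for (std::size_t I = 0; I < All.size(); I += 2)
    Out.push_back(static_cast<uint8_t>(All[I] | All[I + 1]));
  return Out;
}
```

A one-vector call with shadows `{0x00, 0x0F, 0x00, 0x00}` yields `{0x0F, 0x00}`: the first output depends on a partly uninitialized input, the second does not.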
2025-02-11 19:13:18 -08:00
Alexey Bataev
10844fb9b0 [SLP]Fix attempt to build the reorder mask for non-adjusted reuse mask
When building the reorder for a non-single-use reuse mask, we need to check
whether the size of the mask is a multiple of the number of unique scalars.
Otherwise, the compiler may crash when trying to reorder nodes.

Fixes #126304
2025-02-11 13:41:25 -08:00
Alireza Torabian
3c74430320
[DependenceAnalysis][NFC] Removing PossiblyLoopIndependent parameter (#124615)
Parameter PossiblyLoopIndependent has lost its intended purpose. This
flag is always set to true in all cases when depends() is called, hence
we want to reconsider the utility of this variable and remove it from
the function signature entirely. This is an NFC patch.
2025-02-11 16:23:28 -05:00
Kazu Hirata
042e860a8a
[Vectorize] Avoid repeated hash lookups (NFC) (#126681) 2025-02-11 09:09:43 -08:00
Florian Hahn
e258bca950
[VPlan] Only skip expansion for SCEVUnknown if it isn't an instruction. (#125235)
Update getOrCreateVPValueForSCEVExpr to only skip expansion of
SCEVUnknown if the underlying value isn't an instruction. Instructions
may be defined in a loop and using them without expansion may break
LCSSA form. SCEVExpander will take care of preserving LCSSA if needed.

We could also try to pass LoopInfo, but there are some users of the
function where it won't be available, and the main benefit from skipping
expansion is slightly more concise VPlans.

Note that SCEVExpander is now used to expand SCEVUnknown with floats.
Adjust the check in expandCodeFor to only check the types and casts if
the type of the value is different to the requested type. Otherwise we
crash when trying to expand a float and requesting a float type.

Fixes https://github.com/llvm/llvm-project/issues/121518.

PR: https://github.com/llvm/llvm-project/pull/125235
2025-02-11 13:03:12 +01:00
Florian Hahn
3706dfef66
[LV] Forget LCSSA phi with new pred before other SCEV invalidation. (#119897)
`forgetLcssaPhiWithNewPredecessor` performs additional invalidation if
there is an existing SCEV for the phi, but earlier
`forgetBlockAndLoopDispositions` or `forgetLoop` may already invalidate
the SCEV for the phi.

Change the order to first call `forgetLcssaPhiWithNewPredecessor` to
ensure it runs before its SCEV gets invalidated too eagerly.

Fixes https://github.com/llvm/llvm-project/issues/119665.

PR: https://github.com/llvm/llvm-project/pull/119897
2025-02-10 16:29:42 +00:00
Kazu Hirata
2f88672414
[Coroutines] Avoid repeated hash lookups (NFC) (#126466) 2025-02-10 07:50:32 -08:00
Nikita Popov
2d31a12dbe
[DSE] Don't use initializes on byval argument (#126259)
There are two ways we can fix this problem, depending on how the
semantics of byval and initializes should interact:

* Don't infer initializes on byval arguments. initializes on byval
refers to the original caller memory (or having both attributes is made
a verifier error).
* Infer initializes on byval, but don't use it in DSE. initializes on
byval refers to the callee copy. This matches the semantics of readonly
on byval. This is slightly more powerful, for example, we could do a
backend optimization where byval + initializes will allocate the full
size of byval on the stack but not copy over the parts covered by
initializes.

I went with the second variant here, skipping byval + initializes in DSE
(FunctionAttrs already doesn't propagate initializes past byval). I'm
open to going in the other direction though.

Fixes https://github.com/llvm/llvm-project/issues/126181.
2025-02-10 10:34:03 +01:00
Ricardo Jesus
5f84b6edd9
[AArch64] Add MATCH loops to LoopIdiomVectorizePass (#101976)
This patch adds a new loop to LoopIdiomVectorizePass, enabling it to
recognise and vectorise loops such as:
```cpp
template<class InputIt, class ForwardIt>
InputIt find_first_of(InputIt first, InputIt last,
                      ForwardIt s_first, ForwardIt s_last)
{
  for (; first != last; ++first)
    for (ForwardIt it = s_first; it != s_last; ++it)
      if (*first == *it)
        return first;
  return last;
}
```

These loops match the C++ standard library function `std::find_first_of`.
2025-02-10 08:23:34 +00:00
Elvis Wang
2e3729bf40
[LV] Prevent query the computeCost() when VF=1 in emitInvalidCostRemarks(). (#117288)
We should only query the computeCost() when the VF is vector.
2025-02-10 08:40:28 +08:00
Kazu Hirata
df25511f0e
[Coroutines] Avoid repeated hash lookups (NFC) (#126432) 2025-02-09 13:35:12 -08:00
Hassnaa Hamdi
e9a20f77ee
Reland "[LV]: Teach LV to recursively (de)interleave." (#125094)
This patch relands the changes from "[LV]: Teach LV to recursively
(de)interleave. #122989"

Reason for revert:
- The patch exposed an assert in the vectorizer related to a VF difference
between the legacy cost model and the VPlan-based cost model, because of an
uncalculated cost for a VPInstruction which is created by VPlanTransforms
as a replacement for an 'or disjoint' instruction. VPlanTransforms makes
that instruction change when there are memory interleaving and predicated
blocks, but that change didn't cause problems because in most cases the
cost difference between the legacy and new models is not noticeable.
- The issue is fixed by #125434

Original patch: https://github.com/llvm/llvm-project/pull/89018
Reviewed-by: paulwalker-arm, Mel-Chen
2025-02-09 19:21:54 +00:00
Florian Hahn
32c4493d5f
[VPlan] Add incoming values for all predecessor to ResumePHI (NFCI).
Follow-up as discussed when using VPInstruction::ResumePhi for all resume
values (#112147). This patch explicitly adds incoming values for each
predecessor in VPlan. This simplifies codegen and allows transformations
adjusting the predecessors of blocks with

NFC modulo incoming block order in phis.
2025-02-09 11:20:20 +00:00
vporpo
69b8cf4f06
[SandboxVec][BottomUpVec] Add cost estimation and tr-accept-or-revert pass (#126325)
The TransactionAcceptOrRevert pass is the final pass in the Sandbox
Vectorizer's default pass pipeline. Its job is to check the cost
before/after vectorization and accept or revert the IR to its original
state.

Since we are now starting the transaction in BottomUpVec, tests that run
a custom pipeline need to accept the transaction. This is done with the
help of the TransactionAlwaysAccept pass (tr-accept).
2025-02-08 08:34:18 -08:00
Florian Hahn
6ff8a06de9
[VPlan] Run recipe removal and simplification after optimizeForVFAndUF. (#125926)
Run recipe simplification and dead recipe removal after VPlan-based
unrolling and optimizeForVFAndUF, to clean up any redundant or dead
recipes introduced by them. Currently this is NFC, as it removes the
corresponding removeDeadRecipes run in optimizeForVFAndUF and no
additional simplifications kick in after unrolling yet. That is changing
with https://github.com/llvm/llvm-project/pull/123655.

Note that with this change, pattern-matching is now applied after
EVL-based recipes have been introduced.

Trying to match VPWidenEVLRecipe when not explicitly requested might
apply a pattern with 2 operands to one with 3 due to the extra EVL
operand and VPWidenEVLRecipe being a subclass of VPWidenRecipe.

To prevent this, update Recipe_match::match to only match
VPWidenEVLRecipe if it is in the requested recipe types (RecipeTy).

PR: https://github.com/llvm/llvm-project/pull/125926
2025-02-08 13:33:46 +00:00
Florian Hahn
ee806646ad
[VPlan] Consistently use hasScalarVFOnly (NFC).
Consistently use hasScalarVFOnly instead of using
hasVF(ElementCount::getFixed(1)). Also add an assert to ensure all cases
are covered by hasScalarVFOnly.
2025-02-08 12:19:25 +00:00
Florian Hahn
16df836a52
[VPlan] Mark hasVF & hasScalableVF as const (NFC). 2025-02-08 11:32:23 +00:00
Kazu Hirata
5901bda5a0
[Vectorize] Avoid repeated hash lookups (NFC) (#126345) 2025-02-08 00:48:51 -08:00
Kazu Hirata
80a4718200
[GVNHoist] Avoid repeated hash lookups (NFC) (#126189) 2025-02-07 07:59:53 -08:00
Florian Hahn
1611059f5d
[VPlan] Compute cost for binary op VPInstruction with underlying values. (#125434)
As exposed by https://github.com/llvm/llvm-project/pull/125094, we are
missing cost computation for some binary VPInstructions we created based
on original IR instructions. Their cost should be considered.

PR: https://github.com/llvm/llvm-project/pull/125434
2025-02-07 15:27:31 +00:00
David Sherwood
3872e55758
[LoopVectorize] Fix build error (#126218)
Fixes issue caused by 1930524bbde3cd26ff527bbdb5e1f937f484edd6

Unused variable UsesMask in LoopVectorize.cpp
2025-02-07 10:16:32 +00:00
David Sherwood
1930524bbd
[LoopVectorize] Fix cost model assert when vectorising calls (#125716)
The legacy and vplan cost models did not agree because
VPWidenCallRecipe::computeCost only calculates the cost of the
call instruction, whereas
LoopVectorizationCostModel::setVectorizedCallDecision in some
cases adds on the cost of a synthesised mask argument. However,
this mask is always 'splat(i1 true)' which should be hoisted out
of the loop during codegen. In order to synchronise the two cost
models I have two options:

1) Also add the cost of the splat to the vplan model, or
2) Remove the cost of the splat from the legacy model.

I chose 2) because I feel this more closely represents what the
final code will look like. There is an argument that we should
take account of such broadcast costs in the preheader when
deciding if it's profitable to vectorise a loop, however there
isn't currently a mechanism to do this. We currently only take
account of the runtime checks when assessing profitability and
what the minimum trip count should be. However, I don't believe
this work needs doing as part of this PR.
2025-02-07 09:36:52 +00:00
James Chesterman
ac158aa13b
[LoopVectorizer] Allow partial reductions to be made in predicated loops (#124268)
Does a select on the input rather than the output. This way the mask has
the same number of lanes as the other operand in the select instruction.
2025-02-07 09:09:10 +00:00
Kazu Hirata
b7feccb31d
[memprof] Dump call site matching information (#125130)
MemProfiler.cpp annotates the IR with the memory profile so that we
can later duplicate context. This patch dumps the entire inline call
stack for each call site match.
2025-02-06 23:37:10 -08:00
Mel Chen
4d3148d926
[LV][EVL] Fix the check for legality of folding with EVL. (#125678)
The current legality check for folding with EVL has incomplete
verification for VF.
This patch fixes the VF check, ensuring that tail folding with EVL is
enabled only when a scalable VF is available. This allows loops that
prefer tail folding with EVL but cannot use scalable VF vectorization to
still be vectorized using a fixed VF, rather than abandoning
vectorization entirely.
2025-02-07 12:53:10 +08:00
Yingwei Zheng
9cd83d6ea2
[InstCombine] Drop samesign in foldLogOpOfMaskedICmps (#125829)
Alive2: https://alive2.llvm.org/ce/z/6zLAYp

Note: We can also apply this fix to the logic below (`if (Mask &
AMask_NotAllOnes)`), but it seems unreachable.
2025-02-07 11:56:52 +08:00
Joseph Huber
d9500f5032 [OpenMP] Fix the OpenMPOpt pass incorrectly optimizing if definition was missing
Summary:
This code is intended to block transformations if the call isn't
present, however the way it's coded it silently lets it pass if the
definition doesn't exist at all. This previously was always valid since
we included the runtime as one giant blob so everything was always
there, but now that we want to move towards separate ones, it's not
quite correct.
2025-02-06 21:38:36 -06:00
Luke Lau
d0f122b9c5
[LV] Update incoming blocks in VPWidenPHIRecipe in reassociateBlocks (#125481)
This is extracted from #118638

After c7ebe4f we will crash in fixNonInductionPHIs if we use a
VPWidenPHIRecipe with the vector preheader as an incoming block, because
the phi will reference the old non-IRBB vector preheader.

This patch fixes the crash by updating VPBlockUtils::reassociateBlocks to
update any VPWidenPHIRecipe's incoming blocks.

This assumes that if the VPWidenPHIRecipe is in a VPRegionBlock, it's in
the entry block, and that we are replacing a VPBasicBlock with another
VPBasicBlock.
2025-02-07 08:50:35 +08:00
vporpo
a0d86b23c0
[SandboxVec][Scheduler] Notify scheduler about instruction creation (#126141)
This patch implements the vectorizer's callback for getting notified
about new instructions being created. This updates the scheduler state,
which may involve removing dependent instructions from the ready list
and updating the "scheduled" flag.
Since we need to remove elements from the ready list, this patch also
implements the `remove()` operation.
2025-02-06 15:45:44 -08:00
vporpo
166b2e8837
[SandboxVec][DAG] Update DAG when a new instruction is created (#126124)
The DAG will now receive a callback whenever a new instruction is
created and will update itself accordingly.
2025-02-06 14:12:03 -08:00
Teresa Johnson
1dbfbb5ce6
[MemProf] Stop cloning traversal on single allocation type (#126131)
We were previously checking this after recursing on all callers, but if
we already have a single allocation type there is no need to even look
at any callers. This didn't show a significant improvement overall, but it
does reduce the number of times we enter identifyClones and do other
checks.
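
The reordering can be pictured with a toy caller graph. Everything in this sketch is a simplified assumption: the `Node` struct, the bit values chosen for the allocation types, and the visit counter are illustrative only, and the function merely borrows the name `identifyClones` rather than reproducing it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed bitmask encoding of allocation types, for illustration only.
constexpr uint8_t kNotCold = 1;
constexpr uint8_t kCold = 2;

struct Node {
  uint8_t AllocTypes;               // OR of the allocation types reaching this node
  std::vector<const Node *> Callers;
};

// True when exactly one allocation-type bit is set.
bool hasSingleAllocType(uint8_t AllocTypes) {
  return AllocTypes != 0 && (AllocTypes & (AllocTypes - 1)) == 0;
}

// Toy traversal: checks the single-type condition before recursing on callers,
// so an unambiguous node never visits its callers at all.
void identifyClones(const Node &N, unsigned &Visits) {
  ++Visits;
  if (hasSingleAllocType(N.AllocTypes))
    return; // Nothing to disambiguate, so no cloning is needed.
  for (const Node *Caller : N.Callers) {
    assert(Caller && "caller edges must be non-null");
    identifyClones(*Caller, Visits);
  }
}
```

A node with only the cold bit set is visited once regardless of how many callers it has. A node with both bits set still walks its callers.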
2025-02-06 13:21:02 -08:00
Florian Hahn
049aa179dc
[VPlan] Simplify operand tuple matching in VPlanPatternMatch (NFC).
Remove some indirection when matching recipe and matcher operands by
directly using fold over parameter pack.
2025-02-06 21:00:44 +00:00
David Pagan
a5fc7c3ac1
[clang][OpenMP] New OpenMP 6.0 assumption clause, 'no_openmp_constructs' (#125933)
Add initial parsing/sema support for new assumption clause so clause can
be specified. For now, it's ignored, just like the others.

Added support for 'no_openmp_constructs' to release notes.

Testing
- Updated appropriate LIT tests.
- Testing: check-all
2025-02-06 12:41:10 -08:00