33436 Commits

Author SHA1 Message Date
Florian Hahn
4fc190351e [VPlan] Remove unneeded NeedsVectorIV from VPWidenIntOrFpInduction.
After recent improvements, all instances of
VPWidenIntOrFpInductionRecipe should need a vector IV and there's no
need for a separate field.
2023-04-17 13:38:00 +01:00
Bjorn Pettersson
3e38187662 Revert "[Passes] Remove legacy PM versions of InstructionNamer and MetaRenamer"
This reverts commit 981ec1faeb508a364cc47c8246b72fc89dd8c1d8.

It broke polly build bots. Polly still uses -instnamer with legacy PM.
2023-04-17 14:24:50 +02:00
Nikita Popov
6f7e5c0f1a Reapply [SimplifyCFG][LICM] Preserve nonnull, range and align metadata when speculating
This exposed a miscompile in GVN, which was fixed by D148129.

-----

After D141386, violation of nonnull, range and align metadata
results in poison rather than immediate undefined behavior,
which means that these are now safe to retain when speculating.
We only need to remove UB-implying metadata like noundef.

This is done by adding a dropUBImplyingAttrsAndMetadata() helper,
which lists the metadata which is known safe to retain on speculation.

Differential Revision: https://reviews.llvm.org/D146629
2023-04-17 14:15:14 +02:00
Bjorn Pettersson
981ec1faeb [Passes] Remove legacy PM versions of InstructionNamer and MetaRenamer 2023-04-17 13:54:20 +02:00
Bjorn Pettersson
21a6890856 [Vectorize] Clean up Transforms/Vectorize.h
Removed definitions of vectorizeBasicBlock and VectorizeConfig
(possibly a remnant from the BBVectorize pass that was removed
way back in 2017).

Also reduced amount of include dependencies to Transforms/Vectorize.h.
2023-04-17 13:54:19 +02:00
Bjorn Pettersson
a20f7efbc5 Remove several no longer needed includes. NFCI
Mostly removing includes of InitializePasses.h and Pass.h in
passes that no longer have support for the legacy PM.
2023-04-17 13:54:19 +02:00
Florian Hahn
02369b75fd [VPlan] Mark recurrence recipes as not having side-effects.
Add support for FirstOrderRecurrenceSplice and VPFirstOrderRecurrencePHI
recipes to mayHaveSideEffects. Neither has side-effects.
2023-04-17 12:30:52 +01:00
Nikita Popov
8cdca96690 [GVN] Adjust metadata for coerced load CSE
When reusing a load in a way that requires coercion (i.e. casts or
bit extraction) we currently fail to adjust metadata. Unfortunately,
none of our existing tooling for this is really suitable, because
combineMetadataForCSE() expects both loads to have the same type.
In this case we may work on loads of different types and possibly
offset memory location.

As such, what this patch does is to simply drop all metadata, with
the following exceptions:

* Metadata for which violation is known to always cause UB.
* If the load is !noundef, keep all metadata, as this will turn
  poison-generating metadata into UB as well.

This fixes the miscompile that was exposed by D146629.

Differential Revision: https://reviews.llvm.org/D148129
2023-04-17 12:52:31 +02:00
David Sherwood
69ee653313 [LoopVectorize] Take vscale into account when deciding to create epilogues
In LoopVectorizationCostModel::isEpilogueVectorizationProfitable we
check to see if the chosen main vector loop VF >= 16. If so, we
decide to create a vector epilogue loop. However, this doesn't
take VScaleForTuning into account: we could be targeting a
CPU where vscale > 1, in which case the runtime VF would be a
multiple of the known minimum value.

This patch multiplies scalable VFs by VScaleForTuning. Several
tests that now produce vector epilogues have been updated.
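
A minimal sketch of the adjusted check, written in Python for illustration
only. The function names and the `scalable` flag are hypothetical; only the
threshold of 16 and the multiplication by VScaleForTuning come from the
description above, and the real profitability check may consider more than
this.

```python
# Threshold taken from the "VF >= 16" condition described in the message.
EPILOGUE_VF_THRESHOLD = 16


def effective_vf(min_vf: int, scalable: bool, vscale_for_tuning: int = 1) -> int:
    """Estimated runtime VF: scalable VFs are scaled by the tuning vscale."""
    return min_vf * vscale_for_tuning if scalable else min_vf


def wants_vector_epilogue(min_vf: int, scalable: bool, vscale_for_tuning: int = 1) -> bool:
    """Hypothetical stand-in for the VF part of the profitability check."""
    return effective_vf(min_vf, scalable, vscale_for_tuning) >= EPILOGUE_VF_THRESHOLD
```

With these assumptions, a scalable VF with known minimum 8 passes the check
once the tuning vscale is 2, while a fixed VF of 8 does not.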

Differential Revision: https://reviews.llvm.org/D147522
2023-04-17 10:49:40 +00:00
Florian Hahn
83ab5708d1 [LV] Don't sink scalar instructions that may read from memory.
The current sinking code doesn't prevent us from sinking a load past an
aliasing store. Skip sinking instructions that may read from memory to
avoid a miscompile.

See @minimal_bit_widths_with_aliasing_store for an example where 2 loads
are sunk past aliasing stores before this fix.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147259
2023-04-17 09:30:25 +01:00
Zain Jaffal
721ecc9d41 [ConstraintElimination] Transfer info from sgt %a, %b to ugt %a, %b if %b > 0
Differential Revision: https://reviews.llvm.org/D148326
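
The implication in the title can be checked exhaustively for a small bit
width. The sketch below is a Python illustration, not the pass's code; the
helper names are made up, and 8 bits is an arbitrary choice to keep the
search small.

```python
BITS = 8


def to_signed(v: int) -> int:
    """Reinterpret an unsigned BITS-wide value as two's-complement signed."""
    return v - (1 << BITS) if v >= 1 << (BITS - 1) else v


def sgt_implies_ugt_when_b_positive() -> bool:
    """Check: for all a, b with b s> 0, (a s> b) implies (a u> b)."""
    for a in range(1 << BITS):
        for b in range(1 << BITS):
            if to_signed(b) > 0 and to_signed(a) > to_signed(b) and not a > b:
                return False
    return True


def counterexample_without_precondition() -> tuple:
    """a = 1, b = -1: a s> b holds, but a u> b does not (b is 255 unsigned)."""
    a, b = 1, (1 << BITS) - 1
    return (to_signed(a) > to_signed(b), a > b)
```

The second helper shows why the sign condition on %b matters: without it,
a negative %b is a large unsigned value and the unsigned comparison flips.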
2023-04-17 09:27:33 +01:00
Kazu Hirata
7b014a0732 [Scalar] Use range-based for loops (NFC) 2023-04-16 09:05:20 -07:00
Kazu Hirata
c83c4b58d1 [Transforms] Apply fixes from performance-for-range-copy (NFC) 2023-04-16 08:25:28 -07:00
Florian Hahn
668045eb77 [VPlan] Unify Value2VPValue and VPExternalDefs maps (NFCI).
Before this patch, a VPlan contained 2 mappings for Values -> VPValue:
1) Value2VPValue and 2) VPExternalDefs.

This duplication is unnecessary and there are already cases where
external defs are added to Value2VPValue. This patch replaces all uses
of VPExternalDefs with Value2VPValue.

It clarifies the naming of getOrAddVPValue (to getOrAddExternalVPValue)
and addVPValue (to addExternalVPValue).

At the moment, this is NFC, but will enable additional simplifications
in D147783.

Depends on D147891.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147892
2023-04-16 15:38:31 +01:00
DianQK
2832d7941f [SROA] Remove UB-implying metadata when promoting speculative instruction.
After D138238 introduced the then/else blocks, we should remove UB-implying metadata for the promoted speculative instruction.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D148456
2023-04-16 22:35:52 +08:00
Florian Hahn
2db031528e [VPlan] Check VPValue step in isCanonical (NFCI).
Update the isCanonical() implementations to check the VPValue step
operand instead of the step in the induction descriptor.

At the moment this is NFC, but it enables further optimizations if the
step is replaced by a constant in D147783.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147891
2023-04-16 14:48:03 +01:00
Kazu Hirata
4bac5f8344 Apply fixes from performance-faster-string-find (NFC) 2023-04-16 00:51:27 -07:00
Kazu Hirata
1ca496bd61 Remove redundant initialization of std::optional (NFC) 2023-04-16 00:40:05 -07:00
Kazu Hirata
804467de94 Use isNegative (NFC) 2023-04-15 14:26:24 -07:00
Kazu Hirata
d775fc390d [InstCombine] Generate better code for std::bit_floor from libstdc++
Without this patch, std::bit_floor<uint32_t> in libstdc++ is compiled
as:

  %eq0 = icmp eq i32 %x, 0
  %lshr = lshr i32 %x, 1
  %ctlz = tail call i32 @llvm.ctlz.i32(i32 %lshr, i1 false)
  %sub = sub i32 32, %ctlz
  %shl = shl i32 1, %sub
  %sel = select i1 %eq0, i32 0, i32 %shl

With this patch:

  %eq0 = icmp eq i32 %x, 0
  %ctlz = call i32 @llvm.ctlz.i32(i32 %x, i1 false)
  %lshr = lshr i32 -2147483648, %ctlz
  %sel = select i1 %eq0, i32 0, i32 %lshr

This patch recognizes the specific pattern emitted for std::bit_floor
in libstdc++.
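
As an informal sanity check (the Alive2 link below is the actual proof), the
two IR sequences can be modelled in Python on 32-bit values. The function
names are illustrative; `ctlz32` models `llvm.ctlz.i32` with a defined result
of 32 for zero, and the select is modelled as an early return.

```python
MASK32 = 0xFFFFFFFF


def ctlz32(x: int) -> int:
    """Count leading zeros of a 32-bit value; returns 32 for x == 0."""
    return 32 - x.bit_length()


def bit_floor_before(x: int) -> int:
    """Model of the sequence emitted without the patch."""
    if x == 0:                       # select on %eq0
        return 0
    sub = 32 - ctlz32(x >> 1)        # %sub = sub i32 32, %ctlz
    return (1 << sub) & MASK32       # %shl = shl i32 1, %sub


def bit_floor_after(x: int) -> int:
    """Model of the sequence emitted with the patch."""
    if x == 0:                       # select on %eq0
        return 0
    return 0x80000000 >> ctlz32(x)   # lshr i32 -2147483648, ctlz(%x)
```

Both models return the largest power of two not exceeding `x`, for example 4
for an input of 5.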

https://alive2.llvm.org/ce/z/piMdFX

This patch fixes:

https://github.com/llvm/llvm-project/issues/61183

Differential Revision: https://reviews.llvm.org/D145890
2023-04-15 11:32:33 -07:00
Vasileios Porpodas
7e67a9473d [SLP][NFC] Remove Limit from tryToVectorizeSequence() arguments.
Limit turns out to be implemented in the exact same way for all calls to
tryToVectorizeSequence(). So this patch removes it and implements it internally
as a lambda function.

Differential Revision: https://reviews.llvm.org/D148382
2023-04-14 14:58:57 -07:00
Noah Goldstein
82f0827613 [InstCombine] Make FoldOpIntoSelect handle non-constants and use condition to deduce constants.
Make the fold use the information present in the condition for deducing constants, e.g.:
```
%c = icmp eq i8 %x, 10
%s = select i1 %c, i8 3, i8 2
%r = mul i8 %x, %s
```

If we fold the `mul` into the select, on the true side we insert `10` for `%x` in the `mul`.
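
A small Python model of the example, assuming i8 wraparound arithmetic. The
function names are illustrative, and `folded` shows one plausible result of
the fold (true arm `10 * 3`, false arm `%x * 2`), not necessarily the exact
IR that InstCombine emits.

```python
I8_MASK = 0xFF


def original(x: int) -> int:
    """%r = mul i8 %x, (select (icmp eq %x, 10), 3, 2)"""
    s = 3 if x == 10 else 2
    return (x * s) & I8_MASK


def folded(x: int) -> int:
    """mul folded into the select; %x is known to be 10 on the true side."""
    if x == 10:
        return (10 * 3) & I8_MASK    # constant 30 on the true arm
    return (x * 2) & I8_MASK         # mul i8 %x, 2 on the false arm


def forms_agree() -> bool:
    """Compare both forms on every i8 input."""
    return all(original(x) == folded(x) for x in range(256))
```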

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D146349
2023-04-14 13:14:32 -05:00
Bjorn Pettersson
0b911a3dc3 [passes] Remove the legacy PM version of IRCE
Differential Revision: https://reviews.llvm.org/D148338
2023-04-14 18:56:20 +02:00
Bjorn Pettersson
b74e89c0d4 [passes] Remove the legacy PM version of AlignmentFromAssumptions
Differential Revision: https://reviews.llvm.org/D148337
2023-04-14 18:56:20 +02:00
Bjorn Pettersson
40c60c025c [Passes] Remove the legacy DemandedBitsWrapperPass
Last user of DemandedBitsWrapperPass was the BDCE pass. Since
the legacy PM version of BDCE was removed in an earlier commit, this
patch removes the now unused DemandedBitsWrapperPass.

Differential Revision: https://reviews.llvm.org/D148336
2023-04-14 18:56:20 +02:00
Bjorn Pettersson
fb93f98ffa [Passes] Remove legacy PM version of BDCE (aka BitTrackingDCEPass)
BDCE is not used by the codegen pipeline so we should not need the
legacy PM version of the pass any longer.

Differential Revision: https://reviews.llvm.org/D148335
2023-04-14 18:56:20 +02:00
Joseph Huber
46ee1021d9 [OpenMP] Replace HeapToShared's initial value with poison
There's a desire to move away from `undef` in LLVM. Currently we want to
have the `addressspace(3)` variables use `poison` instead.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D147719
2023-04-14 09:39:32 -05:00
Florian Hahn
98e50881e9 [Matrix] Refine cost estimate for dot-product.
Adjust lowerDotProduct cost estimate to include the cost benefits of:
 * emitting a wide load
 * emitting a wide multiply.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D147330
2023-04-14 11:35:01 +01:00
Nikita Popov
62ef97e063 [llvm-c] Remove PassRegistry and initialization APIs
Remove C APIs for interacting with PassRegistry and pass
initialization. These are legacy PM concepts, and are no longer
relevant for the new pass manager.

Calls to these initialization functions can simply be dropped.

Differential Revision: https://reviews.llvm.org/D145043
2023-04-14 12:12:48 +02:00
Nikita Popov
9fe78db4cd [FunctionAttrs] Fix nounwind inference for landingpads
Currently, FunctionAttrs treats landingpads as non-throwing, and
will infer nounwind for functions with landingpads (assuming they
can't unwind in some other way, e.g. via resume). There are two
problems with this:

* Non-cleanup landingpads with catch/filter clauses do not
  necessarily catch all exceptions. Unless there are catch ptr null
  or filter [0 x ptr] zeroinitializer clauses, we should assume
  that we may unwind past this landingpad. This seems like an
  outright bug.
* Cleanup landingpads are skipped during phase one unwinding, so
  we effectively need to support unwinding past them. Marking these
  nounwind is technically correct, but not compatible with how
  unwinding works in reality.

Fixes https://github.com/llvm/llvm-project/issues/61945.

Differential Revision: https://reviews.llvm.org/D147694
2023-04-14 11:46:00 +02:00
Nikita Popov
a759745169 [InstCombine] Support multiple comparisons in foldAllocaCmp()
foldAllocaCmp() needs to fold all comparisons of an alloca at the
same time, to ensure that there is a consistent view of the alloca
address. Currently, it folds "all" comparisons by limiting to the
case where there is only one. This patch switches the algorithm to
instead actually collect and fold all comparisons.

Something we need to be careful about here is that there may be
comparisons where both sides of the icmp are based on the alloca.
Such comparisons are comparing offsets of the alloca, and as such
can be ignored here, but shouldn't be folded to false.

Differential Revision: https://reviews.llvm.org/D144492
2023-04-14 11:32:58 +02:00
Nikita Popov
e4251fc6bb [LangRef][Local] dereferenceable metadata violation is UB
I believe !dereferenceable violation is immediate undefined behavior,
but this was not explicitly spelled out in LangRef. We already
assume that !dereferenceable is implicitly !noundef and cannot
return poison in isGuaranteedNotToBeUndefOrPoison().

The reason why we made dereferenceable implicitly noundef is that
the purpose of this metadata is to allow speculation, and that
would not be legal on a potential poison pointer.

Differential Revision: https://reviews.llvm.org/D148202
2023-04-14 10:54:01 +02:00
Nikita Popov
c508e93327 [InstSimplify] Remove unused ORE argument (NFC) 2023-04-14 10:38:32 +02:00
Nikita Popov
243e62b9d8 [Coroutines] Directly remove unnecessary lifetime intrinsics
The insertSpills() code will currently skip lifetime intrinsic users
when replacing the alloca with a frame reference. Rather than
leaving behind the dead lifetime intrinsics working on the old
alloca, directly remove them. This makes sure the alloca can be
dropped as well.

I noticed this as a regression when converting tests to opaque
pointers. Without opaque pointers, this code didn't really do
anything, because there would usually be a bitcast in between.
The lifetimes would get rewritten to the frame pointer. With
opaque pointers, this code now triggers and leaves behind users
of the old allocas.

Differential Revision: https://reviews.llvm.org/D148240
2023-04-14 10:22:30 +02:00
Max Kazantsev
a39b807d41 [IRCE][NFC] Refactor parseRangeCheckICmp to compute SCEVs instead of Values
The motivation is to make it possible to compute and return
expressions after parsing ICmp into a range check (e.g. Length + 1).

Patch by Aleksandr Popov!

Differential Revision: https://reviews.llvm.org/D148205
2023-04-14 12:58:51 +07:00
Florian Hahn
7fc0b3049d [VPlan] Switch to checking sinking legality for recurrences in VPlan.
Building on D142885 and D142589, retire the SinkAfter map from the
recurrence handling code. It is replaced by checking whether it is
possible to sink all users of a recurrence directly in VPlan. This
results in simpler code overall and allows handling additional cases
(see the improvements in @test_crash).

Depends on D142885.
Depends on D142589.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142886
2023-04-13 22:00:52 +01:00
Craig Topper
8bba57b1f1 [LoopIdiomRecognize] Remove NUW flag from SCEV in getTripCount.
Based on the conversation in D147355.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D148170
2023-04-13 11:58:10 -07:00
Alexey Bataev
a4eff2b56c [SLP][NFC]Remove extra semicolons after function definitions, NFC 2023-04-13 11:33:25 -07:00
Florian Hahn
e6ab86a887 [Matrix] Fix IsSupported check in lowerDotProduct.
The check incorrectly checks the RHS while LHS is transformed later.
Update to check LHS, which fixes a crash in the newly added test cases.
2023-04-13 19:00:30 +01:00
Alexey Bataev
f82eb7e066 [SLP]Introduce gather cost estimation function.
Introduced BoUpSLP::ShuffleCostEstimator::gather function as an initial
implementation of the gather/buildvector cost estimation for buildvector
nodes. It will allow using the general codegen infrastructure for better
cost estimation, and it improves the cost estimation for the
gathers/buildvectors.

Improved part of D110978.

Differential Revision: https://reviews.llvm.org/D148174
2023-04-13 10:16:00 -07:00
Simon Pilgrim
b3480d5ede [SLP] Compute min/max scalar reduction costs using min/max intrinsics instead of expanded cmp+sel
By default these will expand back to cmp/sel, but some targets (X86) have optimized costs for scalar integer min/max patterns which are lower than the default expansion (pre-SSE41 is particularly weak for vector min/max support).

Differential Revision: [SLP] Compute min/max scalar reduction costs using min/max intrinsics instead of expanded cmp+sel
2023-04-13 17:00:39 +01:00
Simon Pilgrim
aa754f7e0f [IR] llvm::createMinMaxOp - create integer min/max intrinsics instead of icmp/sel
Based off D148215, when expanding a min/max reduction we should be creating min/max intrinsics directly instead of relying on instcombine to fold them back together.

This patch handles integer min/max cases. Hopefully we can add floating point support soon (at least for fastmath/nnan cases) - but we're missing some of the plumbing to pass the correct FMF to the intrinsic at the moment.

Differential Revision: https://reviews.llvm.org/D148221
2023-04-13 16:40:43 +01:00
Simon Pilgrim
9e30b87afb [TTI] getMinMaxReductionCost - add FastMathFlag argument
Similar to the getArithmeticReductionCost / getExtendedReductionCost calls (which really don't need to use std::optional<>).

This will be necessary to correctly recognize fast/nnan fmax/fmul reductions which can avoid nan handling - which will allow us to remove the fmax/fmin special case in X86TTIImpl::getMinMaxCost and use getIntrinsicInstrCost like we do for integer reductions (63c3895327839ba5b57f5b99ec9e888abf976ac6).

Differential Revision: https://reviews.llvm.org/D148149
2023-04-13 10:42:42 +01:00
Jun Zhang
e3175f7f1b [InstCombine] icmp(X | OrC, C) --> icmp(X, 0)
We can eliminate the or operation based on the predicate and the
relation between OrC and C.

sge: X | OrC s>= C --> X s>= 0 iff OrC s>= C s>= 0
sgt: X | OrC s>  C --> X s>= 0 iff OrC s>  C s>= 0
sle: X | OrC s<= C --> X s<  0 iff OrC s>  C s>= 0
slt: X | OrC s<  C --> X s<  0 iff OrC s>= C s>= 0
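
The four rules can also be checked by brute force at a small bit width. This
Python sketch is only an informal cross-check of the table above (the Alive2
links below are the real proofs); the 6-bit width and the helper name are
arbitrary choices. Python's `|` on negative integers behaves like
two's-complement OR, so signed values can be used directly.

```python
BITS = 6
LO, HI = -(1 << (BITS - 1)), 1 << (BITS - 1)


def failing_cases() -> list:
    """Return every (rule, x, orc, c) triple that violates one of the rules."""
    failures = []
    for x in range(LO, HI):
        for orc in range(LO, HI):
            for c in range(LO, HI):
                v = x | orc
                if orc >= c >= 0:
                    # sge and slt share the precondition OrC s>= C s>= 0.
                    if (v >= c) != (x >= 0):
                        failures.append(("sge", x, orc, c))
                    if (v < c) != (x < 0):
                        failures.append(("slt", x, orc, c))
                if orc > c >= 0:
                    # sgt and sle share the precondition OrC s> C s>= 0.
                    if (v > c) != (x >= 0):
                        failures.append(("sgt", x, orc, c))
                    if (v <= c) != (x < 0):
                        failures.append(("sle", x, orc, c))
    return failures
```

An empty result means no counterexample exists at this width. The intuition
is that a non-negative X can only raise `X | OrC` to at least OrC, while a
negative X makes the result negative and therefore below the non-negative C.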

Alive2 links:
sge: https://alive2.llvm.org/ce/z/W-6FHE
sgt: https://alive2.llvm.org/ce/z/TKK2yJ
sle: https://alive2.llvm.org/ce/z/vURQGM
slt: https://alive2.llvm.org/ce/z/JAsVfw

Related issue: https://github.com/llvm/llvm-project/issues/61538

Signed-off-by: Jun Zhang <jun@junz.org>

Differential Revision: https://reviews.llvm.org/D147597
2023-04-13 17:26:24 +08:00
Max Kazantsev
2124505fe4 [IRCE] Relax restrictions on IRCE's latch exit count
It seems that existing logic is too strict about latch block exit count.
It is required to be computable, however it is not used in any computations,
and effectively the only thing it is used for is to get the type of computed
exit count.

Sometimes the exit count for latch block is not known, but the loop is still
finite because of other exits, and safe bounds are still computable. In this case,
we miss an opportunity to apply IRCE.
We could instead use a more relaxed version - max symbolic exit count, which,
if it exists, is enough to say that the loop is finite, and its type should be good enough.

There is a subtlety with type: we do not support latch count type wider than range
check type. Because of that, we want to have the narrowest type available. So if it
can be computed from latch block immediately, take it. Otherwise, take whatever the whole
loop provides and hope that its type isn't too wide.

Differential Revision: https://reviews.llvm.org/D147910
Reviewed By: danilaml
2023-04-13 16:00:19 +07:00
Bjorn Pettersson
410775ecfd [Transforms][LTO] Remove some redundant includes. NFC
No need to include CallGraphSCCPass.h from the IPO/Inliner.

Also removed the include of LegacyPassManager.h in a couple of files
that do not really depend on that header file.

Differential Revision: https://reviews.llvm.org/D148083
2023-04-13 10:12:00 +02:00
Max Kazantsev
246f8d4be5 [NFC][IRCE] Remove meaningless local variable 2023-04-13 13:04:45 +07:00
Max Kazantsev
d093d34c33 [IRCE][NFC] Remove unused variable IsSigned
Patch by Aleksandr Popov!

Differential Revision: https://reviews.llvm.org/D148113
2023-04-13 12:08:46 +07:00
Yashwant Singh
aea2a14736 [LoopUnroll] Prevent LoopFullUnrollPass from performing partial/runtime unrolling
LoopFullUnrollPass was performing runtime unrolling in certain cases when
'#pragma unroll' was specified. This patch fixes that by introducing a new parameter
to tryToUnrollLoop() to differentiate between LoopUnrollPass and
LoopFullUnrollPass. Based on the discussion here
(https://discourse.llvm.org/t/loop-unroller-fails-to-unroll-loop/69834)

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D148071
2023-04-13 10:21:24 +05:30
Craig Topper
4b47d875a1 [LV] Optimize trip count SCEV.
To calculate the trip count we need to add 1 to the backedge
taken count. If we need to widen the backedge count, it's better
to do the add before the widening if we can guarantee it won't
overflow.
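
A small Python model of why the no-overflow guarantee matters. The function
names are hypothetical, the widening is modelled as a zero-extension, and the
wide type is assumed to be large enough that its own add never wraps.

```python
def trip_count_widen_then_add(btc: int, narrow_bits: int) -> int:
    """zext(BTC) + 1: the add happens in the wide type."""
    assert 0 <= btc < (1 << narrow_bits)
    return btc + 1


def trip_count_add_then_widen(btc: int, narrow_bits: int) -> int:
    """zext(BTC + 1): the add happens in the narrow type and may wrap."""
    assert 0 <= btc < (1 << narrow_bits)
    return (btc + 1) & ((1 << narrow_bits) - 1)


def orders_agree(btc: int, narrow_bits: int) -> bool:
    """True when doing the add before the widening gives the same trip count."""
    return trip_count_widen_then_add(btc, narrow_bits) == trip_count_add_then_widen(btc, narrow_bits)
```

For an 8-bit backedge-taken count the two orders agree for every value except
255, where the narrow add wraps to 0 while the wide add yields 256.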

The code here is based on similar code I found in
LoopIdiomRecognize.

This is the vectorizer version of the InstCombine patch D142783.
Looking at the IR diffs, this does look like it catches more cases
than the InstCombine patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D147355
2023-04-12 16:17:58 -07:00