This exposed a miscompile in GVN, which was fixed by D148129.
-----
After D141386, violation of nonnull, range and align metadata
results in poison rather than immediate undefined behavior,
which means that these are now safe to retain when speculating.
We only need to remove UB-implying metadata like noundef.
This is done by adding a dropUBImplyingAttrsAndMetadata() helper,
which lists the metadata which is known safe to retain on speculation.
Differential Revision: https://reviews.llvm.org/D146629
Removed definitions of vectorizeBasicBlock and VectorizeConfig
(possibly a remnant from the BBVectorize pass that was removed
way back in 2017).
Also reduced the number of include dependencies on Transforms/Vectorize.h.
When reusing a load in a way that requires coercion (i.e. casts or
bit extraction) we currently fail to adjust metadata. Unfortunately,
none of our existing tooling for this is really suitable, because
combineMetadataForCSE() expects both loads to have the same type.
In this case we may work on loads of different types and possibly
offset memory location.
As such, what this patch does is to simply drop all metadata, with
the following exceptions:
* Metadata for which violation is known to always cause UB.
* If the load is !noundef, keep all metadata, as this will turn
poison-generating metadata into UB as well.
This fixes the miscompile that was exposed by D146629.
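
The retention rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the contents of the UB_ON_VIOLATION set are made up for the example and are not the list used by the patch.

```python
# Hypothetical set of metadata kinds whose violation is immediate UB.
# The real list lives in the patch; this one is an assumption for the sketch.
UB_ON_VIOLATION = {"noundef", "dereferenceable"}


def metadata_after_coercion(kinds):
    """Return the metadata kinds that survive reuse of a load with coercion."""
    kinds = set(kinds)
    # With !noundef present, violating poison-generating metadata becomes UB
    # as well, so everything can be kept.
    if "noundef" in kinds:
        return kinds
    # Otherwise keep only metadata whose violation is always UB.
    return {k for k in kinds if k in UB_ON_VIOLATION}


print(sorted(metadata_after_coercion({"range", "nonnull"})))
print(sorted(metadata_after_coercion({"range", "noundef"})))
```

The first call drops everything because range and nonnull only generate poison. The second keeps both kinds because the load is marked noundef.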
Differential Revision: https://reviews.llvm.org/D148129
In LoopVectorizationCostModel::isEpilogueVectorizationProfitable we
check to see if the chosen main vector loop VF >= 16. If so, we
decide to create a vector epilogue loop. However, this doesn't
take VScaleForTuning into account: we could be targeting a
CPU where vscale > 1, in which case the runtime VF would be a multiple
of the known minimum value.
This patch multiplies scalable VFs by VScaleForTuning and several
tests have been updated that now produce vector epilogues.
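
A rough model of the adjusted check, assuming the threshold of 16 mentioned above and ignoring every other heuristic in the cost model. The function and parameter names are hypothetical and chosen only for this sketch.

```python
def effective_vf(min_vf, scalable, vscale_for_tuning):
    """Estimated runtime VF: scalable VFs are scaled by the tuning vscale."""
    return min_vf * vscale_for_tuning if scalable else min_vf


def epilogue_profitable(min_vf, scalable, vscale_for_tuning, threshold=16):
    """Model of the 'main loop VF >= 16' check using the estimated runtime VF."""
    return effective_vf(min_vf, scalable, vscale_for_tuning) >= threshold


# vscale x 8 with a tuning vscale of 2 is treated like a fixed VF of 16.
print(epilogue_profitable(8, scalable=True, vscale_for_tuning=2))
# The same minimum VF with a tuning vscale of 1 stays below the threshold.
print(epilogue_profitable(8, scalable=True, vscale_for_tuning=1))
```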
Differential Revision: https://reviews.llvm.org/D147522
The current sinking code doesn't prevent us from sinking a load past an
aliasing store. Skip sinking instructions that may read from memory to
avoid a mis-compile.
See @minimal_bit_widths_with_aliasing_store for an example where 2 loads
are sunk past aliasing stores before this fix.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D147259
Before this patch, a VPlan contained 2 mappings for Values -> VPValue:
1) Value2VPValue and 2) VPExternalDefs.
This duplication is unnecessary and there are already cases where
external defs are added to Value2VPValue. This patch replaces all uses
of VPExternalDefs with Value2VPValue.
It clarifies the naming of getOrAddVPValue (to getOrAddExternalVPValue)
and addVPValue (to addExternalVPValue).
At the moment, this is NFC, but will enable additional simplifications
in D147783.
Depends on D147891.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D147892
After D138238 introduced the then/else blocks, we should remove UB-implying metadata from the promoted speculative instruction.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D148456
Update the isCanonical() implementations to check the VPValue step
operand instead of the step in the induction descriptor.
At the moment this is NFC, but it enables further optimizations if the
step is replaced by a constant in D147783.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D147891
Limit turns out to be implemented in the exact same way for all calls to
tryToVectorizeSequence(). So this patch removes it and implements it internally
as a lambda function.
Differential Revision: https://reviews.llvm.org/D148382
Make the fold use the information present in the condition to deduce constants, e.g.:
```
%c = icmp eq i8 %x, 10
%s = select i1 %c, i8 3, i8 2
%r = mul i8 %x, %s
```
If we fold the `mul` into the select, on the true side we insert `10` for `%x` in the `mul`.
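
As a sanity check of the example above, the following Python sketch models the i8 arithmetic and compares the original sequence with the folded form over all 256 input values. The helper names are made up for illustration, and the sketch only models this one example, not the general fold.

```python
def mul_i8(a, b):
    """8-bit wrapping multiply, modelling `mul i8`."""
    return (a * b) & 0xFF


def original(x):
    # %c = icmp eq i8 %x, 10 ; %s = select i1 %c, i8 3, i8 2 ; %r = mul i8 %x, %s
    s = 3 if x == 10 else 2
    return mul_i8(x, s)


def folded(x):
    # mul folded into the select; on the true side %x is known to be 10.
    return mul_i8(10, 3) if x == 10 else mul_i8(x, 2)


all_equal = all(original(x) == folded(x) for x in range(256))
print(all_equal)
```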
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D146349
Last user of DemandedBitsWrapperPass was the BDCE pass. Since
the legacy PM version of BDCE was removed in an earlier commit, this
patch removes the now unused DemandedBitsWrapperPass.
Differential Revision: https://reviews.llvm.org/D148336
BDCE is not used by the codegen pipeline so we should not need the
legacy PM version of the pass any longer.
Differential Revision: https://reviews.llvm.org/D148335
There's a desire to move away from `undef` in LLVM. Currently we want to
have the `addressspace(3)` variables use `poison` instead.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D147719
Remove C APIs for interacting with PassRegistry and pass
initialization. These are legacy PM concepts, and are no longer
relevant for the new pass manager.
Calls to these initialization functions can simply be dropped.
Differential Revision: https://reviews.llvm.org/D145043
Currently, FunctionAttrs treats landingpads as non-throwing, and
will infer nounwind for functions with landingpads (assuming they
can't unwind in some other way, e.g. via resume). There are two
problems with this:
* Non-cleanup landingpads with catch/filter clauses do not
necessarily catch all exceptions. Unless there are catch ptr null
or filter [0 x ptr] zeroinitializer clauses, we should assume
that we may unwind past this landingpad. This seems like an
outright bug.
* Cleanup landingpads are skipped during phase one unwinding, so
we effectively need to support unwinding past them. Marking these
nounwind is technically correct, but not compatible with how
unwinding works in reality.
Fixes https://github.com/llvm/llvm-project/issues/61945.
Differential Revision: https://reviews.llvm.org/D147694
foldAllocaCmp() needs to fold all comparisons of an alloca at the
same time, to ensure that there is a consistent view of the alloca
address. Currently, it folds "all" comparisons by limiting to the
case where there is only one. This patch switches the algorithm to
instead actually collect and fold all comparisons.
Something we need to be careful about here is that there may be
comparisons where both sides of the icmp are based on the alloca.
Such comparisons are comparing offsets of the alloca, and as such
can be ignored here, but shouldn't be folded to false.
Differential Revision: https://reviews.llvm.org/D144492
I believe !dereferenceable violation is immediate undefined behavior,
but this was not explicitly spelled out in LangRef. We already
assume that !dereferenceable is implicitly !noundef and cannot
return poison in isGuaranteedNotToBeUndefOrPoison().
The reason why we made dereferenceable implicitly noundef is that
the purpose of this metadata is to allow speculation, and that
would not be legal on a potential poison pointer.
Differential Revision: https://reviews.llvm.org/D148202
The insertSpills() code will currently skip lifetime intrinsic users
when replacing the alloca with a frame reference. Rather than
leaving behind the dead lifetime intrinsics working on the old
alloca, directly remove them. This makes sure the alloca can be
dropped as well.
I noticed this as a regression when converting tests to opaque
pointers. Without opaque pointers, this code didn't really do
anything, because there would usually be a bitcast in between.
The lifetimes would get rewritten to the frame pointer. With
opaque pointers, this code now triggers and leaves behind users
of the old allocas.
Differential Revision: https://reviews.llvm.org/D148240
The motivation is to allow computing and returning
expressions after parsing an ICmp into a range check (e.g. Length + 1).
Patch by Aleksandr Popov!
Differential Revision: https://reviews.llvm.org/D148205
Building on D142885 and D142589, retire the SinkAfter map from the
recurrence handling code. It is replaced by checking whether it is
possible to sink all users of a recurrence directly in VPlan. This
results in simpler code overall and allows handling additional cases
(see the improvements in @test_crash).
Depends on D142885.
Depends on D142589.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D142886
Introduced BoUpSLP::ShuffleCostEstimator::gather function as an initial
implementation of the gather/buildvector cost estimation for buildvector
nodes. It will allow using the general codegen infrastructure for better
cost estimation, and it improves the cost estimation for
gathers/buildvectors.
Improved part of D110978.
Differential Revision: https://reviews.llvm.org/D148174
By default these will expand back to cmp/sel, but some targets (X86) have optimized costs for scalar integer min/max patterns which are lower than the default expansion (pre-SSE41 is particularly weak for vector min/max support).
Differential Revision: [SLP] Compute min/max scalar reduction costs using min/max intrinsics instead of expanded cmp+sel
Based off D148215, when expanding a min/max reduction we should be creating min/max intrinsics directly instead of relying on instcombine to fold them back together.
This patch handles integer min/max cases. Hopefully we can add floating point support soon (at least for fastmath/nnan cases) - but we're missing some of the plumbing to pass the correct FMF to the intrinsic at the moment.
Differential Revision: https://reviews.llvm.org/D148221
Similar to the getArithmeticReductionCost / getExtendedReductionCost calls (which really don't need to use std::optional<>).
This will be necessary to correctly recognize fast/nnan fmax/fmul reductions which can avoid nan handling - which will allow us to remove the fmax/fmin special case in X86TTIImpl::getMinMaxCost and use getIntrinsicInstrCost like we do for integer reductions (63c3895327839ba5b57f5b99ec9e888abf976ac6).
Differential Revision: https://reviews.llvm.org/D148149
It seems that existing logic is too strict about latch block exit count.
It is required to be computable, however it is not used in any computations,
and effectively the only thing it is used for is to get the type of computed
exit count.
Sometimes the exit count for latch block is not known, but the loop is still
finite because of other exits, and safe bounds are still computable. In this case,
we miss an opportunity to apply IRCE.
We could instead use a more relaxed version - max symbolic exit count, which,
if it exists, is enough to say that the loop is finite, and its type should be good enough.
There is a subtlety with type: we do not support latch count type wider than range
check type. Because of that, we want to have the narrowest type available. So if it
can be computed from the latch block directly, take it. Otherwise, take whatever the whole
loop provides and hope that its type isn't too wide.
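
The selection order described above could be modelled roughly as follows. This is a sketch under assumptions: exit counts are represented as hypothetical (expression, bit width) pairs, None stands for "not computable", and the function name is invented for the example.

```python
def pick_exit_count(latch_count, loop_max_count, range_check_bits):
    """Pick the exit count to use, or None if the transform must bail out.

    Each count is a (expression, bit_width) pair or None (hypothetical model).
    """
    # Prefer the latch exit count: it tends to have the narrowest type.
    chosen = latch_count if latch_count is not None else loop_max_count
    if chosen is None:
        return None  # Loop is not known to be finite.
    _, bits = chosen
    # A count type wider than the range check type is not supported.
    if bits > range_check_bits:
        return None
    return chosen


# Latch count unknown, but the whole-loop maximum is available and narrow enough.
print(pick_exit_count(None, ("max_exit_count", 32), range_check_bits=32))
```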
Differential Revision: https://reviews.llvm.org/D147910
Reviewed By: danilaml
No need to include CallGraphSCCPass.h from the IPO/Inliner.
Also removed the include of LegacyPassManager.h in a couple of files
that do not really depend on that header file.
Differential Revision: https://reviews.llvm.org/D148083
FullLoopUnroll was performing runtime unrolling in certain cases when
'#pragma unroll' was specified. This patch fixes that by introducing a new parameter
to tryToUnrollLoop() to differentiate between LoopUnrollPass and
FullLoopUnrollPass. Based on the discussion here
(https://discourse.llvm.org/t/loop-unroller-fails-to-unroll-loop/69834)
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D148071
To calculate the trip count we need to add 1 to the backedge
taken count. If we need to widen the backedge count, it's better
to do the add before the widening if we can guarantee it won't
overflow.
The code here is based on similar code I found in
LoopIdiomRecognize.
This is the vectorizer version of this InstCombine patch D142783.
Looking at the IR diffs, this does look like it gets more cases
than the InstCombine patch.
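
To see why the no-overflow guarantee matters, here is a small Python model of the two orderings with fixed-width unsigned integers. It assumes zero extension for the widening step, and the function names are hypothetical.

```python
def mask(bits):
    return (1 << bits) - 1


def add_then_widen(btc, narrow_bits, wide_bits):
    """Add 1 in the narrow type, then zero-extend to the wide type."""
    narrow = (btc + 1) & mask(narrow_bits)  # may wrap to 0
    return narrow & mask(wide_bits)


def widen_then_add(btc, narrow_bits, wide_bits):
    """Zero-extend first, then add 1 in the wide type."""
    return ((btc & mask(narrow_bits)) + 1) & mask(wide_bits)


# No overflow in the narrow type: both orderings agree.
print(add_then_widen(254, 8, 32), widen_then_add(254, 8, 32))
# Backedge-taken count is the narrow maximum: the narrow add wraps.
print(add_then_widen(255, 8, 32), widen_then_add(255, 8, 32))
```

The two orderings only differ when the backedge-taken count is the maximum value of the narrow type, which is the case the overflow check has to rule out.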
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D147355