This exposed a miscompile in GVN, which was fixed by D148129.
-----
After D141386, violation of nonnull, range and align metadata
results in poison rather than immediate undefined behavior,
which means that these are now safe to retain when speculating.
We only need to remove UB-implying metadata like noundef.
This is done by adding a dropUBImplyingAttrsAndMetadata() helper,
which lists the metadata which is known safe to retain on speculation.
Differential Revision: https://reviews.llvm.org/D146629
When reusing a load in a way that requires coercion (i.e. casts or
bit extraction) we currently fail to adjust metadata. Unfortunately,
none of our existing tooling for this is really suitable, because
combineMetadataForCSE() expects both loads to have the same type.
In this case we may work on loads of different types and possibly
offset memory location.
As such, what this patch does is to simply drop all metadata, with
the following exceptions:
* Metadata for which violation is known to always cause UB.
* If the load is !noundef, keep all metadata, as this will turn
poison-generating metadata into UB as well.
This fixes the miscompile that was exposed by D146629.
Differential Revision: https://reviews.llvm.org/D148129
After D138238 introduced the then/else blocks, we should remove UB-implying metadata for the promoted speculative instruction.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D148456
BDCE is not used by the codegen pipeline so we should not need the
legacy PM version of the pass any longer.
Differential Revision: https://reviews.llvm.org/D148335
Remove C APIs for interacting with PassRegistry and pass
initialization. These are legacy PM concepts, and are no longer
relevant for the new pass manager.
Calls to these initialization functions can simply be dropped.
Differential Revision: https://reviews.llvm.org/D145043
The motivation is to make an opportunity to compute and return
expressions after parsing ICmp into a range check (e.g. Length + 1).
Patch by Aleksandr Popov!
Differential Revision: https://reviews.llvm.org/D148205
It seems that existing logic is too strict about latch block exit count.
It is required to be computable, however it is not used in any computations,
and effectively the only thing it is used for is to get the type of computed
exit count.
Sometimes the exit count for latch block is not known, but the loop is still
finite because of other exits, and safe bounds are still computable. In this case,
we miss an opportunity to apply IRCE.
We could instead use a more relaxed version - max symbolic exit count, which,
if exists, is enough to say that the loop is finite, and its type should be good enough.
There is a subtlety with type: we do not support latch count type wider than range
check type. Because of that, we want to have the narrowest type available. So if it
can be computed from latch block immediately, take it. Otherwise, take whatever whole
loop provides and hope that it's type isn't too wide.
Differential Revision: https://reviews.llvm.org/D147910
Reviewed By: danilaml
No need to include CallGraphSCCPass.h from the IPO/Inliner.
Also removed the include of LegacyPassManager.h in a couple of files
that do not really depend on that header file.
Differential Revision: https://reviews.llvm.org/D148083
FullLoopUnroll was performing runtime unrolling in certain cases when
'#pragma unroll' was specified. Patch to fix this by introducing new parameter
to tryToUnrollLoop() to differentiate between LoopUnrollPass and
FullLoopUnrollPass. Based on the discussion here
(https://discourse.llvm.org/t/loop-unroller-fails-to-unroll-loop/69834)
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D148071
GVN load widening was disabled in D24096. This removes various
support code that is no longer relevant.
The way this works nowadays is that we return PartialAlias with
an offset from BasicAA and this gets passed on as a clobber by
MDA. However, PartialAlias will only be returned if the load is
properly nested inside the other load.
This just removes the bulk of the code, but some additional
cleanup can be done here now that we don't need to distinguish
between load and store cases.
Perform dot-product lowering before instruction fusion to avoid crash in
newly added test. Also update lowerDotProduct to properly mark optimized
matmul as fused.
Some dbg.assigns using poison become un-poisoned in SROA. The reason this
happens at all is because dbg.assigns linked to memory intrinsics use poison to
indicate they can't describe the stored value, but the value becomes available
after some optimisations. This needs reworking eventually, but for now we need
to ensure that when it does occur we don't create invalid expressions.
D147312 prevented this occuring when the dbg.assign uses DIArgLists, but that
wasn't a complete fix. We also need to ensure we avoid un-poisoning when the
existing expression uses more than one location operand (DW_OP_arg, n).
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D148020
We have now seen two miscompiles because of widening widenable
conditions at incorrect IR points and thereby changing a branch's loop
invariant condition to a loop-varying one (see PR60234 and PR61963).
This patch adds asserts in common guard utilities that we use for
widening to proactively catch these bugs in future.
Note that these asserts will not fire if we were to sink a widenable
condition from out of a loop into a loop (that's also incorrect for the
same reason as above).
Tested this without the fix for PR60234 (guard widening miscompile) and
confirmed the assert fires.
WARNING: Sometimes, the assert can fire if we failed to hoist the
invariant condition out of the loop. This is a pass-ordering issue or a
limitation in LICM, which would need an investigation. See details in
review.
Differential Revision: https://reviews.llvm.org/D147752
Avoid divergence b/w different kinds of hoisting with reassociation.
Make them all collect general stat NumHoisted and also specific stats
for each particular transform.
In this case the source GEP might not be hoisted even though it
has invariant operands. For now just bail out, but we might need
additional checks for AllowSpeculation in these special-case
reassociation folds.
Reassociate gep (gep ptr, idx1), idx2 to gep (gep ptr, idx2), idx1
if this would make the inner GEP loop invariant and thus hoistable.
This is intended to replace an InstCombine fold that does this (in
04f61fb73d/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp (L2006)).
The problem with the InstCombine fold is that LoopInfo is an optional
dependency, so it is not performed reliably.
Differential Revision: https://reviews.llvm.org/D146813
Loop predication's predicateLoopExit pass does two incorrect things:
It sinks the widenable call into the loop, thereby converting an invariant condition to a variant one
It widens the widenable call at a branch thereby converting the branch into a loop-varying one.
The latter is problematic when the branch may have been loop-invariant
and prior optimizations (such as indvars) may have relied on this
fact, and updated the deopt state accordingly.
Now, when we widen this with a loop-varying condition, the deopt state
is no longer correct.
https://github.com/llvm/llvm-project/issues/61963 fixed.
Differential Revision: https://reviews.llvm.org/D147662
It is not actually used for any computations. Its only purpose is to
check that the loop is finite and find out the type of computed exit
count. Refactor code so that we only store this type.
Use clone to keep the metadata, the issue is reported
by aeubanks on D141188.
Reviewed By: nikic, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D146702
Correctly handle the case of splitting an alloca which backs contiguous
distinct variables, where a slice's size equals the size of a backed variable.
We need to ensure that we don't generate fragments expressions with fragments
of the same size as the variable as this is a verifier error.
Prior to this patch a fragment expression would be created in this
situation. e.g. splitting an alloca i64 with two adjacent 32-bit variables into
two 32-bit allocas, the new dbg.assign expressions would contain
(DW_OP_LLVM_fragment, 0, 32) and (DW_OP_LLVM_fragment, 32, 32) even though
those fragments cover each variable entirely.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D147696
Re-write the code to avoid iteration over users of
constants and global values.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D147450
This patch is making sure that we use getTopMostExitingLoop when
finding out which loops to forget, when dealing with
unswitchNontrivialInvariants and unswitchTrivialSwitch. It seems
to at least be needed for unswitchNontrivialInvariants as detected
by the included test case.
Note that unswitchTrivialBranch already used getTopMostExitingLoop.
This was done in commit 4a9cde5a791cd49b96993e6. The commit
message in that commit says "If the patch makes sense, I will also
update those places to a similar approach ...", referring to these
functions mentioned above. As far as I can tell that never happened,
but this is an attempt to finally fix that.
Fixes https://github.com/llvm/llvm-project/issues/61080
Differential Revision: https://reviews.llvm.org/D147058
Limit to dot product lowering to column major matrixes for now. This
simplifies the code and reasoning for upcoming planned improvements.
Support for row-major matrixes can be added later as extension.
createFragmentExpression will fail if it determines that the expression cannot
be split over fragments. Handle this case in SROA. Similarly to D147312 this
should be a rare occurrence as the `dbg.assign` will usually reference the
`Value` being stored without modifying it with a `DIExpression`.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D147431
LICM currently requests optimized use MSSA form. This is wasteful,
because LICM doesn't actually care about most uses, only those of
invariant pointers in loops. Everything else doesn't need to be
optimized.
LICM already uses the clobber walker in most places. This patch
adjusts one place that was using getDefiningAccess() to use it as
well, so we no longer have a dependence on pre-optimized uses.
This change is not NFC in that the fallback on the defining access
when there are too many clobber calls may now fall back to an
unoptimized use. In practice, I've not seen any problems with this
though. If desired, we could also increase licm-mssa-optimization-cap
to a higher value (increasing this from 100 to 200 has no impact on
average compile-time -- but also doesn't appear to have any impact
on LICM quality either).
This makes for a 0.9% geomean compile-time improvement on CTMark.
Differential Revision: https://reviews.llvm.org/D147437
Since memory does not have an intrinsic type, we do not need to require value type matching on stores in order to sink them. To facilitate that, this patch finds stores which are sinkable, but have conflicting types, and bitcasts the ValueOperand so they are easily sinkable into a PHINode. Rather than doing fancy analysis to optimally insert the bitcast, we always insert right before the relevant store in the diamond branch. The assumption is that later passes (e.g. GVN, SimplifyCFG) will clean up bitcasts as needed.
Differential Revision: https://reviews.llvm.org/D147348
After D141386, violation of nonnull, range and align metadata
results in poison rather than immediate undefined behavior,
which means that these are now safe to retain when speculating.
We only need to remove UB-implying metadata like noundef.
This is done by adding a dropUBImplyingAttrsAndMetadata() helper,
which lists the metadata which is known safe to retain on speculation.
Differential Revision: https://reviews.llvm.org/D146629