The DominatorTree version is marked for deprecation, so we use the
DomTreeUpdater version. We also update sinkRegion() to iterate over
basic blocks instead of DomTreeNodes. The loop body calls
SplitBlockPredecessors. The DTU version calls
DomTreeUpdater::apply_updates(), which may call DominatorTree::reset().
This invalidates the worklist of DomTreeNodes to iterate over.
Extend hoistBOAssociation smoothly to handle the case when the inner
BinaryOperator is in the RHS of the outer BinaryOperator. This completes
the generalization of hoistBOAssociation, and the only limitation after
this patch is the fact that only Add and Mul are hoisted.
Use IRBuilder when creating the new invariant instruction, so that the
constant-folder has an opportunity to constant-fold the new Instruction
that we desire to create.
These are the final few places in LLVM where we use instruction pointers
to identify the position that we're inserting something. We're trying to
get away from that with a view to deprecating those methods, thus use
iterators in all these places. I believe they're all debug-info safe.
The sketchiest part is the ExtractValueInst copy constructor, where we
cast nullptr to a BasicBlock pointer, so that we take the non-default
insert-into-no-block path for instruction insertion, instead of the
default nullptr-instruction path for UnaryInstruction. Such a hack is
necessary until we get rid of the instruction constructor entirely.
This limits folding and hoisting associative binary ops to cases where
the intermediate op has at most two uses.
The more uses the intermediate op has, the more new ops we have to
create to potentially reduce the loop's critical path. We keep the limit
to two uses to minimise undesirable increases in code size.
This reapplies a more strict version of
f2ccf80136.
Perform the transformation
"(LV op C1) op C2" ==> "LV op (C1 op C2)"
where op is an associative binary op, LV is a loop variant, and C1 and
C2 are loop invariants, and hoist (C1 op C2) into the preheader.
For now this fold is restricted to ADDs.
Perform the transformation
"(LV op C1) op C2" ==> "LV op (C1 op C2)"
where op is an associative binary op, LV is a loop variant, and C1 and
C2 are loop invariants to hoist.
Similar patterns could be folded (left in comment) but this one seems to
be the most impactful.
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...
`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
While reassociating expressions, LICM is required to invalidate SCEV
results, as otherwise subsequent passes in the pipeline that leverage
LICM foldings (e.g. IndVars), may reason on invalid expressions; thus
miscompiling. This is achieved by rewriting the reassociable
instruction from scratch.
Fixes: https://github.com/llvm/llvm-project/issues/91957.
As part of the RemoveDIs project we need LLVM to insert instructions using
iterators wherever possible, so that the iterators can carry a bit of
debug-info. This commit implements some of that by updating the contents of
llvm/lib/Transforms/Utils to always use iterator-versions of instruction
constructors.
There are two general flavours of update:
* Almost all call-sites just call getIterator on an instruction
* Several make use of an existing iterator (scenarios where the code is
actually significant for debug-info)
The underlying logic is that any call to getFirstInsertionPt or similar
APIs that identify the start of a block need to have that iterator passed
directly to the insertion function, without being converted to a bare
Instruction pointer along the way.
Noteworthy changes:
* FindInsertedValue now takes an optional iterator rather than an
instruction pointer, as we need to always insert with iterators,
* I've added a few iterator-taking versions of some value-tracking and
DomTree methods -- they just unwrap the iterator. These are purely
convenience methods to avoid extra syntax in some passes.
* A few calls to getNextNode become std::next instead (to keep in the
theme of using iterators for positions),
* SeparateConstOffsetFromGEP has it's insertion-position field changed.
Noteworthy because it's not a purely localised spelling change.
All this should be NFC.
With a fix for build bot failure. I was accessing the type of a deleted
Instruction.
Original message:
The reassociation this is trying to repair can happen for integer types
too.
This patch adds support for integer mul/add to hoistFPAssociation. The
function has been renamed to hoistMulAddAssociation. I've used separate
statistics and limits for integer to allow tuning flexibility.
The reassociation this is trying to repair can happen for integer types
too.
This patch adds support for integer mul/add to hoistFPAssociation. The
function has been renamed to hoistMulAddAssociation. I've used separate
statistics and limits for integer to allow tuning flexibility.
This changes the AliasSetTracker to track memory locations instead of
pointers in its alias sets. The motivation for this is outlined in an RFC
posted on LLVM discourse:
https://discourse.llvm.org/t/rfc-dont-merge-memory-locations-in-aliassettracker/73336
In the data structures of the AST implementation, I made the choice to
replace the linked list of `PointerRec` entries (that had to go anyway)
with a simple flat vector of `MemoryLocation` objects, but for the
`AliasSet` objects referenced from a lookup table, I retained the
mechanism of a linked list, reference counting, forwarding, etc. The
data structures could be revised in a follow-up change.
As part of RemoveDIs, we need instruction insertion to be done with
iterators rather than instruction pointers, so that we can communicate
some debug-info facts about the position. This patch is an entirely
mechanical replacement of Instruction * with BasicBlock::iterator, plus
using insertBefore to insert some instructions because we don't have
iterator-taking constructors yet.
Sadly it's not NFC because it causes dbg.value intrinsics / their
DPValue equivalents to shift location.
Because we're storing some extra debug-info information in the iterator
class, we need to insert new LICM-created stores using such iterators.
Switch LICM to storing iterators instead of pointers when it promotes
variables in loops, add a test for the desired behaviour, and enable
RemoveDIs instrumentation on a variety of other LICM tests for good
measure.
(This would appear to be the only pass in LLVM that needs to store
iterators on the heap).
This adds a writable attribute, which in conjunction with
dereferenceable(N) states that a spurious store of N bytes is
introduced on function entry. This implies that this many bytes
are writable without trapping or introducing data races. See
https://llvm.org/docs/Atomics.html#optimization-outside-atomic for
why the second point is important.
This attribute can be added to sret arguments. I believe Rust will
also be able to use it for by-value (moved) arguments. Rust likely
won't be able to use it for &mut arguments (tree borrows does not
appear to allow spurious stores).
In this patch the new attribute is only used by LICM scalar promotion.
However, the actual motivation for this is to fix a correctness issue
in call slot optimization, which needs this attribute to avoid
optimization regressions.
Followup to the discussion on D157499.
Differential Revision: https://reviews.llvm.org/D158081
Since we no longer support typed pointers in LLVM IR, the PtrASXTy
in isLoadInvariantInLoop was set to be equal to Addr->getType() (an
opaque ptr in the same address space). That made the loop looking
through bitcasts redundant.
Continuing the patch series to get rid of debug intrinsics [0], instruction
insertion needs to be done with iterators rather than instruction pointers,
so that we can communicate information in the iterator class. This patch
adds an iterator-taking insertBefore method and converts various call sites
to take iterators. These are all sites where such debug-info needs to be
preserved so that a stage2 clang can be built identically; it's likely that
many more will need to be changed in the future.
At this stage, this is just changing the spelling of a few operations,
which will eventually become signifiant once the debug-info bearing
iterator is used.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152537
This matches the check done by the Reassociate pass that we're
trying to reverse.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D158042
Consider the following piece of code:
```
void innermost_loop(int i, double d1, double d2, double delta, int n, double cells[n])
{
int j;
const double d1d = d1 * delta;
const double d2d = d2 * delta;
for (j = 0; j <= i; j++)
cells[j] = d1d * cells[j + 1] + d2d * cells[j];
}
```
When compiling at -Ofast level, after the "Reassociate expressions"
pass, this code is transformed into an equivalent of:
```
int j;
for (j = 0; j <= i; j++)
cells[j] = (d1 * cells[j + 1] + d2 * cells[j]) * delta;
```
Effectively, the computation of those loop invariants isn't done
before the loop anymore, we have one extra multiplication on each
loop iteration instead. Sadly, this results in a significant
performance hit.
Similarly, specifically crafted user code will also experience
inability to hoist those invariants.
This patch is solving this issue by adding the ability to undo such
reassociation into the LICM pass. Note that for doing such
transformation this pass requires the same conditions as the
"Reassociate expressions" pass, namely, the involved binary operators
must have the reassociations allowed (e.g. by specifying the `fast`
attribute) and they must have single use only.
Some parts of this patch were suggested by Nikita Popov.
Reviewed By: huntergr, nikic, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D152281
Building the given test case with 'clang -O2 -g' the call to
'getInOrder' is sunk out of the loop by LICM, but the source
location is not dropped.
Reviewed By: aprantl, fdeazeve
Differential Revision: https://reviews.llvm.org/D152691
The last use was removed by:
commit d623b2f95fd559901f008a0588dddd0949a8db01
Author: Arthur Eubanks <aeubanks@google.com>
Date: Fri Mar 10 17:24:19 2023 -0800
LICM could reassociate mixed variant/invariant comparison/arithmetic operations
and hoist invariant parts out of loop if it can prove that they can be computed
without overflow. Motivating example here:
```
INV1 - VAR1 < INV2
```
can be turned into
```
VAR > INV1 - INV2
```
if we can prove no-signed-overflow here. Then `INV1 - INV2` can be computed
out of loop, so we save one arithmetic operation in-loop.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D148001
This patch allows LICM to reassociate and hoist following expressions:
```
loop:
%sum = add nsw %iv, %C1
%cmp = icmp <signed pred> %sum, C2
```
where `C1` and `C2` are loop invariants. The reassociated version looks like
```
preheader:
%inv_sum = C2 - C1
...
loop:
%cmp = icmp <signed pred> %iv, %inv_sum
```
In order to prove legality, we need both initial addition and the newly created subtraction
to happen without overflow.
Differential Revision: https://reviews.llvm.org/D149132
Reviewed By: skatkov
This commit removes constness from DILocation::getMergedLocation and
fixes all its users accordingly.
Having constness on the parameters forced the return type to be const
as well, which does force usage of `const_cast` when the location needs
to be used in metadata nodes.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D149942
D37076 makes LICM duplicate instructions into exit blocks if the
instruction is free. For GEPs, the motivation appears to be that
this allows the GEP to be folded into addressing modes, while
non-foldable users outside the loop might prevent this. TBH I don't
think LICM is the place to do this (why doesn't CGP apply this
heuristic itself?) but at least I understand the motivation.
However, the transform is also applied to all other "free"
instructions, which are just that (removed during lowering and not
"folded" in some way). For such instructions, this transform seems
somewhere between useless, counter-productive (undoing CSE/GVN) and
actively incorrect. For example, this transform can duplicate freeze
instructions, which is illegal.
This patch limits the transform to just foldable GEPs, though we
might want to drop it from LICM entirely as a followup.
This is a small compile-time improvement, because querying TTI cost
model for every single instruction is expensive.
Differential Revision: https://reviews.llvm.org/D149136
This was introduced in 030f02021b6359ec5641622cf1aa63d873ecf55a as
an alleged compile-time optimization. In reality, trying to constant
fold instructions is more expensive than just hoisting them. In a
standard pipeline, LICM tends to run either after a run of
LoopInstSimplify or InstCombine, so LICM doesn't really see constant
foldable instructions in the first place, and the attempted fold
is futile.
This makes for a very minor compile-time improvement.
Differential Revision: https://reviews.llvm.org/D149134
As we are moving the instruction without changing its value, it
is sufficient to only invalidate the loop/block dispositions.
This is the same we do in LoopSink.
This exposed another miscompile in GVN, which was fixed by
20e9b31f88149a1d5ef78c0be50051e345098e41.
-----
After D141386, violation of nonnull, range and align metadata
results in poison rather than immediate undefined behavior,
which means that these are now safe to retain when speculating.
We only need to remove UB-implying metadata like noundef.
This is done by adding a dropUBImplyingAttrsAndMetadata() helper,
which lists the metadata which is known safe to retain on speculation.
Differential Revision: https://reviews.llvm.org/D146629