Extend hoistBOAssociation smoothly to handle the case when the inner
BinaryOperator is in the RHS of the outer BinaryOperator. This completes
the generalization of hoistBOAssociation, and the only limitation after
this patch is the fact that only Add and Mul are hoisted.
Use IRBuilder when creating the new invariant instruction, so that the
constant-folder has an opportunity to constant-fold the new Instruction
that we desire to create.
MustExec has special logic to determine whether the first loop iteration
will always be executed, by simplifying the IV comparison with the start
value. Currently, this code assumes that the IV is on the LHS of the
comparison, but this is not guaranteed. Make sure it handles the
commuted variant as well.
The changed PhaseOrdering test previously performed peeling to make the
loads dereferenceable -- as a side effect, this also reduced the exit
count by one, avoiding the awkward <= MAX case.
Now we know up-front the the loads are dereferenceable and can be simply
hoisted. As such, we retain the original exit count and now have to
handle it by widening the exit count calculation to i128. This is a
regression, but at least it preserves the vectorization, which was the
original goal. I'm not sure what else can be done about that test.
This limits folding and hoisting associative binary ops to cases where
the intermediate op has at most two uses.
The more uses the intermediate op has, the more new ops we have to
create to potentially reduce the loop's critical path. We keep the limit
to two uses to minimise undesirable increases in code size.
This reapplies a more strict version of
f2ccf80136.
Perform the transformation
"(LV op C1) op C2" ==> "LV op (C1 op C2)"
where op is an associative binary op, LV is a loop variant, and C1 and
C2 are loop invariants, and hoist (C1 op C2) into the preheader.
For now this fold is restricted to ADDs.
Perform the transformation
"(LV op C1) op C2" ==> "LV op (C1 op C2)"
where op is an associative binary op, LV is a loop variant, and C1 and
C2 are loop invariants to hoist.
Similar patterns could be folded (left in comment) but this one seems to
be the most impactful.
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records.
If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.
For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
While reassociating expressions, LICM is required to invalidate SCEV
results, as otherwise subsequent passes in the pipeline that leverage
LICM foldings (e.g. IndVars), may reason on invalid expressions; thus
miscompiling. This is achieved by rewriting the reassociable
instruction from scratch.
Fixes: https://github.com/llvm/llvm-project/issues/91957.
At the moment, Clang is rather liberal in assuming that 0 (and by extension unqualified) is always a safe default. This does not work for targets that actually use a different value for the default / generic AS (for example, the SPIRV that obtains from HIPSPV or SYCL). This patch is a first, fairly safe step towards trying to clear things up by querying a modules' default AS from the target, rather than assuming it's 0, alongside fixing a few places where things break / we encode the 0 == DefaultAS assumption. A bunch of existing tests are extended to check for non-zero default AS usage.
With a fix for build bot failure. I was accessing the type of a deleted
Instruction.
Original message:
The reassociation this is trying to repair can happen for integer types
too.
This patch adds support for integer mul/add to hoistFPAssociation. The
function has been renamed to hoistMulAddAssociation. I've used separate
statistics and limits for integer to allow tuning flexibility.
Similar to d39b4ce3ce8a3c256e01bdec2b140777a332a633
Using "eabi" or "gnueabi" for aarch64 targets is a common mistake and
warned by Clang Driver. We want to avoid them elsewhere as well. Just
use the common "aarch64" without other triple components.
The reassociation this is trying to repair can happen for integer types
too.
This patch adds support for integer mul/add to hoistFPAssociation. The
function has been renamed to hoistMulAddAssociation. I've used separate
statistics and limits for integer to allow tuning flexibility.
This changes the AliasSetTracker to track memory locations instead of
pointers in its alias sets. The motivation for this is outlined in an RFC
posted on LLVM discourse:
https://discourse.llvm.org/t/rfc-dont-merge-memory-locations-in-aliassettracker/73336
In the data structures of the AST implementation, I made the choice to
replace the linked list of `PointerRec` entries (that had to go anyway)
with a simple flat vector of `MemoryLocation` objects, but for the
`AliasSet` objects referenced from a lookup table, I retained the
mechanism of a linked list, reference counting, forwarding, etc. The
data structures could be revised in a follow-up change.
Add the `dead_on_unwind` attribute, which states that the caller will
not read from this argument if the call unwinds. This allows eliding
stores that could otherwise be visible on the unwind path, for example:
```
declare void @may_unwind()
define void @src(ptr noalias dead_on_unwind %out) {
store i32 0, ptr %out
call void @may_unwind()
store i32 1, ptr %out
ret void
}
define void @tgt(ptr noalias dead_on_unwind %out) {
call void @may_unwind()
store i32 1, ptr %out
ret void
}
```
The optimization is not valid without `dead_on_unwind`, because the `i32
0` value might be read if `@may_unwind` unwinds.
This attribute is primarily intended to be used on sret arguments. In
fact, I previously wanted to change the semantics of sret to include
this "no read after unwind" property (see D116998), but based on the
feedback there it is better to keep these attributes orthogonal (sret is
an ABI attribute, dead_on_unwind is an optimization attribute). This is
a reboot of that change with a separate attribute.
AST checks aliasing with MustAlias sets by only checking the
representative pointer (getSomePointer). This is only correct if the
Size and AATags information of that pointer also includes the
Size/AATags of all other pointers in the set.
When we add a new pointer to the AliasSet, we do perform this update
(see the code in AliasSet::addPointer). However, if a pointer already in
the MustAlias set is used with a new size, we currently do not update
the representative pointer, resulting in miscompilations. Fix this by
adding the missing update.
This is a targeted fix using the current representation. There are a
couple of alternatives:
* For MustAlias sets, don't store per-pointer Size/AATags at all. This
would make it clear that there is only one set of common Size/AATags for
all pointers.
* Check against all pointers in the set even for MustAlias. This is what
https://github.com/llvm/llvm-project/pull/65731 proposes to do as part
of a larger change to AST representation.
Fixes https://github.com/llvm/llvm-project/issues/64897.
Because we're storing some extra debug-info information in the iterator
class, we need to insert new LICM-created stores using such iterators.
Switch LICM to storing iterators instead of pointers when it promotes
variables in loops, add a test for the desired behaviour, and enable
RemoveDIs instrumentation on a variety of other LICM tests for good
measure.
(This would appear to be the only pass in LLVM that needs to store
iterators on the heap).
Debugify is extremely useful as a testing and debugging tool, and a good
number of LLVM-IR transform tests use it. We need it to support "new"
non-instruction debug-info to get test coverage, but it's not important
enough to completely convert right now (and it'd be a large
undertaking). Thus: convert to/from dbg.value/DPValue mode on entry and
exit of the pass, which gives us the functionality without any further
work. The cost is compile-time, but again this is only happening during
tests.
Tested by: the large set of debugify tests enabled here. Note the
InstCombine test (cast-mul-select.ll) that hasn't been fully enabled:
this is because there's a debug-info sinking piece of code there that
hasn't been instrumented.
This adds a writable attribute, which in conjunction with
dereferenceable(N) states that a spurious store of N bytes is
introduced on function entry. This implies that this many bytes
are writable without trapping or introducing data races. See
https://llvm.org/docs/Atomics.html#optimization-outside-atomic for
why the second point is important.
This attribute can be added to sret arguments. I believe Rust will
also be able to use it for by-value (moved) arguments. Rust likely
won't be able to use it for &mut arguments (tree borrows does not
appear to allow spurious stores).
In this patch the new attribute is only used by LICM scalar promotion.
However, the actual motivation for this is to fix a correctness issue
in call slot optimization, which needs this attribute to avoid
optimization regressions.
Followup to the discussion on D157499.
Differential Revision: https://reviews.llvm.org/D158081
BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that:
* Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiply frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room.
* Spread the difference between hottest/coldest block as much as possible to increase precision.
* If the hot/cold spread cannot be represented loose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.
One of the main user of these kind of coroutines is swift. There yield-once (`retcon.once`) coroutines are used to temporary "expose" pointers to internal fields of various objects creating borrow scopes.
However, in some cases it might be useful also to allow these coroutines to produce a normal result, but there is no convenient way to represent this (as compared to switched-resume kind of coroutines where C++ `co_return`
is transformed to a member / callback call on promise object).
The extension is simple: we allow continuation function to have a non-void result and accept optional extra arguments via a special `llvm.coro.end.result` intrinsic that would essentially forward them as normal results.
isLegalToHoistInto() currently return true for callbr instructions.
That means that a callbr with one successor will be considered a
proper loop preheader, which may result in instructions that use
the callbr return value being hoisted past it.
Fix this by adding callbr to isExceptionTerminator (with a rename
to isSpecialTerminator), which also fixes similar assumptions in
other places.
Fixes https://github.com/llvm/llvm-project/issues/64215.
Differential Revision: https://reviews.llvm.org/D158609
This matches the check done by the Reassociate pass that we're
trying to reverse.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D158042
Consider the following piece of code:
```
void innermost_loop(int i, double d1, double d2, double delta, int n, double cells[n])
{
int j;
const double d1d = d1 * delta;
const double d2d = d2 * delta;
for (j = 0; j <= i; j++)
cells[j] = d1d * cells[j + 1] + d2d * cells[j];
}
```
When compiling at -Ofast level, after the "Reassociate expressions"
pass, this code is transformed into an equivalent of:
```
int j;
for (j = 0; j <= i; j++)
cells[j] = (d1 * cells[j + 1] + d2 * cells[j]) * delta;
```
Effectively, the computation of those loop invariants isn't done
before the loop anymore, we have one extra multiplication on each
loop iteration instead. Sadly, this results in a significant
performance hit.
Similarly, specifically crafted user code will also experience
inability to hoist those invariants.
This patch is solving this issue by adding the ability to undo such
reassociation into the LICM pass. Note that for doing such
transformation this pass requires the same conditions as the
"Reassociate expressions" pass, namely, the involved binary operators
must have the reassociations allowed (e.g. by specifying the `fast`
attribute) and they must have single use only.
Some parts of this patch were suggested by Nikita Popov.
Reviewed By: huntergr, nikic, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D152281
This change allows sinking defs from loop preheader with PHI-use into loop body. Loop sink can now see through PHI-use and select incoming blocks of value being used as candidate sink destination.
It makes loop sink more effective so more LICM can be undone if proven unprofitable with profile info. It addresses the motivating case in D87551, without resorting to profile guided LICM which breaks canonicalization.
This is the 2nd attempt after D152772.
Building the given test case with 'clang -O2 -g' the call to
'getInOrder' is sunk out of the loop by LICM, but the source
location is not dropped.
Reviewed By: aprantl, fdeazeve
Differential Revision: https://reviews.llvm.org/D152691
This change allows sinking defs from loop preheader with PHI-use into loop body. Loop sink can now see through PHI-use and select incoming blocks of value being used as candidate sink destination.
It makes loop sink more effective so more LICM can be undone if proven unprofitable with profile info. It addresses the motivating case in D87551, without resorting to profile guided LICM which breaks canonicalization.
Differential Revision: https://reviews.llvm.org/D152772