It can happen that the call is originally created as a MemoryDef,
and then later transforms show it is actually read-only and could
be a MemoryUse -- however, this is not guaranteed to be reflected
in MSSA.
The profiling - related metadata information for the hoisted conditional branch should be copied from the original branch, not from the current terminator of the block it's hoisted to.
The patch adds a way to disable the fix just so we can do an ablation test, after which the flag will be removed. The same flag will be reused for other similar fixes.
(This was identified through `profcheck` (see Issue #147390), and this PR addresses most of the test failures (when running under profcheck) under `Transforms/LICM`.)
A coroutine function may be split to ramp function and resume function,
and they have different stack frames, so a pointer to stack objects may
have different addresses depending on where it is used, so it's not a
loop invariant.
It temporarily fixes https://github.com/llvm/llvm-project/issues/149604.
LICM tries to reassociate GEPs in order to hoist an invariant GEP.
Currently, it also does this in the case where the GEP has a constant
offset.
This is usually undesirable. From a back-end perspective, constant GEPs
are usually free because they can be folded into addressing modes, so
this just increases register pressume. From a middle-end perspective,
keeping constant offsets last in the chain makes it easier to analyze
the relationship between multiple GEPs on the same base, especially
after CSE.
The worst that can happen here is if we start with something like
```
loop {
p + 4*x
p + 4*x + 1
p + 4*x + 2
p + 4*x + 3
}
```
And LICM converts it into:
```
p.1 = p + 1
p.2 = p + 2
p.3 = p + 3
loop {
p + 4*x
p.1 + 4*x
p.2 + 4*x
p.3 + 4*x
}
```
Which is much worse than leaving it for CSE to convert to:
```
loop {
p2 = p + 4*x
p2 + 1
p2 + 2
p2 + 3
}
```
Adds support for hoisting `writeonly` calls in LICM.
This patch adds a missing optimization that allows hoisting of
`writeonly` function calls out of loops when it is safe to do so.
Previously, such calls were conservatively retained inside the loop
body, and the redundant calls were only reduced through unrolling,
relying on target-dependent heuristics.
Closes#143267
Testing:
- Modified previously negative tests for hoisting writeonly calls to be
instead positive
- Added test cases for hoisting of two writeonly calls where the
pointers do/do not alias
- Added a test case for not argmemonly writeonly calls.
The code checking whether a readonly call is safe to hoist is
currently limited to only argmemonly calls. However, the actual
implementation does not depend on this in any way. It either
does an MSSA clobber walk on the memory access (which will take
all locations accessed by the call into account), or it will
look at all MemoryDefs in an entirely location-independent manner.
The current restriction dates back to the time when LICM still
supported AST, in which case this code *did* reason about the
individual pointer arguments.
Seeing how we can't generate any debug intrinsics any more: delete a
variety of codepaths where they're handled. For the most part these are
plain deletions, in others I've tweaked comments to remain coherent, or
added a type to (what was) type-generic-lambdas.
This isn't all the DbgInfoIntrinsic call sites but it's most of the
simple scenarios.
Co-authored-by: Nikita Popov <github@npopov.com>
Part of the coverage-tracking feature, following #107279.
In order for DebugLoc coverage testing to work, we firstly have to set
annotations for intentionally-empty DebugLocs, and secondly we have to
ensure that we do not drop these annotations as we propagate DebugLocs
throughout compilation. As the annotations exist as part of the DebugLoc
class, and not the underlying DILocation, they will not survive a
DebugLoc->DILocation->DebugLoc roundtrip. Therefore this patch modifies
a number of places in the compiler to propagate DebugLocs directly
rather than via the underlying DILocation. This has no effect on the
output of normal builds; it only ensures that during coverage builds, we
do not drop incorrectly annotations and therefore create false
positives.
The bulk of these changes are in replacing
DILocation::getMergedLocation(s) with a DebugLoc equivalent, and in
changing the IRBuilder to store a DebugLoc directly rather than storing
DILocations in its general Metadata array. We also use a new function,
`DebugLoc::orElse`, which selects the "best" DebugLoc out of a pair
(valid location > annotated > empty), preferring the current DebugLoc on
a tie - this encapsulates the existing behaviour at a few sites where we
_may_ assign a DebugLoc to an existing instruction, while extending the
logic to handle annotation DebugLocs at the same time.
Following the work in PR #107279, this patch applies the annotative
DebugLocs, which indicate that a particular instruction is intentionally
missing a location for a given reason, to existing sites in the compiler
where their conditions apply. This is NFC in ordinary LLVM builds (each
function `DebugLoc::getFoo()` is inlined as `DebugLoc()`), but marks the
instruction in coverage-tracking builds so that it will be ignored by
Debugify, allowing only real errors to be reported. From a developer
standpoint, it also communicates the intentionality and reason for a
missing DebugLoc.
Some notes for reviewers:
- The difference between `I->dropLocation()` and
`I->setDebugLoc(DebugLoc::getDropped())` is that the former _may_ decide
to keep some debug info alive, while the latter will always be empty; in
this patch, I always used the latter (even if the former could
technically be correct), because the former could result in some
(barely) different output, and I'd prefer to keep this patch purely NFC.
- I've generally documented the uses of `DebugLoc::getUnknown()`, with
the exception of the vectorizers - in summary, they are a huge cause of
dropped source locations, and I don't have the time or the domain
knowledge currently to solve that, so I've plastered it all over them as
a form of "fixme".
This implements the result of the discussion at:
https://discourse.llvm.org/t/rfc-report-fatal-error-and-the-default-value-of-gencrashdialog/73587
There are two different use cases for report_fatal_error, so replace it
with two functions reportFatalInternalError() and
reportFatalUsageError(). The former indicates a bug in LLVM and
generates a crash dialog. The latter does not. The names have been
suggested by rnk and people seemed to like them.
This replaces a lot of the usages that passed an explicit value for
GenCrashDiag. I did not bulk replace remaining report_fatal_error usage
-- they probably require case by case review for which function to use.
We can use *Set::insert_range to collapse:
for (auto Elem : Range)
Set.insert(E);
down to:
Set.insert_range(Range);
In some cases, we can further fold that into the set declaration.
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch uses insert_range in
conjunction with llvm::{predecessors,successors} and
MachineBasicBlock::{predecessors,successors}.
The implementation doesn't use it, and is unlikely to use it in
the future.
The places that do set StoreCaptures=false, do so incorrectly and
would be broken if the parameter actually did anything.
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to getFirstNonPHI use the iterator-returning version.
This patch changes a bunch of call-sites calling getFirstNonPHI to use
getFirstNonPHIIt, which returns an iterator. All these call sites are
where it's obviously safe to fetch the iterator then dereference it. A
follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
getFirstNonPHI, but not before adding concise documentation of what
considerations are needed (very few).
---------
Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to moveBefore use iterators.
This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
insertBefore, but not before adding concise documentation of what
considerations are needed (very few).
Follow up on 4a0d53a (PatternMatch: migrate to CmpPredicate) to get rid
of one of the FIXMEs it introduced by replacing a predicate comparison
with CmpPredicate::getMatching.
With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key pieces of infrastructure that must be updated to match a CmpInst
respecting samesign information. Implement this change to Cmp-matchers.
This is a preparatory step in migrating the codebase over to
CmpPredicate. Since we no functional changes are desired at this stage,
we have chosen not to migrate CmpPredicate::operator==(CmpPredicate)
calls to use CmpPredicate::getMatching(), as that would have visible
impact on tests that are not yet written: instead, we call
CmpPredicate::operator==(Predicate), preserving the old behavior, while
also inserting a few FIXME comments for follow-ups.
Fixes#116809.
After running some passes (SimpleLoopUnswitch, LoopInstSimplify, etc.),
MemorySSA might be outdated, and the instruction `I` may have become a
non-memory touching instruction.
LICM has already handled this, but it does not pass
`CreationMustSucceed=false` to `createDefinedAccess`.
The DominatorTree version is marked for deprecation, so we use the
DomTreeUpdater version. We also update sinkRegion() to iterate over
basic blocks instead of DomTreeNodes. The loop body calls
SplitBlockPredecessors. The DTU version calls
DomTreeUpdater::apply_updates(), which may call DominatorTree::reset().
This invalidates the worklist of DomTreeNodes to iterate over.
Extend hoistBOAssociation smoothly to handle the case when the inner
BinaryOperator is in the RHS of the outer BinaryOperator. This completes
the generalization of hoistBOAssociation, and the only limitation after
this patch is the fact that only Add and Mul are hoisted.
Use IRBuilder when creating the new invariant instruction, so that the
constant-folder has an opportunity to constant-fold the new Instruction
that we desire to create.
These are the final few places in LLVM where we use instruction pointers
to identify the position that we're inserting something. We're trying to
get away from that with a view to deprecating those methods, thus use
iterators in all these places. I believe they're all debug-info safe.
The sketchiest part is the ExtractValueInst copy constructor, where we
cast nullptr to a BasicBlock pointer, so that we take the non-default
insert-into-no-block path for instruction insertion, instead of the
default nullptr-instruction path for UnaryInstruction. Such a hack is
necessary until we get rid of the instruction constructor entirely.
This limits folding and hoisting associative binary ops to cases where
the intermediate op has at most two uses.
The more uses the intermediate op has, the more new ops we have to
create to potentially reduce the loop's critical path. We keep the limit
to two uses to minimise undesirable increases in code size.
This reapplies a more strict version of
f2ccf80136.
Perform the transformation
"(LV op C1) op C2" ==> "LV op (C1 op C2)"
where op is an associative binary op, LV is a loop variant, and C1 and
C2 are loop invariants, and hoist (C1 op C2) into the preheader.
For now this fold is restricted to ADDs.
Perform the transformation
"(LV op C1) op C2" ==> "LV op (C1 op C2)"
where op is an associative binary op, LV is a loop variant, and C1 and
C2 are loop invariants to hoist.
Similar patterns could be folded (left in comment) but this one seems to
be the most impactful.
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...
`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
While reassociating expressions, LICM is required to invalidate SCEV
results, as otherwise subsequent passes in the pipeline that leverage
LICM foldings (e.g. IndVars), may reason on invalid expressions; thus
miscompiling. This is achieved by rewriting the reassociable
instruction from scratch.
Fixes: https://github.com/llvm/llvm-project/issues/91957.