Introduce llvm::CmpPredicate, an abstraction over a floating-point
predicate, and a pack of an integer predicate with samesign information,
in order to ease extending large portions of the codebase that take a
CmpInst::Predicate to respect the samesign flag.
We have chosen to demonstrate the utility of this new abstraction by
migrating parts of ValueTracking, InstructionSimplify, and InstCombine
from CmpInst::Predicate to llvm::CmpPredicate. There should be no
functional changes, as we don't perform any extra optimizations with
samesign in this patch, or use CmpPredicate::getMatching.
The design approach taken by this patch allows for unaudited callers of
APIs that take a llvm::CmpPredicate to silently drop the samesign
information; it does not pose a correctness issue, and allows us to
migrate the codebase piece-wise.
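A minimal sketch of why silently dropping samesign is harmless, using a toy model rather than the real class: `Pred`, `PredWithSameSign`, `isSignedLess`, and `migratedCaller` are hypothetical names invented for illustration, and the real llvm::CmpPredicate interface may differ in detail.

```cpp
#include <cassert>

// Toy stand-in for CmpInst::Predicate; only a few integer predicates.
enum class Pred { EQ, NE, SLT, ULT };

// Simplified model of a predicate packed with a samesign bit.
class PredWithSameSign {
  Pred P;
  bool SameSign;

public:
  // Implicit construction from a bare predicate: samesign defaults to false.
  PredWithSameSign(Pred P, bool SameSign = false) : P(P), SameSign(SameSign) {}

  // Implicit conversion back to the bare predicate drops the samesign bit.
  operator Pred() const { return P; }

  bool hasSameSign() const { return SameSign; }
};

// An unaudited callee that still takes the bare predicate.
inline bool isSignedLess(Pred P) { return P == Pred::SLT; }

// A migrated function that forwards to the unaudited callee; the samesign
// bit is lost at the call, which is conservative but still correct.
inline bool migratedCaller(PredWithSameSign P) { return isSignedLess(P); }
```

Losing the bit only means a later fold has less information to work with; it never asserts something that is not true of the compare.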
Instead of only trying to constant fold the select arms, try to simplify
them. This subsumes https://github.com/llvm/llvm-project/pull/115969
which implements this for extractvalue only.
This is still fairly limited in that we will usually only call
FoldOpIntoSelect in the first place if we have a constant operand. This
can be relaxed in the future if worthwhile.
Using LazyValueInfo, it is possible to compute valuable information for
allocation functions, GEP and alloca, even in the presence of dynamic
information.
llvm.objectsize plays an important role in _FORTIFY_SOURCE definitions,
so improving its diagnostics in turn improves the security of compiled
applications.
As a side note, as a result of recent optimization improvements, clang
no longer passes
https://github.com/serge-sans-paille/builtin_object_size-test-suite. This
commit restores the situation and greatly improves the scope of code
handled by the static version of __builtin_object_size.
Type::isScalableTy and StructType::containsScalableVectorType failed to
detect some cases of structs containing scalable vectors because
containsScalableVectorType did not call back into isScalableTy to check
the element types. Fix this, which requires sharing the same Visited set
in both functions. Also change the external API so that callers are
never required to pass in a Visited set, and normalize the naming to
isScalableTy.
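The shape of the fix can be sketched on a toy type tree. `Ty` and the `detail` helpers below are hypothetical stand-ins, not the LLVM classes; the point is only that the struct walk calls back into the general query and that both share one Visited set hidden behind the public entry point.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Toy type node: a leaf (scalable vector or not) or a struct of elements.
struct Ty {
  bool IsScalableVector = false;
  std::vector<const Ty *> Elements; // non-empty for struct-like types
};

namespace detail {
inline bool isScalableImpl(const Ty &T,
                           std::unordered_set<const Ty *> &Visited);

// Walks the elements of a struct-like type, sharing the Visited set.
inline bool containsScalableImpl(const Ty &T,
                                 std::unordered_set<const Ty *> &Visited) {
  if (!Visited.insert(&T).second)
    return false; // already inspected on this query
  for (const Ty *Elt : T.Elements)
    if (isScalableImpl(*Elt, Visited)) // call back into the general query
      return true;
  return false;
}

inline bool isScalableImpl(const Ty &T,
                           std::unordered_set<const Ty *> &Visited) {
  if (T.IsScalableVector)
    return true;
  return containsScalableImpl(T, Visited);
}
} // namespace detail

// Public entry point: callers never pass a Visited set.
inline bool isScalableTy(const Ty &T) {
  std::unordered_set<const Ty *> Visited;
  return detail::isScalableImpl(T, Visited);
}
```

With this structure a scalable vector nested two structs deep is found, because every element goes through the same query as the top-level type.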
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation for
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e., just lookup, no creation).
This is another small, but hopefully not performance-negative, step
towards canonicalizing to i8 geps. We look for geps with a constant offset
base pointer of the form `gep (gep @glob, C1), x, C2` and expand the gep
instruction, so that the constants can hopefully be combined together (or
the x offset can be computed in common).
When InstCombine replaces an old instruction with a new instruction, it
copies !dbg and !annotation metadata from old to new. For some
InstCombine patterns we set a specific DILocation on the new instruction
prior to insertion, however, which more accurately reflects the new
instruction. This more specific DILocation may be overwritten on
insertion by a less appropriate one, resulting in a less correct line
mapping. This patch changes this behaviour to only copy the DILocation
from old to new if the new instruction has no existing DILocation (which
will always be the case for a new instruction unless InstCombine has
specifically set one).
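The new rule amounts to a guarded copy. The sketch below uses a toy `Inst` with an optional line number in place of a DILocation; both the struct and `copyLocationIfMissing` are hypothetical names used only to illustrate the behaviour described above.

```cpp
#include <cassert>
#include <optional>

// Toy instruction carrying an optional source line in place of a DILocation.
struct Inst {
  std::optional<unsigned> DebugLine;
};

// Copy the location from Old only when New does not already have one, so a
// location set deliberately by a combine pattern is never overwritten.
inline void copyLocationIfMissing(const Inst &Old, Inst &New) {
  if (!New.DebugLine)
    New.DebugLine = Old.DebugLine;
}
```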
When combining two geps into one by adding the offsets, we have
to take some care when intersecting the flags, because nusw flags
cannot be straightforwardly preserved.
Add a helper for this on GEPNoWrapFlags so we won't have to repeat
this logic in various places.
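As a rough illustration of what such a helper centralizes, here is a toy flag set with a plain intersection and a stricter one for the offset-add case. The `Flags` struct and both function names are hypothetical, and the rule shown (keep the intersection only when nuw survives, otherwise drop everything) is a conservative assumption made for this sketch, not a statement of the exact rule the helper implements.

```cpp
#include <cassert>

// Toy flag set; the field names mirror the GEP no-wrap flags.
struct Flags {
  bool InBounds = false;
  bool NUSW = false;
  bool NUW = false;
};

// Plain intersection: a flag survives only if both GEPs carry it.
inline Flags intersect(Flags A, Flags B) {
  return {A.InBounds && B.InBounds, A.NUSW && B.NUSW, A.NUW && B.NUW};
}

// Intersection for (gep (gep p, x), y) -> (gep p, x + y).
// Assumed rule for this sketch: without nuw we cannot tell that x + y does
// not wrap in the signed sense, so nusw (and inbounds with it) is dropped.
inline Flags intersectForOffsetAdd(Flags A, Flags B) {
  Flags Res = intersect(A, B);
  if (Res.NUW)
    return Res;
  return Flags{};
}
```

Keeping the rule in one place means each combine that merges offsets makes the same decision.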
This extends the optimisation implemented in #107769 by relaxing the
conditions under which it applies. Now, the value produced by `ucmp`/`scmp`
doesn't need to be one-use, but only one-user, meaning it can be present
in a single phi node more than once.
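The difference between the two conditions can be shown on a toy use list. `UseList`, `hasOneUse`, and `hasOneUser` are hypothetical helpers written for this sketch; each entry is the id of the instruction that holds one use of the value.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// One entry per use. A user that references the value twice (for example a
// phi with two incoming edges carrying the same value) appears twice.
using UseList = std::vector<int>;

// Exactly one use in total.
inline bool hasOneUse(const UseList &Uses) { return Uses.size() == 1; }

// Any number of uses, as long as they all belong to the same user.
inline bool hasOneUser(const UseList &Uses) {
  std::set<int> Users(Uses.begin(), Uses.end());
  return Users.size() == 1;
}
```

A value feeding the same phi from two predecessors fails the first check but passes the second.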
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
There is already a comment on the member and documentation in the
InstCombine contributor guide, but also rename it to add
an additional speed bump.
This replaces some uses of isSafeToSpeculativelyExecute() with
isSafeToSpeculativelyExecuteWithVariableReplaced(), in cases where we
are guarding against operand changes rather than plain speculation.
I believe that this is NFC with the current implementation of the
function (as it only does something different for loads), but this
makes us more defensive against future generalizations.
When we have a `phi` instruction with more than one of its incoming
values being a call to `ucmp` or `scmp`, which is then compared with an
integer constant, we can move the comparison through the `phi` into the
incoming basic blocks because we know that a comparison of `ucmp`/`scmp`
with a constant will be simplified by the next iteration of InstCombine.
There's a high chance that other similar patterns can be identified, in
which case they can be easily handled by the same code by moving the
check for "simplifiable" instructions into a lambda.
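To see why the comparison is cheap once it sits next to the `scmp`, here is a small model in plain C++. The functions are hypothetical stand-ins: `scmp` mimics the three-way result of the intrinsic, and the two `compare...` functions model the code before and after the comparison is moved through the phi.

```cpp
#include <cassert>

// Three-way signed comparison returning -1, 0, or 1.
inline int scmp(long long A, long long B) { return (A > B) - (A < B); }

// Before: the comparison is applied to the merged value, after the phi.
inline bool compareAfterMerge(bool TakeFirst, long long A, long long B,
                              long long C, long long D) {
  int Merged = TakeFirst ? scmp(A, B) : scmp(C, D); // phi of two scmp results
  return Merged < 0;                                // compare phi with 0
}

// After: the comparison lives in each incoming path, where
// "scmp(a, b) < 0" is simply "a < b".
inline bool compareInPredecessors(bool TakeFirst, long long A, long long B,
                                  long long C, long long D) {
  return TakeFirst ? (A < B) : (C < D);
}
```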
These transforms all perform a variant of (gep (gep p, x), y)
to (gep p, (x + y)). We can preserve both inbounds and nuw
during such transforms (https://alive2.llvm.org/ce/z/Stu4cN), but
not nusw, which would require proving that the new add is nsw.
For the constant offset case, I've conservatively retained the
logic that checks for negative intermediate offsets, though I'm
not sure it's still reachable nowadays.
This was modifying the GEP in place, with code to adjust the
inbounds flag. This was correct at the time, but now fails to
account for other GEP flags like nuw, leading to miscompilations.
Remove the special case, and always create a new GEP instruction.
Logic for preserving nuw in the cases where it is valid will be
added in a followup patch.
The foldOpIntoPhi() transform requires all operands to be
phi-translatable. This can be the case either because they are phi nodes
in the same block, or because the operand dominates the block.
Currently, most callers of foldOpIntoPhi() satisfy this pre-condition by
requiring a constant operand, which trivially dominates everything. Only
selects had handling for variable operands.
Move this logic into foldOpIntoPhi(), so things are handled correctly if
other callers are generalized. Also make the implementation a bit more
general by querying the dominator tree.
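A minimal sketch of the pre-condition, assuming a toy CFG described by an immediate-dominator array. `Kind`, `Operand`, `dominates`, and `isPhiTranslatable` are hypothetical names; the sketch also assumes that a non-phi instruction in the phi's own block does not count as dominating that block.

```cpp
#include <cassert>
#include <vector>

// Idom[B] is the immediate dominator of block B; the entry maps to itself.
inline bool dominates(const std::vector<int> &Idom, int A, int B) {
  while (true) {
    if (A == B)
      return true;
    if (Idom[B] == B)
      return false; // reached the entry block without meeting A
    B = Idom[B];
  }
}

enum class Kind { Constant, Phi, Instruction };

struct Operand {
  Kind K;
  int DefBlock; // block defining the operand; ignored for constants
};

// An operand is phi-translatable if it is a constant, a phi in the same
// block as the phi being folded into, or defined in a block that properly
// dominates that block.
inline bool isPhiTranslatable(const std::vector<int> &Idom, const Operand &Op,
                              int PhiBlock) {
  if (Op.K == Kind::Constant)
    return true;
  if (Op.K == Kind::Phi && Op.DefBlock == PhiBlock)
    return true;
  return Op.DefBlock != PhiBlock && dominates(Idom, Op.DefBlock, PhiBlock);
}
```

Constants fall out as the trivial case, which is why callers that only accepted constant operands already satisfied the requirement.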
The op of phi transform wants to prevent moving an operation across a
backedge, as this may lead to an infinite combine loop.
Currently, this is done using isPotentiallyReachable(). The problem with
that is that all blocks inside a loop are reachable from each other.
This means that the op of phi transform is effectively completely
disabled for code inside loops, even when it's not actually operating on
a loop phi (just a phi that happens to be in a loop).
Fix this by explicitly computing the backedges inside the function
instead. Do this via RPOT, which is a bit more efficient than using
FindFunctionBackedges() (which does it without any pre-computed
analyses).
For irreducible cycles, there may be multiple possible choices of
backedge, and this just picks one of them. This is still sufficient to
prevent combine loops.
This also removes the last use of LoopInfo in InstCombine -- I'll drop
the analysis in a followup.
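The backedge computation can be sketched on a plain adjacency list. This is an illustration under assumptions, not the InstCombine code: `Graph`, `postOrder`, and `computeBackEdges` are hypothetical names, and blocks are plain integers. An edge is recorded as a backedge when its target has already been visited in the reverse post-order walk.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Succs[B] lists the successors of block B.
using Graph = std::vector<std::vector<int>>;

// Post-order DFS from BB (recursive for brevity).
inline void postOrder(const Graph &Succs, int BB, std::vector<bool> &Seen,
                      std::vector<int> &Order) {
  Seen[BB] = true;
  for (int S : Succs[BB])
    if (!Seen[S])
      postOrder(Succs, S, Seen, Order);
  Order.push_back(BB);
}

inline std::set<std::pair<int, int>> computeBackEdges(const Graph &Succs,
                                                      int Entry) {
  std::vector<bool> Seen(Succs.size(), false);
  std::vector<int> Order;
  postOrder(Succs, Entry, Seen, Order);

  std::set<std::pair<int, int>> BackEdges;
  std::vector<bool> Visited(Succs.size(), false);
  // Walk blocks in reverse post-order; an edge into an already visited
  // block (including a self-loop) is treated as a backedge.
  for (auto It = Order.rbegin(); It != Order.rend(); ++It) {
    int BB = *It;
    Visited[BB] = true;
    for (int S : Succs[BB])
      if (Visited[S])
        BackEdges.insert(std::make_pair(BB, S));
  }
  return BackEdges;
}
```

For a loop `1 -> 2 -> 1` reached from block 0, only the edge `2 -> 1` is marked, so a phi elsewhere in the loop body is no longer blocked merely because every block in the loop can reach every other.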
This patch replaces all dominated uses of condition with true/false to
improve context-sensitive optimizations. It eliminates a bunch of
branches in llvm-opt-benchmark.
As a side effect, it may introduce new phi nodes in some corner cases.
See the following case:
```
define i1 @test(i1 %cmp, i1 %cond) {
entry:
br i1 %cond, label %bb1, label %bb2
bb1:
br i1 %cmp, label %if.then, label %if.else
if.then:
br label %bb2
if.else:
br label %bb2
bb2:
%res = phi i1 [%cmp, %entry], [%cmp, %if.then], [%cmp, %if.else]
ret i1 %res
}
```
It will be simplified into:
```
define i1 @test(i1 %cmp, i1 %cond) {
entry:
br i1 %cond, label %bb1, label %bb2
bb1:
br i1 %cmp, label %if.then, label %if.else
if.then:
br label %bb2
if.else:
br label %bb2
bb2:
%res = phi i1 [%cmp, %entry], [true, %if.then], [false, %if.else]
ret i1 %res
}
```
I am planning to fix this in the late pipeline/CGP, since this problem
already existed before this patch.
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.
This works towards the TODO comment to remove `raw_string_ostream::str()`.
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...
`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
This is a small canonicalization for `gep i32, p, (mul x, C)` -> `gep
i8, p, (mul x, C*4)`, so that the mul can combine both of the constant
multiplications, and we take a small step towards canonicalizing more
geps to i8.
It currently doesn't attempt to check for multiple uses on the mul, but
that should be possible if it sounds better. Let me know what you think
of the idea in general.
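The arithmetic behind the canonicalization, as a small check in plain C++. The two functions are hypothetical and model only the byte offset each form computes, assuming i32 is 4 bytes and ignoring overflow.

```cpp
#include <cassert>
#include <cstdint>

// Byte offset of `gep i32, p, (mul x, C)`: the index is scaled by the
// 4-byte element size after the multiplication.
inline std::int64_t offsetI32Gep(std::int64_t X, std::int64_t C) {
  std::int64_t Index = X * C;
  return Index * 4;
}

// Byte offset of `gep i8, p, (mul x, C*4)`: the scale is folded into the
// constant, leaving a single multiplication.
inline std::int64_t offsetI8Gep(std::int64_t X, std::int64_t C) {
  return X * (C * 4);
}
```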
Add overloads of GetElementPtrInst::Create() that accept
GEPNoWrapFlags, and switch the bool parameters in IRBuilder to
accept it instead as well.
As a sample use, switch GEP i8 canonicalization in InstCombine to
preserve the original flags.
This preserves the flags if a constexpr GEP is created (at least
as long as they don't get dropped later -- the test case uses a
constexpr index to avoid that).
Use ICmpInst::compare() where possible, ConstantFoldCompareInstOperands
in other places. This only changes places where either the fold is
guaranteed to succeed, or the code doesn't use the resulting compare if
we fail to fold.
Canonicalize getelementptr instructions for scalable vector types into
ptradd representation with an explicit llvm.vscale call. This
representation has better support in BasicAA, which can reason about
llvm.vscale, but not plain scalable GEPs.
This patch moves the following intrinsics out of the experimental
namespace:
* vector.interleave2/deinterleave2
* vector.reverse
* vector.splice
All these intrinsics have existed in LLVM for more than a year now and are
widely used, so they should no longer be considered experimental.
When canonicalizing gep+add into gep+gep we can preserve inbounds if the
add is also nsw and both add operands are non-negative (or both
negative, but I don't think that's practically relevant).
Proof: https://alive2.llvm.org/ce/z/tJLBta
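The condition can be written down as a small predicate. `AddFacts` and `canKeepInBounds` are hypothetical names for this sketch; the facts are assumed to come from whatever analysis the caller has available, and the both-negative case mentioned above is deliberately left out.

```cpp
#include <cassert>

// Facts assumed to be known about `add a, b` when it is used as a GEP index.
struct AddFacts {
  bool IsNSW;          // the add carries the nsw flag
  bool LHSNonNegative; // a is known to be >= 0
  bool RHSNonNegative; // b is known to be >= 0
};

// When splitting `gep inbounds p, (add a, b)` into two GEPs, inbounds is
// kept only if the add cannot wrap and both operands are non-negative.
inline bool canKeepInBounds(bool GepIsInBounds, const AddFacts &Add) {
  return GepIsInBounds && Add.IsNSW && Add.LHSNonNegative &&
         Add.RHSNonNegative;
}
```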