In line with updated shufflevector semantics, this represents the
poison elements rather than undef elements now. This commit is a
pure rename, without any logic changes.
Part of the "RemoveDIs" project to remove debug intrinsics requires
passing block-positions around in iterators rather than as instruction
pointers, allowing some debug-info to reside in BasicBlock::iterator.
This means getInsertionPointAfterDef has to return an iterator, and as
it can return no-instruction that means returning an optional iterator.
This patch changes the signature for getInsertionPtAfterDef and then
patches up the various places that use it to handle the different type.
This would overall be an NFC patch, however in
InstCombinerImpl::freezeOtherUses I've started skipping any debug
intrinsics at the returned insert-position. This should not have any
_meaningful_ effect on the compiler output: at worst it means variable
assignments that are skipped will now cover the freeze instruction and
anything inserted before it, which should be inconsequential.
Sadly: this makes the function signature ugly. This is probably the
ugliest piece of fallout for the "RemoveDIs" work, but it serves the
overall purpose of improving compile times and not allowing `-g` to
affect compiler output, so should be worthwhile in the end.
This is nearly an NFC, the only change is potentially to order that
values are created/names.
Otherwise it is a slight speed boost/simplification to avoid having to
go through the `getFreelyInverted` recursive logic twice to simplify
the extra `not` op.
If we have assosiated attributes i.e `([ret_attrs] (ptrmask (ptrmask
p0, m0), m1))` we should preserve `[ret_attrs]` when combining the two
`llvm.ptrmask`s.
Differential Revision: https://reviews.llvm.org/D156638
We can deduce the former based on the mask / incoming pointer
alignment. We can set the latter based if know the result in non-zero
(this is essentially just caching our analysis result).
Differential Revision: https://reviews.llvm.org/D156636
Currently, we specify that the ptrmask intrinsic allows the mask to have
any size, which will be zero-extended or truncated to the pointer size.
However, what semantics of the specified GEP expansion actually imply is
that the mask is only meaningful up to the pointer type *index* size --
any higher bits of the pointer will always be preserved. In other words,
the mask gets 1-extended from the index size to the pointer size. This
is also the behavior we want for CHERI architectures.
This PR makes two changes:
* It spells out the interaction with the pointer type index size more
explicitly.
* It requires that the mask matches the pointer type index size. The
intention here is to make handling of this intrinsic more robust, to
avoid accidental mix-ups of pointer size and index size in code
generating this intrinsic. If a zero-extend or truncate of the mask is
desired, it should just be done explicitly in IR. This also cuts down on
the amount of testing we have to do, and things transforms needs to
check for.
As far as I can tell, we don't actually support pointers with different
index type size at the SDAG level, so I'm just asserting the sizes match
there for now. Out-of-tree targets using different index sizes may need
to adjust that code.
visitCallInst already looks for fixed width vector extracts where number of
elements in the source and destination types are equal. This patch modifies
the function to also identify scalable extracts which can be removed.
This reverts commit b5743d4798b250506965e07ebab806a3c2d767cc.
This causes some minor compile-time impact. Revert for now, better
to do the change more gradually.
As per my proposal for how to eliminate debug intrinsics [0], for various
places in InstCombine prefer to insert using an instruction iterator rather
than an instruction pointer. This is so that we can eventually pass more
information in the iterator class. These call-sites where I've changed the
spelling are those that necessary to build a stage2clang to produce an
identical binary in the coming no-debug-intrinsics mode.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152543
Fold select (fcmp oeq x, 0), (fmul x, y), x => x
This cleans up a pattern left behind by denormal range checks under
denormals are zero.
The pattern starts out as something like:
x = x < smallest_normal ? x * K : x;
The comparison folds to an == 0 when the denormal mode treats input
denormals as zero. This makes library denormal checks free after
linked into DAZ enabled code.
alive2 is mostly happy with this, but there are some issues. First,
there are many reported failures in some of the negative tests that
happen to trigger some preexisting canonicalize introducing
combine. Second, alive2 is incorrectly asserting that denormals must
be flushed with the DAZ modes. It's allowed to drop a canonicalize.
https://reviews.llvm.org/D157030
This is a better canonical form. fcmp and fabs are more widely
understood and fabs can fold for free into some sources.
Addresses todo from D146170
https://reviews.llvm.org/D159084
`@llvm.ptrmask` is basically just `and` with a `ptr` operand. This is
a trivial combine to do with `and` (many others could also be added).
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D154006
cttz(-a & a) is the same as cttz(a). -a & a is an idiom to extract
the lowest set bit, which naturally does not affect the number of
trailing zeroes.
Proof: https://alive2.llvm.org/ce/z/Yp26x7
The problem here is overflow or underflow which would have occurred in
the inner operation, which the exponent offsetting avoids. We can do
this if we know the two exponents are in the same direction, or
reassoc flags allow unsafe reassociates.
Move `AttributeMask` out of `llvm/IR/Attributes.h` to a new file
`llvm/IR/AttributeMask.h`. After doing this we can remove the
`#include <bitset>` and `#include <set>` directives from `Attributes.h`.
Since there are many headers including `Attributes.h`, but not needing
the definition of `AttributeMask`, this causes unnecessary bloating of
the translation units and slows down compilation.
This commit adds in the include directive for `llvm/IR/AttributeMask.h`
to the handful of source files that need to see the definition.
This reduces the total number of preprocessing tokens across the LLVM
source files in lib from (roughly) 1,917,509,187 to 1,902,982,273 - a
reduction of ~0.76%. This should result in a small improvement in
compilation time.
Differential Revision: https://reviews.llvm.org/D153728
The inserted instructions can usually be simplified. Make sure this
happens in the same InstCombine iteration by adding them to the
worklist.
We happen to get some better optimization in two cases, but this is
just a lucky accident. https://github.com/llvm/llvm-project/issues/63472
tracks implementing a fold for that case.
This doesn't track all inserted instructions yet, for that we would
also have to include those created by ObjectSizeOffsetEvaluator.
The behaviors of violating assume instruction or !nonnull metadata is
different. The former is immediate undefined behavior, but the latter is
returning poison value. This patch adds !noundef to trigger immediate
undefined behavior if !nonnull is violated.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D153400
The ORE argument threaded through ValueTracking is used only in a
single, untested place. It is also essentially never passed: The
only places that do so have been added very recently as part of the
KnownFPClass migration, which is vanishingly unlikely to hit this
code path. Remove this effectively dead argument.
Differential Revision: https://reviews.llvm.org/D151562
This patch utilizes the helper function implemented in D149699 and thus folds the following cases:
```
bitreverse(logic_op(x, bitreverse(y))) -> logic_op(bitreverse(x), y)
bitreverse(logic_op(bitreverse(x), y)) -> logic_op(x, bitreverse(y))
bitreverse(logic_op(bitreverse(x), bitreverse(y))) -> logic_op(x, y) in multiuse case
```
Reviewed By: goldstein.w.n, RKSimon
Differential Revision: https://reviews.llvm.org/D151246
Issue was an assertion failure due to an unchecked `cast`. Fix is to
check the operator is `BinaryOperator` before cast so that we won't
match `ConstExpr`
Reviewed By: goldstein.w.n, RKSimon
Differential Revision: https://reviews.llvm.org/D149699
The generic cast to `BinaryOperator` can break if `V` is not a
`BinaryOperator` (i.e a `ConstantExpr`). This occurs in things like
PPC linux build.
This reverts commit fe733f54da6faca95070b36b1640dbca3e43d396.
The patch implements a helper function that matches and fold the following cases in the InstCombine pass:
bswap(logic_op(x, bswap(y))) -> logic_op(bswap(x), y)
bswap(logic_op(bswap(x), y)) -> logic_op(x, bswap(y))
bswap(logic_op(bswap(x), bswap(y))) -> logic_op(x, y) in multiuse case, which still reduces the number of instructions.
The helper function accepts bswap and bitreverse intrinsics. This patch folds the bswap cases and remain the bitreverse optimization for the future
Differential Revision: https://reviews.llvm.org/D149699
Following the change in shufflevector semantics,
poison will be used to represent undefined elements in shufflevector masks.
Differential Revision: https://reviews.llvm.org/D149256
The various isKnownNever* calls can be merged into one. This also introduces
the new ability to remove zero/sub/normal checks. Also start passing the
AssumptionCache arguments.