Same we way mark a path unreachable if it may cause a nullptr
dereference, div/rem by zero or signed div/rem of INT_MIN by -1 cause
immediate UB.
Closes#109008
Add a new getSCEVExprForVPValue utility which can be used to get a SCEV
expression for a VPValue. The initial implementation only returns SCEVs
for live-in IR values (by constructing a SCEV based on the live-in IR
value) and VPExpandSCEVRecipe. This is enough to serve its first use,
getting a SCEV for a VPlan's trip count, but will be extended in the
future.
It also removes createTripCountSCEV, as the new helper can be used to
retrieve the SCEV from the VPlan.
PR: https://github.com/llvm/llvm-project/pull/94464
This attempts to fix the ASan buildbot, which is detecting that CI is used
after it is removed in substituteInParent. The idea was to make sure it was
removed even if it had side-effects writing errno, but that appears to happen
if we return FRem directly as usual.
The motivation of this patch is to fold more generalized patterns like
`icmp ult (shl nuw 16, X), 64 -> icmp ult X, 2`.
Alive2: https://alive2.llvm.org/ce/z/gyqjQH
fmod will be folded to frem in clang under -fno-math-errno and can be constant
folded in llvm if the operands are known. It can be relatively common to have
fp code that handles special values before doing some calculation:
```
if (isnan(f))
return handlenan;
if (isinf(f))
return handleinf;
..
fmod(f, 2.0)
```
This patch enables the folding of fmod to frem in instcombine if the first
parameter is not inf and the second is not zero. Other combinations do not set
errno.
The same transform is performed for fmod with the nnan flag, which implies the
input is known to not be inf/zero.
In current visitPHINode function during InstCombine, it can remove dead
phi cycles (all phis have one use, which is another phi). However, it
cannot deal with the case when the phis form a web (all phis have one or
more uses, and all the uses are phi). This change extends the algorithm
so that it can also deal with the dead phi web.
The assertion that all out-edges of a BB can't be 0 is incorrect: they
can be, if that branch is on a cold subgraph.
Added validators and asserts about the expected proprerties of the
propagated counters.
This patch adds the basic API for the Legality component of the
vectorizer. It also adds some very basic code in the bottom-up
vectorizer that uses the API.
A region identifies a set of vector instructions generated by
vectorization passes. The vectorizer can then run a series of
RegionPasses on the region, evaluate the cost, and commit/reject the
transforms on a region-by-region basis, instead of an entire basic
block.
This is heavily based ov @vporpo's prototype. In particular, the doc
comment for the Region class is all his. The rest of this commit is
mostly boilerplate around a SetVector: getters, iterators, and some
debug helpers.
In some cases, if an undef value is the product of another instcombine
simplification, a bitcast of undef is simplified to a zeroinitializer
vector instead of undef.
Since we are using the Scalarizer pass in the backend we needed a way to
allow this pass to operate on Target intrinsics.
We achieved this by adding `TargetTransformInfo ` to the Scalarizer
pass. This allowed us to call a function available to the DirectX
backend to know if an intrinsic is a target intrinsic that should be
scalarized.
The sort used the block name as a tie-breaker, which will not work for
unnamed blocks and can result in a strict weak ordering violation.
Fix this by checking that all exiting blocks dominate the latch first,
which means that we have a total dominance order. This makes the code
structure here align with what optimizeLoopExits() does.
Fixes https://github.com/llvm/llvm-project/issues/108618.
I've renamed the variable in replaceVPBBWithIRVPBB from IRMiddleVPBB ->
IRVPBB, since the function is used for more than just replacing the
middle VP block.
The "clustered" nodes for buildvector nodes must be postponed in
accordance with the global flag, otherwise it may cause crash because of
the dependency between phi nodes.
If the operand node for truncs is not created during construction, but
one of the previous ones is reused instead, need to correctly identify
its index, to correctly emit the code.
Fixes https://github.com/llvm/llvm-project/issues/108700
During inter-procedural SCCP, also infer attributes on arguments, not
just return values. This allows other non-interprocedural passes to make
use of the information later.
We already pass a Type object into the VPTypeAnalysis constructor, which
can be used to obtain the context. While in the same area it also made
sense to avoid passing the context into the VPTransformState and
VPCostContext constructors.
Consider the following case:
```
define <2 x i32> @test(<2 x i64> %vec.ind16, <2 x i32> %broadcast.splat20) {
%19 = icmp eq <2 x i64> %vec.ind16, zeroinitializer
%20 = zext <2 x i1> %19 to <2 x i32>
%21 = lshr <2 x i32> %20, %broadcast.splat20
ret <2 x i32> %21
}
```
After https://github.com/llvm/llvm-project/pull/104606, we shrink the
lshr into:
```
define <2 x i32> @test(<2 x i64> %vec.ind16, <2 x i32> %broadcast.splat20) {
%1 = icmp eq <2 x i64> %vec.ind16, zeroinitializer
%2 = trunc <2 x i32> %broadcast.splat20 to <2 x i1>
%3 = lshr <2 x i1> %1, %2
%4 = zext <2 x i1> %3 to <2 x i32>
ret <2 x i32> %4
}
```
It is incorrect since `lshr i1 X, 1` returns `poison`.
This patch adds additional check on the shamt operand. The lshr will get
shrunk iff we ensure that the shamt is less than bitwidth of the smaller
type. As `computeKnownBits(&I, *DL).countMaxActiveBits() > BW` always
evaluates to true for `lshr(zext(X), Y)`, this check will only apply to
bitwise logical instructions.
Alive2: https://alive2.llvm.org/ce/z/j_RmTa
Fixes https://github.com/llvm/llvm-project/issues/108698.
alive2: ~~https://alive2.llvm.org/ce/z/Ag3Ki7~~https://alive2.llvm.org/ce/z/ywP5t2
related: #76438
This patch adds the following foldings: `floor(x) <= x --> true` and `x
<= ceil(x) --> true`. We leverage the properties of these math functions
and ensure there is no floating point input of `nan`.
---------
Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
Add a new VPIRInstruction recipe to wrap existing IR instructions not to
be modified during execution, execept for PHIs. For PHIs, a single
VPValue
operand is allowed, and it is used to add a new incoming value for the
single predecessor VPBB. Expect PHIs, VPIRInstructions cannot have any
operands.
Depends on https://github.com/llvm/llvm-project/pull/100658.
PR: https://github.com/llvm/llvm-project/pull/100735
This reverts commit 524a028f69cdf25503912c396ebda7ebf0065ed2, but
manually so that follow on PR108086 /
ae5f1a78d3a930466f927989faac8e0b9d820a7b
is retained (NFC patch to convert tuple to a struct).
Before trying to include the scalar into the list of
ExternallyUsedValues, need to check, if it was transformed in previous
iteration and use the transformed value, not the original one, to avoid
compiler crash when building external uses.
Fixes https://github.com/llvm/llvm-project/issues/108620
CoroSplit has a heuristic where the scope line for funclets is adjusted
to match the line of the suspend intrinsic that caused the split. This
is useful as it avoids a jump on the line table from the original
function declaration to the line where the split happens.
However, very often using the line of the split is not ideal: if we can
avoid it, we should not have a line entry for the split location, as
this would cause breakpoints by line to match against two functions: the
funclet before and the funclet after the split.
This patch adjusts the heuristics to look for the first instruction with
a non-zero line number after the split. In other words, this patch makes
breakpoints on `await foo()` lines behave much more like a regular
function call.
Extend VPBuilder to allow creating VPDerivedIVRecipe, VPScalarCastRecipe
and VPScalarIVStepsRecipe.
Use them to simplify the code to create scalar IV steps slightly.
This patch adds support for a user-defined pass-pipeline that overrides
the default pipeline of the vectorizer.
This will commonly be used by lit tests.
When we have a `phi` instruction with more than one of its incoming
values being a call to `ucmp` or `scmp`, which is then compared with an
integer constant, we can move the comparison through the `phi` into the
incoming basic blocks because we know that a comparison of `ucmp`/`scmp`
with a constant will be simplified by the next iteration of InstCombine.
There's a high chance that other similar patterns can be identified, in
which case they can be easily handled by the same code by moving the
check for "simplifiable" instructions into a lambda.
This patch implements a new empty pass for the Bottom-up vectorizer and
creates a pass pipeline that includes it.
The SandboxVectorizer LLVM pass runs the Sandbox IR pass pipeline.
* To create custom ABIs plugin libraries need access to CoroShape.
* As a step in enabling plugin libraries, move Shape into its own header
* The header will eventually be moved into include/llvm/Transforms/Coroutines
See RFC for more info:
https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057