The "clustered" nodes for buildvector nodes must be postponed in
accordance with the global flag, otherwise it may cause crash because of
the dependency between phi nodes.
If the operand node for truncs is not created during construction, but
one of the previous ones is reused instead, need to correctly identify
its index, to correctly emit the code.
Fixes https://github.com/llvm/llvm-project/issues/108700
During inter-procedural SCCP, also infer attributes on arguments, not
just return values. This allows other non-interprocedural passes to make
use of the information later.
We already pass a Type object into the VPTypeAnalysis constructor, which
can be used to obtain the context. While in the same area it also made
sense to avoid passing the context into the VPTransformState and
VPCostContext constructors.
Consider the following case:
```
define <2 x i32> @test(<2 x i64> %vec.ind16, <2 x i32> %broadcast.splat20) {
%19 = icmp eq <2 x i64> %vec.ind16, zeroinitializer
%20 = zext <2 x i1> %19 to <2 x i32>
%21 = lshr <2 x i32> %20, %broadcast.splat20
ret <2 x i32> %21
}
```
After https://github.com/llvm/llvm-project/pull/104606, we shrink the
lshr into:
```
define <2 x i32> @test(<2 x i64> %vec.ind16, <2 x i32> %broadcast.splat20) {
%1 = icmp eq <2 x i64> %vec.ind16, zeroinitializer
%2 = trunc <2 x i32> %broadcast.splat20 to <2 x i1>
%3 = lshr <2 x i1> %1, %2
%4 = zext <2 x i1> %3 to <2 x i32>
ret <2 x i32> %4
}
```
It is incorrect since `lshr i1 X, 1` returns `poison`.
This patch adds additional check on the shamt operand. The lshr will get
shrunk iff we ensure that the shamt is less than bitwidth of the smaller
type. As `computeKnownBits(&I, *DL).countMaxActiveBits() > BW` always
evaluates to true for `lshr(zext(X), Y)`, this check will only apply to
bitwise logical instructions.
Alive2: https://alive2.llvm.org/ce/z/j_RmTa
Fixes https://github.com/llvm/llvm-project/issues/108698.
alive2: ~~https://alive2.llvm.org/ce/z/Ag3Ki7~~https://alive2.llvm.org/ce/z/ywP5t2
related: #76438
This patch adds the following foldings: `floor(x) <= x --> true` and `x
<= ceil(x) --> true`. We leverage the properties of these math functions
and ensure there is no floating point input of `nan`.
---------
Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
Add a new VPIRInstruction recipe to wrap existing IR instructions not to
be modified during execution, execept for PHIs. For PHIs, a single
VPValue
operand is allowed, and it is used to add a new incoming value for the
single predecessor VPBB. Expect PHIs, VPIRInstructions cannot have any
operands.
Depends on https://github.com/llvm/llvm-project/pull/100658.
PR: https://github.com/llvm/llvm-project/pull/100735
This reverts commit 524a028f69cdf25503912c396ebda7ebf0065ed2, but
manually so that follow on PR108086 /
ae5f1a78d3a930466f927989faac8e0b9d820a7b
is retained (NFC patch to convert tuple to a struct).
Before trying to include the scalar into the list of
ExternallyUsedValues, need to check, if it was transformed in previous
iteration and use the transformed value, not the original one, to avoid
compiler crash when building external uses.
Fixes https://github.com/llvm/llvm-project/issues/108620
CoroSplit has a heuristic where the scope line for funclets is adjusted
to match the line of the suspend intrinsic that caused the split. This
is useful as it avoids a jump on the line table from the original
function declaration to the line where the split happens.
However, very often using the line of the split is not ideal: if we can
avoid it, we should not have a line entry for the split location, as
this would cause breakpoints by line to match against two functions: the
funclet before and the funclet after the split.
This patch adjusts the heuristics to look for the first instruction with
a non-zero line number after the split. In other words, this patch makes
breakpoints on `await foo()` lines behave much more like a regular
function call.
Extend VPBuilder to allow creating VPDerivedIVRecipe, VPScalarCastRecipe
and VPScalarIVStepsRecipe.
Use them to simplify the code to create scalar IV steps slightly.
This patch adds support for a user-defined pass-pipeline that overrides
the default pipeline of the vectorizer.
This will commonly be used by lit tests.
When we have a `phi` instruction with more than one of its incoming
values being a call to `ucmp` or `scmp`, which is then compared with an
integer constant, we can move the comparison through the `phi` into the
incoming basic blocks because we know that a comparison of `ucmp`/`scmp`
with a constant will be simplified by the next iteration of InstCombine.
There's a high chance that other similar patterns can be identified, in
which case they can be easily handled by the same code by moving the
check for "simplifiable" instructions into a lambda.
This patch implements a new empty pass for the Bottom-up vectorizer and
creates a pass pipeline that includes it.
The SandboxVectorizer LLVM pass runs the Sandbox IR pass pipeline.
* To create custom ABIs plugin libraries need access to CoroShape.
* As a step in enabling plugin libraries, move Shape into its own header
* The header will eventually be moved into include/llvm/Transforms/Coroutines
See RFC for more info:
https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057
Whilst trying to write some VPlan unit tests I realised
that we don't need to pass a ScalarEvolution object into
VPlanTransforms::optimize because the only thing we
actually need is a LLVMContext.
These transforms all perform a variant of (gep (gep p, x), y)
to (gep p, (x + y)). We can preserve both inbounds and nuw
during such transforms (https://alive2.llvm.org/ce/z/Stu4cN), but
not nusw, which would require proving that the new add is nsw.
For the constant offset case, I've conservatively retained the
logic that checks for negative intermediate offsets, though I'm
not sure it's still reachable nowadays.
This was modifying the GEP in place, with code to adjust the
inbounds flag. This was correct at the time, but now fails to
account for other GEP flags like nuw, leading to miscompilations.
Remove the special case, and always create a new GEP instruction.
Logic for preserving nuw in the cases where it is valid will be
added in a followup patch.
Need to select the max of CommonMask and V1 Mask size to correctly
perform reshuffling of the vectors, otherwise incorrect result is
generated.
Fixes https://github.com/llvm/llvm-project/issues/108421
As discussed in this
[proposal](https://github.com/llvm/wg-hlsl/pull/62/files?short_path=ac6e592#diff-ac6e59276afe8016e307eedc5c835f534c0cb353707760b44df0fa9d905a5cf8).
We had to bring back the legacy pass manager interface for the
scalarizer pass. Two reasons for this:
1. The DirectX backend is still using the legacy pass manager
2. The new PM isn't hooked up in clang yet via `BackendUtil.cpp`'s
`AddEmitPasses` That means even if we add a `buildCodeGenPipeline` we
won't be able to benefit from the new pass manager's scalarizer pass
interface.
The remaining changes are hooking up the scalarizer pass to the DirectX
backend, updating the DirectX test cases,
and allowing the `optdriver` to not block the legacy invocation of the
scalarizer pass.
Future work still needs to be done to allow the scalarizer pass to
handle target specific intrinsics.
closes#105178
* Move materialization out of CoroFrame to MaterializationUtils.h
* Move spill related utilities that were used by materialization to
SpillUtils
* Move isSuspendBlock (needed by materialization) to CoroInternal
See RFC for more info:
https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057
* Add asserts to verify normalization of the coroutine happened.
* This will be important when normalization becomes part of an ABI
object.
--- From a previous discussion here
https://github.com/llvm/llvm-project/pull/108076
Normalization performs these important steps:
split around each suspend, this adds BBs before/after each suspend so
each suspend now exists in its own block.
split around coro.end (similar to above)
break critical edges and add single-edge phis in the new blocks (also
removing other single-edge phis).
Each of these things can individually be tested
A) Check that each suspend is the only inst in its BB
B) Check that coro.end is the only inst in its BB
C) Check that each edge of a multi-edge phis is preceded by single-edge
phi in an immediate pred
For 1) and 2) I believe the purpose of the transform is in part for
suspend crossing info's analysis so it can specifically 'mark' the
suspend blocks and identify the end of the coroutine. There are some
existing places within suspend crossing info that visit the CoroSuspends
and CoroEnds so we could check A) and B) there.
For 3) I believe the purpose of this transform is for insertSpills to
work properly. Infact there is already a check for the result of this
transform!
assert(PN->getNumIncomingValues() == 1 &&
"unexpected number of incoming "
"values in the PHINode");
I think to verify the result of normalization we just need to add checks
A) and B) to suspend crossing info.
Fixes https://github.com/llvm/llvm-project/issues/107139.
We weren't updating the call graph properly in CoroSplit. This crash is
due to the await_suspend() function calling the coroutine, forming a
multi-node SCC. The issue bisected to
https://github.com/llvm/llvm-project/pull/79712 but I think this is red
herring. We haven't been properly updating the call graph.
Added an example of such code as a test case.
- the set used for targets under a callsite is simpler to use if iterators
are stable (it gets manipulated during updates)
- the set used to fetch the transitive closure of GUIDs under a node can
be left as a choice to the user.
At the moment, the full cost of all interleave group members is assigned
to the instruction at the group's insert position, even if the decision
was to not form an interleave group.
This can lead to inaccurate cost estimates, e.g. if the instruction at
the insert position is dead. If the decision is to not vectorize but
scalarize or scather/gather, then the cost will be to total cost for all
members. In those cases, assign individual the cost per member, to more
closely reflect to choice per instruction.
This fixes a divergence between legacy and VPlan-based cost model.
Fixes https://github.com/llvm/llvm-project/issues/108098.
The check for IV increments in collectUsersInEntryBlock currently
triggers for exit-block PHIs which use the IV start value, resulting in
us failing to add the input value for the middle block to these PHIs.
Fix this by amending the check for IV increments to only include
incoming values that are instructions inside the loop.
Fixes#108004