Prevent LowerFunnelShift from creating an invalid ISD::FSHR when
lowering "ISD::FSHL X, Y, 0". Such inputs are rare because the shift is a
no-op that DAGCombiner will normally optimise away. However, we should not
rely on that, so this PR mirrors the same optimisation in LowerFunnelShift.
Ensure LowerFunnelShift normalises constant shift amounts because isel
rules expect them to be in the range [0, src bit length).
NOTE: To simplify testing, this PR also adds a command line option to
disable the DAG combiner (-combiner-disabled).
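For illustration only (not part of the patch), a minimal standalone C++ model of the two facts the lowering relies on: a funnel shift by zero is the identity on the first operand, and shift amounts behave modulo the bit width, so constants can be normalised into [0, bit width).
```cpp
#include <cassert>
#include <cstdint>

// Reference semantics of a 32-bit FSHL: shift the concatenation x:y left by
// (amt % 32) and keep the high 32 bits.
static uint32_t fshl32(uint32_t x, uint32_t y, uint32_t amt) {
  amt %= 32;          // shift amounts are taken modulo the bit width
  if (amt == 0)
    return x;         // FSHL X, Y, 0 is a no-op returning X; also avoids y >> 32
  return (x << amt) | (y >> (32 - amt));
}

int main() {
  assert(fshl32(0xDEADBEEF, 0x12345678, 0) == 0xDEADBEEF);
  // An out-of-range constant behaves like its normalised value.
  assert(fshl32(1, 0x80000000u, 33) == fshl32(1, 0x80000000u, 1));
  return 0;
}
```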
Fold `(srl (lop x, (shl (zext y), c1)), c1) -> (lop (srl x, c1), (zext y))` where c1 <= leadingzeros(zext(y)).
This is equivalent to the existing fold chain `(srl (shl (zext y), c1), c1) -> (and (zext y), mask) -> (zext y)`, but the logical op in the middle prevents it from combining.
Profit: reduces the number of instructions.
Original commit: #138290 / bbc5221
Previously reverted due to a conflict in a LIT test: mainline changed the
default load instruction to the untyped version in #137698.
The updated test uses `ld.param.b64` instead of `ld.param.u64`.
---------
Signed-off-by: Alexander Peskov <apeskov@nvidia.com>
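For illustration only (not part of the patch), a standalone C++ check of the underlying identity for the 'or' case with a 32-to-64-bit zext and c1 = 16; 'and' and 'xor' distribute over the shift in the same way.
```cpp
#include <cassert>
#include <cstdint>

int main() {
  const unsigned c1 = 16;                 // c1 <= 32 = leading zeros of zext(y)
  uint64_t x = 0x0123456789ABCDEFull;
  uint32_t y = 0xCAFEBABEu;

  uint64_t zy  = static_cast<uint64_t>(y); // zext y
  uint64_t lhs = (x | (zy << c1)) >> c1;   // (srl (or x, (shl (zext y), c1)), c1)
  uint64_t rhs = (x >> c1) | zy;           // (or (srl x, c1), (zext y))
  assert(lhs == rhs);
  return 0;
}
```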
Follow-up to 6e654caab: use the new routines in more places. Note that
I've excluded from this patch any case which uses a getConstant index
instead of a getVectorIdxConstant index, just to minimize room for
error. I'll get those in a separate follow-up.
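For reference, a hedged sketch of the kind of change involved, using the LLVM SelectionDAG API; the helper function below is hypothetical and not taken from the patch.
```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Hypothetical helper: build an EXTRACT_SUBVECTOR with the dedicated index
// helper rather than a hard-coded getConstant index type.
static SDValue buildExtractSubvector(SelectionDAG &DAG, const SDLoc &DL,
                                     EVT SubVT, SDValue Vec, uint64_t Idx) {
  // Instead of e.g. DAG.getConstant(Idx, DL, MVT::i64), let the helper pick
  // the target's preferred vector index type.
  SDValue IdxVal = DAG.getVectorIdxConstant(Idx, DL);
  return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, SubVT, Vec, IdxVal);
}
```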
Generic DAG combine for ISD::PARTIAL_REDUCE_U/SMLA to convert:
PARTIAL_REDUCE_*MLA(Acc, ZEXT(UnextOp1), Splat(1)) into
PARTIAL_REDUCE_UMLA(Acc, UnextOp1, TRUNC(Splat(1))) and
PARTIAL_REDUCE_*MLA(Acc, SEXT(UnextOp1), Splat(1)) into
PARTIAL_REDUCE_SMLA(Acc, UnextOp1, TRUNC(Splat(1))).
---------
Co-authored-by: James Chesterman <james.chesterman@arm.com>
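A hedged sketch of the shape of this combine using the SelectionDAG API; the function name is hypothetical, legality and profitability checks are omitted, and the splat of 1 is simply rebuilt in the narrower type rather than explicitly truncated.
```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Hypothetical sketch, not the actual patch.
static SDValue combinePartialReduceMLA(SDNode *N, SelectionDAG &DAG) {
  SDValue Acc = N->getOperand(0);
  SDValue Op1 = N->getOperand(1);
  SDValue Op2 = N->getOperand(2);
  SDLoc DL(N);

  bool IsSExt = Op1.getOpcode() == ISD::SIGN_EXTEND;
  bool IsZExt = Op1.getOpcode() == ISD::ZERO_EXTEND;
  // Only handle *MLA(Acc, EXT(UnextOp1), Splat(1)).
  if ((!IsSExt && !IsZExt) || !isOneOrOneSplat(Op2))
    return SDValue();

  SDValue UnextOp1 = Op1.getOperand(0);
  // Rebuild the splat of 1 (conceptually TRUNC(Splat(1))) in the narrower
  // element type of the unextended operand.
  SDValue NarrowOne = DAG.getConstant(1, DL, UnextOp1.getValueType());
  unsigned Opc = IsSExt ? ISD::PARTIAL_REDUCE_SMLA : ISD::PARTIAL_REDUCE_UMLA;
  return DAG.getNode(Opc, DL, N->getValueType(0), Acc, UnextOp1, NarrowOne);
}
```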
Call with the individual subvector type, source vector and index operands instead of the original EXTRACT_SUBVECTOR node.
Some of these folds still assumed that EXTRACT_SUBVECTOR/INSERT_SUBVECTOR nodes could have variable indices, despite us moving to all constant indices some time ago - all of that code has now been simplified.
I've moved the narrowExtractedVectorBinOp call higher up, but it won't affect fold order - it didn't rely on the peekThroughBitcasts call, and worked on BinOps, not BUILD_VECTOR/INSERT_SUBVECTOR nodes.
Prep work to make it easier for more of these folds to work through BITCAST nodes.
When floating-point operations are legalized to operations of a higher
precision (e.g. an f16 fadd being legalized to an f32 fadd), we get
narrowing-then-widening casts between consecutive operations. With the
appropriate fast-math flags (nnan ninf contract) we can eliminate these
casts.
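As an analogy only (standard C++, using float->double in place of f16->f32), the pair of functions below shows the narrow-then-widen round trip that legalization introduces between consecutive operations, and the form left once the casts are removed; dropping the intermediate rounding is only sound under fast-math assumptions like those named above.
```cpp
#include <cstdio>

float promoted_per_op(float a, float b, float c) {
  // Each narrow op legalized on its own: widen, operate, narrow ...
  float t = static_cast<float>(static_cast<double>(a) + static_cast<double>(b));
  // ... then immediately widen the narrowed result again for the next op.
  return static_cast<float>(static_cast<double>(t) + static_cast<double>(c));
}

float casts_eliminated(float a, float b, float c) {
  // With the casts eliminated, the whole chain stays in the wider type and
  // only the final result is narrowed.
  double t = static_cast<double>(a) + static_cast<double>(b);
  return static_cast<float>(t + static_cast<double>(c));
}

int main() {
  std::printf("%a %a\n", promoted_per_op(0.1f, 0.2f, 0.3f),
              casts_eliminated(0.1f, 0.2f, 0.3f));
  return 0;
}
```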
Add lowering in TableGen for the PARTIAL_REDUCE_U/SMLA ISD nodes. This only
happens when the combine has been performed on the ISD node. Also add a
check to only do the DAG combine when the node can eventually be lowered,
which changes the NEON tests too.
---------
Co-authored-by: James Chesterman <james.chesterman@arm.com>
Based on feedback for #129695 - we need to be able to determine the
load offset of smaller loads when trying to decide whether a multiple-use
load should be split (in particular for AVX subvector extractions).
This patch adds a std::optional<unsigned> ByteOffset argument to
shouldReduceLoadWidth calls where we know the constant offset, to
allow targets to make use of it in future patches.
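A hedged sketch of the kind of decision the extra ByteOffset enables; the helper below (name, parameters and heuristic) is hypothetical and is not the actual hook signature.
```cpp
#include <optional>

// Hypothetical helper: with the byte offset of the narrowed load known, a
// target can keep a multiple-use wide load intact unless the narrow piece
// starts on a convenient boundary.
static bool shouldSplitWideLoad(unsigned NarrowLoadBytes,
                                std::optional<unsigned> ByteOffset) {
  // Without a known offset, stay conservative and keep the wide load.
  if (!ByteOffset)
    return false;
  // e.g. for AVX subvector extraction, only accept offsets that fall on a
  // NarrowLoadBytes boundary within the original load.
  return *ByteOffset % NarrowLoadBytes == 0;
}

int main() {
  // 16-byte pieces of a 32-byte load at offsets 0 and 16 are fine.
  return (shouldSplitWideLoad(16, 0) && shouldSplitWideLoad(16, 16) &&
          !shouldSplitWideLoad(16, std::nullopt)) ? 0 : 1;
}
```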
DAGCombiner::hoistLogicOpWithSameOpcodeHands will hoist
(or disjoint (ext a), (ext b)) -> (ext (or disjoint a, b)),
so this adds patterns to match vwadd[u].v{v,x} in this case.
We have to teach the combine to preserve the disjoint flag.
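For illustration only (not LLVM code), a standalone check of the hoist and of why preserving the disjoint flag matters: with no common set bits, 'or' and 'add' agree, so the hoisted (ext (or disjoint a, b)) really is a widening add.
```cpp
#include <cassert>
#include <cstdint>

int main() {
  uint8_t a = 0xF0, b = 0x0F;                     // disjoint bit patterns
  uint32_t hoisted = static_cast<uint32_t>(a) | static_cast<uint32_t>(b);
  uint32_t ext_or  = static_cast<uint32_t>(static_cast<uint8_t>(a | b));
  assert(hoisted == ext_or);                      // (or (ext a), (ext b)) == (ext (or a, b))
  assert(ext_or == static_cast<uint32_t>(a) + static_cast<uint32_t>(b)); // disjoint or == add
  return 0;
}
```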
Rename one signature of getAtomic to getAtomicLoad and pass LoadExtType.
Previously we had to set the extension type after the node was created,
but we don't usually modify SDNodes once they are created. It's possible
the node already existed and has been CSEd. If that happens, modifying
the node may affect the other users. It's therefore safer to add the
extension type at creation so that it is part of the CSE information.
I don't know of any failures related to the current implementation. I
only noticed that it doesn't match how we usually do things.
Fold an AND or OR of two NaN SETCC nodes into a single SETCC where
possible. This optimization already exists in InstCombine, but adding it
here as well can allow for additional folding if more logical operations
are exposed.
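For illustration only (not LLVM code), a standalone check of the underlying identity: an OR of two NaN tests is a single unordered comparison, and an AND of two not-NaN tests is a single ordered comparison.
```cpp
#include <cassert>
#include <cmath>
#include <limits>

int main() {
  double vals[] = {1.0, std::numeric_limits<double>::quiet_NaN()};
  for (double x : vals)
    for (double y : vals) {
      assert((std::isnan(x) || std::isnan(y)) == std::isunordered(x, y));
      assert((!std::isnan(x) && !std::isnan(y)) == !std::isunordered(x, y));
    }
  return 0;
}
```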
Folds patterns such as:
```c
unsigned foo(unsigned x, unsigned y) {
  return x >= y ? x - y : x;
}
```
Before, on RISC-V:
```asm
sltu a2, a0, a1
addi a2, a2, -1
and a1, a1, a2
subw a0, a0, a1
```
Or, with Zicond:
```asm
sltu a2, a0, a1
czero.nez a1, a1, a2
subw a0, a0, a1
```
After, with Zbb:
```asm
subw a1, a0, a1
minu a0, a0, a1
```
Only applies to unsigned comparisons.
If `x >= y` then `x - y` is less than or equal to `x`.
Otherwise, `x - y` wraps and is greater than `x`.
Almost all of the rotate idioms that are valid for an 'or' are also
valid when the halves are combined with an 'add'. Further, many of these
cases are not handled by common-bits tracking, meaning that the 'add' is
not converted to a 'disjoint or'.
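For illustration only (not LLVM code), a standalone check that the 'add' form of the rotate idiom matches the 'or' form whenever the two shifted halves have no overlapping bits.
```cpp
#include <cassert>
#include <cstdint>

static uint32_t rotl_or(uint32_t x, unsigned c) {
  return (x << c) | (x >> (32 - c));
}
static uint32_t rotl_add(uint32_t x, unsigned c) {
  return (x << c) + (x >> (32 - c));   // halves are disjoint, so + == |
}

int main() {
  for (unsigned c = 1; c < 32; ++c)
    assert(rotl_or(0xDEADBEEF, c) == rotl_add(0xDEADBEEF, c));
  return 0;
}
```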
Similar to what is done for visitEXTRACT_VECTOR_ELT - if all uses of a vector are EXTRACT_SUBVECTOR, then determine the accumulated demanded elts across all users and call SimplifyDemandedVectorElts in "AssumeSingleUse" mode.
Second try after #133130, which was reverted by #133331 due to it affecting reverted test files.
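A hedged sketch of the accumulation step using the LLVM API; the helper below is hypothetical, and the caller would then simplify the source vector once via SimplifyDemandedVectorElts with AssumeSingleUse set.
```cpp
#include <cassert>
#include "llvm/ADT/APInt.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// Hypothetical helper: union the lanes demanded by each EXTRACT_SUBVECTOR
// user of Vec.
static APInt accumulateDemandedElts(SDValue Vec,
                                    ArrayRef<const SDNode *> ExtractUsers) {
  unsigned NumElts = Vec.getValueType().getVectorNumElements();
  APInt DemandedElts = APInt::getZero(NumElts);
  for (const SDNode *Extract : ExtractUsers) {
    assert(Extract->getOpcode() == ISD::EXTRACT_SUBVECTOR);
    unsigned SubNumElts = Extract->getValueType(0).getVectorNumElements();
    unsigned Idx = Extract->getConstantOperandVal(1);
    DemandedElts.setBits(Idx, Idx + SubNumElts);
  }
  return DemandedElts;
}
```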
This patch improves DAGCombiner's handling of potential store merges by
detecting function calls between loads and stores. When a function call
exists in the chain between a load and its corresponding store, we avoid
merging these stores if the resulting spilling is unprofitable.
We had to implement a hook on TLI, since TTI is unavailable in
DAGCombine. Currently, it's only enabled for RISC-V.
This is the DAG equivalent of PR #129258
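A hedged sketch of the shape of such a hook; the name, parameters and threshold below are hypothetical and only illustrate the trade-off being queried.
```cpp
// Hypothetical TLI-style query (invented for illustration): should the
// combiner still merge NumStores stores into one MergedBytes-wide store
// when the merged value has to stay live across a call?
static bool mergeStoresAcrossCallProfitable(unsigned NumStores,
                                            unsigned MergedBytes) {
  // Keeping the merged value live across the call costs a spill and a
  // reload; only accept the merge when it removes enough stores to pay
  // for that extra code.
  return NumStores > 2 && MergedBytes <= 16;
}

int main() {
  // Two small stores across a call: keep them separate; four: merge.
  return (!mergeStoresAcrossCallProfitable(2, 8) &&
          mergeStoresAcrossCallProfitable(4, 16)) ? 0 : 1;
}
```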
The pattern
```LLVM
%shl = shl i32 %x, 31
%ashr = ashr i32 %shl, 31
```
would be combined to `SIGN_EXTEND_INREG %x, ValueType:ch:i1` by
SelectionDAG.
However, InstCombine normalizes this pattern to:
```LLVM
%and = and i32 %x, 1
%neg = sub i32 0, %and
```
This adds matching code to DAGCombiner to catch this variant as well.
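For illustration only (not LLVM code), a standalone check that the normalised and/neg form computes the same i1 sign-extension as the original shl/ashr pair.
```cpp
#include <cassert>
#include <cstdint>

int main() {
  for (uint32_t x : {0u, 1u, 2u, 3u, 0xFFFFFFFFu}) {
    int32_t shl_ashr = static_cast<int32_t>(x << 31) >> 31; // sext of bit 0
    int32_t and_neg  = -static_cast<int32_t>(x & 1);        // normalised form
    assert(shl_ashr == and_neg);
  }
  return 0;
}
```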
This option was added to improve test coverage for SVE lowering code
that is impossible to reach otherwise. Given that it is not possible to
trigger a bug without it and the generated code is universally worse
with it, I figure the option has no value and should be removed.
Add generic DAG combine for ISD::PARTIAL_REDUCE_U/SMLA nodes. Transforms
the DAG from:
PARTIAL_REDUCE_MLA(Acc, MUL(EXT(MulOpLHS), EXT(MulOpRHS)), Splat(1)) to
PARTIAL_REDUCE_MLA(Acc, MulOpLHS, MulOpRHS).
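For illustration only (not LLVM code), a small scalar model of the node semantics under an assumed lane grouping, showing why the explicit EXT/MUL can be dropped: the MLA form already extends and multiplies the narrow operands internally, so the products and accumulators are identical.
```cpp
#include <array>
#include <cassert>
#include <cstdint>

int main() {
  std::array<int8_t, 8> a{1, -2, 3, -4, 5, -6, 7, -8};
  std::array<int8_t, 8> b{8, 7, -6, 5, -4, 3, -2, 1};
  std::array<int32_t, 2> acc{100, -100};

  // PARTIAL_REDUCE_MLA(Acc, MUL(EXT(a), EXT(b)), Splat(1)): extend, multiply,
  // scale by the splatted 1, then reduce groups of lanes into Acc.
  // (Lanes i/4 -> acc lane is just one possible grouping, for illustration.)
  auto viaMulExt = acc;
  for (int i = 0; i < 8; ++i)
    viaMulExt[i / 4] += (int32_t{a[i]} * int32_t{b[i]}) * 1;

  // PARTIAL_REDUCE_MLA(Acc, a, b): the node extends and multiplies the
  // narrow operands itself, so the accumulated values are the same.
  auto direct = acc;
  for (int i = 0; i < 8; ++i)
    direct[i / 4] += int32_t{a[i]} * int32_t{b[i]};

  assert(viaMulExt == direct);
  return 0;
}
```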
Since N2 will be reused in the fold, we cannot skip N2's undef elements
if the corresponding element in N1 is well-defined.
For example:
```
t2: v4i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
t24: v4i32 = BUILD_VECTOR undef:i32, undef:i32, Constant:i32<1>, undef:i32
t11: v4i32 = vselect t8, t2, t10
```
Before this patch, we folded t11 into:
```
t26: v4i32 = sign_extend t8
t27: v4i32 = add t26, t24
```
The last element of t27 is incorrect.
Closes https://github.com/llvm/llvm-project/issues/129181.