When a load or store accesses N bytes starting from a pointer P, and we want to
compute an offset pointer within these N bytes after P, we know that the
arithmetic to add the offset must be inbounds. This is for example relevant
when legalizing too-wide memory accesses, when lowering memcpy&Co., or when
optimizing "vector-load -> extractelement" into an offset load.
For SWDEV-516125.
This PR preserves the InBounds flag (#162477) where possible in PTRADD-related
DAGCombines. We can't preserve them in all the cases that we could in the
analogous GISel change (#152495) because SDAG usually represents pointers as
integers, which means that pointer provenance is not preserved between PTRADD
operations (see the discussion at PR #162477 for more details). This PR marks
the places in the DAGCombiner where this is relevant explicitly.
For SWDEV-516125.
For an insertelt with a dynamic index, the default handling in
DAGTypeLegalizer and LegalizeDAG will reserve a stack slot for the
vector, lower the insertelt to a store, then load the modified vector
back into temporaries. The vector store and load may be legalized into a
sequence of smaller operations depending on the target.
Let V = the vector size and L = the length of a chain of insertelts with
dynamic indices. In the worse case, this chain will lower to O(VL)
operations, which can increase code size dramatically.
Instead, identify such chains, reserve one stack slot for the vector,
and lower all of the insertelts to stores at once. This requires only
O(V + L) operations. This change only affects the default lowering
behavior.
[DAG] Fold mismatched widened avg idioms to narrow form (fixes half of
[llvm#147946](https://github.com/llvm/llvm-project/issues/147946))
1. `trunc(avgceilu(sext(x), sext(y))) -> avgceils(x, y)`
2. `trunc(avgceils(zext(x), zext(y))) -> avgceilu(x, y)`
When inputs are sign-extended, unsigned and signed averaging operations
produce identical results after truncation, allowing us to use the
semantically correct narrow operation.
alive2: https://alive2.llvm.org/ce/z/ZRbfHT
We're very careful not to truncate binary arithmetic ops if it will
affect legality, or cause additional truncation instructions, hence we
currently limit this to cases where one operand is constant.
But if both ops are the same (i.e. for some add/mul cases) then we
wouldn't increase the number of truncations, so can be slightly more
aggressive at folding the truncation.
simplifyDivRem does not work as well after type legalisation because
splatted constants can have a size mismatch between the scalar to splat
and the element type of the splatted result. simplifyDivRem does not
seem to care about this mismatch so I've updated the "is one" check
for the divisor to allow truncation.
Reland #161355, after fixing up the cross-projects-tests for the wasm
simd intrinsics.
Original commit message:
Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions.
If we have FP16, then lower v8f16 fmuladds to FMA.
I've introduced an ISD node for fmuladd to maintain the rounding
ambiguity through legalization / combine / isel.
Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions.
If we have FP16, then lower v8f16 fmuladds to FMA.
I've introduced an ISD node for fmuladd to maintain the rounding
ambiguity through legalization / combine / isel.
This patch updates the FP-to-Int conversion handling:
- For signed integers: use `ftrunc` followed by clamping to the target
integer range.
- For unsigned integers: apply `fabs` + `ftrunc`, then clamp.
This removes the previous dependence on `nsz` and ensures correct
lowering for both signed and unsigned cases.
I've tested the code generation of -mtriple=amdgcn. It seems that the
assembly code is expected, but I'm not sure how to write a general
testcase for every target.
Fixes#160623.
Add several improvements to DAGCombine patterns for fmin/fmax:
* Fix incorrect results due to minimumnum not being marked as IsMin
- e.g. nnan minimumnum(x, +inf) returned +inf instead of x
* Fix incorrect results checking maximumnum for vecreduce patterns
* Make maxnum/minnum return QNaN if one input is SNaN instead of X
* Quiet SNaN inputs when propagating them e.g. maximum(x, SNaN) = QNaN
* Update comments to mark when SNaN propagation is being ignored
Remove the `NoSignedZerosFPMath` use in `visitFNEG`. Now the only use of
`NoSignedZerosFPMath` is in `foldFPToIntToFP`, but adding fast-math
flags support for `uitofp` may introduce breaking changes.
If we can't fold a PTRADD's offset into its users, lowering them to
disjoint ORs is preferable: Often, a 32-bit OR instruction suffices
where we'd otherwise use a pair of 32-bit additions with carry.
This needs to be a DAGCombine (and not a selection rule) because its
main purpose is to enable subsequent DAGCombines for bitwise operations.
We don't want to just turn PTRADDs into disjoint ORs whenever that's
sound because this transform loses the information that the operation
implements pointer arithmetic, which AMDGPU for instance needs when
folding constant offsets.
For SWDEV-516125.
This PR adds a TargetLowering hook, canTransformPtrArithOutOfBounds,
that targets can use to allow transformations to introduce out-of-bounds
pointer arithmetic. It also moves two such transformations from the
AMDGPU-specific DAG combines to the generic DAGCombiner.
This is motivated by target features like AArch64's checked pointer
arithmetic, CPA, which does not tolerate the introduction of
out-of-bounds pointer arithmetic.
As reported in https://github.com/llvm/llvm-project/issues/141034
SelectionDAG::getNode had some unexpected
behaviors when trying to create vectors with UNDEF elements. Since
we treat both UNDEF and POISON as undefined (when using isUndef())
we can't just fold away INSERT_VECTOR_ELT/INSERT_SUBVECTOR based on
isUndef(), as that could make the resulting vector more poisonous.
Same kind of bug existed in DAGCombiner::visitINSERT_SUBVECTOR.
Here are some examples:
This fold was done even if vec[idx] was POISON:
INSERT_VECTOR_ELT vec, UNDEF, idx -> vec
This fold was done even if any of vec[idx..idx+size] was POISON:
INSERT_SUBVECTOR vec, UNDEF, idx -> vec
This fold was done even if the elements not extracted from vec could
be POISON:
sub = EXTRACT_SUBVECTOR vec, idx
INSERT_SUBVECTOR UNDEF, sub, idx -> vec
With this patch we avoid such folds unless we can prove that the
result isn't more poisonous when eliminating the insert.
Fixes https://github.com/llvm/llvm-project/issues/141034
Hi, I compared the following LLVM IR with GCC and Clang, and there is a small difference between the two. The LLVM IR is:
```
define i64 @test_smin_neg_one(i64 %a) {
%1 = tail call i64 @llvm.smin.i64(i64 %a, i64 -1)
%retval.0 = xor i64 %1, -1
ret i64 %retval.0
}
```
GCC generates:
```
cmp x0, 0
csinv x0, xzr, x0, ge
ret
```
Clang generates:
```
cmn x0, #1
csinv x8, x0, xzr, lt
mvn x0, x8
ret
```
Clang keeps flipping x0 through x8 unnecessarily.
So I added the following folds to DAGCombiner:
fold (xor (smax(x, C), C)) -> select (x > C), xor(x, C), 0
fold (xor (smin(x, C), C)) -> select (x < C), xor(x, C), 0
alive2: https://alive2.llvm.org/ce/z/gffoir
---------
Co-authored-by: Yui5427 <785369607@qq.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
Checking `isOperationLegalOrCustom` instead of `isOperationLegal` allows
more optimization opportunities. In particular, if a target wants to
mark `extract_vector_elt` as `Custom` rather than `Legal` in order to
optimize some certain cases, this combiner would otherwise miss some
improvements.
Previously, using `isOperationLegalOrCustom` was avoided due to the risk
of getting stuck in infinite loops (as noted in
61ec738b60).
After testing, the issue no longer reproduces, but the coverage is
limited to the regression/unit tests and the test-suite.
Closes https://github.com/llvm/llvm-project/issues/155345.
In the original case, we have one frozen use and two unfrozen uses:
```
t73: i8 = select t81, Constant:i8<0>, t18
t75: i8 = select t10, t18, t73
t59: i8 = freeze t18 (combining)
t80: i8 = freeze t59 (another user of t59)
```
In `DAGCombiner::visitFREEZE`, we replace all uses of `t18` with `t59`.
After updating the uses, `t59: i8 = freeze t18` will be updated to `t59:
i8 = freeze t59` (`AddModifiedNodeToCSEMaps`) and CSEed into `t80: i8 =
freeze t59` (`ReplaceAllUsesWith`). As the previous call to
`AddModifiedNodeToCSEMaps` already removed `t59` from the CSE map,
`ReplaceAllUsesWith` cannot remove `t59` again.
For clarity, see the following call graph:
```
ReplaceAllUsesOfValueWith(t18, t59)
ReplaceAllUsesWith(t18, t59)
RemoveNodeFromCSEMaps(t73)
update t73
AddModifiedNodeToCSEMaps(t73)
RemoveNodeFromCSEMaps(t75)
update t75
AddModifiedNodeToCSEMaps(t75)
RemoveNodeFromCSEMaps(t59) <- first delection
update t59
AddModifiedNodeToCSEMaps(t59)
ReplaceAllUsesWith(t59, t80)
RemoveNodeFromCSEMaps(t59) <- second delection
Boom!
```
This patch unfreezes all the uses first to avoid triggering CSE when
introducing cycles.
If the srl+shl have the same shift amount and the shl has the nuw flag,
we can remove both.
In the affected test, the InterleavedAccess pass will emit a udiv after
the `mul nuw`. We expect them to combine away. The remaining shifts on
the RV64 tests are because we didn't add the zeroext attribute to the
incoming evl operand.
The original version of this change inadvertently dropped
b6e19b35cd87f3167a0f04a61a12016b935ab1ea. This version retains that fix
as well as adding tests for it and an explanation for why it is needed.
Verify it's a multiple of the result vector element count
instead of asserting this in random combines.
The testcase in #153808 fails in the wrong point. Add an
assert to getNode so the invalid extract asserts at construction
instead of use.