4145 Commits

Author SHA1 Message Date
Kazu Hirata
31b8ba5670
[Analysis, CodeGen] Use ArrayRef instead of const ArrayRef (NFC) (#166026)
This patch improves readability by using "ArrayRef<T>" instead of
"const ArrayRef<T>" and "const ArrayRef<T> &" in function parameter
types.
2025-11-01 23:20:19 -07:00
Fabian Ritter
8ea447b4c4
[SDAG] Set InBounds when computing offsets into memory objects (#165425)
When a load or store accesses N bytes starting from a pointer P, and we want to
compute an offset pointer within these N bytes after P, we know that the
arithmetic to add the offset must be inbounds. This is for example relevant
when legalizing too-wide memory accesses, when lowering memcpy&Co., or when
optimizing "vector-load -> extractelement" into an offset load.

For SWDEV-516125.
2025-10-31 11:27:55 +01:00
Fabian Ritter
a85e84b854
[SDAG] Preserve InBounds in DAGCombines (#165424)
This PR preserves the InBounds flag (#162477) where possible in PTRADD-related
DAGCombines. We can't preserve them in all the cases that we could in the
analogous GISel change (#152495) because SDAG usually represents pointers as
integers, which means that pointer provenance is not preserved between PTRADD
operations (see the discussion at PR #162477 for more details). This PR marks
the places in the DAGCombiner where this is relevant explicitly.

For SWDEV-516125.
2025-10-31 10:25:39 +01:00
Princeton Ferro
68e74f8f84
[DAGCombiner] Lower dynamic insertelt chain more efficiently (#162368)
For an insertelt with a dynamic index, the default handling in
DAGTypeLegalizer and LegalizeDAG will reserve a stack slot for the
vector, lower the insertelt to a store, then load the modified vector
back into temporaries. The vector store and load may be legalized into a
sequence of smaller operations depending on the target.

Let V = the vector size and L = the length of a chain of insertelts with
dynamic indices. In the worst case, this chain will lower to O(VL)
operations, which can increase code size dramatically.

Instead, identify such chains, reserve one stack slot for the vector,
and lower all of the insertelts to stores at once. This requires only
O(V + L) operations. This change only affects the default lowering
behavior.
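The O(VL) versus O(V + L) claim can be made concrete with a simplified scalar model of the two lowerings. This is a sketch, not DAGTypeLegalizer or LegalizeDAG code. The vector length `V = 8`, the names `lowerEachInsert` and `lowerChain`, and the rule that every element-sized load or store counts as one operation are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical scalar model of the two lowerings; not LLVM code.
// Every element-sized load or store counts as one operation.
constexpr std::size_t V = 8;                 // vector length (assumed)
using Vec = std::array<int, V>;
using Insert = std::pair<std::size_t, int>;  // (dynamic index, value)

// Default lowering: each insertelt spills the whole vector to a stack slot,
// stores one element, and reloads the whole vector (2V + 1 operations).
Vec lowerEachInsert(Vec vec, const std::vector<Insert>& chain, std::size_t& ops) {
  for (const auto& [idx, val] : chain) {
    assert(idx < V && "out-of-range index would produce poison in IR");
    Vec slot{};
    for (std::size_t i = 0; i < V; ++i) { slot[i] = vec[i]; ++ops; }  // spill
    slot[idx] = val; ++ops;                                           // store elt
    for (std::size_t i = 0; i < V; ++i) { vec[i] = slot[i]; ++ops; }  // reload
  }
  return vec;
}

// Chain lowering: spill once, store every inserted element, reload once
// (2V + L operations for a chain of length L).
Vec lowerChain(const Vec& vec, const std::vector<Insert>& chain, std::size_t& ops) {
  Vec slot{};
  for (std::size_t i = 0; i < V; ++i) { slot[i] = vec[i]; ++ops; }    // spill
  for (const auto& [idx, val] : chain) {
    assert(idx < V && "out-of-range index would produce poison in IR");
    slot[idx] = val; ++ops;                                           // store elt
  }
  Vec out{};
  for (std::size_t i = 0; i < V; ++i) { out[i] = slot[i]; ++ops; }    // reload
  return out;
}
```

For a chain of three inserts, the per-insert model performs 3 × (2V + 1) = 51 operations, and the chained model performs 2V + 3 = 19. Both produce the same vector.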
2025-10-29 09:46:01 -07:00
Lauren
e964acf85f
[DAG] Fold mismatched widened avg idioms to narrow form (#147946) (#163366)
[DAG] Fold mismatched widened avg idioms to narrow form (fixes half of
[llvm#147946](https://github.com/llvm/llvm-project/issues/147946))

1. `trunc(avgceilu(sext(x), sext(y))) -> avgceils(x, y)` 
2. `trunc(avgceils(zext(x), zext(y))) -> avgceilu(x, y)`

When both inputs are extended the same way (both sign-extended or both
zero-extended), the unsigned and signed averaging operations produce
identical results after truncation. This allows us to use the narrow
operation whose signedness matches the extension.

alive2: https://alive2.llvm.org/ce/z/ZRbfHT
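Both folds can also be checked exhaustively outside LLVM. The following C++ sketch is not the DAGCombiner implementation. It assumes that AVGCEILU/AVGCEILS mean the rounded-up average computed without overflow. It widens i8 inputs to i16 and compares both folds over all 65,536 input pairs.

```cpp
#include <cassert>
#include <cstdint>

// Floor division by 2 that does not rely on >> behavior for negative values.
int floorDiv2(int v) { return v >= 0 ? v / 2 : -((-v + 1) / 2); }

// Assumed AVGCEILS semantics: floor((a + b + 1) / 2) computed exactly
// (int is wide enough for i8 and i16 operands).
int avgceils(int a, int b) { return floorDiv2(a + b + 1); }

// Assumed AVGCEILU semantics on i16, computed in 32 bits to avoid overflow.
std::uint16_t avgceilu16(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>((std::uint32_t{a} + b + 1) >> 1);
}

// Exhaustively checks both folds for i8 inputs widened to i16.
bool avgFoldsHold() {
  for (int xi = -128; xi <= 127; ++xi) {
    for (int yi = -128; yi <= 127; ++yi) {
      // 1. trunc(avgceilu(sext(x), sext(y))) -> avgceils(x, y)
      auto sx = static_cast<std::uint16_t>(static_cast<std::int16_t>(xi));
      auto sy = static_cast<std::uint16_t>(static_cast<std::int16_t>(yi));
      auto wide1 = static_cast<std::uint8_t>(avgceilu16(sx, sy));
      auto narrow1 = static_cast<std::uint8_t>(avgceils(xi, yi));
      if (wide1 != narrow1) return false;

      // 2. trunc(avgceils(zext(x), zext(y))) -> avgceilu(x, y)
      unsigned zx = static_cast<std::uint8_t>(xi);
      unsigned zy = static_cast<std::uint8_t>(yi);
      auto wide2 = static_cast<std::uint8_t>(
          avgceils(static_cast<int>(zx), static_cast<int>(zy)));
      auto narrow2 = static_cast<std::uint8_t>((zx + zy + 1) >> 1);
      if (wide2 != narrow2) return false;
    }
  }
  return true;
}
```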
2025-10-27 12:24:41 +00:00
David Green
332f786a35
[DAG][AArch64] Ensure that ResNo is correct for uses of Ptr when considering postinc. (#164810)
We might be looking at a different use, for example in the uses of a
i32,i64,ch preindex load.

Fixes #164775
2025-10-24 11:33:08 +01:00
paperchalice
15d11ebc84
[NFC] "unsafe-fp-math" post cleanup (code comments part) (#164582)
2025-10-22 11:07:23 +00:00
kper
e83eee335c
[DAG] Create SDPatternMatch method m_SelectLike to match ISD::Select and ISD::VSelect (#164069)
Fixes #150019
2025-10-22 09:49:34 +00:00
Simon Pilgrim
f8edcba62d
[DAG] visitTRUNCATE - more aggressively fold trunc(add(x,x)) -> add(trunc(x),trunc(x)) (#164227)
We're very careful not to truncate binary arithmetic ops if it will
affect legality, or cause additional truncation instructions, hence we
currently limit this to cases where one operand is constant.

But if both ops are the same (i.e. for some add/mul cases) then we
wouldn't increase the number of truncations, so can be slightly more
aggressive at folding the truncation.
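Truncation commutes with wrapping add and mul, so this fold never changes the computed value. The constraint described above concerns legality and instruction count. A small exhaustive C++ check of the value identity follows. The i16 to i8 widths are chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Checks trunc(op(x, x)) == op(trunc(x), trunc(x)) for op in {add, mul},
// truncating i16 -> i8 (illustrative widths).
bool truncOfSelfOpFoldsHold() {
  for (std::uint32_t x = 0; x <= 0xFFFF; ++x) {
    auto t = static_cast<std::uint8_t>(x);  // trunc(x)
    // add: the low byte of the wide sum equals the wrapped narrow sum.
    if (static_cast<std::uint8_t>(x + x) != static_cast<std::uint8_t>(t + t))
      return false;
    // mul: computed in 32 bits (no overflow for 16-bit inputs); the low byte
    // is the same as that of the wrapped i16 product.
    if (static_cast<std::uint8_t>(x * x) != static_cast<std::uint8_t>(t * t))
      return false;
  }
  return true;
}
```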
2025-10-21 10:17:57 +00:00
Simon Pilgrim
a51e498ea6
[DAG] combineTruncationShuffle - ensure the *_EXTEND_VECTOR_INREG node didn't come from a smaller type (#164160)
The *_EXTEND_VECTOR_INREG source vector must be the same size as the destination

We already have a similar TODO to handle more types.

Fixes #164107
2025-10-19 14:15:33 +00:00
paperchalice
bfee9db785
[DAGCombiner] Remove NoNaNsFPMath uses (#163504)
Users should use the `nnan` flag instead.
2025-10-15 21:22:13 +08:00
paperchalice
dd44e63c8e
[DAGCombiner] Use FlagInserter in visitFSQRT (#163301)
Propagate fast-math flags for TLI.getSqrtEstimate etc.
2025-10-15 09:03:15 +08:00
Paul Walker
d7fc770340
[LLVM][DAGCombiner] Improve simplifyDivRem's effectiveness after type legalisation. (#162706)
simplifyDivRem does not work as well after type legalisation because
splatted constants can have a size mismatch between the scalar to splat
and the element type of the splatted result. simplifyDivRem does not
seem to care about this mismatch so I've updated the "is one" check
for the divisor to allow truncation.
2025-10-14 11:23:53 +01:00
Sam Parker
1820102167
Wasm fmuladd relaxed (#163177)
Reland #161355, after fixing up the cross-projects-tests for the wasm
simd intrinsics.

Original commit message:
Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions.
If we have FP16, then lower v8f16 fmuladds to FMA.

I've introduced an ISD node for fmuladd to maintain the rounding
ambiguity through legalization / combine / isel.
2025-10-13 16:50:53 +01:00
Sam Parker
30d3441cf0
Revert "[WebAssembly] Lower fmuladd to madd and nmadd" (#163171)
Reverts llvm/llvm-project#161355

Looks like I've broken some intrinsic code generation.
2025-10-13 11:53:40 +01:00
Sam Parker
a4eb7ea225
[WebAssembly] Lower fmuladd to madd and nmadd (#161355)
Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions.
If we have FP16, then lower v8f16 fmuladds to FMA.

I've introduced an ISD node for fmuladd to maintain the rounding
ambiguity through legalization / combine / isel.
2025-10-13 10:36:08 +01:00
paperchalice
a61107472b
[SelectionDAG] Remove NoInfsFPMath uses (#162788)
Users should use fast-math flags instead.
2025-10-12 09:34:24 +08:00
AZero13
d95f8ffee4
[ARM][TargetLowering] Combine Level should not be a factor in shouldFoldConstantShiftPairToMask (NFC) (#156949)
This should be based on the type and the available instructions; only
Thumb uses the combine level anyway.
2025-10-11 10:58:48 +09:00
Yi-Chi Lee
a9c8e94b43
[DAGCombiner] Extend FP-to-Int cast without requiring nsz (#161093)
This patch updates the FP-to-Int conversion handling:
- For signed integers: use `ftrunc` followed by clamping to the target
integer range.
- For unsigned integers: apply `fabs` + `ftrunc`, then clamp.

This removes the previous dependence on `nsz` and ensures correct
lowering for both signed and unsigned cases.

I've tested the code generation with -mtriple=amdgcn. The generated
assembly looks as expected, but I'm not sure how to write a general
testcase for every target.

Fixes #160623.
2025-10-11 00:34:33 +09:00
Lewis Crawford
4c2b1d495a
[DAGCombiner] Improve FMin/FMax DAGCombines (#161352)
Add several improvements to DAGCombine patterns for fmin/fmax:
 * Fix incorrect results due to minimumnum not being marked as IsMin
    - e.g. nnan minimumnum(x, +inf) returned +inf instead of x
 * Fix incorrect results checking maximumnum for vecreduce patterns
 * Make maxnum/minnum return QNaN if one input is SNaN instead of X
 * Quiet SNaN inputs when propagating them e.g. maximum(x, SNaN) = QNaN
 * Update comments to mark when SNaN propagation is being ignored
2025-10-09 18:00:50 +01:00
paperchalice
4967bc17df
[DAGCombiner] Remove NoSignedZerosFPMath in visitFNEG (#162052)
Remove the `NoSignedZerosFPMath` use in `visitFNEG`. Now the only use of
`NoSignedZerosFPMath` is in `foldFPToIntToFP`, but adding fast-math
flags support for `uitofp` may introduce breaking changes.
2025-10-08 17:01:47 +08:00
Yatao Wang
178e2a704b
[LLVM][CodeGen] Check Non Saturate Case in isSaturatingMinMax (#160637)
Fix Issue #160611
2025-10-03 20:39:45 +01:00
Florian Hahn
e86b3386fd
[DAGCombine] Support (shl %x, constant) in foldPartialReduceMLAMulOp. (#160663)
Support shifts in foldPartialReduceMLAMulOp by treating (shl %x, %c) as
(mul %x, (shl 1, %c)).
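This treatment relies on `x << c` being equal to `x * (1 << c)` in wrapping arithmetic. The following exhaustive i8 check in C++ uses an illustrative width and function name. It only covers shift amounts below the bit width, because larger amounts are poison in LLVM IR.

```cpp
#include <cassert>
#include <cstdint>

// (shl x, c) == (mul x, (shl 1, c)) in i8 for every in-range shift amount.
bool shlIsMulByPowerOfTwo() {
  for (unsigned x = 0; x <= 0xFF; ++x) {
    for (unsigned c = 0; c < 8; ++c) {
      auto shl = static_cast<std::uint8_t>(x << c);
      auto pow2 = static_cast<std::uint8_t>(1u << c);  // (shl 1, c)
      auto mul = static_cast<std::uint8_t>(x * pow2);
      if (shl != mul) return false;
    }
  }
  return true;
}
```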

PR: https://github.com/llvm/llvm-project/pull/160663
2025-10-01 09:06:01 +00:00
paperchalice
c6d3b517ee
[DAGCombiner] Remove most NoSignedZerosFPMath uses (#161180)
The two remaining uses are related to fneg and foldFPToIntToFP. Some
AMDGPU tests are duplicated and regenerated.
2025-09-30 11:44:34 +08:00
paperchalice
84e4c0686e
[DAGCombiner] Remove NoSignedZerosFPMath uses in visitFSUB (#160974)
Remove the NoSignedZerosFPMath uses in visitFSUB; we should always use
instruction-level fast-math flags.
2025-09-29 19:19:18 +08:00
paperchalice
1e01c02996
[DAGCombiner] Remove NoSignedZerosFPMath uses in visitFADD (#160635)
Remove these global flags and use node level flags instead.
2025-09-26 11:24:02 +08:00
kper
0b1318f2a8
[DAG] Fold rem(rem(A, BCst), Op1Cst) -> rem(A, Op1Cst) (#159517)
Fixes [157370](https://github.com/llvm/llvm-project/issues/157370)

UREM General proof: https://alive2.llvm.org/ce/z/b_GQJX
SREM General proof: https://alive2.llvm.org/ce/z/Whkaxh

I have added it as rv32i and rv64i tests because they are the only architectures where I could verify that it works.
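The message does not state the fold's precondition. The sketch below assumes that `BCst` must be a multiple of `Op1Cst`, which is the usual condition for `(A mod B) mod C == A mod C`. It checks that identity exhaustively on i8 values, using C++ `%` as a stand-in for urem and for srem (both truncate toward zero). It complements, but does not replace, the alive2 proofs.

```cpp
#include <cassert>

// urem: (a % b) % c == a % c whenever c divides b (assumed precondition).
bool uremFoldHolds() {
  for (unsigned b = 1; b <= 255; ++b)
    for (unsigned c = 1; c <= b; ++c) {
      if (b % c != 0) continue;
      for (unsigned a = 0; a <= 255; ++a)
        if ((a % b) % c != a % c) return false;
    }
  return true;
}

// srem: same identity with truncating signed remainder on i8 values.
bool sremFoldHolds() {
  for (int b = -128; b <= 127; ++b) {
    if (b == 0) continue;
    for (int c = -128; c <= 127; ++c) {
      if (c == 0 || b % c != 0) continue;
      for (int a = -128; a <= 127; ++a) {
        // i8 srem of -128 by -1 is undefined in IR; skip that case.
        if (a == -128 && (b == -1 || c == -1)) continue;
        if ((a % b) % c != a % c) return false;
      }
    }
  }
  return true;
}
```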
2025-09-22 09:30:10 +00:00
Abhishek Kaushik
f65d5a7a56
[DAG] Skip mstore combine for <1 x ty> vectors (#159915)
Fixes #159912
2025-09-21 11:06:49 -07:00
Fabian Ritter
d5607694e1
[AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (#146075)
If we can't fold a PTRADD's offset into its users, lowering them to
disjoint ORs is preferable: Often, a 32-bit OR instruction suffices
where we'd otherwise use a pair of 32-bit additions with carry.
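The lowering to a disjoint OR is sound because adding two values that share no set bits generates no carries, so `add` and `or` agree. The following exhaustive check runs in C++ on 8-bit values. The width is an illustrative assumption; real pointers are wider.

```cpp
#include <cassert>

// If base and offset share no set bits, (add base, offset) == (or base, offset).
bool disjointOrMatchesAdd() {
  for (unsigned base = 0; base <= 0xFF; ++base) {
    for (unsigned off = 0; off <= 0xFF; ++off) {
      if ((base & off) != 0) continue;  // only disjoint operands qualify
      // No carries means the sum stays within 8 bits and equals the OR.
      if (((base + off) & 0xFFu) != (base | off)) return false;
    }
  }
  return true;
}
```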

This needs to be a DAGCombine (and not a selection rule) because its
main purpose is to enable subsequent DAGCombines for bitwise operations.
We don't want to just turn PTRADDs into disjoint ORs whenever that's
sound because this transform loses the information that the operation
implements pointer arithmetic, which AMDGPU for instance needs when
folding constant offsets.

For SWDEV-516125.
2025-09-19 11:58:41 +02:00
Fabian Ritter
771c94c8db
[SDAG][AMDGPU] Allow opting in to OOB-generating PTRADD transforms (#146074)
This PR adds a TargetLowering hook, canTransformPtrArithOutOfBounds,
that targets can use to allow transformations to introduce out-of-bounds
pointer arithmetic. It also moves two such transformations from the
AMDGPU-specific DAG combines to the generic DAGCombiner.

This is motivated by target features like AArch64's checked pointer
arithmetic, CPA, which does not tolerate the introduction of
out-of-bounds pointer arithmetic.
2025-09-19 11:07:59 +02:00
Björn Pettersson
1c4c7bd808
[SelectionDAG] Deal with POISON for INSERT_VECTOR_ELT/INSERT_SUBVECTOR (#143102)
As reported in https://github.com/llvm/llvm-project/issues/141034
SelectionDAG::getNode had some unexpected
behaviors when trying to create vectors with UNDEF elements. Since
we treat both UNDEF and POISON as undefined (when using isUndef())
we can't just fold away INSERT_VECTOR_ELT/INSERT_SUBVECTOR based on
isUndef(), as that could make the resulting vector more poisonous.

Same kind of bug existed in DAGCombiner::visitINSERT_SUBVECTOR.

Here are some examples:

This fold was done even if vec[idx] was POISON:
  INSERT_VECTOR_ELT vec, UNDEF, idx -> vec

This fold was done even if any of vec[idx..idx+size] was POISON:
  INSERT_SUBVECTOR vec, UNDEF, idx -> vec

This fold was done even if the elements not extracted from vec could
be POISON:
  sub = EXTRACT_SUBVECTOR vec, idx
  INSERT_SUBVECTOR UNDEF, sub, idx -> vec

With this patch we avoid such folds unless we can prove that the
result isn't more poisonous when eliminating the insert.

Fixes https://github.com/llvm/llvm-project/issues/141034
2025-09-17 21:04:00 +00:00
guan jian
6aab826e23
[DAGCombiner] add fold (xor (smin(x, C), C)) and fold (xor (smax(x, C), C)) (#155141)
Hi, I compared the following LLVM IR with GCC and Clang, and there is a small difference between the two. The LLVM IR is:
```
define i64 @test_smin_neg_one(i64 %a) {
  %1 = tail call i64 @llvm.smin.i64(i64 %a, i64 -1)
  %retval.0 = xor i64 %1, -1
  ret i64 %retval.0
}
```
GCC generates:
```
	cmp	x0, 0
	csinv	x0, xzr, x0, ge
	ret
```
Clang generates:
```
	cmn	x0, #1
	csinv	x8, x0, xzr, lt
	mvn	x0, x8
	ret
```
Clang keeps flipping x0 through x8 unnecessarily.
So I added the following folds to DAGCombiner:
fold (xor (smax(x, C), C)) -> select (x > C), xor(x, C), 0
fold (xor (smin(x, C), C)) -> select (x < C), xor(x, C), 0

alive2: https://alive2.llvm.org/ce/z/gffoir
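Both folds follow from a case split on the comparison. When smax/smin selects `C`, the result is `C ^ C`, which is 0. The following exhaustive C++ check covers all i8 values of `x` and `C`. It is a sketch, not the DAGCombiner code.

```cpp
#include <algorithm>
#include <cassert>

// Checks both folds over all i8 values of x and C.
bool xorMinMaxFoldsHold() {
  for (int x = -128; x <= 127; ++x) {
    for (int c = -128; c <= 127; ++c) {
      // fold (xor (smax(x, C), C)) -> select (x > C), xor(x, C), 0
      int lhsMax = std::max(x, c) ^ c;
      int rhsMax = x > c ? (x ^ c) : 0;
      // fold (xor (smin(x, C), C)) -> select (x < C), xor(x, C), 0
      int lhsMin = std::min(x, c) ^ c;
      int rhsMin = x < c ? (x ^ c) : 0;
      if (lhsMax != rhsMax || lhsMin != rhsMin) return false;
    }
  }
  return true;
}
```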

---------

Co-authored-by: Yui5427 <785369607@qq.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-09-16 15:30:57 +00:00
Arthur Eubanks
984251acad
Revert "[DAGCombiner] Relax condition for extract_vector_elt combine" (#157953)
Reverts llvm/llvm-project#157658

Causes hangs, see
https://github.com/llvm/llvm-project/pull/157658#issuecomment-3276441812
2025-09-10 21:33:44 +00:00
ZhaoQi
4621e17dee
[DAGCombiner] Relax condition for extract_vector_elt combine (#157658)
Checking `isOperationLegalOrCustom` instead of `isOperationLegal` allows
more optimization opportunities. In particular, if a target wants to
mark `extract_vector_elt` as `Custom` rather than `Legal` in order to
optimize certain cases, this combiner would otherwise miss some
improvements.

Previously, using `isOperationLegalOrCustom` was avoided due to the risk
of getting stuck in infinite loops (as noted in
61ec738b60).
After testing, the issue no longer reproduces, but the coverage is
limited to the regression/unit tests and the test-suite.
2025-09-10 15:51:52 +08:00
guan jian
83af24dd85
[DAG] Generalize fold (not (neg x)) -> (add X, -1) (#154348)
Generalize `fold (not (neg x)) -> (add X, -1)` to `fold (not (sub Y, X)) -> (add X, ~Y)`
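The generalization follows from `~v == -v - 1` in two's complement. This gives `~(Y - X) = X - Y - 1 = X + ~Y`. With `Y = 0`, it reduces to the original `add X, -1`. The exhaustive i8 check below is in C++; the width is illustrative.

```cpp
#include <cassert>
#include <cstdint>

// ~(Y - X) == X + ~Y modulo 2^8, for all i8 X and Y.
bool notSubFoldHolds() {
  for (unsigned y = 0; y <= 0xFF; ++y) {
    for (unsigned x = 0; x <= 0xFF; ++x) {
      auto lhs = static_cast<std::uint8_t>(~(y - x));
      auto rhs = static_cast<std::uint8_t>(x + static_cast<std::uint8_t>(~y));
      if (lhs != rhs) return false;
    }
  }
  return true;
}
```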

---------

Co-authored-by: Yui5427 <785369607@qq.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-09-08 15:12:59 +00:00
paperchalice
667f919214
[SelectionDAG][ARM] Propagate fast math flags in visitBRCOND (#156647)
Factor out from #151275.
2025-09-06 20:44:25 +08:00
Yingwei Zheng
0d29279465
[DAGCombine] Propagate nuw when evaluating sub with narrower types (#156710)
Proof: https://alive2.llvm.org/ce/z/cdbzSL
Closes https://github.com/llvm/llvm-project/issues/156559.
2025-09-04 10:17:45 +08:00
Yingwei Zheng
5d111a20c5
[DAGCombiner] Avoid double deletion when replacing multiple frozen/unfrozen uses (#155427)
Closes https://github.com/llvm/llvm-project/issues/155345.
In the original case, we have one frozen use and two unfrozen uses:
```
t73: i8 = select t81, Constant:i8<0>, t18
t75: i8 = select t10, t18, t73
t59: i8 = freeze t18 (combining)

t80: i8 = freeze t59 (another user of t59)
```

In `DAGCombiner::visitFREEZE`, we replace all uses of `t18` with `t59`.
After updating the uses, `t59: i8 = freeze t18` will be updated to `t59:
i8 = freeze t59` (`AddModifiedNodeToCSEMaps`) and CSEed into `t80: i8 =
freeze t59` (`ReplaceAllUsesWith`). As the previous call to
`AddModifiedNodeToCSEMaps` already removed `t59` from the CSE map,
`ReplaceAllUsesWith` cannot remove `t59` again.

For clarity, see the following call graph:
```
ReplaceAllUsesOfValueWith(t18, t59)
  ReplaceAllUsesWith(t18, t59)
    RemoveNodeFromCSEMaps(t73)
    update t73
    AddModifiedNodeToCSEMaps(t73)
    RemoveNodeFromCSEMaps(t75)
    update t75
    AddModifiedNodeToCSEMaps(t75)
    RemoveNodeFromCSEMaps(t59) <- first deletion
    update t59
    AddModifiedNodeToCSEMaps(t59)
        ReplaceAllUsesWith(t59, t80)
            RemoveNodeFromCSEMaps(t59) <- second deletion
                Boom!
```

This patch unfreezes all the uses first to avoid triggering CSE when
introducing cycles.
2025-08-27 11:21:22 +08:00
Craig Topper
56289647be
[DAGCombiner] Preserve nuw when converting mul to shl. Use nuw in srl+shl combine. (#155043)
If the srl+shl have the same shift amount and the shl has the nuw flag,
we can remove both.
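When `shl` has `nuw`, no set bits are shifted out, so a logical right shift by the same amount restores the original value. The following C++ sketch checks every i8 value and every in-range shift amount. It models `nuw` as "the exact result still fits in 8 bits", which matches the LangRef rule that `shl nuw` is poison if it shifts out any non-zero bits.

```cpp
#include <cassert>
#include <cstdint>

// srl(shl nuw x, c), c) == x for every i8 x and in-range shift amount c.
bool srlOfShlNuwIsIdentity() {
  for (unsigned x = 0; x <= 0xFF; ++x) {
    for (unsigned c = 0; c < 8; ++c) {
      unsigned wide = x << c;           // exact result, no wrap in 32 bits
      if (wide > 0xFF) continue;        // shl nuw would be poison here
      auto shl = static_cast<std::uint8_t>(wide);
      auto srl = static_cast<std::uint8_t>(shl >> c);
      if (srl != x) return false;
    }
  }
  return true;
}
```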

In the affected test, the InterleavedAccess pass will emit a udiv after
the `mul nuw`. We expect them to combine away. The remaining shifts on
the RV64 tests are because we didn't add the zeroext attribute to the
incoming evl operand.
2025-08-25 20:44:06 -07:00
Alex MacLean
8ab917a241
Reland "[NVPTX] Legalize aext-load to zext-load to expose more DAG combines" (#155063)
The original version of this change inadvertently dropped
b6e19b35cd87f3167a0f04a61a12016b935ab1ea. This version retains that fix
as well as adding tests for it and an explanation for why it is needed.
2025-08-25 09:15:44 -07:00
XChy
fd330dedcb
[DAG] Constant fold ISD::FSHL/FSHR nodes (#154480)
Fixes #153612.
This patch constant folds FSHL/FSHR nodes whose three operands are all
scalar integer constants in `FoldConstantArithmetic`.
Pending until #153790 is merged.
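For reference, here is a scalar C++ model of the values such a fold must produce. It follows the LangRef definition of `llvm.fshl`/`llvm.fshr`: concatenate the two operands, shift by the amount modulo the bit width, and keep the high or low half. The i8 width and the function names are illustrative; this is not the `FoldConstantArithmetic` code.

```cpp
#include <cassert>
#include <cstdint>

// fshl: shift the 16-bit concatenation a:b left, keep the high 8 bits.
std::uint8_t fshl8(std::uint8_t a, std::uint8_t b, std::uint8_t c) {
  unsigned concat = (unsigned{a} << 8) | b;
  unsigned s = c % 8u;  // shift amount is taken modulo the bit width
  return static_cast<std::uint8_t>((concat << s) >> 8);
}

// fshr: shift the 16-bit concatenation a:b right, keep the low 8 bits.
std::uint8_t fshr8(std::uint8_t a, std::uint8_t b, std::uint8_t c) {
  unsigned concat = (unsigned{a} << 8) | b;
  unsigned s = c % 8u;
  return static_cast<std::uint8_t>(concat >> s);
}
```

For example, `fshl8(0x81, 0x81, 1)` acts as a rotate left and yields `0x03`.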
2025-08-23 10:08:21 +09:00
Joseph Huber
d439c9ea4a
Revert "[NVPTX] Legalize aext-load to zext-load to expose more DAG combines (#154251)"
Causes failures in the LLVM libc test suite
https://lab.llvm.org/buildbot/#/builders/69/builds/26327/steps/12/logs/stdio.

This reverts commit a3ed96b899baddd4865f1ef09f01a83da011db5c.
2025-08-22 16:13:58 -05:00
paperchalice
2014890c09
[SelectionDAG] Remove UnsafeFPMath in visitFP_ROUND (#154768)
Remove the `UnsafeFPMath` use in `visitFP_ROUND`. It blocks some bug fixes
related to clang, and the ultimate goal is to remove the `resetTargetOptions`
method in `TargetMachine` (see the FIXME in `resetTargetOptions`).
See also
https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast

https://discourse.llvm.org/t/allowfpopfusion-vs-sdnodeflags-hasallowcontract

All `UnsafeFPMath` uses in LLVMCodeGen are now eliminated.
2025-08-22 19:46:33 +08:00
paperchalice
945a186089
[DAGCombiner] Remove most UnsafeFPMath references (#146295)
This pull request removes all references to `UnsafeFPMath` in the DAG
combiner except in FP_ROUND.
- Set fast math flags in some tests.
2025-08-22 15:27:25 +08:00
Alex MacLean
a3ed96b899
[NVPTX] Legalize aext-load to zext-load to expose more DAG combines (#154251)
2025-08-21 15:33:23 -07:00
Jim Lin
fd28257195
[DAGCombiner] Fold umax/umin operations with vscale operands (#154461)
umax/umin operations with vscale operands can be constant
folded.
2025-08-21 09:15:40 +08:00
Matt Arsenault
276c1d8114
DAG: Add assert to getNode for EXTRACT_SUBVECTOR indexes (#154099)
Verify it's a multiple of the result vector element count
instead of asserting this in random combines.

The testcase in #153808 fails at the wrong point. Add an
assert to getNode so the invalid extract asserts at construction
instead of use.
2025-08-20 09:55:43 +09:00
Simon Pilgrim
fcb36ca8cc
[DAG] visitTRUNCATE - merge the trunc(abd) and trunc(avg) handling which are almost identical (#154301)
CC @houngkoungting
2025-08-19 12:59:39 +01:00
黃國庭
0773854575
[DAG] Fold trunc(avg(x,y)) for avgceil/floor u/s nodes if they have sufficient leading zero/sign bits (#152273)
avgceil version: https://alive2.llvm.org/ce/z/2CKrRh

Fixes #147773 

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-18 16:36:26 +01:00
Simon Pilgrim
858d1dfa2c
[DAG] visitTRUNCATE - early out from computeKnownBits/ComputeNumSignBits failures. NFC. (#154111)
Avoid unnecessary (costly) computeKnownBits/ComputeNumSignBits calls - use MaskedValueIsZero instead of computeKnownBits directly to simplify code.
2025-08-18 14:55:09 +01:00