3679 Commits

Author SHA1 Message Date
Noah Goldstein
112e49b381 [DAGCombiner] Transform (icmp eq/ne (and X,C0),(shift X,C1)) to use rotate or to getter constants.
If `C0` is a mask and `C1` shifts out all the masked bits (to
essentially compare two subsets of `X`), we can arbitrarily re-order
shift as `srl` or `shl`.

If `C1` (shift amount) is a power of 2, we can replace the and+shift
with a rotate.

Otherwise, based on target preference we can arbitrarily swap `shl`
and `shl` in/out to get better constants.

On x86 we can use this re-ordering to:
    1) get better `and` constants for `C0` (zero extended moves or
       avoid imm64).
    2) covert `srl` to `shl` if `shl` will be implementable with `lea`
       or `add` (both of which can be preferable).

Proofs: https://alive2.llvm.org/ce/z/qzGM_w

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D152116
2023-10-18 01:16:55 -05:00
Pierre van Houtryve
c464fea779
[DAG] Constant fold FMAD (#69324)
This has very little effect on codegen in practice, but is a nice to
have I think.

See #68315
2023-10-18 07:46:24 +02:00
Björn Pettersson
4acb96c99f
[SelectionDAG] Tidy up around endianness and isConstantSplat (#68212)
The BuildVectorSDNode::isConstantSplat function could depend on
endianness, and it takes a bool argument that can be used to indicate
if big or little endian should be considered when internally casting
from a vector to a scalar. However, that argument is default set to
false (= little endian). And in many situations, even in target
generic code such as DAGCombiner, the endianness isn't specified when
using the function.

The intent with this patch is to highlight that endianness doesn't
matter, depending on the context in which the function is used.

In DAGCombiner the code is slightly refactored. Back in the days when
the code was written it wasn't possible to request a MinSplatBits
size when calling isConstantSplat. Instead the code re-expanded the
found SplatValue to match with the EltBitWidth. Now we can just
provide EltBitWidth as MinSplatBits and remove the logic for doing
the re-expand.

While being at it, tidying up around isConstantSplat, this patch also
adds an explicit check in BuildVectorSDNode::isConstantSplat to break
out from the loop if trying to split an on VecWidth into two halves.
Haven't been able to prove that there could be miscompiles involved
if not doing so. There are lit tests that trigger that scenario,
although I think they happen to later discard the returned SplatValue
for other reasons.
2023-10-16 14:53:53 +02:00
Yingwei Zheng
53c81a8c16
[RISCV][SDAG] Fix constant narrowing when narrowing loads (#69015)
When narrowing logic ops(OR/XOR) with constant rhs, `DAGCombiner` will fixup the constant rhs node.
It is incorrect when lhs is also a constant. For example, we will incorrectly replace `xor OpaqueConstant:i64<8191>, Constant:i64<-1>` with `xor (and OpaqueConstant:i64<8191>, Constant:i64<65535>), Constant:i64<-1>`.

Fixes #68855.
2023-10-14 06:38:17 +08:00
Jie Fu
573a083c1c [DAG] Remove unused variable 'VT' in DAGCombiner.cpp (NFC)
/llvm-project/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:26896:7: error: unused variable 'VT' [-Werror,-Wunused-variable]
  EVT VT = N->getValueType(0);
      ^
1 error generated.
2023-10-09 18:30:38 +08:00
Simon Pilgrim
072675f14e [DAG] foldSelectOfBinops - correctly handle select of binops where ResNo != 0
Correctly handle cases where the select(cond, binop(x, y), binop(z, y)) --> binop(select(cond, x, z), y) fold is selecting ResNo != 0 results (UADDO flags etc.)

Fixes #68539
2023-10-09 11:08:55 +01:00
Ben Mudd
6d6b395b53 [DebugInfo][SelectionDAG] Add debug info salvaging for TRUNC nodes
This patch adds support for salvaging TRUNC nodes during SelectionDAG,
fixing LLVM issue #63076:
  https://github.com/llvm/llvm-project/issues/63076

Reviewed in: https://github.com/llvm/llvm-project/pull/66922
2023-10-06 16:10:33 +01:00
Alexey Bataev
e22818d5c9 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-05 06:17:07 -07:00
Arthur Eubanks
07389535a7 Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit b186f1f68be11630355afb0c08b80374a6d31782.

Causes crashes, see https://reviews.llvm.org/D158449.
2023-10-04 14:37:16 -07:00
Alexey Bataev
b186f1f68b [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-04 07:53:30 -07:00
Alexey Bataev
1129dec778 Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit 6f43d28f3452b3ef598bc12b761cfc2dbd0f34c9 to fix
a crash reported in https://reviews.llvm.org/D158449.
2023-10-03 13:02:16 -07:00
Alexey Bataev
6f43d28f34 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-03 10:26:11 -07:00
Simon Pilgrim
b4f591363c [DAG] visitSHL - move SimplifyDemandedBits after all standard folds to give them a chance to match
Pulled out of D155472
2023-10-02 16:09:35 +01:00
Alexey Bataev
ebcb5d59fc Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit 9f5960e004ff54082ccfa9396522e07358f5b66b to fix
buildbots reported here https://lab.llvm.org/buildbot/#/builders/230/builds/19412.
2023-09-29 15:03:46 -07:00
Alexey Bataev
9f5960e004 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-09-29 13:16:03 -07:00
Jay Foad
6e3d2a4b38
[ISel] Fix another crash in new FMA DAG combine (#67818)
Following on from D135150, this patch fixes another crash caused by this
DAG combine:

fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E)

The combine calls ReplaceAllUsesOfValueWith to replace (fmul C, D) with
(fma C, D, E). This can cause nodes to get CSEd. In D135150 the problem
was that the (fma C, D, E) node got CSEd away. In this new case, the
problem is that the outer fadd node gets CSEd away. To fix it we have
to return SDValue(N, 0) from the combine and be careful not to add a
deleted node to the worklist.
2023-09-29 17:18:23 +01:00
Alexey Bataev
3204f88a8b Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit c88c281cf1ac1a01c55231b93826d7c8ae83985b to fix the
crash revealed by https://lab.llvm.org/buildbot/#/builders/230/builds/19353.
2023-09-28 11:57:32 -07:00
Noah Goldstein
de7881ebf5 [DAGCombiner] Combine (select c, (and X, 1), 0) -> (and (zext c), X)
The middle end canonicalizes:
`(and (zext c), X)`
    -> `(select c, (and X, 1), 0)`

But the `and` + `zext` form gets better codegen.
2023-09-28 13:46:46 -05:00
Alexey Bataev
c88c281cf1 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-09-28 11:03:21 -07:00
David Green
03647e2e4b [AArch64] Handle scalable vectors in combineFMulOrFDivWithIntPow2.
The transform will still not trigger as takeInexpensiveLog2 will bail out for
any scalable vector, but this guards against a scalable typesize error.
2023-09-26 15:34:34 +01:00
Noah Goldstein
bc38c427d4 [DAGCombiner][AArch64] Fix incorrect cast VT in takeInexpensiveLog2
Previously, we where taking `CurVT` before finalizing `ToCast` which
meant potentially returning an `SDValue` with an illegal `ValueType`
for the operation.

Fix is to just take `CurVT` after we have finalized `ToCast` with
`PeekThroughCastsAndTrunc`.
2023-09-23 09:50:42 -05:00
Noah Goldstein
6d6314ba64 [DAGCombiner] Extend combineFMulOrFDivWithIntPow2 to work for non-splat float vecs
Do so by extending `matchUnaryPredicate` to also work for
`ConstantFPSDNode` types then encapsulate the constant checks in a
lambda and pass it to `matchUnaryPredicate`.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D154868
2023-09-20 13:28:24 -05:00
Noah Goldstein
47c642f9a0 [DAGCombiner] Fold IEEE fmul/fdiv by Pow2 to add/sub of exp
Note: This is moving D154678 which previously implemented this in
InstCombine. Concerns where brought up that this was de-canonicalizing
and really targeting a codegen improvement, so placing in DAGCombiner.

This implements:

```
(fmul C, (uitofp Pow2))
    -> (bitcast_to_FP (add (bitcast_to_INT C), Log2(Pow2) << mantissa))
(fdiv C, (uitofp Pow2))
    -> (bitcast_to_FP (sub (bitcast_to_INT C), Log2(Pow2) << mantissa))
```
The motivation is mostly fdiv where 2^(-p) is a fairly common
expression.

The patch is intentionally conservative about the transform, only
doing so if we:
    1) have IEEE floats
    2) C is normal
    3) add/sub of max(Log2(Pow2)) stays in the min/max exponent
       bounds.

Alive2 can't realistically prove this, but did test float16/float32
cases (within the bounds of the above rules) exhaustively.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D154805
2023-09-20 13:28:24 -05:00
Fraser Cormack
ebefe83c09 [NFC] Fix spelling 'constanst' -> 'constants' 2023-09-20 15:33:03 +01:00
Luke Lau
22d0bd8632
[DAGCombiner] Combine vp.strided.store with unit stride to vp.store (#66774)
This is the VP equivalent of #66677. If we have a strided store where
the stride is equal to the element width, we can just use a regular VP
store.
2023-09-19 16:43:50 +01:00
Luke Lau
469f6b9b4c
[DAGCombiner] Combine vp.strided.load with unit stride to vp.load (#66766)
This is the VP equivalent of #65674. We already combine MGATHER loads
with unit stride to MLOAD, so this extends it for
EXPERIMENTAL_VP_STRIDED_LOAD.
2023-09-19 16:39:28 +01:00
Sergei Barannikov
caaf61eb6e
[SDag] Fold saddo[_carry] with bitwise-not argument to ssubo[_carry] (#66571)
Fold `(saddo (not a), 1)` to `(ssubo 0, a)` and
`(saddo_carry (not a), b, c)` to `(ssubo_carry b, a, !c)`.

Proof: https://alive2.llvm.org/ce/z/Lj49YM

This is the same as https://reviews.llvm.org/D46505 and
https://reviews.llvm.org/D59208, but for signed opcodes.
2023-09-18 14:45:41 +03:00
Philip Reames
09a5aac514 [TLI] Add extend as explicit parameter to shouldRemoveExtendFromGSIndex [nfc]
Note: Reviewed as part of a stack of changes in PR# 66405.
2023-09-15 14:48:02 -07:00
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
2023-09-14 14:10:14 -07:00
Philip Reames
61757fbd04 [DAG] Remove pointless peephole from refineUniformBase [nfc]
No need to special case add 0, N.  SelectionDAG::getNode contains the canonicalization and simplification for this case, so no need to duplicate it here.
2023-09-13 10:16:11 -07:00
Philip Reames
2f005df066
[DAG][X86] Fold mgather/mscatter/etc with splat index (#65980)
A splat index means the operation is reading from (writing to) the same
memory location. Generally, zero is the cheapest value to splat. As
such, we'd prefer to add the splatted value to the base, and use a
constant zero as the index operand.
2023-09-13 09:26:30 -07:00
Yingwei Zheng
4793c2c3de
[DAGCombiner][RISCV] Prefer to sext i32 non-negative values (#65984)
By default, `DAGCombiner` folds `sext x` to `zext x` when `x` is
non-negative. It will generate redundant `zext` inst seq on riscv64
(typically `slli (srli x, 32), 32`).
godbolt: https://godbolt.org/z/osf6adP1o
This patch applies the transform iff `zext` is **cheaper** than `sext`.
2023-09-12 19:02:35 +08:00
Mohamed Atef
741c127817 [SelectionDAG] Add computeOverflowForSignedMul / computeOverflowForUnsignedMul overflow handlers
Support signed multiplication
Support unsigned multiplication

Differential Revision: https://reviews.llvm.org/D159406
2023-09-07 10:03:18 +01:00
Simon Pilgrim
84447c044f [DAG] Add SelectionDAG::isADDLike helper. NFC.
Make the DAGCombine helper global so we can more easily reuse it.
2023-09-06 16:54:25 +01:00
Simon Pilgrim
e4d0e12099 [DAG] Fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 << c2) (REAPPLIED)
Assuming the ADD is nsw then it may be sign-extended to merge with a SHL op in a similar fold to the existing (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) fold.

This is most useful for helping to expose address math for X86, but has also touched several aarch64 test cases as well.

Alive2: https://alive2.llvm.org/ce/z/2UpSbJ

Differential Revision: https://reviews.llvm.org/D159198
2023-09-06 13:19:42 +01:00
Dmitri Gribenko
97bf104d97 Revert "[DAG] Fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 << c2)"
This reverts commit b027ce0ab93060bc6cb79d5402d21520e8b93fb7.

This commit breaks Transforms/InferAddressSpaces/AMDGPU/flat_atomic.ll.
2023-09-06 11:28:55 +02:00
Simon Pilgrim
b027ce0ab9 [DAG] Fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 << c2)
Assuming the ADD is nsw then it may be sign-extended to merge with a SHL op in a similar fold to the existing (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) fold.

This is most useful for helping to expose address math for X86, but has also touched several aarch64 test cases as well.

Alive2: https://alive2.llvm.org/ce/z/2UpSbJ

Differential Revision: https://reviews.llvm.org/D159198
2023-09-06 10:06:21 +01:00
David Sherwood
50598f0ff4 [DAGCombiner][SVE] Add support for illegal extending masked loads
In some cases where the same mask is used for multiple
extending masked loads it can be more efficient to combine
the zero- or sign-extend into the load even if it's not a
legal or custom operation. This leads to splitting up the
extending load into smaller parts, which also requires
splitting the mask. For SVE at least this improves the
performance of the SPEC benchmark x264 slightly on
neoverse-v1 (~0.3%), and at least one other benchmark
improves by around 30%. The uplift for SVE seems due to
removing the dependencies (vector unpacks) introduced
between the loads and the vector operations, since this
should increase the level of parallelism.

See tests:

  CodeGen/AArch64/sve-masked-ldst-sext.ll
  CodeGen/AArch64/sve-masked-ldst-zext.ll

https://reviews.llvm.org/D159191
2023-09-05 10:41:21 +00:00
Fangrui Song
111fcb0df0 [llvm] Fix duplicate word typos. NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 18:25:16 -07:00
Konstantina Mitropoulou
17fc78e7a4 [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.
This reverts commit 48fa79a503a7cf380f98b6335fbd349afae1bd86.

Reviewed By: brooksmoses

Differential Revision: https://reviews.llvm.org/D159240
2023-08-31 11:36:50 -07:00
Luke Lau
3a4ad45a2c [DAGCombiner] Combine trunc (splat_vector x) -> splat_vector (trunc x)
From the discussion in https://reviews.llvm.org/D158853, moving the truncate
into the splat helps more splatted scalar operands get selected on RISC-V, and
also avoids the need for splat_vector_parts on RV32.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D159147
2023-08-30 15:22:57 +01:00
Simon Pilgrim
d037445f3a [DAG] visitSHL - use FoldConstantArithmetic to fold constants in (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) fold
Matches what we do in the (shl (mul x, c1), c2) -> (mul x, c1 << c2) fold as well as inside visitShiftByConstant
2023-08-29 18:52:24 +01:00
Konstantina Mitropoulou
48fa79a503 Revert "[DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points."
This reverts commit 5ec13535235d07eafd64058551bc495f87c283b1.
2023-08-24 20:39:04 -07:00
Konstantina Mitropoulou
5ec1353523 [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.
CMP(A,C)||CMP(B,C) => CMP(MIN/MAX(A,B), C)
CMP(A,C)&&CMP(B,C) => CMP(MIN/MAX(A,B), C)

If the operands are proven to be non NaN, then the optimization can be applied
for all predicates.

We can apply the optimization for the following predicates for FMINNUM/FMAXNUM
(for quiet and signaling NaNs) and for FMINNUM_IEEE/FMAXNUM_IEEE if we can prove
that the operands are not signaling NaNs.
- ordered lt/le and ||
- ordered gt/ge and ||
- unordered lt/le and &&
- unordered gt/ge and &&

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155267
2023-08-24 10:48:56 -07:00
Simon Pilgrim
d254014fdb [DAG] Add willNotOverflowAdd/willNotOverflowSub helper functions.
Matches similar instructions on InstCombine
2023-08-24 17:52:54 +01:00
Craig Topper
2ad50f354a [DAGCombiner][RISCV][AArch64][PowerPC] Restrict foldAndOrOfSETCC from using SMIN/SMAX where and OR/AND would do.
This removes some diffs created by D153502.

I'm assuming an AND/OR won't be worse than an SMIN/SMAX. For
RISC-V at least, AND/OR can be a shorter encoding than SMIN/SMAX.

It's weird that we have two different functions responsible for
folding logic of setccs, but I'm not ready to try to untangle that.

I'm unclear if the PowerPC chang is a regression or not. It looks
like it might use more registers, but I don't understand PowerPC
register so I'm not sure.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158292
2023-08-23 20:26:23 -07:00
Peter Rong
f58fbfc746 [X86][CodeGen] Add a dag pattern to fix #64323
After recent patch D30189, #64323's error message become a new one.
When DAGCombiner was optimizing `(vextract (scalar_to_vector val, 0) -> val`, it didn't
consider the possibility that the inserted value type has less bit than the dest type.
This patch fixes that.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D158355
2023-08-23 10:50:32 -07:00
Simon Pilgrim
ba818c4019 [DAG] replaceStoreOfInsertLoad - don't fold if the inserted element is implicitly truncated
D152276 wasn't handling the case where the inserted element is implicitly truncated into the vector - resulting in a i1 element (implicitly truncated from i8) overwriting 8 bits instead of 1 bit.

This patch is intended to be merged into 17.x so I've just disallowed any vector element vs inserted element type mismatch - technically we could be more elegant and permit truncated stores (as long as the store is still byte sized), but the use cases for that are so limited I'd prefer to play it safe for now.

Candidate patch for #64655 17.x merge

Differential Revision: https://reviews.llvm.org/D158366
2023-08-21 11:22:07 +01:00
Jim Lin
18f5ada244 [DAGCombiner] Don't reduce BUILD_VECTOR to BITCAST before LegalizeTypes if VT is legal.
Targets may lose some optimization opportunities for certain vector operation
if we reduce BUILD_VECTOR to BITCAST early.

And if VT is not legal, reduce BUILD_VECTOR to BITCAST before LegailizeTypes
can get benefit. Because type-legalizer often scalarizes illegal type of vectors.

Reviewed By: sebastian-ne

Differential Revision: https://reviews.llvm.org/D156645
2023-08-19 12:53:50 +08:00
Philip Reames
92e0c0dc1a [DAG] Restrict insert_subvector undef, splat_veector, dontcare transform
On the extract_subvector side, we already have the restriction. With D158201, we'd start getting unprofitable splat combines unless we add the same one on the extract_subvector side.

Differential Revision: https://reviews.llvm.org/D158202
2023-08-18 12:44:09 -07:00