3704 Commits

Author SHA1 Message Date
Jonas Paulsson
e32e147d6c
[DAGCombiner] Don't drop alignment info of original load. (#75626)
Pass the original MMO instead of different individual values.

getAlign() was used before where actually getOriginalAlign() would have been
better, and this patch has the same effect.
2023-12-19 16:30:47 +01:00
Rin
0894c2ee5f
[DAGCombiner] Avoid the pre-truncate of BUILD_VECTOR sources. (#75792)
Avoid the pre-truncate of BUILD_VECTOR sources when there is more than
one use. This can avoid using unnecessary movs later down the
instruction selection pipeline.
2023-12-19 15:25:38 +00:00
Simon Pilgrim
7b1e4239b3
[DAG] Fold (vt trunc (extload (vt x))) -> (vt load x) (#75229)
We were only folding cases which remained extloads, but DAG.getExtLoad can also handle the cases which don't need to extend at all (we just can't do truncloads).

reduceLoadWidth can handle this for scalar loads, but not for vectors.

Noticed while triaging D152928
2023-12-18 16:21:11 +00:00
Philip Reames
e8a15eca92
[RISCV] Prefer whole register loads and stores when VL=VLMAX (#75531)
If we're lowering a fixed length vector load or store which happens to
be exactly VLEN in size (when VLEN is exactly known), we can use a whole
register load or store instead of the unit strided variants. This
doesn't require a vsetvli in some cases, allows additional flexibility
of vsetvli cases in others, and doesn't have a runtime dependency on the
value of VL.
2023-12-15 09:26:57 -08:00
Craig Topper
2a21260ea8 [SelectionDAG] Use getVectorElementPointer in DAGCombiner::replaceStoreOfInsertLoad. (#74249)
This ensures we clip the index to be in bounds of the vector we are
inserting into. If the index is out of bounds, the result of the
insertelement is poison. If we don't clip the index, we can write memory
that was not part of the original store.
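A minimal C++ model of why the clip matters, assuming the clamping scheme is "mask with NumElts - 1 for a power-of-two element count, otherwise min(Idx, NumElts - 1)". That scheme, and the helpers `clampIndex` and `storeOfInsertElt`, are illustrative assumptions, not LLVM APIs. The memory array holds the 4-element vector followed by unrelated data that the original store never covered.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: keep a dynamic element index inside a vector of
// NumElts elements (assumed clamping scheme, see lead-in).
std::size_t clampIndex(std::size_t Idx, std::size_t NumElts) {
  assert(NumElts > 0);
  bool IsPow2 = (NumElts & (NumElts - 1)) == 0;
  if (IsPow2)
    return Idx & (NumElts - 1);
  return Idx < NumElts - 1 ? Idx : NumElts - 1;
}

// Hypothetical model of narrowing
//   store (insertelement (load P), V, Idx), P
// into a single scalar store of V to P[Idx]. Memory[0, VecElts) is the
// vector; everything after it belongs to someone else.
template <std::size_t N>
void storeOfInsertElt(std::array<int32_t, N> &Memory, std::size_t VecElts,
                      std::size_t Idx, int32_t V) {
  assert(VecElts <= N);
  // Without the clamp, an out-of-bounds Idx would write Memory[Idx], which
  // can lie outside the bytes the original vector store covered.
  Memory[clampIndex(Idx, VecElts)] = V;
}
```

With an out-of-bounds index of 6 on a 4-element vector, the write lands on element 2 (the inserted value is poison anyway), and the neighbouring data stays intact.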

Fixes #74248 #75557.
2023-12-14 20:25:16 -08:00
Simon Pilgrim
39093102ca [DAG] visitTRUNCATE - format (truncate (load x)) fold code.
Reduces diff in #75229
2023-12-14 15:13:38 +00:00
Simon Pilgrim
f1200ca7ac
[DAG] visitEXTRACT_VECTOR_ELT - constant fold legal fp imm values (#74304)
If we're extracting a constant floating point value, and the constant is a legal fp imm value, then replace the extraction with a fp constant.
2023-12-07 14:56:12 +00:00
Simon Pilgrim
22df0886a1
[DAG] Don't split f64 constant stores if the fp imm is legal (#74622)
If the target can generate a specific fp immediate constant, then don't split the store into 2 x i32 stores
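For context, a small C++ model of what "split the store into 2 x i32 stores" means for an f64 constant: the constant's bit pattern is reinterpreted as i64 and written as two 32-bit halves. The low half goes to the lower address here, as on a little-endian target; big-endian targets swap the halves. `storeF64AsTwoI32` is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of splitting "store double C" into two i32 stores.
// Out[0] is the lower address, Out[1] the higher (little-endian order).
void storeF64AsTwoI32(double C, uint32_t Out[2]) {
  static_assert(sizeof(double) == sizeof(uint64_t), "f64 must be 64 bits");
  uint64_t Bits;
  std::memcpy(&Bits, &C, sizeof Bits);        // bitcast f64 -> i64
  Out[0] = static_cast<uint32_t>(Bits);       // low 32 bits
  Out[1] = static_cast<uint32_t>(Bits >> 32); // high 32 bits
}
```

When the target can materialize the fp immediate directly, a single f64 store avoids building these two integer constants.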

Another cleanup step for #74304
2023-12-07 10:33:03 +00:00
Vitaly Buka
7e3aeee3bf
[NFC][asan] Replace AsanInited/ENSURE_ASAN_INITED with TryAsanInitFromRtl (#74172) 2023-12-04 14:56:21 -08:00
Craig Topper
5bc391a7c9
[SelectionDAG] Use getVectorElementPointer in DAGCombiner::replaceStoreOfInsertLoad. (#74249)
This ensures we clip the index to be in bounds of the vector we are
inserting into. If the index is out of bounds, the result of the
insertelement is poison. If we don't clip the index, we can write memory
that was not part of the original store.

Fixes #74248.
2023-12-04 11:11:37 -08:00
Philip Reames
93e156833b
[DAG] Fix a miscompile in insert_subvector undef (insert_subvector undef, ..), idx combine (#73587)
The combine was implicitly assuming that the index on the outer
insert_subvector meant the same thing when the source was switched to be
the index of the inner insert_subvector. This is not true if the
innermost sub-vector is fixed, and the outer subvector is scalable.

I could do a less restrictive fix here - i.e. allow the case where the
scalability of the subvectors is the same - but there's no test
coverage which shows this transform actually has profit. Given that, go
for the simplest fix.
2023-11-27 16:45:29 -08:00
Sander de Smalen
81b7f115fb
[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979)
It seems TypeSize is currently broken in the sense that:

  TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)

without failing its assert that explicitly tests for this case:

  assert(LHS.Scalable == RHS.Scalable && ...);

The reason this fails is that `Scalable` is a static method of class
TypeSize, and LHS and RHS are both objects of class TypeSize. So this is
evaluating whether the pointer to the function Scalable == the pointer to
the function Scalable, which is always true because LHS and RHS have the
same class.

This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`, so that it no longer clashes with the member
variable `Scalable` in FixedOrScalableQuantity.
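A stripped-down C++ model of the name-hiding problem. `FixedOrScalable`, `SizeBefore` and `SizeAfter` are hypothetical stand-ins for FixedOrScalableQuantity and TypeSize before and after the rename, not the real LLVM classes.

```cpp
#include <cassert>

// Base holding the state, like FixedOrScalableQuantity.
struct FixedOrScalable {
  unsigned Quantity = 0;
  bool Scalable = false; // data member
};

// Before the fix: static factories named Fixed/Scalable. For any expression
// of type SizeBefore, the name Scalable now finds the static function and
// hides the inherited data member.
struct SizeBefore : FixedOrScalable {
  static SizeBefore Fixed(unsigned Q) { return make(Q, false); }
  static SizeBefore Scalable(unsigned Q) { return make(Q, true); }
  static SizeBefore make(unsigned Q, bool IsScalable) {
    SizeBefore S;
    S.Quantity = Q;
    // Plain S.Scalable would name the static function, so the data member
    // must be reached through a qualified name.
    S.FixedOrScalable::Scalable = IsScalable;
    return S;
  }
};

// Buggy check: L.Scalable and R.Scalable both designate the same static
// function, so this compares one function address with itself: always true.
bool sameKindBefore(const SizeBefore &L, const SizeBefore &R) {
  return L.Scalable == R.Scalable;
}

// After the fix: verb-phrase factory names no longer hide the data member.
struct SizeAfter : FixedOrScalable {
  static SizeAfter getFixed(unsigned Q) { return make(Q, false); }
  static SizeAfter getScalable(unsigned Q) { return make(Q, true); }
  static SizeAfter make(unsigned Q, bool IsScalable) {
    SizeAfter S;
    S.Quantity = Q;
    S.Scalable = IsScalable; // now refers to the data member
    return S;
  }
};

// Same expression as the buggy check, but it now compares the flags.
bool sameKindAfter(const SizeAfter &L, const SizeAfter &R) {
  return L.Scalable == R.Scalable;
}
```

The buggy comparison compiles without error, which is why the assert silently never fired.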

The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
2023-11-22 08:52:53 +00:00
Yeting Kuo
a756a6b97e
[TargetLowering][RISCV] Introduce shouldFoldSelectWithSingleBitTest and RISC-V implement. (#72978)
DAGCombiner folds (select_cc seteq (and x, y), 0, 0, A) to (and (sra
(shl x)) A) where y has a single bit set. Previously, DAGCombiner relied
on `shouldAvoidTransformToShift` to decide when to do the combine, but
`shouldAvoidTransformToShift` is only about shift cost. This patch
introduces a specific hook to decide when to do the combine and disables
the combine when Zicond is enabled and AndMask <= 1024.
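A scalar C++ sketch of the identity behind this combine for i32, assuming y == 1 << K. `selectSingleBit` and `shiftSingleBit` are hypothetical names; the hook and its Zicond/AndMask cut-off are not modeled. The sra step relies on arithmetic right shift of a negative `int32_t`, which C++20 guarantees and mainstream compilers already implement.

```cpp
#include <cassert>
#include <cstdint>

// Original form: (select_cc seteq (and X, Y), 0, 0, A) with Y == 1 << K.
uint32_t selectSingleBit(uint32_t X, unsigned K, uint32_t A) {
  assert(K < 32);
  uint32_t Y = 1u << K;
  return (X & Y) == 0 ? 0u : A;
}

// Shift form: shl moves bit K into the sign bit, sra by 31 broadcasts it to
// all-ones or all-zeros, and the AND keeps A or clears it.
uint32_t shiftSingleBit(uint32_t X, unsigned K, uint32_t A) {
  assert(K < 32);
  int32_t SignBitSet = static_cast<int32_t>(X << (31 - K)); // shl
  uint32_t Mask = static_cast<uint32_t>(SignBitSet >> 31);  // sra
  return Mask & A;
}
```

Whether the two-shift form is cheaper than a select depends on the target, which is what the new hook lets RISC-V decide.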
2023-11-22 08:22:14 +08:00
Simon Pilgrim
761a963dfc
[DAG] narrowExtractedVectorBinOp - ensure we limit late node creation to LegalOperations only (#72130)
Avoids infinite-loop issues in some upcoming patches to help D152928 - x86 sees a number of regressions that are addressed by extending SimplifyDemandedVectorEltsForTargetNode to cover more binop opcodes
2023-11-20 10:56:41 +00:00
Noah Goldstein
ed7c97e0ad Recommit "[DAGCombiner] Transform (icmp eq/ne (and X,C0),(shift X,C1)) to use rotate or to getter constants." (2nd Try)
Added missing check that the mask and shift amount added up to correct
bitwidth as well as test cases for the bug.

Closes #71729
2023-11-19 12:15:04 -06:00
Simon Pilgrim
de41396895 [DAG] foldABSToABD - add support for abs(sub(sign_extend_inreg(),sign_extend_inreg())) patterns
Partial fix for ABDS regressions on D152928
2023-11-15 15:49:30 +00:00
Simon Pilgrim
9180b9f2be [DAG] foldABSToABD - rename operand value types. NFC.
Match operand variable naming.
2023-11-15 15:44:35 +00:00
Acim-Maravic
f3138524db
[AMDGPU] Generic lowering for rint and nearbyint (#69596)
There are three different rounding intrinsics that are lowered to the
same instruction.

Co-authored-by: Acim Maravic <acim.maravic@amd.com>
2023-11-14 18:49:21 +01:00
Simon Pilgrim
074e4ae0e7
[DAG] foldABSToABD - support abs(*ext(x) - *ext(y)) -> zext(abd*(x, y)) from different extension source types (#71670)
We currently limit the fold to cases where we're extending from the same source type, but we can safely perform this using the wider of mismatching source types (we're really just interested in having extension bits on both sources), ensuring we don't create additional extensions/truncations.
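An exhaustive C++ check of the mismatched-source case, assuming sign extension from an i8 `X` and an i16 `Y` into i32. Following the description, the narrower source is extended to the wider type (i16), the absolute difference is taken there, and the result is zero-extended. `abds16` is a hypothetical scalar model of ISD::ABDS, not an LLVM function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical model of abds in i16: |A - B| computed with 16-bit modular
// arithmetic. The true difference is at most 65535, so it fits in u16.
uint16_t abds16(int16_t A, int16_t B) {
  int16_t Hi = A > B ? A : B;
  int16_t Lo = A > B ? B : A;
  return static_cast<uint16_t>(static_cast<uint16_t>(Hi) -
                               static_cast<uint16_t>(Lo));
}

// Before the fold: abs(sext(X) - sext(Y)) in i32.
uint32_t absOfSubOfExts(int8_t X, int16_t Y) {
  int32_t Diff = static_cast<int32_t>(X) - static_cast<int32_t>(Y);
  return static_cast<uint32_t>(std::abs(Diff));
}

// After the fold: zext(abds(sext_to_i16(X), Y)).
uint32_t zextOfAbds(int8_t X, int16_t Y) {
  return static_cast<uint32_t>(abds16(static_cast<int16_t>(X), Y));
}

// Compares both forms over every (i8, i16) input pair.
bool foldHoldsForAllInputs() {
  for (int X = INT8_MIN; X <= INT8_MAX; ++X)
    for (int Y = INT16_MIN; Y <= INT16_MAX; ++Y)
      if (absOfSubOfExts(static_cast<int8_t>(X), static_cast<int16_t>(Y)) !=
          zextOfAbds(static_cast<int8_t>(X), static_cast<int16_t>(Y)))
        return false;
  return true;
}
```

Using the wider source type is enough because both operands still carry at least one extension bit, so the i16 difference cannot overflow its unsigned result.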
2023-11-14 12:56:42 +00:00
alexfh
067632e141
Revert "[DAGCombiner] Transform (icmp eq/ne (and X,C0),(shift X,C1)) to use rotate or to getter constants." due to a miscompile (#71598)
- Revert "[DAGCombiner] Transform `(icmp eq/ne (and X,C0),(shift X,C1))`
to use rotate or to getter constants." - causes a miscompile, see
112e49b381 (commitcomment-131943923)
- Revert "[X86] Fix gcc warning about mix of enumeral and non-enumeral
types. NFC", which fixes a compiler warning in the commit above
2023-11-08 15:07:12 +01:00
Simon Pilgrim
1085b70a94 [DAG] Don't fold (zext (bitop (load x), cst)) -> (bitop (zextload x), (zext cst)) if the zext is free
Prevents an infinite loop if we've been trying to narrow the bitop to a more preferable type
2023-11-04 15:32:13 +00:00
Craig Topper
70b35ec0a8
[SelectionDAG] Add initial support for nneg flag on ISD::ZERO_EXTEND. (#70872)
This adds the nneg flag to SDNodeFlags and the node printing code.
SelectionDAGBuilder will add this flag to the node if the target doesn't
prefer sign extend.

A future RISC-V patch can remove the sign extend preference from
SelectionDAGBuilder.

I've also added the flag to the DAG combine that converts
ISD::SIGN_EXTEND to ISD::ZERO_EXTEND.
2023-11-03 11:15:08 -07:00
Craig Topper
20020c1b43
[DAGCombiner] Fix misuse of getZeroExtendInReg in SimplifySelectCC. (#70066)
If VT has fewer bits than SCC, using a ZeroExtendInReg isn't going to fix
it. That's an AND instruction. We need to truncate the value instead.

This should be ok because we already checked that the boolean contents
is ZeroOrOne so the setcc can only produce 0 or 1.

No test because I found this while trying to make i32 legal for RISC-V
64 which I'm not ready to upload yet. You can see in the coverage report
that this line isn't tested today.


https://lab.llvm.org/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp.html#L27270
2023-10-24 12:35:55 -07:00
Pierre van Houtryve
2bc93584f5
[DAG] Constant Folding for U/SMUL_LOHI (#69437) 2023-10-24 07:37:55 +02:00
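For the U/SMUL_LOHI constant folding in the commit above, a C++ sketch of what folding computes for i32 operands: the full double-width product, returned as {low, high} halves. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// UMUL_LOHI on constants: unsigned 64-bit product split into {lo, hi}.
std::pair<uint32_t, uint32_t> umulLoHi32(uint32_t A, uint32_t B) {
  uint64_t P = static_cast<uint64_t>(A) * B;
  return {static_cast<uint32_t>(P), static_cast<uint32_t>(P >> 32)};
}

// SMUL_LOHI on constants: signed 64-bit product; both halves are returned
// as raw 32-bit patterns.
std::pair<uint32_t, uint32_t> smulLoHi32(int32_t A, int32_t B) {
  int64_t P = static_cast<int64_t>(A) * B;
  uint64_t Bits = static_cast<uint64_t>(P);
  return {static_cast<uint32_t>(Bits), static_cast<uint32_t>(Bits >> 32)};
}
```

The same bit pattern (0xFFFFFFFF times 2) yields the same low half but different high halves under the unsigned and signed interpretations.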
Ramkumar Ramachandra
98c90a13c6
ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering (#66924)
Issue #55208 noted that std::rint is vectorized by the
SLPVectorizer, but a very similar function, std::lrint, is not.
std::lrint corresponds to ISD::LRINT in the SelectionDAG, and
std::llrint is a familiar cousin corresponding to ISD::LLRINT. Now,
neither ISD::LRINT nor ISD::LLRINT have a corresponding vector variant,
and the LangRef makes this clear in the documentation of llvm.lrint.*
and llvm.llrint.*.

This patch extends the LangRef to include vector variants of
llvm.lrint.* and llvm.llrint.*, and lays the necessary ground-work of
scalarizing it for all targets. However, this patch would be devoid of
motivation unless we show the utility of these new vector variants.
Hence, the RISCV target has been chosen to implement a custom lowering
to the vfcvt.x.f.v instruction. The patch also includes a CostModel for
RISCV, and a trivial follow-up can potentially enable the SLPVectorizer
to vectorize std::lrint and std::llrint, fixing #55208.

The patch includes tests, obviously for the RISCV target, but also for
the X86, AArch64, and PowerPC targets to justify the addition of the
vector variants to the LangRef.
2023-10-19 13:05:04 +01:00
Noah Goldstein
112e49b381 [DAGCombiner] Transform (icmp eq/ne (and X,C0),(shift X,C1)) to use rotate or to getter constants.
If `C0` is a mask and `C1` shifts out all the masked bits (to
essentially compare two subsets of `X`), we can arbitrarily re-order
shift as `srl` or `shl`.

If `C1` (shift amount) is a power of 2, we can replace the and+shift
with a rotate.
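An exhaustive 16-bit C++ check of these identities, assuming the shift is `srl` and that `C0` is the low mask whose popcount plus `C1` equals the bit width (the check the later recommit added). The helpers are hypothetical. The shl re-ordering holds for every `C1`; the rotate form holds when `C1` divides the width, which for a power-of-two width means `C1` is a power of 2.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 16-bit value left by S bits (0 < S < 16).
uint16_t rotl16(uint16_t X, unsigned S) {
  assert(S > 0 && S < 16);
  return static_cast<uint16_t>((X << S) | (X >> (16 - S)));
}

// (icmp eq (and X, C0), (srl X, C1)) with popcount(C0) + C1 == 16.
bool andSrlForm(uint16_t X, unsigned C1) {
  const uint16_t C0 = static_cast<uint16_t>((1u << (16 - C1)) - 1);
  return (X & C0) == (X >> C1);
}

// True if the and/srl compare always equals (icmp eq (rotl X, C1), X).
bool rotateFormMatches(unsigned C1) {
  for (uint32_t V = 0; V <= 0xFFFF; ++V) {
    const uint16_t X = static_cast<uint16_t>(V);
    if (andSrlForm(X, C1) != (rotl16(X, C1) == X))
      return false;
  }
  return true;
}

// True if the and/srl compare always equals the re-ordered shl form
// (icmp eq (shl X, C1), (and X, ~((1 << C1) - 1))).
bool shlFormMatches(unsigned C1) {
  const uint16_t HighMask = static_cast<uint16_t>(~((1u << C1) - 1));
  for (uint32_t V = 0; V <= 0xFFFF; ++V) {
    const uint16_t X = static_cast<uint16_t>(V);
    const bool Shl = static_cast<uint16_t>(X << C1) == (X & HighMask);
    if (andSrlForm(X, C1) != Shl)
      return false;
  }
  return true;
}
```

For `C1 == 3` the rotate form fails (for example, X with bits 1, 4, 7, 10 and 13 set satisfies the and/srl compare but is not rotation-invariant), while the shl form still matches.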

Otherwise, based on target preference we can arbitrarily swap `srl`
and `shl` in/out to get better constants.

On x86 we can use this re-ordering to:
    1) get better `and` constants for `C0` (zero extended moves or
       avoid imm64).
    2) convert `srl` to `shl` if `shl` will be implementable with `lea`
       or `add` (both of which can be preferable).

Proofs: https://alive2.llvm.org/ce/z/qzGM_w

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D152116
2023-10-18 01:16:55 -05:00
Pierre van Houtryve
c464fea779
[DAG] Constant fold FMAD (#69324)
This has very little effect on codegen in practice, but is a nice to
have I think.

See #68315
2023-10-18 07:46:24 +02:00
Björn Pettersson
4acb96c99f
[SelectionDAG] Tidy up around endianness and isConstantSplat (#68212)
The BuildVectorSDNode::isConstantSplat function could depend on
endianness, and it takes a bool argument that can be used to indicate
if big or little endian should be considered when internally casting
from a vector to a scalar. However, that argument is default set to
false (= little endian). And in many situations, even in target
generic code such as DAGCombiner, the endianness isn't specified when
using the function.

The intent of this patch is to make it explicit where endianness doesn't
matter, given the context in which the function is used.

In DAGCombiner the code is slightly refactored. When the code was
originally written, it wasn't possible to request a MinSplatBits
size when calling isConstantSplat. Instead the code re-expanded the
found SplatValue to match with the EltBitWidth. Now we can just
provide EltBitWidth as MinSplatBits and remove the logic for doing
the re-expand.
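A simplified C++ model of the halving search that isConstantSplat performs on a constant's bit pattern, assuming at most 64 bits, no undef lanes and no endianness-dependent lane reordering. `smallestSplatBits` is a hypothetical helper, not the LLVM implementation. It shows why passing EltBitWidth as MinSplatBits removes the need to re-expand the result, and it includes the odd-width break described in the next paragraph.

```cpp
#include <cassert>
#include <cstdint>

// Returns the smallest splat element size (in bits) of the VecWidth-bit
// pattern Bits, never going below MinSplatBits or 8 bits.
unsigned smallestSplatBits(uint64_t Bits, unsigned VecWidth,
                           unsigned MinSplatBits) {
  assert(VecWidth >= 1 && VecWidth <= 64);
  unsigned Size = VecWidth;
  while (Size > 8) {
    if (Size % 2 != 0) // an odd width cannot be split into two equal halves
      break;
    unsigned Half = Size / 2; // at most 32, so the shift below is safe
    uint64_t HalfMask = (uint64_t(1) << Half) - 1;
    uint64_t Low = Bits & HalfMask;
    uint64_t High = (Bits >> Half) & HalfMask;
    if (Low != High || MinSplatBits > Half)
      break;
    Bits = Low;
    Size = Half;
  }
  return Size;
}
```

For `0x0101010101010101` the search stops at 8 bits; requesting `MinSplatBits = 16` stops it at 16 bits directly, with no re-expansion step.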

While tidying up around isConstantSplat, this patch also adds an
explicit check in BuildVectorSDNode::isConstantSplat to break out of
the loop if trying to split an odd VecWidth into two halves. I haven't
been able to prove that miscompiles could occur without this check.
There are lit tests that trigger that scenario, although I think they
happen to discard the returned SplatValue later for other reasons.
2023-10-16 14:53:53 +02:00
Yingwei Zheng
53c81a8c16
[RISCV][SDAG] Fix constant narrowing when narrowing loads (#69015)
When narrowing logic ops (OR/XOR) with a constant rhs, `DAGCombiner` fixes up the constant rhs node.
It is incorrect when lhs is also a constant. For example, we will incorrectly replace `xor OpaqueConstant:i64<8191>, Constant:i64<-1>` with `xor (and OpaqueConstant:i64<8191>, Constant:i64<65535>), Constant:i64<-1>`.

Fixes #68855.
2023-10-14 06:38:17 +08:00
Jie Fu
573a083c1c [DAG] Remove unused variable 'VT' in DAGCombiner.cpp (NFC)
/llvm-project/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:26896:7: error: unused variable 'VT' [-Werror,-Wunused-variable]
  EVT VT = N->getValueType(0);
      ^
1 error generated.
2023-10-09 18:30:38 +08:00
Simon Pilgrim
072675f14e [DAG] foldSelectOfBinops - correctly handle select of binops where ResNo != 0
Correctly handle cases where the select(cond, binop(x, y), binop(z, y)) --> binop(select(cond, x, z), y) fold is selecting ResNo != 0 results (UADDO flags etc.)
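A C++ model of the ResNo != 0 case, using a hypothetical `uaddo` that returns {sum, overflow} like ISD::UADDO's two results. The folded form must read the same result number (here the overflow flag, result 1) from the single remaining node; reading result 0 would be a miscompile.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of ISD::UADDO's two results.
struct UAddO {
  uint32_t Sum;  // ResNo 0
  bool Overflow; // ResNo 1
};

UAddO uaddo(uint32_t A, uint32_t B) {
  uint32_t S = A + B; // unsigned wraparound is well defined
  return {S, S < A};
}

// select(Cond, uaddo(X, Y):1, uaddo(Z, Y):1)
bool selectOfOverflows(bool Cond, uint32_t X, uint32_t Z, uint32_t Y) {
  return Cond ? uaddo(X, Y).Overflow : uaddo(Z, Y).Overflow;
}

// uaddo(select(Cond, X, Z), Y):1, keeping the same result number.
bool foldedOverflow(bool Cond, uint32_t X, uint32_t Z, uint32_t Y) {
  return uaddo(Cond ? X : Z, Y).Overflow;
}
```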

Fixes #68539
2023-10-09 11:08:55 +01:00
Ben Mudd
6d6b395b53 [DebugInfo][SelectionDAG] Add debug info salvaging for TRUNC nodes
This patch adds support for salvaging TRUNC nodes during SelectionDAG,
fixing LLVM issue #63076:
  https://github.com/llvm/llvm-project/issues/63076

Reviewed in: https://github.com/llvm/llvm-project/pull/66922
2023-10-06 16:10:33 +01:00
Alexey Bataev
e22818d5c9 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() does
not always match the sizes of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-05 06:17:07 -07:00
Arthur Eubanks
07389535a7 Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit b186f1f68be11630355afb0c08b80374a6d31782.

Causes crashes, see https://reviews.llvm.org/D158449.
2023-10-04 14:37:16 -07:00
Alexey Bataev
b186f1f68b [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() does
not always match the sizes of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-04 07:53:30 -07:00
Alexey Bataev
1129dec778 Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit 6f43d28f3452b3ef598bc12b761cfc2dbd0f34c9 to fix
a crash reported in https://reviews.llvm.org/D158449.
2023-10-03 13:02:16 -07:00
Alexey Bataev
6f43d28f34 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() does
not always match the sizes of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-03 10:26:11 -07:00
Simon Pilgrim
b4f591363c [DAG] visitSHL - move SimplifyDemandedBits after all standard folds to give them a chance to match
Pulled out of D155472
2023-10-02 16:09:35 +01:00
Alexey Bataev
ebcb5d59fc Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit 9f5960e004ff54082ccfa9396522e07358f5b66b to fix
buildbots reported here https://lab.llvm.org/buildbot/#/builders/230/builds/19412.
2023-09-29 15:03:46 -07:00
Alexey Bataev
9f5960e004 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() does
not always match the sizes of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-09-29 13:16:03 -07:00
Jay Foad
6e3d2a4b38
[ISel] Fix another crash in new FMA DAG combine (#67818)
Following on from D135150, this patch fixes another crash caused by this
DAG combine:

fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E)

The combine calls ReplaceAllUsesOfValueWith to replace (fmul C, D) with
(fma C, D, E). This can cause nodes to get CSEd. In D135150 the problem
was that the (fma C, D, E) node got CSEd away. In this new case, the
problem is that the outer fadd node gets CSEd away. To fix it we have
to return SDValue(N, 0) from the combine and be careful not to add a
deleted node to the worklist.
2023-09-29 17:18:23 +01:00
Alexey Bataev
3204f88a8b Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit c88c281cf1ac1a01c55231b93826d7c8ae83985b to fix the
crash revealed by https://lab.llvm.org/buildbot/#/builders/230/builds/19353.
2023-09-28 11:57:32 -07:00
Noah Goldstein
de7881ebf5 [DAGCombiner] Combine (select c, (and X, 1), 0) -> (and (zext c), X)
The middle end canonicalizes:
`(and (zext c), X)`
    -> `(select c, (and X, 1), 0)`

But the `and` + `zext` form gets better codegen.
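A scalar C++ check of this equivalence for i32, with hypothetical helper names. Zero-extending the i1 condition yields 0 or 1, so the AND keeps exactly bit 0 of X when the condition is true.

```cpp
#include <cassert>
#include <cstdint>

// Middle-end canonical form: (select c, (and X, 1), 0).
uint32_t selectAndOne(bool C, uint32_t X) { return C ? (X & 1u) : 0u; }

// Combined form: (and (zext c), X).
uint32_t andZextCond(bool C, uint32_t X) {
  return static_cast<uint32_t>(C) & X; // zext(i1) is 0 or 1
}
```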
2023-09-28 13:46:46 -05:00
Alexey Bataev
c88c281cf1 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() does
not always match the sizes of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-09-28 11:03:21 -07:00
David Green
03647e2e4b [AArch64] Handle scalable vectors in combineFMulOrFDivWithIntPow2.
The transform will still not trigger as takeInexpensiveLog2 will bail out for
any scalable vector, but this guards against a scalable typesize error.
2023-09-26 15:34:34 +01:00
Noah Goldstein
bc38c427d4 [DAGCombiner][AArch64] Fix incorrect cast VT in takeInexpensiveLog2
Previously, we were taking `CurVT` before finalizing `ToCast` which
meant potentially returning an `SDValue` with an illegal `ValueType`
for the operation.

Fix is to just take `CurVT` after we have finalized `ToCast` with
`PeekThroughCastsAndTrunc`.
2023-09-23 09:50:42 -05:00
Noah Goldstein
6d6314ba64 [DAGCombiner] Extend combineFMulOrFDivWithIntPow2 to work for non-splat float vecs
Do so by extending `matchUnaryPredicate` to also work for
`ConstantFPSDNode` types then encapsulate the constant checks in a
lambda and pass it to `matchUnaryPredicate`.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D154868
2023-09-20 13:28:24 -05:00
Noah Goldstein
47c642f9a0 [DAGCombiner] Fold IEEE fmul/fdiv by Pow2 to add/sub of exp
Note: This is moving D154678 which previously implemented this in
InstCombine. Concerns were brought up that this was de-canonicalizing
and really targeting a codegen improvement, so placing in DAGCombiner.

This implements:

```
(fmul C, (uitofp Pow2))
    -> (bitcast_to_FP (add (bitcast_to_INT C), Log2(Pow2) << mantissa))
(fdiv C, (uitofp Pow2))
    -> (bitcast_to_FP (sub (bitcast_to_INT C), Log2(Pow2) << mantissa))
```
The motivation is mostly fdiv where 2^(-p) is a fairly common
expression.

The patch is intentionally conservative about the transform, only
doing so if we:
    1) have IEEE floats
    2) C is normal
    3) add/sub of max(Log2(Pow2)) stays in the min/max exponent
       bounds.
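A scalar C++ model of the transform for IEEE binary32 only (23 mantissa bits), with the same preconditions asserted: C is normal and the adjusted exponent stays in the normal range. `scaleByPow2`, `fmulByPow2` and `fdivByPow2` are hypothetical helpers; the patch itself works on DAG nodes, not on this code. The exponent adjustment is done with unsigned modular arithmetic, matching an integer add/sub on the bit pattern.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Multiply a normal binary32 value by 2^P by adding P << 23 to its bits.
float scaleByPow2(float C, int P) {
  assert(std::isnormal(C));
  uint32_t Bits;
  std::memcpy(&Bits, &C, sizeof Bits); // bitcast_to_INT
  int Exp = static_cast<int>((Bits >> 23) & 0xFFu);
  assert(Exp + P >= 1 && Exp + P <= 254); // result stays a normal float
  Bits += static_cast<uint32_t>(P) << 23; // add/sub Log2(Pow2) << mantissa
  float R;
  std::memcpy(&R, &Bits, sizeof R); // bitcast_to_FP
  return R;
}

// (fmul C, (uitofp (1 << Log2)))
float fmulByPow2(float C, unsigned Log2) {
  return scaleByPow2(C, static_cast<int>(Log2));
}

// (fdiv C, (uitofp (1 << Log2)))
float fdivByPow2(float C, unsigned Log2) {
  return scaleByPow2(C, -static_cast<int>(Log2));
}
```

The exponent bounds check is what keeps the integer add from carrying into the sign bit or producing a denormal or infinity.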

Alive2 can't realistically prove this, but float16/float32 cases
(within the bounds of the above rules) were tested exhaustively.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D154805
2023-09-20 13:28:24 -05:00
Fraser Cormack
ebefe83c09 [NFC] Fix spelling 'constanst' -> 'constants' 2023-09-20 15:33:03 +01:00
Luke Lau
22d0bd8632
[DAGCombiner] Combine vp.strided.store with unit stride to vp.store (#66774)
This is the VP equivalent of #66677. If we have a strided store where
the stride is equal to the element width, we can just use a regular VP
store.
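A C++ model of why a unit stride makes the two stores identical, ignoring the VP mask operand and assuming all EVL lanes are active. `stridedStore` and `contiguousStore` are hypothetical helpers, not LLVM APIs: when the byte stride equals the element size, element I lands at Base + I * sizeof(element), exactly where a contiguous store puts it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Strided store of EVL i32 elements: element I goes to Base + I * StrideBytes.
void stridedStore(unsigned char *Base, std::ptrdiff_t StrideBytes,
                  const int32_t *Vals, std::size_t EVL) {
  for (std::size_t I = 0; I < EVL; ++I)
    std::memcpy(Base + static_cast<std::ptrdiff_t>(I) * StrideBytes, &Vals[I],
                sizeof(int32_t));
}

// Contiguous (unit-stride) store of EVL i32 elements.
void contiguousStore(unsigned char *Base, const int32_t *Vals,
                     std::size_t EVL) {
  std::memcpy(Base, Vals, EVL * sizeof(int32_t));
}
```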
2023-09-19 16:43:50 +01:00