3640 Commits

Author SHA1 Message Date
Konstantina Mitropoulou
17fc78e7a4 [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.
This reverts commit 48fa79a503a7cf380f98b6335fbd349afae1bd86.

Reviewed By: brooksmoses

Differential Revision: https://reviews.llvm.org/D159240
2023-08-31 11:36:50 -07:00
Luke Lau
3a4ad45a2c [DAGCombiner] Combine trunc (splat_vector x) -> splat_vector (trunc x)
From the discussion in https://reviews.llvm.org/D158853, moving the truncate
into the splat helps more splatted scalar operands get selected on RISC-V, and
also avoids the need for splat_vector_parts on RV32.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D159147
2023-08-30 15:22:57 +01:00
Simon Pilgrim
d037445f3a [DAG] visitSHL - use FoldConstantArithmetic to fold constants in (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) fold
Matches what we do in the (shl (mul x, c1), c2) -> (mul x, c1 << c2) fold as well as inside visitShiftByConstant
2023-08-29 18:52:24 +01:00
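A quick way to convince yourself of the d037445f3a identity is to model it with wrapping 32-bit arithmetic. This is only a sketch: the function names are made up for illustration and nothing here is LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// Original form: (shl (add x, c1), c2), using wrapping 32-bit arithmetic.
uint32_t shlOfAdd(uint32_t x, uint32_t c1, unsigned c2) {
  assert(c2 < 32 && "shift amount must be in range");
  return (x + c1) << c2;
}

// Folded form: (add (shl x, c2), c1 << c2). With constant c1 and c2 the
// second term is itself a constant, which is what the fold relies on.
uint32_t addOfShl(uint32_t x, uint32_t c1, unsigned c2) {
  assert(c2 < 32 && "shift amount must be in range");
  return (x << c2) + (c1 << c2);
}

// Compares both forms on a handful of edge-case values and every shift amount.
void checkShlAddFold() {
  const uint32_t samples[] = {0u, 1u, 7u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu};
  for (uint32_t x : samples)
    for (uint32_t c1 : samples)
      for (unsigned c2 = 0; c2 < 32; ++c2)
        assert(shlOfAdd(x, c1, c2) == addOfShl(x, c1, c2));
}
```

The two forms agree because a left shift distributes over modular addition.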
Konstantina Mitropoulou
48fa79a503 Revert "[DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points."
This reverts commit 5ec13535235d07eafd64058551bc495f87c283b1.
2023-08-24 20:39:04 -07:00
Konstantina Mitropoulou
5ec1353523 [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.
CMP(A,C)||CMP(B,C) => CMP(MIN/MAX(A,B), C)
CMP(A,C)&&CMP(B,C) => CMP(MIN/MAX(A,B), C)

If the operands are proven to be non NaN, then the optimization can be applied
for all predicates.

We can apply the optimization for the following predicates for FMINNUM/FMAXNUM
(for quiet and signaling NaNs) and for FMINNUM_IEEE/FMAXNUM_IEEE if we can prove
that the operands are not signaling NaNs.
- ordered lt/le and ||
- ordered gt/ge and ||
- unordered lt/le and &&
- unordered gt/ge and &&

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155267
2023-08-24 10:48:56 -07:00
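The NaN cases listed in 5ec1353523 can be explored with a small model. The sketch below assumes `std::fmin`/`std::fmax` as stand-ins for FMINNUM/FMAXNUM (they return the non-NaN operand when exactly one input is NaN). Pairing ordered-lt-with-`||` with min and unordered-lt-with-`&&` with max is worked out here from the comparison semantics rather than copied from the patch, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Ordered less-than: false if either operand is NaN (plain '<' on doubles).
bool olt(double x, double y) { return x < y; }

// Unordered less-than: true if either operand is NaN.
bool ult(double x, double y) { return !(x >= y); }

// (a olt c) || (b olt c)  compared with  (fmin(a, b) olt c)
bool orOfOltMatchesMin(double a, double b, double c) {
  return (olt(a, c) || olt(b, c)) == olt(std::fmin(a, b), c);
}

// (a ult c) && (b ult c)  compared with  (fmax(a, b) ult c)
bool andOfUltMatchesMax(double a, double b, double c) {
  return (ult(a, c) && ult(b, c)) == ult(std::fmax(a, b), c);
}

// Checks both equivalences on a sample set that includes NaN, infinities
// and signed zeros.
void checkFpFoldOnSamples() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  const double inf = std::numeric_limits<double>::infinity();
  const double samples[] = {nan, -inf, -1.5, -0.0, 0.0, 2.0, inf};
  for (double a : samples)
    for (double b : samples)
      for (double c : samples) {
        assert(orOfOltMatchesMin(a, b, c));
        assert(andOfUltMatchesMax(a, b, c));
      }
}
```

This model should be built without fast-math style flags, since those allow the compiler to assume NaNs never occur.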
Simon Pilgrim
d254014fdb [DAG] Add willNotOverflowAdd/willNotOverflowSub helper functions.
Matches similar instructions on InstCombine
2023-08-24 17:52:54 +01:00
Craig Topper
2ad50f354a [DAGCombiner][RISCV][AArch64][PowerPC] Restrict foldAndOrOfSETCC from using SMIN/SMAX where an OR/AND would do.
This removes some diffs created by D153502.

I'm assuming an AND/OR won't be worse than an SMIN/SMAX. For
RISC-V at least, AND/OR can be a shorter encoding than SMIN/SMAX.

It's weird that we have two different functions responsible for
folding logic of setccs, but I'm not ready to try to untangle that.

I'm unclear if the PowerPC change is a regression or not. It looks
like it might use more registers, but I don't understand PowerPC
registers so I'm not sure.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158292
2023-08-23 20:26:23 -07:00
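One case where a plain OR/AND can replace SMIN/SMAX is a pair of sign-bit tests against zero. Whether this is the exact set of cases that 2ad50f354a excludes is an assumption here, and the helper names are invented for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// (a < 0) || (b < 0): the OR of two values has its sign bit set exactly
// when at least one input has it set.
bool eitherNegative(int32_t a, int32_t b) { return a < 0 || b < 0; }
bool orIsNegative(int32_t a, int32_t b) { return (a | b) < 0; }
bool minIsNegative(int32_t a, int32_t b) { return std::min(a, b) < 0; }

// (a < 0) && (b < 0): the AND has its sign bit set only when both do.
bool bothNegative(int32_t a, int32_t b) { return a < 0 && b < 0; }
bool andIsNegative(int32_t a, int32_t b) { return (a & b) < 0; }
bool maxIsNegative(int32_t a, int32_t b) { return std::max(a, b) < 0; }

// Confirms that the bitwise and the min/max forms agree on a small range.
void checkSignBitTests() {
  for (int32_t a = -3; a <= 3; ++a)
    for (int32_t b = -3; b <= 3; ++b) {
      assert(eitherNegative(a, b) == orIsNegative(a, b));
      assert(eitherNegative(a, b) == minIsNegative(a, b));
      assert(bothNegative(a, b) == andIsNegative(a, b));
      assert(bothNegative(a, b) == maxIsNegative(a, b));
    }
}
```

Both forms give the same answer, so the choice between them comes down to encoding size and instruction availability on the target.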
Peter Rong
f58fbfc746 [X86][CodeGen] Add a dag pattern to fix #64323
After recent patch D30189, #64323's error message became a new one.
When DAGCombiner was optimizing `(vextract (scalar_to_vector val), 0) -> val`, it didn't
consider the possibility that the inserted value type has fewer bits than the dest type.
This patch fixes that.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D158355
2023-08-23 10:50:32 -07:00
Simon Pilgrim
ba818c4019 [DAG] replaceStoreOfInsertLoad - don't fold if the inserted element is implicitly truncated
D152276 wasn't handling the case where the inserted element is implicitly truncated into the vector - resulting in an i1 element (implicitly truncated from i8) overwriting 8 bits instead of 1 bit.

This patch is intended to be merged into 17.x so I've just disallowed any vector element vs inserted element type mismatch - technically we could be more elegant and permit truncated stores (as long as the store is still byte sized), but the use cases for that are so limited I'd prefer to play it safe for now.

Candidate patch for #64655 17.x merge

Differential Revision: https://reviews.llvm.org/D158366
2023-08-21 11:22:07 +01:00
Jim Lin
18f5ada244 [DAGCombiner] Don't reduce BUILD_VECTOR to BITCAST before LegalizeTypes if VT is legal.
Targets may lose some optimization opportunities for certain vector operations
if we reduce BUILD_VECTOR to BITCAST early.

If VT is not legal, reducing BUILD_VECTOR to BITCAST before LegalizeTypes
can still be beneficial, because the type legalizer often scalarizes illegal vector types.

Reviewed By: sebastian-ne

Differential Revision: https://reviews.llvm.org/D156645
2023-08-19 12:53:50 +08:00
Philip Reames
92e0c0dc1a [DAG] Restrict insert_subvector undef, splat_vector, dontcare transform
On the extract_subvector side, we already have the restriction. With D158201, we'd start getting unprofitable splat combines unless we add the same one on the insert_subvector side.

Differential Revision: https://reviews.llvm.org/D158202
2023-08-18 12:44:09 -07:00
Philip Reames
67b71ad04a [DAG] Fold insert_subvector undef, (extract_subvector X, 0), 0 with non-matching types
We have an existing DAG combine for when an insert/extract subvector pair is entirely a nop, but we hadn't handled the case where the net result was either an insert or an extract (but not both). The transform is restricted to index = 0 to avoid having to adjust indices after the transform.

Differential Revision: https://reviews.llvm.org/D158201
2023-08-18 12:28:27 -07:00
Craig Topper
bbbb93eb48 Revert "[DAG] Fold insert_subvector undef, (extract_subvector X, 0), 0 with non-matching types"
This reverts commit 770be43f6782dab84d215d01b37396d63a9c2b6e.

Forgot to remove from my tree while experimenting.
2023-08-18 12:00:07 -07:00
Craig Topper
770be43f67 [DAG] Fold insert_subvector undef, (extract_subvector X, 0), 0 with non-matching types
We have an existing DAG combine for when an insert/extract subvector pair is entirely a nop, but we hadn't handled the case where the net result was either an insert or an extract (but not both).  The transform is restricted to index = 0 to avoid having to adjust indices after the transform.

Reviewers, a couple of comments on the test changes:
* Mostly RISCV, mostly schedule reordering.
* One real regression in splats-with-mixed-vl.ll due to a different overly aggressive combine, fix in a follow up patch.
* The test/CodeGen/X86/vector-replicaton-i1-mask.ll diff looked concerning at first, but note the mask size is at most 4 i1s.  I think the type changes on the mask loads are correct, but would welcome a second opinion from someone more familiar with AVX512 codegen.

Differential Revision: https://reviews.llvm.org/D158201
2023-08-18 11:59:18 -07:00
Craig Topper
846fbb06b8 [DAGCombiner][RISCV] Return SDValue(N, 0) instead of SDValue() after 2 calls to CombineTo in visitSTORE.
RISC-V found a case where the CombineTo caused N to be CSEd with
an existing node and then deleted. The top level DAGCombiner loop
was surprised to find a node was deleted, but SDValue() was returned
from the visit function.

We need to return SDValue(N, 0) to tell the top level loop that
a change was made, but the worklist updates were already handled.

Fixes #64772.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158208
2023-08-17 15:13:36 -07:00
Simon Pilgrim
b0a77af4f1 [DAG] SimplifyDemandedBits - add sra(shl(x,c1),c1) -> sign_extend_inreg(x) demanded elts fold
Move the sra(shl(x,c1),c1) -> sign_extend_inreg(x) fold inside SimplifyDemandedBits so we can recognize hidden splats with DemandedElts masks.

Because the c1 shift amount has multiple uses, hidden splats won't get simplified to a splat constant buildvector - meaning the existing fold in DAGCombiner::visitSRA can't fire as it won't see a uniform shift amount.

I also needed to add TLI preferSextInRegOfTruncate hook to help keep truncate(sign_extend_inreg(x)) vector patterns on X86 so we can use PACKSS more efficiently.

Differential Revision: https://reviews.llvm.org/D157972
2023-08-15 16:32:03 +01:00
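The scalar identity behind b0a77af4f1, sra(shl(x, c1), c1) == sign_extend_inreg(x), can be modelled on 32-bit patterns. This is a sketch with invented helper names; it works on `uint32_t` bit patterns so that it does not depend on how `>>` treats negative signed values.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic shift right of a 32-bit pattern: shift logically, then fill
// the vacated high bits with copies of the original sign bit.
uint32_t sra32(uint32_t v, unsigned c) {
  assert(c >= 1 && c <= 31);
  uint32_t logical = v >> c;
  uint32_t fill = (v & 0x80000000u) ? ~(0xFFFFFFFFu >> c) : 0u;
  return logical | fill;
}

// sra(shl(x, c), c)
uint32_t shlThenSra(uint32_t x, unsigned c) { return sra32(x << c, c); }

// sign_extend_inreg(x) from the low (32 - c) bits of x.
uint32_t signExtendInReg(uint32_t x, unsigned c) {
  assert(c >= 1 && c <= 31);
  uint32_t low = x & (0xFFFFFFFFu >> c);  // keep the low (32 - c) bits
  uint32_t sign = 1u << (31 - c);         // sign bit of the narrow value
  return (low ^ sign) - sign;             // standard xor/sub sign extension
}

// Compares both forms on sample values for every valid shift amount.
void checkSraShlFold() {
  const uint32_t samples[] = {0u, 1u, 0x7Fu, 0x80u, 0xFFu, 0x1234ABCDu, 0xFFFFFFFFu};
  for (uint32_t x : samples)
    for (unsigned c = 1; c <= 31; ++c)
      assert(shlThenSra(x, c) == signExtendInReg(x, c));
}
```

For example, with c = 24 both forms turn 0xFF into 0xFFFFFFFF and leave 0x7F unchanged.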
Craig Topper
6299650f97 [DAGCombiner] Fold trunc(undef) -> undef.
We already do this in getNode, but the undef might appear during
another DAGCombine.

While here remove code for handling noop truncates. getNode checks
the types and won't create a noop truncate.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D157910
2023-08-14 13:02:24 -07:00
Philip Reames
b1ada7a1d3 [DAG] Support store merging of vector constant stores (try 2)
Original commit didn't handle the case where one of the stores was a
truncating store of the build_vector.  The existing codepath produced
wrong code (which thankfully also failed asserts) instead of guarding
against unexpected types.  Original commit message follows.

Ran across this when making a change to RISCV memset lowering. Seems
very odd that manually merging a store into a vector prevents it from
being further merged.

Differential Revision: https://reviews.llvm.org/D156349
2023-08-10 08:54:05 -07:00
Philip Reames
0696a531c2 Revert "[DAG] Support store merging of vector constant stores"
This reverts commit 660b740e4b3c4b23dfba36940ae0fe2ad41bfedf.  Crash reported in the review thread post commit.  Reverting while investigating.
2023-08-10 07:58:00 -07:00
Konstantina Mitropoulou
2c5d1b5ab7 [DAGCombiner] Reassociate the operands from (OR (OR(CMP1, CMP2)), CMP3) to (OR (OR(CMP1, CMP3)), CMP2)
This happens when CMP1 and CMP3 have the same predicate (or CMP2 and CMP3 have
the same predicate).

This helps optimizations such as the following one:
CMP(A,C)||CMP(B,C) => CMP(MIN/MAX(A,B), C)
CMP(A,C)&&CMP(B,C) => CMP(MIN/MAX(A,B), C)

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156215
2023-08-08 20:08:01 -07:00
pvanhout
98ccc70b93 [DAG] Fix crash in replaceStoreOfInsertLoad
Idx's type can be different from Ptr's, causing a "Binary operator types must match" assertion failure when emitting the MUL.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156972
2023-08-08 15:15:34 +02:00
Philip Reames
660b740e4b [DAG] Support store merging of vector constant stores
Ran across this when making a change to RISCV memset lowering. Seems very odd that manually merging a store into a vector prevents it from being further merged.

Differential Revision: https://reviews.llvm.org/D156349
2023-08-02 14:41:46 -07:00
David Blaikie
4e429fd2a7 Few linter fixes
size() > 0 -> !empty
indentation
mismatched names on parameters in decls/defs
const on value return types
2023-07-31 18:52:57 +00:00
Evgenii Kudriashov
c13e310fa7 [DAGCombine] Support truncated constants for fptosi.sat combining
Closes https://github.com/llvm/llvm-project/issues/56779

Reviewed By: RKSimon, dmgreen

Differential Revision: https://reviews.llvm.org/D152926
2023-07-28 18:54:39 +03:00
Pranav Kant
6f305e0658 [DAGCombiner] Limit graph traversal to cap compile times
The hasPredecessorHelper method, which DAGCombiner uses to form pre-indexed and post-indexed loads/stores, is a major source of slowdown when compiling a large function with MSan enabled on Arm. This patch caps the DFS graph traversal for this method at 8192, which cuts compile time by 50% (4m -> 2m) at the cost of fewer nodes combined overall.

Here's a summary of the pre-index DAG nodes created and the time taken to compile the pathological case with different MaxDepth limits:
1. MaxDepth = 0 (unlimited): 1800 nodes, took 4m
2. MaxDepth = 32k: 560 nodes, took 2m31s
3. MaxDepth = 8k: 139 nodes, took 2m

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D154885
2023-07-26 17:29:38 +00:00
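The trade-off in 6f305e0658 (a bounded search that gives up early) can be sketched on a toy graph. Everything below is hypothetical: the `Graph` type, `mayReach`, and the `maxSteps` parameter are illustration only, and answering "may be reachable" when the cap is hit is an assumption about which direction is the conservative one.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy adjacency list: operands[n] lists the nodes that node n uses.
using Graph = std::vector<std::vector<int>>;

// Depth-first search from `start` looking for `target` along operand edges.
// maxSteps == 0 means unlimited. Once maxSteps nodes have been visited the
// search stops and answers true ("may reach"), so a caller that only
// transforms when the answer is false simply skips the transform.
bool mayReach(const Graph &operands, int start, int target, std::size_t maxSteps) {
  assert(start >= 0 && static_cast<std::size_t>(start) < operands.size());
  std::unordered_set<int> visited;
  std::vector<int> worklist{start};
  while (!worklist.empty()) {
    int n = worklist.back();
    worklist.pop_back();
    if (n == target)
      return true;
    if (!visited.insert(n).second)
      continue;  // already expanded
    if (maxSteps != 0 && visited.size() >= maxSteps)
      return true;  // budget exhausted: conservative answer
    for (int op : operands[n])
      worklist.push_back(op);
  }
  return false;
}
```

With the chain `{{1}, {2}, {3}, {}, {}}`, searching from node 0 for the isolated node 4 returns false when unlimited but true with `maxSteps = 2`, which mirrors the "fewer nodes combined" cost described above.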
Jay Foad
6fcad9cf93 [DAGCombiner] Simplify foldAndOrOfSETCC. NFC.
Pull out repeated hasOneUse checks. Simplify some conditions. Reduce
indentation.

Differential Revision: https://reviews.llvm.org/D156220
2023-07-26 10:22:55 +01:00
Craig Topper
1f5a1b8952 [DAGCombiner] Minor improvements to foldAndOrOfSETCC. NFC
Reduce the scope of some variables.
Replace an if with an assertion.

Reviewed By: kmitropoulou

Differential Revision: https://reviews.llvm.org/D156140
2023-07-25 00:20:06 -07:00
WANG Rui
595d5f36f4 [DAGCombine] Canonicalize operands for visitANDLike
During the construction of SelectionDAG, there are no explicit canonicalization rules to adjust the order of operands for AND nodes.  This may prevent the optimization in DAGCombiner::visitANDLike from being triggered. This patch canonicalizes the operands before matches, which can be observed to improve optimization on the RISC-V target architecture.

Canonicalize:
```
and(x, add) -> and(add, x)
```

Signed-off-by: WANG Rui <wangrui@loongson.cn>

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D154760
2023-07-24 16:52:04 +08:00
Amaury Séchet
88452508f3 [DAG] Improve carry reconstruction in combineCarryDiamond.
The gain is usually sufficient to go the extra mile and reconstruct a carry in some cases.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D154533
2023-07-22 22:49:48 +00:00
Simon Pilgrim
697f60598e [DAG] hoistLogicOpWithSameOpcodeHands - ensure SIGN_EXTEND_INREG nodes have the same extension value type
Fix bug in the check for matching SIGN_EXTEND_INREG types
2023-07-20 10:44:46 +01:00
Simon Pilgrim
98b0f1360d [DAG] hoistLogicOpWithSameOpcodeHands - add support for SIGN_EXTEND_INREG nodes.
This can reuse the existing *_EXTEND node handling (with special handling for the valuetype arg)
2023-07-19 11:56:32 +01:00
Simon Pilgrim
2167ae93c9 [DAG] hoistLogicOpWithSameOpcodeHands - add support for *_EXTEND_VECTOR_INREG nodes.
This can reuse the existing *_EXTEND node handling.
2023-07-19 10:50:23 +01:00
Simon Pilgrim
3ad4f92f83 [DAG] More aggressively fold (extract_vector_elt (build_vector x, y), c) iff element is zero constant
We currently don't extract vector elements from multi-use build vectors unless TLI.aggressivelyPreferBuildVectorSources accepts them, which seems a little extreme for constant build vectors (especially as under some cases ComputeKnownBits will indirectly extract the data for us).

This is causing a few regressions in some upcoming SimplifyDemandedBits work I'm looking at, all of which just need to know that the element is zero, so I've tweaked the fold to accept zero elements as well, which will typically fold very easily.

Differential Revision: https://reviews.llvm.org/D155582
2023-07-18 17:31:34 +01:00
Konstantina Mitropoulou
4c42ab1199 [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns
CMP(A,C)||CMP(B,C) => CMP(MIN/MAX(A,B), C)
CMP(A,C)&&CMP(B,C) => CMP(MIN/MAX(A,B), C)

This first patch handles integer types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D153502
2023-07-17 17:13:47 -07:00
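For the integer case in 4c42ab1199, the two rewrites can be checked directly with signed less-than. This is a sketch with invented function names; which of MIN or MAX applies depends on the predicate, and only `<` is shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// (a < c) || (b < c): true as soon as the smaller of a and b is below c.
bool orOfLessThan(int32_t a, int32_t b, int32_t c) { return a < c || b < c; }
bool minLessThan(int32_t a, int32_t b, int32_t c) { return std::min(a, b) < c; }

// (a < c) && (b < c): true only if the larger of a and b is below c.
bool andOfLessThan(int32_t a, int32_t b, int32_t c) { return a < c && b < c; }
bool maxLessThan(int32_t a, int32_t b, int32_t c) { return std::max(a, b) < c; }

// Exhaustively compares the two-compare and one-compare forms on a small range.
void checkIntegerSetccFold() {
  for (int32_t a = -4; a <= 4; ++a)
    for (int32_t b = -4; b <= 4; ++b)
      for (int32_t c = -4; c <= 4; ++c) {
        assert(orOfLessThan(a, b, c) == minLessThan(a, b, c));
        assert(andOfLessThan(a, b, c) == maxLessThan(a, b, c));
      }
}
```

The rewritten form trades two compares and a logic op for one min/max and one compare.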
Matt Arsenault
296e24cd2e DAG: Constant fold frexp nodes
Special casing the nonfinite exponent value everywhere is kind of
annoying.
2023-07-17 17:34:29 -04:00
Simon Pilgrim
e9caa37e9c [DAG] Move lshr narrowing from visitANDLike to SimplifyDemandedBits
Inspired by some of the cases from D145468

Let SimplifyDemandedBits handle the narrowing of lshr to half-width if we don't require the upper bits, the narrowed shift is profitable and the zext/trunc are free.

A future patch will propose the equivalent shl narrowing combine.

Differential Revision: https://reviews.llvm.org/D146121
2023-07-17 15:50:09 +01:00
Noah Goldstein
74f0ec5e24 [DAGCombiner] Make it so that udiv can be folded with (select c, NonZero, 1)
This is done by allowing speculation of `udiv` if we can prove the
denominator is non-zero.

https://alive2.llvm.org/ce/z/VNCt_q

Differential Revision: https://reviews.llvm.org/D149198
2023-07-12 17:17:53 -05:00
Ivan Kosarev
15e7749e19 [Codegen] Generate fast fp64-to-fp16 conversions in unsafe mode.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D154528
2023-07-12 11:55:19 +01:00
Amaury Séchet
ee2d10cd16 [NFC] Reorder functions in DAGCombiner so all UADDO_CARRY related functions are next to each other.
2023-07-04 14:55:11 +00:00
Simon Pilgrim
4742715eb7 [DAG] Fold (*ext (*_extend_vector_inreg x)) -> (*_extend_vector_inreg x)
2023-06-30 14:42:49 +01:00
David Green
14f54a594e [DAG][AArch64] Fold shuffle_vector<4,5,6,7> to extract_subvector
During legalization, we can end up with shuffles that are identity masks, so
act like extract_subvector, but do not simplify to extract_subvector. This
adjusts the profitability heuristic in foldExtractSubvectorFromShuffleVector to
allow identity vectors that do not start at element 0. Undef mask elements are
excluded as it can be more useful to keep the undef elements.

Differential Revision: https://reviews.llvm.org/D153504
2023-06-30 11:13:39 +01:00
Luke Lau
742fb8b5c7 [DAGCombine] Fold (store (insert_elt (load p)) x p) -> (store x)
If we have a store of a load with no other uses in between it, it's
considered dead and is removed. So sometimes when legalizing a fixed
length vector store of an insert, we end up producing better code
through scalarization than without.
An example follows:

  %a = load <4 x i64>, ptr %x
  %b = insertelement <4 x i64> %a, i64 %y, i32 2
  store <4 x i64> %b, ptr %x

If this is scalarized, then DAGCombine successfully removes 3 of the 4
stores which are considered dead, and on RISC-V we get:

  sd a1, 16(a0)

However if we make the vector type legal (-mattr=+v), then we lose the
optimisation because we don't scalarize it.

This patch attempts to recover the optimisation for vectors by
identifying patterns where we store a load with a single insert
in between, replacing it with a scalar store of the inserted element.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D152276
2023-06-28 22:45:04 +01:00
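The memory effect of the 742fb8b5c7 pattern can be mimicked with a four-element array standing in for `<4 x i64>`. This is only a model of the observable result, with made-up names, and says nothing about how the combine is implemented.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<int64_t, 4>;

// Vector form: load the whole vector, insert y at idx, store it all back.
void storeViaVector(Vec4 &mem, int64_t y, std::size_t idx) {
  assert(idx < mem.size());
  Vec4 a = mem;  // %a = load <4 x i64>, ptr %x
  a[idx] = y;    // %b = insertelement <4 x i64> %a, i64 %y, i32 idx
  mem = a;       // store <4 x i64> %b, ptr %x
}

// Scalar form: write only the inserted element. The other three lanes would
// be stored back unchanged, so skipping them leaves memory identical.
void storeScalar(Vec4 &mem, int64_t y, std::size_t idx) {
  assert(idx < mem.size());
  mem[idx] = y;  // e.g. a single 8-byte store at offset 16 when idx == 2
}
```

Applying both functions to copies of the same array with the same `y` and `idx` leaves the two copies equal.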
FLZ101
32e4013dd4 [AArch64][SelectionDAG] fix infinite loop caused by legalizing & combining CONCAT_VECTORS
Legalizing in `AArch64TargetLowering::LowerCONCAT_VECTORS()` and combining in `DAGCombiner::visitCONCAT_VECTORS()` could cause an infinite loop.
This commit fixes that issue by conditionally skipping the combining.

Fix https://github.com/llvm/llvm-project/issues/63322

Reviewed By: RKSimon, MaskRay

Differential Revision: https://reviews.llvm.org/D153316
2023-06-27 13:57:41 -07:00
Simon Pilgrim
1f006f5fb6 [DAG] mergeTruncStores - early out if we collect more than the maximum number of stores
If we have an excessive number of stores in a single chain then the candidate WideVT may exceed the maximum width of an EVT integer type (and will assert) - but since mergeTruncStores doesn't support anything wider than an i64 store we should just early-out if we've collected more stores than that.

Fixes #63306
2023-06-23 16:22:11 +01:00
David Green
589c940eb3 [DAG] Fix and expand fmin/fmax reassociation fold.
This call to reassociateReduction is used by both fminnum/fmaxnum and
fminimum/fmaximum. In adding support for fminimum/fmaximum we appear to be
fixing the use of an incorrect reduction type, which should have only applied
to minnum/maxnum.

I also believe that it doesn't need nsz and reassoc to perform the
reassociation. For float min/max it should always be valid.

Differential Revision: https://reviews.llvm.org/D153247
2023-06-23 14:45:14 +01:00
Amaury Séchet
34d8c5b9ce [DAG] Peek through trunc when combining select into shifts.
This fixes a regression in D127115

Depends on D127115

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D151916
2023-06-23 00:35:39 +00:00
Simon Pilgrim
43ad2e9c8b [DAG] Add getExtOrTrunc helper. NFC.
Wrap the getSExtOrTrunc/getZExtOrTrunc calls behind an IsSigned argument.
2023-06-20 16:03:18 +01:00
Simon Pilgrim
ff23856c1c [DAG] Fold (abds x, y) -> (abdu x, y) iff both args are known positive
This is a generic DAG combine version of D151055 which recognizes when a signed ABDS can be safely replaced with an unsigned ABDU instruction if it is legal.

Alive2: https://alive2.llvm.org/ce/z/pb5BjG

Differential Revision: https://reviews.llvm.org/D153328
2023-06-20 15:31:22 +01:00
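The ff23856c1c condition can be checked exhaustively on 8-bit values. The sketch treats "known positive" as "sign bit clear" and uses invented helper names; it also shows one input pair where the replacement would be wrong.

```cpp
#include <cassert>
#include <cstdint>

// Signed absolute difference, computed in a wider type and returned as an
// 8-bit pattern.
uint8_t abds8(int8_t x, int8_t y) {
  int d = static_cast<int>(x) - static_cast<int>(y);
  return static_cast<uint8_t>(d < 0 ? -d : d);
}

// Unsigned absolute difference on the same 8-bit patterns.
uint8_t abdu8(uint8_t x, uint8_t y) {
  return static_cast<uint8_t>(x > y ? x - y : y - x);
}

// When both sign bits are clear, signed and unsigned ordering coincide, so
// the two operations agree on every such pair.
void checkAbdsToAbdu() {
  for (int x = 0; x <= 127; ++x)
    for (int y = 0; y <= 127; ++y)
      assert(abds8(static_cast<int8_t>(x), static_cast<int8_t>(y)) ==
             abdu8(static_cast<uint8_t>(x), static_cast<uint8_t>(y)));
}
```

With a negative input the results diverge: `abds8(-1, 1)` is 2, while the same bit patterns give `abdu8(0xFF, 1)` equal to 254.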
Jeffrey Byrnes
7972a6e126 [DAGCombiner][NFC] Factor out ByteProvider
Differential Revision: https://reviews.llvm.org/D143018

Change-Id: I3dc03787a3382c0c3fe6b869f869c2946f450874
2023-06-19 08:54:34 -07:00
Craig Topper
7163539466 [DAGCombiner] When combining (sext_inreg (zext X), VT) -> (sext X) don't pass along the sext_inreg VT.
ISD::SIGN_EXTEND is only supposed to have one operand, but we
were creating it with 2 operands.

Since we basically never check for extra operands this went
unnoticed.
2023-06-15 11:47:42 -07:00