4027 Commits

Author SHA1 Message Date
Simon Pilgrim
ea2ee5dc2f
[DAG] Add legalization handling for AVGCEIL/AVGFLOOR nodes (#92096)
Always match AVG patterns pre-legalization, and use TargetLowering::expandAVG to expand again during legalization.

I've removed the X86 custom AVGCEILU pattern detection and replaced it with combines that try to convert other AVG nodes to AVGCEILU.
2024-06-12 14:11:07 +01:00
c8ef
0e346eeac6
[DAG] fold avgu(zext(x), zext(y)) -> zext(avgu(x, y)) (#95134)
close: #86301
2024-06-12 12:58:49 +01:00
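The fold in #95134 can be sanity-checked outside LLVM. The sketch below is only an illustration: `avgFloorU`, `avgOfZext`, `zextOfAvg` and `foldHoldsForAllI8` are hypothetical names, and the floor-average identity `(x & y) + ((x ^ y) >> 1)` is assumed as the reference definition rather than taken from the patch.

```cpp
#include <cstdint>

// Unsigned floor average computed without a wider intermediate type,
// using the overflow-free identity (x & y) + ((x ^ y) >> 1).
template <typename T> T avgFloorU(T x, T y) {
  return static_cast<T>((x & y) + ((x ^ y) >> 1));
}

// "Before" pattern: zero-extend both operands to 16 bits, then average.
uint16_t avgOfZext(uint8_t x, uint8_t y) {
  return avgFloorU<uint16_t>(x, y);
}

// "After" pattern: average in 8 bits, then zero-extend the result.
uint16_t zextOfAvg(uint8_t x, uint8_t y) {
  return static_cast<uint16_t>(avgFloorU<uint8_t>(x, y));
}

// Exhaustively compares both patterns over every pair of 8-bit operands.
bool foldHoldsForAllI8() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      if (avgOfZext(static_cast<uint8_t>(x), static_cast<uint8_t>(y)) !=
          zextOfAvg(static_cast<uint8_t>(x), static_cast<uint8_t>(y)))
        return false;
  return true;
}
```

The narrow average never overflows, which is why the extension can be moved after it.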
David Green
a284bdb311 [DAG] Fold fdiv X, c2 -> fmul X, 1/c2 without AllowReciprocal if exact (#93882)
This moves the combine of fdiv by constant to fmul out of an
'if (Options.UnsafeFPMath || Flags.hasAllowReciprocal())' block,
so that it triggers if the divide is exact. An extra check for
Recip.isDenormal() is added as multiple places make reference
to it being unsafe or slow on certain platforms.
2024-06-09 12:28:20 +01:00
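As a rough illustration of the condition described in #93882, the sketch below tests whether a float divisor has a reciprocal that is both exact and not denormal. `hasExactNormalReciprocal` and `divOrMul` are hypothetical helpers written for this example; they are not the APFloat-based code in the patch.

```cpp
#include <cmath>

// Hypothetical helper: true if 1/c is exactly representable as a normal
// (non-denormal) float. Only powers of two have an exact reciprocal.
bool hasExactNormalReciprocal(float c) {
  if (!std::isfinite(c) || c == 0.0f)
    return false;
  int exponent = 0;
  const float mantissa = std::frexp(c, &exponent);
  if (std::fabs(mantissa) != 0.5f)
    return false; // c is not a power of two
  const float recip = 1.0f / c;
  // Rejects reciprocals that are denormal or that overflowed to infinity.
  return std::fpclassify(recip) == FP_NORMAL;
}

// Uses the multiply form only when it cannot change the result.
float divOrMul(float x, float c) {
  return hasExactNormalReciprocal(c) ? x * (1.0f / c) : x / c;
}
```

For example, dividing by 4.0 qualifies, dividing by 3.0 does not, and dividing by 2^127 is rejected because its reciprocal is denormal.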
Yingwei Zheng
d9507a3e10
[DAGCombine] Fix miscompilation caused by PR94008 (#94850)
The PR description in #94008 does not match the code.
> + When VT is smaller than ShiftVT, it is safe to use trunc.
> + When VT is larger than ShiftVT, it is safe to use zext iff
`is_zero_poison` is true (i.e., `opcode == ISD::CTTZ_ZERO_UNDEF`). See
also the counterexample `src_shl_cttz2 -> tgt_shl_cttz2` in the alive2
proofs.

Closes #94824.
2024-06-08 21:40:57 +08:00
Quentin Colombet
25506f4864
[SDISel][Combine] Constant fold FP16_TO_FP (#94790)
In some cases, constants can survive early constant folding
because they are hidden behind several layers of type changes.

E.g., consider the following sequence (extracted from the arm test that
this commit changes):
```
t2: v1f16 = BUILD_VECTOR ConstantFP:f16<APFloat(0)>
t4: v1f16 = insert_vector_elt t2, ConstantFP:f16<APFloat(0)>, Constant:i32<0>
t5: f16 = bitcast t4
t6: f32 = fp_extend t5
```

Because the constant (APFloat(0)) is hidden behind a <1 x ty> type, all
the constant folding that normally happens for scalar nodes when using
`SelectionDAG::getNode` is blocked.

As a result the constant manages to survive as an actual conversion
instruction down to the select phase:
```
t11: f32 = fp16_to_fp Constant:i32<0>
```

With the change in this patch, we try to do constant folding one more
time during dag combine, which in the motivating example results in the
much better sequence:
```
t7: ch = CopyToReg t0, Register:f32 %0, ConstantFP:f32<0.000000e+00>
```

Note: I'm sure we have this problem in a lot of other places. Generally
speaking I believe SDISel is not that good with <1 x ty> compared to
pure scalar. However, I only changed what I could easily test.
2024-06-08 11:31:13 +02:00
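To make the folded value in #94790 concrete, the sketch below evaluates what an `fp16_to_fp` of a constant bit pattern produces. `halfBitsToFloat` is a hypothetical stand-alone helper assuming IEEE 754 binary16 input; the patch itself presumably relies on LLVM's own constant-folding utilities rather than anything like this.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: converts IEEE 754 binary16 bits to a float value.
float halfBitsToFloat(uint16_t bits) {
  const int sign = (bits >> 15) & 0x1;
  const int exponent = (bits >> 10) & 0x1F;
  const int mantissa = bits & 0x3FF;

  float magnitude;
  if (exponent == 0) {
    // Zero or subnormal: mantissa * 2^-24.
    magnitude = std::ldexp(static_cast<float>(mantissa), -24);
  } else if (exponent == 0x1F) {
    // All-ones exponent: infinity or NaN.
    magnitude = mantissa ? std::numeric_limits<float>::quiet_NaN()
                         : std::numeric_limits<float>::infinity();
  } else {
    // Normal: (1024 + mantissa) * 2^(exponent - 25).
    magnitude = std::ldexp(static_cast<float>(mantissa | 0x400), exponent - 25);
  }
  return sign ? -magnitude : magnitude;
}
```

With this model, `fp16_to_fp Constant:i32<0>` evaluates to `0.0f`, matching the `ConstantFP:f32<0.000000e+00>` in the improved sequence.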
aengelke
74d62c2f73
[CodeGen][SDAG] Remove CombinedNodes SmallPtrSet (#94609)
This "small" set grows quite large and it's more performant to store
whether a node has been combined before in the node itself.

As this information is only relevant for nodes that are currently not in
the worklist, add a second state to the CombinerWorklistIndex (-2) to
indicate that a node is currently not in a worklist, but was combined
before.

This brings a substantial performance improvement.
2024-06-07 13:17:27 +02:00
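The idea in #94609 of keeping worklist state in the node can be sketched as follows. This is a simplified model, not the DAGCombiner code: `Node`, `Worklist` and `wasCombinedBefore` are hypothetical, and only the field name `CombinerWorklistIndex` and the meaning of `-2` come from the commit message (using `-1` for "not queued, never combined" is an assumption).

```cpp
#include <vector>

struct Node {
  // >= 0: slot in the worklist
  //   -1: not in the worklist, never combined (assumed encoding)
  //   -2: not in the worklist, but combined before
  int CombinerWorklistIndex = -1;
};

class Worklist {
  std::vector<Node *> Slots;

public:
  // Queues a node unless it is already queued; no hash lookup is needed.
  void add(Node *N) {
    if (N->CombinerWorklistIndex >= 0)
      return;
    N->CombinerWorklistIndex = static_cast<int>(Slots.size());
    Slots.push_back(N);
  }

  // Removes a queued node by nulling its slot.
  void remove(Node *N) {
    const int Idx = N->CombinerWorklistIndex;
    if (Idx < 0)
      return;
    Slots[Idx] = nullptr;
    N->CombinerWorklistIndex = -1;
  }

  // Pops the next live node and marks it as combined.
  Node *pop() {
    while (!Slots.empty()) {
      Node *N = Slots.back();
      Slots.pop_back();
      if (N) {
        N->CombinerWorklistIndex = -2;
        return N;
      }
    }
    return nullptr;
  }
};

bool wasCombinedBefore(const Node &N) { return N.CombinerWorklistIndex == -2; }
```

Both the deduplication check and the "combined before" query become a single field read on the node instead of a set or map lookup.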
Simon Pilgrim
af3ffff34f
[DAG] Always allow folding XOR patterns to ABS pre-legalization (#94601)
Removes residual ARM handling for vXi64 ABS nodes to prevent infinite loops.
2024-06-07 11:02:50 +01:00
Simon Pilgrim
03a2fe9a75 [DAG] visitSUB - update the ABS matching code to use SDPatternMatch and hasOperation.
Avoids the need to explicitly test both commuted variants and doesn't match custom lowering after legalization.

Cleanup for #94504
2024-06-06 10:06:57 +01:00
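For readers unfamiliar with the pattern matched in visitSUB, the usual branch-free form is `sub (xor X, S), S` where `S` is `X` arithmetically shifted right by the bit width minus one. The sketch below assumes that shape; `absViaXorSub` is a hypothetical name, and it works on unsigned bit patterns to avoid signed-overflow issues in C++.

```cpp
#include <cstdint>

// Computes abs(x) as (x ^ s) - s, where s is all ones when x is negative
// and zero otherwise (the value an arithmetic shift right by 31 yields).
int32_t absViaXorSub(int32_t x) {
  const uint32_t sign = x < 0 ? 0xFFFFFFFFu : 0u;
  const uint32_t bits = static_cast<uint32_t>(x);
  return static_cast<int32_t>((bits ^ sign) - sign);
}
```

For a negative input the xor flips every bit and subtracting all ones adds one, which is two's-complement negation; for a non-negative input both steps are no-ops.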
aengelke
6150e84cfc
[CodeGen][SDAG] Remove Combiner WorklistMap (#92900)
DenseMap for pointer lookup is expensive, and this is only used for
deduplication and index lookup. Instead, store the worklist index in the
node itself.

This brings a substantial performance improvement.
2024-06-05 17:58:08 +02:00
Yingwei Zheng
47fd32f81c
[DAGCombine] Fix type mismatch in (shl X, cttz(Y)) -> (mul (Y & -Y), X) (#94008)
Proof: https://alive2.llvm.org/ce/z/J7GBMU

Same as https://github.com/llvm/llvm-project/pull/92753, the types of
LHS and RHS in shift nodes may differ.
+ When VT is smaller than ShiftVT, it is safe to use trunc.
+ When VT is larger than ShiftVT, it is safe to use zext iff
`is_zero_poison` is true (i.e., `opcode == ISD::CTTZ_ZERO_UNDEF`). See
also the counterexample `src_shl_cttz2 -> tgt_shl_cttz2` in the alive2
proofs.

Fixes issue
https://github.com/llvm/llvm-project/pull/85066#issuecomment-2142553617.
2024-06-01 19:04:55 +08:00
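The zext restriction described in #94008 can be seen with a small model. The sketch assumes `X` is 16 bits wide and `Y` (and therefore the shift amount) is 8 bits wide, with `cttz(0)` defined as the bit width as for a plain `ISD::CTTZ`. `cttz8`, `shlByCttz16` and `mulByZextLowBit16` are hypothetical names.

```cpp
#include <cstdint>

// Count trailing zeros of an 8-bit value; returns 8 for a zero input.
unsigned cttz8(uint8_t y) {
  unsigned n = 0;
  while (n < 8 && ((y >> n) & 1u) == 0)
    ++n;
  return n;
}

// Original pattern: 16-bit X shifted left by cttz of an 8-bit Y.
uint16_t shlByCttz16(uint16_t x, uint8_t y) {
  return static_cast<uint16_t>(x << cttz8(y));
}

// Rewritten pattern: isolate the lowest set bit of Y in 8 bits,
// zero-extend it to 16 bits, then multiply.
uint16_t mulByZextLowBit16(uint16_t x, uint8_t y) {
  const uint8_t lowBit = static_cast<uint8_t>(y & (0u - y));
  return static_cast<uint16_t>(x * static_cast<uint16_t>(lowBit));
}
```

For `Y == 0` the shift form yields `X << 8`, while the multiply form yields `0`, so the two only agree when a zero input is poison.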
Jay Foad
b1be480b03 [DAGCombiner] Move CanReassociate down to first use. NFC. 2024-05-31 09:44:47 +01:00
Jianjian Guan
db6de1a20f
[DAGCombiner][VP] Add DAGCombine for VP_MUL (#80105)
Use visitMUL to combine VP_MUL, sharing most of the MUL logic with VP_MUL.

Migrated from https://reviews.llvm.org/D121187
2024-05-31 10:17:11 +08:00
Yingwei Zheng
9e8ecce88e
[DAGCombine] Transform shl X, cttz(Y) to mul (Y & -Y), X if cttz is unsupported (#85066)
This patch folds `shl X, cttz(Y)` to `mul (Y & -Y), X` if cttz is
unsupported by the target.
Alive2: https://alive2.llvm.org/ce/z/AtLN5Y
Fixes https://github.com/llvm/llvm-project/issues/84763.
2024-05-29 18:26:54 +08:00
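The identity behind #85066 is that `Y & -Y` isolates the lowest set bit of `Y`, which equals `1 << cttz(Y)`. A minimal 32-bit sketch, assuming `Y` is non-zero and both operands share one type; `cttz32`, `shlByCttz32` and `mulByLowestSetBit32` are hypothetical names.

```cpp
#include <cstdint>

// Count trailing zeros with a plain loop (stands in for a cttz instruction).
unsigned cttz32(uint32_t y) {
  unsigned n = 0;
  while (n < 32 && ((y >> n) & 1u) == 0)
    ++n;
  return n;
}

// Original pattern. Precondition: y != 0, otherwise the shift is too wide.
uint32_t shlByCttz32(uint32_t x, uint32_t y) { return x << cttz32(y); }

// Rewritten pattern: y & -y keeps only the lowest set bit, a power of two.
uint32_t mulByLowestSetBit32(uint32_t x, uint32_t y) {
  return x * (y & (0u - y));
}
```

Multiplication by a power of two wraps the same way a left shift does, so the results match modulo 2^32.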
Matt Arsenault
16a5fd3fdb
DAG: Use flags in isLegalToCombineMinNumMaxNum (#93555) 2024-05-28 18:57:38 +02:00
Shengchen Kan
eeb2f72a49
[SelectionDAG][X86] Fix the assertion failure in Release build after #91747 (#93434)
In #91747, we changed the SDNode from `X86ISD::SUB` (FROM) to
`X86ISD::CCMP` (TO) in the DAGCombine. The value type of `X86ISD::SUB`
can be `i8, i32` while the value type of `X86ISD::CCMP` is i32. This
breaks the assumption that the value type should match after the
combine and triggers the error

```
SelectionDAG.cpp:10942: void
llvm::SelectionDAG::transferDbgValues(llvm::SDValue, llvm::SDValue,
unsigned int, unsigned int, bool): Assertion `FromNode && ToNode &&
"Can't modify dbg values"' failed.
```

when running tests

llvm/test/CodeGen/X86/apx/ccmp.ll
llvm/test/CodeGen/X86/apx/ctest.ll

in a Release build when LLVM_ENABLE_ASSERTIONS is on.

In this patch, we fix it by creating a merged value.
2024-05-27 11:33:23 +08:00
Shengchen Kan
331eb8a004
[X86][CodeGen] Support lowering for CCMP/CTEST (#91747)
DAG combine for `CCMP` and `CTESTrr`:

```
and/or(setcc(cc0, flag0), setcc(cc1, sub (X, Y)))
->
setcc(cc1, ccmp(X, Y, ~cflags/cflags, cc0/~cc0, flag0))

and/or(setcc(cc0, flag0), setcc(cc1, cmp (X, 0)))
 ->
setcc(cc1, ctest(X, X, ~cflags/cflags, cc0/~cc0, flag0))
```
where `cflags` is determined by `cc1`.

Generic DAG combine:
```
cmp(setcc(cc, X), 0)
brcond ne
->
X
brcond cc

sub(setcc(cc, X), 1)
brcond ne
->
X
brcond ~cc
```

Post DAG transform:  `ANDrr/rm + CTESTrr -> CTESTrr/CTESTmr`


Pattern match for `CTESTri`:
```
X = and A, B
ctest(X, X, cflags, cc0/, flag0)
->
ctest(A, B, cflags, cc0/, flag0)
```

`CTESTmi` is already handled by the memory folding mechanism in MIR.
2024-05-26 18:32:23 +08:00
Simon Pilgrim
729fdb6bb6 [DAG] visitFunnelShift - pull out repeated SDLoc. 2024-05-24 14:50:42 +01:00
Simon Pilgrim
7273ad1238 [DAG] visitABD - rewrite "(abs x, 0)" folds to use SDPatternMatch
No need for this to be vector specific, and it's more likely that scalar cases will appear after #92576
2024-05-19 11:49:51 +01:00
Simon Pilgrim
9f5c8de386 [DAG] visitAVG - rewrite "fold (avgfloor x, 0) -> x >> 1" to use SDPatternMatch
No need for this to be vector specific, and it's more likely that scalar cases will appear after #92096
2024-05-19 11:30:20 +01:00
David Green
4c98f5b439
[DAG] Use copysign in frem power-2 fold. (#91751)
As a small addition to #91148, this uses copysign to produce the correct
sign for zero when converting frem to div/trunc/mul when we do not know
that the input is positive (and we care about sign bits). The copysign
lets us get the sign of zero correct.

In testing, the only case this produced different results than fmod was:
frem -inf, 4.0 -> nan vs -nan
2024-05-18 22:50:19 +01:00
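The role of copysign in #91751 shows up for inputs that are negative multiples of the divisor. The sketch below is a scalar model in double precision, assuming `d` is a power of two; `fremPow2Signed` is a hypothetical name, not a function from the patch.

```cpp
#include <cmath>

// frem for a power-of-two divisor: x - trunc(x / d) * d, with the sign of
// x copied onto the result so that a zero result gets the correct sign.
double fremPow2Signed(double x, double d) {
  const double r = x - std::trunc(x / d) * d;
  return std::copysign(r, x);
}
```

For `x = -4.0, d = 2.0` the subtraction alone gives `+0.0`, whereas `fmod` returns `-0.0`; copying the sign of `x` restores the match. Non-zero results already carry the sign of `x`, so copysign leaves them unchanged.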
Patrick O'Neill
4ab2ac22d0
[DAGCombiner] Mark vectors as not AllAddOne/AllSubOne on type mismatch (#92195)
Fixes #92193.
2024-05-15 12:39:28 -07:00
Simon Pilgrim
7f3e3785d0 Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFC. 2024-05-10 22:40:23 +01:00
Simon Pilgrim
7e6879b245 [X86] scalarizeExtractedBinop - reuse existing SDLoc. NFC. 2024-05-10 22:40:23 +01:00
David Green
8fc9e3d577
[DAG] Lower frem of power-2 using div/trunc/mul+sub (#91148)
If we are lowering a frem and the divisor is known to be an integer power-2, we
can use the formula 'frem = x - trunc(x / d) * d'. This avoids the more
expensive call to fmod. The results are identical to fmod so long as d is a
power-2 (so the mul does not round incorrectly), and the sign of the return is
either always positive or not important for zeroes (nsz).

Unfortunately Alive2 does not handle this well at the moment. I was using
exhaustive checking to test this:
(https://gist.github.com/davemgreen/6078015f30d3bacd1e9572f8db5d4b64).

I found this in CPython's implementation of float_pow. For now I have added it as a
DAG combine for frem with power-2 fp constants.
2024-05-10 14:58:48 +01:00
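The formula quoted in #91148 can be compared against `fmod` directly. This is a scalar sketch in double precision that assumes `d` is a power of two; `fremPow2` is a hypothetical name.

```cpp
#include <cmath>

// frem for a power-of-two divisor without calling fmod.
// Precondition (assumed): d is a power of two, so trunc(x / d) * d is exact.
double fremPow2(double x, double d) {
  return x - std::trunc(x / d) * d;
}
```

For instance `fremPow2(-7.25, 4.0)` and `std::fmod(-7.25, 4.0)` both give `-3.25`. The one difference is the sign of a zero result for negative `x`, which is why the commit requires a known-positive input or nsz.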
David Green
23b673e5b4
[DAG][AArch64] Handle vscale addressing modes in reassociationCanBreakAddressingModePattern (#89908)
reassociationCanBreakAddressingModePattern tries to prevent bad add
reassociations that would break addressing mode patterns. This adds
support for vscale offset addressing modes, making sure we don't break
patterns that already exist. It does not optimize _to_ the correct
addressing modes yet, but prevents us from optimizing _away_ from
them.
2024-05-10 09:27:02 +01:00
David Green
fcf945f4ed
[DAG] Fold add(mul(add(A, CA), CM), CB) -> add(mul(A, CM), CM*CA+CB) (#90860)
This is useful when the inner add has multiple uses, and so cannot be
canonicalized by pushing the constants down through the mul. This patch
adds patterns for both `add(mul(add(A, CA), CM), CB)` and with an extra add
`add(add(mul(add(A, CA), CM), B), CB)` as the second can come up when
lowering geps.
2024-05-08 22:11:18 +01:00
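The rewrite in #90860 is plain distributivity, which also holds under wrapping arithmetic. A small sketch with 32-bit unsigned values; `addMulAddBefore` and `addMulAddAfter` are hypothetical names.

```cpp
#include <cstdint>

// add(mul(add(A, CA), CM), CB)
uint32_t addMulAddBefore(uint32_t a, uint32_t ca, uint32_t cm, uint32_t cb) {
  return (a + ca) * cm + cb;
}

// add(mul(A, CM), CM*CA+CB); the second operand is a compile-time constant
// when CA, CM and CB are constants.
uint32_t addMulAddAfter(uint32_t a, uint32_t ca, uint32_t cm, uint32_t cb) {
  return a * cm + (cm * ca + cb);
}
```

The inner `add(A, CA)` is no longer needed by this expression, so its other users can keep it while this chain uses `A` directly.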
Craig Topper
ef84452571
[DAGCombiner] Be more careful about looking through extends and truncates in mergeTruncStores. (#91375)
Previously we recursively looked through extends and truncates on both
SourceValue and WideVal.

SourceValue is the largest source found for each of the stores we are
combining. WideVal is the source for the current store.

Previously we could incorrectly look through a (zext (trunc X)) pair and
incorrectly believe X to be a good source.

I think we could also look through a zext on one store and a sext on
another store and arbitrarily pick one of the extends as the final
source.

With this patch we only look through one level of extend or truncate.
And we don't look through extends/truncs on both SourceValue and WideVal
at the same time.

This may lose some optimization cases, but keeps everything we had tests
for.

Fixes #90936.
2024-05-07 21:17:50 -07:00
Simon Pilgrim
522b4bfe5b
[DAG] Fold bitreverse(shl/srl(bitreverse(x),y)) -> srl/shl(x,y) (#89897)
Noticed while investigating GFNI per-element vector shifts (we can form SHL but not SRL/SRA)

Alive2: https://alive2.llvm.org/ce/z/fSH-rf
2024-05-06 11:13:05 +01:00
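The fold in #89897 can be checked exhaustively on 8-bit values. The sketch covers the `shl` inside `bitreverse` case, which becomes a plain `srl`; `bitReverse8`, `shlBetweenBitreverses`, `plainSrl` and `foldHoldsForAllI8Shifts` are hypothetical names.

```cpp
#include <cstdint>

// Reverses the bit order of an 8-bit value.
uint8_t bitReverse8(uint8_t v) {
  uint8_t r = 0;
  for (int i = 0; i < 8; ++i)
    r = static_cast<uint8_t>((r << 1) | ((v >> i) & 1));
  return r;
}

// bitreverse(shl(bitreverse(x), y))
uint8_t shlBetweenBitreverses(uint8_t x, unsigned y) {
  return bitReverse8(static_cast<uint8_t>(bitReverse8(x) << y));
}

// srl(x, y)
uint8_t plainSrl(uint8_t x, unsigned y) {
  return static_cast<uint8_t>(x >> y);
}

// Compares both forms for every 8-bit value and every in-range shift.
bool foldHoldsForAllI8Shifts() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 8; ++y)
      if (shlBetweenBitreverses(static_cast<uint8_t>(x), y) !=
          plainSrl(static_cast<uint8_t>(x), y))
        return false;
  return true;
}
```

Shifting left in the reversed domain moves bits toward what was originally the low end, which is exactly a logical right shift.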
Simon Pilgrim
caacf8685a
[DAG] Fold freeze(shuffle(x,y,m)) -> shuffle(freeze(x),freeze(y),m) (#90952)
If the shuffle mask contains no undef elements, then we can move the freeze through a shuffle node.

This requires special case handling to create a new ShuffleVectorSDNode.

Includes VECTOR_SHUFFLE support for isGuaranteedNotToBeUndefOrPoison / canCreateUndefOrPoison.
2024-05-04 12:03:10 +01:00
Craig Topper
3563af6c06
[DAGCombiner] In mergeTruncStore, make sure we aren't storing shifted in bits. (#90939)
When looking through a right shift, we need to make sure that all of
the bits we are using from the shift come from the shift input and
not the sign or zero bits that are shifted in.
    
Fixes #90936.
2024-05-03 09:59:33 -07:00
Simon Pilgrim
91c52b966a [DAG] Pull out repeated SDLoc() from SHL/SRL/SRA combines. NFC.
We were always calling SDLoc(N) at the top of each visitSHL/SRL/SRA for the FoldConstantArithmetic call, so just reuse this as much as possible.
2024-04-30 17:30:43 +01:00
Luke Lau
5e03c0af47
[DAGCombiner] Fix mayAlias not accounting for scalable MMOs with offsets (#90573)
In #70452 DAGCombiner::mayAlias was taught to handle scalable sizes, but
when it checks via AA->isNoAlias it didn't take into account the case
where the size is scalable but there was an offset too.

For the fixed length case the offset was just accounted for by adding to
the LocationSize, but for the scalable case there doesn't seem to be a
way to represent both a scalable and fixed part in it. So this patch
works around it by bailing if there is an offset.

Fixes #90559
2024-04-30 20:20:40 +08:00
Bjorn Pettersson
55c6bda01e Revert "Revert "[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison (#84921)" and more..."
This reverts commit 16bd10a38730fed27a3bf111076b8ef7a7e7b3ee.

Re-applies:
    b3c55b707110084a9f50a16aade34c3be6fa18da - "[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison (#84921)"
    8e2f6495c0bac1dd6ee32b6a0d24152c9c343624 - "[DAGCombiner] Do not always fold FREEZE over BUILD_VECTOR (#85932)"
    73472c5996716cda0dbb3ddb788304e0e7e6a323 - "[SelectionDAG] Treat CopyFromReg as freezing the value (#85932)"

with a fix in DAGCombiner::visitFREEZE.
2024-04-29 13:08:52 +02:00
David Spickett
16bd10a387 Revert "[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison (#84921)" and more...
This reverts:
b3c55b707110084a9f50a16aade34c3be6fa18da - "[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison (#84921)"
(because it updates a test case that I don't know how to resolve the conflict for)
8e2f6495c0bac1dd6ee32b6a0d24152c9c343624 - "[DAGCombiner] Do not always fold FREEZE over BUILD_VECTOR (#85932)"
73472c5996716cda0dbb3ddb788304e0e7e6a323 - "[SelectionDAG] Treat CopyFromReg as freezing the value (#85932)"

Due to a test suite failure on AArch64 when compiling for SVE.
https://lab.llvm.org/buildbot/#/builders/197/builds/13955

clang: ../llvm/llvm/include/llvm/CodeGen/ValueTypes.h:307: MVT llvm::EVT::getSimpleVT() const: Assertion `isSimple() && "Expected a SimpleValueType!"' failed.
2024-04-29 09:47:41 +01:00
Björn Pettersson
b3c55b7071
[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison (#84921)
Handle SELECT_CC similarly as SETCC.

Handle these operations that only propagate poison/undef based on the
input operands:
  SADDSAT, UADDSAT, SSUBSAT, USUBSAT, MULHU, MULHS,
  SMIN, SMAX, UMIN, UMAX

These operations may create poison based on shift amount and exact
flag being violated:
  SRL, SRA

One goal here is to allow pushing freeze through these operations
when allowed, as well as keeping analyses such as
isGuaranteedNotToBeUndefOrPoison from breaking on such operations.

Since some problems have been observed with pushing freeze through
SRA/SRL we block that explicitly in DAGCombiner::visitFreeze now.
That way we can still model SRA/SRL properly in
SelectionDAG::canCreateUndefOrPoison, e.g. when used by
isGuaranteedNotToBeUndefOrPoison, even if we do not want to push
freeze through those instructions.
2024-04-29 07:56:49 +02:00
Matt Arsenault
405c018c71
DAG: Simplify demanded bits for truncating atomic_store (#90113)
It's really unfortunate that STORE and ATOMIC_STORE are separate
opcodes. This duplicates a basic simplify demanded for the truncating
case. This avoids some AMDGPU lit regressions in a future patch.

I'm not sure how to craft a test that exposes this without first
introducing the regressions by promoting half to i16.
2024-04-26 15:21:44 +02:00
Simon Pilgrim
55d85c84ac
[DAG] visitORCommutative - fold build_pair(not(x),not(y)) -> not(build_pair(x,y)) style patterns (#90050)
(Sorry, not an actual build_pair node just a similar pattern).

For cases where we're concatenating 2 integers into a double width integer, see if both integer sources are NOT patterns.

We could take this further and handle all logic ops with constant operands, but I just wanted to handle the case reported on #89533 initially.

Fixes #89533
2024-04-26 14:11:03 +01:00
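The concatenation pattern from #90050 can be modelled with two 32-bit halves forming a 64-bit value. `buildPair`, `pairOfNots` and `notOfPair` are hypothetical names; as the commit notes, there is no actual build_pair node involved, only the zext/shl/or shape.

```cpp
#include <cstdint>

// Concatenates two 32-bit halves: or(zext(lo), shl(zext(hi), 32)).
uint64_t buildPair(uint32_t lo, uint32_t hi) {
  return static_cast<uint64_t>(lo) | (static_cast<uint64_t>(hi) << 32);
}

// Pattern before the fold: each half is inverted separately.
uint64_t pairOfNots(uint32_t lo, uint32_t hi) { return buildPair(~lo, ~hi); }

// Pattern after the fold: one inversion of the double-width value.
uint64_t notOfPair(uint32_t lo, uint32_t hi) { return ~buildPair(lo, hi); }
```

The halves occupy disjoint bit ranges, so inverting them separately or together gives the same 64-bit result with one fewer NOT.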
Bjorn Pettersson
8e2f6495c0 [DAGCombiner] Do not always fold FREEZE over BUILD_VECTOR (#85932)
Avoid turning a BUILD_VECTOR that can be recognized as "all zeros",
"all ones" or "constant" into something that depends on
freeze(undef), as that would destroy those properties.

Instead we replace undef by 0/-1 in such vectors, making it possible
to fold away the freeze. We typically use -1 if the BUILD_VECTOR
would identify as "all ones", and otherwise we use the value 0.
2024-04-26 13:41:21 +02:00
Simon Pilgrim
d51a17f684 [DAG] visitORCommutative - pull out repeated SDLoc(). NFC. 2024-04-25 14:23:36 +01:00
Björn Pettersson
f9b419b7a0
[DAGCombiner] Fix miscompile bug in combineShiftOfShiftedLogic (#89616)
Ensure that the sum of the shift amounts does not overflow the
shift amount type when combining shifts in combineShiftOfShiftedLogic.

Solves a miscompile bug found when testing the C23 BitInt feature.

Targets like X86 that only use an i8 for shift amounts after
legalization seem to be extra susceptible to bugs like this, as it
isn't legal to shift more than 255 steps.
2024-04-23 14:11:34 +02:00
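The overflow that #89616 guards against is easy to reproduce with an 8-bit shift-amount type. The sketch is only a model of the hazard: `wrappedSum8` and `sumFitsInShiftType` are hypothetical helpers, and the real check in combineShiftOfShiftedLogic may be expressed differently.

```cpp
#include <cstdint>

// What happens if two shift amounts are added in an i8 shift-amount type.
uint8_t wrappedSum8(uint8_t c1, uint8_t c2) {
  return static_cast<uint8_t>(c1 + c2);
}

// Hypothetical guard: true if c1 + c2 is representable in a shift-amount
// type of shiftTypeBits bits (assumed to be less than 64).
bool sumFitsInShiftType(unsigned c1, unsigned c2, unsigned shiftTypeBits) {
  const uint64_t maxAmount = (uint64_t{1} << shiftTypeBits) - 1;
  return static_cast<uint64_t>(c1) + c2 <= maxAmount;
}
```

With a wide `_BitInt` operand, shifts by 200 and then 100 are each valid, but their sum 300 wraps to 44 in i8, so merging them would shift by the wrong amount.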
Simon Pilgrim
ca9a44ef47 [DAG] visitORCommutative - use sd_match to reduce the need for commutative operand matching. NFCI.
Use sd_match to match commutative inner AND/OR/XOR node arguments instead of some messy manual matching of each commutation.
2024-04-22 10:41:57 +01:00
Simon Pilgrim
c88b84d467 [DAG] visitOR/visitORLike - merge repeated SDLoc calls. 2024-04-22 10:28:02 +01:00
Craig Topper
ce48f43f05
[SelectionDAG] Require UADDO_CARRY carryin and carryout to have the same type. (#89255)
This requires type legalization to keep them the same. This means we no
longer need to legalize the operand since it will be legalized when we
legalize the second result.
2024-04-19 12:38:53 -07:00
Simon Pilgrim
2e68ba99de [DAG] visitADDLike - update "(x - y) + -1 -> add (xor y, -1), x" fold to accept UNDEF in a splat vector of -1
Make sure we use getNOT instead of reusing the allones (with undefs) vector
2024-04-19 13:47:29 +01:00
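The visitADDLike fold relies on the two's-complement identity `~y == -y - 1`. A minimal 32-bit sketch; `subThenAddMinusOne` and `addNotY` are hypothetical names.

```cpp
#include <cstdint>

// (x - y) + -1
uint32_t subThenAddMinusOne(uint32_t x, uint32_t y) {
  return (x - y) + 0xFFFFFFFFu;
}

// add (xor y, -1), x
uint32_t addNotY(uint32_t x, uint32_t y) {
  return (y ^ 0xFFFFFFFFu) + x;
}
```

The rewritten form needs a fresh all-ones constant for the xor, which is consistent with the commit's note about using getNOT rather than reusing a splat that may contain undef lanes.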
Craig Topper
ba1158813d
[DAGCombiner][AArch64] Make combineCarryDiamond avoid creating UADDO_CARRY with carry in larger than setcc result type. (#89121)
In the attached test case we were creating a UADDO_CARRY with i1 carry out
and i41 carry in. i41 is larger than the setcc result type for
AArch64 which is i32. i41 needs to be promoted to i64 since it is larger
than i32. The type legalizer tried to use promoteTargetBoolean, but that
can only promote from a type smaller than setcc result type.

The easiest fix here is to force the carryin type to match the carryout
type at the time of creation. This should ensure the node won't exceed
setcc result type as long as the output type doesn't.

I think we should explore requiring the types to match for this node.

Fixes #88966
2024-04-18 08:34:51 -07:00
Simon Pilgrim
73b255c9f8 [DAG] Ensure extract_subvector(insert_subvector(x,y,c1),c2) --> extract_subvector(y,c2-c1) is working on fixed vector types
#87925 failed to ensure we weren't removing the extracted subvector from a scalable vector type

Thanks to @antmox for the heads-up.
2024-04-18 13:21:52 +01:00
Simon Pilgrim
c18a3b6bd3 [DAG] Fold extract_subvector(insert_subvector(x,y,c1),c2) --> extract_subvector(y,c2-c1) (#87925) (REAPPLIED)
If the extract_subvector is cheap, attempt to extract directly from an inserted subvector

Reapplied with a check to ensure we only attempt this for fixed vectors
2024-04-16 12:30:27 +01:00
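The extract-of-insert fold can be modelled on fixed-length vectors. The sketch assumes the extracted range lies entirely inside the inserted subvector; `Vec`, `insertSubvector`, `extractSubvector` and `extractComesFromInserted` are hypothetical names, and the scalable-vector restriction from the commit has no counterpart here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// insert_subvector(x, y, c1): overwrite elements of x starting at index c1.
Vec insertSubvector(Vec x, const Vec &y, std::size_t c1) {
  std::copy(y.begin(), y.end(), x.begin() + static_cast<std::ptrdiff_t>(c1));
  return x;
}

// extract_subvector(v, start) producing len elements.
Vec extractSubvector(const Vec &v, std::size_t start, std::size_t len) {
  const auto first = v.begin() + static_cast<std::ptrdiff_t>(start);
  return Vec(first, first + static_cast<std::ptrdiff_t>(len));
}

// The fold is only valid when [c2, c2 + len) sits inside [c1, c1 + ySize).
bool extractComesFromInserted(std::size_t c1, std::size_t ySize,
                              std::size_t c2, std::size_t len) {
  return c2 >= c1 && c2 + len <= c1 + ySize;
}
```

When that containment holds, extracting at `c2` from the combined vector equals extracting at `c2 - c1` from `y`, and `x` is never read.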
Alina Sbirlea
40bbdb609f Revert "[DAG] Fold extract_subvector(insert_subvector(x,y,c1),c2) --> extract_subvector(y,c2-c1) (#87925)"
This reverts commit 8c0f52e9d5a99bf96bb64ac23b5893482c292527.

Reverting to green, reproducer attached in the PR/revision comments.
2024-04-15 17:38:52 -07:00
Yingwei Zheng
4d28d3f93b
[SDAG] Turn umin into smin if the saturation pattern is broken (#88505)
Because we canonicalize smin with non-negative operands into umin in the
middle-end, the saturation pattern is broken.
This patch reverts the transform in DAGCombine to fix the regression on
ARM.

Fixes https://github.com/llvm/llvm-project/issues/85706.
2024-04-16 01:28:28 +08:00
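Turning umin back into smin in #88505 is sound only because the operands are known non-negative. The sketch below illustrates that precondition on 32-bit values; `uminAsSigned` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstdint>

// Computes an unsigned minimum on the bit patterns of two signed values
// and reinterprets the result as signed.
int32_t uminAsSigned(int32_t a, int32_t b) {
  return static_cast<int32_t>(
      std::min(static_cast<uint32_t>(a), static_cast<uint32_t>(b)));
}
```

When both inputs are non-negative the result equals `std::min(a, b)`, so umin and smin are interchangeable. With a negative input such as `-1` the unsigned view treats it as the largest value, and the two minimums differ.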
fengfeng
36230f90ee
[SelectionDAG] Propagate Disjoint flag. (#88370)
Signed-off-by: feng.feng <feng.feng@iluvatar.com>
2024-04-15 11:01:15 +02:00