2980 Commits

Author SHA1 Message Date
Luís Marques
acac29ca42 [DAGCombiner] Don't fold FCOPYSIGN vector sign operand casts
Avoid doing the following combine for vector types:

```
copysign(x, fp_extend(y)) -> copysign(x, y)
copysign(x, fp_round(y)) -> copysign(x, y)
```

That combine seemed to impede the selection of vector instruction and cause
a mess in some circumstances.

Differential Revision: https://reviews.llvm.org/D96037
2021-02-10 14:25:24 +00:00
Kazu Hirata
7e75f6fc1d [SelectionDAG] Use range-based for loops (NFC) 2021-02-09 22:14:30 -08:00
Nemanja Ivanovic
a5222aa085 [DAGCombine] Do not remove masking argument to FP16_TO_FP for some targets
As of commit 284f2bffc9bc5, the DAG Combiner gets rid of the masking of the
input to this node if the mask only keeps the bottom 16 bits. This is because
the underlying library function does not use the high order bits. However, on
PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits
from the register. Therefore, the library implementation of __gnu_h2f_ieee will
return an incorrect result if the bits aren't cleared.

This combine is desired for ARM (and possibly other targets) so this patch adds
a query to Target Lowering to check if this zeroing needs to be kept.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092

Differential revision: https://reviews.llvm.org/D96283
2021-02-09 06:33:48 -06:00
Simon Pilgrim
c5c690a835 [DAG] visitVECTOR_SHUFFLE - move shuffle legality check into MergeInnerShuffle lamda. NFCI.
This is going to be necessary for a future reuse of MergeInnerShuffle
2021-02-08 14:25:16 +00:00
Huihui Zhang
1b81117f88 [DAGCombiner][SVE] Fix invalid use of getVectorNumElements() in visitSRA.
Make sure scalable property is preserved by using getVectorElementCount().

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D95967
2021-02-05 09:56:49 -08:00
Craig Topper
34da12dd1f [DAGCombiner] Remove (sra (shl X, C), C) if X has more than C sign bits.
If sext_inreg is supported, we will turn this into sext_inreg. That
will then remove it if there are enough sign bits. But if sext_inreg
isn't supported, we can still remove the shift pair based on sign
bits.

Split from D95890.
2021-02-03 10:18:40 -08:00
Fraser Cormack
fde2466171 [SelectionDAG] Support scalable-vector splats in more cases
This patch adds support for scalable-vector splats in DAGCombiner's
`isConstantOrConstantVector` and `ISD::matchUnaryPredicate` functions,
which enable the SelectionDAG div/rem-by-constant optimizations for
scalable vector types.

It also fixes up one case where the UDIV optimization was generating a
SETCC without first consulting the target for its preferred SETCC result
type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94501
2021-01-25 10:58:15 +00:00
QingShan Zhang
ffc3e800c6 [NFC] [DAGCombine] Correct the result for sqrt even the iteration is zero
For now, we correct the result for sqrt if iteration > 0. This doesn't make
sense as they are not strict relative.

Reviewed By: dmgreen, spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D94480
2021-01-25 04:02:44 +00:00
Kazu Hirata
16baad8f4e [llvm] Use pop_back_val (NFC) 2021-01-24 12:18:57 -08:00
Simon Pilgrim
5dbe5d2c91 [DAG] Commute shuffle(splat(A,u), shuffle(C,D)) -> shuffle'(shuffle(C,D), splat(A,u))
We only merge shuffles if the inner (LHS) shuffle is a non-splat, so commute these shuffles to improve merging of multiple shuffles.
2021-01-22 11:43:18 +00:00
Simon Pilgrim
69bc0990a9 [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE (REAPPLIED).
Add DemandedElts support inside the TRUNCATE analysis.

REAPPLIED - this was reverted by @hans at rGa51226057fc3 due to an issue with vector shift amount types, which was fixed in rG935bacd3a724 and an additional test case added at rG0ca81b90d19d

Differential Revision: https://reviews.llvm.org/D56387
2021-01-21 13:01:34 +00:00
Simon Pilgrim
bc9ab9a5cd [DAG] CombineToPreIndexedLoadStore - use const APInt& for getAPIntValue(). NFCI.
Cleanup some code to use auto* properly from cast, and use const APInt& for getAPIntValue() to avoid an unnecessary copy.
2021-01-21 11:04:09 +00:00
Hans Wennborg
a51226057f Revert "[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE"
It caused "Vector shift amounts must be in the same as their first arg"
asserts in Chromium builds. See the code review for repro instructions.

> Add DemandedElts support inside the TRUNCATE analysis.
>
> Differential Revision: https://reviews.llvm.org/D56387

This reverts commit cad4275d697c601761e0819863f487def73c67f8.
2021-01-20 20:06:55 +01:00
Simon Pilgrim
cad4275d69 [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE
Add DemandedElts support inside the TRUNCATE analysis.

Differential Revision: https://reviews.llvm.org/D56387
2021-01-20 15:39:58 +00:00
Kazu Hirata
b023cdeacc [llvm] Use llvm::all_of (NFC) 2021-01-19 20:19:17 -08:00
Kazu Hirata
8857202489 [llvm] Use llvm::find (NFC) 2021-01-19 20:19:14 -08:00
Simon Pilgrim
207f32948b [DAG] SimplifyDemandedBits - use KnownBits comparisons to remove ISD::UMIN/UMAX ops
Use the KnownBits icmp comparisons to determine when a ISD::UMIN/UMAX op is unnecessary should either op be known to be ULT/ULE or UGT/UGE than the other.

Differential Revision: https://reviews.llvm.org/D94532
2021-01-18 10:29:23 +00:00
Simon Pilgrim
46aa3c6c33 [DAG] visitVECTOR_SHUFFLE - MergeInnerShuffle - improve shuffle(shuffle(x,y),shuffle(x,y)) merging
MergeInnerShuffle currently attempts to merge shuffle(shuffle(x,y),z) patterns into a single shuffle, using 1 or 2 of the x,y,z ops.

However if we already match 2 ops we might be able to handle the third op if its also a shuffle that references one of the previous ops, allowing us to handle some cases like:

shuffle(shuffle(x,y),shuffle(x,y))
shuffle(shuffle(shuffle(x,z),y),z)
shuffle(shuffle(x,shuffle(x,y)),z)
etc.

This isn't an exhaustive match and is dependent on the order the candidate ops are encountered - if one of the matched ops was a shuffle that was peek-able we don't go back and try to split that, I haven't found much need for that amount of analysis yet.

This is a preliminary patch that will allow us to later improve x86 HADD/HSUB matching - but needs to be reviewed separately as its in generic code and affects existing Thumb2 tests.

Differential Revision: https://reviews.llvm.org/D94671
2021-01-15 15:08:31 +00:00
Simon Pilgrim
7c30c05ff7 [DAG] visitVECTOR_SHUFFLE - MergeInnerShuffle - reset shuffle ops and reorder early-out and second op matching. NFCI.
I'm hoping to reuse MergeInnerShuffle in some other folds - so ensure the candidate ops/mask are reset at the start of each run.

Also, move the second op matching before bailing to make it simpler to try to match other things afterward.
2021-01-14 11:55:20 +00:00
Simon Pilgrim
af8d27a7a8 [DAG] visitVECTOR_SHUFFLE - pull out shuffle merging code into lambda helper. NFCI.
Make it easier to reuse in a future patch.
2021-01-14 11:05:19 +00:00
Kazu Hirata
5c1c39e8d8 [llvm] Use *Set::contains (NFC) 2021-01-13 19:14:41 -08:00
Simon Pilgrim
993c488ed2 [DAG] visitVECTOR_SHUFFLE - use all_of to check for all-undef shuffle mask. NFCI. 2021-01-13 17:19:41 +00:00
Juneyoung Lee
25eb7b08ba [DAGCombiner] Fold BRCOND(FREEZE(COND)) to BRCOND(COND)
This patch resolves the suboptimal codegen described in http://llvm.org/pr47873 .
When CodeGenPrepare lowers select into a conditional branch, a freeze instruction is inserted.
It is then translated to `BRCOND(FREEZE(SETCC))` in SelDag.
The `FREEZE` in the middle of `SETCC` and `BRCOND` was causing a suboptimal code generation however.
This patch adds `BRCOND(FREEZE(cond))` -> `BRCOND(cond)` fold to DAGCombiner to remove the `FREEZE`.

To make this optimization sound, `BRCOND(UNDEF)` simply should nondeterministically jump to the branch or not, rather than raising UB.
It wasn't clear what happens when the condition was undef according to the comments in ISDOpcodes.h, however.
I updated the comments of `BRCOND` to make it explicit (as well as `BR_CC`, which is also a conditional branch instruction).

Note that it diverges from the semantics of `br` instruction in IR, which is explicitly UB.
Since the UB semantics was necessary to explain optimizations that use branching conditions, and SelDag doesn't seem to have such optimization, I think this divergence is okay.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D92015
2021-01-13 09:36:52 +09:00
Craig Topper
df74c001fa [DAGCombiner] Replace static helper function isConstantFPBuildVectorOrConstantFP with the identical version in SelectionDAG. NFC 2021-01-11 23:41:40 -08:00
Joe Ellis
007358239d [DAGCombiner] Use getVectorElementCount inside visitINSERT_SUBVECTOR
This avoids TypeSize-/ElementCount-related warnings.

Differential Revision: https://reviews.llvm.org/D92747
2021-01-11 14:15:11 +00:00
QingShan Zhang
7539c75bb4 [DAGCombine] Remove the check for unsafe-fp-math when we are checking the AFN
We are checking the unsafe-fp-math for sqrt but not for fpow, which behaves inconsistent.
As the direction is to remove this global option, we need to remove the unsafe-fp-math
check for sqrt and update the test with afn fast-math flags.

Reviewed By: Spatel

Differential Revision: https://reviews.llvm.org/D93891
2021-01-11 02:25:53 +00:00
Craig Topper
4ef91f5871 [DAGCombiner] Don't speculatively create an all ones constant in visitREM that might not be used.
This looks to have been done to save some duplicated code under
two different if statements, but it ends up being harmful to D94073.
This speculative constant can be called on a scalable vector type
with i64 element size when i64 scalars aren't legal. The code tries
and fails to find a vector type with i32 elements that it can use.

So only create the node when we know it will be used.
2021-01-05 12:45:57 -08:00
Cameron McInally
92be640bd7 [FPEnv][AMDGPU] Disable FSUB(-0,X)->FNEG(X) DAGCombine when subnormals are flushed
This patch disables the FSUB(-0,X)->FNEG(X) DAG combine when we're flushing subnormals. It requires updating the existing AMDGPU tests to use the fneg IR instruction, in place of the old fsub(-0,X) canonical form, since AMDGPU is the only backend currently checking the DenormalMode flags.

Note that this will require follow-up optimizations to make sure the FSUB(-0,X) form is handled appropriately

Differential Revision: https://reviews.llvm.org/D93243
2021-01-04 14:44:10 -06:00
Layton Kifer
d29f93bda5 [DAGCombiner] Don't create sexts of deleted xors when they were in-visit replaced
Fixes a bug introduced by D91589.

When folding `(sext (not i1 x)) -> (add (zext i1 x), -1)`, we try to replace the not first when possible. If we replace the not in-visit, then the now invalidated node will be returned, and subsequently we will return an invalid sext. In cases where the not is replaced in-visit we can simply return SDValue, as the not in the current sext should have already been replaced.

Thanks @jgorbe, for finding the below reproducer.

The following reduced test case crashes clang when built with `clang -O1 -frounding-math`:

```
template <class> class a {
  int b() { return c == 0.0 ? 0 : -1; }
  int c;
};
template class a<long>;
```

A debug build of clang produces this "assertion failed" error:
```
clang: /home/jgorbe/code/llvm/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:264: void {anonymous}::DAGCombiner::AddToWorklist(llvm::
SDNode*): Assertion `N->getOpcode() != ISD::DELETED_NODE && "Deleted Node added to Worklist"' failed.
```

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93274
2020-12-23 16:16:26 -08:00
Layton Kifer
385e9a2a04 [DAGCombiner] Improve shift by select of constant
Clean up a TODO, to support folding a shift of a constant by a
select of constants, on targets with different shift operand sizes.

Reviewed By: RKSimon, lebedev.ri

Differential Revision: https://reviews.llvm.org/D90349
2020-12-18 02:21:42 +00:00
Kerry McLaughlin
05edfc5475 [SVE][CodeGen] Add DAG combines for s/zext_masked_gather
This patch adds the following DAGCombines, which apply if isVectorLoadExtDesirable() returns true:
 - fold (and (masked_gather x)) -> (zext_masked_gather x)
 - fold (sext_inreg (masked_gather x)) -> (sext_masked_gather x)

LowerMGATHER has also been updated to fetch the LoadExtType associated with the
gather and also use this value to determine the correct masked gather opcode to use.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92230
2020-12-09 11:53:19 +00:00
Kerry McLaughlin
4519ff4b6f [SVE][CodeGen] Add the ExtensionType flag to MGATHER
Adds the ExtensionType flag, which reflects the LoadExtType of a MaskedGatherSDNode.
Also updated SelectionDAGDumper::print_details so that details of the gather
load (is signed, is scaled & extension type) are printed.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91084
2020-12-09 11:19:08 +00:00
Huihui Zhang
8e6fc1f97e [AArch64][SVE] Add lowering for llvm.maxnum|minnum for scalable type.
LLVM intrinsic llvm.maxnum|minnum is overloaded intrinsic, can be used on any
floating-point or vector of floating-point type.
This patch extends current infrastructure to support scalable vector type.

This patch also fix a warning message of incorrect use of EVT::getVectorNumElements()
for scalable type, when DAGCombiner trying to split scalable vector.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92607
2020-12-08 09:35:53 -08:00
Kai Luo
44bd8ea167 [DAGCombine][PowerPC] Simplify nabs by using legal smin operation
Convert `0 - abs(x)` to `smin (x, -x)` if `smin` is a legal operation.

Verification: https://alive2.llvm.org/ce/z/vpquFR

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92637
2020-12-08 03:24:07 +00:00
Simon Pilgrim
b6e847c396 [DAG] Cleanup by folding some single use VT.getScalarSizeInBits() calls into its comparison. NFCI. 2020-12-07 18:23:54 +00:00
Kerry McLaughlin
111f559bbd [SVE][CodeGen] Call refineIndexType & refineUniformBase from visitMGATHER
The refineIndexType & refineUniformBase functions added by D90942 can also be used to
improve CodeGen of masked gathers.

These changes were split out from D91092

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92319
2020-12-07 13:20:19 +00:00
Bing1 Yu
eee30a6dce [CodeGen] Modify the refineIndexType(...)'s code to fix a bug in D90942.
In previous code, when refineIndexType(...) is called and Index is undef, Index.getOperand(0) will raise a assertion fail.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D92548
2020-12-07 08:49:07 +08:00
Layton Kifer
ac522f8700 [DAGCombiner] Fold (sext (not i1 x)) -> (add (zext i1 x), -1)
Move fold of (sext (not i1 x)) -> (add (zext i1 x), -1) from X86 to DAGCombiner to improve codegen on other targets.

Differential Revision: https://reviews.llvm.org/D91589
2020-12-06 11:52:10 -05:00
Simon Pilgrim
6f4ee6f870 [DAGCombiner] Use const APInt& for getConstantOperandAPInt results. NFCI.
Avoid unnecessary instantiation.

Noticed while removing unnecessary autos
2020-12-04 09:44:58 +00:00
dfukalov
2ce38b3f03 [NFC] Reduce include files dependency.
1. Removed #include "...AliasAnalysis.h" in other headers and modules.
2. Cleaned up includes in AliasAnalysis.h.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92489
2020-12-03 18:25:05 +03:00
Joe Ellis
78c0ea54a2 [DAGCombine] Fix TypeSize warning in DAGCombine::visitLIFETIME_END
Bail out early if we encounter a scalable store.

Reviewed By: peterwaller-arm

Differential Revision: https://reviews.llvm.org/D92392
2020-12-03 12:12:41 +00:00
Layton Kifer
d7fec38f05
[DAGCombiner][NFC] Replace duplicate implementation flipBoolean with DAG.getLogicalNOT
Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D92246
2020-12-01 22:23:04 +03:00
Benjamin Kramer
107e92dff8 [DAG] Remove unused variable. NFC. 2020-12-01 16:29:02 +01:00
Simon Pilgrim
1b209ff9e3 [DAG] Move vselect(icmp_ult, 0, sub(x,y)) -> usubsat(x,y) to DAGCombine (PR40111)
Move the X86 VSELECT->USUBSAT fold to DAGCombiner - there's nothing target specific about these folds.
2020-12-01 14:25:29 +00:00
Simon Pilgrim
6dbd0d36a1 [DAG] Move vselect(icmp_ult, -1, add(x,y)) -> uaddsat(x,y) to DAGCombine (PR40111)
Move the X86 VSELECT->UADDSAT fold to DAGCombiner - there's nothing target specific about these folds.

The SSE42 test diffs are relatively benign - its avoiding an extra constant load in exchange for an extra xor operation - there are extra register moves, which is annoying as all those operations should commute them away.

Differential Revision: https://reviews.llvm.org/D91876
2020-12-01 11:56:26 +00:00
QingShan Zhang
4d83aba422 [DAGCombine] Adding a hook to improve the precision of fsqrt if the input is denormal
For now, we will hardcode the result as 0.0 if the input is denormal or 0. That will
have the impact the precision. As the fsqrt added belong to the cold path of the
cmp+branch, it won't impact the performance for normal inputs for PowerPC, but improve
the precision if the input is denormal.

Reviewed By: Spatel

Differential Revision: https://reviews.llvm.org/D80974
2020-11-27 02:10:55 +00:00
QingShan Zhang
9c588f53fc [DAGCombine] Add hook to allow target specific test for sqrt input
PowerPC has instruction ftsqrt/xstsqrtdp etc to do the input test for software square root.
LLVM now tests it with smallest normalized value using abs + setcc. We should add hook to
target that has test instructions.

Reviewed By: Spatel, Chen Zheng, Qiu Chao Fang

Differential Revision: https://reviews.llvm.org/D80706
2020-11-25 05:37:15 +00:00
Kai Luo
5931be60b5 [DAGCombine][PowerPC] Convert negated abs to trivial arithmetic ops
This patch converts `0 - abs(x)` to `Y = sra (X, size(X)-1); sub (Y, xor (X, Y))` for better codegen.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D91120
2020-11-24 09:43:35 +00:00
Kerry McLaughlin
306c8ab208 [SVE][CodeGen] Improve codegen of scalable masked scatters
If the scatter store is able to perform the sign/zero extend of
its index, this is folded into the instruction with refineIndexType().
Additionally, refineUniformBase() will return the base pointer and index
from an add + splat_vector.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90942
2020-11-13 11:19:36 +00:00
Francesco Petrogalli
fc2fe6817e [llvm][AArch64] Simplify (and (sign_extend..) #bitmask).
Fold

    VT = (and (sign_extend NarrowVT to VT) #bitmask)

into

    VT = (zero_extend NarrowVT)

With this combine, the test replaces a sign extended load + an
unsigned extention with a zero extended load to render one of the
operands of the last multiplication.

  BEFORE                       |  AFTER
    f_i16_i32:                 |    f_i16_i32:
         .fnstart              |           .fnstart
         ldrsh   r0, [r0]      |           ldrh    r1, [r1]
         ldrsh   r1, [r1]      |           ldrsh   r0, [r0]
         smulbb  r0, r1, r0    |           smulbb  r0, r0, r1
         uxth    r1, r1        |           mul     r0, r0, r1
         mul     r0, r0, r1    |           bx      lr
         bx      lr            |

Reviewed By: resistor

Differential Revision: https://reviews.llvm.org/D90605
2020-11-09 12:53:36 +00:00