1573 Commits

Author SHA1 Message Date
Sander de Smalen
81b7f115fb
[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979)
It seems TypeSize is currently broken in the sense that:

  TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)

without failing its assert that explicitly tests for this case:

  assert(LHS.Scalable == RHS.Scalable && ...);

The reason this fails is that `Scalable` is a static method of class
TypeSize, and LHS and RHS are both objects of class TypeSize. So this is
evaluating whether the pointer to the function Scalable == the pointer to
the function Scalable, which is always true because LHS and RHS have the
same class.

This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`, so that `Scalable` no longer clashes with the
member variable in FixedOrScalableQuantity.
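A minimal, self-contained sketch of the name-hiding problem described above. `Quantity`, `OldSize` and `NewSize` are hypothetical, heavily simplified stand-ins for FixedOrScalableQuantity and TypeSize (the real classes are templates with more members); only the naming pattern is reproduced. With the old names, `L.Scalable` inside the derived class refers to the static factory, so the comparison is between two identical function addresses.

```cpp
#include <cassert>

// Hypothetical stand-in for FixedOrScalableQuantity: a value plus a
// data member named `Scalable`.
struct Quantity {
  unsigned Value;
  bool Scalable;
};

// Old naming: the static factories are called `Fixed` and `Scalable`, so
// inside this class the name `Scalable` finds the static member function
// and hides the inherited bool.
struct OldSize : Quantity {
  static OldSize Fixed(unsigned V) { return OldSize{{V, false}}; }
  static OldSize Scalable(unsigned V) { return OldSize{{V, true}}; }

  // Meant to compare the flags, but both operands name the same static
  // function, so this compares a function address with itself.
  static bool sameKind(const OldSize &L, const OldSize &R) {
    return L.Scalable == R.Scalable;
  }
};

// New naming: the factories are verb phrases, so `Scalable` refers to the
// inherited data member again.
struct NewSize : Quantity {
  static NewSize getFixed(unsigned V) { return NewSize{{V, false}}; }
  static NewSize getScalable(unsigned V) { return NewSize{{V, true}}; }

  static bool sameKind(const NewSize &L, const NewSize &R) {
    return L.Scalable == R.Scalable;
  }
};
```

With the getter-style names the same expression compares the stored flags, so a fixed and a scalable size are correctly reported as different kinds, while the old version reports them as the same kind.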

The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
2023-11-22 08:52:53 +00:00
Simon Pilgrim
98efa8f9aa [DAG] Fix ShrinkDemandedOp doxygen description to match behaviour. NFC.
ShrinkDemandedOp checks for both isTruncateFree AND isZExtFree but extends with ANY_EXTEND.
2023-11-18 22:44:08 +00:00
Tavian Barnes
75cf672b12
[SDAG] Simplify is-power-of-2 codegen (#72275)
When x is not known to be nonzero, ctpop(x) == 1 is expanded to

    x != 0 && (x & (x - 1)) == 0

resulting in codegen like

    leal    -1(%rdi), %eax
    testl   %eax, %edi
    sete    %cl
    testl   %edi, %edi
    setne   %al
    andb    %cl, %al

But another expression that works is

    (x ^ (x - 1)) > x - 1

which has nicer codegen:

    leal    -1(%rdi), %eax
    xorl    %eax, %edi
    cmpl    %eax, %edi
    seta    %al
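As a sanity check of the equivalence above, here is a small sketch that compares both expansions against a popcount-based reference on 32-bit unsigned values. The comparison in the new form must be unsigned (matching `seta`). The function names are illustrative only.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Reference: exactly one bit set, i.e. ctpop(x) == 1.
bool isPow2Ref(uint32_t x) { return std::bitset<32>(x).count() == 1; }

// Old expansion: x != 0 && (x & (x - 1)) == 0.
bool isPow2AndForm(uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }

// New expansion: (x ^ (x - 1)) > x - 1, as an unsigned compare.
// x == 0:   x - 1 wraps to UINT32_MAX, so nothing is greater: false.
// x == 2^k: x ^ (x - 1) == 2^(k+1) - 1, which exceeds 2^k - 1: true.
// Otherwise x - 1 keeps x's highest set bit and outgrows the xor mask.
bool isPow2XorForm(uint32_t x) { return (x ^ (x - 1)) > x - 1; }
```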
2023-11-15 22:26:34 +09:00
Yingwei Zheng
650026897c
[RISCV][SDAG] Prefer ShortForwardBranch to lower sdiv by pow2 (#67364)
This patch lowers `sdiv x, +/-2**k` to `add + select + shift` when the
short forward branch optimization is enabled. The resulting instruction
sequence is faster than the sequence generated by the target-independent
DAGCombiner. This algorithm is described in ***Hacker's Delight***.

This patch also removes duplicate logic in the X86 and AArch64 backends.
We cannot do this for the PowerPC backend, however, since it generates a
special instruction `addze`.
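The following sketches the add + select + shift sequence for 32-bit values, written in C++ rather than as RISC-V instructions or DAG nodes. It assumes 0 <= k <= 30 and an arithmetic `>>` on negative values (what GCC and Clang do, and guaranteed since C++20). `sdivPow2` is a hypothetical name, not the function added by the patch.

```cpp
#include <cassert>
#include <cstdint>

// Signed division of x by +2^k or -2^k via add + select + shift.
// Assumes 0 <= k <= 30, arithmetic >> on negative values, and that the
// quotient can be negated (i.e. not k == 0 with x == INT32_MIN).
int32_t sdivPow2(int32_t x, unsigned k, bool negativeDivisor) {
  // Adding 2^k - 1 to a negative dividend makes the floor-style
  // arithmetic shift round toward zero, as sdiv requires.
  const int32_t bias = (int32_t{1} << k) - 1;
  // Hardware computes x + bias unconditionally and then selects; here the
  // select happens first so the C++ addition can never overflow.
  const int32_t adjusted = x < 0 ? x + bias : x;
  const int32_t q = adjusted >> k;
  // A negative divisor just negates the quotient.
  return negativeDivisor ? -q : q;
}
```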
2023-11-10 21:38:47 +08:00
Craig Topper
70b35ec0a8
[SelectionDAG] Add initial support for nneg flag on ISD::ZERO_EXTEND. (#70872)
This adds the nneg flag to SDNodeFlags and the node printing code.
SelectionDAGBuilder will add this flag to the node if the target doesn't
prefer sign extend.

A future RISC-V patch can remove the sign extend preference from
SelectionDAGBuilder.

I've also added the flag to the DAG combine that converts
ISD::SIGN_EXTEND to ISD::ZERO_EXTEND.
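For reference, the `nneg` flag asserts that the zext operand is non-negative. For such values, zero-extension and sign-extension produce the same result, which is why the two can be interchanged. A minimal sketch on 8-bit inputs (function names are illustrative; the int8_t cast assumes the usual two's-complement conversion):

```cpp
#include <cassert>
#include <cstdint>

// Zero-extend an 8-bit value to 32 bits.
uint32_t zext8(uint8_t v) { return v; }

// Sign-extend an 8-bit value to 32 bits (bit 7 is replicated upward).
uint32_t sext8(uint8_t v) {
  return static_cast<uint32_t>(
      static_cast<int32_t>(static_cast<int8_t>(v)));
}
```

The two only disagree when bit 7 is set, i.e. when the 8-bit value is negative, which is exactly the case `nneg` rules out.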
2023-11-03 11:15:08 -07:00
Qiu Chaofan
b46e768455
[DAGCombine] Fold setcc_eq infinity into is.fpclass (#67829) 2023-11-01 11:51:15 +09:00
Simon Pilgrim
8d2efd7427 [DAG] Avoid ComputeNumSignBits call when we know the result is unsigned
D146121 needs to set the NSW flag, but since the result is NUW we know that it has leading zeros, so we don't need to call ComputeNumSignBits; just reuse the existing KnownBits value instead.
2023-10-29 17:35:24 +00:00
Simon Pilgrim
d96529af3c [DAG] Attempt shl narrowing in SimplifyDemandedBits (REAPPLIED)
If a shl node leaves the upper half of its bits zero / undemanded, then see if we can profitably perform this with a half-width shl and a free trunc/zext.
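This sketch shows why the narrowing is sound, modelling an i64 shl as a 64-bit C++ shift. For shift amounts below 32, the low 32 bits of the full-width result equal a 32-bit shift of the truncated input, zero-extended. The narrowed form therefore only replaces the original when the upper 32 bits of the result are zero or not demanded. The names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Full-width shift, as in (shl x:i64, c).
uint64_t shl64(uint64_t x, unsigned c) { return x << c; }

// Narrowed form: (zext (shl (trunc x):i32, c)). Assumes c < 32.
// Only the low 32 bits match the full-width shift; the upper half is
// always zero here, so it must be zero or undemanded in the original.
uint64_t shl64Narrowed(uint64_t x, unsigned c) {
  return static_cast<uint64_t>(static_cast<uint32_t>(x) << c);
}
```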

Followup to D146121

Reapplied - moved after the ShrinkDemandedOp call; reuse the existing KnownBits result; ensure that we only attempt this if all the upper bits are demanded; 547dc461225ba should address the remaining regressions that were noticed in the previous commit.

Differential Revision: https://reviews.llvm.org/D155472
2023-10-29 15:38:46 +00:00
Simon Pilgrim
547dc46122 [DAG] SimplifyDemandedBits - ensure we drop NSW/NUW flags when we simplify a SHL node's input
We already do this for variable shifts, but we missed it for constant shifts

Fixes #69965
2023-10-26 10:34:58 +01:00
Simon Pilgrim
2a40ec2d3e [DAG] SimplifyDemandedBits - fix isOperationLegal typo in D146121
We need to check that the simplified ISD::SRL node is legal, not the old one

Noticed while trying to isolate the regressions in D155472
2023-10-17 17:50:12 +01:00
Kirill Stoimenov
0a776996af Revert "[DAG] Attempt shl narrowing in SimplifyDemandedBits"
This reverts commit 7a8c04ef84ecdab4390b451d4c2fe17bc45a7b63.
2023-10-04 22:15:41 +00:00
Simon Pilgrim
7a8c04ef84 [DAG] Attempt shl narrowing in SimplifyDemandedBits
If a shl node leaves the upper half of its bits zero / undemanded, then see if we can profitably perform this with a half-width shl and a free trunc/zext.

Followup to D146121

Differential Revision: https://reviews.llvm.org/D155472
2023-10-04 10:23:02 +01:00
Nick Desaulniers
e0a48c065b
[InlineAsm] add comments for NumOperands and ConstraintType (#67474)
Splitting up patches for #20571. I found these comments generally useful
to add and not predicated on those changes. Hopefully they help future
travelers.
2023-09-28 08:24:56 -07:00
Nick Desaulniers
35a364fa5c
[TargetLowering] fix index OOB (#67494)
I accidentally introduced this in

commit 330fa7d2a4e0 ("[TargetLowering] Deduplicate choosing InlineAsm
constraint between ISels (#67057)")

Fix forward.
2023-09-26 15:50:26 -07:00
Sam McCall
679c3a1791 [TargetLowering] use stable_sort to avoid nondeterminism
After 330fa7d2a4e0cfbb4b078 we were seeing nondeterministic failures of
llvm/test/CodeGen/ARM/thumb-big-stack.ll, with different code being
generated in different runs.

Switching sort -> stable_sort fixes this.
It looks like the old algorithm picked the first best option, and using
stable_sort restores that behavior.
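A small illustration of the difference (not the TargetLowering code itself). Candidates are sorted by a hypothetical priority where lower is better. `std::stable_sort` guarantees that equally ranked candidates keep their input order, so the first best option stays first; `std::sort` gives no guarantee about their relative order. `Candidate` and `orderCandidates` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical (constraint code, priority) pair; lower priority is better.
struct Candidate {
  std::string Code;
  int Priority;
};

// Sort by priority only. Ties keep their original relative order.
std::vector<Candidate> orderCandidates(std::vector<Candidate> Cs) {
  std::stable_sort(Cs.begin(), Cs.end(),
                   [](const Candidate &A, const Candidate &B) {
                     return A.Priority < B.Priority;
                   });
  return Cs;
}
```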
2023-09-26 15:16:09 +02:00
Nick Desaulniers
330fa7d2a4
[TargetLowering] Deduplicate choosing InlineAsm constraint between ISels (#67057)
Given a list of constraints for InlineAsm (e.g. "imr"), I'm looking to
modify the order in which they are chosen. Before doing so, I noticed a
fair amount of logic is duplicated between SelectionDAGISel and GlobalISel
for this.

That is because SelectionDAGISel is also trying to lower immediates
during selection. If we detangle these concerns into:
1. choose the preferred constraint
2. attempt to lower that constraint

Then we can slide down the list of constraints until we find one that
can be lowered. That allows the implementation to be shared between
instruction selection frameworks.

This makes it so that later I might only need to adjust the priority of
constraints in one place, and have both selectors behave the same.
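The two-step scheme above (choose the preferred constraint, then try to lower it, moving down the list on failure) can be sketched generically. `pickConstraint` and the `TryLower` callback are hypothetical. The real code works on TargetLowering's constraint and operand types rather than strings.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Constraints are assumed to be already ordered by preference (step 1).
// TryLower reports whether a constraint can be lowered for the operand.
// Returns the first constraint that lowers, or nullopt if none does.
std::optional<std::string>
pickConstraint(const std::vector<std::string> &Constraints,
               const std::function<bool(const std::string &)> &TryLower) {
  for (const std::string &C : Constraints)
    if (TryLower(C)) // step 2: attempt to lower the current choice
      return C;      // success: stop sliding down the list
  return std::nullopt;
}
```

Because the selection loop no longer cares how lowering is done, each instruction selector only has to supply its own `TryLower`.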
2023-09-25 08:53:03 -07:00
Sirish Pande
e6f9483f77
[SelectionDAG] Flags are dropped when creating a new FMUL (#66701)
While simplifying some vector operators in DAG combine, we may need to
create new instructions for simplified vectors. At that time, we need to
make sure that all the flags of the new instruction are copied/modified
from the old instruction.

If "contract" is dropped from an instruction like FMUL, it may no longer
be fused into an FMA instruction, which would impact performance.

Here's an example where "contract" flag is dropped when FMUL is created.

Replacing.2 t42: v2f32 = fmul contract t41, t38
With: t48: v2f32 = fmul t38, t38

Co-authored-by: Sirish Pande <sirish.pande@amd.com>
2023-09-21 10:26:34 -05:00
Craig Topper
8f04d81ede [SelectionDAG][RISCV] Mask constants to narrow size in TargetLowering::expandUnalignedStore.
If the SRL for Hi constant folds, but we don't remove those bits from
the Lo, we can end up with strange constant folding through DAGCombine later.
I've only seen this with constants being lowered to constant pools
during lowering on RISC-V.
2023-09-18 09:10:19 -07:00
Yingwei Zheng
e042ff7eef
[SDAG][RISCV] Avoid expanding is-power-of-2 pattern on riscv32/64 with zbb
This patch adjusts the legality check for riscv to use `cpop/cpopw` since `isOperationLegal(ISD::CTPOP, MVT::i32)` returns false on rv64gc_zbb.
Clang vs gcc: https://godbolt.org/z/rc3s4hjPh

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D156390
2023-09-17 02:56:09 +08:00
Kazu Hirata
5fb990ac51 [SelectionDAG] Use isNullConstant (NFC) 2023-09-02 09:32:43 -07:00
Fangrui Song
111fcb0df0 [llvm] Fix duplicate word typos. NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 18:25:16 -07:00
Simon Pilgrim
2a81396b1b [DAG] SimplifyDemandedBits - add SMIN/SMAX KnownBits comparison analysis
Followup to D158364

Also, final fix for Issue #59902 which noted that the snippet should just return 1
2023-09-01 12:42:30 +01:00
Simon Pilgrim
aca8b9d0d5 [DAG] SimplifyDemandedBits - if we're only demanding the signbits, a MIN/MAX node can be simplified to an OR or AND node
Extension to the signbit case: if the signbits extend down through all the demanded bits, then SMIN/SMAX/UMIN/UMAX nodes can be simplified to an OR/AND/AND/OR.
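This is a quick check of the signbit-only case (bit 31 of an i32) for all four nodes, with values modelled as uint32_t. The signed casts rely on the usual two's-complement conversion (implementation-defined before C++20, two's complement in GCC and Clang). The helper names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// When only bit 31 of the result is demanded:
//   smin(a, b) -> a | b  (result is negative iff either input is)
//   smax(a, b) -> a & b  (result is negative iff both inputs are)
//   umin(a, b) -> a & b  (msb set iff set in both)
//   umax(a, b) -> a | b  (msb set iff set in either)
constexpr uint32_t SignBit = 0x80000000u;

uint32_t smin(uint32_t a, uint32_t b) {
  return static_cast<int32_t>(a) < static_cast<int32_t>(b) ? a : b;
}

uint32_t smax(uint32_t a, uint32_t b) {
  return static_cast<int32_t>(a) < static_cast<int32_t>(b) ? b : a;
}
```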

Alive2: https://alive2.llvm.org/ce/z/mFVFAn (general case)

Differential Revision: https://reviews.llvm.org/D158364
2023-09-01 10:56:32 +01:00
Daniel Paoliello
0c5c7b52f0 Emit the CodeView S_ARMSWITCHTABLE debug symbol for jump tables
The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table; it contains the following information:

* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted).

Together this information can be used by debuggers and binary analysis tools to understand what a jump table indirect branch is doing and where it might jump to.

Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)

This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D149367
2023-08-31 12:06:50 -07:00
Luke Lau
6e4860f5d0 [SDAG] Add SimplifyDemandedBits support for ISD::SPLAT_VECTOR
This improves some cases where a splat_vector uses a build_pair that can be
simplified, e.g.:

(rotl x:i64, splat_vector (build_pair x1:i32, x2:i32))

rotl only demands the bottom 6 bits, so this patch allows it to be simplified to:

(rotl x:i64, splat_vector (build_pair x1:i32, undef:i32))

This in turn improves some cases where a splat_vector_parts is lowered on
RV32.
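A sketch of why only the bottom 6 bits matter: a 64-bit rotate amount is effectively taken modulo 64, so the upper 32 bits of the amount (the second `build_pair` operand) can be anything. `rotl64` is a hypothetical helper modelling that behaviour.

```cpp
#include <cassert>
#include <cstdint>

// 64-bit rotate-left that only looks at the low 6 bits of the amount.
uint64_t rotl64(uint64_t x, uint64_t amt) {
  const unsigned r = static_cast<unsigned>(amt & 63);
  // Special-case r == 0 so we never shift by 64 (undefined in C++).
  return r == 0 ? x : (x << r) | (x >> (64 - r));
}
```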

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D158839
2023-08-28 10:35:56 +01:00
Arthur Eubanks
0a4fc4ac1c Revert "Emit the CodeView S_ARMSWITCHTABLE debug symbol for jump tables"
This reverts commit 8d0c3db388143f4e058b5f513a70fd5d089d51c3.

Causes crashes, see comments in https://reviews.llvm.org/D149367.

Some follow-up fixes are also reverted:

This reverts commit 636269f4fca44693bfd787b0a37bb0328ffcc085.
This reverts commit 5966079cf4d4de0285004eef051784d0d9f7a3a6.
This reverts commit e7294dbc85d24a08c716d9babbe7f68390cf219b.
2023-08-25 18:34:15 -07:00
Daniel Paoliello
8d0c3db388 Emit the CodeView S_ARMSWITCHTABLE debug symbol for jump tables
The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table; it contains the following information:

* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted).

Together this information can be used by debuggers and binary analysis tools to understand what a jump table indirect branch is doing and where it might jump to.

Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)

This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D149367
2023-08-25 10:19:17 -07:00
Simon Pilgrim
d254014fdb [DAG] Add willNotOverflowAdd/willNotOverflowSub helper functions.
Matches similar helpers in InstCombine
2023-08-24 17:52:54 +01:00
Yingwei Zheng
d6639f83a9
[SDAG][RISCV] Avoid folding setcc (xor C1, -1), C2, cond into setcc (xor C2, -1), C1, cond
This patch fixes https://github.com/llvm/llvm-project/issues/64935.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D158654
2023-08-24 04:18:17 +08:00
Kazu Hirata
134115618a [CodeGen] Use isAllOnesConstant and isNullConstant (NFC) 2023-08-20 22:56:40 -07:00
Simon Pilgrim
95865e5138 [DAG] SimplifyDemandedBits - if we're only demanding the signbit, a SMIN/SMAX node can be simplified to an OR/AND node respectively.
Alive2: https://alive2.llvm.org/ce/z/MehvFB

REAPPLIED from 54d663d5896008 with fix for using the correct DemandedBits mask.
2023-08-20 14:20:49 +01:00
Craig Topper
0a5347f40d [DAG] SimplifyDemandedBits - Use DemandedBits instead of OriginalDemandedBits when simplifying UMIN/UMAX to AND/OR.
DemandedBits is forced to all ones if there are multiple users.

The changed X86 test cases look like they were miscompiles before.
The value of eax/rax from the cmov is returned from the function in
addition to being used by the sar. That usage needs all bits even
though the sar doesn't.
2023-08-18 11:59:18 -07:00
Thurston Dang
29b2009061 Revert "[DAG] SimplifyDemandedBits - if we're only demanding the signbit, a SMIN/SMAX node can be simplified to an OR/AND node respectively."
This reverts commit 54d663d5896008c09c938f80357e2a056454bc65, which breaks the test CodeGen/SystemZ/ctpop-01.ll for stage2-ubsan check (see https://lab.llvm.org/buildbot/#/builders/85/builds/18410)

I manually confirmed that the test had been passing immediately prior to that commit
(BUILDBOT_REVISION=4772c66cfb00d60f8f687930e9dd3aa1b6872228 llvm-zorg/zorg/buildbot/builders/sanitizers/buildbot_bootstrap_ubsan.sh)
2023-08-18 18:08:10 +00:00
Simon Pilgrim
bd9bf9cb67 [X86] SimplifyDemandedBits - move MaskedValueIsZero as late as possible to avoid unnecessary (recursive) analysis costs. NFC.
Mentioned on D155472 for the SHL equivalent
2023-08-18 15:14:06 +01:00
Simon Pilgrim
4cd1c07491 [DAG] SimplifyDemandedBits - if we're only demanding the msb, a UMIN/UMAX node can be simplified to an AND/OR node respectively.
Alive2: https://alive2.llvm.org/ce/z/qnvmc6
2023-08-18 12:12:22 +01:00
Simon Pilgrim
54d663d589 [DAG] SimplifyDemandedBits - if we're only demanding the signbit, a SMIN/SMAX node can be simplified to an OR/AND node respectively.
Alive2: https://alive2.llvm.org/ce/z/MehvFB
2023-08-18 11:35:34 +01:00
Noah Goldstein
e7f7b63fb3 [DAGCombiner][X86] Guard (X & Y) ==/!= Y --> (X & Y) !=/== 0 behind TLI preference
On X86 for vec types `(X & Y) == Y` is generally preferable to
`(X & Y) != 0`. Creating zero requires an extra instruction and on
pre-avx512 targets there is no vector `pcmpne` so it requires two
additional instructions to invert the `pcmpeq`.
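For context, the two forms are only interchangeable when Y has exactly one bit set (for example, a power-of-two mask), since X & Y is then either 0 or Y. A minimal check with illustrative names:

```cpp
#include <cassert>
#include <cstdint>

// (X & Y) == Y: every bit of Y is set in X.
bool eqForm(uint32_t x, uint32_t y) { return (x & y) == y; }

// (X & Y) != 0: at least one bit of Y is set in X.
bool neZeroForm(uint32_t x, uint32_t y) { return (x & y) != 0; }
```

With a multi-bit Y such as 3, X = 1 satisfies the second form but not the first.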

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D157014
2023-08-16 02:00:15 -05:00
Simon Pilgrim
b0a77af4f1 [DAG] SimplifyDemandedBits - add sra(shl(x,c1),c1) -> sign_extend_inreg(x) demanded elts fold
Move the sra(shl(x,c1),c1) -> sign_extend_inreg(x) fold inside SimplifyDemandedBits so we can recognize hidden splats with DemandedElts masks.
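This sketch shows the scalar identity behind the fold for i32 with c1 = 24, which corresponds to sign_extend_inreg from i8. It assumes the GCC/Clang behaviour for narrowing signed conversions and arithmetic right shifts (both guaranteed since C++20). The names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// sra(shl(x, 24), 24) on i32: move bit 7 into the sign position, then
// shift back arithmetically so it is replicated through bits 8..31.
int32_t sraShl24(int32_t x) {
  return static_cast<int32_t>(static_cast<uint32_t>(x) << 24) >> 24;
}

// sign_extend_inreg(x, i8): sign-extend the low 8 bits in place.
int32_t signExtendInReg8(int32_t x) {
  return static_cast<int8_t>(x & 0xFF);
}
```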

Because the c1 shift amount has multiple uses, hidden splats won't get simplified to a splat constant buildvector - meaning the existing fold in DAGCombiner::visitSRA can't fire as it won't see a uniform shift amount.

I also needed to add TLI preferSextInRegOfTruncate hook to help keep truncate(sign_extend_inreg(x)) vector patterns on X86 so we can use PACKSS more efficiently.

Differential Revision: https://reviews.llvm.org/D157972
2023-08-15 16:32:03 +01:00
Bjorn Pettersson
e53b28c833 [llvm] Drop some bitcasts and references related to typed pointers
Differential Revision: https://reviews.llvm.org/D157551
2023-08-10 15:07:07 +02:00
Alex Bradbury
1cffd26483 [TargetLowering][RISCV] Improve codegen for saturating bf16 to int conversion
Extending to f32 first (as is done for f16) results in better generated
code for RISC-V (and affects no other in-tree tests). Additionally,
performing the FP_EXTEND first seems equally justified for bf16 as for
f16.
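Widening bf16 to f32 is exact, since bf16 is simply the upper 16 bits of an IEEE-754 binary32, which is one reason extending first is reasonable. Below is a sketch of that extension followed by a saturating conversion with `llvm.fptosi.sat`-style semantics (NaN becomes 0, out-of-range values clamp). The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// bf16 -> f32: place the 16 bits in the high half and reinterpret.
float bf16ToFloat(uint16_t bits) {
  const uint32_t wide = static_cast<uint32_t>(bits) << 16;
  float f;
  std::memcpy(&f, &wide, sizeof f);
  return f;
}

// Saturating f32 -> i32 conversion: NaN -> 0, otherwise clamp to range.
int32_t fptosiSat32(float f) {
  if (std::isnan(f)) return 0;
  if (f <= -2147483648.0f) return INT32_MIN; // -2^31 is exact in float
  if (f >= 2147483648.0f) return INT32_MAX;  // 2^31 is exact in float
  return static_cast<int32_t>(f);            // in range: truncate
}

// Extend to f32 first, then do the saturating conversion.
int32_t bf16ToIntSat(uint16_t bits) { return fptosiSat32(bf16ToFloat(bits)); }
```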

Differential Revision: https://reviews.llvm.org/D156944
2023-08-07 11:21:25 +01:00
Simon Pilgrim
ae60706da0 [DAG] SimplifyDemandedBits - call ComputeKnownBits for constant non-uniform ISD::SRL shift amounts
We only attempted to determine KnownBits for uniform constant shift amounts, but ComputeKnownBits is able to handle some non-uniform cases as well that we can use as a fallback.
2023-07-21 14:52:57 +01:00
Simon Pilgrim
7567b72f4d [DAG] ShrinkDemandedConstant - early-out for empty DemandedBits/Elts
Leave this to constant folding in SimplifyDemandedBits

Fixes #63975
2023-07-20 12:18:10 +01:00
Simon Pilgrim
d7eb9240c0 [DAG] SimplifyDemandedBits - attempt to use SimplifyMultipleUseDemandedBits for bitcasts from larger element types
Attempt to avoid multi-use ops if the bitcast doesn't need anything from them.
2023-07-18 18:38:03 +01:00
Simon Pilgrim
e9caa37e9c [DAG] Move lshr narrowing from visitANDLike to SimplifyDemandedBits
Inspired by some of the cases from D145468

Let SimplifyDemandedBits handle the narrowing of lshr to half-width if we don't require the upper bits, the narrowed shift is profitable and the zext/trunc are free.

A future patch will propose the equivalent shl narrowing combine.

Differential Revision: https://reviews.llvm.org/D146121
2023-07-17 15:50:09 +01:00
Jon Roelofs
56e60bc5bb
TargetLowering: fix an infinite DAG combine in SimplifySETCC
TargetLowering::SimplifySetCC wants to swap the operands of a SETCC to
canonicalize the constant to the RHS. The bug here was that it did so whether
or not the RHS was already a constant, leading to an infinite loop.
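A toy model of the fixed condition (not the actual SimplifySetCC code). The operands are only swapped when the LHS is a constant and the RHS is not, so a second pass is a no-op even when both operands are constants. `SetCC`, `Cond`, `swapped` and `canonicalize` are hypothetical.

```cpp
#include <cassert>
#include <utility>

// Hypothetical, minimal condition codes.
enum class Cond { LT, GT, EQ };

// Toy setcc: only tracks which operands are constants.
struct SetCC {
  bool LHSIsConst;
  bool RHSIsConst;
  Cond CC;
};

// Condition code to use after swapping the operands.
Cond swapped(Cond C) {
  switch (C) {
  case Cond::LT: return Cond::GT;
  case Cond::GT: return Cond::LT;
  case Cond::EQ: return Cond::EQ;
  }
  return C;
}

// Move a constant to the RHS. Returns true if the node changed (a combiner
// would then revisit it). Requiring !RHSIsConst is what prevents a setcc
// with two constant operands from being swapped back and forth forever.
bool canonicalize(SetCC &N) {
  if (N.LHSIsConst && !N.RHSIsConst) {
    std::swap(N.LHSIsConst, N.RHSIsConst);
    N.CC = swapped(N.CC);
    return true;
  }
  return false;
}
```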

rdar://111847838

Differential revision: https://reviews.llvm.org/D155095

This reverts commit cdc633e4bc93d4bf241ecd4c29691ae065749313.
2023-07-12 16:13:27 -07:00
Jon Roelofs
cdc633e4bc
Revert "TargetLowering: fix an infinite DAG combine in SimplifySETCC"
This reverts commit b76c85b355578d9076c22a86faf4ea8de1745bdf.

It broke the RISCV-enabled bots. Oops.
2023-07-12 12:22:03 -07:00
Jon Roelofs
b76c85b355
TargetLowering: fix an infinite DAG combine in SimplifySETCC
TargetLowering::SimplifySetCC wants to swap the operands of a SETCC to
canonicalize the constant to the RHS. The bug here was that it did so whether
or not the RHS was already a constant, leading to an infinite loop.

rdar://111847838

Differential revision: https://reviews.llvm.org/D155095
2023-07-12 11:44:15 -07:00
Matt Arsenault
b59022b42e DAG: Handle lowering of unordered fcZero|fcSubnormal to fcmp 2023-07-11 18:30:15 -04:00
Matt Arsenault
310f839612 DAG: Lower is.fpclass fcInf to fcmp of fabs
InstCombine should have taken care of this, but I think
this is more useful in the future when the expansion
tries to handle multiple cases at a time with fcmp.
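The equivalence used here, sketched on doubles: is.fpclass(x, fcInf) holds exactly when fabs(x) compares equal to +infinity, and since the compare is ordered, NaN gives false. `isInfViaFabs` is an illustrative name.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// fcInf test via a single ordered compare of fabs(x) against +infinity.
bool isInfViaFabs(double x) {
  return std::fabs(x) == std::numeric_limits<double>::infinity();
}
```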

x87 looks worse to me but the only thing I know about it is that
I aggressively do not care about it.

https://reviews.llvm.org/D143198
2023-07-07 17:00:10 -04:00
Matt Arsenault
64df9573a7 DAG: Handle inversion of fcSubnormal | fcZero
There are a number of other test combinations here that
can be done together to reduce the number of instructions.

https://reviews.llvm.org/D143191
2023-07-06 21:19:44 -04:00