1361 Commits

Author SHA1 Message Date
Craig Topper
a983ef2c17 [DAGCombiner][AArch64][VE] Teach BuildUDIV/SDIV to use 2x mul when mulh/mul_lohi are not available.
Correct the legality of i32 mul_lohi on AArch64.

Previously, AArch64 incorrectly reported i32 mul_lohi as Legal.
This allowed BuildUDIV/SDIV to use them. A later DAGCombiner would
replace them with MULHS/MULHU because only the high half was used.
This conversion does not check the legality of MULHS/MULHU under
the assumption that LegalizeDAG can turn it back into MUL_LOHI later.

After they were converted to MULHS/MULHU, DAGCombine ran and saw that
these operations aren't supported but an i64 MUL is, so they got
converted to that plus a shift. Without this, LegalizeDAG would
convert them back to MUL_LOHI and isel would fail to find a pattern.

This patch teaches BuildUDIV/SDIV to create the wide mul and shift
so that we can report the correct operation legality on AArch64. It
also enables div by constant folding for more cases on VE.

I don't know if VE wants this div by constant optimization or not. If they
don't want it, they can use the isIntDivCheap hook to disable it.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D150333
2023-05-12 09:06:17 -07:00
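As a rough illustration of the "wide mul and shift" idea in a983ef2c17, the standalone sketch below divides a 32-bit unsigned value by the constant 3 in two ways: once through a multiply-high helper and once through a single 64-bit multiply followed by a shift. This is not the DAGCombiner code; the helper names (`mulhu32`, `udiv3_mulh`, `udiv3_wide`, `check_udiv3`) are hypothetical, and the magic constant 0xAAAAAAAB (ceil(2^33 / 3)) is the well-known one for this divisor.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// ceil(2^33 / 3); multiplying by it and shifting right by 33 divides by 3.
constexpr uint32_t kMagicDiv3 = 0xAAAAAAABu;

// Hypothetical stand-in for a MULHU-like operation: high 32 bits of a*b.
inline uint32_t mulhu32(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}

// Form that needs a multiply-high: take the high half, then shift by 1.
inline uint32_t udiv3_mulh(uint32_t x) { return mulhu32(x, kMagicDiv3) >> 1; }

// Form that needs only a wide (64-bit) multiply and one shift by 32 + 1.
inline uint32_t udiv3_wide(uint32_t x) {
  uint64_t wide = static_cast<uint64_t>(x) * kMagicDiv3;
  return static_cast<uint32_t>(wide >> 33);
}

// Spot-check both forms against ordinary division.
inline void check_udiv3() {
  for (uint32_t x : {0u, 1u, 2u, 3u, 7u, 1000000u, 0x80000000u, 0xFFFFFFFFu}) {
    assert(udiv3_mulh(x) == x / 3u);
    assert(udiv3_wide(x) == x / 3u);
  }
}
```

Calling `check_udiv3()` exercises both forms; the point of the second form is that it needs no high-half multiply at all, only a legal wider MUL.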
Dhruv Chawla
1d21d2eb7f [TargetLowering] Fix unnecessary call to computeKnownBits (NFCI)
In the SimplifyDemandedBits function, there is a fallthrough to the
default case in the case of ISD::ADD, ISD::MUL and ISD::SUB. This
leads to a call to computeKnownBits which is unnecessary as the
calls to SimplifyDemandedBits in the cases themselves handle the
calculation of the known bits. This information is discarded through
the Known2 variables.

By keeping this information around and calling
KnownBits::mul or KnownBits::computeForAddSub directly, the
unnecessary computation can be avoided. For now, the NSW bit is not
passed through to KnownBits as this is something that
computeKnownBits does not handle either. This requires updating
computeForAddCarry to handle the flag as well.

Differential Revision: https://reviews.llvm.org/D150110
2023-05-08 16:14:01 +02:00
Simon Pilgrim
051918c71e [DAG] expandIntMINMAX - add umax(x,1) --> sub(x,cmpeq(x,0)) fold
Move the fold from X86 to generic expansion

(We also have several existing expansions that are missing freezes on repeated operands - I've added a TODO for now).
2023-05-05 19:27:52 +01:00
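The fold in 051918c71e can be sketched in plain C++. The sketch assumes the comparison yields an all-ones value when true (a zero-or-negative-one boolean representation); that is an assumption of this example, not something the commit message states. The names `cmpeq_allones`, `umax1_via_sub` and `check_umax1` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Models a compare whose "true" result is all ones (0xFFFFFFFF).
inline uint32_t cmpeq_allones(uint32_t a, uint32_t b) {
  return a == b ? 0xFFFFFFFFu : 0u;
}

// umax(x, 1) written as sub(x, cmpeq(x, 0)):
//   x == 0: 0 - 0xFFFFFFFF wraps around to 1
//   x != 0: x - 0 == x
inline uint32_t umax1_via_sub(uint32_t x) { return x - cmpeq_allones(x, 0u); }

// Spot-check the rewrite against std::max.
inline void check_umax1() {
  for (uint32_t x : {0u, 1u, 2u, 0x7FFFFFFFu, 0xFFFFFFFFu})
    assert(umax1_via_sub(x) == std::max(x, 1u));
}
```

With a zero-or-one boolean representation the subtraction would go the wrong way for x == 0, which is why the all-ones assumption matters here.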
Simon Pilgrim
04e809ab90 [DAG] Add TargetLowering::expandABD and convert X86 lowering to use it
Scalar widening cases are still custom lowered in the X86 backend - we still need to add promotion/legalization support to handle these
2023-05-05 15:13:23 +01:00
Evgenii Kudriashov
a82d27a9a6 [X86] Support llvm.{min,max}imum.f{16,32,64}
Addresses https://github.com/llvm/llvm-project/issues/53353

Reviewed By: RKSimon, pengfei

Differential Revision: https://reviews.llvm.org/D145634
2023-05-04 21:04:48 +08:00
Craig Topper
344368fb98 [TargetLowering] Stop passing an ISD::CondCode to isOperationLegalOrCustom.
ISD::CondCode is a separate enum space from opcodes. isOperationLegalOrCustom
should take an opcode.

Reviewed By: barannikov88

Differential Revision: https://reviews.llvm.org/D149528
2023-04-29 15:23:09 -07:00
Sergei Barannikov
e744e51b12 [SelectionDAG] Rename ADDCARRY/SUBCARRY to UADDO_CARRY/USUBO_CARRY (NFC)
This will make them consistent with other overflow-aware nodes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D148196
2023-04-29 21:59:58 +03:00
Craig Topper
df017ba9d3 [TargetLowering] Don't use ISD::SELECT_CC in expandFP_TO_INT_SAT.
This function gets called for vectors and ISD::SELECT_CC was never
intended to support vectors. Some updates were made to support
it when this function started getting used for vectors.

Overall, using separate ISD::SETCC and ISD::SELECT looks like an
improvement even for scalar.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D149481
2023-04-29 10:23:08 -07:00
Matt Arsenault
bc37be1855 LangRef: Add "dynamic" option to "denormal-fp-math"
This is stricter than the default "ieee", and should probably be the
default. This patch leaves the default alone. I can change this in a
future patch.

There are non-reversible transforms I would like to perform which are
legal under IEEE denormal handling, but illegal with flush-to-zero
behavior. Namely, conversions between llvm.is.fpclass and fcmp with
zeroes.

Under "ieee" handling, it is legal to translate between
llvm.is.fpclass(x, fcZero) and fcmp x, 0.

Under "preserve-sign" handling, it is legal to translate between
llvm.is.fpclass(x, fcSubnormal|fcZero) and fcmp x, 0.

I would like to compile and distribute some math library functions in
a mode where it's callable from code with and without denormals
enabled, which requires not changing the compares with denormals or
zeroes.

If an IEEE function transforms an llvm.is.fpclass call into an fcmp 0,
it is no longer possible to call the function from code with denormals
enabled, or write an optimization to move the function into a denormal
flushing mode. For the original function, if x was a denormal, the
class would evaluate to false. If the function compiled with denormal
handling was converted to or called from a preserve-sign function, the
fcmp now evaluates to true.

This could also be of use for strictfp handling, where code may be
changing the denormal mode.

Alternative name could be "unknown".

Replaces the old AMDGPU custom inlining logic with more conservative
logic which tries to permit inlining for callees with dynamic handling
and avoids inlining other mismatched modes.
2023-04-29 08:44:59 -04:00
Kazu Hirata
972983539b [llvm] Apply fixes from readability-redundant-control-flow (NFC) 2023-04-16 00:13:46 -07:00
Kazu Hirata
63c4967352 Use APInt::getOneBitSet (NFC) 2023-04-10 18:19:17 -07:00
Craig Topper
b5f207e5b2 [SelectionDAG] Rename Flag->Glue. NFC 2023-04-02 19:46:51 -07:00
Simon Pilgrim
8153b92d9b [DAG] Add SelectionDAG::SplitScalar helper
Similar to the existing SelectionDAG::SplitVector helper, this helper creates the EXTRACT_ELEMENT nodes for the LO/HI halves of the scalar source.

Differential Revision: https://reviews.llvm.org/D147264
2023-03-31 18:35:40 +01:00
Craig Topper
c9e4d9a8ea [LegalizeTypes][TargetLowering][RISCV] Fix regressions from D146786.
Add some special cases for UADDO to recover codegen after D146786.

Reviewed By: reames, liaolucy

Differential Revision: https://reviews.llvm.org/D146789
2023-03-27 09:58:51 -07:00
Kazu Hirata
7bb6d1b32e [llvm] Skip getAPIntValue (NFC)
ConstantSDNode provides some convenience functions like isZero,
getZExtValue, and isMinSignedValue that are named identically to those
provided by APInt, so we can "skip" getAPIntValue.
2023-03-22 22:10:25 -07:00
Jun Zhang
b3e12beb44 [TLI] Fold ~X >/< ~Y --> Y >/< X
Fixes: https://github.com/llvm/llvm-project/issues/61120

Signed-off-by: Jun Zhang <jun@junz.org>

Differential Revision: https://reviews.llvm.org/D146512
2023-03-23 12:49:05 +08:00
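The identity behind b3e12beb44 is easy to confirm exhaustively on 8-bit values: bitwise NOT reverses the order of both the unsigned and the signed interpretation, so comparing the complements is the same as comparing the originals with the operands swapped. The function names below are hypothetical and the sketch is independent of the LLVM implementation.

```cpp
#include <cassert>
#include <cstdint>

// ~x < ~y on unsigned 8-bit values.
inline bool not_lt_unsigned(uint8_t x, uint8_t y) {
  return static_cast<uint8_t>(~x) < static_cast<uint8_t>(~y);
}

// ~x < ~y on signed 8-bit values.
inline bool not_lt_signed(int8_t x, int8_t y) {
  return static_cast<int8_t>(~x) < static_cast<int8_t>(~y);
}

// Returns true if (~x < ~y) == (y < x) for every pair of 8-bit values,
// in both the unsigned and the signed interpretation.
inline bool fold_holds_for_all_i8() {
  for (int a = 0; a < 256; ++a) {
    for (int b = 0; b < 256; ++b) {
      uint8_t ux = static_cast<uint8_t>(a), uy = static_cast<uint8_t>(b);
      int8_t sx = static_cast<int8_t>(ux), sy = static_cast<int8_t>(uy);
      if (not_lt_unsigned(ux, uy) != (uy < ux)) return false;
      if (not_lt_signed(sx, sy) != (sy < sx)) return false;
    }
  }
  return true;
}

inline void check_not_fold() { assert(fold_holds_for_all_i8()); }
```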
Craig Topper
a37df84f99 [SelectionDAG][RISCV] Remove code for handling too small shift type from SimplifyDemandedBits.
This code detected that the type returned from getShiftAmountTy was
too small to hold the constant shift amount. But it used the full
type size instead of scalar type size leading it to crash for
scalable vectors.

This code was necessary when getShiftAmountTy would always
return the target preferred shift amount type for scalars even when
the type was an illegal type larger than the target supported. For
vectors, getShiftAmountTy has always returned the vector type.

Fortunately, getShiftAmountTy was fixed a while ago to detect that
the target's preferred size for scalars is not large enough for the
type. So we can delete this code.

Switched to use getShiftAmountConstant to further simplify the code.

Fixes PR61561.
2023-03-21 11:08:19 -07:00
Matt Arsenault
9356ec1516 CodeGen: Reorder case handling for is.fpclass legalization
Subnormal and zero checks can be combined into one, so move
the code closer to reduce the diff in a future change.
2023-03-17 11:29:50 -04:00
Simon Pilgrim
6bc0e362d7 [DAG] TargetLowering::ShrinkDemandedOp - move SmallVTBits iterator inside for loop. NFC 2023-03-16 12:12:33 +00:00
Simon Pilgrim
7aa7393aab [DAG] TargetLowering::ShrinkDemandedOp - pull out repeated getValueType calls. NFC 2023-03-16 12:12:33 +00:00
Simon Pilgrim
dc20ce7e54 [DAG] TargetLowering::ShrinkDemandedOp - rename Demanded arg to DemandedBits. NFC
Make it clear this is referring to DemandedBits not DemandedElts.
2023-03-15 13:22:21 +00:00
Jay Foad
0265dd9925 Fix "compatiable" typos 2023-03-07 12:57:39 +00:00
Simon Pilgrim
73cdccad55 [DAG] expandIntMINMAX - attempt to match existing SETCC node
As noticed on D144789, when we have pairs of min/max nodes we often end up with multiple comparisons which we could reuse with commuted select ops, so check to see if a suitable SETCC already exists. This also allowed us to remove a similar X86 peephole.

There are other getSETCC cases where we could safely reuse other CondCodes as well - I've been trying to think of how we could reuse this logic in SelectionDAG but haven't found anything that always works well.

An alternative would be to have a TLI callback that returns a preferred CondCode from a list of options, I've noticed this helped fpclamptosat tests on some other targets (MVE + WebAssembly), but other tests suffered.

Differential Revision: https://reviews.llvm.org/D145065
2023-03-01 19:04:03 +00:00
David Green
06daa515b2 [AArch64] Don't remove free sext_inreg(vector_extract(x)) if it leads to multiple extracts
If we have sext_inreg(vector_extract(x)) but the top bits are not used, DAG
will try to remove the sext_inreg, using vector_extract(x) directly. This can
lead to multiple uses of both sext_inreg(vector_extract(x)) and
vector_extract(x), leading to the generation of both umov and smov extracts.
This adds a target hook to prevent that under AArch64 where the sext_inreg can
be considered free if there are multiple uses of the sext and no uses of the
vector_extract. This helps fix a small regression from D144550.

Differential Revision: https://reviews.llvm.org/D144850
2023-02-27 19:20:10 +00:00
Serge Pavlov
7f81dd4dd6 [NFC] Make FPClassTest a bitmask enumeration
This is a recommit of 2e416cdd52, fixed to be acceptable to GCC.
The original commit message is below.

With this change bitwise operations are allowed for the FPClassTest
enumeration, which should simplify using this type. Also, some functions
were changed to take an argument of type FPClassTest instead of unsigned.

Differential Revision: https://reviews.llvm.org/D144241
2023-02-24 15:12:16 +07:00
Serge Pavlov
08a09235b6 Revert "[NFC] Make FPClassTest a bitmask enumeration"
This reverts commit e7613c1d9b259bdf2b0b06b4169d9a10dd553406.

GCC issues an error:

In file included from /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/ADT/BitmaskEnumTest.cpp:9:
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/include/llvm/ADT/BitmaskEnum.h:66:22: error: explicit specialization of template<class E, class Enable> struct llvm::is_bitmask_enum outside its namespace must use a nested-name-specifier [-fpermissive]
   66 |   template <> struct is_bitmask_enum<Enum> : std::true_type {};                \
      |                      ^~~~~~~~~~~~~~~~~~~~~
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/ADT/BitmaskEnumTest.cpp:30:1: note: in expansion of macro LLVM_DECLARE_ENUM_AS_BITMASK
   30 | LLVM_DECLARE_ENUM_AS_BITMASK(Flags2, V4);
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
2023-02-23 12:55:58 +07:00
Serge Pavlov
e7613c1d9b [NFC] Make FPClassTest a bitmask enumeration
This is a recommit of 2e416cdd52, reverted in 8555ab2fcd, because GCC
complains about extra qualification. The macro LLVM_DECLARE_ENUM_AS_BITMASK
does not specify llvm:: anymore, so the macro must occur in the namespace
llvm. Documentation updated accordingly. The original commit message is below.

With this change bitwise operations are allowed for the FPClassTest
enumeration, which should simplify using this type. Also, some functions
were changed to take an argument of type FPClassTest instead of unsigned.

Differential Revision: https://reviews.llvm.org/D144241
2023-02-23 12:38:57 +07:00
Nikita Popov
8555ab2fcd Revert "[NFC] Make FPClassTest a bitmask enumeration"
This reverts commit 2e416cdd52c1079b8c7cb1f7d7e557c889a4fb56.

Breaks the GCC build:

In file included from /home/npopov/repos/llvm-project/llvm/include/llvm/ADT/FloatingPointMode.h:18,
                 from /home/npopov/repos/llvm-project/llvm/include/llvm/ADT/APFloat.h:20,
                 from /home/npopov/repos/llvm-project/llvm/lib/Support/APFloat.cpp:14:
/home/npopov/repos/llvm-project/llvm/include/llvm/ADT/BitmaskEnum.h:66:22: error: extra qualification not allowed [-fpermissive]
   66 |   template <> struct llvm::is_bitmask_enum<Enum> : std::true_type {};          \
      |                      ^~~~
/home/npopov/repos/llvm-project/llvm/include/llvm/ADT/FloatingPointMode.h:223:1: note: in expansion of macro ‘LLVM_DECLARE_ENUM_AS_BITMASK’
  223 | LLVM_DECLARE_ENUM_AS_BITMASK(FPClassTest, /* LargestValue */ fcPosInf);
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/npopov/repos/llvm-project/llvm/include/llvm/ADT/BitmaskEnum.h:67:22: error: extra qualification not allowed [-fpermissive]
   67 |   template <> struct llvm::largest_bitmask_enum_bit<Enum> {                    \
      |                      ^~~~
/home/npopov/repos/llvm-project/llvm/include/llvm/ADT/FloatingPointMode.h:223:1: note: in expansion of macro ‘LLVM_DECLARE_ENUM_AS_BITMASK’
  223 | LLVM_DECLARE_ENUM_AS_BITMASK(FPClassTest, /* LargestValue */ fcPosInf);
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
[43/4396] Building CXX object lib/Supp...iles/LLVMSupport.dir/CommandLine.cpp.o
2023-02-22 08:56:19 +01:00
Serge Pavlov
2e416cdd52 [NFC] Make FPClassTest a bitmask enumeration
With this change bitwise operations are allowed for the FPClassTest
enumeration, which should simplify using this type. Also, some functions
were changed to take an argument of type FPClassTest instead of unsigned.

Differential Revision: https://reviews.llvm.org/D144241
2023-02-22 14:20:04 +07:00
Kazu Hirata
a28b252d85 Use APInt::getSignificantBits instead of APInt::getMinSignedBits (NFC)
Note that getMinSignedBits has been soft-deprecated in favor of
getSignificantBits.
2023-02-19 23:56:52 -08:00
Kazu Hirata
397265d88f [llvm] Use APInt::isAllOnes instead of isAllOnesValue (NFC)
Note that isAllOnesValue has been soft-deprecated in favor of
isAllOnes.
2023-02-19 23:35:39 -08:00
Kazu Hirata
9e5d2495ac Use APInt::isOne instead of APInt::isOneValue (NFC)
Note that isOneValue has been soft-deprecated in favor of isOne.
2023-02-19 23:06:36 -08:00
Kazu Hirata
b7ffd9686d Use APInt::getAllOnes instead of APInt::getAllOnesValue (NFC)
Note that getAllOnesValue has been soft-deprecated in favor of
getAllOnes.
2023-02-19 22:54:23 -08:00
Kazu Hirata
f8f3db2756 Use APInt::count{l,r}_{zero,one} (NFC) 2023-02-19 22:04:47 -08:00
Kazu Hirata
7e6e636fb6 Use llvm::has_single_bit<uint32_t> (NFC)
This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t>
where the argument is wider than uint32_t.
2023-02-15 22:17:27 -08:00
Kazu Hirata
64dad4ba9a Use llvm::bit_cast (NFC) 2023-02-14 01:22:12 -08:00
Samuel Parker
7bff37783f [SDAG] Check fminnum/fmaxnum for non-zero operand.
Currently, in TargetLowering, if the target does not support fminnum, we lower
to fminimum if neither operand could be a NaN. But this isn't quite correct
because fminnum and fminimum treat +/-0 differently; so, we need to prove that
one of the operands isn't a zero, or we don't have signed zeros.

Differential Revision: https://reviews.llvm.org/D143256
2023-02-07 10:54:23 +00:00
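The signed-zero difference that 7bff37783f guards against can be shown with two hand-written helpers. These are simplified models, not the LLVM or libm definitions: `minnum_sketch` uses a plain compare, which cannot order -0.0 and +0.0, while `minimum_sketch` treats -0.0 as smaller than +0.0. Both names, and the checker, are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// minnum-style model: NaN is ignored when the other operand is a number.
// A plain compare sees -0.0 and +0.0 as equal, so the zero that comes back
// depends on operand order.
inline double minnum_sketch(double a, double b) {
  if (std::isnan(a)) return b;
  if (std::isnan(b)) return a;
  return a < b ? a : b;
}

// minimum-style model: NaN propagates, and -0.0 orders below +0.0.
inline double minimum_sketch(double a, double b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<double>::quiet_NaN();
  if (a == 0.0 && b == 0.0) return std::signbit(a) ? a : b;
  return a < b ? a : b;
}

inline void check_min_zero_handling() {
  // Non-zero, non-NaN inputs: the two models agree.
  assert(minnum_sketch(1.0, 2.0) == minimum_sketch(1.0, 2.0));
  // Mixed zeros: the models return zeros of different sign.
  assert(std::signbit(minimum_sketch(-0.0, 0.0)));
  assert(!std::signbit(minnum_sketch(-0.0, 0.0)));
}
```

This is why substituting one operation for the other is safe only when an operand is known non-zero or signed zeros do not matter.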
David Green
fd67e9545d [DAG] Remove non-canonical AVG case.
This removes a condition in the detection of AVG nodes, where we needn't be
checking the LHS of an add node as any const will be canonicalized to the RHS.
2023-02-06 17:24:25 +00:00
David Green
b76f40c12f [DAG][AArch64][ARM] Recognize avg (hadd) from wrapping flags
This slightly extends the creation of hadd nodes to allow them to be generated
with the original type size if wrapping flags allow.
https://alive2.llvm.org/ce/z/bPjakD
https://alive2.llvm.org/ce/z/fa_gzb

Differential Revision: https://reviews.llvm.org/D143371
2023-02-06 17:24:01 +00:00
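A small model of what b76f40c12f describes, under the assumption that the "wrapping flag" of interest is no-unsigned-wrap: when the 8-bit add is known not to wrap, halving it in the original width gives the same result as the widened halving add. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Reference halving add: widen to 16 bits so the add cannot wrap.
inline uint8_t hadd_widened(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((static_cast<uint16_t>(a) + b) >> 1);
}

// Same computation kept in 8 bits; the add wraps modulo 256.
inline uint8_t hadd_narrow(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>(static_cast<uint8_t>(a + b) >> 1);
}

// True if the narrow form matches the widened form for every pair whose
// sum fits in 8 bits (the analogue of an add carrying a nuw flag).
inline bool narrow_matches_when_no_wrap() {
  for (int a = 0; a < 256; ++a)
    for (int b = 0; a + b < 256; ++b)
      if (hadd_narrow(static_cast<uint8_t>(a), static_cast<uint8_t>(b)) !=
          hadd_widened(static_cast<uint8_t>(a), static_cast<uint8_t>(b)))
        return false;
  return true;
}

inline void check_hadd() {
  assert(narrow_matches_when_no_wrap());
  // Without the no-wrap guarantee the narrow form is wrong: 200 + 100 wraps.
  assert(hadd_widened(200, 100) == 150);
  assert(hadd_narrow(200, 100) == 22);
}
```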
Simon Pilgrim
f7b10467b6 [TLI] SimplifyMultipleUseDemandedBits - remove insert_subvector(undef, x, 0) fold
SimplifyMultipleUseDemandedBits shouldn't be creating general nodes on the fly, it should mainly just peek through them (although we do currently allow creation of new bitcasts and constant folding).

This is mostly a win - by avoiding new nodes we avoid a lot of hasOneUse limitations inside x86 shuffle combining - the main regressions I've noticed are where we've ended up with multiple insert_subvector(undef, x, 0) nodes, widening x to different vector widths - that should hopefully be improved when we remove the last of the vector widening from combineX86ShufflesRecursively for Issue #45319
2023-02-06 09:55:11 +00:00
Matt Arsenault
db0e659161 DAG: Fix broken lowering of is.fpclass fcZero with DAZ
is.fpclass x, fcZero is not equivalent to fcmp with 0 if
denormals are treated as 0. It would be equivalent to fcZero|fcSubnormal
which can be done separately; this is the minimal correctness fix.

The same optimization was not ported to the GlobalISel version.
2023-02-05 09:14:16 -04:00
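The mismatch fixed in db0e659161 can be modeled without touching the hardware floating-point mode. The sketch below simulates denormals-are-zero (DAZ) input handling in software and compares the result of a compare against zero with a class test on the original value. All function names are hypothetical, and the software flush is only an approximation of what a DAZ mode does.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Software model of DAZ: a subnormal input is read as a zero of the same sign.
inline float daz_input(float x) {
  return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

// Model of "fcmp oeq x, 0" evaluated under DAZ.
inline bool fcmp_eq_zero_daz(float x) { return daz_input(x) == 0.0f; }

// Model of is.fpclass(x, fcZero): looks at the value's class, not a compare.
inline bool is_class_zero(float x) { return std::fpclassify(x) == FP_ZERO; }

// Model of is.fpclass(x, fcZero | fcSubnormal).
inline bool is_class_zero_or_subnormal(float x) {
  int c = std::fpclassify(x);
  return c == FP_ZERO || c == FP_SUBNORMAL;
}

inline void check_daz_model() {
  const float denorm = std::numeric_limits<float>::denorm_min();
  assert(fcmp_eq_zero_daz(denorm));            // compare says "zero"
  assert(!is_class_zero(denorm));              // class test says "not zero"
  assert(is_class_zero_or_subnormal(denorm));  // the equivalent class mask
}
```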
Kazu Hirata
526966d07d Use llvm::bit_ceil (NFC)
Note that:

  std::has_single_bit(X) ? X : llvm::NextPowerOf2(X);

is equivalent to:

  std::bit_ceil(X)

even for input 0.
2023-01-28 16:13:09 -08:00
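The equivalence claimed in 526966d07d, including the input 0 case, can be checked with stand-in helpers that avoid depending on C++20 `<bit>` or LLVM headers. The three `_sketch` functions are hypothetical reimplementations: `next_power_of_2_sketch` returns the power of two strictly greater than its argument, which is the behavior the commit message's expression relies on.

```cpp
#include <cassert>
#include <cstdint>

// True if exactly one bit is set.
inline bool has_single_bit_sketch(uint32_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// Power of two strictly greater than x (valid for x < 2^31).
inline uint32_t next_power_of_2_sketch(uint32_t x) {
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  return x + 1;
}

// Smallest power of two >= x, with 1 for x == 0 (valid for x <= 2^31).
inline uint32_t bit_ceil_sketch(uint32_t x) {
  uint32_t p = 1;
  while (p < x) p <<= 1;
  return p;
}

// The expression from the commit message, written with the helpers above.
inline uint32_t old_form(uint32_t x) {
  return has_single_bit_sketch(x) ? x : next_power_of_2_sketch(x);
}

inline void check_bit_ceil() {
  for (uint32_t x = 0; x <= 100000u; ++x)
    assert(old_form(x) == bit_ceil_sketch(x));
}
```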
Kazu Hirata
22cdc6a126 [llvm] Use llvm::bit_ceil instead of PowerOf2Ceil (NFC)
The arguments to PowerOf2Ceil in this patch are all known to be
nonzero, so we can safely use llvm::bit_ceil here.
2023-01-25 00:05:33 -08:00
Roman Lebedev
edf004e691 [NFC][TargetLowering] isSplatValueForTargetNode(): add DAG operand
Without it we can't recurse further.
2023-01-16 00:02:20 +03:00
Guillaume Chatelet
48f5d77eee [NFC] Use TypeSize::getKnownMinValue() instead of TypeSize::getKnownMinSize()
This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.
2023-01-11 16:36:39 +00:00
Sanjay Patel
bf82070ea4 [SDAG] try to avoid multiply for X*Y==0
Forking this off from D140850 -
https://alive2.llvm.org/ce/z/TgBeK_
https://alive2.llvm.org/ce/z/STVD7d

We could almost justify doing this in IR, but consideration for
"minsize" requires that we only try it in codegen -- the
transform is not reversible.

In all other cases, avoiding multiply should be a win because a
mul is more expensive than simple/parallelizable compares. AArch64
even has a trick to keep instruction count even for some types.

Differential Revision: https://reviews.llvm.org/D141086
2023-01-06 09:06:11 -05:00
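A sketch of the idea in bf82070ea4, under the assumption (not spelled out in the commit message above) that the rewrite applies only when the multiply is known not to overflow. With that guarantee a product is zero exactly when one factor is zero, so two compares can replace the multiply; with wrapping arithmetic the equivalence fails. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// 16-bit inputs multiplied in 32 bits: the product can never wrap.
inline bool product_is_zero_nowrap(uint16_t x, uint16_t y) {
  return static_cast<uint32_t>(x) * static_cast<uint32_t>(y) == 0;
}

// The multiply-free replacement.
inline bool either_is_zero(uint32_t x, uint32_t y) { return x == 0 || y == 0; }

// 32-bit multiply that may wrap modulo 2^32.
inline bool product_is_zero_wrapping(uint32_t x, uint32_t y) {
  return x * y == 0;
}

inline void check_mul_zero() {
  const uint16_t samples[] = {0, 1, 2, 255, 256, 0x8000, 0xFFFF};
  for (uint16_t x : samples)
    for (uint16_t y : samples)
      assert(product_is_zero_nowrap(x, y) == either_is_zero(x, y));
  // Counterexample once wrapping is possible: 2^16 * 2^16 wraps to 0.
  assert(product_is_zero_wrapping(0x10000u, 0x10000u));
  assert(!either_is_zero(0x10000u, 0x10000u));
}
```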
Craig Topper
11e92bd61f [SelectionDAG] Improve codegen for udiv by constant if any divisors are 1.
If the divisor is 1, the magic algorithm does not return a correct
result and we end up using a select to pick the numerator for those
elements at the end.

Therefore we can use undef for that element of the earlier operations
when the divisor is 1. We sometimes get this through SimplifyDemandedVectorElts,
but not always. Definitely seems like we don't if the NPQ fixup is used.

Unfortunately, DAGCombiner is unable to fold srl X, <0, undef> to X so
I had to add flags to avoid emitting the srl unless one of the shift
amounts is non-zero.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D141022
2023-01-05 08:41:44 -08:00
Craig Topper
f8751b8ee6 [TargetLowering] Remove stale FIXME. NFC
This was implemented for scalars in D140750.
2023-01-04 18:40:42 -08:00
Craig Topper
3f749a5d9d [Support][SelectionDAG][GlobalISel] Hoist PostShift adjustment for IsAdd into UnsignedDivideUsingMagic.
Instead of doing the adjustment in 3 different places in the code
base, do it inside UnsignedDivideUsingMagic::get.

Differential Revision: https://reviews.llvm.org/D141014
2023-01-04 15:18:12 -08:00
Craig Topper
8bca60fb0a [SelectionDAG][GlobalISel] Don't use UnsignedDivisionByConstantInfo for divisor of 1.
The magic algorithm sets an IsAdd indication for division by 1 that
the caller had to ignore.

I considered folding the ignore into UnsignedDivisionByConstantInfo,
but we only allow 1 for vectors of mixed divisors. And really what we
want to end up with is undef. Currently, we get to undef via
DemandedElts optimizations using the select instruction. We could
directly emit undef.

Differential Revision: https://reviews.llvm.org/D140940
2023-01-04 10:01:15 -08:00