Combine the two checks into a check if the exponent bits are 0. The
inverted case isn't reachable until a future change, and GlobalISel
currently doesn't attempt the inversion optimization.
https://reviews.llvm.org/D143182
The aarch64 backend will benefit from expanding 64vector sdiv/udiv into mulh
using shift(mul(ext, ext)), as the larger type size is legal and the mul(ext,
ext) can efficiently use smull/umull instructions. This extends the existing
code in GetMULHS to handle vector types for it.
Differential Revision: https://reviews.llvm.org/D154049
When the sign of either of the operands is known, it is possible to
determine what the saturating value will be without having to compute it
using the sign bits.
Differential Revision: https://reviews.llvm.org/D153575
`ctpop(X) eq/ne 1` is checking if X is a non-zero power of 2. Power of
2 check including zero is `(X & (X-1)) eq/ne 0` and unfortunately
there is no good pattern for checking a power of 2 while excluding
zero. So, when lowering `ctpop(X) eq/ne 1`, explicitly check
`IsKnownNeverZero(X)` to maybe be able to optimize out the extra zero
check.
We need this explicitly as DAGCombiner does not re-analyze provable
setcc nodes, and the middle-end never finds it beneficially to broaden
`ctpop(X) eq/ne 1` -> `ctpop(X) ule/ugt 1` (power of 2 including
zero).
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D152675
Note: Some patterns in visitFMA are needed refined to support splat of constant.
Reviewed By: luke
Differential Revision: https://reviews.llvm.org/D152260
Use computeOverflowForUnsignedAdd and computeOverflowForSignedAdd
instead. Unfortunately, this recomputes some known bits and sign bits
we may have already computed, but was the easiest fix without a lot
of restructuring.
This recovers the regressions from D151472.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D151858
This exposed a miscompile due to incorrect flag preservation in
integer type legalization, which has been fixed in D151472.
-----
This patch is a continuation of D150110. It separates the cases for
ADD and SUB into their own cases so that computeForAddSub can be
directly called and the NSW flag passed. This allows better
optimization when the NSW flag is enabled, and allows fixing up the
TODO that was there previously in SimplifyDemandedBits.
Differential Revision: https://reviews.llvm.org/D150769
This patch is a continuation of D150110. It separates the cases for
ADD and SUB into their own cases so that computeForAddSub can be
directly called and the NSW flag passed. This allows better
optimization when the NSW flag is enabled, and allows fixing up the
TODO that was there previously in SimplifyDemandedBits.
Differential Revision: https://reviews.llvm.org/D150769
Define intersectWith and unionWith as two complementary ways of
combining KnownBits. The names are chosen for consistency with
ConstantRange.
Deprecate commonBits as a synonym for intersectWith.
Differential Revision: https://reviews.llvm.org/D150443
Correct the legality of i32 mul_lohi on AArch64.
Previously, AArch64 incorrectly reported i32 mul_lohi as Legal.
This allowed BuildUDIV/SDIV to use them. A later DAGCombiner would
replace them with MULHS/MULHU because only the high half was used.
This conversion does not check the legality of MULHS/MULHU under
the assumption that LegalizeDAG can turn it back into MUL_LOHI later.
After they are converted to MULHS/MULHU, DAGCombine ran and saw that
these operations aren't supported but an i64 MUL is. So they get
converted to that plus a shift. Without this, LegalizeDAG would
convert back MUL_LOHI and isel would fail to find a pattern.
This patch teaches BuildUDIV/SDIV to create the wide mul and shift
so that we can report the correct operation legality on AArch64. It
also enables div by constant folding for more cases on VE.
I don't know if VE wants this div by constant optimization or not. If they
don't want it, they can use the isIntDivCheap hook to disable it.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D150333
In the SimplifyDemandedBits function, there is a fallthrough to the
default case in the case of ISD::ADD, ISD::MUL and ISD::SUB. This
leads to a call to computeKnownBits which is unnecessary as the
calls to SimplifyDemandedBits in the cases themselves handle the
calculation of the known bits. This information is discarded through
the Known2 variables.
By keeping this information around and calling
KnownBits::mul or KnownBits::computeForAddSub directly, the
unnecessary computation can be avoided. For now, the NSW bit is not
passed through to KnownBits as this is something that
computeKnownBits does not handle either. This requires updating
computeForAddCarry to handle the flag as well.
Differential Revision: https://reviews.llvm.org/D150110
Move the fold from X86 to generic expansion
(We also have several existing expansions that are missing freezes on repeated operands - I've added a TODO for now).
ISD::CondCode is a separate num space from opcodes. isOperationLegalOrCustom
should take an opcode.
Reviewed By: barannikov88
Differential Revision: https://reviews.llvm.org/D149528
This function gets called for vectors and ISD::SELECT_CC was never
intended to support vectors. Some updates were made to support
it when this function started getting used for vectors.
Overall, using separate ISD::SETCC and ISD::SELECT looks like an
improvement even for scalar.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D149481
This is stricter than the default "ieee", and should probably be the
default. This patch leaves the default alone. I can change this in a
future patch.
There are non-reversible transforms I would like to perform which are
legal under IEEE denormal handling, but illegal with flushing zero
behavior. Namely, conversions between llvm.is.fpclass and fcmp with
zeroes.
Under "ieee" handling, it is legal to translate between
llvm.is.fpclass(x, fcZero) and fcmp x, 0.
Under "preserve-sign" handling, it is legal to translate between
llvm.is.fpclass(x, fcSubnormal|fcZero) and fcmp x, 0.
I would like to compile and distribute some math library functions in
a mode where it's callable from code with and without denormals
enabled, which requires not changing the compares with denormals or
zeroes.
If an IEEE function transforms an llvm.is.fpclass call into an fcmp 0,
it is no longer possible to call the function from code with denormals
enabled, or write an optimization to move the function into a denormal
flushing mode. For the original function, if x was a denormal, the
class would evaluate to false. If the function compiled with denormal
handling was converted to or called from a preserve-sign function, the
fcmp now evaluates to true.
This could also be of use for strictfp handling, where code may be
changing the denormal mode.
Alternative name could be "unknown".
Replaces the old AMDGPU custom inlining logic with more conservative
logic which tries to permit inlining for callees with dynamic handling
and avoids inlining other mismatched modes.
Similar to the existing SelectionDAG::SplitVector helper, this helper creates the EXTRACT_ELEMENT nodes for the LO/HI halves of the scalar source.
Differential Revision: https://reviews.llvm.org/D147264
Add some special cases for UADDO to recover codegen after D146786.
Reviewed By: reames, liaolucy
Differential Revision: https://reviews.llvm.org/D146789
ConstantSDNode provides some convenience functions like isZero,
getZExtValue, and isMinSignedValue that are named identically to those
provided by APInt, so we can "skip" getAPIntValue.
This code detected that the type returned from getShiftAmountTy was
too small to hold the constant shift amount. But it used the full
type size instead of scalar type size leading it to crash for
scalable vectors.
This code was necessary when getShiftAmountTy would always
return the target preferred shift amount type for scalars even when
the type was an illegal type larger than the target supported. For
vectors, getShiftAmountTy has always returned the vector type.
Fortunately, getShiftAmountTy was fixed a while ago to detect that
the target's preferred size for scalars is not large enough for the
type. So we can delete this code.
Switched to use getShiftAmountConstant to further simplify the code.
Fixs PR61561.
As noticed on D144789, when we have pairs of min/max nodes we often end up with multiple comparisons which we could reuse with commuted select ops, so check to see if a suitable SETCC already exists. This also allowed us to remove a similar X86 peephole.
There are other getSETCC cases where we could safely reuse other CondCodes as well - I've been trying to think of how we could reuse this logic in SelectionDAG but haven't found anything that always works well.
An alternative would be to have a TLI callback that returns a preferred CondCode from a list of options, I've noticed this helped fpclamptosat tests on some other targets (MVE + WebAssembly), but other tests suffered.
Differential Revision: https://reviews.llvm.org/D145065
If we have sext_inreg(vector_extract(x)) but the top bits are not used, DAG
will try to remove the sext_inreg, using vector_extract(x) directly. This can
lead to multiple uses of both sext_inreg(vector_extract(x)) and
vector_extract(x), leading to the generation of both umov and smov extracts.
This adds a target hook to prevent that under AArch64 where the sext_inreg can
be considered free if there are multiple uses of the sext and no uses of the
vector_extract. This helps fix a small regression from D144550.
Differential Revision: https://reviews.llvm.org/D144850
This is recommit of 2e416cdd52, fixed to be accepatble by GCC.
The original commit message is below.
With this change bitwise operations are allowed for FPClassTest
enumeration, it must simplify using this type. Also some functions
changed to get argument of type FPClassTest instead of unsigned.
Differential Revision: https://reviews.llvm.org/D144241
This reverts commit e7613c1d9b259bdf2b0b06b4169d9a10dd553406.
GCC issues an error:
In file included from /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/ADT/BitmaskEnumTest.cpp:9:
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/include/llvm/ADT/BitmaskEnum.h:66:22: error: explicit specialization of template<class E, class Enable> struct llvm::is_bitmask_enum outside its namespace must use a nested-name-specifier [-fpermissive]
66 | template <> struct is_bitmask_enum<Enum> : std::true_type {}; \
| ^~~~~~~~~~~~~~~~~~~~~
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/ADT/BitmaskEnumTest.cpp:30:1: note: in expansion of macro LLVM_DECLARE_ENUM_AS_BITMASK
30 | LLVM_DECLARE_ENUM_AS_BITMASK(Flags2, V4);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is recommit of 2e416cdd52, reverted in 8555ab2fcd, because GCC
complains on extra qualification. The macro LLVM_DECLARE_ENUM_AS_BITMASK
does not specify llvm:: anymore, so the macro must occur in the namespace
llvm. Documentation updated accordingly. The original commit message is below.
With this change bitwise operations are allowed for FPClassTest
enumeration, it must simplify using this type. Also some functions
changed to get argument of type FPClassTest instead of unsigned.
Differential Revision: https://reviews.llvm.org/D144241
With this change bitwise operations are allowed for FPClassTest
enumeration, it must simplify using this type. Also some functions
changed to get argument of type FPClassTest instead of unsigned.
Differential Revision: https://reviews.llvm.org/D144241
Currently, in TargetLowering, if the target does not support fminnum, we lower
to fminimum if neither operand could be a NaN. But this isn't quite correct
because fminnum and fminimum treat +/-0 differently; so, we need to prove that
one of the operands isn't a zero, or we don't have signed zeros.
Differential Revision: https://reviews.llvm.org/D143256
This removes a condition in the detection of AVG nodes, where we needn't be
checking the LHS of an add node as any const will be canonicalized to the RHS.