This is recommit of 2e416cdd52, fixed to be accepatble by GCC.
The original commit message is below.
With this change bitwise operations are allowed for FPClassTest
enumeration, it must simplify using this type. Also some functions
changed to get argument of type FPClassTest instead of unsigned.
Differential Revision: https://reviews.llvm.org/D144241
This reverts commit e7613c1d9b259bdf2b0b06b4169d9a10dd553406.
GCC issues an error:
In file included from /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/ADT/BitmaskEnumTest.cpp:9:
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/include/llvm/ADT/BitmaskEnum.h:66:22: error: explicit specialization of template<class E, class Enable> struct llvm::is_bitmask_enum outside its namespace must use a nested-name-specifier [-fpermissive]
66 | template <> struct is_bitmask_enum<Enum> : std::true_type {}; \
| ^~~~~~~~~~~~~~~~~~~~~
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/ADT/BitmaskEnumTest.cpp:30:1: note: in expansion of macro LLVM_DECLARE_ENUM_AS_BITMASK
30 | LLVM_DECLARE_ENUM_AS_BITMASK(Flags2, V4);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is recommit of 2e416cdd52, reverted in 8555ab2fcd, because GCC
complains on extra qualification. The macro LLVM_DECLARE_ENUM_AS_BITMASK
does not specify llvm:: anymore, so the macro must occur in the namespace
llvm. Documentation updated accordingly. The original commit message is below.
With this change bitwise operations are allowed for FPClassTest
enumeration, it must simplify using this type. Also some functions
changed to get argument of type FPClassTest instead of unsigned.
Differential Revision: https://reviews.llvm.org/D144241
With this change bitwise operations are allowed for FPClassTest
enumeration, it must simplify using this type. Also some functions
changed to get argument of type FPClassTest instead of unsigned.
Differential Revision: https://reviews.llvm.org/D144241
Currently, in TargetLowering, if the target does not support fminnum, we lower
to fminimum if neither operand could be a NaN. But this isn't quite correct
because fminnum and fminimum treat +/-0 differently; so, we need to prove that
one of the operands isn't a zero, or we don't have signed zeros.
Differential Revision: https://reviews.llvm.org/D143256
This removes a condition in the detection of AVG nodes, where we needn't be
checking the LHS of an add node as any const will be canonicalized to the RHS.
SimplifyMultipleUseDemandedBits shouldn't be creating general nodes on the fly, it should mainly just peek through them (although we do currently allow creation of new bitcasts and constant folding).
This is mostly a win - by avoiding new nodes we avoid a lot of hasOneUse limitations inside x86 shuffle combining - the main regressions I've noticed are where we've ended up with multiple insert_subvector(undef, x, 0) nodes, widening x to different vector widths - that should hopefully be improved when we remove the last of the vector widening from combineX86ShufflesRecursively for Issue #45319
is.fpclass x, fcZero is not equivalent to fcmp with 0 if
denormals are treated as 0. It would be equivalent to fcZero|fcSubnormal
which can be done separately; this is the minimal correctness fix.
The same optimization was not ported to the GlobalISel version.
Forking this off from D140850 -
https://alive2.llvm.org/ce/z/TgBeK_https://alive2.llvm.org/ce/z/STVD7d
We could almost justify doing this in IR, but consideration for
"minsize" requires that we only try it in codegen -- the
transform is not reversible.
In all other cases, avoiding multiply should be a win because a
mul is more expensive than simple/parallelizable compares. AArch
even has a trick to keep instruction count even for some types.
Differential Revision: https://reviews.llvm.org/D141086
If the divisor is 1, the magic algorithm does not return a correct
result and we end up using a select to pick the numerator for those
elements at the end.
Therefore we can use undef for that element of the earlier operations
when the divisor is 1. We sometimes get this through SimplifyDemandedVectorElts,
but not always. Definitely seems like we don't if the NPQ fixup is used.
Unfortunately, DAGCombiner is unable to fold srl X, <0, undef> to X so
I had to add flags to avoid emitting the srl unless one of the shift
amounts is non-zero.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D141022
Instead of doing the adjustment in 3 different places in the code
base, do it inside UnsignedDivideUsingMagic::get.
Differential Revision: https://reviews.llvm.org/D141014
The magic algorithm sets IsAdd indication for division by 1 that
the caller had to ignore.
I considered folding the ignore into UnsignedDivisionByConstantInfo,
but we only allow 1 for vectors of mixed visiors. And really what we
want to end up with is undef. Currently, we get to undef via
DemandedElts optimizations using the select instruction. We could
directly emit undef.
Differential Revision: https://reviews.llvm.org/D140940
The patch also adds expandVPCTLZ and expandVPCTTZ to expand vp.ctlz/cttz nodes
and the cost model of vp.ctlz/cttz.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D140370
If the dividend has leading zeros, we can use them to reduce the
size of the multiplier and avoid the fixup cases.
This patch is for scalars only, but we might be able to do this
for vectors in a follow up.
Differential Revision: https://reviews.llvm.org/D140750
We have a version of this transform in InstCombine, but surprisingly not in SDAG. Even more surprisingly, this benefits RISCV, but no other target. This was surprising enough I double checked my build configuration to make sure all targets were enabled; they appear to be.
Differential Revision: https://reviews.llvm.org/D140324
This reverts commit 3010f60381bcd828d1b409cfaa576328bcd05bbc.
This change introduced undefined behaviour (reported at
https://reviews.llvm.org/D138508#inline-1352840). Additionally, it
appears to be responsible for a mis-compilation on RISCV64 with the
vector extension (https://github.com/llvm/llvm-project/issues/59594).
The commit message indicates that this is meant to be ARM64 specific
though is a generic selection change.
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
This fixes LLVMMIRParser, LLVMGlobalISel, LLVMAsmPrinter, LLVMSelectionDAG.
Reland with a fixup to avoid converting APInts to int64_t which allowed for
overflows (UB) with sufficiently high/low multiplier values.
This allows DemandedBits to see the result of VSCALE will be at most
VScaleMax * some compile-time constant. This relies on the vscale_range()
attribute being present on the function, with a max set. (This is done by
default when clang is targeting AArch64+SVE).
Using this various redundant operations (zexts, sexts, ands, ors, etc)
can be eliminated.
Differential Revision: https://reviews.llvm.org/D138508
This allows DemandedBits to see the result of VSCALE will be at most
VScaleMax * some compile-time constant. This relies on the vscale_range()
attribute being present on the function, with a max set. (This is done by
default when clang is targeting AArch64+SVE).
Using this various redundant operations (zexts, sexts, ands, ors, etc)
can be eliminated.
Differential Revision: https://reviews.llvm.org/D138508
The patch also adds expandVPCTPOP in TargetLowering to expand VP_CTPOP nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139920
The patch also added function expandVPBITREVERSE to expand ISD::VP_BITREVERSE nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139697
This reverts commit 7883e5b061bdbbe8bee5f479ebe911db5045b7e9.
The original commit was reverted that it didn't update test files after D136263
landed. The recommit fixed those.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139509
The patch made VectorLegalizer expand ISD::VP_FSHL and ISD::VP_FSHR to
achieve the codegen.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D138379
(X & Pow2MaskC) == 0 --> (trunc X) >= 0
(X & Pow2MaskC) != 0 --> (trunc X) < 0
This was noted as a regression in the post-commit feedback for D112634
(where we canonicalized IR differently).
For x86, this saves a few instruction bytes. AArch64 seems neutral.
Differential Revision: https://reviews.llvm.org/D139363
This is a continuation of the series of patches adding lane wise support for scalable vectors in various knownbit-esq routines.
The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations.
Differential Revision: https://reviews.llvm.org/D137190
This allows DemandedBits to see that the SVE count intrinsics (CNTB,
CNTH, CNTW, CNTD) sans multiplier will only ever produce small
positive integers. The maximum value you could get here is 256, which
is CNTB on a machine with a 2048bit vector size (the maximum for SVE).
Using this various redundant operations (zexts, sexts, ands, ors, etc)
can be eliminated.
Differential Revision: https://reviews.llvm.org/D138424
A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.
A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.
Subsequent patch will start using an actual value of speed in
the load/store vectorizer to compare if a vectorized access going
to be not just fast, but not slower than before.
Differential Revision: https://reviews.llvm.org/D124217
The patch also added function expandVPBSWAP to expand ISD::VP_BSWAP nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137928
We can reuse constants if we use SRL followed by AND and AND followed by SHL.
Similar was done to bitreverse previously.
Differential Revision: https://reviews.llvm.org/D138045
We have similar code to translate a demanded elements mask for a shuffle's operands in multiple places - this patch adds a helper function to VectorUtils and updates a number of locations to use it directly.
Differential Revision: https://reviews.llvm.org/D136832