Currently, the behavior of llvm.minnum differs across platforms if one
operand is sNaN:
When we compare sNaN vs NUM:
- ARM/AArch64/PowerPC: follow IEEE754-2008's minNum: return qNaN.
- RISC-V/Hexagon: follow IEEE754-2019's minimumNumber: return NUM.
- X86: returns NUM, but does not fully match IEEE754-2019's minimumNumber,
  as it does not always treat +0.0 as greater than -0.0.
- MIPS/LoongArch/Generic: return NUM.
- LIBCALL: returns qNaN.
So, let's introduce llvm.minimumnum/llvm.maximumnum, which always follow
IEEE754-2019's minimumNumber/maximumNumber.
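For example, under the new semantics:
minimumnum(sNaN, 1.0) ==> 1.0
minimumnum(qNaN, 1.0) ==> 1.0
minimumnum(qNaN, qNaN) ==> qNaN
minimumnum(+0.0, -0.0) ==> -0.0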
Half-fix: #93033
This patch adds basic support for scalable vector types in load & store
instructions for AArch64 with GISel.
Only scalable vector types with a 128-bit base size are supported, e.g.
`<vscale x 4 x i32>`, `<vscale x 16 x i8>`.
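For example (an IR sketch; %p and %q are hypothetical pointers), accesses
like these are now selectable:
%v = load <vscale x 4 x i32>, ptr %p
store <vscale x 4 x i32> %v, ptr %q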
This patch adapted some ideas from a similar abandoned patch:
https://github.com/llvm/llvm-project/pull/72976
This combine matches the existing fold in InstCombine, i.e.
InstCombinerImpl::pushFreezeToPreventPoisonFromPropagating.
It tries to push a freeze through the instruction it freezes if that
instruction has only one maybe-poison operand and all other operands are
guaranteed non-poison, and if the operation itself cannot generate poison
(e.g. add with nsw can generate poison even with non-poison operands, so
it does not qualify).
This is beneficial because it can potentially enable other optimizations
to occur that would otherwise be blocked because of the freeze.
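A sketch of the transform in generic MIR (assuming %c comes from a
G_CONSTANT, so it is guaranteed non-poison, and a plain G_ADD without
flags cannot generate poison):
%t = G_ADD %x, %c
%f = G_FREEZE %t
===>
%fx = G_FREEZE %x
%f = G_ADD %fx, %c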
Two icmp/and combines forced computation of KnownBits on all operands
every time. We can avoid computing KnownBits on the LHS by exploiting a
couple of properties:
- Constants are always on the RHS for those instructions. If we have no
KnownBits on the RHS, we can bail out early and avoid computing LHS
KnownBits.
- For icmp uge/ult 0, we don't need to know the KnownBits of the LHS to
  infer the result.
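For example:
icmp uge X, 0 ==> always true, whatever the KnownBits of X
icmp ult X, 0 ==> always false, whatever the KnownBits of X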
This lets us save some KnownBits calls, which are very expensive,
without affecting codegen.
Fixes #82659
There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411.
Following @RKSimon's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all call sites of these functions have been updated accordingly to ensure no functional change.
After this, callers of these functions must explicitly decide whether to pass the `TargetRegisterInfo` or just a `nullptr`.
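For example (a sketch of a hypothetical call site after this change):
MI.readsRegister(Reg, TRI); // TRI is now a required parameter
MI.readsRegister(Reg, /*TRI=*/nullptr); // opting out must now be explicit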
Combines a G_SHUFFLE_VECTOR whose sources come from G_CONCAT_VECTORS into
a single G_CONCAT_VECTORS instruction.
a = G_CONCAT_VECTORS x, y, undef, undef
b = G_CONCAT_VECTORS z, undef, undef, undef
c = G_SHUFFLE_VECTOR a, b, <0, 1, 4, undef>
===>
c = G_CONCAT_VECTORS x, y, z, undef
The current APInt::multiplicativeInverse takes a modulus which can be
any value, but all in-tree callers use a power of two. Moreover, most
callers want to use two to the power of the width of an existing APInt,
which is awkward because 2^N is not representable as an N-bit APInt.
Add a new overload of multiplicativeInverse which implicitly uses
2^BitWidth as the modulus.
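For example, with BitWidth = 8 (a usage sketch assuming the new overload):
APInt(8, 77).multiplicativeInverse() ==> APInt(8, 133), since 77 * 133 = 10241 ≡ 1 (mod 256)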
- When both operands are constant, the matcher runs into an infinite
  loop: the commutation should be applied only when the LHS is a constant
  and the RHS is not.
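A sketch of the failure mode, where both operands are constants:
G_ADD %c1, %c2 -> G_ADD %c2, %c1 -> G_ADD %c1, %c2 -> ... (commuted forever)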
Reviewers: arsenm
Reviewed By: arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/87426
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.
This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support, but none of that is included in this
patch. It should mostly be NFC.
Global ISel is still expected to use the underlying LLT as it needs, and
is not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
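For example, a sketch of the updated call-site pattern (handleKnownSize is
a hypothetical helper):
LocationSize Size = MMO->getSize(); // LocationSize rather than uint64_t
if (Size.hasValue())
  handleKnownSize(Size.getValue()); // only valid when the size is known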
Perform the requested arithmetic and produce a carry output in addition
to the normal result.
Clang has them as builtins (__builtin_add_overflow_p). The middle end
has intrinsics for them (sadd_with_overflow).
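For example, in LLVM IR:
%res = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
; the i1 element of %res is the overflow/carry bit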
AArch64: ADDS (add and set flags).
On Neoverse V2, they run at half the throughput of basic arithmetic and
have a limited set of pipelines.
This combine transforms an unmerge where only the first element is used
into a truncate. That works fine for scalars, but for vectors it needs to
insert a bitcast to integer, perform the truncate, then bitcast back to
vector. This generates more awkward code than using an Unmerge.
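For example, in the scalar case:
%lo(s32), %hi(s32) = G_UNMERGE_VALUES %x(s64) ; only %lo is used
===>
%lo(s32) = G_TRUNC %x(s64)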
It is purely based on symmetry. Registers can be scalars, vectors, and
non-constants.
X < 5.0 || X > 5.0
->
X != 5.0
X < Y && X > Y
->
FCMP_FALSE
X < Y || X >= Y
->
FCMP_TRUE
see InstCombinerImpl::foldLogicOfFCmps
…tructions into account
Hint instructions like G_ASSERT_ZEXT can be viewed as copies. Teaching the
combiner this fact allows it to match more patterns involving such
instructions.
Since we already know which register we want to extend, we don't have to
ask its defining MI about it.
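For example (generic MIR), the hint only adds a guarantee about the high
bits:
%y:_(s32) = G_ASSERT_ZEXT %x:_(s32), 8 ; value-wise a copy of %x, with bits 8 and up known zero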
---------
Co-authored-by: Emil Tywoniak <Emil.Tywoniak@hightec-rt.com>
InstCombine canonicalizes selects to floating point and integer minmax.
This combiner and the DAG combiner canonicalize to floating point minmax;
none of them canonicalizes to integer minmax. On Neoverse V2, basic
integer arithmetic and integer minmax have the same costs.
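A sketch of the canonicalization in generic MIR:
%c = G_ICMP intpred(slt), %a, %b
%s = G_SELECT %c, %a, %b
===>
%s = G_SMIN %a, %b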
The pre-index matcher just needs some small heuristics to make sure it
doesn't cause regressions. Apart from that, it's a simple change, since
the only difference is an immediate operand of '1' vs '0' in the
instruction.
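For example (a generic MIR sketch), the only difference is the final
addressing-mode immediate:
%dst, %writeback = G_INDEXED_LOAD %ptr, %offset, 1 ; '1' = pre-indexed
%dst, %writeback = G_INDEXED_LOAD %ptr, %offset, 0 ; '0' = post-indexed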