Generate code using the VECTOR ADD COMPUTE CARRY and
VECTOR SUBTRACT COMPUTE BORROW INDICATION instructions
to implement open-coded IR with those semantics.
Handles integer vector types as well as i128.
Fixes: https://github.com/llvm/llvm-project/issues/129608
Generate more efficient code for zero or sign extensions where
the source is a subvector generated via SHUFFLE_VECTOR.
Specifically, recognize patterns corresponding to (series of)
VECTOR UNPACK instructions, or the VECTOR SIGN EXTEND TO
DOUBLEWORD instruction.
As a special case, also handle zero or sign extensions of a
vector element to i128.
Fixes: https://github.com/llvm/llvm-project/issues/129576
Fixes: https://github.com/llvm/llvm-project/issues/129899
Detect (non-intrinsic) IR patterns corresponding to the semantics
of the various widening and high-word multiplication instructions.
Specifically, this is done by:
- Recognizing even/odd widening multiplication patterns in DAGCombine
- Recognizing widening multiply-and-add on top during ISel
- Implementing the standard MULHS/MUHLU IR opcodes
- Detecting high-word multiply-and-add (which common code does not)
Depending on architecture level, this can support all integer
vector types as well as the scalar i128 type.
Fixes: https://github.com/llvm/llvm-project/issues/129705
Generate efficient code using the condition code set by the
VECTOR (FP) COMPARE family of instructions to implement
vector comparison reductions, e.g. as resulting from
__builtin_reduce_and/or of some vector comparsion.
Fixes: https://github.com/llvm/llvm-project/issues/129434
It has found to be quite a slowdown to traverse the users of a
function from each call site when it is called many (~70k)
times. This patch fixes this for now as long as this verification
is disabled by default, but there is still a need to eventually
cache the results to avoid recomputation.
Fixes#130541
This is straightforward as we already had all the necessary
instructions, they simply were not wired up.
Also allows implementing the vec_round intrinsic via the
standard llvm.roundeven IR instead of a platform intrinsic now.
We can only optimize a uaddo_carry via specialized instruction
if the carry was produced by another uaddo(_carry) instruction;
there is already a check for that.
However, i128 uaddo(_carry) use a completely different mechanism;
they indicate carry in a vector register instead of the CC flag.
Thus, we must also check that we don't mix those two - that check
has been missing.
Fixes: https://github.com/llvm/llvm-project/issues/124001
This patch adds support for the next-generation arch15
CPU architecture to the SystemZ backend.
This includes:
- Basic support for the new processor and its features.
- Detection of arch15 as host processor.
- Assembler/disassembler support for new instructions.
- Exploitation of new instructions for code generation.
- New vector (signed|unsigned|bool) __int128 data types.
- New LLVM intrinsics for certain new instructions.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining __VEC__ == 10305.
Note: No currently available Z system supports the arch15
architecture. Once new systems become available, the
official system name will be added as supported -march name.
SDNode::use_iterator now returns an SDUse& when dereferenced.
SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses
work on use_iterator. SDNode::user_begin/user_end/users work on
user_iterator.
We can now write range based for loops using SDUse& and SDNode::uses().
I've converted many of these in this patch. I didn't update loops that
have additional variables updated in their for statement.
Some loops use SDNode::use_iterator::getOperandNo() which also prevents
using range based for loops. I plan to move this into SDUse in a follow
up patch.
Most of these are just places that want the first user and aren't
iterating over the whole list.
While there I changed some use_size() == 1 to hasOneUse() which
is more efficient.
This is part of an effort to rename use_iterator to user_iterator
and provide a use_iterator that dereferences to SDUse&. This patch
helps reduce the diff on later patches.
This function is most often used in range based loops or algorithms
where the iterator is implicitly dereferenced. The dereference returns
an SDNode * of the user rather than SDUse * so users() is a better name.
I've long beeen annoyed that we can't write a range based loop over
SDUse when we need getOperandNo. I plan to rename use_iterator to
user_iterator and add a use_iterator that returns SDUse& on dereference.
This will make it more like IR.
I want to use this function for GISel too so Type * is a better common
interface. All of the callers already convert EVT to Type * as needed
by calling lowering anyway.
This will avoid assertion failures once we disable implicit truncation
in getConstant().
Inside adjustSubwordCmp() I ended up suppressing the issue with an
explicit cast, because this code deals with a mix of unsigned and signed
immediates.
A test case emerged with an i32 truncating store of an i64 constant
operand, where the i64 constant did not fit in 32 bits, which caused
FindReplicatedImm() to crash.
Make sure to truncate the APInt in these cases.
The previous behavior could be harmful in some edge cases, such as
emitting a call to `fma()` in the `fma()` implementation itself.
Do this by just being more accurate in `isFMAFasterThanFMulAndFAdd()`.
This was already done for PowerPC; this commit just extends that to Arm,
z/Arch, and x86. MIPS and SPARC already got it right, but I added tests
for them too, for good measure.
Note: I don't have commit access.
This patch fixes:
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:9858:18: error:
using the result of an assignment as a condition without parentheses
[-Werror,-Wparentheses]
A (csmith) test case appeared where combineExtract() crashed when the
input vector was a bitcast into a vector of i1:s. Fix this by adding a check
with canTreatAsByteVector() before the call.
For the purpose of verifying proper arguments extensions per the target's ABI,
introduce the NoExt attribute that may be used by a target when neither sign-
or zeroextension is required (e.g. with a struct in register). The purpose of
doing so is to be able to verify that there is always one of these attributes
present and by this detecting cases where sign/zero extension is actually
missing.
As a first step, this patch has the verification step done for the SystemZ
backend only, but left off by default until all known issues have been
addressed.
Other targets/front-ends can now also add NoExt attribute where needed and do
this check in the backend.
A test case showed up where the new vector type is v24i16, which is not a simple
MVT. In order to get an extended value type for cases like this, EVT::getVectorVT()
needs to be called instead of MVT::getVectorVT(), otherwise the following call
to getVectorElementType() in combineExtract() will fail.
Well, not quite that simple. We can tc memset since it returns the first
argument but bzero doesn't do that and therefore we can end up
miscompiling.
This patch also refactors the logic out of isInTailCallPosition() into the callers.
As a result memcpy and memmove are also modified to do the same thing
for consistency.
rdar://131419786
The combineSIGN_EXTEND_INREG routine was using
DAG.getConstant(-1, DL, VT), which does not result in
the expected value when VT has more than 64 bits.
Fix this by using DAG.getAllOnesConstant(DL, VT) instead.
Also add test cases for v1i128 comparisons (which triggers
the bug).
Summary:
These Libcalls represent which functions are available to the backend.
If a runtime call is not available, the target sets the the name to
`nullptr`. Currently, this logic is spread around the various targets.
This patch pulls all of the locations that disable libcalls into the
intializer. This patch is effectively NFC.
The motivation behind this patch is that currently the LTO handling uses
the list of all runtime calls to determine which functions cannot be
internalized and must be extracted from static libraries. We do not want
this to happen for libcalls that are not emitted by the backend. A
follow-up patch will move out this logic so the LTO pass can know which
rtlib calls are actually used by the backend.
Change definition of expandBitCastI128ToF128 and expandBitCastF128ToI128
to allow for simplified use in atomic load/store.
Update logic to split 128-bit loads and stores in DAGCombine to also
handle the f128 case where appropriate. This fixes the regressions
introduced by recent atomic load/store patches.
This is the mirror to the recent atomic load change. The same
bitcast-back-to-integer case is a small code quality regression for the
same reason. This would disappear with a bitcastable legal 128-bit type.
shouldCastAtomicLoadInIR is a hack that should be removed. Simple
bitcasting of operations should be in the domain of ordinary type
legalization and does not need to be done in the IR.
This introduces a code quality regression due to the hack currently used
to avoid using 128-bit values in the case where the floating point value
is ultimately used as an integer. This would be avoidable if there were
always a legal 128-bit type (like v2i64). This is a pretty niche
situation so I assume it's not important.
I implemented about 85% of the work necessary to make v2i64 legal, but
it was taking too long and I lack the necessary familiarity with systemz
to complete it. I've pushed it here for someone to pick up:
https://github.com/arsenm/llvm-project/pull/new/systemz-legal-v2i64
Depends #90861
Fixes#82659
There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411.
Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.
After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.