Support passing and returning values of single-element vector
types (i.e. <1 x i128> and <1 x fp128>).
Now that i128 is a legal type, supporting these types can be
done simply by providing a getRegisterTypeForCallingConv
implementation that handles them.
Fixes https://github.com/llvm/llvm-project/issues/61291
On processors supporting vector registers and SIMD instructions, enable
i128 as legal type in VRs. This allows many operations to be implemented
via native instructions directly in VRs (including add, subtract,
logical operations and shifts). For a few other operations (e.g.
multiply and divide, as well as atomic operations), we need to move the
i128 value back to a GPR pair to use the corresponding instruction
there. Overall, this is still beneficial.
The patch includes the following LLVM changes:
- Enable i128 as legal type
- Set up legal operations (in SystemZInstrVector.td)
- Custom expansion for i128 add/subtract with carry
- Custom expansion for i128 comparisons and selects
- Support for moving i128 to/from GPR pairs when required
- Handle 128-bit integer constant values everywhere
- Use i128 as intrinsic operand type where appropriate
- Updated and new test cases
In addition, clang builtins are updated to reflect the intrinsic operand
type changes (which also improves compatibility with GCC).
Let the AtomicExpand pass do more of the job of expanding
AtomicRMWInst:s in order to simplify the handling in the backend.
The only cases that the backend needs to handle itself are those of
subword size (8/16 bits) and those directly corresponding to a target
instruction.
- Clang FE now has MaxAtomicPromoteWidth / MaxAtomicInlineWidth set to 128, and now produces IR
instead of calls to __atomic instrinsics for 16 bytes as well.
- Atomic __int128 (and long double) variables are now aligned to 16 bytes by default (like gcc 14).
- AtomicExpand pass now expands 16 byte operations as well.
- tests for __atomic builtins for all integer widths, and __atomic_is_lock_free with friends.
- TODO: AtomicExpand pass handles with this patch expansion of i128 atomicrmw:s. As a next step
smaller integer types should also be possible to handle this way instead of by the backend.
Clang currently implements a set of vector rotate builtins
(__builtin_s390_verll*) in terms of platform-specific LLVM
intrinsics. To simplify the IR (and allow for common code
optimizations if applicable), this patch removes those LLVM
intrinsics and implements the builtins in terms of the
platform-independent funnel shift intrinsics instead.
Also, fix the prototype of the __builtin_s390_verll*
builtins for full compatibility with GCC.
GCC supports building individual functions with backchain using the
__attribute__((target("backchain"))) syntax, and Clang should too.
Clang translates this into the "target-features"="+backchain" attribute,
and the -mbackchain command-line option into the "backchain" attribute.
The backend currently checks only the latter; furthermore, the backchain
target feature is not defined.
Handle backchain like soft-float. Define a target feature, convert
function attribute into it in getSubtargetImpl(), and check for target
feature instead of function attribute everywhere. Add an end-to-end test
to the Clang testsuite.
The upcoming OpenMP support for SystemZ requires handling of IR insns
like `atomicrmw fadd`. Normally atomic float operations are expanded by
Clang and such insns do not occur, but OpenMP generates them directly.
Other architectures handle this using the AtomicExpand pass, which
SystemZ did not need so far. Enable it.
Currently AtomicExpand treats atomic load and stores of floats
pessimistically: it casts them to integers, which SystemZ does not need,
since the floating point load and store instructions are already atomic.
However, the way Clang currently expands them is pessimistic as well, so
this change does not make things worse. Optimizing operations on atomic
floats can be a separate change in the future.
This change does not create any differences the Linux kernel build.
When the code is built with -mbackchain, it is possible to retrieve the
caller's frame and return addresses. GCC already can do this, add this
support to Clang as well. Use RISCVTargetLowering and GCC's
s390_return_addr_rtx() as inspiration. Add tests based on what GCC is
emitting.
This PR moves some calculation out of `LowerCall` and into
`SystemZXPLINKFrameLowering::processFunctionBeforeFrameFinalized`.
We need to make this change because LowerCall isn't invoked for
functions that don't have function calls, and it is required for some
tooling to work correctly. A function that does not make any calls is
required to allocate 32 bytes for the parameter area required by the
ABI. However, we allocate 64 bytes because this additional space is
utilized by certain tools, like the debugger.
Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>
On z/OS, many library functions have a non-standard name. This change
initializes the table of runtime function which results from lowering
intrinsics to library calls.
Given a list of constraints for InlineAsm (ex. "imr") I'm looking to
modify the order in which they are chosen. Before doing so, I noticed a
fair
amount of logic is duplicated between SelectionDAGISel and GlobalISel
for this.
That is because SelectionDAGISel is also trying to lower immediates
during selection. If we detangle these concerns into:
1. choose the preferred constraint
2. attempt to lower that constraint
Then we can slide down the list of constraints until we find one that
can be lowered. That allows the implementation to be shared between
instruction selection frameworks.
This makes it so that later I might only need to adjust the priority of
constraints in one place, and have both selectors behave the same.
Irritatingly, atomic_store had operands in the opposite order from
regular store. This made it difficult to share patterns between
regular and atomic stores.
There was a previous incomplete attempt to move atomic_store into the
regular StoreSDNode which would be better.
I think it was a mistake for all atomicrmw to swap the operand order,
so maybe it's better to take this one step further.
https://reviews.llvm.org/D123143
This patch adds support for the ADA (associated data area), doing the following:
-Creates the ADA table to handle displacements
-Emits the ADA section in the SystemZAsmPrinter
-Lowers the ADA_ENTRY node into the appropriate load instruction
Differential Revision: https://reviews.llvm.org/D153788
- Creates the ADA table to handle displacements
- Emits the ADA section in the SystemZAsmPrinter
- Lowers the ADA_ENTRY node into the appropriate load instruction
Differential Revision: https://reviews.llvm.org/D153788
The Length/4 of Params field in the PPA1 ought to be the length of the parameters for the current function. Currently we are storing the length of the parameter area in the current function's stack frame, which represents the length of the params of the longest callee in the current function.
Differential Revision: https://reviews.llvm.org/D152920
Reviewed By: uweigand
The Length/4 of Params field in the PPA1 ought to be the length of the parameters for the current function. Currently we are storing the length of the parameter area in the current function's stack frame, which represents the length of the params of the longest callee in the current function.
Differential revision: https://reviews.llvm.org/D119049
Reviewed By: uweigand
The term "next stack offset" is misleading because the next argument is
not necessarily allocated at this offset due to alignment constrains.
It also does not make much sense when allocating arguments at negative
offsets (introduced in a follow-up patch), because the returned offset
would be past the end of the next argument.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D149566
Define intersectWith and unionWith as two complementary ways of
combining KnownBits. The names are chosen for consistency with
ConstantRange.
Deprecate commonBits as a synonym for intersectWith.
Differential Revision: https://reviews.llvm.org/D150443
Previously, LegalizeVectorOps used the result VT while LegalizeDAG
used the operand VT. This patch makes them both use the operand VT.
This also makes it consistent with how the default cost model works.
I've hacked the AArch64 cost model to maintain old behavior for some
f16 vectors.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D149572
In D79537, `nomerge` was made to only apply to non-tail calls. This fixes it by also applying it to tail calls.
For ARM, I only made the new MI to inherit the flag under `TCRETURNdi` and `TCRETURNri`, because that's the place tail calls got replaced. Not sure if there's any other place needed.
Fixes#61545.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D146749
The new test case showed that the NoPHIs flag needs to be cleared.
Original commit message:
[SystemZ] Bugfix in expansion of memmem operations.
Since NC, OC, and XC clobber CC, the EXRL_Pseudo targeting these must also be
marked to do so.
Original patch by uweigand.
Reviewed by: uweigand
Differential Revision: https://reviews.llvm.org/D150251
Fixes: https://github.com/llvm/llvm-project/issues/62572
Sorry - mir test fails with expensive checks on build bot.
Seems to relate to the fact that there are no PHIs in the .mir input, but after
they are created the verifyer reports "Found PHI instruction with NoPHIs property
set".
This reverts commit 00454a17f361d677d5423905c888daca1a80661a.
Similar to the existing SelectionDAG::SplitVector helper, this helper creates the EXTRACT_ELEMENT nodes for the LO/HI halves of the scalar source.
Differential Revision: https://reviews.llvm.org/D147264
Support bitcasting between int/fp/vector values and 'r'/'f'/'v' inline
assembly operands. This is intended to match GCCs beahvior.
Reviewed By: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D146059
The SystemZ backend will try to reuse an existing subtraction of two values
whenever they are to be compared for equality. This depends on the SystemZ
subtraction instruction setting the condition code, which can also signal
overflow.
A later pass will remove the compare and reuse the CC from the subtraction
directly. However, if that subtraction has the NSW flag set it will not
include the overflow bit in the updated CC user. That was a bug which can
lead to wrong results, as shown by a csmith program.
Fixes: https://github.com/llvm/llvm-project/issues/61268
Reviewed By: nikic, uweigand
Differential Revision: https://reviews.llvm.org/D145811
Returning true from this method for PCREL_WRAPPER and PCREL_OFFSET avoids
problems when a PCREL_OFFSET node ends up with a freeze operand, which is not
handled or expected by the backend.
Fixes#60107
Reviewed By: uweigand, RKSimon
Differential Revision: https://reviews.llvm.org/D142971
isVectorConstantLegal calls findFirstSet and findLastSet, but we don't
rely on their ability to return std::numeric_limits<T>::max() on input
0.
This patch replaces those calls with calls to llvm::countl_zero and
llvm::countr_zero.
Due to an off-by-one error in the original code, the value of Upper
could change at bit N, where N is the index of the highest set bit in
SplatBitsZ, but the difference doesn't matter at the end. Without
this patch, Upper could have bit N set. With this patch, Upper never
has bit N set. Either way, both calls to tryValue have this bit set
because the argument is ORed with SplatBitsZ.
Add support for _FLT_ROUNDS_ in SystemZ.
Patch by Tulio Magno Quites Machado Filho.
Reviewed By: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D140988
We have multiple targets which have defined custom instructions and sdag nodes to represent a compiler memory barrier. This patch consolidates the sdag node definition into common code.
This is a companion to D92842, but a bit different in focus. This change consolidates the existing sdag node definitions; that patch skipped defining a sdag node by instead going straight to a target node. That patch is also not NFC - as being so is quite hard for commoning up the instruction definitions.
I started with two backends to ensure the new common code was reusable while not having a massive diff. Once this lands, I'll submit a series of NFCs for backends where the changes are obvious, or reviews if more discussion is needed.
Differential Revision: https://reviews.llvm.org/D141317
Use deduction guides instead of helper functions.
The only non-automatic changes have been:
1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).
Per reviewers' comment, some useless makeArrayRef have been removed in the process.
This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.
Differential Revision: https://reviews.llvm.org/D140955
The most common case for string attributes parses them as integers. We
don't have a convenient way to do this, and as a result we have
inconsistent missing attribute and invalid attribute handling
scattered around. We also have inconsistent radix usage to
getAsInteger; some places use the default 0 and others use base 10.
Update a few of the uses, but there are quite a lot of these.
A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.
A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.
Subsequent patch will start using an actual value of speed in
the load/store vectorizer to compare if a vectorized access going
to be not just fast, but not slower than before.
Differential Revision: https://reviews.llvm.org/D124217
All in-tree targets pass pointer-sized ConstantSDNodes to the
method. This overload reduced amount of boilerplate code a bit. This
also makes getCALLSEQ_END consistent with getCALLSEQ_START, which
already takes uint64_ts.