When moving fcti results from floating-point registers to general-purpose registers through memory, MPI was adjusted to account for endianness, but FIPtr was always adjusted as if for big-endian, which caused loads of the wrong half of the value in little-endian mode.
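A tiny sketch of the endianness-dependent offset at issue (the helper and values are invented for illustration): when a 4-byte result sits in an 8-byte stack slot, which half to load depends on the target's byte order.
```
// Offset of the 4-byte half within an 8-byte slot (illustrative only):
// hard-coding the big-endian value 4 loads the wrong half on LE.
int64_t halfOffset(bool IsLittleEndian) {
  return IsLittleEndian ? 0 : 4;
}
```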
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
```
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N> {};
```
We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
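A before/after illustration of the mechanical replacement (the variable names here are made up):
```
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallSet.h"

void example(int *A) {
  // Before: SmallSet with a pointer element type was silently a
  // SmallPtrSet via the redirection above.
  llvm::SmallSet<int *, 4> Old;
  Old.insert(A);

  // After: name the pointer-set type directly.
  llvm::SmallPtrSet<int *, 4> New;
  New.insert(A);
}
```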
It is common to have ABI requirements for illegal types: for example, two i64 argument parts that originally came from an fp128 argument may have a different call ABI than parts that came from an i128 argument.
The current calling convention lowering does not provide access to this
information, so backends come up with various hacks to support it (like
additional pre-analysis cached in CCState, or bypassing the default
logic entirely).
This PR adds the original IR type to InputArg/OutputArg and passes it down to CCAssignFn. It is not actually used anywhere yet; this patch just makes the mechanical changes to thread the new argument through.
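A rough sketch of the shape of the change; this is not the actual llvm::ISD::InputArg definition, and the field name `OrigTy` is an assumption for illustration:
```
#include "llvm/CodeGen/TargetCallingConv.h"
#include "llvm/CodeGen/ValueTypes.h"
#include "llvm/IR/Type.h"

// Hypothetical per-argument record: carrying the original IR type lets
// a CCAssignFn tell i64 pieces of an fp128 apart from those of an i128.
struct InputArgSketch {
  llvm::EVT VT;                 // legalized register type
  llvm::ISD::ArgFlagsTy Flags;  // existing argument flags
  llvm::Type *OrigTy = nullptr; // NEW: original IR type (assumed name)
};
```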
The information about whether a specific argument is vararg or fixed is currently stored separately from all the other argument information, which lives in ArgFlags. This means it is not accessible from CCAssignFn, and backends have developed all kinds of workarounds to get at it anyway.
Move this information to ArgFlags to make it directly available in all
relevant places.
I've opted to invert this and store it as IsVarArg, as I think that both makes the meaning more obvious and provides a better default (IsVarArg = false).
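A sketch of what the flag could look like inside ArgFlags; the bitfield layout and accessor names below are assumptions for illustration, not the actual llvm::ISD::ArgFlagsTy:
```
// Hypothetical flag record: IsVarArg defaults to 0, i.e. "fixed".
struct ArgFlagsSketch {
  unsigned IsSExt : 1;
  unsigned IsZExt : 1;
  unsigned IsVarArg : 1; // set for arguments in the variadic portion

  bool isVarArg() const { return IsVarArg; }
  void setVarArg() { IsVarArg = 1; }
};
```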
We can expand the init intrinsic to create a descriptor for the nested
procedure by combining the entry point and TOC pointer from the global
descriptor with the nest argument. The normal indirect call sequence
then calls the nested procedure through the descriptor like all other
calls. The patch also implements support for a nest parameter by mapping it to GPR 11.
This patch fixes a bug introduced by the patch [[PowerPC] Using MTVSRBMI instruction instead of constant pool in power10+](https://github.com/llvm/llvm-project/pull/144084#top).
The issue arose because the layout of vector register elements differs
between little-endian and big-endian modes — specifically, the elements
appear in reverse order. This led to incorrect behavior when loading
constants using MTVSRBMI in big-endian configurations.
Compare sinking is selectable based on the result of hasMultipleConditionRegisters. This function is too coarse-grained, as it does not take into account the differences between scalar and vector compares. This PR extends the interface to take an EVT, allowing finer control.
The new interface is used by AArch64 to disable sinking of scalable
vector compares, but with isProfitableToSinkOperands updated to maintain
the cases that are specifically tested.
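A sketch of what an EVT-aware hook and an AArch64-style override could look like; the exact signature and the polarity of the return value are assumptions based on the description above:
```
#include "llvm/CodeGen/ValueTypes.h"

// Sketch: report multiple condition registers for scalable vectors so
// that compare sinking is disabled for them (assumed polarity; compare
// sinking is skipped when this returns true).
bool hasMultipleConditionRegistersSketch(llvm::EVT VT) {
  if (VT.isScalableVector())
    return true;  // disable sinking of scalable vector compares
  return false;   // keep the previous behaviour for everything else
}
```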
The object-file-format-specific derived classes are used in contexts where the type is statically known. We don't use isa/dyn_cast, and we want to eliminate MCSymbol::Kind in the base class.
When arbitrarily sized (non-simple, or non-power-of-two) integer types are passed on the stack, these integers are not handled when lowering formal arguments on AIX, as we always assume we will encounter simple-type integers.
However, it is possible for frontends to generate arbitrarily sized integer values in IR. Specifically, rustc will generate an integer value in LLVM IR for small structures that are less than a pointer in size, which it does as an optimization for the Rust ABI.
For example, if a Rust structure of three one-byte fields is passed to a function on the stack,
```
struct my_struct {
    field1: u8,
    field2: u8,
    field3: u8,
}
```
This will generate an `i24` type in LLVM IR.
Currently, it is not obvious to the backend how to distinguish an integer from something that wasn't an integer to begin with (such as a struct); the latter case would not have an extend attribute on the parameter. Thus, this PR allows us to perform a truncation and extend on integers, both non-simple and simple types.
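A minimal sketch of the kind of truncation described, assuming the caller extended the value to a full register; the helper name and shape are invented for illustration:
```
#include "llvm/CodeGen/SelectionDAG.h"

// Hypothetical helper: a non-simple integer argument (e.g. i24) arrives
// in a register-sized stack slot; truncate it back to its original width.
llvm::SDValue truncateExtendedArg(llvm::SelectionDAG &DAG,
                                  const llvm::SDLoc &DL,
                                  llvm::SDValue LoadedVal, llvm::EVT OrigVT) {
  return DAG.getNode(llvm::ISD::TRUNCATE, DL, OrigVT, LoadedVal);
}
```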
Add an LLVMContext parameter to getOptimalMemOpType and findOptimalMemOpLowering, so that we can use EVT::getVectorVT to build EVT types in getOptimalMemOpType.
Related to [#146673](https://github.com/llvm/llvm-project/pull/146673).
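The reason the context is needed, in one line: building a non-simple (extended) EVT requires an LLVMContext. A small illustration with an arbitrarily chosen type:
```
#include "llvm/CodeGen/ValueTypes.h"

llvm::EVT pickMemOpVT(llvm::LLVMContext &Ctx) {
  // Extended (non-simple) vector types such as v5i64 can only be
  // created with a context in hand.
  return llvm::EVT::getVectorVT(Ctx, llvm::MVT::i64, 5);
}
```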
This PR takes the work previously done by @pawan-nirpal-031 on X86 in
#106370, and makes it available in common code. This should enable all
targets to use `__builtin_canonicalize` for all `f(16|32|64|128)` data
types.
Canonicalization is implemented here as multiplication by `1.0`, as
suggested in [the
docs](https://llvm.org/docs/LangRef.html#llvm-canonicalize-intrinsic).
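In SelectionDAG terms, the expansion described boils down to something like the following sketch (not necessarily the exact common-code implementation):
```
#include "llvm/CodeGen/SelectionDAG.h"

// Canonicalize x by multiplying by 1.0, which is itself a
// canonicalizing operation per the LangRef.
llvm::SDValue expandFCanonicalize(llvm::SelectionDAG &DAG,
                                  const llvm::SDLoc &DL, llvm::SDValue Op) {
  llvm::EVT VT = Op.getValueType();
  llvm::SDValue One = DAG.getConstantFP(1.0, DL, VT);
  return DAG.getNode(llvm::ISD::FMUL, DL, VT, Op, One);
}
```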
The MTVSRBMI instruction sets each byte of the VSR to 0x00 or 0xFF based on the bit mask. Using this instruction instead of a constant-pool load reduces code size and instruction count on Power10 and later.
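A small worked illustration of the bit-to-byte expansion described (plain C++; note that the element order within the vector register is endian-dependent, which is exactly the pitfall behind the big-endian fix mentioned earlier):
```
#include <cstdint>

// Expand a 16-bit mask MTVSRBMI-style: bit i selects 0xFF or 0x00 for
// byte i of the 16-byte vector (element order is endian-dependent).
void expandByteMask(uint16_t Mask, uint8_t Bytes[16]) {
  for (int I = 0; I < 16; ++I)
    Bytes[I] = ((Mask >> I) & 1) ? 0xFF : 0x00;
}
```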
```
/data/llvm-project/llvm/lib/Target/PowerPC/PPCISelLowering.cpp:5769:19: error: loop variable '[Reg, N]' creates a copy from type 'std::pair<unsigned int, llvm::SDValue> const' [-Werror,-Wrange-loop-construct]
  for (const auto [Reg, N] : RegsToPass)
                  ^
/data/llvm-project/llvm/lib/Target/PowerPC/PPCISelLowering.cpp:5769:8: note: use reference type 'std::pair<unsigned int, llvm::SDValue> const &' to prevent copying
  for (const auto [Reg, N] : RegsToPass)
       ^~~~~~~~~~~~~~~~~~~~~
       &
/data/llvm-project/llvm/lib/Target/PowerPC/PPCISelLowering.cpp:6193:19: error: loop variable '[Reg, N]' creates a copy from type 'std::pair<unsigned int, llvm::SDValue> const' [-Werror,-Wrange-loop-construct]
  for (const auto [Reg, N] : RegsToPass) {
                  ^
/data/llvm-project/llvm/lib/Target/PowerPC/PPCISelLowering.cpp:6193:8: note: use reference type 'std::pair<unsigned int, llvm::SDValue> const &' to prevent copying
  for (const auto [Reg, N] : RegsToPass) {
       ^~~~~~~~~~~~~~~~~~~~~
       &
/data/llvm-project/llvm/lib/Target/PowerPC/PPCISelLowering.cpp:6806:19: error: loop variable '[Reg, N]' creates a copy from type 'std::pair<unsigned int, llvm::SDValue> const' [-Werror,-Wrange-loop-construct]
  for (const auto [Reg, N] : RegsToPass) {
                  ^
/data/llvm-project/llvm/lib/Target/PowerPC/PPCISelLowering.cpp:6806:8: note: use reference type 'std::pair<unsigned int, llvm::SDValue> const &' to prevent copying
  for (const auto [Reg, N] : RegsToPass) {
       ^~~~~~~~~~~~~~~~~~~~~
       &
3 errors generated.
```
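The fix the diagnostics point at is simply binding the structured binding by const reference; a minimal self-contained sketch (the pair's element types are placeholders):
```
#include <utility>
#include <vector>

void useRegsToPass(const std::vector<std::pair<unsigned, int>> &RegsToPass) {
  // Binding by const reference avoids copying each pair element.
  for (const auto &[Reg, N] : RegsToPass) {
    (void)Reg;
    (void)N;
  }
}
```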
Currently, the query assumes that a single undef byte implies that the remaining `EltSize - 1` bytes of the element are also undef, but that's not always true. For example, isSplatShuffleMask(<0,1,2,3,4,5,6,7,undef,undef,undef,undef,0,1,2,3>, 8) should return false.
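A rough sketch of the stricter per-byte check implied above; the helper is invented, and -1 encodes an undef mask entry:
```
#include <vector>

// The mask splats an EltSize-byte element only if every defined byte
// agrees with the corresponding byte of the first element; an undef
// byte must not let the rest of its element go unchecked.
bool isSplatShuffleMaskSketch(const std::vector<int> &Mask,
                              unsigned EltSize) {
  for (unsigned I = EltSize; I < Mask.size(); ++I) {
    if (Mask[I] < 0)
      continue;                      // undef byte: no constraint
    if (Mask[I % EltSize] < 0 || Mask[I] != Mask[I % EltSize])
      return false;                  // mismatch (or unverifiable base)
  }
  return true;
}
```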
---------
Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
When lowering call arguments to stack, specify a stack MPI, as well as
the stack alignment, instead of using the defaults (which would be an
unknown location with ABI alignment).
I believe the asm diffs are just changes in scheduling.
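A sketch of the change in SelectionDAG terms, using MachinePointerInfo::getStack; the surrounding values are placeholders:
```
#include "llvm/CodeGen/SelectionDAG.h"

// Store an outgoing argument with an explicit stack MachinePointerInfo
// and alignment instead of the defaults (sketch).
llvm::SDValue storeArgToStack(llvm::SelectionDAG &DAG, const llvm::SDLoc &DL,
                              llvm::SDValue Chain, llvm::SDValue Arg,
                              llvm::SDValue PtrOff, int64_t Offset,
                              llvm::Align StackAlign) {
  llvm::MachineFunction &MF = DAG.getMachineFunction();
  return DAG.getStore(Chain, DL, Arg, PtrOff,
                      llvm::MachinePointerInfo::getStack(MF, Offset),
                      StackAlign);
}
```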
On PowerPC, AtomicCmpXchgInst is lowered to ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS. However, this node does not handle the weak attribute of AtomicCmpXchgInst. As a result, when compiling C++
atomic_compare_exchange_weak_explicit, the generated assembly includes a
"reservation lost" loop — i.e., it branches back and retries if the
stwcx. (store-conditional) fails. This differs from GCC’s codegen, which
does not include that loop for weak compare-exchange.
Since PowerPC uses LL/SC-style atomic instructions, the patch enables
AtomicExpandImpl::expandAtomicCmpXchg for PowerPC. With this, the weak
attribute is properly respected, and the "reservation lost" loop is
removed for weak operations.
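Enabling the IR-level expansion usually amounts to returning the LL/SC expansion kind from the corresponding TargetLowering hook; a sketch of the idea (the function shown is illustrative, and the target must also provide load-linked/store-conditional emission):
```
#include "llvm/CodeGen/TargetLowering.h"
#include "llvm/IR/Instructions.h"

// Sketch: request LL/SC-based IR expansion for cmpxchg, where the weak
// attribute can be honored by AtomicExpand.
llvm::TargetLowering::AtomicExpansionKind
shouldExpandAtomicCmpXchgInIRSketch(llvm::AtomicCmpXchgInst *AI) {
  return llvm::TargetLowering::AtomicExpansionKind::LLSC;
}
```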
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
For pwr9, xxspltib is a byte splat with a range of -128 to 127; it can be used with a following vector extend sign to make splats with i16, i32, or i64 element sizes. For pwr8, vspltisw with a following vector extend sign can be used to make splats of i64 elements in the range -16 to 15.
Add a check for P8 to make sure the 64-bit vector ops are available.
This commit adds support for loading and storing v2048i1 DMR pairs and
introduces Dense Math Facility cryptography instructions: DMSHA2HASH,
DMSHA3HASH, and DMXXSHAPAD, along with their corresponding intrinsics
and tests.
Vector shift word or doubleword requires a shift-amount vector of 31 or 63, which is too big for a splat immediate and needs a multi-instruction sequence. However, the PPC instructions only use the low 5 or 6 bits of each shift-amount vector element, so an all-ones mask, which we can generate efficiently, works.
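A quick arithmetic check of why the all-ones trick works (plain C++, illustrative):
```
#include <cstdint>
#include <cstdio>

int main() {
  // Word shifts read only the low 5 bits of each element; doubleword
  // shifts read the low 6 bits. All-ones supplies exactly 31 and 63.
  uint32_t AllOnes32 = 0xFFFFFFFFu;
  uint64_t AllOnes64 = ~0ULL;
  std::printf("%u\n", AllOnes32 & 31u);                        // prints 31
  std::printf("%llu\n", (unsigned long long)(AllOnes64 & 63)); // prints 63
}
```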
In PowerPC, if a borrow occurs during a subtraction, the carry bit is
zero (unset). The carry bit is set if no borrow occurs.
The ISD::UADDO_CARRY and ISD::USUBO_CARRY nodes produce two results: the normal result of the addition or subtraction, and a boolean value that is 1 if and only if there is an outgoing carry or borrow.
Therefore, we need to convert a 1 (which indicates a borrow in
ISD::USUBO_CARRY) to 0 to match PowerPC's definition of borrow.
Similarly, we need to convert a 0 (no borrow in ISD::USUBO_CARRY) to 1
for PowerPC.
To perform this conversion, we use XOR 1 instead of XOR
DAG.getAllOnesConstant(DL, CarryOp.getValueType()).
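In DAG terms, the conversion is a single XOR with 1; a sketch:
```
#include "llvm/CodeGen/SelectionDAG.h"

// Flip between the ISD convention (1 = borrow) and the PowerPC CA
// convention (0 = borrow) with a single XOR of 1.
llvm::SDValue invertBorrow(llvm::SelectionDAG &DAG, const llvm::SDLoc &DL,
                           llvm::SDValue CarryOp) {
  llvm::EVT VT = CarryOp.getValueType();
  return DAG.getNode(llvm::ISD::XOR, DL, VT, CarryOp,
                     DAG.getConstant(1, DL, VT));
}
```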
Mutating a node after it has been created isn't a good idea. After
e17f07c4debbe76f5ebcdeeda619e7438700e2ad, we have a version of setStore
that can create a truncating indexed store. Use that instead of
MorphNodeTo+setTruncatingStore in PowerPC.
Unfortunately, if we return the newly created node, DAGCombiner will
visit the node and change the constant. To prevent this, we use
DCI.CombineTo and avoid adding the new node to the worklist.
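Sketched with the DAGCombinerInfo API (illustrative wrapper):
```
#include "llvm/CodeGen/TargetLowering.h"

// Replace N with NewStore without adding NewStore to the combiner
// worklist, so DAGCombiner won't revisit it and mutate the constant.
llvm::SDValue replaceQuietly(llvm::TargetLowering::DAGCombinerInfo &DCI,
                             llvm::SDNode *N, llvm::SDValue NewStore) {
  return DCI.CombineTo(N, NewStore, /*AddTo=*/false);
}
```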
ISD::ADDC, ISD::ADDE, ISD::SUBC, and ISD::SUBE are being deprecated in favor of ISD::UADDO_CARRY and ISD::USUBO_CARRY. This patch lowers UADDO, UADDO_CARRY, USUBO, and USUBO_CARRY accordingly.
Use getSignedConstant() in a few more places, based on a search of
`\bgetConstant(-`. Most of these were fine as-is (e.g. because they work
on 64-bits), but I think it's better to use getSignedConstant()
consistently for negative numbers.
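For illustration, the difference is only which helper makes the signedness explicit:
```
#include "llvm/CodeGen/SelectionDAG.h"

// getSignedConstant spells out that -1 is a signed immediate to be
// sign-extended to VT, instead of relying on getConstant's handling
// of a negative value.
llvm::SDValue makeMinusOne(llvm::SelectionDAG &DAG, const llvm::SDLoc &DL,
                           llvm::EVT VT) {
  return DAG.getSignedConstant(-1, DL, VT);
}
```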
SDNode::use_iterator now returns an SDUse& when dereferenced.
SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses
work on use_iterator. SDNode::user_begin/user_end/users work on
user_iterator.
We can now write range based for loops using SDUse& and SDNode::uses().
I've converted many of these in this patch. I didn't update loops that
have additional variables updated in their for statement.
Some loops use SDNode::use_iterator::getOperandNo() which also prevents
using range based for loops. I plan to move this into SDUse in a follow
up patch.
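An illustration of the two loop styles described (sketch; N is any SDNode*):
```
#include "llvm/CodeGen/SelectionDAG.h"

void walkUses(llvm::SDNode *N) {
  // Range-based loop over uses, dereferencing to SDUse&.
  for (llvm::SDUse &U : N->uses())
    (void)U.getUser(); // the node that owns this use

  // Range-based loop over users, dereferencing to SDNode*.
  for (llvm::SDNode *User : N->users())
    (void)User;
}
```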
Most of these are just places that want the first user and aren't
iterating over the whole list.
While there, I changed some use_size() == 1 checks to hasOneUse(), which is more efficient.
This is part of an effort to rename use_iterator to user_iterator
and provide a use_iterator that dereferences to SDUse&. This patch
helps reduce the diff on later patches.
This function is most often used in range-based loops or algorithms where the iterator is implicitly dereferenced. The dereference returns an SDNode * of the user rather than an SDUse *, so users() is a better name.
I've long been annoyed that we can't write a range-based loop over SDUse when we need getOperandNo. I plan to rename use_iterator to user_iterator and add a use_iterator that returns SDUse& on dereference. This will make it more like IR.