12 bit is not enough for PPC's target specific flags. If 8 bit for the
bitmask flags, 4 bit for the direct mask, PPC can total have 16 direct
mask and 8 bitmask. Not enough for PPC, see this issue in
https://github.com/llvm/llvm-project/pull/66316
Redesign how PPC target set the target specific flags. With this patch,
all ppc target flags are direct flags. No bitmask flag in PPC anymore.
This patch aligns with some targets like X86 which also has many target
specific flags.
The patch also fixes a bug related to flag `MO_TLSGDM_FLAG` and `MO_LO`.
They are the same value and the test case changes in this PR shows the
bug.
It seems TypeSize is currently broken in the sense that:
TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)
without failing its assert that explicitly tests for this case:
assert(LHS.Scalable == RHS.Scalable && ...);
The reason this fails is that `Scalable` is a static method of class
TypeSize,
and LHS and RHS are both objects of class TypeSize. So this is
evaluating
if the pointer to the function Scalable == the pointer to the function
Scalable,
which is always true because LHS and RHS have the same class.
This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`,
so that it no longer clashes with the variable in
FixedOrScalableQuantity.
The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
This custom combine currently converts `and(anyext(x),c)` into
`anyext(and(x,c))`. This is not correct, because the original expression
guaranteed that the high bits are zero, while the new one sets them to
undef.
Emit `zext(and(x,c))` instead.
Fixes https://github.com/llvm/llvm-project/issues/68783.
PowerPC backend generate calls to libc function calls
for soft-float, regardless of the -nostdlib /-ffreestanding flag.
fma is not a function provided by compiler-rt builtins and
thus should not be generated here.
PR : [[ https://github.com/llvm/llvm-project/issues/55230 | #55230 ]]
Below is patch given by @nemanjai
Reviewed By: jhibbits
Differential Revision: https://reviews.llvm.org/D156344
Given a list of constraints for InlineAsm (ex. "imr") I'm looking to
modify the order in which they are chosen. Before doing so, I noticed a
fair
amount of logic is duplicated between SelectionDAGISel and GlobalISel
for this.
That is because SelectionDAGISel is also trying to lower immediates
during selection. If we detangle these concerns into:
1. choose the preferred constraint
2. attempt to lower that constraint
Then we can slide down the list of constraints until we find one that
can be lowered. That allows the implementation to be shared between
instruction selection frameworks.
This makes it so that later I might only need to adjust the priority of
constraints in one place, and have both selectors behave the same.
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
This patch reverts commit 7614ba0a5db8 to optimize VPERM when one of its
vector operands is XXSWAPD, similar to XXPERM. It also reorganizes the
little-endian swap code on LE, swapping the vector operand after
adjusting the mask operand. This ensures that the vector operand is
swapped at the correct point in the code, resulting in a valid
constant pool for the mask operand.
Reviewed By: stefanp
Differential Revision: https://reviews.llvm.org/D149083
I would like to steal one of these bits to denote whether a kind may be
spilled by the register allocator or not, but I'm afraid to touch of any
this code using bitwise operands.
Make flags a first class type using bitfields, rather than launder data
around via `unsigned`.
This patch utilizes the -maix-small-local-exec-tls option added in
D155544 to produce a faster access sequence for the local-exec TLS
model, where loading from the TOC can be avoided.
The patch either produces an addi/la with a displacement off of r13
(the thread pointer) when the address is calculated, or it produces an
addi/la followed by a load/store when the address is calculated and
used for further accesses.
This patch also optimizes this sequence a bit more where we can remove
the addi/la when the load/store offset is 0. A follow up patch will
be posted to account for when the load/store offset is non-zero, and
currently in these situations we keep the addi/la that precedes the
load/store.
Furthermore, this access sequence is only performed for TLS variables
that are less than ~32KB in size.
Differential Revision: https://reviews.llvm.org/D155600
On PPC there are instructions to store element from vector(e.g.
stxsdx/stxsiwx), and these instructions can be leveraged to avoid tail
constant in memset and constant splat array initialization.
This patch tries to explore these opportunities.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D138883
Irritatingly, atomic_store had operands in the opposite order from
regular store. This made it difficult to share patterns between
regular and atomic stores.
There was a previous incomplete attempt to move atomic_store into the
regular StoreSDNode which would be better.
I think it was a mistake for all atomicrmw to swap the operand order,
so maybe it's better to take this one step further.
https://reviews.llvm.org/D123143
Should add some minor type safety to the use of this information, since
there's quite a bit of metadata being laundered through an `unsigned`.
I'm looking to potentially add more bitfields to that `unsigned`, but I
find InlineAsm's big ol' bag of enum values and usage of `unsigned`
confusing, type-unsafe, and un-ergonomic. These can probably be better
abstracted.
I think the lack of static_cast outside of InlineAsm indicates the prior
code smell fixed here.
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D159242
On AIX, a libatomic supporting inline quadword atomic operations has been released, so that compatibility is not an issue now, we can enable quadword atomics by default.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D151312
Reorder setMaxAtomicSizeInBitsSupported() in numerical and more logical order.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D155379
Future cpu instructions dmxxinstdmr512 and dmxxextfdmr512 insert and extract
quad vectors from the new wide accumulator(wacc) register class.
The introduction of these new instructions renders the p10 instructions
xxmtacc and xxmfacc obsolete since the new wacc register class is a better
choice for handing quad vector operations. This patch ensures that, for
future cpu, instructions dmxxinstdmr512 and dmxxextfdmr512 are generated
by custom lowering the intrinsics for xxm[t|f]acc to produce no instructions.
Reviewed By: amyk, lei
Differential Revision: https://reviews.llvm.org/D153034
The commit originally caused a bootstrap failure on the big endian
PPC bot as the combine was interfering with the legalizer when
applied on illegal types. This update restricts the combine to
the only types for which it is actually needed. Tested on PPC BE
bootstrap locally.
The SDAG will sometimes insert an extend between
the shift and an and (immediate) even though the
immediate is narrower than the narrow size.
This does not allow us to produce a rotate
instruction (such as rlwinm).
This patch just adds a combine to move the extend
onto the and.
Differential revision: https://reviews.llvm.org/D152911
In preparation for removing the `#include "llvm/ADT/StringExtras.h"`
from the header to source file of `llvm/Support/Error.h`, first add in
all the missing includes that were previously included transitively
through this header.
This patch adds support for the TLS local-exec access model on AIX to allow
for the ability to generate the 32-bit (specifically, non-optimized) code sequence.
This work is a follow up of D149722.
The particular sequence that is generated for this sequence is as follows:
```
.tc var[TC],var[TL]@le. // variable offset, with the le relocation specifier
bla .__get_tpointer() // get the thread pointer, modifies r3
lwz reg1, var[TC](2) // load the variable offset
add reg2, r3, reg1 // add the variable offset to the retrieved thread pointer
```
Differential Revision: https://reviews.llvm.org/D152669
This patch adds support for the TLS local-exec access model on AIX to allow
for the ability to generate the 64-bit (specifically, non-optimized) code sequence.
For this patch in particular, the sequence that is generated involves a load of the
variable offset, followed by an add of the loaded variable offset to r13 (which is
thread pointer, respectively). This code sequence looks like the following:
```
ld reg1,var[TC](2)
add reg2, reg1, r13 // r13 contains the thread pointer
```
The TOC (.tc pseudo-op) entries generated in the assembly files are also
changed where we add the @le relocation for the variable offset.
Differential Revision: https://reviews.llvm.org/D149722
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.
Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
The build failure should be fixed by de681d53. Follow-up refactor will
be done in future patches.
This reverts commit e7c5ced0b9f0551ea17e1d2b48be86f03a772c59.
Commit 8064caf83fb166b709bfe0e7641c5181341cb064 added a call
to a function that performs this combine without checking whether
the target supports FPCVT. This caused asserts to trip on BE bots
as the default target does not have this feature.
On PowerPC VSX targets, fp-to-int will be transformed into xscv with
mfvsr. When the result is to be stored, mfvsr can be replaced by a
direct store.
This change simplifies the optimization by using existing fp-to-int
code, which helps CSE and handling strictfp cases.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D141473
Previously, `CCState::AllocateStack` always allocated stack space by increasing
offsets. For targets with stack growing up (away from zero) it is more
convenient to allocate arguments by decreasing offsets, so that the first
argument is at the top of the stack. This is important when calling a function
with variable number of arguments: the callee does not know the size of the
stack, but must be able to access "fixed" arguments. For that to work, the
"fixed" arguments should have fixed offsets relative to the stack top, i.e. the
variadic arguments area should be at the stack bottom (at lowest addresses).
The in-tree target with stack growing up is AMDGPU, but it allocates
arguments by increasing addresses. It does not support variadic arguments.
A drive-by change is to promote stack size/offset to 64-bit integer.
This is what MachineFrameInfo expects.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D149575
The term "next stack offset" is misleading because the next argument is
not necessarily allocated at this offset due to alignment constrains.
It also does not make much sense when allocating arguments at negative
offsets (introduced in a follow-up patch), because the returned offset
would be past the end of the next argument.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D149566
In D79537, `nomerge` was made to only apply to non-tail calls. This fixes it by also applying it to tail calls.
For ARM, I only made the new MI to inherit the flag under `TCRETURNdi` and `TCRETURNri`, because that's the place tail calls got replaced. Not sure if there's any other place needed.
Fixes#61545.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D146749
This is rework of;
- rG13e77db2df94 (r328395; MVT)
Since `LowLevelType.h` has been restored to `CodeGen`, `MachinveValueType.h`
can be restored as well.
Depends on D148767
Differential Revision: https://reviews.llvm.org/D149024
This patch is to fix the xxperm vector operand swap condition so that the
single-use operand is in V2 to prevent copying, it also fixes the subtarget
condition to exploit the xpperm.
Reviewed By: stefanp
Differential Revision: https://reviews.llvm.org/D146632
Power ISA 3.0 introduced new 'test data class' instructions, which
accept flags for: NaN/Infinity/Zero/Denormal. This instruction can be
used to implement custom lowering for llvm.is.fpclass, but some extra
bits provided by the intrinsic are missing (normal and QNaN/SNaN).
For those categories not natively supported, this patch uses a two-way
or three-way combination to implement correct behavior.
Reviewed By: sepavloff, shchenz
Differential Revision: https://reviews.llvm.org/D140381
Similar to the existing SelectionDAG::SplitVector helper, this helper creates the EXTRACT_ELEMENT nodes for the LO/HI halves of the scalar source.
Differential Revision: https://reviews.llvm.org/D147264
This patch partially implements the parameter passing rules outlined in the
ELFv2 ABI within TableGen. Specifically, it implements the parameter assignment
of integers, floats, and vectors within registers - where the GPR numbering will
be "skipped" depending on the ordering of floats and vectors that appear within
a parameter list.
As we begin to adopt GlobalISel to the PowerPC backend, there is a need for a
TableGen definition that encapsulates the ELFv2 parameter passing rules. Thus,
this patch also changes the default calling convention that is returned within
the ccAssignFnForCall() function used in our GlobalISel implementation, and also
adds some additional testing of the calling convention that is implemented.
Future patches that build on top of this initial TableGen definition will aim to
add more of the ABI complexities, including support for additional types and
also in-memory arguments.
Differential Revision: https://reviews.llvm.org/D137504
ConstantSDNode provides some convenience functions like isZero,
getZExtValue, and isMinSignedValue that are named identically to those
provided by APInt, so we can "skip" getAPIntValue.