- Instead of lowering float/double ISD::ATOMIC_LOAD / ISD::ATOMIC_STORE
nodes to regular LOAD/STORE nodes, make them legal and select those nodes
properly instead. This avoids exposing them to the DAGCombiner.
- AtomicExpand pass no longer casts float/double atomic load/stores to integer
(FP128 is still casted).
Make __builtin_cpu_{init|supports|is} target independent and provide an
opt-in query for targets that want to support it. Each target is still
responsible for their specific lowering/code-gen. Also provide code-gen
for PowerPC.
I originally proposed this in https://reviews.llvm.org/D152914 and this
addresses the comments I received there.
---------
Co-authored-by: Nemanja Ivanovic <nemanjaivanovic@nemanjas-air.kpn>
Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>
`llvm.trap` is lowered to `PPC::TRAP` and `PPC::TRAP` is set as
terminator. Verifier complains about terminator should not lie in the
middle of an MBB. See #77095.
Fix it by removing `isTerminator` and `isBarrier` and then set `isTrap`
which was introduced by https://reviews.llvm.org/D48836# and is being
used by X86 and AArch64.
`PPC::TRAP` is not a hardware memory barrier and `llvm.trap` doesn't
indicate a memory barrier either.
This patch adds the majority of the missing extended mnemonics that were
introduced in Power 10.
The only extended mnemonics that were not added are related to the plq
and pstq instructions. These will be added in a separate patch as the
instructions themselves would also have to be added.
The SCV instruciton was added on PowerPC on Power 9. This patch adds the
SCV so that it may be used as part of inline asm but does not provide
patterns for it or scheduling information.
Co-authored-by: Stefan Pintilie <stefanp@ca.ibm.com>
The intrinsic uses ImmArg so TargetConstant would be consistent
with how other intrinsics are handled.
This hides the constants from type legalization so we can remove
the promotion support.
isel patterns are updated accordingly.
mffsl is available since ISA 3.0. The builtin is named with ppc prefix
to follow our convention. For targets earlier than power9, GCC generates
extra code to support the functionality, while this patch does not
implement such behavior.
Reviewed By: nemanjai, tuliom
Differential Revision: https://reviews.llvm.org/D158065
Irritatingly, atomic_store had operands in the opposite order from
regular store. This made it difficult to share patterns between
regular and atomic stores.
There was a previous incomplete attempt to move atomic_store into the
regular StoreSDNode which would be better.
I think it was a mistake for all atomicrmw to swap the operand order,
so maybe it's better to take this one step further.
https://reviews.llvm.org/D123143
This fixes issue 64080. tlbilx exists in ISA 2.07 Book III-E. Since
contents of Book III-E were eliminated after ISA 3.0, tlbilx does not
exist in ISA 3.0 and ISA 3.1.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D156204
Add td definitions and asm/disasm tests for the dfp format
instructions in ISA 3.1 section 5.6.6
Reviewed By: stefanp, kamaub
Differential Revision: https://reviews.llvm.org/D154465
Add td definitions and asm/disasm tests for the quantum adjustment
instructions in ISA 3.1 section 5.6.4
Reviewed By: stefanp
Differential Revision: https://reviews.llvm.org/D154369
This patch adds support for the TLS local-exec access model on AIX to allow
for the ability to generate the 32-bit (specifically, non-optimized) code sequence.
This work is a follow up of D149722.
The particular sequence that is generated for this sequence is as follows:
```
.tc var[TC],var[TL]@le. // variable offset, with the le relocation specifier
bla .__get_tpointer() // get the thread pointer, modifies r3
lwz reg1, var[TC](2) // load the variable offset
add reg2, r3, reg1 // add the variable offset to the retrieved thread pointer
```
Differential Revision: https://reviews.llvm.org/D152669
The build failure should be fixed by de681d53. Follow-up refactor will
be done in future patches.
This reverts commit e7c5ced0b9f0551ea17e1d2b48be86f03a772c59.
On PowerPC VSX targets, fp-to-int will be transformed into xscv with
mfvsr. When the result is to be stored, mfvsr can be replaced by a
direct store.
This change simplifies the optimization by using existing fp-to-int
code, which helps CSE and handling strictfp cases.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D141473
Add the following Decimal Floating Point (DFP) instructions for PowerPC.
dadd, daddq, dsub, dsubq
In order to add these instructions a new register class for a pair
of floating point registers is added.
This patch is only to allow the user to specify the instructions in
assembly. There is no scheduling or patterns for the instructions.
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D148597
There are some patterns in td files without MVT/class set
for some operands in target pattern that are from the source
pattern. This prevents GlobalISelEmitter from adding them as
a valid rule, because the target child operand is an
unsupported kind operand. For now, for a leaf child, only
IntInit and DefInit are handled in GlobalISelEmitter.
This issue can be workaround by adding MVT/class to the
patterns in the td files, like the workarounds for patterns
anyext and setcc in PPCInstrInfo.td in D140878.
To avoid adding the same workarounds for other patterns in
td files, this patch tries to handle the UnsetInit case in
GlobalISelEmitter.
Adding the new handling allows us to remove the workarounds
in the td files and also generates many selection rules for
PPC target.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D141247
This is a follow-on to https://reviews.llvm.org/D134073.
Currently, all of the "memri"-style complex operands, which contain
both a register and an immediate, are encoded into a single field in
the instruction definition. This requires complex encoders/decoders,
and instruction definitions that insert and extract the correct parts
of the bits.
Now, switch to naming and encoding/decoding the sub-operands
separately.
Thus, we can now disable useDeprecatedPositionallyEncodedOperands.
Reviewed By: barannikov88
Differential Revision: https://reviews.llvm.org/D137670
This is a follow-on to https://reviews.llvm.org/D134073.
After https://reviews.llvm.org/D137653 we can now switch the PPC
target away from positional operand matching.
This patch fixes all of the "easy" cases. While this changes a large
number of lines of tablegen source, it results in only a single
non-comment change in the code generated by tablegen: the (unused)
codegen-only "MTVRSAVEv" instruction was previously incorrectly
encoding operand 0, and now encodes (correctly) operand 1.
Changes which result in generated-code changes have been split off
into the next (smaller) patch, for ease of review.
Reviewed By: barannikov88
Differential Revision: https://reviews.llvm.org/D137661
This patch also includes:
1: CRRegBank support
2: Some workarounds in PPC table gen for anyext/setcc patterns
selection.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D140878
Previous to this patch we only materialized 0.0 and all other floating point
values would be loaded from the TOC. This patch adds materialization for the
floating point values that can be represented as integers in [-16.0, 15.0].
For example we will now materialize 3.0 and -5.0 but not 4.7.
Reviewed By: nemanjai, lei, #powerpc
Differential Revision: https://reviews.llvm.org/D138844
I noticed a an assertion error when building MIPS code that loaded from
NULL. Loading from NULL ends up being a load with maximum alignment, and
due to integer truncation the value maximum was interpreted as 0 and the
assertion in MipsDAGToDAGISel::Select() failed. This previously happened
to work, but the maximum alignment was increased in
df84c1fe78130a86445d57563dea742e1b85156a, so it no longer fits into a 32
bit integer.
Instead of just fixing the one MIPS case, this patch removes all uses of
the deprecated getAlignment() call and replaces them with getAlign().
Differential Revision: https://reviews.llvm.org/D138420
This patch adds spilling for the new WACC registers.
In order to get the spilling test to work the MMA instructions from Power 10 are
now supported for Future CPU except that they are all using the new WACC
registers instead of the ACC registers from Power 10.
Reviewed By: amyk, saghir
Differential Revision: https://reviews.llvm.org/D136728
A new register class as well as a number of related subregisters are being added
to Future CPU. These registers are Dense Math Registers (DMR) and are 1024 bits
long. These regsiters can also be used in consecutive pairs which leads to a
register that is 2048 bits.
This patch also adds 7 new instructions that use these registers. More
instructions will be added in future patches.
Reviewed By: amyk, saghir
Differential Revision: https://reviews.llvm.org/D136366
Inputs to crnor can come from operands with chains so
if it is being used simply to negate such an operand,
the repeated input cannot be CSE'd. This patch just
adds a code-gen only instruction for this that takes
a single input and duplicates it in the encoding of
the underlying crnor.
Differential revision: https://reviews.llvm.org/D133577
There are a few issues with the code we generate for atomic operations and the way we generate it:
- Hard coded CR0 for compares
- Order of operands for compares not conducive to
emitting compare-immediate or for CSE of compares
- Missing MachineMemOperand for st[bhwd]cx intrinsics
- Missing intrinsic properties for the same
- Unnecessary blocks with store conditional
instructions to clear reservation (which ends
up hindering performance)
- Move from CR instructions just to compare the
result of a store conditional with zero (even
though it is a record-form)
This patch aims to resolve all of those issues.
Differential revision: https://reviews.llvm.org/D134783
This feature implements support for making entries in the exception section
on XCOFF on the direct assembly path using the ".except" pseudo-op. It also
provides functionality to lower entries (comprised of language and reason
codes) into the exception section through the use of annotation metadata
attached to llvm.ppc.trap/trapd/tw/tdw intrinsics. Integrated assembler
support will be provided in another review. https://reviews.llvm.org/D133030
needs to merge first for LIT tests
Reviewed By: shchenz, RKSimon
Differential Revision: https://reviews.llvm.org/D132146
Add isNotDuplicable to CTRLoop pseudo instructions, to avoid other pass
such as early-tailduplication break the loop structure by duplicating
pseudo instructions.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D132738
This patch fixes the following two bugs in `PPCInstrInfo::isSignOrZeroExtended` helper, which is used from sign-/zero-extension elimination in PPCMIPeephole pass.
- Registers defined by load with update (e.g. LBZU) were identified as already sign or zero-extended. But it is true only for the first def (loaded value) and not for the second def (i.e. updated pointer).
- Registers defined by ORIS/XORIS were identified as already sign-extended. But, it is not true for sign extension depending on the immediate (while it is ok for zero extension).
To handle the first case, the parameter for the helpers is changed from `MachineInstr` to a register number to distinguish first and second defs. Also, this patch moves the initialization of PPCMIPeepholePass to allow mir test case.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D40554
Map hardware loop intrinsics loop_decrement and set_loop_iteration
to the new PowerPC pseudo instructions, so that the hardware loop
intrinsics will be expanded to normal cmp+branch form or ctrloop
form based on the CTR register usage on MIR level.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D123366
I don't have any evidence these particular uses are actually causing any
issues, but we should avoid accidentally truncating immediate values
depending on the host.
This patch implements a new way to generate the CTR loops. Now the
intrinsics inserted in hardware loop pass will be mapped to pseudo
instructions and these pseudo instructions will be expanded to CTR
loop or normal compare+branch loop in this post ISEL pass.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D122125
On Power PC we have ISA3.0 for Power 9, ISA3.1 for Power 10.
This patchs adds an ISA for mcpu=future. The idea is to have a placeholder ISA
for work that is experimental and may not be supported by existing ISAs.
Reviewed By: lei
Differential Revision: https://reviews.llvm.org/D126075
This patch implements the following floating point negative absolute value
builtins that required for compatibility with the XL compiler:
```
double __fnabs(double);
float __fnabss(float);
```
These builtins will emit :
- fnabs on PWR6 and below, or if VSX is disabled.
- xsnabsdp on PWR7 and above, if VSX is enabled.
Differential Revision: https://reviews.llvm.org/D125506
Currently the regsiter operand definitions are found in three separate files.
This patch moves all of the definitions into PPCRegisterInfo.td.
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D123543
Add a compiler option and the instructions required to set the
special Data Stream Control Register (DSCR). The special register will
not be set by default.
Original patch by: Muhammad Usman
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D117013
Currently in Clang, we have two types of builtins for fnmsub operation:
one for float/double vector, they'll be transformed into IR operations;
one for float/double scalar, they'll generate corresponding intrinsics.
But for the vector version of builtin, the 3 op chain may be recognized
as expensive by some passes (like early cse). We need some way to keep
the fnmsub form until code generation.
This patch introduces ppc.fnmsub.* intrinsic to unify four fnmsub
intrinsics.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D116015
Currently all of the MMA instructions as well as the MMA related register info
is bundled with the Power 10 instructions. This patch just splits them out.
Reviewed By: lei
Differential Revision: https://reviews.llvm.org/D120515
The Linux kernel build uses absolute expressions suffixed with @lo/@ha
relocations. This currently doesn't work for DS/DQ form instructions and
there is no reason for it not to. It also works with GAS.
This patch allows this as long as the value is a multiple of 4/16
for DS/DQ form.
Differential revision: https://reviews.llvm.org/D115419
Power ISA 3.1 adds xsmaxcqp/xsmincqp for quad-precision type-c max/min selection,
and this opens the opportunity to improve instruction selection on: llvm.maxnum.f128,
llvm.minnum.f128, and select_cc ordered gt/lt and (don't care) gt/lt.
Reviewed By: nemanjai, shchenz, amyk
Differential Revision: https://reviews.llvm.org/D117006
Add support for Return Oriented Programming (ROP) protection for 32 bit.
This patch also adds a testing for AIX on both 64 and 32 bit.
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D111362
This mnemonic has been supported by GAS for years and
it was added to the PowerPC ISA as of ISA 3.1. We will
support the mnemonic to be compatible with GAS.