The result type of the vector extend intrinsics generated by the
BUILD_VECTOR lowering code should match how they are actually defined.
Currently the result type is defaulting to the operand type there. This
can conflict with calls to the same intrinsic from other paths.
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __strlen routine is a millicode implementation;
we use millicode for the strlen function instead of a library call to
improve performance.
The PowerPC changes are caused by shifts created by different IR
operations being CSEd now. This allows consecutive loads to be turned
into vectors earlier. This has effects on the ordering of other combines
and legalizations. This leads to some improvements and some regressions.
This was being used for 2 different purposes.
The TargetMachine constructor prepends +64bit based on isPPC64
triples as a mode switch. The same feature name was also explicitly
added to different processors, making it impossible to perform a pure
feature check for whether 64-bit mode is enabled ir not. i.e.,
checkFeatures("+64bit") would be true even for ppc32 triples.
The comment in tablegen suggests it's relevant to track which processors
support 64-bit mode independently of whether that's the active compile
target, so replace that with a new feature.
Pre-commit test case for exploitation of `xxsel` for ternary operations
of the pattern. This adds support for v4i32, v2i64, v16i8 and v8i16
operand types for the following patterns.
The following are the patterns involved in the change:
```
ternary(A, and(B,C), nor(B,C))
ternary(A, B, nor(B,C))
ternary(A, C, nor(B,C))
ternary(A, xor(B,C), nor(B,C))
ternary(A, not(C), nor(B,C))
ternary(A, not(B), nor(B,C))
ternary(A, nand(B,C), nor(B,C))
ternary(A, or(B,C), eqv(B,C))
ternary(A, nor(B,C), eqv(B,C))
ternary(A, not(C), eqv(B,C))
ternary(A, nand(B,C), eqv(B,C))
ternary(A, and(B,C), not(C))
ternary(A, B, not(C))
ternary(A, xor(B,C), not(C))
ternary(A, or(B,C), not(C))
ternary(A, not(B), not(C))
ternary(A, nand(B,C), not(C))
ternary(A, and(B,C), not(B))
ternary(A, xor(B,C), not(B))
ternary(A, or(B,C), not(B))
ternary(A, nand(B,C), not(B))
ternary(A, B, nand(B,C))
ternary(A, C, nand(B,C))
ternary(A, xor(B,C), nand(B,C))
ternary(A, or(B,C), nand(B,C))
ternary(A, eqv(B,C), nand(B,C))
```
Exploitation of `xxeval` for the above patterns to be added as a follow
up.
Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
`f16` is more functional than just a storage type on the platform,
though it does have some codegen issues [1]. To prepare for future
changes, do the following nonfunctional updates to the existing `half`
test:
* Add tests for passing and returning the type directly.
* Add tests showing bitcast behavior, which is currently incorrect but
serves as a baseline.
* Add tests for `fabs` and `copysign` (trivial operations that shouldn't
require libcalls).
* Add invocations for big-endian and for PPC32.
* Rename the test to `half.ll` to reflect its status, which also matches
other backends.
[1]: https://github.com/llvm/llvm-project/issues/97975
This patch enables `-fpatchable-function-entry` on PPC64 little-endian
Linux. It is mutually exclusive with existing XRay instrumentation on
this target.
NFC patch to add the flags -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr
to the following test files
```
llvm/test/CodeGen/PowerPC/recipest.ll
llvm/test/CodeGen/PowerPC/setcc-logic.ll
llvm/test/CodeGen/PowerPC/vector-popcnt-128-ult-ugt.ll
```
Created this PR based on this discussion:
https://github.com/llvm/llvm-project/pull/151971#issuecomment-3234090675
Co-authored-by: himadhith <himadhith.v@ibm.com>
The patch add branch hint for AtomicExpandImpl::expandAtomicCmpXchg, For
example: in PowerPC, it support branch hint as
```
loop:
lwarx r6,0,r3 # load and reserve
cmpw r4,r6 #1st 2 operands equal? bne- exit #skip if not
bne- exit #skip if not
stwcx. r5,0,r3 #store new value if still res’ved bne- loop #loop if lost reservation
bne- loop #loop if lost reservation
exit:
mr r4,r6 #return value from storage
```
`-` hints not taken,
`+` hints taken,
NFC patch to add the flags `-ppc-asm-full-reg-names
--ppc-vsr-nums-as-vr` to the test file
`llvm/test/CodeGen/PowerPC/check-zero-vector.ll`.
Created this PR based on this discussion:
https://github.com/llvm/llvm-project/pull/151971#issuecomment-3234090675
Co-authored-by: himadhith <himadhith.v@ibm.com>
Co-authored-by: Lei Huang <lei@ca.ibm.com>
This change implements a patfrag based pattern matching ~dag combiner~
that combines consecutive `VSRO (Vector Shift Right Octet)` and `VSR
(Vector Shift Right)` instructions into a single `VSRQ (Vector Shift
Right Quadword)` instruction on Power10+ processors.
Vector right shift operations like `vec_srl(vec_sro(input, byte_shift),
bit_shift)` generate two separate instructions `(VSRO + VSR)` when they
could be optimised into a single `VSRQ `instruction that performs the
equivalent operation.
```
vsr(vsro (input, vsro_byte_shift), vsr_bit_shift) to vsrq(input, vsrq_bit_shift)
where vsrq_bit_shift = (vsro_byte_shift * 8) + vsr_bit_shift
```
Note:
```
vsro : Vector Shift Right by Octet VX-form
- vsro VRT, VRA, VRB
- The contents of VSR[VRA+32] are shifted right by the number of bytes specified in bits 121:124 of VSR[VRB+32].
- Bytes shifted out of byte 15 are lost.
- Zeros are supplied to the vacated bytes on the left.
- The result is placed into VSR[VRT+32].
vsr : Vector Shift Right VX-form
- vsr VRT, VRA, VRB
- The contents of VSR[VRA+32] are shifted right by the number of bits specified in bits 125:127 of VSR[VRB+32]. 3 bits.
- Bits shifted out of bit 127 are lost.
- Zeros are supplied to the vacated bits on the left.
- The result is place into VSR[VRT+32], except if, for any byte element in VSR[VRB+32], the low-order 3 bits are not equal to the shift amount, then VSR[VRT+32] is undefined.
vsrq : Vector Shift Right Quadword VX-form
- vsrq VRT,VRA,VRB
- Let src1 be the contents of VSR[VRA+32]. Let src2 be the contents of VSR[VRB+32].
- src1 is shifted right by the number of bits specified in the low-order 7 bits of src2.
- Bits shifted out the least-significant bit are lost.
- Zeros are supplied to the vacated bits on the left.
- The result is placed into VSR[VRT+32].
```
---------
Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
Adds support for ternary equivalent operations of the form `ternary(A,
X, B)` and `ternary(A, X, C)` where `X=[and(B,C)| nor(B,C)| eqv(B,C)|
nand(B,C)]`.
The following are the patterns involved and the imm values:
| **Operation** | **Immediate Value** |
|----------------------------|---------------------|
| ternary(A, and(B,C), B) | 49 |
| ternary(A, nor(B,C), B) | 56 |
| ternary(A, eqv(B,C), B) | 57 |
| ternary(A, nand(B,C), B) | 62 |
| | |
| ternary(A, and(B,C), C) | 81 |
| ternary(A, nor(B,C), C) | 88 |
| ternary(A, eqv(B,C), C) | 89 |
| ternary(A, nand(B,C), C) | 94 |
eg. `xxeval XT, XA, XB, XC, 49`
- performs `XA ? and(XB, XC) : B`and places the result in `XT`.
This is the continuation of [[PowerPC] Exploit xxeval instruction for
ternary patterns - ternary(A, X,
and(B,C))](https://github.com/llvm/llvm-project/pull/141733#top).
---------
Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
This patch updates PPCInstrInfo::copyPhysReg to support DMR and WACC
register classes and extends the PPCVSXCopy pass to handle specific WACC
copy patterns.
The scalarized IR was written before improvements to SLP / cost models
ensured that the abs intrinsic was easily vectorizable
opt -O3 : https://zig.godbolt.org/z/39T65vh8M
Now that it is we need a more useful llc test
This pseudo-instruction emits a local `bl` writing LR, so that must be
saved and restored for the function to return to the right place. If
not, we'll return to the inline `.long` that the `bl` stepped over.
This fixes the `SIGILL` seen in rayon-rs/rayon#1268.
Add entries for_stack_chk_guard, __ssp_canary_word, __security_cookie,
and __guard_local. As far as I can tell these are all just different
names for the same shaped functionality on different systems.
These aren't really functions, but special global variable names. They
should probably be treated the same way; all the same contexts that
need to know about emittable function names also need to know about
this. This avoids a special case check in IRSymtab.
This isn't a complete change, there's a lot more cleanup which
should be done. The stack protector configuration system is a
complete mess. There are multiple overlapping controls, used in
3 different places. Some of the target control implementations overlap
with conditions used in the emission points, and some use correlated
but not identical conditions in different contexts.
i.e. useLoadStackGuardNode, getIRStackGuard, getSSPStackGuardCheck and
insertSSPDeclarations are all used in inconsistent ways so I don't know
if I've tracked the intention of the system correctly.
The PowerPC test change is a bug fix on linux. Previously the manual
conditions were based around !isOSOpenBSD, which is not the condition
where __stack_chk_guard are used. Now getSDagStackGuard returns the
proper global reference, resulting in LOAD_STACK_GUARD getting a
MachineMemOperand which allows scheduling.
When moving fcti results from float registers to normal registers
through memory, even though MPI was adjusted to account for endianness,
FIPtr was always adjusted for big-endian, which caused loads of wrong
half of a value in little-endian mode.
Support the following BCD format conversion builtins for PowerPC.
- `__builtin_bcdcopysign` – Conversion that returns the decimal value of
the first parameter combined with the sign code of the second parameter.
`
- `__builtin_bcdsetsign` – Conversion that sets the sign code of the
input parameter in packed decimal format.
> Note: This built-in function is valid only when all following
conditions are met:
> -qarch is set to utilize POWER9 technology.
> The bcd.h file is included.
## Prototypes
```c
vector unsigned char __builtin_bcdcopysign(vector unsigned char, vector unsigned char);
vector unsigned char __builtin_bcdsetsign(vector unsigned char, unsigned char);
```
## Usage Details
`__builtin_bcdsetsign`: Returns the packed decimal value of the first
parameter combined with the sign code.
The sign code is set according to the following rules:
- If the packed decimal value of the first parameter is positive, the
following rules apply:
- If the second parameter is 0, the sign code is set to 0xC.
- If the second parameter is 1, the sign code is set to 0xF.
- If the packed decimal value of the first parameter is negative, the
sign code is set to 0xD.
> notes:
> The second parameter can only be 0 or 1.
> You can determine whether a packed decimal value is positive or
negative as follows:
> - Packed decimal values with sign codes **0xA, 0xC, 0xE, or 0xF** are
interpreted as positive.
> - Packed decimal values with sign codes **0xB or 0xD** are interpreted
as negative.
---------
Co-authored-by: Aditi-Medhane <aditi.medhane@ibm.com>
780054d3ff18075a6bc433029f336931792b1d2d added support for
`ISD::AssertNoFPClass`.
This ISD node can be used with the `ppc_fp128` type, which is really
just two `f64s` and requires expanding when used with
`ISD::AssertNoFPClass`. Without the support for expanding the result, we
get an assertion because the legalizer does not know how to expand the
results of `ppc_fp128` with `ISD::AssertNoFPClass`.
```
ExpandFloatResult #0: t7: ppcf128 = AssertNoFPClass t5, TargetConstant:i32<3>
LLVM ERROR: Do not know how to expand the result of this operator!
```
Thus, this patch aims to add support for the expand so we no longer
assert.
This fixes#151375.
In review of bbde6b, I had originally proposed that we support the
legacy text format. As review evolved, it bacame clear this had been a
bad idea (too much complexity), but in order to let that patch finally
move forward, I approved the change with the variant. This change undoes
the variant, and updates all the tests to just use the array form.
In the [ [CodeGen] Store call frame size in
MachineBasicBlock](https://reviews.llvm.org/D156113), it mentions When a
basic block has been split in the middle of a call sequence. the call
frame size may not be zero, it need to set the setCallFrameSize for the
new MachineBasicBlock. but in the function `splitMBB(BlockSplitInfo
&BSI)` in the llvm/lib/Target/PowerPC/PPCReduceCRLogicals.cpp , it do
not setCallFrameSzie for the new MachineBasicBlock `NewMBB`, we will
setCallFrameSzie in the patch.
the patch fix the crash mention in
https://github.com/llvm/llvm-project/pull/144594#issuecomment-2993736654
Many backends are missing either all tests for lrint, or specifically
those for f16, which currently crashes for `softPromoteHalf` targets.
For a number of popular backends, do the following:
* Ensure f16, f32, f64, and f128 are all covered
* Ensure both a 32- and 64-bit target are tested, if relevant
* Add `nounwind` to clean up CFI output
* Add a test covering the above if one did not exist
* Always specify the integer type in intrinsic calls
There are quite a few FIXMEs here, especially for `f16`, but much of
this will be resolved in the near future.
If a copy exists between creation of a crbit and a spill, machine-cp
may delete the copy since it seems unaware of the relation between a cr
and crbit. A fix was previously made for the generic ppc64 lowering. It
should be applied to the pwr9 and pwr10 variants too.
Likewise, relax and extend the pwr8 test to verify pwr9 and pwr10
codegen too.
This fixes#143989.
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __memcmp routine is a millicode implementation;
we use millicode for the memcmp function instead of a library call to
improve performance.
We can expand the init intrinsic to create a descriptor for the nested
procedure by combining the entry point and TOC pointer from the global
descriptor with the nest argument. The normal indirect call sequence
then calls the nested procedure through the descriptor like all other
calls. Patch also implements support for a nest parameter by mapping it
to gpr 11.
The patch fixed a bug introduced patch [[PowePC] using MTVSRBMI
instruction instead of constant pool in
power10+](https://github.com/llvm/llvm-project/pull/144084#top).
The issue arose because the layout of vector register elements differs
between little-endian and big-endian modes — specifically, the elements
appear in reverse order. This led to incorrect behavior when loading
constants using MTVSRBMI in big-endian configurations.
NFC patch to clean up extra lines of code in the file
`llvm/test/CodeGen/PowerPC/check-zero-vector.ll` as the current one has
loop unrolled.
Also removing the file `clang/test/CodeGen/PowerPC/check-zero-vector.c`
as the patch affects only the backend.
Co-authored-by: himadhith <himadhith.v@ibm.com>
When arbitrary sized (non-simple type, or non-power of two types)
integers are passed on the stack, these integers are not handled when
lowering formal arguments on AIX as we always assume we will encounter
simple type integers.
However, it is possible for frontends to generate arbitrary sized
immediate values in IR. Specifically in rustc, it will generate an
integer value in LLVM IR for small structures that are less than a
pointer size, which is done for optimization purposes for the Rust ABI.
For example, if a Rust structure of three characters is passed into
function on the stack,
```
struct my_struct {
field1: u8,
field2: u8,
field3: u8,
}
```
This will generate an `i24` type in LLVM IR.
Currently, it is not obvious for the backend to distinguish an integer
versus something that wasn't an integer to begin with (such as a
struct), and the latter case would not have an extend on the parameter.
Thus, this PR allows us to perform a truncation and extend on integers,
both non-simple and simple types.
LICM tries to reassociate GEPs in order to hoist an invariant GEP.
Currently, it also does this in the case where the GEP has a constant
offset.
This is usually undesirable. From a back-end perspective, constant GEPs
are usually free because they can be folded into addressing modes, so
this just increases register pressume. From a middle-end perspective,
keeping constant offsets last in the chain makes it easier to analyze
the relationship between multiple GEPs on the same base, especially
after CSE.
The worst that can happen here is if we start with something like
```
loop {
p + 4*x
p + 4*x + 1
p + 4*x + 2
p + 4*x + 3
}
```
And LICM converts it into:
```
p.1 = p + 1
p.2 = p + 2
p.3 = p + 3
loop {
p + 4*x
p.1 + 4*x
p.2 + 4*x
p.3 + 4*x
}
```
Which is much worse than leaving it for CSE to convert to:
```
loop {
p2 = p + 4*x
p2 + 1
p2 + 2
p2 + 3
}
```
This tries to reland #123632 (previously reverted by commit
6b1db79887df19bc8e8c946108966aa6021c8b87)
This PR aims to fix coalescing of SUBREG_TO_REG when sub-register
liveness tracking is enabled and this is now the so-manieth
reincarnation of this effort :)
This change is needed in order to enable subreg liveness tracking for
AArch64, because without the implicit-def, Machine Copy Propagation
would remove a 'redundant' copy because it doesn't realise that the
top 32-bits of the register are zeroed, which subsequent instructions
rely on.
Changes compared to previous PR:
* Rather than updating all instructions that define the source register
(SrcReg) of the SUBREG_TO_REG, this new approach only updates
instructions
that define SrcReg when they dominate the SUBREG_TO_REG. The live-ranges
are updated accordingly.
## Description
<!--- Title/Description will be Subject/Body of commit message. -->
<!--- Please be concise and limit the subject line to 50 characters, -->
<!--- and wrap the Description at 72 characters. -->
<!--- Describe why this is required, what problem it solves. -->
Adds support for ternary equivalent operations of the form `ternary(A,
X, and(B,C))` where `X=[xor(B,C)| nor(B,C)| eqv(B,C)| not(B)| not(C)]`.
List of `xxeval` equivalent ternary operations added and the
corresponding `imm` value required:
Ternary Operator| Imm Value
--|--
ternary(A, xor(B,C), and(B,C)) | 22
ternary(A, nor(B,C), and(B,C)) | 24
ternary(A, eqv(B,C), and(B,C)) | 25
ternary(A, not(C), and(B,C)) | 26
ternary(A, not(B), and(B,C)) | 28
eg. `xxeval XT,XA,XB,XC,22`
- performs `XA ? xor(XB, XC) : and(XB,XC)`and places the result in `XT`.
Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
These float operations were expanded for scalar f32/f64/f128, but not
for f16 and more problematically, not for vectors. A small subset of
them was separately set to expand for vectors.
Change these to always expand by default, and adjust targets to mark
these as legal where necessary instead.
This is a much safer default, and avoids unnecessary legalization
failures because a target failed to manually mark them as expand.
Fixes https://github.com/llvm/llvm-project/issues/110753.
Fixes https://github.com/llvm/llvm-project/issues/121390.
This is a partial revert of #145939 (I've kept the BUILD_VECTOR(FREEZE(UNDEF), FREEZE(UNDEF), elt2, ...) canonicalization) as we're getting reports of infinite loops (#148084).
The issue appears to be due to deep chains of nodes and how visitFREEZE replaces all instances of an operand with a common frozen version - other users of the original frozen node then get added back to the worklist but might no longer be able to confirm a node isn't poison due to recursion depth limits on isGuaranteedNotToBeUndefOrPoison.
The issue still exists with the old implementation but by only allowing a single frozen operand it helps prevent cases of interdependent frozen nodes.
I'm still working on supporting multiple operands as its critical for topological DAG handling but need to get a fix in for trunk and 21.x.
Fixes#148084
PPCSubtarget is not always initialized, depending on which passes are
running, and in our downstream fork, -enable-matrix is the default
configuration (regardless of whether matrix intrinsics are present in
the IR), which triggers a fatal error in builtins-ppc-fpconstrained.c.
7dce16f69dc3e26cb74d5ad38b0648a6f47f9640 removed a libcall for
STACKPROTECTOR_CHECK_FAIL from OpenBSD but added no tests.
Add a basic test copied from RISCV into all the backends on
the OpenBSD page of supported architectures before I potentially
break in in RuntimeLibcalls refactoring.
Many tests for floating point libcalls include CFI directives, which
isn't needed for the purpose of these tests. Mark some of the relevant
test functions `nounwind` in order to remove this noise.
NFC patch to add testcase for locking down the support of Zero vector
comparisons using the `vcmpgtuh (vector compare greater than unsigned
halfword)` instruction.
Currently `vcmpequh (vector compare equal unsigned halfword)` is in use.
---------
Co-authored-by: himadhith <himadhith.v@ibm.com>
Co-authored-by: Tony Varghese <tonypalampalliyil@gmail.com>
Use `emitValueToAlignment` as the section does not contain code.
`emitCodeAlignment` would lead to ALIGN relocations on RISC-V and
LoongArch with linker relaxation.
In addition, change the alignment to wordsize, sufficient for the
runtime requirement (`XRayFunctionSledIndex`).
Related to #147322