When moving fcti results from float registers to normal registers
through memory, even though MPI was adjusted to account for endianness,
FIPtr was always adjusted for big-endian, which caused loads of wrong
half of a value in little-endian mode.
Support the following BCD format conversion builtins for PowerPC.
- `__builtin_bcdcopysign` – Conversion that returns the decimal value of
the first parameter combined with the sign code of the second parameter.
`
- `__builtin_bcdsetsign` – Conversion that sets the sign code of the
input parameter in packed decimal format.
> Note: This built-in function is valid only when all following
conditions are met:
> -qarch is set to utilize POWER9 technology.
> The bcd.h file is included.
## Prototypes
```c
vector unsigned char __builtin_bcdcopysign(vector unsigned char, vector unsigned char);
vector unsigned char __builtin_bcdsetsign(vector unsigned char, unsigned char);
```
## Usage Details
`__builtin_bcdsetsign`: Returns the packed decimal value of the first
parameter combined with the sign code.
The sign code is set according to the following rules:
- If the packed decimal value of the first parameter is positive, the
following rules apply:
- If the second parameter is 0, the sign code is set to 0xC.
- If the second parameter is 1, the sign code is set to 0xF.
- If the packed decimal value of the first parameter is negative, the
sign code is set to 0xD.
> notes:
> The second parameter can only be 0 or 1.
> You can determine whether a packed decimal value is positive or
negative as follows:
> - Packed decimal values with sign codes **0xA, 0xC, 0xE, or 0xF** are
interpreted as positive.
> - Packed decimal values with sign codes **0xB or 0xD** are interpreted
as negative.
---------
Co-authored-by: Aditi-Medhane <aditi.medhane@ibm.com>
780054d3ff18075a6bc433029f336931792b1d2d added support for
`ISD::AssertNoFPClass`.
This ISD node can be used with the `ppc_fp128` type, which is really
just two `f64s` and requires expanding when used with
`ISD::AssertNoFPClass`. Without the support for expanding the result, we
get an assertion because the legalizer does not know how to expand the
results of `ppc_fp128` with `ISD::AssertNoFPClass`.
```
ExpandFloatResult #0: t7: ppcf128 = AssertNoFPClass t5, TargetConstant:i32<3>
LLVM ERROR: Do not know how to expand the result of this operator!
```
Thus, this patch aims to add support for the expand so we no longer
assert.
This fixes#151375.
In review of bbde6b, I had originally proposed that we support the
legacy text format. As review evolved, it bacame clear this had been a
bad idea (too much complexity), but in order to let that patch finally
move forward, I approved the change with the variant. This change undoes
the variant, and updates all the tests to just use the array form.
In the [ [CodeGen] Store call frame size in
MachineBasicBlock](https://reviews.llvm.org/D156113), it mentions When a
basic block has been split in the middle of a call sequence. the call
frame size may not be zero, it need to set the setCallFrameSize for the
new MachineBasicBlock. but in the function `splitMBB(BlockSplitInfo
&BSI)` in the llvm/lib/Target/PowerPC/PPCReduceCRLogicals.cpp , it do
not setCallFrameSzie for the new MachineBasicBlock `NewMBB`, we will
setCallFrameSzie in the patch.
the patch fix the crash mention in
https://github.com/llvm/llvm-project/pull/144594#issuecomment-2993736654
Many backends are missing either all tests for lrint, or specifically
those for f16, which currently crashes for `softPromoteHalf` targets.
For a number of popular backends, do the following:
* Ensure f16, f32, f64, and f128 are all covered
* Ensure both a 32- and 64-bit target are tested, if relevant
* Add `nounwind` to clean up CFI output
* Add a test covering the above if one did not exist
* Always specify the integer type in intrinsic calls
There are quite a few FIXMEs here, especially for `f16`, but much of
this will be resolved in the near future.
If a copy exists between creation of a crbit and a spill, machine-cp
may delete the copy since it seems unaware of the relation between a cr
and crbit. A fix was previously made for the generic ppc64 lowering. It
should be applied to the pwr9 and pwr10 variants too.
Likewise, relax and extend the pwr8 test to verify pwr9 and pwr10
codegen too.
This fixes#143989.
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __memcmp routine is a millicode implementation;
we use millicode for the memcmp function instead of a library call to
improve performance.
We can expand the init intrinsic to create a descriptor for the nested
procedure by combining the entry point and TOC pointer from the global
descriptor with the nest argument. The normal indirect call sequence
then calls the nested procedure through the descriptor like all other
calls. Patch also implements support for a nest parameter by mapping it
to gpr 11.
The patch fixed a bug introduced patch [[PowePC] using MTVSRBMI
instruction instead of constant pool in
power10+](https://github.com/llvm/llvm-project/pull/144084#top).
The issue arose because the layout of vector register elements differs
between little-endian and big-endian modes — specifically, the elements
appear in reverse order. This led to incorrect behavior when loading
constants using MTVSRBMI in big-endian configurations.
NFC patch to clean up extra lines of code in the file
`llvm/test/CodeGen/PowerPC/check-zero-vector.ll` as the current one has
loop unrolled.
Also removing the file `clang/test/CodeGen/PowerPC/check-zero-vector.c`
as the patch affects only the backend.
Co-authored-by: himadhith <himadhith.v@ibm.com>
When arbitrary sized (non-simple type, or non-power of two types)
integers are passed on the stack, these integers are not handled when
lowering formal arguments on AIX as we always assume we will encounter
simple type integers.
However, it is possible for frontends to generate arbitrary sized
immediate values in IR. Specifically in rustc, it will generate an
integer value in LLVM IR for small structures that are less than a
pointer size, which is done for optimization purposes for the Rust ABI.
For example, if a Rust structure of three characters is passed into
function on the stack,
```
struct my_struct {
field1: u8,
field2: u8,
field3: u8,
}
```
This will generate an `i24` type in LLVM IR.
Currently, it is not obvious for the backend to distinguish an integer
versus something that wasn't an integer to begin with (such as a
struct), and the latter case would not have an extend on the parameter.
Thus, this PR allows us to perform a truncation and extend on integers,
both non-simple and simple types.
LICM tries to reassociate GEPs in order to hoist an invariant GEP.
Currently, it also does this in the case where the GEP has a constant
offset.
This is usually undesirable. From a back-end perspective, constant GEPs
are usually free because they can be folded into addressing modes, so
this just increases register pressume. From a middle-end perspective,
keeping constant offsets last in the chain makes it easier to analyze
the relationship between multiple GEPs on the same base, especially
after CSE.
The worst that can happen here is if we start with something like
```
loop {
p + 4*x
p + 4*x + 1
p + 4*x + 2
p + 4*x + 3
}
```
And LICM converts it into:
```
p.1 = p + 1
p.2 = p + 2
p.3 = p + 3
loop {
p + 4*x
p.1 + 4*x
p.2 + 4*x
p.3 + 4*x
}
```
Which is much worse than leaving it for CSE to convert to:
```
loop {
p2 = p + 4*x
p2 + 1
p2 + 2
p2 + 3
}
```
This tries to reland #123632 (previously reverted by commit
6b1db79887df19bc8e8c946108966aa6021c8b87)
This PR aims to fix coalescing of SUBREG_TO_REG when sub-register
liveness tracking is enabled and this is now the so-manieth
reincarnation of this effort :)
This change is needed in order to enable subreg liveness tracking for
AArch64, because without the implicit-def, Machine Copy Propagation
would remove a 'redundant' copy because it doesn't realise that the
top 32-bits of the register are zeroed, which subsequent instructions
rely on.
Changes compared to previous PR:
* Rather than updating all instructions that define the source register
(SrcReg) of the SUBREG_TO_REG, this new approach only updates
instructions
that define SrcReg when they dominate the SUBREG_TO_REG. The live-ranges
are updated accordingly.
## Description
<!--- Title/Description will be Subject/Body of commit message. -->
<!--- Please be concise and limit the subject line to 50 characters, -->
<!--- and wrap the Description at 72 characters. -->
<!--- Describe why this is required, what problem it solves. -->
Adds support for ternary equivalent operations of the form `ternary(A,
X, and(B,C))` where `X=[xor(B,C)| nor(B,C)| eqv(B,C)| not(B)| not(C)]`.
List of `xxeval` equivalent ternary operations added and the
corresponding `imm` value required:
Ternary Operator| Imm Value
--|--
ternary(A, xor(B,C), and(B,C)) | 22
ternary(A, nor(B,C), and(B,C)) | 24
ternary(A, eqv(B,C), and(B,C)) | 25
ternary(A, not(C), and(B,C)) | 26
ternary(A, not(B), and(B,C)) | 28
eg. `xxeval XT,XA,XB,XC,22`
- performs `XA ? xor(XB, XC) : and(XB,XC)`and places the result in `XT`.
Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
These float operations were expanded for scalar f32/f64/f128, but not
for f16 and more problematically, not for vectors. A small subset of
them was separately set to expand for vectors.
Change these to always expand by default, and adjust targets to mark
these as legal where necessary instead.
This is a much safer default, and avoids unnecessary legalization
failures because a target failed to manually mark them as expand.
Fixes https://github.com/llvm/llvm-project/issues/110753.
Fixes https://github.com/llvm/llvm-project/issues/121390.
This is a partial revert of #145939 (I've kept the BUILD_VECTOR(FREEZE(UNDEF), FREEZE(UNDEF), elt2, ...) canonicalization) as we're getting reports of infinite loops (#148084).
The issue appears to be due to deep chains of nodes and how visitFREEZE replaces all instances of an operand with a common frozen version - other users of the original frozen node then get added back to the worklist but might no longer be able to confirm a node isn't poison due to recursion depth limits on isGuaranteedNotToBeUndefOrPoison.
The issue still exists with the old implementation but by only allowing a single frozen operand it helps prevent cases of interdependent frozen nodes.
I'm still working on supporting multiple operands as its critical for topological DAG handling but need to get a fix in for trunk and 21.x.
Fixes#148084
PPCSubtarget is not always initialized, depending on which passes are
running, and in our downstream fork, -enable-matrix is the default
configuration (regardless of whether matrix intrinsics are present in
the IR), which triggers a fatal error in builtins-ppc-fpconstrained.c.
7dce16f69dc3e26cb74d5ad38b0648a6f47f9640 removed a libcall for
STACKPROTECTOR_CHECK_FAIL from OpenBSD but added no tests.
Add a basic test copied from RISCV into all the backends on
the OpenBSD page of supported architectures before I potentially
break in in RuntimeLibcalls refactoring.
Many tests for floating point libcalls include CFI directives, which
isn't needed for the purpose of these tests. Mark some of the relevant
test functions `nounwind` in order to remove this noise.
NFC patch to add testcase for locking down the support of Zero vector
comparisons using the `vcmpgtuh (vector compare greater than unsigned
halfword)` instruction.
Currently `vcmpequh (vector compare equal unsigned halfword)` is in use.
---------
Co-authored-by: himadhith <himadhith.v@ibm.com>
Co-authored-by: Tony Varghese <tonypalampalliyil@gmail.com>
Use `emitValueToAlignment` as the section does not contain code.
`emitCodeAlignment` would lead to ALIGN relocations on RISC-V and
LoongArch with linker relaxation.
In addition, change the alignment to wordsize, sufficient for the
runtime requirement (`XRayFunctionSledIndex`).
Related to #147322
The instruction MTVSRBMI set 0x00(or 0xFF) to each byte of VSR based on
the bits mask. Using the instruction instead of constant pool can reduce
the asm code size and instructions in power10.
Pre-commit test case for exploitation of `xxsel` for ternary operations
of the pattern. This adds support for `v4i32`, `v2i64`, `v16i8` and
`v8i16` operand types for the following patterns.
```
ternary(A, X, and(B,C))
ternary(A, X, B)
ternary(A, X, C)
ternary(A, X, xor(B,C))
ternary(A,X,or(B,C))
```
Exploitation of xxeval to be added later.
Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
This PR resolves https://github.com/llvm/llvm-project/issues/144513
The modification include five pattern :
1.vselect Cond, 0, 0 → 0
2.vselect Cond, -1, 0 → bitcast Cond
3.vselect Cond, -1, x → or Cond, x
4.vselect Cond, x, 0 → and Cond, x
5.vselect Cond, 000..., X -> andn Cond, X
1-4 have been migrated to DAGCombine. 5 still in x86 code.
The reason is that you cannot use the andn instruction directly in
DAGCombine, you can only use and+xor, which will introduce optimization
order issues. For example, in the x86 backend, select Cond, 0, x →
(~Cond) & x, the backend will first check whether the cond node of
(~Cond) is a setcc node. If so, it will modify the comparison operator
of the condition.So the x86 backend cannot complete the optimization of
andn.In short, I think it is a better choice to keep the pattern of
vselect Cond, 000..., X instead of and+xor in combineDAG.
For commit, the first is code changes and x86 test(note 1), the second
is tests in other backend(node 2).
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
Always try to fold freeze(op(....)) -> op(freeze(),freeze(),freeze(),...).
This patch proposes we drop the opt-in limit for opcodes that are allowed to push a freeze through the op to freeze all its operands, through the tree towards the roots.
I'm struggling to find a strong reason for this limit apart from the DAG freeze handling being immature for so long - as we've improved coverage in canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison it looks like the regressions are not as severe.
Hopefully this will help some of the regression issues in #143102 etc.
Remove `UnsafeFPMath` in `visitFMULForFMADistributiveCombine`,
`visitFSUBForFMACombine` and `visitFDIV`.
All affected tests are fixed by add fast math flags manually.
Propagate fast math flags when lowering fdiv in NVPTX backend, so it can
produce optimized dag when `unsafe-fp-math` is absent.
The insertion point of COPY isn't always optimal and could eventually
lead to a worse block layout, see the regression test in the first
commit.
This change affects many architectures but the amount of total
instructions in the test cases seems too be slightly lower.
LBARX loads a byte from memory into a register, automatically setting
the remaining bits of the register to zero. If a subsequent RLWINM
instruction is used to clear those same bits (which LBARX has already
set to zero), the RLWINM is redundant and can be eliminated.
these redundant clear instructions are introduced by 85a9f2e14859b.
Support the following packed BCD builtins for PowerPC.
```
__builtin_national2packed - Conversion of National format to Packed decimal format.
__builtin_packed2national - Conversion of Packed decimal format to national format.
__builtin_packed2zoned - Conversion of Packed decimal format to Zoned decimal format.
__builtin_zoned2packed - Conversion of Zoned decimal format to Packed decimal format.
```
### Prototypes:
`vector unsigned char __builtin_national2packed(vector unsigned char a,
unsigned char b);`
`vector unsigned char __builtin_packed2zoned(vector unsigned char,
unsigned char);`
`vector unsigned char __builtin_zoned2packed(vector unsigned char,
unsigned char);`
The condition for the 2nd parameter is consistent over all the 3
prototypes (0 or 1 only).
`vector unsigned char __builtin_packed2national(vector unsigned char);`
Co-authored-by: himadhith <himadhith.v@ibm.com>
Co-authored-by: Tony Varghese <tonypalampalliyil@gmail.com>
Currently, the query assumes that a single undef byte implies the rest of
the `EltSize - 1` bytes are undefs, but that's not always true.
e.g. isSplatShuffleMask(
<0,1,2,3,4,5,6,7,undef,undef,undef,undef,0,1,2,3>, 8) should return
false.
---------
Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>