4160 Commits

Author SHA1 Message Date
RolandF77
1eb575dcae
[PowerPC] Fix vector extend result types in BUILD_VECTOR lowering (#159398)
The result type of the vector extend intrinsics generated by the
BUILD_VECTOR lowering code should match how they are actually defined.
Currently the result type is defaulting to the operand type there. This
can conflict with calls to the same intrinsic from other paths.
2025-09-19 10:43:22 -04:00
zhijian lin
be6c4d933d
[PowerPC] using milicode call for strlen instead of lib call (#153600)
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __strlen routine is a millicode implementation;
we use millicode for the strlen function instead of a library call to
improve performance.
2025-09-19 10:02:21 -04:00
Paul Walker
b7e4edca3d
[LLVM][CodeGen] Update PPCFastISel::SelectRet for ConstantInt based vectors. (#159331)
The current implementation assumes ConstantInt return values are scalar,
which is not true when use-constant-int-for-fixed-length-splat is
enabled.
2025-09-19 13:15:57 +01:00
Craig Topper
f209d63b04
[SelectionDAGBuilder][PPC] Use getShiftAmountConstant. (#158400)
The PowerPC changes are caused by shifts created by different IR
operations being CSEd now. This allows consecutive loads to be turned
into vectors earlier. This has effects on the ordering of other combines
and legalizations. This leads to some improvements and some regressions.
2025-09-16 10:26:49 -07:00
Lei Huang
b22448c9ba
[PowerPC] Add intrinsic definition for load and store with Right Length Left-justified (#148873) 2025-09-16 12:36:28 -04:00
Matt Arsenault
e5bbaa9c8f
PPC: Split 64bit target feature into 64bit and 64bit-support (#157206)
This was being used for 2 different purposes.

The TargetMachine constructor prepends +64bit based on isPPC64
triples as a mode switch. The same feature name was also explicitly
added to different processors, making it impossible to perform a pure
feature check for whether 64-bit mode is enabled or not. i.e.,
checkFeatures("+64bit") would be true even for ppc32 triples.

The comment in tablegen suggests it's relevant to track which processors
support 64-bit mode independently of whether that's the active compile
target, so replace that with a new feature.
2025-09-16 12:43:53 +09:00
zhijian lin
4bf0001c07
[PowerPC][NFC] Pre-commit test case: Implement a more efficient memcmp in cases where the length is known (#158367)
The newly added test case will be used to verify a more efficient memcmp
in cases where the length is known.
2025-09-15 10:26:01 -04:00
Tony Varghese
30010f49ca
[NFC][PowerPC] Pre-commit testcases for locking down the xxsel instructions for ternary(A, X, eqv(B,C)), ternary(A, X, not(C)), ternary(A, X, not(B)), ternary(A, X, nand(B,C)) and ternary(A, X, nor(B,C)) patterns (#158091)
Pre-commit test case for exploitation of `xxsel` for ternary operations
of the pattern. This adds support for v4i32, v2i64, v16i8 and v8i16
operand types for the following patterns.

The following are the patterns involved in the change:
```
ternary(A,  and(B,C),   nor(B,C))
ternary(A,  B,          nor(B,C))
ternary(A,  C,          nor(B,C))
ternary(A,  xor(B,C),   nor(B,C))
ternary(A,  not(C),     nor(B,C))
ternary(A,  not(B),     nor(B,C))
ternary(A,  nand(B,C),  nor(B,C))

ternary(A,  or(B,C),    eqv(B,C))
ternary(A,  nor(B,C),   eqv(B,C))
ternary(A,  not(C),     eqv(B,C))
ternary(A,  nand(B,C),  eqv(B,C))

ternary(A,  and(B,C),   not(C))	   
ternary(A,  B,          not(C))	   
ternary(A,  xor(B,C),   not(C))	   
ternary(A,  or(B,C),    not(C))	   
ternary(A,  not(B),     not(C))	   
ternary(A,  nand(B,C),  not(C))	   

ternary(A,  and(B,C),   not(B))	   
ternary(A,  xor(B,C),   not(B))	   
ternary(A,  or(B,C),    not(B))	   
ternary(A,  nand(B,C),  not(B))	   

ternary(A,  B,          nand(B,C))
ternary(A,  C,          nand(B,C))
ternary(A,  xor(B,C),   nand(B,C))
ternary(A,  or(B,C),    nand(B,C))
ternary(A,  eqv(B,C),   nand(B,C))
```
Exploitation of `xxeval` for the above patterns to be added as a follow
up.

Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
2025-09-12 09:36:37 +05:30
Trevor Gross
a975e64239
[PowerPC] Extend and update the test for half support (NFC) (#152625)
`f16` is more functional than just a storage type on the platform,
though it does have some codegen issues [1]. To prepare for future
changes, do the following nonfunctional updates to the existing `half`
test:

* Add tests for passing and returning the type directly.
* Add tests showing bitcast behavior, which is currently incorrect but
serves as a baseline.
* Add tests for `fabs` and `copysign` (trivial operations that shouldn't
require libcalls).
* Add invocations for big-endian and for PPC32.
* Rename the test to `half.ll` to reflect its status, which also matches
other backends.

[1]: https://github.com/llvm/llvm-project/issues/97975
2025-09-10 09:03:29 +00:00
Maryam Moghadas
2bd0d770af
[PowerPC] Support -fpatchable-function-entry on PPC64LE (#151569)
This patch enables `-fpatchable-function-entry` on PPC64 little-endian
Linux. It is mutually exclusive with existing XRay instrumentation on
this target.
2025-09-09 16:43:18 -04:00
Florian Hahn
74ec38fad0
[SCEV] Fold (C * A /u C) -> A, if A is a multiple of C and C a pow-of-2. (#156730)
Alive2 Proof: https://alive2.llvm.org/ce/z/JoHJE9

PR: https://github.com/llvm/llvm-project/pull/156730
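
The fold in the title can be sanity-checked with plain integer arithmetic. The sketch below is illustrative Python, not SCEV code; it reads the expression as `C * (A /u C)` and models 8-bit unsigned values by masking, both of which are assumptions made here for the example.

```python
# Illustrative check only: model i8 unsigned arithmetic by masking.
BITS = 8
MASK = (1 << BITS) - 1

def fold_holds(a: int, c: int) -> bool:
    """True if C * (A /u C) equals A in BITS-bit unsigned arithmetic."""
    return (c * (a // c)) & MASK == a

# Every power-of-2 C and every multiple A of C that fits in BITS bits.
pairs = [(a, c)
         for c in (1 << k for k in range(BITS))
         for a in range(0, 1 << BITS, c)]
all_hold = all(fold_holds(a, c) for a, c in pairs)

# When A is not a multiple of C the division truncates and the fold is wrong.
not_multiple = fold_holds(6, 4)
```

Here `all_hold` covers the case named in the title, while `not_multiple` shows why the "A is a multiple of C" condition is required.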
2025-09-05 08:45:13 +00:00
Himadhith
ffbd616210
[NFC][PowerPC] adding the options for register names and VSR to VR (#157007)
NFC patch to add the flags -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr
to the following test files
```
llvm/test/CodeGen/PowerPC/recipest.ll
llvm/test/CodeGen/PowerPC/setcc-logic.ll
llvm/test/CodeGen/PowerPC/vector-popcnt-128-ult-ugt.ll
```

Created this PR based on this discussion:
https://github.com/llvm/llvm-project/pull/151971#issuecomment-3234090675

Co-authored-by: himadhith <himadhith.v@ibm.com>
2025-09-05 10:27:02 +05:30
zhijian lin
36cb33bbca
support branch hint for AtomicExpandImpl::expandAtomicCmpXchg (#152366)
The patch adds a branch hint for AtomicExpandImpl::expandAtomicCmpXchg.
For example, PowerPC supports branch hints as follows:

```
loop:
    lwarx r6,0,r3   #  load and reserve
    cmpw r4,r6      #1st 2 operands equal?
    bne- exit       #skip if not
    stwcx. r5,0,r3  #store new value if still res’ved
    bne- loop       #loop if lost reservation
exit:
    mr  r4,r6       #return value from storage
```

`-` hints that the branch is not taken;
`+` hints that the branch is taken.
2025-09-02 09:33:28 -04:00
Himadhith
09350bd1c5
[NFC][PowerPC] adding the arguments for register names and VSR to VR (#155991)
NFC patch to add the flags `-ppc-asm-full-reg-names
--ppc-vsr-nums-as-vr` to the test file
`llvm/test/CodeGen/PowerPC/check-zero-vector.ll`.

Created this PR based on this discussion:
https://github.com/llvm/llvm-project/pull/151971#issuecomment-3234090675

Co-authored-by: himadhith <himadhith.v@ibm.com>
Co-authored-by: Lei Huang <lei@ca.ibm.com>
2025-09-01 10:17:14 +05:30
Tony Varghese
3fc1aad65b
[PowerPC] Merge vsr(vsro(input, byte_shift), bit_shift) to vsrq(input, res_bit_shift) (#154388)
This change implements a patfrag based pattern matching ~dag combiner~
that combines consecutive `VSRO (Vector Shift Right Octet)` and `VSR
(Vector Shift Right)` instructions into a single `VSRQ (Vector Shift
Right Quadword)` instruction on Power10+ processors.

Vector right shift operations like `vec_srl(vec_sro(input, byte_shift),
bit_shift)` generate two separate instructions `(VSRO + VSR)` when they
could be optimised into a single `VSRQ` instruction that performs the
equivalent operation.

```
vsr(vsro (input, vsro_byte_shift), vsr_bit_shift) to vsrq(input, vsrq_bit_shift) 
where vsrq_bit_shift = (vsro_byte_shift * 8) + vsr_bit_shift
```

Note:
```
 vsro : Vector Shift Right by Octet VX-form
- vsro VRT, VRA, VRB
- The contents of VSR[VRA+32] are shifted right by the number of bytes specified in bits 121:124 of VSR[VRB+32].
	- Bytes shifted out of byte 15 are lost. 
	- Zeros are supplied to the vacated bytes on the left.
- The result is placed into VSR[VRT+32].

vsr : Vector Shift Right VX-form
- vsr VRT, VRA, VRB
- The contents of VSR[VRA+32] are shifted right by the number of bits specified in bits 125:127 of VSR[VRB+32]. 3 bits.
	- Bits shifted out of bit 127 are lost.
	- Zeros are supplied to the vacated bits on the left.
- The result is placed into VSR[VRT+32], except if, for any byte element in VSR[VRB+32], the low-order 3 bits are not equal to the shift amount, then VSR[VRT+32] is undefined.

vsrq : Vector Shift Right Quadword VX-form
- vsrq VRT,VRA,VRB 
- Let src1 be the contents of VSR[VRA+32]. Let src2 be the contents of VSR[VRB+32]. 
- src1 is shifted right by the number of bits specified in the low-order 7 bits of src2.
	- Bits shifted out the least-significant bit are lost. 
	- Zeros are supplied to the vacated bits on the left. 
	- The result is placed into VSR[VRT+32].
```
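
The merged shift amount can be checked with a small model of the three instructions. This is a sketch in Python, not the patfrag itself: registers are modelled as 128-bit integers, and the shift amounts are passed as plain integers rather than extracted from the bit fields of VRB, which is a simplification made here.

```python
MASK128 = (1 << 128) - 1

def vsro(value: int, byte_shift: int) -> int:
    """Shift a 128-bit value right by whole bytes (0..15), filling with zeros."""
    return (value >> (8 * byte_shift)) & MASK128

def vsr(value: int, bit_shift: int) -> int:
    """Shift a 128-bit value right by 0..7 bits, filling with zeros."""
    return (value >> bit_shift) & MASK128

def vsrq(value: int, bit_shift: int) -> int:
    """Shift a 128-bit value right by 0..127 bits, filling with zeros."""
    return (value >> bit_shift) & MASK128

def merged_shift(vsro_byte_shift: int, vsr_bit_shift: int) -> int:
    """Combined amount from the commit message: byte shift * 8 + bit shift."""
    return vsro_byte_shift * 8 + vsr_bit_shift

sample = 0x0123456789ABCDEFFEDCBA9876543210
equivalent = all(
    vsr(vsro(sample, b), s) == vsrq(sample, merged_shift(b, s))
    for b in range(16)
    for s in range(8)
)
```

The largest combined amount is `15 * 8 + 7 = 127`, which fits in the 7-bit field that `vsrq` reads.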

---------

Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
2025-09-01 10:14:12 +05:30
Tony Varghese
2e7ea9c945
[PowerPC] Exploit xxeval instruction for operations of the form ternary(A,X,B) and ternary(A,X,C). (#152956)
Adds support for ternary equivalent operations of the form `ternary(A,
X, B)` and `ternary(A, X, C)` where `X=[and(B,C)| nor(B,C)| eqv(B,C)|
nand(B,C)]`.

The following are the patterns involved and the imm values:

| **Operation**              | **Immediate Value** |
|----------------------------|---------------------|
| ternary(A,  and(B,C),   B) | 49                  |
| ternary(A,  nor(B,C),   B) | 56                  |
| ternary(A,  eqv(B,C),   B) | 57                  |
| ternary(A,  nand(B,C),  B) | 62                  |
|                            |                     |
| ternary(A,  and(B,C),   C) | 81                  |
| ternary(A,  nor(B,C),   C) | 88                  |
| ternary(A,  eqv(B,C),   C) | 89                  |
| ternary(A,  nand(B,C),  C) | 94                  |

eg.  `xxeval XT, XA, XB, XC, 49` 
- performs `XA ? and(XB, XC) : XB` and places the result in `XT`.
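
The immediates in the table can be reproduced by treating the `xxeval` immediate as an 8-entry truth table over `(A, B, C)`. The sketch below is illustrative Python; the helper names are hypothetical, and the bit ordering (the row `A=B=C=0` is the most significant bit) is an assumption that happens to reproduce every value listed above.

```python
from itertools import product

def xxeval_imm(f) -> int:
    """Build the 8-bit immediate for a 3-input boolean function f(a, b, c).

    Assumed convention: rows are visited from (0,0,0) to (1,1,1) and the
    first row ends up in the most significant bit.
    """
    imm = 0
    for a, b, c in product((0, 1), repeat=3):
        imm = (imm << 1) | (f(a, b, c) & 1)
    return imm

def ternary(a, x, y):
    """Select x where a is 1, y where a is 0."""
    return x if a else y

OPS = {
    "and": lambda b, c: b & c,
    "nor": lambda b, c: 1 - (b | c),
    "eqv": lambda b, c: 1 - (b ^ c),
    "nand": lambda b, c: 1 - (b & c),
}

def imm_for(op_name: str, fallback: str) -> int:
    """Immediate for ternary(A, op(B,C), B) or ternary(A, op(B,C), C)."""
    op = OPS[op_name]
    pick = (lambda b, c: b) if fallback == "B" else (lambda b, c: c)
    return xxeval_imm(lambda a, b, c: ternary(a, op(b, c), pick(b, c)))

immediates = {(name, fb): imm_for(name, fb) for name in OPS for fb in ("B", "C")}
```

Looking up `immediates[("and", "B")]` gives the value from the first table row, and the other seven entries correspond to the remaining rows.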

This is the continuation of [[PowerPC] Exploit xxeval instruction for
ternary patterns - ternary(A, X,
and(B,C))](https://github.com/llvm/llvm-project/pull/141733#top).

---------

Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
2025-09-01 10:13:54 +05:30
paperchalice
19464d951a
[NFC] #155740 post cleanup (#155966)
Remove all "approx-func-fp-math" in tests.
2025-08-29 12:45:38 +08:00
Maryam Moghadas
242d51afe5
[PowerPC] Add DMR and WACC COPY support (#149129)
This patch updates PPCInstrInfo::copyPhysReg to support DMR and WACC
register classes and extends the PPCVSXCopy pass to handle specific WACC
copy patterns.
2025-08-27 11:07:24 -04:00
Simon Pilgrim
6aed01a2a7
[PowerPC] ppc64-P9-vabsd.ll - update v16i8 abdu test now that it vectorizes in the middle-end (#154712)
The scalarized IR was written before improvements to SLP / cost models
ensured that the abs intrinsic was easily vectorizable

opt -O3 : https://zig.godbolt.org/z/39T65vh8M

Now that it is, we need a more useful llc test.
2025-08-27 07:29:30 +00:00
Josh Stone
e6ae4e689c
[PowerPC] Indicate that PPC32PICGOT clobbers LR (#154654)
This pseudo-instruction emits a local `bl` writing LR, so that must be
saved and restored for the function to return to the right place. If
not, we'll return to the inline `.long` that the `bl` stepped over.

This fixes the `SIGILL` seen in rayon-rs/rayon#1268.
2025-08-25 15:31:27 -07:00
RolandF77
d1cbe6ed74
[PowerPC] Add DMF builtins for build and disassemble (#153097)
Add support for PPC Dense Math builtins mma_build_dmr and
mma_disassemble_dmr builtins.
2025-08-25 12:14:55 -04:00
Matt Arsenault
65d12622fa
RuntimeLibcalls: Add entries for stackprotector globals (#154930)
Add entries for __stack_chk_guard, __ssp_canary_word, __security_cookie,
and __guard_local. As far as I can tell these are all just different
names for the same shaped functionality on different systems.

These aren't really functions, but special global variable names. They
should probably be treated the same way; all the same contexts that
need to know about emittable function names also need to know about
this. This avoids a special case check in IRSymtab.

This isn't a complete change, there's a lot more cleanup which
should be done. The stack protector configuration system is a
complete mess. There are multiple overlapping controls, used in
3 different places. Some of the target control implementations overlap
with conditions used in the emission points, and some use correlated
but not identical conditions in different contexts.

i.e. useLoadStackGuardNode, getIRStackGuard, getSSPStackGuardCheck and
insertSSPDeclarations are all used in inconsistent ways so I don't know
if I've tracked the intention of the system correctly.

The PowerPC test change is a bug fix on linux. Previously the manual
conditions were based around !isOSOpenBSD, which is not the condition
where __stack_chk_guard is used. Now getSDagStackGuard returns the
proper global reference, resulting in LOAD_STACK_GUARD getting a
MachineMemOperand which allows scheduling.
2025-08-23 10:21:00 +09:00
DanilaZhebryakov
0a3ee7de9c
[PowerPC] fix bug affecting float to int32 conversion on LE PowerPC (#150194)
When moving fcti results from float registers to normal registers
through memory, even though MPI was adjusted to account for endianness,
FIPtr was always adjusted for big-endian, which caused loads of wrong
half of a value in little-endian mode.
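
The effect of the wrong offset can be reproduced with a byte-level model of the stack slot. This is a hedged sketch in Python, not the lowering code: it assumes an 8-byte slot whose low 32 bits hold the converted integer, and the function names are made up for illustration.

```python
import struct

def low_word_offset(little_endian: bool) -> int:
    """Byte offset of the low 32 bits inside an 8-byte slot."""
    return 0 if little_endian else 4

def load_i32(slot_value: int, little_endian: bool, offset: int) -> int:
    """Store a 64-bit value in the target byte order, then load 32 bits at offset."""
    order = "<" if little_endian else ">"
    slot = struct.pack(order + "Q", slot_value)
    return struct.unpack_from(order + "i", slot, offset)[0]

# Low word holds -7; the upper word holds unrelated bits.
slot_value = (0xDEADBEEF << 32) | (-7 & 0xFFFFFFFF)

big_endian_ok = load_i32(slot_value, False, low_word_offset(False))
little_endian_ok = load_i32(slot_value, True, low_word_offset(True))
# Applying the big-endian offset in little-endian mode reads the wrong half.
little_endian_bug = load_i32(slot_value, True, low_word_offset(False))
```

With the endian-aware offset both modes read `-7`; with the big-endian offset applied unconditionally, the little-endian load returns the upper word instead.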
2025-08-20 12:37:14 +02:00
Aditi Medhane
948abf1bf5
[PowerPC] Add BCDCOPYSIGN and BCDSETSIGN Instruction Support (#144874)
Support the following BCD format conversion builtins for PowerPC.

- `__builtin_bcdcopysign` – Conversion that returns the decimal value of
the first parameter combined with the sign code of the second parameter.
- `__builtin_bcdsetsign` – Conversion that sets the sign code of the
input parameter in packed decimal format.

> Note: This built-in function is valid only when all of the following
conditions are met:
> - -qarch is set to utilize POWER9 technology.
> - The bcd.h file is included.

## Prototypes

```c
vector unsigned char __builtin_bcdcopysign(vector unsigned char, vector unsigned char);
vector unsigned char __builtin_bcdsetsign(vector unsigned char, unsigned char);
```

## Usage Details

`__builtin_bcdsetsign`: Returns the packed decimal value of the first
parameter combined with the sign code.
The sign code is set according to the following rules:
- If the packed decimal value of the first parameter is positive, the
following rules apply:
     - If the second parameter is 0, the sign code is set to 0xC.
     - If the second parameter is 1, the sign code is set to 0xF.
- If the packed decimal value of the first parameter is negative, the
sign code is set to 0xD.
> notes:
>     The second parameter can only be 0 or 1.
> You can determine whether a packed decimal value is positive or
negative as follows:
> - Packed decimal values with sign codes **0xA, 0xC, 0xE, or 0xF** are
interpreted as positive.
> - Packed decimal values with sign codes **0xB or 0xD** are interpreted
as negative.
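
The sign-code rules for `__builtin_bcdsetsign` can be written out as a small decision function. This is an illustrative Python sketch of the rules as listed above, not the builtin's implementation; it works on the 4-bit sign code alone, and the function name is hypothetical.

```python
POSITIVE_SIGN_CODES = {0xA, 0xC, 0xE, 0xF}
NEGATIVE_SIGN_CODES = {0xB, 0xD}

def setsign_code(current_sign: int, second_param: int) -> int:
    """Return the resulting sign code given the input's sign code."""
    if second_param not in (0, 1):
        raise ValueError("the second parameter can only be 0 or 1")
    if current_sign in NEGATIVE_SIGN_CODES:
        # Negative inputs always get 0xD, regardless of the second parameter.
        return 0xD
    if current_sign in POSITIVE_SIGN_CODES:
        # Positive inputs get 0xC or 0xF depending on the second parameter.
        return 0xC if second_param == 0 else 0xF
    raise ValueError("not a valid packed-decimal sign code")

results = {
    (sign, p): setsign_code(sign, p)
    for sign in sorted(POSITIVE_SIGN_CODES | NEGATIVE_SIGN_CODES)
    for p in (0, 1)
}
```

For example, `results[(0xA, 1)]` is `0xF` and `results[(0xB, 0)]` is `0xD`.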

---------

Co-authored-by: Aditi-Medhane <aditi.medhane@ibm.com>
2025-08-19 14:47:27 +05:30
Theodoros Theodoridis
d15b7a83a7
[llvm][LICM] Limit multi-use BOAssociation to FP and Vector (#149829)
Limit the re-association of BOps with multiple users to FP and Vector
arithmetic.
2025-08-14 11:56:55 +01:00
zhijian lin
4936fc5a56
[PowerPC][NFC] Pre-commit test case: use millicode for strlen instead of libcal (#153466)
Add a test case showing that a library call is currently used for strlen.
2025-08-13 16:34:29 -04:00
Amy Kwan
63cc2e390d
[PowerPC][CodeGen] Expand ISD::AssertNoFPClass for ppc_fp128 (#152357)
780054d3ff18075a6bc433029f336931792b1d2d added support for
`ISD::AssertNoFPClass`.

This ISD node can be used with the `ppc_fp128` type, which is really
just two `f64s` and requires expanding when used with
`ISD::AssertNoFPClass`. Without the support for expanding the result, we
get an assertion because the legalizer does not know how to expand the
results of `ppc_fp128` with `ISD::AssertNoFPClass`.
```
ExpandFloatResult #0: t7: ppcf128 = AssertNoFPClass t5, TargetConstant:i32<3>

LLVM ERROR: Do not know how to expand the result of this operator!
```
Thus, this patch aims to add support for the expand so we no longer
assert.

This fixes #151375.
2025-08-13 15:00:32 -04:00
Philip Reames
4d629f9744
[MIR] Remove std::variant from multiple save/restore point handling [nfc] (#153226)
In review of bbde6b, I had originally proposed that we support the
legacy text format. As review evolved, it became clear this had been a
bad idea (too much complexity), but in order to let that patch finally
move forward, I approved the change with the variant. This change undoes
the variant, and updates all the tests to just use the array form.
2025-08-12 11:23:05 -07:00
zhijian lin
598f21e9fc
[PowerPC] need to set CallFrameSize for the pass PPCReduceCRLogicals when insert a new block (#151017)
[[CodeGen] Store call frame size in
MachineBasicBlock](https://reviews.llvm.org/D156113) notes that when a
basic block is split in the middle of a call sequence, the call frame
size may not be zero, so setCallFrameSize must be called on the new
MachineBasicBlock. However, the function `splitMBB(BlockSplitInfo
&BSI)` in llvm/lib/Target/PowerPC/PPCReduceCRLogicals.cpp does not call
setCallFrameSize for the new MachineBasicBlock `NewMBB`. This patch adds
that call.

This patch fixes the crash mentioned in
https://github.com/llvm/llvm-project/pull/144594#issuecomment-2993736654
2025-08-12 20:30:28 +09:00
Trevor Gross
00c4be3c9e
[Test] Add and update tests for lrint/llrint (NFC) (#152662)
Many backends are missing either all tests for lrint, or specifically
those for f16, which currently crashes for `softPromoteHalf` targets.
For a number of popular backends, do the following:

* Ensure f16, f32, f64, and f128 are all covered
* Ensure both a 32- and 64-bit target are tested, if relevant
* Add `nounwind` to clean up CFI output
* Add a test covering the above if one did not exist
* Always specify the integer type in intrinsic calls

There are quite a few FIXMEs here, especially for `f16`, but much of
this will be resolved in the near future.
2025-08-12 09:56:51 +09:00
Paul Murphy
5f864560a6
[PowerPC] fix lowering of SPILL_CRBIT on pwr9 and pwr10 (#146424)
If a copy exists between creation of a crbit and a spill, machine-cp
may delete the copy since it seems unaware of the relation between a cr
and crbit. A fix was previously made for the generic ppc64 lowering. It
should be applied to the pwr9 and pwr10 variants too.

Likewise, relax and extend the pwr8 test to verify pwr9 and pwr10
codegen too.

This fixes #143989.
2025-08-08 09:24:22 +02:00
zhijian lin
093439c688
[PowerPC][AIX] Using milicode for memcmp instead of libcall (#147093)
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __memcmp routine is a millicode implementation;
we use millicode for the memcmp function instead of a library call to
improve performance.
2025-08-07 13:13:56 -04:00
Sean Fertile
ab40909810
Implement the trampoline intrinsics and nest parameter for AIX. (#149388)
We can expand the init intrinsic to create a descriptor for the nested
procedure by combining the entry point and TOC pointer from the global
descriptor with the nest argument. The normal indirect call sequence
then calls the nested procedure through the descriptor like all other
calls. Patch also implements support for a nest parameter by mapping it
to gpr 11.
2025-08-06 12:15:27 -04:00
Simon Pilgrim
c4f6d34674
[DAG] getNode - fold (sext (trunc x)) -> x iff the upper bits are already signbits (#151945)
Similar to what we already do for ZERO_EXTEND/ANY_EXTEND patterns.
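
The condition in the title can be illustrated with fixed-width integer arithmetic. The sketch below is plain Python, not DAG code; it models a 64-bit value truncated to 32 bits, and the helper names and widths are assumptions chosen for the example.

```python
def trunc(x: int, bits: int) -> int:
    """Keep the low `bits` bits."""
    return x & ((1 << bits) - 1)

def sext(x: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend a from_bits-wide value to to_bits (result as unsigned)."""
    sign = 1 << (from_bits - 1)
    x &= (1 << from_bits) - 1
    return ((x ^ sign) - sign) & ((1 << to_bits) - 1)

def num_sign_bits(x: int, bits: int) -> int:
    """Count the leading bits that equal the sign bit, sign bit included."""
    sign = (x >> (bits - 1)) & 1
    count = 0
    for i in range(bits - 1, -1, -1):
        if (x >> i) & 1 != sign:
            break
        count += 1
    return count

def upper_bits_are_sign_bits(x: int, wide: int = 64, narrow: int = 32) -> bool:
    """True if every bit dropped by the trunc repeats the narrow sign bit."""
    return num_sign_bits(x, wide) >= wide - narrow + 1

samples = [0, 5, 0x7FFFFFFF, 0x80000000, 0x100000000,
           (-5) & (2**64 - 1), (-2**31) & (2**64 - 1), 0xFFFF0000FFFF0000]
agree = all(
    upper_bits_are_sign_bits(x) == (sext(trunc(x, 32), 32, 64) == x)
    for x in samples
)
```

On these samples the round trip `sext(trunc(x))` returns `x` exactly when the dropped upper bits are copies of the sign bit.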
2025-08-06 14:55:46 +01:00
zhijian lin
23b3203113
[POWERPC] Fixes an error in the handling of the MTVSRBMI instruction for big-endian (#151565)
The patch fixes a bug introduced by the patch [[PowerPC] using MTVSRBMI
instruction instead of constant pool in
power10+](https://github.com/llvm/llvm-project/pull/144084#top).

The issue arose because the layout of vector register elements differs
between little-endian and big-endian modes — specifically, the elements
appear in reverse order. This led to incorrect behavior when loading
constants using MTVSRBMI in big-endian configurations.
2025-08-06 09:36:37 -04:00
Himadhith
1f1b903a64
[NFC][PowerPC] Cleaning up test file and removing redundant front-end test (#151971)
NFC patch to clean up extra lines of code in the file
`llvm/test/CodeGen/PowerPC/check-zero-vector.ll` as the current one has
loop unrolled.
Also removing the file `clang/test/CodeGen/PowerPC/check-zero-vector.c`
as the patch affects only the backend.

Co-authored-by: himadhith <himadhith.v@ibm.com>
2025-08-06 15:59:47 +05:30
Sander de Smalen
ed5bd23867 Revert "Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (#134408)"
This reverts commit bae8f1336db6a7f3288a7dcf253f2d484743b257.

Some issues were found:
* https://github.com/llvm/llvm-project/issues/151768
* https://github.com/llvm/llvm-project/issues/151592
* https://github.com/llvm/llvm-project/pull/134408#issuecomment-3145468321
* https://github.com/llvm/llvm-project/issues/151888#issuecomment-3149286820

I'll revert this for the time being while I investigate.
2025-08-04 12:07:30 +00:00
Amy Kwan
f48a8da342
[AIX] Handle arbitrary sized integers when lowering formal arguments passed on the stack (#149351)
When arbitrary sized (non-simple type, or non-power of two types)
integers are passed on the stack, these integers are not handled when
lowering formal arguments on AIX as we always assume we will encounter
simple type integers.

However, it is possible for frontends to generate arbitrary sized
immediate values in IR. Specifically in rustc, it will generate an
integer value in LLVM IR for small structures that are less than a
pointer size, which is done for optimization purposes for the Rust ABI.
For example, if a Rust structure of three characters is passed into
a function on the stack,
```
struct my_struct {
  field1: u8,
  field2: u8,
  field3: u8,
}
```
This will generate an `i24` type in LLVM IR.

Currently, it is not easy for the backend to distinguish an integer
from something that wasn't an integer to begin with (such as a
struct), and the latter case would not have an extend on the parameter.
Thus, this PR allows us to perform a truncation and extend on integers,
both non-simple and simple types.
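
A generic truncate-then-extend step for an arbitrary-width integer can be sketched as follows. This is illustrative Python, not the AIX lowering code: the 64-bit register width, the choice between sign and zero extension, and the function name are all assumptions made for the example.

```python
def trunc_then_extend(slot_value: int, value_bits: int,
                      reg_bits: int = 64, signed: bool = False) -> int:
    """Truncate a loaded stack slot to value_bits, then extend to reg_bits."""
    narrow = slot_value & ((1 << value_bits) - 1)   # truncate to iN
    if signed and (narrow >> (value_bits - 1)) & 1:
        narrow -= 1 << value_bits                   # sign-extend
    return narrow & ((1 << reg_bits) - 1)           # zero-extend otherwise

# An i24 argument whose stack slot carries unrelated upper bits.
zero_extended = trunc_then_extend(0xAB123456, 24)
sign_extended = trunc_then_extend(0x00FFFFFF, 24, signed=True)
```

The upper bits of the slot are discarded before the extension, so `zero_extended` is `0x123456` regardless of what the caller left above bit 23.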
2025-08-01 08:01:26 -04:00
Nikita Popov
0a41e7c87e
[LICM] Do not reassociate constant offset GEP (#151492)
LICM tries to reassociate GEPs in order to hoist an invariant GEP.
Currently, it also does this in the case where the GEP has a constant
offset.

This is usually undesirable. From a back-end perspective, constant GEPs
are usually free because they can be folded into addressing modes, so
this just increases register pressure. From a middle-end perspective,
keeping constant offsets last in the chain makes it easier to analyze
the relationship between multiple GEPs on the same base, especially
after CSE.

The worst that can happen here is if we start with something like

```
loop {
   p + 4*x
   p + 4*x + 1
   p + 4*x + 2
   p + 4*x + 3
}
```

And LICM converts it into:

```
p.1 = p + 1
p.2 = p + 2
p.3 = p + 3
loop {
   p + 4*x
   p.1 + 4*x
   p.2 + 4*x
   p.3 + 4*x
}
```

Which is much worse than leaving it for CSE to convert to:
```
loop {
   p2 = p + 4*x
   p2 + 1
   p2 + 2
   p2 + 3
}
```
2025-08-01 09:43:15 +02:00
Sander de Smalen
bae8f1336d
Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (#134408)
This tries to reland #123632 (previously reverted by commit
6b1db79887df19bc8e8c946108966aa6021c8b87)

This PR aims to fix coalescing of SUBREG_TO_REG when sub-register
liveness tracking is enabled and this is now the so-manieth
reincarnation of this effort :)

This change is needed in order to enable subreg liveness tracking for 
AArch64, because without the implicit-def, Machine Copy Propagation
would remove a 'redundant' copy because it doesn't realise that the 
top 32-bits of the register are zeroed, which subsequent instructions
rely on. 

Changes compared to previous PR: 

* Rather than updating all instructions that define the source register
  (SrcReg) of the SUBREG_TO_REG, this new approach only updates
  instructions that define SrcReg when they dominate the SUBREG_TO_REG.
  The live-ranges are updated accordingly.
2025-07-30 14:42:24 +01:00
Tony Varghese
59c3fe6505
[PowerPC] Exploit xxeval instruction for ternary patterns - ternary(A, X, and(B,C)) (#141733)
## Description
Adds support for ternary equivalent operations of the form `ternary(A,
X, and(B,C))` where `X=[xor(B,C)| nor(B,C)| eqv(B,C)| not(B)| not(C)]`.

List of `xxeval` equivalent ternary operations added and the
corresponding `imm` value required:

Ternary Operator| Imm Value
--|--
ternary(A,  xor(B,C), and(B,C))	| 22
ternary(A,  nor(B,C), and(B,C))	| 24
ternary(A,  eqv(B,C), and(B,C))	| 25
ternary(A,  not(C), and(B,C))	| 26
ternary(A,  not(B), and(B,C))	| 28

eg.  `xxeval XT,XA,XB,XC,22` 
- performs `XA ? xor(XB, XC) : and(XB,XC)` and places the result in `XT`.
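
The `xxeval XT,XA,XB,XC,22` example can be checked by applying the immediate bit by bit and comparing against an ordinary bitwise select. The sketch below is illustrative Python on 32-bit words, not the instruction definition; it assumes the immediate is indexed by the bits of `(A, B, C)` with index 0 in the most significant bit, and the helper names are hypothetical.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1

def xxeval(a: int, b: int, c: int, imm: int) -> int:
    """Per bit position, use the bits of (a, b, c) to pick one bit of imm."""
    result = 0
    for i in range(WIDTH):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        result |= ((imm >> (7 - idx)) & 1) << i
    return result

def select(a: int, x: int, y: int) -> int:
    """Bitwise ternary: take x where a has a 1 bit, y where it has a 0 bit."""
    return ((a & x) | (~a & y)) & MASK

a, b, c = 0xFFFF0000, 0x12345678, 0x0F0F0F0F
expected = {
    22: select(a, b ^ c, b & c),            # xor(B,C)
    24: select(a, ~(b | c) & MASK, b & c),  # nor(B,C)
    25: select(a, ~(b ^ c) & MASK, b & c),  # eqv(B,C)
    26: select(a, ~c & MASK, b & c),        # not(C)
    28: select(a, ~b & MASK, b & c),        # not(B)
}
matches = {imm: xxeval(a, b, c, imm) == want for imm, want in expected.items()}
```

Each entry of `matches` corresponds to one row of the table above, evaluated on one arbitrary choice of inputs.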

Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
2025-07-29 22:56:05 +05:30
Nikita Popov
fe0dbe0f29
[CodeGen] More consistently expand float ops by default (#150597)
These float operations were expanded for scalar f32/f64/f128, but not
for f16 and more problematically, not for vectors. A small subset of
them was separately set to expand for vectors.

Change these to always expand by default, and adjust targets to mark
these as legal where necessary instead.

This is a much safer default, and avoids unnecessary legalization
failures because a target failed to manually mark them as expand.

Fixes https://github.com/llvm/llvm-project/issues/110753.
Fixes https://github.com/llvm/llvm-project/issues/121390.
2025-07-28 09:46:00 +02:00
Simon Pilgrim
c37942df00
[DAG] visitFREEZE - limit freezing of multiple operands (#149797)
This is a partial revert of #145939 (I've kept the BUILD_VECTOR(FREEZE(UNDEF), FREEZE(UNDEF), elt2, ...) canonicalization) as we're getting reports of infinite loops (#148084).

The issue appears to be due to deep chains of nodes and how visitFREEZE replaces all instances of an operand with a common frozen version - other users of the original frozen node then get added back to the worklist but might no longer be able to confirm a node isn't poison due to recursion depth limits on isGuaranteedNotToBeUndefOrPoison.

The issue still exists with the old implementation but by only allowing a single frozen operand it helps prevent cases of interdependent frozen nodes.

I'm still working on supporting multiple operands as it's critical for topological DAG handling but need to get a fix in for trunk and 21.x.

Fixes #148084
2025-07-22 15:40:55 +01:00
Guy David
cb6d1bbfcd
[PowerPC] Test SPE incompatibility with VSX (#147184)
PPCSubtarget is not always initialized, depending on which passes are
running, and in our downstream fork, -enable-matrix is the default
configuration (regardless of whether matrix intrinsics are present in
the IR), which triggers a fatal error in builtins-ppc-fpconstrained.c.
2025-07-17 00:29:38 +03:00
Matt Arsenault
3d50e1f3e8
RuntimeLibcalls: Add some tests for OpenBSD stack protectors (#147888)
7dce16f69dc3e26cb74d5ad38b0648a6f47f9640 removed a libcall for
STACKPROTECTOR_CHECK_FAIL from OpenBSD but added no tests.

Add a basic test copied from RISCV into all the backends on
the OpenBSD page of supported architectures before I potentially
break it in RuntimeLibcalls refactoring.
2025-07-15 15:50:54 +09:00
woruyu
b22b103c3d
[DAG] SelectionDAG::canCreateUndefOrPoison - add ISD::FCOPYSIGN (#148617)
### Summary
This PR resolves https://github.com/llvm/llvm-project/issues/147694
2025-07-14 15:28:52 +01:00
Trevor Gross
0db197adef
[Test] Mark a number of libcall tests nounwind (#148329)
Many tests for floating point libcalls include CFI directives, which
aren't needed for the purpose of these tests. Mark some of the relevant
test functions `nounwind` in order to remove this noise.
2025-07-12 11:57:28 +02:00
Himadhith
f9292c25cf
[NFC][PowerPC] Add test case for lockdown of vector compare greater than support for Zero vector comparisons (#147246)
NFC patch to add testcase for locking down the support of Zero vector
comparisons using the `vcmpgtuh (vector compare greater than unsigned
halfword)` instruction.
Currently `vcmpequh (vector compare equal unsigned halfword)` is in use.

---------

Co-authored-by: himadhith <himadhith.v@ibm.com>
Co-authored-by: Tony Varghese <tonypalampalliyil@gmail.com>
2025-07-11 11:10:22 +05:30
Fangrui Song
68494ae072 [XRay] xray_fn_idx: fix alignment directive
Use `emitValueToAlignment` as the section does not contain code.
`emitCodeAlignment` would lead to ALIGN relocations on RISC-V and
LoongArch with linker relaxation.

In addition, change the alignment to wordsize, sufficient for the
runtime requirement (`XRayFunctionSledIndex`).

Related to #147322
2025-07-08 21:52:53 -07:00
Simon Pilgrim
d3d8ef7e41 [PowerPC] licm-xxsplti.ll - regenerate test checks 2025-07-07 15:19:18 +01:00