4138 Commits

Author SHA1 Message Date
Nikita Popov
7ea7ccd24d
[PowerPC][AIX] Specify pointer info and alignment for stack store (#144526)
When lowering call arguments to stack, specify a stack MPI, as well as
the stack alignment, instead of using the defaults (which would be an
unknown location with ABI alignment).

I believe the asm diffs are just changes in scheduling.
2025-06-18 10:50:17 +02:00
Matt Arsenault
7b9d10d2e6
PowerPC: Fix using long double libm functions for f128 intrinsics (#144382)
This wasn't setting the correct libcall names, which default to the
l suffixed libm names.
2025-06-18 13:26:15 +09:00
Matt Arsenault
af49a650e1
PowerPC: Add baseline tests for more f128 libcall handling (#144381)
Some of these incorrectly call the l suffixed version of libm
functions and others assert.
2025-06-18 13:23:17 +09:00
Nikita Popov
76ea1db174 [PowerPC] Split test into assembly and MIR variants (NFC)
So that both can be generated.
2025-06-17 15:16:24 +02:00
Nikita Popov
3451cd5d20 [PowerPC] Regenerate MIR test checks (NFC) 2025-06-17 15:04:16 +02:00
Nikita Popov
49c6235d1f [PowerPC] Regenerate MIR test checks (NFC) 2025-06-17 12:52:00 +02:00
zhijian lin
ea73fc5f07
[PowerPC] fixed mtvsrbmi.ll test case error caused by run the update_llc_test_checks.py (#144075)
Fix the mtvsrbmi.ll test case error caused by running
update_llc_test_checks.py.
2025-06-13 09:38:54 -04:00
zhijian lin
9c2e0bd59c
[PowerPC][NFC] Pre-commit test case for checking whether mtvsrbmi power10 instruction not used (#143956)
Verify whether the generated assembly for the following function
includes the mtvsrbmi instruction.
vector unsigned char v00FF()
{
  vector unsigned char x = { 0xFF, 0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0 };
  return x;
}
2025-06-13 09:19:10 -04:00
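As a rough illustration of why the constant in the test for #143956 is a candidate for `mtvsrbmi`, the sketch below assumes the instruction expands each bit of a 16-bit immediate into one byte that is either 0x00 or 0xFF. That is my reading of the Power10 instruction, not something stated in the commit. The helper name `is_byte_mask_vector` is hypothetical, and the bit-to-byte ordering is deliberately not modeled.

```python
def is_byte_mask_vector(elems):
    """Return True if a 16-byte vector constant consists only of 0x00 and 0xFF bytes.

    Assumption: such a constant could be built from a 16-bit byte mask.
    """
    return len(elems) == 16 and all(b in (0x00, 0xFF) for b in elems)


# The constant from the test case: first byte 0xFF, the rest zero.
v00ff = [0xFF] + [0] * 15
print(is_byte_mask_vector(v00ff))
```

A constant such as `[0x7F] + [0] * 15` would not qualify under this assumption, because 0x7F is neither all zeros nor all ones.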
zhijian lin
85a9f2e148
[PowerPC] enable AtomicExpandImpl::expandAtomicCmpXchg for powerpc (#142395)
In PowerPC, the AtomicCmpXchgInst is lowered to
ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS. However, this node does not handle
the weak attribute of AtomicCmpXchgInst. As a result, when compiling C++
atomic_compare_exchange_weak_explicit, the generated assembly includes a
"reservation lost" loop — i.e., it branches back and retries if the
stwcx. (store-conditional) fails. This differs from GCC’s codegen, which
does not include that loop for weak compare-exchange.

Since PowerPC uses LL/SC-style atomic instructions, the patch enables
AtomicExpandImpl::expandAtomicCmpXchg for PowerPC. With this, the weak
attribute is properly respected, and the "reservation lost" loop is
removed for weak operations.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-06-13 09:14:48 -04:00
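The difference between weak and strong compare-exchange on an LL/SC machine, as described in #142395, can be sketched with a toy model. Everything here (`ToyCell`, `sc_failures`, the two functions) is hypothetical and only imitates the control flow; it is not how LLVM or PowerPC hardware is implemented.

```python
class ToyCell:
    """A single memory word with a load-reserve / store-conditional pair."""

    def __init__(self, value, sc_failures=0):
        self.value = value
        self.sc_failures = sc_failures  # how many upcoming stores lose the reservation
        self.sc_attempts = 0

    def load_reserve(self):
        return self.value

    def store_conditional(self, new):
        self.sc_attempts += 1
        if self.sc_failures > 0:
            self.sc_failures -= 1
            return False  # reservation lost, nothing stored
        self.value = new
        return True


def cmpxchg_weak(cell, expected, desired):
    """One attempt only: may fail even if the value matched."""
    old = cell.load_reserve()
    if old != expected:
        return old, False
    return old, cell.store_conditional(desired)


def cmpxchg_strong(cell, expected, desired):
    """Retries while the reservation is lost (the loop the commit describes)."""
    while True:
        old = cell.load_reserve()
        if old != expected:
            return old, False
        if cell.store_conditional(desired):
            return old, True
```

With `ToyCell(5, sc_failures=1)`, the weak form reports failure after a single store attempt and leaves the retry decision to the caller, while the strong form keeps looping until the store succeeds.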
Fangrui Song
28bda77843
Introduce MCAsmInfo::UsesSetToEquateSymbol and prefer = to .set
Introduce MCAsmInfo::UsesSetToEquateSymbol to control the preferred
syntax for symbol equating. We now favor the more readable and common
`symbol = expression` syntax over `.set`. This aligns with pre- https://reviews.llvm.org/D44256 behavior.

On Apple platforms, this resolves a clang -S vs -c behavior difference (resolves #104623).

For targets whose = support is unconfirmed, UsesSetToEquateSymbol is set to false.
This also minimizes test updates.

Pull Request: https://github.com/llvm/llvm-project/pull/142289
2025-06-11 22:19:31 -07:00
Tony Varghese
7a0c9f607a
[NFC][PowerPC] Pre-commit test case for exploitation of xxeval for the pattern ternary(A,X,or(B,C)) (#143693)
Pre-commit test case for exploitation of `xxeval` for ternary operations
of the pattern `ternary(A,X,or(B,C))`.
Exploitation of `xxeval` to be added later.

Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
2025-06-11 14:26:15 -04:00
RolandF77
5d6218d311
[PowerPC] extend smaller splats into bigger splats (with fix) (#142194)
For pwr9, xxspltib is a byte splat with a range -128 to 127 - it can be
used with a following vector extend sign to make splats of i16, i32, or
i64 element size. For pwr8, vspltisw with a following vector extend sign
can be used to make splats of i64 elements in the range -16 to 15.

Add check for P8 to make sure the 64-bit vector ops are there.
2025-06-09 14:01:38 -04:00
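The ranges quoted in #142194 can be written down as a small decision helper. This is a sketch under assumptions: `splat_strategy` and `sign_extend` are hypothetical names, the CPU strings are illustrative, and only the ranges and element sizes stated in the commit message are encoded, not the backend's full selection logic.

```python
def sign_extend(raw, from_bits, to_bits):
    """Sign-extend a from_bits-wide bit pattern to to_bits (result as an unsigned pattern)."""
    sign = 1 << (from_bits - 1)
    signed = (raw & (sign - 1)) - (raw & sign)
    return signed & ((1 << to_bits) - 1)


def splat_strategy(value, elt_bits, cpu):
    """Return a description of the two-instruction splat, or None if out of range."""
    if cpu == "pwr9" and elt_bits in (16, 32, 64) and -128 <= value <= 127:
        return "xxspltib + sign extend"
    if cpu == "pwr8" and elt_bits == 64 and -16 <= value <= 15:
        return "vspltisw + sign extend"
    return None


# A byte splat of 0x80 extended to 16 bits becomes 0xFF80, i.e. -128 per element.
print(hex(sign_extend(0x80, 8, 16)))
```

The point of the extension step is that a small immediate splat, once sign-extended, yields the same value in every wider element.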
Lei Huang
649020c680
[PowerPC] Change default for auto gen stxvp for cpu=future (#142826)
For cpu=future, we want to auto generate stxvp instructions by default.
2025-06-09 12:34:50 -04:00
zhijian lin
a91b0d2780
[PowerPC] hoist xxspltiw instruction out of the loop with FMA mutation pass. (#111696)
Summary: 
   
The patch fixes the issue [[PowerPC] missing VSX FMA Mutation optimize
in some case for option -schedule-ppc-vsx-fma-mutation-early
#111906](https://github.com/llvm/llvm-project/issues/111906)
   
In certain cases, the Register Coalescer pass—which eliminates COPY
instructions—can interfere with the PowerPC VSX FMA Mutation pass.
Specifically, it can prevent the mutation of a COPY adjacent to an
XSMADDADP into a single XSMADDMDP instruction. As a result, the xxspltiw
instruction is not hoisted out of the loop as expected, leading to
missed optimization opportunities.

To address this, the patch ensures that the `VSX FMA Mutation` pass runs
before the `Register Coalescer` pass when the
-schedule-ppc-vsx-fma-mutation-early option is enabled.
2025-06-05 09:41:51 -04:00
Nikita Popov
d74831efeb Revert "[SDAG] Fix fmaximum legalization errors (#142170)"
This reverts commit 58cc1675ec7b4aa5bc2dab56180cb7af1b23ade5.

I also made the incorrect assumption that we know both values are
+/-0.0 here as well. Revert for now.
2025-06-04 14:35:30 +02:00
Nikita Popov
42605b8aa3 Revert "[SelectionDAG] Avoid one comparison when legalizing fmaximum (#142732)"
This reverts commit 54da543a14da6dd0e594875241494949cb659b08.

I made a logic error here with the assumption that both values
are known to be +/-0.0.
2025-06-04 14:22:19 +02:00
Nikita Popov
54da543a14
[SelectionDAG] Avoid one comparison when legalizing fmaximum (#142732)
When ordering signed zero, only check the sign of one of the values. We
already know at this point that both values must be +/-0.0, so it is
sufficient to check one of them to correctly order them.

For example, for fmaximum, if we know LHS is `+0.0` then we can always
select LHS, value of RHS does not matter. If LHS is `-0.0` we can always
select RHS, value of RHS doesn't matter.
2025-06-04 10:41:30 +02:00
Tony Varghese
52cf598c78
[NFC][PowerPC] Add testcases for locking down the xxeval instruction support for ternary operators (#141601)
NFC patch to add testcases for locking down the support of ternary
operators using the `xxsel` instructions. Currently ternary operators
are supported by emitting `xxsel` instructions instead of `xxeval`.

Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
2025-06-03 11:07:58 -04:00
Simon Tatham
56acb06bc6
[ARM,AArch64] Don't put BTI at asm goto branch targets (#141562)
In 'asm goto' statements ('callbr' in LLVM IR), you can specify one or
more labels / basic blocks in the containing function which the assembly
code might jump to. If you're also compiling with branch target
enforcement via BTI, then previously listing a basic block as a possible
jump destination of an asm goto would cause a BTI instruction to be
placed at the start of the block, in case the assembly code used an
_indirect_ branch instruction (i.e. to a destination address read from a
register) to jump to that location. Now it doesn't do that any more:
branches to destination labels from the assembly code are assumed to be
direct branches (to a relative offset encoded in the instruction), which
don't require a BTI at their destination.

This change was proposed in https://discourse.llvm.org/t/85845 and there
seemed to be no disagreement. The rationale is:

1. it brings clang's handling of asm goto in Arm and AArch64 in line
with gcc's, which didn't generate BTIs at the target labels in the first
place.

2. it improves performance in the Linux kernel, which uses a lot of 'asm
goto' in which the assembly language just contains a NOP, and the
label's address is saved elsewhere to let the kernel self-modify at run
time to swap between the original NOP and a direct branch to the label.
This allows hot code paths to be instrumented for debugging, at only the
cost of a NOP when the instrumentation is turned off, instead of the
larger cost of an indirect branch. In this situation a BTI is
unnecessary (if the branch happens it's direct), and since the code
paths are hot, also a noticeable performance hit.

Implementation:

`SelectionDAGBuilder::visitCallBr` is the place where 'asm goto' target
labels are handled. It calls `setIsInlineAsmBrIndirectTarget()` on each
target `MachineBasicBlock`. Previously it also called
`setMachineBlockAddressTaken()`, which made `hasAddressTaken()` return
true, which caused a BTI to be added in the Arm backends.

Now `visitCallBr` doesn't call `setMachineBlockAddressTaken()` any more
on asm goto targets, but `hasAddressTaken()` also checks the flag set by
`setIsInlineAsmBrIndirectTarget()`. So call sites that were using
`hasAddressTaken()` don't need to be modified. But the Arm backends
don't call `hasAddressTaken()` any more: instead they test two more
specific query functions that cover all the reasons `hasAddressTaken()`
might have returned true _except_ being an asm goto target.

Testing:

The new test `AArch64/callbr-asm-label-bti.ll` is testing the actual
change, where it expects not to see a `bti` instruction after
`[[LABEL]]`. The rest of the test changes are all churn, due to the
flags on basic blocks changing. Actual output code hasn't changed in any
of the existing tests, only comments and diagnostics.

Further work:

`RISCVIndirectBranchTracking.cpp` and `X86IndirectBranchTracking.cpp`
also call `hasAddressTaken()` in a way that might benefit from using the
same more specific check I've put in `ARMBranchTargets.cpp` and
`AArch64BranchTargets.cpp`. But I'm not sure of that, so in this commit
I've only changed the Arm backends, and left those alone.
2025-06-03 08:44:13 +01:00
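The flag logic described in the Implementation section of #141562 can be modeled with a toy class. The field and method names below are hypothetical stand-ins, not the real `MachineBasicBlock` API, and real BTI placement also depends on conditions not modeled here.

```python
from dataclasses import dataclass


@dataclass
class ToyBlock:
    # Hypothetical flags: two "address really taken" reasons plus the asm goto flag.
    machine_address_taken: bool = False
    ir_address_taken: bool = False
    inline_asm_br_indirect_target: bool = False

    def has_address_taken(self):
        # Still true for asm goto targets, so existing callers keep working.
        return (self.machine_address_taken
                or self.ir_address_taken
                or self.inline_asm_br_indirect_target)

    def needs_bti(self):
        # The Arm backends test only the more specific reasons,
        # so an asm goto target alone no longer gets a BTI.
        return self.machine_address_taken or self.ir_address_taken


asm_goto_target = ToyBlock(inline_asm_br_indirect_target=True)
print(asm_goto_target.has_address_taken(), asm_goto_target.needs_bti())
```

In this model an asm goto label still answers true to the general query, which is why call sites outside the Arm backends did not need to change.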
Lei Huang
05f1ca7d17
[PowerPC] Spill and restore DMR register (#141530)
Add spilling and restoring of DMR registers.
2025-06-02 13:11:39 -04:00
Yingwei Zheng
1984c7539e
[ValueTracking] Do not use FMF from fcmp (#142266)
This patch introduces an FMF parameter for
`matchDecomposedSelectPattern` to pass FMF flags from select, instead of
fcmp.

Closes https://github.com/llvm/llvm-project/issues/137998.
Closes https://github.com/llvm/llvm-project/issues/141017.
2025-06-02 18:21:14 +08:00
Nikita Popov
58cc1675ec
[SDAG] Fix fmaximum legalization errors (#142170)
FMAXIMUM is currently legalized via IS_FPCLASS for the signed zero
handling. This is problematic, because it assumes the equivalent integer
type is legal. Many targets have legal fp128, but illegal i128, so this
results in legalization failures.

Fix this by replacing IS_FPCLASS with checking the bitcast to integer
instead. In that case it is sufficient to use any legal integer type, as
we're just interested in the sign bit. This can be obtained via a stack
temporary cast. There is existing FloatSignAsInt functionality used for
legalization of FABS and similar we can use for this purpose.

Fixes https://github.com/llvm/llvm-project/issues/139380.
Fixes https://github.com/llvm/llvm-project/issues/139381.
Fixes https://github.com/llvm/llvm-project/issues/140445.
2025-06-02 10:14:33 +02:00
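The idea in #142170 of reading the sign bit through an integer bitcast, rather than classifying the float, can be illustrated at the value level. This is a sketch of fmaximum semantics in plain Python, not the SelectionDAG legalization code; `sign_bit` and `fmaximum` are illustrative helpers.

```python
import math
import struct


def sign_bit(x):
    """Reinterpret a double as a 64-bit integer and return its top bit."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits >> 63


def fmaximum(a, b):
    """Maximum that propagates NaN and treats -0.0 as smaller than +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == b:
        # Equal values differ at most in the sign of zero. Checking one sign
        # is enough only because equality was established first.
        return b if sign_bit(a) else a
    return a if a > b else b


print(fmaximum(-0.0, 0.0))
```

Only the sign bit is inspected, so any integer width that holds that bit would do. The commit message relies on the same property when it says any legal integer type is sufficient.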
Hubert Tong
8f486254e4 Revert "[PowerPC] extend smaller splats into bigger splats (#141282)"
The subject commit causes the build to ICE on AIX:
https://lab.llvm.org/buildbot/#/builders/64/builds/3890/steps/5/logs/stdio

This reverts commit 7fa365843d9f99e75c38a6107e8511b324950e74.
2025-05-29 01:10:55 -04:00
RolandF77
7fa365843d
[PowerPC] extend smaller splats into bigger splats (#141282)
For pwr9, xxspltib is a byte splat with a range -128 to 127 - it can be
used with a following vector extend sign to make splats of i16, i32, or
i64 element size. For pwr8, vspltisw with a following vector extend sign
can be used to make splats of i64 elements in the range -16 to 15.
2025-05-28 10:11:28 -04:00
Ruiling, Song
3e47d8deba
MachineScheduler: Reset next cluster candidate for each node (#139513)
When a node is picked, we should reset its next cluster candidate to
null before releasing its successors/predecessors.
2025-05-28 14:53:46 +08:00
Nico Weber
04a96c6900 [PowerPC] Attempt to fix test added in #141263 2025-05-27 17:40:35 -04:00
zhijian lin
7b1a6a8a90
[PowerPC ][NFC] Add a test case for the function atomic_compare_exchange_weak (#141263)
Add test case to test the generated asm of the function
atomic_compare_exchange_weak
2025-05-27 16:36:39 -04:00
Jon Roelofs
714096c132
[LLVM] Skip dumping inline SDag children (#141359)
If they're simple enough to render inline, we don't need to dump them
again in the recursive walk.
2025-05-26 19:40:01 -07:00
Lei Huang
4b09eedf7b
[PowerPC] Update DMF VSX ACC data transfer instructions (#138897)
For cpu=future, acc registers no longer overlap VSRs and are prefixed
with `dm`. The original xxmfacc/xxmtacc instructions are now extended
mnemonics for their dm* equivalents.
2025-05-26 12:47:12 -04:00
Shimin Cui
b1017a4b84
Use getSignedTargetConstant for offset (#141149)
This is to fix an assertion failure with PeepholePPC64. The load/store
offset can be negative. A reduced case from one of our failures is added
as well.
2025-05-26 11:08:13 -04:00
Maryam Moghadas
a54300b32c
[PowerPC] Add load/store support for v2048i1 and DMF cryptography instructions (#136145)
This commit adds support for loading and storing v2048i1 DMR pairs and
introduces Dense Math Facility cryptography instructions: DMSHA2HASH,
DMSHA3HASH, and DMXXSHAPAD, along with their corresponding intrinsics
and tests.
2025-05-26 10:59:35 -04:00
RolandF77
bbca78fbcb
[PowerPC] vector shift word/double by element size - 1 use all ones (#139794)
Vector shift word or double requires a shift amount vector of 31 or 63
which is too big for splat immediate and requires a multi-instruction
sequence. However the PPC instructions only use 5 or 6 bits of the shift
amount vector elements so an all ones mask, which we can generate
efficiently, works.
2025-05-23 10:49:37 -04:00
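The observation in #139794 is that only the low 5 or 6 bits of each shift amount are used, so an all-ones element behaves like 31 or 63. The sketch below models a per-element logical right shift to show this. The function name is hypothetical and the choice of a logical right shift is just one example of the shifts the commit covers.

```python
def vector_shift_right(elems, amounts, bits):
    """Per-element logical right shift that uses only the low log2(bits) bits of each amount."""
    mask = (1 << bits) - 1
    return [(e & mask) >> (a & (bits - 1)) for e, a in zip(elems, amounts)]


words = [0x80000000, 0x00000001]
by_31 = vector_shift_right(words, [31, 31], 32)
by_all_ones = vector_shift_right(words, [0xFFFFFFFF, 0xFFFFFFFF], 32)
print(by_31 == by_all_ones)  # True
```

Because `0xFFFFFFFF & 31 == 31`, the all-ones vector can stand in for a splat of 31, which the commit notes is too large for a splat immediate.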
Jay Foad
1f0c178411 Fix typo "redudant" 2025-05-22 15:42:22 +01:00
RolandF77
99f0309669
[PowerPC] catch v2i64 shift left by 1 is add case (#138772)
Catch missing case in PPC BE for v2i64 x << 1 and generate x + x.
2025-05-13 11:26:46 -04:00
zhijian lin
41647412c6
[PowerPC] Fix an LowerADDSUBO_CARRY error when converting carry bit for usubo_carry (#137809)
In PowerPC, if a borrow occurs during a subtraction, the carry bit is
zero (unset). The carry bit is set if no borrow occurs.

For ISD::USUBO_CARRY, the nodes produce two results: the normal result
of the addition or subtraction, and a boolean value that is 1 if and
only if there is an outgoing carry or borrow.

Therefore, we need to convert a 1 (which indicates a borrow in
ISD::USUBO_CARRY) to 0 to match PowerPC's definition of borrow.
Similarly, we need to convert a 0 (no borrow in ISD::USUBO_CARRY) to 1
for PowerPC.

To perform this conversion, we use XOR 1 instead of XOR
DAG.getAllOnesConstant(DL, CarryOp.getValueType()).

2025-04-30 10:39:09 -04:00
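The carry convention described in #137809 can be checked with a little arithmetic. This sketch assumes subtraction is computed as `a + ~b + 1`, so the carry out is 1 exactly when no borrow occurs. The function names are illustrative and the 32-bit width is an arbitrary choice.

```python
def ppc_subtract_carry(a, b, bits=32):
    """Carry out of a + ~b + 1: 1 when no borrow occurs, 0 when a borrow occurs."""
    mask = (1 << bits) - 1
    total = (a & mask) + (~b & mask) + 1
    return total >> bits


def usubo_borrow(a, b, bits=32):
    """Borrow flag in the ISD::USUBO_CARRY sense: 1 if and only if a < b (unsigned)."""
    mask = (1 << bits) - 1
    return 1 if (a & mask) < (b & mask) else 0


for a, b in [(5, 3), (3, 5), (0, 0), (0, 1)]:
    # XOR with 1 flips between the two conventions.
    assert usubo_borrow(a, b) ^ 1 == ppc_subtract_carry(a, b)

# XOR with an all-ones value of a wider type does not produce a 0/1 flag.
print(hex(0 ^ 0xFFFFFFFF))  # 0xffffffff
```

The last line shows why XOR with 1 is the right conversion when the carry value is wider than one bit: all-ones would turn a clear flag into 0xffffffff rather than 1.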
Vikram Hegde
53a8b89003
[CodeGen][NewPM] Port "ShrinkWrap" pass to NPM (#129880) 2025-04-30 13:11:17 +05:30
Maryam Moghadas
82a1d5078d
[PowerPC] Add dense math half-precision floating-point outer-product accumulate to DMR instructions (#133272)
This patch adds the following Dense Math Facility 16-bit half-precision
floating-point calculation instructions: dmxvf16gerx2, dmxvf16gerx2pp,
dmxvf16gerx2pn, dmxvf16gerx2np, dmxvf16gerx2nn, pmdmxvf16gerx2,
pmdmxvf16gerx2pp, pmdmxvf16gerx2pn, pmdmxvf16gerx2np, pmdmxvf16gerx2nn,
along with their corresponding intrinsics and tests.
2025-04-28 16:03:10 -04:00
RolandF77
a903c7b7f5
[PowerPC] Intrinsics and tests for dmr insert/extract (#135653)
Add some intrinsics and LIT tests for PPC dmr insert/extract
instructions.
2025-04-24 11:27:22 -04:00
zhijian lin
3e605b1e1d
[NFC] Add a pre-commit test case for #111696 (#136730)
Add a pre-commit test case for patch
https://github.com/llvm/llvm-project/pull/111696.

Test that the ppc-vsx-fma-mutate pass, run with
-schedule-ppc-vsx-fma-mutation-early, does not hoist the instruction
`xxspltiw vs2, 1170469888` out of the loop.

---------

Co-authored-by: Amy Kwan <amy.kwan1@ibm.com>
2025-04-24 10:37:24 -04:00
Sergei Barannikov
5080a0251f
[CodeGenPrepare] Unfold slow ctpop when used in power-of-two test (#102731)
DAG combiner already does this transformation, but in some cases it does
not have a chance because either CodeGenPrepare or SelectionDAGBuilder
move icmp to a different basic block.

https://alive2.llvm.org/ce/z/ARzh99

Fixes #94829

Pull Request: https://github.com/llvm/llvm-project/pull/102731
2025-04-23 08:54:10 +03:00
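The power-of-two test that #102731 unfolds rests on a standard bit identity: a value has exactly one set bit if and only if it is nonzero and `x & (x - 1)` is zero. The sketch below checks that identity exhaustively over a small range. It does not claim to reproduce the exact instruction sequence CodeGenPrepare emits, and both function names are illustrative.

```python
def is_pow2_ctpop(x):
    """Power-of-two test written with a population count."""
    return bin(x).count("1") == 1


def is_pow2_unfolded(x, bits=32):
    """The same test without a population count."""
    x &= (1 << bits) - 1
    return x != 0 and (x & (x - 1)) == 0


# Exhaustive agreement on a small range, plus the top bit of a 32-bit value.
assert all(is_pow2_ctpop(x) == is_pow2_unfolded(x) for x in range(1 << 12))
assert is_pow2_unfolded(1 << 31)
```

On targets where population count is slow, the unfolded form needs only a subtract, an AND, and comparisons.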
zhijian lin
afda4c295b
Reland [SelectionDAG] Folding ZERO-EXTEND/SIGN_EXTEND poison to Poison value in getNode (#136701)
This patch addresses the signed/zero extension of poison by using a
poison value of the extended type instead of a constant zero of the
extended type.
2025-04-22 17:36:41 -04:00
Maryam Moghadas
c40d3a411c
[PowerPC] Add dense math bfloat16 floating-point outer-product accumulate to DMR instructions (#133109)
This patch adds the following Dense Math Facility bfloat16
floating-point calculation instructions: dmxvbf16gerx2,
dmxvbf16gerx2pp,dmxvbf16gerx2pn, dmxvbf16gerx2np, dmxvbf16gerx2nn,
pmdmxvbf16gerx2, pmdmxvbf16gerx2pp, pmdmxvbf16gerx2pn,
pmdmxvbf16gerx2np, pmdmxvbf16gerx2nn, along with their corresponding
intrinsics and tests.
2025-04-21 18:39:44 -04:00
Nico Weber
e18a77cfbe Revert "[SelectionDAG] Folding ZERO-EXTEND/SIGN_EXTEND poison to Poison value in getNode (#122741)"
This reverts commit f12078e72601e7c03e5d66afab034313caf8f791.

Breaks `check-llvm`, see comments on https://github.com/llvm/llvm-project/pull/122741
2025-04-21 10:51:03 -04:00
zhijian lin
f12078e726
[SelectionDAG] Folding ZERO-EXTEND/SIGN_EXTEND poison to Poison value in getNode (#122741)
The PR will fix the issue
https://github.com/llvm/llvm-project/issues/122728

This patch addresses the signed/zero extension of poison by using a
poison value of the extended type instead of a constant zero of the
extended type.
2025-04-21 10:02:21 -04:00
Yingwei Zheng
7e5317139d
[PowerPC] Pre-commit tests for PR130742. NFC. (#135606)
Needed by https://github.com/llvm/llvm-project/pull/130742.
2025-04-17 17:52:49 +08:00
Matt Arsenault
393c783a10 LICM: Avoid looking at use list of constant data (#134690)
The codegen test changes seem incidental. Either way,
sms-grp-order.ll seems to already not hit the original issue.
2025-04-13 17:06:38 +02:00
Douglas Yung
b03aa291b8 Add 'REQUIRES: asserts' to test undef-args.ll added in #135247 to skip test when asserts are not present.
Should fix bot failure: https://lab.llvm.org/buildbot/#/builders/202/builds/601
2025-04-11 02:18:10 +00:00
zhijian lin
5aeeebc1f4
[NFC] add a pre-commit test case for patch 122741 (#135247)
[NFC] add a pre-commit test case for patch [Eliminating li of 0 into arg
registers of unused
arguments](https://github.com/llvm/llvm-project/pull/122741)

The test case checks that extensions of poison are lowered to undef, and
that redundant instructions loading 0 into argument registers are emitted
for unused arguments.
2025-04-10 16:33:40 -04:00
zhijian lin
378ac572ac
Reland "[SelectionDAG] Introducing a new ISD::POISON SDNode to represent the poison value in the IR." (#135056)
A new ISD::POISON SDNode is introduced to represent the poison value in
the IR, replacing the previous use of ISD::UNDEF
2025-04-10 11:29:14 -04:00
Lei Huang
3479c57466
PowerPC32:PIC: Update to bcl to fix branch prediction mis-predict issue (#134140)
Update `bl` to `bcl 20, 31, .+4` for 32-bit PIC code generation so that the
link stack is not corrupted, which would cause branch mispredictions.

fixes: https://github.com/llvm/llvm-project/issues/128644
2025-04-07 15:50:21 -04:00