4054 Commits

Author SHA1 Message Date
zhijian lin
41647412c6
[PowerPC] Fix an LowerADDSUBO_CARRY error when converting carry bit for usubo_carry (#137809)
In PowerPC, if a borrow occurs during a subtraction, the carry bit is
zero (unset). The carry bit is set if no borrow occurs.

For ISD::USUBO_CARRY, the nodes produce two results: the normal result
of the addition or subtraction, and a boolean value that is 1 if and
only if there is an outgoing carry or borrow.

Therefore, we need to convert a 1 (which indicates a borrow in
ISD::USUBO_CARRY) to 0 to match PowerPC's definition of borrow.
Similarly, we need to convert a 0 (no borrow in ISD::USUBO_CARRY) to 1
for PowerPC.

To perform this conversion, we use XOR 1 instead of XOR
DAG.getAllOnesConstant(DL, CarryOp.getValueType()).
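
The difference between the two XOR masks can be seen in a small stand-alone C++ model. This is only a sketch, not the code in LowerADDSUBO_CARRY: the helper names are hypothetical, and a 32-bit integer is assumed to stand in for the carry operand's value type.

```cpp
#include <cstdint>

// Hypothetical model: ISD::USUBO_CARRY uses 1 for "borrow", 0 for "no borrow".
// PowerPC's carry bit is the opposite: 0 after a borrow, 1 otherwise.

// Flip only the low bit, so the result is again a 0/1 boolean.
constexpr std::uint32_t borrowToPPCCarry(std::uint32_t borrow) {
  return borrow ^ 1u;
}

// Flip every bit, as XOR with an all-ones constant would.
constexpr std::uint32_t borrowXorAllOnes(std::uint32_t borrow) {
  return borrow ^ ~std::uint32_t{0};
}

static_assert(borrowToPPCCarry(1) == 0, "borrow maps to carry clear");
static_assert(borrowToPPCCarry(0) == 1, "no borrow maps to carry set");
// The all-ones mask also sets the upper bits, so the value is no longer 0/1.
static_assert(borrowXorAllOnes(1) == 0xFFFFFFFEu, "not a 0/1 value");
static_assert(borrowXorAllOnes(0) == 0xFFFFFFFFu, "not a 0/1 value");
```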

2025-04-30 10:39:09 -04:00
Vikram Hegde
53a8b89003
[CodeGen][NewPM] Port "ShrinkWrap" pass to NPM (#129880) 2025-04-30 13:11:17 +05:30
Maryam Moghadas
82a1d5078d
[PowerPC] Add dense math half-precision floating-point outer-product accumulate to DMR instructions (#133272)
This patch adds the following Dense Math Facility 16-bit half-precision
floating-point calculation instructions: dmxvf16gerx2, dmxvf16gerx2pp,
dmxvf16gerx2pn, dmxvf16gerx2np, dmxvf16gerx2nn, pmdmxvf16gerx2,
pmdmxvf16gerx2pp, pmdmxvf16gerx2pn, pmdmxvf16gerx2np, pmdmxvf16gerx2nn,
along with their corresponding intrinsics and tests.
2025-04-28 16:03:10 -04:00
RolandF77
a903c7b7f5
[PowerPC] Intrinsics and tests for dmr insert/extract (#135653)
Add some intrinsics and LIT tests for PPC dmr insert/extract
instructions.
2025-04-24 11:27:22 -04:00
zhijian lin
3e605b1e1d
[NFC] Add a pre-commit test case for #111696 (#136730)
Add a pre-commit test case for patch
https://github.com/llvm/llvm-project/pull/111696

Test that, when the ppc-vsx-fma-mutate pass runs with
-schedule-ppc-vsx-fma-mutation-early, the instruction
`xxspltiw vs2, 1170469888` is not hoisted out of the loop.

---------

Co-authored-by: Amy Kwan <amy.kwan1@ibm.com>
2025-04-24 10:37:24 -04:00
Sergei Barannikov
5080a0251f
[CodeGenPrepare] Unfold slow ctpop when used in power-of-two test (#102731)
The DAG combiner already does this transformation, but in some cases it
does not get a chance because either CodeGenPrepare or SelectionDAGBuilder
moves the icmp to a different basic block.
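
As an illustration of the kind of rewrite involved, a power-of-two test written with a population count can be expressed without one. This is a sketch with hypothetical function names, assuming 32-bit operands; it is not the pass's actual output.

```cpp
#include <bitset>
#include <cstdint>

// Power-of-two test written with a population count.
bool isPow2ViaPopcount(std::uint32_t x) {
  return std::bitset<32>(x).count() == 1;
}

// Equivalent test without a population count: a power of two has exactly one
// set bit, so clearing the lowest set bit (x & (x - 1)) leaves zero.
bool isPow2Unfolded(std::uint32_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}
```

On a target where ctpop is slow, the second form needs only a subtract, an AND and compares.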

https://alive2.llvm.org/ce/z/ARzh99

Fixes #94829

Pull Request: https://github.com/llvm/llvm-project/pull/102731
2025-04-23 08:54:10 +03:00
zhijian lin
afda4c295b
Reland [SelectionDAG] Folding ZERO-EXTEND/SIGN_EXTEND poison to Poison value in getNode (#136701)
This patch addresses the signed/zero extension of poison by using a
poison value of the extended type instead of a constant zero of the
extended type.
2025-04-22 17:36:41 -04:00
Maryam Moghadas
c40d3a411c
[PowerPC] Add dense math bfloat16 floating-point outer-product accumulate to DMR instructions (#133109)
This patch adds the following Dense Math Facility bfloat16
floating-point calculation instructions: dmxvbf16gerx2,
dmxvbf16gerx2pp, dmxvbf16gerx2pn, dmxvbf16gerx2np, dmxvbf16gerx2nn,
pmdmxvbf16gerx2, pmdmxvbf16gerx2pp, pmdmxvbf16gerx2pn,
pmdmxvbf16gerx2np, pmdmxvbf16gerx2nn, along with their corresponding
intrinsics and tests.
2025-04-21 18:39:44 -04:00
Nico Weber
e18a77cfbe Revert "[SelectionDAG] Folding ZERO-EXTEND/SIGN_EXTEND poison to Poison value in getNode (#122741)"
This reverts commit f12078e72601e7c03e5d66afab034313caf8f791.

Breaks `check-llvm`, see comments on https://github.com/llvm/llvm-project/pull/122741
2025-04-21 10:51:03 -04:00
zhijian lin
f12078e726
[SelectionDAG] Folding ZERO-EXTEND/SIGN_EXTEND poison to Poison value in getNode (#122741)
The PR will fix the issue
https://github.com/llvm/llvm-project/issues/122728

This patch addresses the signed/zero extension of poison by using a
poison value of the extended type instead of a constant zero of the
extended type.
2025-04-21 10:02:21 -04:00
Yingwei Zheng
7e5317139d
[PowerPC] Pre-commit tests for PR130742. NFC. (#135606)
Needed by https://github.com/llvm/llvm-project/pull/130742.
2025-04-17 17:52:49 +08:00
Matt Arsenault
393c783a10 LICM: Avoid looking at use list of constant data (#134690)
The codegen test changes seem incidental. Either way,
sms-grp-order.ll seems to already not hit the original issue.
2025-04-13 17:06:38 +02:00
Douglas Yung
b03aa291b8 Add 'REQUIRES: asserts' to test undef-args.ll added in #135247 to skip test when asserts are not present.
Should fix bot failure: https://lab.llvm.org/buildbot/#/builders/202/builds/601
2025-04-11 02:18:10 +00:00
zhijian lin
5aeeebc1f4
[NFC] add a pre-commit test case for patch 122741 (#135247)
[NFC] add a pre-commit test case for patch [Eliminating li of 0 into arg
registers of unused
arguments](https://github.com/llvm/llvm-project/pull/122741)

The test case checks that extensions of poison are lowered to undef, and
also that there are redundant instructions loading 0 into argument
registers for unused arguments.
2025-04-10 16:33:40 -04:00
zhijian lin
378ac572ac
Reland "[SelectionDAG] Introducing a new ISD::POISON SDNode to represent the poison value in the IR." (#135056)
A new ISD::POISON SDNode is introduced to represent the poison value in
the IR, replacing the previous use of ISD::UNDEF.
2025-04-10 11:29:14 -04:00
Lei Huang
3479c57466
PowerPC32:PIC: Update to bcl to fix branch prediction mis-predict issue (#134140)
Update `bl` to `bcl 20, 31, .+4` for 32-bit PIC code generation so that
the link stack is not corrupted, which would cause the branch predictor
to mispredict.

fixes: https://github.com/llvm/llvm-project/issues/128644
2025-04-07 15:50:21 -04:00
Lei Huang
b518242156
[PowerPC] Fix instruction name for dmr insert (#134301) 2025-04-04 15:56:30 -04:00
zhijian lin
1a540c3b8b
[PowerPC] Deprecate uses of ISD::ADDC/ISD::ADDE/ISD::SUBC/ISD::SUBE (#133155)
ISD::ADDC, ISD::ADDE, ISD::SUBC and ISD::SUBE are being deprecated in
favor of ISD::UADDO_CARRY and ISD::USUBO_CARRY. This patch lowers UADDO,
UADDO_CARRY, USUBO and USUBO_CARRY.
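
The carry-chaining behaviour of these nodes can be modelled in a few lines of plain C++. This is a sketch under assumptions: `addWithCarry` and `add64` are hypothetical names, 32-bit words are chosen arbitrarily, and nothing here reflects the actual lowering code.

```cpp
#include <cstdint>
#include <utility>

// Model of one add-with-carry step: takes a carry-in and returns
// {sum, carry-out}, where carry-out is true on unsigned overflow.
std::pair<std::uint32_t, bool> addWithCarry(std::uint32_t a, std::uint32_t b,
                                            bool carryIn) {
  std::uint64_t wide = std::uint64_t{a} + b + (carryIn ? 1 : 0);
  return {static_cast<std::uint32_t>(wide), (wide >> 32) != 0};
}

// A 64-bit add built from two 32-bit steps: the carry-out of the low half
// feeds the carry-in of the high half.
std::uint64_t add64(std::uint64_t x, std::uint64_t y) {
  std::pair<std::uint32_t, bool> lo = addWithCarry(
      static_cast<std::uint32_t>(x), static_cast<std::uint32_t>(y), false);
  std::pair<std::uint32_t, bool> hi =
      addWithCarry(static_cast<std::uint32_t>(x >> 32),
                   static_cast<std::uint32_t>(y >> 32), lo.second);
  return (std::uint64_t{hi.first} << 32) | lo.first;
}
```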
2025-04-03 13:22:49 -04:00
Nikita Popov
9356091a98
[GlobalMerge][PPC] Don't merge globals in llvm.metadata section (#131801)
The llvm.metadata section is not emitted and has special semantics. We
should not merge globals in it, similarly to how we already skip merging
of `llvm.xyz` globals.

Fixes https://github.com/llvm/llvm-project/issues/131394.
2025-04-02 10:40:53 +02:00
Fangrui Song
04a67528d3
[MC] Simplify MCBinaryExpr/MCUnaryExpr printing by reducing parentheses (#133674)
The existing pretty printer generates excessive parentheses for
MCBinaryExpr expressions. This update removes unnecessary parentheses
of MCBinaryExpr with +/- operators and MCUnaryExpr.
Since relocatable expressions only use + and -, this change improves
readability in most cases.

Examples:

- (SymA - SymB) + C now prints as SymA - SymB + C.
  This updates the output of -fexperimental-relative-c++-abi-vtables for
  AArch64 and x86 to `.long _ZN1B3fooEv@PLT-_ZTV1B-8`
- expr + (MCTargetExpr) now prints as expr + MCTargetExpr, with this
  change primarily affecting AMDGPUMCExpr.
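
The idea of dropping parentheses around the left operand of + and - can be sketched with a toy expression tree. This is an assumed illustration only: `Expr`, `sym`, `bin` and `print` are hypothetical and are not the MCExpr classes, and the sketch conservatively keeps parentheses around any binary right-hand operand.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical expression tree with symbols and binary +/- nodes.
struct Expr {
  enum Kind { Symbol, Add, Sub } kind;
  std::string name;                // used when kind == Symbol
  std::shared_ptr<Expr> lhs, rhs;  // used when kind is Add or Sub
};

std::shared_ptr<Expr> sym(std::string n) {
  return std::make_shared<Expr>(
      Expr{Expr::Symbol, std::move(n), nullptr, nullptr});
}

std::shared_ptr<Expr> bin(Expr::Kind k, std::shared_ptr<Expr> l,
                          std::shared_ptr<Expr> r) {
  return std::make_shared<Expr>(Expr{k, "", std::move(l), std::move(r)});
}

// + and - associate to the left, so the left operand never needs
// parentheses; a binary right operand keeps them to preserve grouping.
std::string print(const Expr &e) {
  if (e.kind == Expr::Symbol)
    return e.name;
  std::string out = print(*e.lhs);
  out += (e.kind == Expr::Add) ? " + " : " - ";
  bool rhsIsBinary = e.rhs->kind != Expr::Symbol;
  out += rhsIsBinary ? "(" + print(*e.rhs) + ")" : print(*e.rhs);
  return out;
}
```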
2025-03-30 22:03:14 -07:00
Tony Varghese
ff9c5c334a
[shrinkwrap] PowerPC's FP register should be honored when processing the save point for prologue. (#129855)
When generating code for functions that have `__builtin_frame_address`
calls and the `noinline` attribute, the prologue was not emitted
correctly, leading to an assertion failure on PowerPC. The issue was due
to improper insertion of the prologue for a function that contains
`__builtin_frame_address`.
The shrink-wrap pass computes the save and restore points of a function.
The default points are the entry and exit points of the function. During
shrink-wrapping the frame pointer was not honored like the stack pointer
and was treated as a callee-saved register. This change treats the FP
like the SP and inserts the prologue above the instruction containing
the FP.

---------

Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
2025-03-21 12:55:39 -04:00
RolandF77
0489447b07
[PowerPC] dmr extract update assembly operand order (#132083)
The operand order of the assembly for dmr extract instructions has
changed since they were added. The results now come before the uses.
2025-03-20 13:40:40 -04:00
Lei Huang
ade22fc1d9
[PowerPC] Support conversion between f16 and f128 (#130158)
Enables conversion between f16 and f128.
The conversion is expanded on pre-Power9 targets and uses HW instructions
on Power9.

Fixes https://github.com/llvm/llvm-project/issues/92866
Commandeer of:  https://github.com/llvm/llvm-project/pull/97677

---------

Co-authored-by: esmeyi <esme.yi@ibm.com>
2025-03-19 10:19:57 -04:00
Lei Huang
dbc7665b24
PowerPC: Use REG_SEQUENCE instead of INSERT_SUBREG (#129941)
Update to use REG_SEQUENCE when possible.

This patch only updates td patterns to use REG_SEQUENCE in place of
INSERT_SUBREG in cases where doing so does not produce
a nesting of REG_SEQUENCE. This seems to show some improvement in code
gen for `llvm/test/CodeGen/PowerPC/mmaplus-intrinsics.ll`.

Fixes part of https://github.com/llvm/llvm-project/issues/125502
2025-03-18 13:41:24 -04:00
Tony Varghese
aab4ce4d5e
[NFC][shrinkwrap] Add test point to capture the prologue and epilogue insertion by shrinkwrap pass for powerpc. (#131192)
This is an NFC patch to capture the insertion of the prologue and
epilogue by the `shrinkwrap` pass on the PowerPC target for functions
that contain `__builtin_frame_address`.

---------

Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
2025-03-18 10:26:44 -04:00
Maryam Moghadas
22c6674f1d
[PowerPC] Add Dense Math binary integer outer-Product accumulate to DMR Instructions (#130791)
This commit adds the following Dense Math Facility integer calculation
instructions: dmxvi8gerx4, dmxvi8gerx4pp, dmxvi8gerx4spp, pmdmxvi8gerx4,
pmdmxvi8gerx4pp, and pmdmxvi8gerx4spp, along with their corresponding
intrinsics and tests.
2025-03-18 09:40:07 -04:00
Hubert Tong
2091547d4c [PPC codegen test] NFC: Fix RUN line; fix DATA checks to match 64-bit 2025-03-15 21:20:22 -04:00
Frederik Harwath
6962cf1700
Rename ExpandLargeFpConvertPass to ExpandFpPass (#131128)
This is meant as a preparation for PR #130988 "[AMDGPU] Implement IR
expansion for frem instruction" which implements the expansion of
another instruction in this pass. The more general name seems more
appropriate given this change and quite reasonable even without it.
2025-03-14 13:11:45 +01:00
RolandF77
4518780c3c
[PowerPC] Add intrinsics and tests for basic Dense Math enablement instructions (#129913)
Add intrinsics and tests for Dense Math basic enablement instructions
dmsetdmrz, dmmr, dmxor.
2025-03-12 12:55:29 -04:00
Daniel Paoliello
16e051f0b9
[win] NFC: Rename EHCatchret to EHCont to allow for EH Continuation targets that aren't catchret instructions (#129953)
This change splits out the renaming and comment updates from #129612 as a non-functional change.
2025-03-06 09:28:44 -08:00
zhijian lin
0303fd2746
[PowerPC] hoist xxspltib out of loop body (#127121)
Fixes https://github.com/llvm/llvm-project/issues/127119

Remove `hasSideEffects` from `xxspltib`, since the ISA specifies no
special restriction that prevents it from being reordered, moved, CSE'd,
or hoisted by LICM. Removing this restriction allows `xxspltib` to be
hoisted out of loop bodies.
2025-03-03 11:14:02 -05:00
Benjamin Maxwell
55fdeccc45
[SDAG][X86] Remove hack needed to avoid missing x87 FPU stack pops (#128055)
When a (two-result) node like `FMODF` or `FFREXP` is expanded to a library
call whose function prototype looks like `float(float, float*)`, the call
returns one float by value and another via an output pointer. The first
result of the node maps to the value returned by value and the second
result maps to the value returned via the output pointer.

If only the second result is used after the expansion, we hit an issue
on x87 targets:

```
// Before expansion: 
t0, t1 = fmodf x
return t1  // t0 is unused
```

Expanded result:
```
ptr = alloca
ch0 = call modf ptr
t0, ch1 = copy_from_reg, ch0 // t0 unused
t1, ch2 = ldr ptr, ch1
return t1
```

So far things are alright, but the DAGCombiner optimizes this to:
```
ptr = alloca
ch0 = call modf ptr
// copy_from_reg optimized out
t1, ch1 = ldr ptr, ch0
return t1
```

On most targets this is fine. The optimized out `copy_from_reg` is
unused and is a NOP. However, x87 uses a floating-point stack, and if
the `copy_from_reg` is optimized out it won't emit a pop needed to
remove the unused result.

The prior solution for this was to attach the chain from the
`copy_from_reg` to the root, which did work, however, the root is not
always available (it's set to null during legalize types). So the
alternate solution in this patch is to replace the `copy_from_reg` with
an `X86ISD::POP_FROM_X87_REG` within the X86 call lowering. This node is
the same as `copy_from_reg` except this node makes it explicit that it
may lower to an x87 FPU stack pop. Optimizations should be more cautious
when handling this node than a normal CopyFromReg to avoid removing a
required FPU stack pop.

```
ptr = alloca
ch0 = call modf ptr
t0, ch1 = pop_from_x87_reg, ch0 // t0 unused
t1, ch2 = ldr ptr, ch1
return t1
```

Using this node ensures a required x87 FPU pop is not removed due to the
DAGCombiner.
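
For context, source code of roughly the following shape uses only the value written through the output pointer and discards the returned value. This is an assumed illustration with a hypothetical function name; the commit does not say which source patterns end up as an `FMODF` node.

```cpp
#include <cmath>

// Only the integral part, written through the output pointer, is used.
// The fractional part returned by std::modf is deliberately discarded.
float integralPart(float x) {
  float whole = 0.0f;
  (void)std::modf(x, &whole);
  return whole;
}
```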

This is an alternate solution for #127976.
2025-03-03 12:23:28 +00:00
Akshat Oke
77f44a9642
[CodeGen][NewPM] Port MachineSink to NPM (#115434)
Targets can set the EnableSinkAndFold option in CGPassBuilderOptions for
the NPM pipeline in buildCodeGenPipeline(... &Opts, ...)
2025-03-03 15:49:37 +05:30
RolandF77
a73e591f33
[PowerPC] custom lower v1024i1 load/store (#126969)
Support moving PPC dense math register values to and from storage with
LLVM IR load/store.
2025-02-28 10:25:07 -05:00
Lucas Ramirez
15e295d30a
[MachineScheduler][AMDGPU] Allow scheduling of single-MI regions (#128739)
The MI scheduler skips regions containing a single MI during scheduling.
This can prevent targets that perform multi-stage scheduling and move
MIs between regions during some stages from reasoning correctly about
the entire IR, since some MIs will not be assigned to a region at the
beginning.

This makes the machine scheduler no longer skip single-MI regions. Only
a few unit tests are affected (mainly those which check for the
scheduler's debug output).
2025-02-27 11:27:07 +01:00
Simon Pilgrim
bae41127e2
[DAG] replaceShuffleOfInsert - add support for shuffle_vector(scalar_to_vector(x),y) -> insert_vector_elt(y,x,c) (#127210)
Begin extending replaceShuffleOfInsert to handle other forms of scalar insertion into a vector.

I've limited this to targets that have Custom/Legal ISD::INSERT_VECTOR_ELT handling for now - although we can probably always fold this before LegalOperations.
2025-02-27 08:41:58 +00:00
Benjamin Maxwell
ea4e19df53
[SDAG] Add missing ppc_fp128 ExpandFloatRes for sincos[pi] (#128514) 2025-02-25 08:56:16 +00:00
zhijian lin
481e1eba3a
[NFC] add a pre-commit test case for patch #127121 that hoists xxsplitib out of loop (#127701)
This is a pre-commit test case for patch
https://github.com/llvm/llvm-project/pull/127121 that hoists xxspltib
out of the loop.
2025-02-21 10:29:52 -05:00
Benjamin Maxwell
f178e51747
[SDAG] Add missing ppc_fp128 ExpandFloatRes legalization for modf (#127895)
Should fix: https://lab.llvm.org/buildbot/#/builders/72/builds/8380

(`test_modf_ppcf128` is the test case that needed the additional
legalization)
2025-02-20 09:50:16 +07:00
David Tenty
aa9e519b24 Revert "[PowerPC] Deprecate uses of ISD::ADDC/ISD::ADDE/ISD::SUBC/ISD::SUBE (#116984)"
This reverts commit 7763119c6eb0976e4836f81c9876c49a36d46d73 (leaving the modifications from 03cb46d248b08).
2025-02-19 09:44:39 -05:00
Nikita Popov
cc539138ac
[CodeGen] Use __extendhfsf2 and __truncsfhf2 by default (#126880)
The standard libcalls for half to float and float to half conversion are
__extendhfsf2 and __truncsfhf2. However, LLVM currently uses
__gnu_h2f_ieee and __gnu_f2h_ieee instead. As far as I can tell, these
libcalls are an ARM-ism and only provided by libgcc on that platform.
compiler-rt always provides both libcalls.

Use the standard libcalls by default, and only use the __gnu libcalls on
ARM.
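
To make concrete what a half-to-float conversion libcall has to compute, here is a minimal software sketch that widens IEEE 754 binary16 bits to a float. The name `halfBitsToFloat` is hypothetical, the code assumes `float` is IEEE 754 binary32, and it is not taken from compiler-rt or libgcc.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen the bit pattern of a binary16 value to a float.
float halfBitsToFloat(std::uint16_t h) {
  std::uint32_t sign = (h & 0x8000u) << 16;  // move sign to bit 31
  std::uint32_t exp = (h >> 10) & 0x1Fu;     // 5-bit exponent, bias 15
  std::uint32_t mant = h & 0x3FFu;           // 10-bit mantissa
  std::uint32_t bits;
  if (exp == 0x1Fu) {                        // infinity or NaN
    bits = sign | 0x7F800000u | (mant << 13);
  } else if (exp != 0) {                     // normal: rebias 15 -> 127
    bits = sign | ((exp + 112u) << 23) | (mant << 13);
  } else if (mant == 0) {                    // signed zero
    bits = sign;
  } else {                                   // subnormal: normalise mantissa
    std::uint32_t shift = 0;
    while ((mant & 0x400u) == 0) {
      mant <<= 1;
      ++shift;
    }
    mant &= 0x3FFu;                          // drop the now-implicit leading 1
    bits = sign | ((113u - shift) << 23) | (mant << 13);
  }
  float f;
  std::memcpy(&f, &bits, sizeof f);          // reinterpret the bits as float
  return f;
}
```

Every binary16 value is exactly representable as a float, so this direction needs no rounding; the float-to-half direction does.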
2025-02-19 10:16:57 +01:00
Craig Topper
256145b4b0
[PowerPC] Use getSignedTargetConstant in SelectOptimalAddrMode. (#127305)
Fixes #127298.
2025-02-15 14:13:32 -08:00
zhijian lin
7763119c6e
[PowerPC] Deprecate uses of ISD::ADDC/ISD::ADDE/ISD::SUBC/ISD::SUBE (#116984)
ISD::ADDC, ISD::ADDE, ISD::SUBC and ISD::SUBE are being deprecated in
favor of ISD::UADDO_CARRY and ISD::USUBO_CARRY. This patch lowers UADDO,
UADDO_CARRY, USUBO and USUBO_CARRY.
2025-02-13 09:09:17 -05:00
Akshat Oke
7b60e03d73
Reland "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)" (#126684)
`RegisterClassInfo` was supposed to be kept alive between pass runs.
That was not being done, which led to recomputations that increased
compile time.

Now the Impl class is a member of the legacy and new passes so that it
is not reconstructed on every pass run.

---------

Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>
2025-02-12 18:54:39 +05:30
Akshat Oke
564b9b7f4d
Revert "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)" (#126268)
This reverts commit 5aa4979c47255770cac7b557f3e4a980d0131d69 while I
investigate what's causing the compile-time regression.
2025-02-08 15:36:48 +05:30
Matt Arsenault
58a88001f3
PeepholeOpt: Fix looking for def of current copy to coalesce (#125533)
This fixes the handling of subregister extract copies. This
will allow AMDGPU to remove its implementation of
shouldRewriteCopySrc, which exists as a 10-year-old workaround
to this bug. peephole-opt-fold-reg-sequence-subreg.mir will
show the expected improvement once the custom implementation
is removed.

The copy coalescing processing here is overly abstracted
from what's actually happening. Previously when visiting
coalescable copy-like instructions, we would parse the
sources one at a time and then pass the def of the root
instruction into findNextSource. This means that the
first thing the new ValueTracker constructed would do
is getVRegDef to find the instruction we are currently
processing. This added an unnecessary step, placed
a useless entry in the RewriteMap, and required skipping
the no-op case where getNewSource would return the original
source operand. This was a problem since in the case
of a subregister extract, shouldRewriteCopySrc would always
say that it is useful to rewrite and the use-def chain walk
would abort, returning the original operand. Move the process
to start looking at the source operand to begin with.

This does not fix the confused handling in the uncoalescable
copy case which is proving to be more difficult. Some currently
handled cases have multiple defs from a single source, and other
handled cases have 0 input operands. It would be simpler if
this was implemented with isCopyLikeInstr, rather than guessing
at the operand structure as it does now.

There are some improvements and some regressions. The
regressions appear to be downstream issues for the most part. One
of the uglier regressions is in PPC, where a sequence of insert_subregs
is used to build registers. I opened #125502 to use reg_sequence instead,
which may help.

The worst regression is an absurd SPARC testcase using a <251 x fp128>,
which uses a very long chain of insert_subregs.

We need improved subregister handling locally in PeepholeOptimizer,
and other passes like MachineCSE to fix some of the other regressions.
We should handle subregister composes and folding more indexes
into insert_subreg and reg_sequence.
2025-02-05 23:29:02 +07:00
Christudasan Devadasan
5aa4979c47
CodeGen][NewPM] Port MachineScheduler to NPM. (#125703) 2025-02-05 12:17:59 +05:30
Sergei Barannikov
ff9c041d96
[MachineScheduler] Fix physreg dependencies of ExitSU (#123541)
Providing the correct operand index allows addPhysRegDataDeps to compute
the correct latency.

Pull Request: https://github.com/llvm/llvm-project/pull/123541
2025-02-01 20:40:50 +03:00
Alexander Richardson
213a939a79
[LegalizeDAG] Use Base+Offset instead of Offset+Base for jump tables
This is needed for architectures that actually use strict pointer
arithmetic instead of integers such as AArch64 with FEAT_CPA (see
https://github.com/llvm/llvm-project/pull/105669) or CHERI. Using an
index as the first operand of pointer arithmetic may result in an
invalid output.

While there are quite a few codegen changes here, these only change the
order of registers in add instructions. One MIPS combine had to be
updated to handle the new node order.

Reviewed By: topperc

Pull Request: https://github.com/llvm/llvm-project/pull/125279
2025-01-31 14:05:34 -08:00
Alex Richardson
c7d4ccfd83 [PowerPC] Autogenerate a test checks in preparation for follow-up commit
This just adds more lines that are checked
2025-01-31 12:01:31 -08:00