4037 Commits

Author SHA1 Message Date
zhijian lin
1a540c3b8b
[PowerPC] Deprecate uses of ISD::ADDC/ISD::ADDE/ISD::SUBC/ISD::SUBE (#133155)
ISD::ADDC, ISD::ADDE, ISD::SUBC and ISD::SUBE are being deprecated,
using ISD::UADDO_CARRY,ISD::USUBO_CARRY instead. Lowering the UADDO,
UADDO_CARRY, USUBO, USUBO_CARRY in the patch.
2025-04-03 13:22:49 -04:00
Nikita Popov
9356091a98
[GlobalMerge][PPC] Don't merge globals in llvm.metadata section (#131801)
The llvm.metadata section is not emitted and has special semantics. We
should not merge globals in it, similarly to how we already skip merging
of `llvm.xyz` globals.

Fixes https://github.com/llvm/llvm-project/issues/131394.
2025-04-02 10:40:53 +02:00
Fangrui Song
04a67528d3
[MC] Simplify MCBinaryExpr/MCUnaryExpr printing by reducing parentheses (#133674)
The existing pretty printer generates excessive parentheses for
MCBinaryExpr expressions. This update removes unnecessary parentheses
of MCBinaryExpr with +/- operators and MCUnaryExpr.
Since relocatable expressions only use + and -, this change improves
readability in most cases.

Examples:

- (SymA - SymB) + C now prints as SymA - SymB + C.
  This updates the output of -fexperimental-relative-c++-abi-vtables for
  AArch64 and x86 to `.long _ZN1B3fooEv@PLT-_ZTV1B-8`
- expr + (MCTargetExpr) now prints as expr + MCTargetExpr, with this
  change primarily affecting AMDGPUMCExpr.
2025-03-30 22:03:14 -07:00
Tony Varghese
ff9c5c334a
[shrinkwrap] PowerPC's FP register should be honored when processing the save point for prologue. (#129855)
When generating code for functions that have `__builtin_frame_address`
calls and `noinline` attribute, prologue was not emitted correctly
leading to an assertion failure in PowerPC. The issue was due to
improper insertion of prologue for a function that contain llvm
`__builtin_frame_address`.
Shrink-wrap pass computes the save and restore points of a function.
Default points are the entry and exit points of the function. During
shrink-wrapping the frame-pointer was not honored like the stack pointer
and it was considered as a callee-saved register. This change will treat
the FP similar to SP and will insert the prolog on top the instruction
containing FP.

---------

Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
2025-03-21 12:55:39 -04:00
RolandF77
0489447b07
[PowerPC] dmr extract update assembly operand order (#132083)
The operand order of the assembly for dmr extract instructions has
changed since they were added. The results now come before the uses.
2025-03-20 13:40:40 -04:00
Lei Huang
ade22fc1d9
[PowerPC] Support conversion between f16 and f128 (#130158)
Enables conversion between f16 and f128.
Expanding on pre-Power9 targets and using HW instructions on Power9.

Fixes https://github.com/llvm/llvm-project/issues/92866
Commandeer of:  https://github.com/llvm/llvm-project/pull/97677

---------

Co-authored-by: esmeyi <esme.yi@ibm.com>
2025-03-19 10:19:57 -04:00
Lei Huang
dbc7665b24
PowerPC: Use REG_SEQUENCE instead of INSERT_SUBREG (#129941)
Update to use REG_SEQUENCE when possible.

This patch only update td pattern to utilize REG_SEQUENCE for
INSERT_SUBREG for cases where it does not produce
a nesting of REG_SEQUENCE. This seem to show some improvement in code
gen for `llvm/test/CodeGen/PowerPC/mmaplus-intrinsics.ll`.

Fixes part of https://github.com/llvm/llvm-project/issues/125502
2025-03-18 13:41:24 -04:00
Tony Varghese
aab4ce4d5e
[NFC][shrinkwrap] Add test point to capture the prologue and epilogue insertion by shrinkwrap pass for powerpc. (#131192)
This is NFC patch to capture the insertion of prologue and epilogue by
`shrinkwrap` pass for Powerpc target for functions that contain llvm
`__builtin_frame_address`.

---------

Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
2025-03-18 10:26:44 -04:00
Maryam Moghadas
22c6674f1d
[PowerPC] Add Dense Math binary integer outer-Product accumulate to DMR Instructions (#130791)
This commit adds the following Dense Math Facility integer calculation
instructions: dmxvi8gerx4, dmxvi8gerx4pp, dmxvi8gerx4spp, pmdmxvi8gerx4,
pmdmxvi8gerx4pp, and pmdmxvi8gerx4spp, along with their corresponding
intrinsics and tests.
2025-03-18 09:40:07 -04:00
Hubert Tong
2091547d4c [PPC codegen test] NFC: Fix RUN line; fix DATA checks to match 64-bit 2025-03-15 21:20:22 -04:00
Frederik Harwath
6962cf1700
Rename ExpandLargeFpConvertPass to ExpandFpPass (#131128)
This is meant as a preparation for PR #130988 "[AMDGPU] Implement IR
expansion for frem instruction" which implements the expansion of
another instruction in this pass. The more general name seems more
appropriate given this change and quite reasonable even without it.
2025-03-14 13:11:45 +01:00
RolandF77
4518780c3c
[PowerPC] Add intrinsics and tests for basic Dense Math enablement instructions (#129913)
Add intrinsics and tests for Dense Math basic enablement instructions
dmsetdmrz, dmmr, dmxor.
2025-03-12 12:55:29 -04:00
Daniel Paoliello
16e051f0b9
[win] NFC: Rename EHCatchret to EHCont to allow for EH Continuation targets that aren't catchret instructions (#129953)
This change splits out the renaming and comment updates from #129612 as a non-functional change.
2025-03-06 09:28:44 -08:00
zhijian lin
0303fd2746
[PowerPC] hoist xxspltib out of loop body (#127121)
Fixes https://github.com/llvm/llvm-project/issues/127119

Remove `hasSideEffects` from `xxspltib` since there is no special
restriction specified in the ISA that prevent it from being reordered,
move, CSE, or LICM. Removing this restriction will allow `xxspltib` to
be hoisted out of loop bodies.
2025-03-03 11:14:02 -05:00
Benjamin Maxwell
55fdeccc45
[SDAG][X86] Remove hack needed to avoid missing x87 FPU stack pops (#128055)
If a (two-result) node like `FMODF` or `FFREXP` is expanded to a library
call, where said library has the function prototype like: `float(float,
float*)` -- that is it returns a float from the call and via an output
pointer. The first result of the node maps to the value returned by
value and the second result maps to the value returned via the output
pointer.

If only the second result is used after the expansion, we hit an issue
on x87 targets:

```
// Before expansion: 
t0, t1 = fmodf x
return t1  // t0 is unused
```

Expanded result:
```
ptr = alloca
ch0 = call modf ptr
t0, ch1 = copy_from_reg, ch0 // t0 unused
t1, ch2 = ldr ptr, ch1
return t1
```

So far things are alright, but the DAGCombiner optimizes this to:
```
ptr = alloca
ch0 = call modf ptr
// copy_from_reg optimized out
t1, ch1 = ldr ptr, ch0
return t1
```

On most targets this is fine. The optimized out `copy_from_reg` is
unused and is a NOP. However, x87 uses a floating-point stack, and if
the `copy_from_reg` is optimized out it won't emit a pop needed to
remove the unused result.

The prior solution for this was to attach the chain from the
`copy_from_reg` to the root, which did work, however, the root is not
always available (it's set to null during legalize types). So the
alternate solution in this patch is to replace the `copy_from_reg` with
an `X86ISD::POP_FROM_X87_REG` within the X86 call lowering. This node is
the same as `copy_from_reg` except this node makes it explicit that it
may lower to an x87 FPU stack pop. Optimizations should be more cautious
when handling this node than a normal CopyFromReg to avoid removing a
required FPU stack pop.

```
ptr = alloca
ch0 = call modf ptr
t0, ch1 = pop_from_x87_reg, ch0 // t0 unused
t1, ch2 = ldr ptr, ch1
return t1
```

Using this node ensures a required x87 FPU pop is not removed due to the
DAGCombiner.

This is an alternate solution for #127976.
2025-03-03 12:23:28 +00:00
Akshat Oke
77f44a9642
[CodeGen][NewPM] Port MachineSink to NPM (#115434)
Targets can set the EnableSinkAndFold option in CGPassBuilderOptions for
the NPM pipeline in buildCodeGenPipeline(... &Opts, ...)
2025-03-03 15:49:37 +05:30
RolandF77
a73e591f33
[PowerPC] custom lower v1024i1 load/store (#126969)
Support moving PPC dense math register values to and from storage with
LLVM IR load/store.
2025-02-28 10:25:07 -05:00
Lucas Ramirez
15e295d30a
[MachineScheduler][AMDGPU] Allow scheduling of single-MI regions (#128739)
The MI scheduler skips regions containing a single MI during scheduling.
This can prevent targets that perform multi-stage scheduling and move
MIs between regions during some stages to reason correctly about the
entire IR, since some MIs will not be assigned to a region at the
beginning.

This makes the machine scheduler no longer skip single-MI regions. Only
a few unit tests are affected (mainly those which check for the
scheduler's debug output).
2025-02-27 11:27:07 +01:00
Simon Pilgrim
bae41127e2
[DAG] replaceShuffleOfInsert - add support for shuffle_vector(scalar_to_vector(x),y) -> insert_vector_elt(y,x,c) (#127210)
Begin extending replaceShuffleOfInsert to handle other forms of scalar insertion into a vector.

I've limited this to targets that have Custom/Legal ISD::INSERT_VECTOR_ELT handling for now - although we can probably always fold this before LegalOperations.
2025-02-27 08:41:58 +00:00
Benjamin Maxwell
ea4e19df53
[SDAG] Add missing ppc_fp128 ExpandFloatRes for sincos[pi] (#128514) 2025-02-25 08:56:16 +00:00
zhijian lin
481e1eba3a
[NFC] add a pre-commit test case for patch #127121 that hoists xxsplitib out of loop (#127701)
This is a pre-commit test case for patch
https://github.com/llvm/llvm-project/pull/127121 that hoists xxsplitib
out of loop
2025-02-21 10:29:52 -05:00
Benjamin Maxwell
f178e51747
[SDAG] Add missing ppc_fp128 ExpandFloatRes legalization for modf (#127895)
Should fix: https://lab.llvm.org/buildbot/#/builders/72/builds/8380

(`test_modf_ppcf128` is the test case that needed the additional
legalization)
2025-02-20 09:50:16 +07:00
David Tenty
aa9e519b24 Revert "[PowerPC] Deprecate uses of ISD::ADDC/ISD::ADDE/ISD::SUBC/ISD::SUBE (#116984)"
This reverts commit 7763119c6eb0976e4836f81c9876c49a36d46d73 (leaving the modifications from 03cb46d248b08)..
2025-02-19 09:44:39 -05:00
Nikita Popov
cc539138ac
[CodeGen] Use __extendhfsf2 and __truncsfhf2 by default (#126880)
The standard libcalls for half to float and float to half conversion are
__extendhfsf2 and __truncsfhf2. However, LLVM currently uses
__gnu_h2f_ieee and __gnu_f2h_ieee instead. As far as I can tell, these
libcalls are an ARM-ism and only provided by libgcc on that platform.
compiler-rt always provides both libcalls.

Use the standard libcalls by default, and only use the __gnu libcalls on
ARM.
2025-02-19 10:16:57 +01:00
Craig Topper
256145b4b0
[PowerPC] Use getSignedTargetConstant in SelectOptimalAddrMode. (#127305)
Fixes #127298.
2025-02-15 14:13:32 -08:00
zhijian lin
7763119c6e
[PowerPC] Deprecate uses of ISD::ADDC/ISD::ADDE/ISD::SUBC/ISD::SUBE (#116984)
ISD::ADDC, ISD::ADDE, ISD::SUBC and ISD::SUBE are being deprecated,
using ISD::UADDO_CARRY,ISD::USUBO_CARRY instead. Lowering the UADDO,
UADDO_CARRY, USUBO, USUBO_CARRY in the patch.
2025-02-13 09:09:17 -05:00
Akshat Oke
7b60e03d73
Reland "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)" (#126684)
`RegisterClassInfo` was supposed to be kept alive between pass runs,
which wasn't being done leading to recomputations increasing the compile
time.

Now the Impl class is a member of the legacy and new passes so that it
is not reconstructed on every pass run.

---------

Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>
2025-02-12 18:54:39 +05:30
Akshat Oke
564b9b7f4d
Revert "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)" (#126268)
This reverts commit 5aa4979c47255770cac7b557f3e4a980d0131d69 while I
investigate what's causing the compile-time regression.
2025-02-08 15:36:48 +05:30
Matt Arsenault
58a88001f3
PeepholeOpt: Fix looking for def of current copy to coalesce (#125533)
This fixes the handling of subregister extract copies. This
will allow AMDGPU to remove its implementation of
shouldRewriteCopySrc, which exists as a 10 year old workaround
to this bug. peephole-opt-fold-reg-sequence-subreg.mir will
show the expected improvement once the custom implementation
is removed.

The copy coalescing processing here is overly abstracted
from what's actually happening. Previously when visiting
coalescable copy-like instructions, we would parse the
sources one at a time and then pass the def of the root
instruction into findNextSource. This means that the
first thing the new ValueTracker constructed would do
is getVRegDef to find the instruction we are currently
processing. This adds an unnecessary step, placing
a useless entry in the RewriteMap, and required skipping
the no-op case where getNewSource would return the original
source operand. This was a problem since in the case
of a subregister extract, shouldRewriteCopySource would always
say that it is useful to rewrite and the use-def chain walk
would abort, returning the original operand. Move the process
to start looking at the source operand to begin with.

This does not fix the confused handling in the uncoalescable
copy case which is proving to be more difficult. Some currently
handled cases have multiple defs from a single source, and other
handled cases have 0 input operands. It would be simpler if
this was implemented with isCopyLikeInstr, rather than guessing
at the operand structure as it does now.

There are some improvements and some regressions. The
regressions appear to be downstream issues for the most part. One
of the uglier regressions is in PPC, where a sequence of insert_subrgs
is used to build registers. I opened #125502 to use reg_sequence instead,
which may help.

The worst regression is an absurd SPARC testcase using a <251 x fp128>,
which uses a very long chain of insert_subregs.

We need improved subregister handling locally in PeepholeOptimizer,
and other pasess like MachineCSE to fix some of the other regressions.
We should handle subregister composes and folding more indexes
into insert_subreg and reg_sequence.
2025-02-05 23:29:02 +07:00
Christudasan Devadasan
5aa4979c47
CodeGen][NewPM] Port MachineScheduler to NPM. (#125703) 2025-02-05 12:17:59 +05:30
Sergei Barannikov
ff9c041d96
[MachineScheduler] Fix physreg dependencies of ExitSU (#123541)
Providing the correct operand index allows addPhysRegDataDeps to compute
the correct latency.

Pull Request: https://github.com/llvm/llvm-project/pull/123541
2025-02-01 20:40:50 +03:00
Alexander Richardson
213a939a79
[LegalizeDAG] Use Base+Offset instead of Offset+Base for jump tables
This is needed for architectures that actually use strict pointer
arithmetic instead of integers such as AArch64 with FEAT_CPA (see
https://github.com/llvm/llvm-project/pull/105669) or CHERI. Using an
index as the first operand of pointer arithmetic may result in an
invalid output.

While there are quite a few codegen changes here, these only change the
order of registers in add instructions. One MIPS combine had to be
updated to handle the new node order.

Reviewed By: topperc

Pull Request: https://github.com/llvm/llvm-project/pull/125279
2025-01-31 14:05:34 -08:00
Alex Richardson
c7d4ccfd83 [PowerPC] Autogenerate a test checks in preparation for follow-up commit
This just adds more lines that are checked
2025-01-31 12:01:31 -08:00
Stefan Pintilie
340706f311
[PowerPC] Fix saving of Link Register when using ROP Protect (#123101)
An optimization was added that tries to move the uses of the mflr
instruction away from the instruction itself. However, this doesn't work
when we are using the hashst instruction because that instruction needs
to be run before the stack frame is obtained.

This patch disables moving instructions away from the mflr in the case
where ROP protection is being used.

---------

Co-authored-by: Lei Huang <lei@ca.ibm.com>
2025-01-22 13:44:20 -05:00
Sander de Smalen
6b1db79887 Revert "Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (#123632)"
There's a regression with one of the bootstrap builds for x86.
I'll revert this while I investigate.

This reverts commit 4df6d3df24ae9cff07c70c96a1663cbba6e1dca5.
2025-01-22 10:11:32 +00:00
Sander de Smalen
4df6d3df24
Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (#123632)
This PR aims to reland work done by @arsenm which was previously
reverted due to some tangentially related scheduler issues as discussed
on #76416.

This PR cherry-picks the original commit (0e46b49de433), and adds
another patch on top with the following changes:

* The code in `updateRegDefsUses` now updates subranges when
  subreg-liveness-tracking is enabled.

* When adding an implicit-def operand for the super-register,
  the code in `reMaterializeTrivialDef` which tries to remove
  undefined subranges should now take into account that the lanes
  from the super-reg are no longer undefined.

Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
2025-01-22 09:07:46 +00:00
TiborGY
3630d9ef65
[PartiallyInlineLibCalls] Add infrastructure for emitting optimization remarks from PartiallyInlineLibCalls (#122654)
I am planning to add some optimization remarks to the
`PartiallyInlineLibCalls` pass. However, since this pass does not emit any 
optimization remarks yet, I have to add the "infrastructure" for that first, which 
is what this PR is about.
2025-01-22 13:15:40 +07:00
Matt Arsenault
5e79ae60a6
DAG: Fix vector_shuffle -> splat fold defining undef lanes (#123596)
For shuffle vector splats with undef lanes in the mask,
this was introducing real values. Filter out build_vector
results based on the undef elements in the mask.

This avoids AMDGPU test regressions in a future change.

test/CodeGen/X86/urem-seteq-illegal-types.ll looks worse
but I didn't investigate.
2025-01-21 23:55:50 +07:00
Guy David
1a935d7a17
[llvm] Mark scavenging spill-slots as *spilled* stack objects. (#122673)
This seems like an oversight when copying code from other backends.
2025-01-14 10:18:31 +02:00
Hubert Tong
e438513f2e
[AIX][AsmPrinter] Fix unsigned subtraction wrap-around (#122214)
Unsigned subtraction wrap-around occurs in `emitGlobalConstantImpl` on
an AIX-specific code path from 8e4423eb0888 when a structure type has
zero elements.

With assertions enabled, this manifests as:
```
TypeSize llvm::StructLayout::getElementOffset(unsigned int) const: Assertion `Idx < NumElements && "Invalid element idx!"' failed.
```
2025-01-09 00:07:57 -04:00
Fangrui Song
5e0be962fe
[PowerPC] Support PIC Secure PLT for CALL_RM
https://reviews.llvm.org/D111433 introduced PPCISD::CALL_RM for
-frounding-math. -msecure-plt -frounding-math {-fpic,-fPIC} codegen for
PPC32 became incorrect when a function contains function calls but no
global variable references (GlobalBaseReg).

As reported by @q66 , musl/src/dirent/closedir.c implements such a
function, which is miscompiled.

PPCISD::CALL has custom logic to set up the base register
(https://reviews.llvm.org/D42112). Add an extra case for CALL_RM.

While here, improve the test to

* actually test `case PPCISD::CALL`: we need a non-leaf function that
  doesn't access global variables (global variables lead to
  GlobalBaseReg, which call `getGlobalBaseReg()` as well).
* test `ExternalSymbolSDNode` with a memset.

Supersedes: #72758

Pull Request: https://github.com/llvm/llvm-project/pull/121281
2025-01-06 08:59:42 -08:00
Craig Topper
4dfea22e77
[ExpandMemCmp][AArch64][PowerPC][RISCV][X86] Use llvm.ucmp instead of (sub (zext (icmp ugt)), (zext (icmp ult))). (#121530)
AArch64 and PowerPC look like a improvements.
RISC-V is neutral.
X86 trades a dependency breaking xor before a seta for a movsx after a
sbbb. Depending on how the result is used, this movsx might go away.
2025-01-03 09:19:32 -08:00
Simon Pilgrim
b3a7ab6f1f [DAG] Don't allow implicit truncation in extract_element(bitcast(scalar_to_vector(X))) -> trunc(srl(X,C)) fold
Limits #117900 to only fold when scalar_to_vector doesn't perform implicit truncation, as the scaled shift calculation doesn't currently account for this - this can be addressed in a future update.

Fixes #121306
2024-12-30 16:08:35 +00:00
Fangrui Song
9efa7d7af3
Remove -print-lsr-output in favor of --stop-after=loop-reduce
Pull Request: https://github.com/llvm/llvm-project/pull/121305
2024-12-29 18:58:30 -08:00
Fangrui Song
7b23f413d1 MCAsmStreamer: Omit initial ".text"
llvm-mc --assemble prints an initial `.text` from `initSections`.
This is weird for quick assembly tasks that do not specify `.text`.

Omit the .text by moving section directive printing from `changeSection`
to `switchSection`. switchSectionNoPrint now correctly calls the
`changeSection` hook (needed by MachO).

The initial directives of clang -S are now reordered. On ELF targets, we
get `.file "a.c"; .text` instead of `.text; .file "a.c"`.
If there is no function, `.text` will be omitted.
2024-12-22 22:03:44 -08:00
Benjamin Maxwell
a7dafea384
[SDAG] Allow folding stack slots into sincos/frexp in more cases (#118117)
This adds a new helper `canFoldStoreIntoLibCallOutputPointers()` to
check that it is safe to fold a store into a node that will expand to a
library call that takes output pointers. This requires checking for two
(independent) properties:

1. The store is not within a CALLSEQ_START..CALLSEQ_END pair
* If it is, the expansion would lead to nested call sequences (which is
invalid)
2. The node does not appear as a predecessor to the store
* If it does, attempting to merge the store into the call would result
in a cycle in the DAG

These two properties are checked as part of the same traversal in
`canFoldStoreIntoLibCallOutputPointers()`
2024-12-17 10:54:17 +00:00
Matt Arsenault
bb18e49edb
RegAlloc: Use DiagnosticInfo to report register allocation failures (#119492)
Improve the non-fatal cases to use DiagnosticInfo, which will now
provide a location. The allocators attempt to report different errors
if it happens to see inline assembly is involved (this detection is
quite unreliable) using srcloc instead of dbgloc. For now, leave this
behavior unchanged. I think reporting the full location and context
function would be more useful.
2024-12-16 10:49:08 +09:00
Fangrui Song
133352feb3 [test] Remove redundant -march= when target triple is specified in IR 2024-12-15 12:42:17 -08:00
Stefan Pintilie
67eb05b292
[PowerPC] Add special handling for arguments that are smaller than pointer size. (#119003)
When arguments are passed in memory instead of registers we currently
load the entire pointer size even though the argument may be smaller.
For exmaple if the pointer size if i32 then we use a load word even if
the argument is only an i8. This patch zeros / extends the bits that are
not required to ensure that we are getting the correct value even if the
load is larger.
2024-12-12 09:43:53 -05:00
Guillaume DI FATTA
a1ee1a9126
[CodeGen] @llvm.experimental.stackmap make operands immediate (#117932)
This pull request modifies the behavior of the
`@llvm.experimental.stackmap` intrinsic to require that its two first
operands (`id` and `numShadowBytes`) be **immediate values**. This
change ensures that variables cannot be passed as two first arguments to
this intrinsic.


Related Issue: https://github.com/llvm/llvm-project/issues/115733

### Testing
- Added new test cases to ensure errors are emitted for non-immediate
operands.
- Ran the full LLVM test suite to verify no regressions were introduced.
2024-12-11 17:41:19 +08:00