3951 Commits

Author SHA1 Message Date
Maryam Moghadas
8a0cb9ac86
[PowerPC] Add custom lowering for ssubo (#111748)
This patch is to improve the codegen for ssubo node for i32 in 64-bit
mode by custom lowering.
2024-10-29 15:43:05 -04:00
Simon Pilgrim
32aa782ea2 [PowerPC] copysignl.ll - regenerate to reduce the diff in #111269 2024-10-29 10:57:43 +00:00
Serge Pavlov
819abe412d
[Test] Fix usage of constrained intrinsics (#113523)
Some tests contain errors in constrained intrinsic usage, such as missed
or extra type parameters, wrong type parameters order and some other.

---------

Co-authored-by: Andy Kaylor <andy_kaylor@yahoo.com>
2024-10-28 14:07:32 +07:00
Zaara Syeda
f3131c99bf
[GlobalMerge] Aggressively merge constants to reduce TOC entries (#111756)
Symbols that get mapped into the read-only section are loaded as part of
the text segment and will always need a TOC entry to be addressable. Add
an option to aggressively merge these read only globals to reduce TOC
usage.
2024-10-24 10:16:39 -04:00
Lei Huang
522f34cfff
[PowerPC] Expand global named register support (#113482)
Enable all valid registers for intrinsics that read from and write
to global named registers.
2024-10-24 10:05:18 -04:00
Lei Huang
a19f05b9ec
Revert "[PowerPC] Expand global named register support" (#113457)
Reverts llvm/llvm-project#112603
2024-10-23 09:36:28 -04:00
Lei Huang
06d192925d
[PowerPC] Expand global named register support (#112603)
Enable all valid registers for intrinsics that read from and write
to global named registers.
2024-10-22 14:34:24 -04:00
RolandF77
fc59f2cc0f
[PowerPC] special case small int constant for custom scalar_to_vector (#109850)
Special case small int constant in the PPC custom lowering of
scalar_to_vector.
2024-10-21 12:19:07 -04:00
Zaara Syeda
c5ca1b8626
[PPC] Add custom lowering for uaddo (#110137)
Improve the codegen for uaddo node for i64 in 64-bit mode and i32 in
32-bit mode by custom lowering.
2024-10-21 11:13:16 -04:00
Alex Rønne Petersen
ad4a582fd9
[llvm] Consistently respect naked fn attribute in TargetFrameLowering::hasFP() (#106014)
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.

Note: I don't have commit access.
2024-10-18 09:35:42 +04:00
Keith Packard
44b020a381
[PowerPC][ISelLowering] Support -mstack-protector-guard=tls (#110928)
Add support for using a thread-local variable with a specified offset
for holding the stack guard canary value. This supports both 32- and 64-
bit PowerPC targets.

This mirrors changes from #108942 but targeting PowerPC instead of
RISCV. Because both of these PRs modify the same driver functions, this
series is stack on top of the RISC-V one.

---------

Signed-off-by: Keith Packard <keithp@keithp.com>
2024-10-17 19:06:47 -07:00
Qiongsi Wu
f9d0789064
[PGO] Initialize GCOV Writeout and Reset Functions in the Runtime on AIX (#108570)
This PR registers the writeout and reset functions for `gcov` for all
modules in the PGO runtime, instead of registering them
using global constructors in each module. The change is made for AIX
only, but the same mechanism works on Linux on Power.

When registering such functions using global constructors in each module
without `-ffunction-sections`, the AIX linker cannot garbage collect
unused undefined symbols, because such symbols are grouped in the same
section as the `__sinit` symbol. Keeping such undefined symbols causes
link errors (see test case
https://github.com/llvm/llvm-project/pull/108570/files#diff-500a7e1ba871e1b6b61b523700d5e30987900002add306e1b5e4972cf6d5a4f1R1
for this scenario). This PR implements the initialization in the
runtime, hence avoiding introducing `__sinit` into each module.

The implementation adds a new global variable `__llvm_covinit_functions`
to each module. This new global variable contains the function pointers
to the `Writeout` and `Reset` functions. `__llvm_covinit_functions`'s
section is the named section `__llvm_covinit`. The linker will aggregate
all the `__llvm_covinit` sections from each module
to form one single named section in the final binary. The pair of
functions
```
const __llvm_gcov_init_func_struct *__llvm_profile_begin_covinit();
const __llvm_gcov_init_func_struct *__llvm_profile_end_covinit();
```
are implemented to return the start and end address of this named
section in the final binary, and they are used in function
```
__llvm_profile_gcov_initialize()
```
(which is a constructor function in the runtime) so the runtime knows
the addresses of all the `Writeout` and `Reset` functions from all the
modules.

One noticeable implementation detail relevant to AIX is that to preserve
the `__llvm_covinit` from the linker's garbage collection, a `.ref`
pseudo instruction is inserted into them, referring to the section that
contains the `__llvm_gcov_ctr` variables, which are used in the
instrumented code. The `__llvm_gcov_ctr` variables did not belong to
named sections before, but this PR added them to the
`__llvm_gcov_ctr_section` named section, so we can add a `.ref` pseudo
instruction that refers to them in the `__llvm_covinit` section.
2024-10-17 09:32:10 -04:00
Stefan Pintilie
dcc5ba4a4d
[PowerPC] Add missing patterns for lround when i32 is returned. (#111863)
The patch adds support for lround when the output type of the rounding
is i32.
The support for a rounding result of type i64 existed before this patch.
2024-10-16 10:25:09 -04:00
Christudasan Devadasan
488d3924dd
[CodeGen][NewPM] Port EarlyIfConversion pass to NPM. (#108508) 2024-10-16 13:22:57 +05:30
Akshat Oke
8b20f1b924
[MIR] Fix tests for flags in register info (#112179)
[MIR] Serialize virtual register flags #110228 introduces register flags
which appear empty in .mir dumps. Future tests should use
`-simplify-mir`.
2024-10-14 18:28:54 +05:30
Oliver Stannard
1e49670b31
[DAGISel] Keep flags when converting FP load/store to integer (#111679)
This DAG combine replaces a floating-point load/store pair which has no
other uses with an integer one, but did not copy the memory operand
flags to the new instructions, resulting in it dropping the volatile
flag. This optimisation is still valid if one or both of the
instructions is volatile, so we can copy over the whole
MachineMemOperand to generate volatile integer loads and stores where
needed.
2024-10-10 09:17:50 +01:00
Simon Pilgrim
55890968ac [PowerPC] vec-min-max.ll - regenerate with common check prefixes to reduce duplication. NFC. 2024-10-08 17:36:35 +01:00
Ramkumar Ramachandra
45817aa726
LICM: hoist BO assoc for and, or, xor (#111146)
Trivially lift the Opcode limitation on hoistBOAssociation to also hoist
and, or, and xor.

Alive2 proofs: https://alive2.llvm.org/ce/z/rVNP2X
2024-10-04 19:13:51 +01:00
Matt Arsenault
187dcd8e22
DAG: Preserve disjoint flag when emitting final instructions (#110795) 2024-10-02 19:37:04 +04:00
Craig Topper
92a8b81bdf
[LegalizeVectorOps] Enable ExpandFABS/COPYSIGN to use integer ops for fixed vectors in some cases. (#109232)
Copy the same FSUB check from ExpandFNEG to avoid breaking AArch64 and
ARM.
2024-09-30 11:44:49 -07:00
Timothy Pearson
90c1474863
[SDAG] Honor signed arguments in floating point libcalls (#109134)
In ExpandFPLibCall, an assumption is made that all floating point
libcalls that take integer arguments use unsigned integers. In the case
of ldexp and frexp, this assumption is incorrect, leading to
miscompilation and subsequent target-dependent incorrect operation.

Indicate that ldexp and frexp utilize signed arguments in
ExpandFPLibCall.

Fixes #108904

Signed-off-by: Timothy Pearson <tpearson@solidsilicon.com>
2024-09-25 11:09:50 +04:00
futog
3e0a76b1fd
[Codegen][LegalizeIntegerTypes] Improve shift through stack (#96151)
Minor improvement on cc39c3b17fb2598e20ca0854f9fe6d69169d85c7.

Use an aligned stack slot to store the shifted value.
Use the native register width as shifting unit, so the load of the
shift result is aligned.

If the shift amount is a multiple of the native register width, there is
no need to do a follow-up shift after the load. I added new tests for
these cases.

Co-authored-by: Gergely Futo <gergely.futo@hightec-rt.com>
2024-09-23 11:45:43 +02:00
Akshat Oke
d2d78e584b
[NewPM][CodeGen] Port MachineLICM to NPM (#107376) 2024-09-20 11:34:18 +05:30
Zaara Syeda
22067a8eb4
[PowerPC] Fix assert exposed by PR 95931 in LowerBITCAST (#108062)
Hit Assertion failed: Num < NumOperands && "Invalid child # of SDNode!" 
Fix by checking opcode and value type before calling getOperand.
2024-09-10 14:14:01 -04:00
Qiu Chaofan
06c331163e
[PowerPC] Implement llvm.set.rounding intrinsic (#67302) 2024-09-10 14:30:31 +08:00
Jeremy Morse
7a930ce327
[DWARF] Emit a minimal line-table for totally empty functions (#107267)
In degenerate but legal inputs, we can have functions that have no source
locations at all -- all the DebugLocs attached to instructions are empty.
LLVM didn't produce any source location for the function; with this patch
it will at least emit the function-scope source location. Demonstrated by
empty-line-info.ll

The XCOFF test modified has similar symptoms -- with this patch, the size
of the ".dwline" section grows a bit, thus shifting some of the file
internal offsets, which I've updated.
2024-09-09 12:54:45 +01:00
anjenner
4af249fe6e
Add usub_cond and usub_sat operations to atomicrmw (#105568)
These both perform conditional subtraction, returning the minuend and
zero respectively, if the difference is negative.
2024-09-06 16:19:20 +01:00
Matt Arsenault
100d9b8994 Reapply "AtomicExpand: Allow incrementally legalizing atomicrmw" (#107307)
This reverts commit 63da545ccdd41d9eb2392a8d0e848a65eb24f5fa.

Use reverse iteration in the instruction loop to avoid sanitizer
errors. This also has the side effect of avoiding the AArch64
codegen quality regressions.

Closes #107309
2024-09-06 18:37:34 +04:00
Simon Pilgrim
6ec889e53f [DAG] Add support for neg(abd(x,y)) patterns.
Currently limited to cases which have legal/custom ABDS/ABDU handling - I'll extend this for all targets in future (similar to how we support neg(abs(x))) once I've addressed some outstanding regressions on aarch64/riscv.

Helps avoid a lot of extra cmov instructions on x86 in particular, and allows us to more easily improve the codegen in future commits.
2024-09-06 13:16:09 +01:00
Matt Arsenault
fc3e6a8186
DAG: Handle lowering unordered compare with inf (#100378)
Try to take advantage of the nan check behavior of fcmp.
x86_64 looks better, x86_32 looks worse.
2024-09-05 19:54:32 +04:00
RolandF77
26ba186bd0
[PowerPC] Improve pwr7 codegen for v4i8 load (#104507)
There are no partial vector loads on pwr7 so current v4i8 codegen is an
int load then store to vector sized temp and re-load as vector. Try to
use lfiwax to load 32 bits into an FP reg and take advantage of VSX FP
and vector reg sharing to move the result to the right vector position.
2024-09-04 12:55:27 -04:00
Christudasan Devadasan
6c143a86cd
[CodeGen][NewPM] Port MachineCSE pass to new pass manager. (#106605) 2024-09-04 18:54:07 +05:30
paperchalice
69657eb7f6
[llc] Provide opt like verifier options (#106665)
- Support `verify-each` option.
- Default behavior is verifying output only.
2024-09-04 17:37:34 +08:00
Michael Marjieh
00c198b2ca
[MachinePipeliner] Make Recurrence MII More Accurate (#105475)
Current RecMII calculation is bigger than it needs to be. The
calculation was refined in this patch.
2024-09-03 16:15:17 +09:00
Craig Topper
aa91d90cb0
[LegalizeVectorOps][PowerPC] Use xor to expand fneg. (#106595)
This preserves the semantis of fneg and matches what we do in
LegalizeDAG.

I kept the legal FSUB check to force unrolling for some targets that
don't have FSUB but have XOR. On Aarch64, using xor broke some tests that
expected to see a (v1f64 (fma (insertvector_elt (f64 (fneg
(extractvectorelt X)))))) pattern.
2024-08-29 15:00:23 -07:00
Stephen Tozer
3d08ade7bd
[ExtendLifetimes] Implement llvm.fake.use to extend variable lifetimes (#86149)
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:

https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850

This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).

Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
2024-08-29 17:53:32 +01:00
Matt Arsenault
7b7b0b95b2
DAG: Check if is_fpclass is custom, instead of isLegalOrCustom (#105577)
For some reason, isOperationLegalOrCustom is not the same as
isOperationLegal || isOperationCustom. Unfortunately, it checks
if the type is legal which makes it uesless for custom lowering
on non-legal types (which is always ppcf128).

Really the DAG builder shouldn't be going to expand this in the
builder, it makes it difficult to work with. It's only here to work
around the DAG requiring legal integer types the same size as
the FP type after type legalization.
2024-08-29 14:05:43 +04:00
RolandF77
89bbcbe285
[PowerPC] fix legalization crash (#105563)
If v2i64 scalar_to_vector is made custom, llc can crash in certain
legalization cases where v2i64 vectors are injected, even if they
weren't otherwise present. The code generated would be fine, but that
operation is not handled in ReplaceNodeResults. Add handling.
2024-08-28 11:22:23 -04:00
Kai Luo
8e901c255d
[PowerPC] Retire PPCExpandISel pass (#84289)
We can decide whether to expand isel or not in instruction selection
pass and early-if-conversion pass. The transformation implemented in
PPCExpandISel can be retired considering PPC backend doesn't generate
`isel` instructions post-RA.
Also if we are seeking performant branch-or-isel decision, we can turn
to selectoptimize pass.

---------

Co-authored-by: Kai Luo <lkail@cn.ibm.com>
2024-08-27 09:43:52 +08:00
Zaara Syeda
327edbe07a
[PowerPC] Fix mask for __st[d/w/h/b]cx builtins (#104453)
These builtins are currently returning CR0 which will have the format
[0, 0, flag_true_if_saved, XER].
We only want to return flag_true_if_saved. This patch adds a shift to
remove the XER bit before returning.
2024-08-22 09:55:46 -04:00
Sergei Barannikov
c91cc459d3
[DataLayout] Refactor the rest of parseSpecification (#104545)
The aim is to improve test coverage of data layout string parsing.

Pull Request: https://github.com/llvm/llvm-project/pull/104545
2024-08-20 11:25:49 +03:00
Qiu Chaofan
b6d1df2afd
[PowerPC] Support -mno-red-zone option (#94581) 2024-08-19 17:58:08 +08:00
Amy Kwan
cf721e29c6
[PowerPC] Do not merge TLS constants within PPCMergeStringPool.cpp (#94059)
This patch prevents thread-local constants to be merged within
PPCMergeStringPool.cpp.

The PPCMergeStringPool pass primarily merges non-thread-local constants
together, and thread-local constants should not be mixed together with
other (non-thread-local) constants. In the event that thread-local and
other non-thread-local constants are pooled together, the
llvm.threadlocal.address intrinsic can fail as it expects its argument
to be a thread-local global value, but the merged string structure
created by the PPCMergeStringPool pass is not thread-local as a whole.
2024-08-16 15:06:50 -04:00
Amy Kwan
9325381998
[PowerPC][GlobalMerge] Enable GlobalMerge by default on AIX (#101226)
This patch turns on the GlobalMerge pass by default on AIX and updates
LIT tests accordingly.
2024-08-15 15:25:54 -04:00
Craig Topper
abc1acf8df
[TargetLowering][AMDGPU][ARM][RISCV][X86] Teach SimplifyDemandedBits to combine (srl (sra X, C1), ShAmt) -> sra(X, C1+ShAmt) (#101751)
If the upper bits of the shr aren't demanded.

This helps with cases where the outer srl was originally an sra and was
converted to a srl by SimplifyDemandedBits before it had a chance to
combine with the inner sra. This can occur when the inner sra was part
of a sign_extend_inreg expansion.

There are some regressions in ARM and Thumb2.
2024-08-14 08:44:57 -07:00
Amy Kwan
5e990b0b7f
[PowerPC][GlobalMerge] Reduce TOC usage by merging internal and private global data (#101224)
This patch aims to reduce TOC usage by merging internal and private
global data.

Moreover, we also add the GlobalMerge pass within the PPCTargetMachine
pipeline, which is disabled by default. This transformation can be
enabled by -ppc-global-merge.
2024-08-14 10:14:33 -04:00
RolandF77
8b6e9de3dd
[PowerPC] improve P10 store forwarding on P7 scalar to vector (#102330)
Try to make P7 code with scalar to vector operations that use store/re-load to run smoother on P10 by supplying enough store width to cover the load and allow hardware store forwarding.
2024-08-12 12:30:06 -04:00
Peter Rong
74e4694b8c
[LTO] enable ObjCARCContractPass only on optimized build (#101114)
\#92331 tried to make `ObjCARCContractPass` by default, but it caused a
regression on O0 builds and was reverted.
This patch trys to bring that back by:

1. reverts the
[revert](1579e9ca9c).
2. `createObjCARCContractPass` only on optimized builds.

Tests are updated to refelect the changes. Specifically, all `O0` tests
should not include `ObjCARCContractPass`

Signed-off-by: Peter Rong <PeterRong@meta.com>
2024-08-09 13:04:25 -07:00
Simon Pilgrim
13d04fa560 [DAG] Add legalization handling for ABDS/ABDU (#92576) (REAPPLIED)
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization.

abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs))
Alive2: https://alive2.llvm.org/ce/z/dVdMyv

REAPPLIED: Fix regression issue with "abs(ext(x) - ext(y)) -> zext(abd(x, y))" fold failing after type legalization
2024-08-08 11:39:05 +01:00
Tim Gymnich
408d82d352
[PowerPC] Respect endianness when bitcasting to fp128 (#95931)
Fixes #92246

Match the behaviour of `bitcast v2i64 (BUILD_PAIR %lo %hi)` when
encountering `bitcast fp128 (BUILD_PAIR %lo $hi)`.
by inserting a missing swap of the arguments based on endianness.

### Current behaviour: 
**fp128**
bitcast fp128 (BUILD_PAIR %lo $hi) => BUILD_FP128 %lo %hi
BUILD_FP128 %lo %hi => MTVSRDD %hi %lo

**v2i64**
bitcast v2i64 (BUILD_PAIR %lo %hi) => BUILD_VECTOR %hi %lo
BUILD_VECTOR %hi %lo => MTVSRDD %lo %hi
2024-08-08 08:51:04 +08:00