3716 Commits

Author SHA1 Message Date
Kai Luo
b42738805a [PowerPC] Auto gen test checks for #69299. NFC. 2023-10-18 02:21:22 +00:00
Kai Luo
3104681686
[PowerPC][Atomics] Remove redundant block to clear reservation (#68430)
This PR is following what https://reviews.llvm.org/D134783 does for
quardword CAS.
2023-10-13 10:59:27 +08:00
Nikita Popov
127ed9ae26
[PowerPC] Use zext instead of anyext in custom and combine (#68784)
This custom combine currently converts `and(anyext(x),c)` into
`anyext(and(x,c))`. This is not correct, because the original expression
guaranteed that the high bits are zero, while the new one sets them to
undef.

Emit `zext(and(x,c))` instead.

Fixes https://github.com/llvm/llvm-project/issues/68783.
2023-10-12 09:32:17 +02:00
Nikita Popov
0ead1faef0 [PowerPC] Add test for #68783 (NFC) 2023-10-11 12:15:26 +02:00
Jay Foad
7b3bbd83c0 Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"
This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.

Reverted due to various buildbot failures.
2023-10-09 12:31:32 +01:00
Jay Foad
2501ae58e3
[CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
2023-10-09 11:44:41 +01:00
Lei
529ad40e05
[PowerPC] Fix missing kill flag update for XVCVDPSP transformations (#67997)
Add transformed register to kill flag work list for XVCVDPSP tranformations.

Ref: reviews.llvm.org/D133103
2023-10-06 10:24:54 -04:00
Kishan Parmar
696ea67f19 Disable call to fma for soft-float
PowerPC backend generate calls to libc function calls
for soft-float, regardless of the -nostdlib /-ffreestanding flag.
fma is not a function provided by compiler-rt builtins and
thus should not be generated here.
PR : [[ https://github.com/llvm/llvm-project/issues/55230 | #55230 ]]

Below is patch given by @nemanjai

Reviewed By: jhibbits

Differential Revision: https://reviews.llvm.org/D156344
2023-09-28 14:06:54 +05:30
Qiu Chaofan
cc627828f5 Pre-commit some PowerPC test cases 2023-09-28 15:51:14 +08:00
Wael Yehia
da55b1b52f
[XCOFF] Do not generate the special .ref for zero-length sections (#66805)
Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
2023-09-28 01:33:41 -04:00
esmeyi
d7195c57d8 Reland https://reviews.llvm.org/D159073.
The patch failed in test-suite due to a liveness error after rebasing on https://reviews.llvm.org/D133103, and now it's fixed.

```
[PowerPC][Peephole] Combine rldicl/rldicr and andi/andis after isel.

Summary: rldicl/rldicr can be eliminated if it's used to clear thehigh-order or low-order n bits and all bits cleared will be ANDed with 0 byandi/andis. Or they can be folded to `andi 0` if all bits to AND are alreadyzero in the input.

Reviewed By: qiucf, shchenz

Differential Revision: https://reviews.llvm.org/D159073
```
2023-09-26 06:24:47 -04:00
Kai Luo
5fabc8ba22 [PowerPC] Add test to show wrong target flags printed at MO_TLSGDM_FLAG operand. NFC. 2023-09-26 05:13:26 +00:00
esmeyi
77147a95b8 Revert "[PowerPC][Peephole] Combine rldicl/rldicr and andi/andis after isel."
This reverts commit 2de74e1bd4d540063d7495fa6254781abd41e179.

A test-suite failure occurs due to this commit, will fix soon.
2023-09-25 23:31:34 -04:00
esmeyi
2de74e1bd4 [PowerPC][Peephole] Combine rldicl/rldicr and andi/andis after isel.
Summary: rldicl/rldicr can be eliminated if it's used to clear the high-order or low-order n bits and all bits cleared will be ANDed with 0 by andi/andis. Or they can be folded to `andi 0` if all bits to AND are already zero in the input.

Reviewed By: qiucf, shchenz

Differential Revision: https://reviews.llvm.org/D159073
2023-09-25 23:11:34 -04:00
Matthias Braun
740ee00a4c
PPCBranchCoalescing: Fix invalid branch weights (#67211)
Re-normalize branch-weights after removing a block successor to avoid
branch-weights not adding up to 100%. This changes MIR for the
`test/CodeGen/PowerPC/branch_coalesce.ll` test like this:

```diff
-  successors: %bb.6(0x40000000); %bb.6(50.00%)
+  successors: %bb.6(0x80000000); %bb.6(100.00%)
```

This doesn't affect codegen on its own but fixing this helps with
fluctuations I have with some of my upcoming changes.
2023-09-25 10:41:04 -07:00
Nemanja Ivanovic
46d5d264fc [PowerPC] Improve kill flag computation and add verification after MI peephole
The MI Peephole pass has grown to include a large number of transformations over the years. Many of the transformations require re-computation of kill flags but don't do a good job of re-computing them. This causes us to have very common failures when the compiler is built with expensive checks. Over time, we added and augmented a function that is supposed to go and fix up kill flags after each transformation but we keep missing cases.
This patch does the following:
- Removes the function to re-compute kill flags
- Adds LiveVariables to compute and maintain kill flags while transforming code
- Adds re-computation of kill flags for the post-RA peepholes for each block that contains a transformed instruction

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D133103
2023-09-22 15:26:39 -04:00
Jay Foad
e0919b189b [CodeGen] Renumber slot indexes before register allocation (#66334)
RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate
the length of a live range for its heuristics. Renumbering all slot
indexes with the default instruction distance ensures that this estimate
will be as accurate as possible, and will not depend on the history of
how instructions have been added to and removed from SlotIndexes's maps.

This also means that enabling -early-live-intervals, which runs the
SlotIndexes analysis earlier, will not cause large amounts of churn due
to different register allocator decisions.
2023-09-19 11:18:12 +01:00
Craig Topper
f71a9e8bb7
[SelectionDAG][RISCV][PowerPC][X86] Use TargetConstant for immediates for ISD::PREFETCH. (#66601)
The intrinsic uses ImmArg so TargetConstant would be consistent
with how other intrinsics are handled.

This hides the constants from type legalization so we can remove
the promotion support.

isel patterns are updated accordingly.
2023-09-18 08:58:50 -07:00
Guozhi Wei
cbdccb30c2 [RA] Split a virtual register in cold blocks if it is not assigned preferred physical register
If a virtual register is not assigned preferred physical register, it means some
COPY instructions will be changed to real register move instructions. In this
case we can try to split the virtual register in colder blocks, if success, the
original COPY instructions can be deleted, and the new COPY instructions in
colder blocks will be generated as register move instructions. It results in
fewer dynamic register move instructions executed.

The new test case split-reg-with-hint.ll gives an example, the hot path contains
24 instructions without this patch, now it is only 4 instructions with this
patch.

Differential Revision: https://reviews.llvm.org/D156491
2023-09-15 19:52:50 +00:00
Maryam Moghadas
7b021f2e64 [PowerPC] Optimize VPERM and fix code order for swapping vector operands on LE
This patch reverts commit 7614ba0a5db8 to optimize VPERM when one of its
vector operands is XXSWAPD, similar to XXPERM. It also reorganizes the
little-endian swap code on LE, swapping the vector operand after
adjusting the mask operand. This ensures that the vector operand is
swapped at the correct point in the code, resulting in a valid
constant pool for the mask operand.

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D149083
2023-09-13 15:00:49 -05:00
Simon Pilgrim
e6b85c3027 [DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case (REAPPLIED)
Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction

As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future.

Reapplied after reversion at e1e3c75c7dad72 with a tweak to the pseudo-probe-peep.ll test

Differential Revision: https://reviews.llvm.org/D158068
2023-09-13 12:33:39 +01:00
Simon Pilgrim
e1e3c75c7d Revert rG6c56cf71ee82ec3a28e0dfc2b751bd10c16929da "[DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case"
Need to address a missed test change
2023-09-13 11:27:47 +01:00
Simon Pilgrim
6c56cf71ee [DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case
Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction

As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future.

Differential Revision: https://reviews.llvm.org/D158068
2023-09-13 11:01:58 +01:00
Qiu Chaofan
69b056d563 [PowerPC] Implement SchedModel for Power7
Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D158704
2023-09-13 14:55:07 +08:00
Qiu Chaofan
d4d0b5eaab Fix MIR failure after b922a362 2023-09-08 16:33:45 +08:00
Qiu Chaofan
b922a36211 [PowerPC] Define SchedModel for Power8
PowerPC subtargets prior to Power9 use the 'legacy' itinerary way to
provide scheduling information. This patch re-writes the tablegen file
to define the scheduling information in the new SchedModel way, which
can bring improvements to some benchmarks.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D154488
2023-09-08 15:43:21 +08:00
bzEq
d9efcb54c9
[PEI][PowerPC] Fix false alarm of stack size limit (#65559)
PPC64 allows stack size up to ((2^63)-1) bytes. Currently llc reports
```
warning: stack frame size (4294967568) exceeds limit (4294967295) in function 'main'
```
if the stack allocated is larger than 4G.
2023-09-08 15:16:00 +08:00
Amy Kwan
3f46e5453d [AIX][TLS] Produce a faster local-exec access sequence with -maix-small-local-exec-tls (And optimize when load/store offsets are 0)
This patch utilizes the -maix-small-local-exec-tls option added in
D155544 to produce a faster access sequence for the local-exec TLS
model, where loading from the TOC can be avoided.

The patch either produces an addi/la with a displacement off of r13
(the thread pointer) when the address is calculated, or it produces an
addi/la followed by a load/store when the address is calculated and
used for further accesses.

This patch also optimizes this sequence a bit more where we can remove
the addi/la when the load/store offset is 0. A follow up patch will
be posted to account for when the load/store offset is non-zero, and
currently in these situations we keep the addi/la that precedes the
load/store.

Furthermore, this access sequence is only performed for TLS variables
that are less than ~32KB in size.

Differential Revision: https://reviews.llvm.org/D155600
2023-09-07 20:05:29 -05:00
Amy Kwan
8bdbee8aaa [AIX][TLS] Add target attribute for -maix-small-local-exec-tls option.
This patch adds a target attribute for an AIX-specific option that
informs the compiler that it can use a faster access sequence for the
local-exec TLS model (formally named aix-small-local-exec-tls).

The Clang portion of this option is in D155544.
The initial implementation to generate the faster access sequence is in
D155600.

Differential Revision: https://reviews.llvm.org/D156203
2023-09-07 20:05:29 -05:00
stefanp-ibm
0a4a8bec34
[PowerPC] Turn string pooling on by default. (#65628)
This patch turns the string pooling pass on by default. Some tests are
updated as required.
2023-09-07 16:49:31 -04:00
Wael Yehia
11d5c7bd28 [AIX] Add threadId and use nanosecond timestamp in sinit/sterm symbols
With ThinLTO, when compiling SPEC 2017 omnetpp_r with -threads=4, two
small modules can end up with the same timestamp in their sinit symbols
when calculating time in seconds, creating duplicate definitions.

This patch uses a timestamp in nanoseconds.
Because the race can be between threads, embed the thread ID as well.

Reviewed By: xingxue, daltenty

Differential Revision: https://reviews.llvm.org/D159319
2023-09-07 17:46:41 +00:00
Amy Kwan
f94f85348d Revert "[AIX][TLS] Generate .extern and .ref references to __tls_get_addr for local-exec accesses."
This reverts commit f0b2f6954101c9052763a99a1e7ac135770e779a.
The implementation is incorrect and breaks compiling local-exec programs.
2023-09-07 12:10:37 -05:00
esmeyi
b85a9b3093 [PowerPC] Try to use less instructions to materialize 64-bit constant when High32=Low32.
Summary: Materialization a 64-bit constant with High32=Low32 only requires 2 instructions instead of 3 when Low32 can be materialized in 1 instruction.

Reviewed By: qiucf

Differential Revision: https://reviews.llvm.org/D158495
2023-09-07 13:03:17 -04:00
Stefan Pintilie
84e2fd7ee4 [PowerPC] Add a pass to merge all of the constant global arrays into one pool.
On PowerPC the number of TOC entries must be kept low for large
applications. In order to reduce the number of constant global arrays
we can pool them into one structure and then access them as the base
address of that structure plus some offset. The constant global arrays
may be arrays of `i8` which are constant strings but they may also be
arrays of `i32, i64, etc...`.

Reviewed By: lei, amyk

Differential Revision: https://reviews.llvm.org/D155730
2023-09-07 11:14:56 -04:00
Stefan Pintilie
492c1f3d7c [PowerPC] Merge rotate and clear into single instruction.
This patch tries to catch a codegen opportunity where the rotate and
mask can be merged into a single RLDCL instruction.

Reviewed By: lei, amyk

Differential Revision: https://reviews.llvm.org/D158328
2023-09-07 09:25:41 -04:00
Ting Wang
71be020dda [SelectionDAG][PowerPC] Memset reuse vector element for tail store
On PPC there are instructions to store element from vector(e.g.
stxsdx/stxsiwx), and these instructions can be leveraged to avoid tail
constant in memset and constant splat array initialization.

This patch tries to explore these opportunities.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D138883
2023-09-06 01:52:38 -04:00
Amy Kwan
f0b2f69541 [AIX][TLS] Generate .extern and .ref references to __tls_get_addr for local-exec accesses.
Compiling with TLS variables requires -pthread, but if the user omits this
option, the compiler will not show any obvious indication during compilation
that -pthread is needed for programs using TLS variables. Instead, the user will
experience a segmentation fault when running programs with TLS variables in them
and without specifying -pthread.

This patch aims to generate .extern/.ref references to __tls_get_addr[DS] for
local-exec accesses, in order to trigger an error from the linker to indicate
that there is an undefined symbol to __tls_get_addr. Doing so will remind the
user to compile/link with -pthread.

Differential Revision: https://reviews.llvm.org/D151335
2023-09-05 12:15:14 -05:00
Qiu Chaofan
082c5d7f63 [PowerPC] Implement builtin for mffsl
mffsl is available since ISA 3.0. The builtin is named with ppc prefix
to follow our convention. For targets earlier than power9, GCC generates
extra code to support the functionality, while this patch does not
implement such behavior.

Reviewed By: nemanjai, tuliom

Differential Revision: https://reviews.llvm.org/D158065
2023-09-05 11:22:09 +08:00
Matt Arsenault
b14e83d1a4 IR: Add llvm.exp10 intrinsic
We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10
to fix this asymmetry. AMDGPU already has most of the code for f32
exp10 expansion implemented alongside exp, so the current
implementation is duplicating nearly identical effort between the
compiler and library which is inconvenient.

https://reviews.llvm.org/D157871
2023-09-01 19:45:03 -04:00
Chen Zheng
a69cb20768 [NFC] Fix the PowerPC broken cases in D152215.
Reviewed By: qiucf

Differential Revision: https://reviews.llvm.org/D159052
2023-09-01 02:07:48 -04:00
Stephen Peckham
282da83756 [XCOFF][AIX] Issue an error when specifying an alias for a common symbol
Summary:

There is no support in XCOFF for labels on common symbols. Therefore, an alias for a common symbol is not supported. Issue an error in the front end when an aliasee is a common symbol. Issue a similar error in the back end in case an IR specifies an alias for a common symbol.

Reviewed by: hubert.reinterpretcast, DiggerLin

Differential Revision:  https://reviews.llvm.org/D158739
2023-08-31 11:43:47 -04:00
Qiu Chaofan
21bea1a208 [PowerPC] Support initial-exec TLS relocation on AIX
Add TLS_IE relocation type to XCOFF writer, and emit code sequence for
initial-exec TLS variables.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D156292
2023-08-30 16:22:16 +08:00
Chen Zheng
732f63d96d [PowerPC]set default min-jump-table-entries to 64 on PPC
Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D159050
2023-08-29 21:42:22 -04:00
Chen Zheng
833b1e307f [NFC] add testcase for MinimumJumpTableEntries change on PowerPC. 2023-08-29 21:13:50 -04:00
Serguei Katkov
a701b7e368 [CGP] Remove dead PHI nodes before elimination of mostly empty blocks
Before elimination of mostly empty block it makes sense to remove dead PHI nodes.
It open more opportunity for elimination plus eliminates dead code itself.

It appeared that change results in failing many unit tests and some of
them I've updated and for another one I disable this optimization.
The pattern I observed in the tests is that there is a infinite loop
without side effects. As a result after elimination of dead phi node all other
related instruction are also removed and tests stops to check what it is expected.

Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D158503
2023-08-29 04:35:06 +00:00
esmeyi
8514d207ba [AIX] Handle ReadOnlyWithRel kind on AIX.
Summary: This patch handles the SectionKind of ReadOnlyWithRel on AIX. The failure was discovered during sanitizer enablement and occured with `-fsanitize-coverage` option.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D157483
2023-08-28 00:21:09 -04:00
Craig Topper
2ad50f354a [DAGCombiner][RISCV][AArch64][PowerPC] Restrict foldAndOrOfSETCC from using SMIN/SMAX where and OR/AND would do.
This removes some diffs created by D153502.

I'm assuming an AND/OR won't be worse than an SMIN/SMAX. For
RISC-V at least, AND/OR can be a shorter encoding than SMIN/SMAX.

It's weird that we have two different functions responsible for
folding logic of setccs, but I'm not ready to try to untangle that.

I'm unclear if the PowerPC chang is a regression or not. It looks
like it might use more registers, but I don't understand PowerPC
register so I'm not sure.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158292
2023-08-23 20:26:23 -07:00
Kai Luo
1ceaec3e81 [PowerPC][altivec] Optimize codegen of vec_promote
According to https://www.ibm.com/docs/en/xl-c-and-cpp-linux/16.1.1?topic=functions-vec-promote, elements not specified by the input index argument are undefined. So that we don't need to set these elements to be zeros.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D158487
2023-08-24 02:10:13 +00:00
esmeyi
96b5ea6e00 [NFC][PowerPC] Add cases for 64-bit constants. 2023-08-23 04:10:16 -04:00
Stefan Pintilie
d0e1e7649b [NFC][PowerPC] Add a test case for rotate and clear.
Added a test case for situations where a rotate is followed by a clear.
NFC because only a test case is added.
2023-08-21 11:01:47 -04:00