3729 Commits

Author SHA1 Message Date
stephenpeckham
1d1fede493
[XCOFF] Ensure .file is emitted before any .info pseudo-ops (#71577)
When generating the assembly code for AIX/XCOFF, the .file pseudo-op
needs to be emitted first, before any csects are generated. Otherwise,
information such as the embedded command line will be associated with
part of the object file rather than the entire object file.
2023-11-09 16:03:45 -06:00
Juergen Ributzka
6d1d7be133
Obsolete WebKit Calling Convention (#71567)
The WebKit Calling Convention was created specifically for the WebKit
FTL. FTL doesn't use LLVM anymore and therefore this calling convention is
obsolete.

This commit removes the WebKit CC, its associated tests, and
documentation.
2023-11-09 09:08:41 -08:00
Qiu Chaofan
5f295552f1
[PowerPC] Fix incorrect symbol name of frexp libcall (#71626)
frexpl is for ppc_fp128. The correct symbol name for f128 is frexpf128.
2023-11-08 14:41:19 +08:00
Qiu Chaofan
d199fd76f7 [NFC] Add f128 frexp intrinsics for PowerPC 2023-11-08 11:27:40 +08:00
Nikita Popov
e4a4122eb6
[IR] Remove zext and sext constant expressions (#71040)
Remove support for zext and sext constant expressions. All places
creating them have been removed beforehand, so this just removes the
APIs and uses of these constant expressions in tests.

There is some additional cleanup that can be done on top of this, e.g.
we can remove the ZExtInst vs ZExtOperator footgun.

This is part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
2023-11-03 10:46:07 +01:00
Nikita Popov
060de415af Reapply [InstCombine] Simplify and/or of icmp eq with op replacement (#70335)
Relative to the first attempt, this contains two changes:

First, we only handle the case where one side simplifies to true or
false, instead of calling simplification recursively. The previous
approach would return poison if one operand simplified to poison
(under the equality assumption), which is incorrect.

Second, we do not fold llvm.is.constant in simplifyWithOpReplaced().
We may be assuming that a value is constant, if the equality holds,
but it may not actually be constant. This is nominally just a QoI
issue, but the std::list implementation in libstdc++ relies on the
precise behavior in a way that causes miscompiles.

-----

and/or in logical (select) form benefit from generic simplifications via
simplifyWithOpReplaced(). However, the corresponding fold for plain
and/or currently does not exist.

Similar to selects, there are two general cases for this fold
(illustrated with `and`, but there are `or` conjugates).

The basic case is something like `(a == b) & c`, where the replacement
of a with b or b with a inside c allows it to fold to true or false.
Then the whole operation will fold to either false or `a == b`.

The second case is something like `(a != b) & c`, where the replacement
inside c allows it to fold to false. In that case, the operand can be
replaced with c, because in the case where a == b (and thus the icmp is
false), c itself will already be false.
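
As a rough illustration of the two cases above (not the InstCombine implementation), the folds can be checked by brute force over small unsigned integers. The concrete choices of `c` (`a > b` and `a <= b`) are assumptions made only for this sketch; they were picked because replacing a with b makes them fold to false and true respectively.

```python
from itertools import product


def check_and_folds(bits=3):
    """Exhaustively check the `and` folds over small unsigned integers."""
    for a, b in product(range(1 << bits), repeat=2):
        c_false = a > b   # with a replaced by b: b > b, always false
        c_true = a <= b   # with a replaced by b: b <= b, always true

        # Basic case, c folds to false: (a == b) & c is false.
        if (a == b) & c_false:
            return False
        # Basic case, c folds to true: (a == b) & c is just a == b.
        if ((a == b) & c_true) != (a == b):
            return False
        # Second case, c folds to false: (a != b) & c is just c.
        if ((a != b) & c_false) != c_false:
            return False
    return True
```

Calling `check_and_folds()` returns True when all three identities hold for every pair of inputs at the chosen bit width.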

As the test diffs show, this catches quite a lot of patterns in existing
test coverage. This also obsoletes quite a few existing special-case
and/or of icmp folds we have (e.g. simplifyAndOrOfICmpsWithLimitConst),
but I haven't removed anything as part of this patch in the interest of
risk mitigation.

Fixes #69050.
Fixes #69091.
2023-11-03 10:16:15 +01:00
Kai Luo
7b5505b0d5
[PowerPC] Change registers used in test due to ABI breakage. NFC. (#70758)
Usage of `r30` and `r31` breaks the current traceback table convention
on AIX. Avoid using CSRs in the livein list.
2023-11-03 07:08:33 +08:00
Qiu Chaofan
b46e768455
[DAGCombine] Fold setcc_eq infinity into is.fpclass (#67829) 2023-11-01 11:51:15 +09:00
Nikita Popov
e46dd6fbc0 Revert "[InstCombine] Simplify and/or of icmp eq with op replacement (#70335)"
This reverts commit 1770a2e325192f1665018e21200596da1904a330.

Stage 2 llvm-tblgen crashes when generating X86GenAsmWriter.inc and
other files.
2023-10-30 18:33:03 +01:00
Nikita Popov
1770a2e325
[InstCombine] Simplify and/or of icmp eq with op replacement (#70335)
and/or in logical (select) form benefit from generic simplifications via
simplifyWithOpReplaced(). However, the corresponding fold for plain
and/or currently does not exist.

Similar to selects, there are two general cases for this fold
(illustrated with `and`, but there are `or` conjugates).

The basic case is something like `(a == b) & c`, where the replacement
of a with b or b with a inside c allows it to fold to true or false.
Then the whole operation will fold to either false or `a == b`.

The second case is something like `(a != b) & c`, where the replacement
inside c allows it to fold to false. In that case, the operand can be
replaced with c, because in the case where a == b (and thus the icmp is
false), c itself will already be false.

As the test diffs show, this catches quite a lot of patterns in existing
test coverage. This also obsoletes quite a few existing special-case
and/or of icmp folds we have (e.g. simplifyAndOrOfICmpsWithLimitConst),
but I haven't removed anything as part of this patch in the interest of
risk mitigation.

Fixes #69050.
Fixes #69091.
2023-10-30 10:05:39 +01:00
Simon Pilgrim
c9c9bf0f20 [DAG] WidenVectorOperand - add basic handling for *_EXTEND_VECTOR_INREG nodes
Fixes Issue #70208
2023-10-25 16:52:15 +01:00
Matthias Braun
e3cf80c5c1
BlockFrequencyInfoImpl: Avoid big numbers, increase precision for small spreads
BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that we:

* Avoid big numbers close to UINT64_MAX, so that users do not overflow/saturate when adding multiple frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room.
* Spread the difference between hottest/coldest block as much as possible to increase precision.
* If the hot/cold spread cannot be represented, lose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.
2023-10-24 20:27:39 -07:00
Ramkumar Ramachandra
98c90a13c6
ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering (#66924)
The issue #55208 noticed that std::rint is vectorized by the
SLPVectorizer, but a very similar function, std::lrint, is not.
std::lrint corresponds to ISD::LRINT in the SelectionDAG, and
std::llrint is a familiar cousin corresponding to ISD::LLRINT. Now,
neither ISD::LRINT nor ISD::LLRINT have a corresponding vector variant,
and the LangRef makes this clear in the documentation of llvm.lrint.*
and llvm.llrint.*.

This patch extends the LangRef to include vector variants of
llvm.lrint.* and llvm.llrint.*, and lays the necessary ground-work of
scalarizing it for all targets. However, this patch would be devoid of
motivation unless we show the utility of these new vector variants.
Hence, the RISCV target has been chosen to implement a custom lowering
to the vfcvt.x.f.v instruction. The patch also includes a CostModel for
RISCV, and a trivial follow-up can potentially enable the SLPVectorizer
to vectorize std::lrint and std::llrint, fixing #55208.

The patch includes tests, obviously for the RISCV target, but also for
the X86, AArch64, and PowerPC targets to justify the addition of the
vector variants to the LangRef.
2023-10-19 13:05:04 +01:00
Kai Luo
b42738805a [PowerPC] Auto gen test checks for #69299. NFC. 2023-10-18 02:21:22 +00:00
Kai Luo
3104681686
[PowerPC][Atomics] Remove redundant block to clear reservation (#68430)
This PR follows what https://reviews.llvm.org/D134783 does for
quadword CAS.
2023-10-13 10:59:27 +08:00
Nikita Popov
127ed9ae26
[PowerPC] Use zext instead of anyext in custom and combine (#68784)
This custom combine currently converts `and(anyext(x),c)` into
`anyext(and(x,c))`. This is not correct, because the original expression
guaranteed that the high bits are zero, while the new one sets them to
undef.

Emit `zext(and(x,c))` instead.
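
A small model of the difference, assuming an 8-bit value extended to 16 bits and a mask `c` whose high bits are zero (the widths and helper names are hypothetical, chosen only for this sketch). The unspecified high bits produced by anyext are modelled as an explicit `high_bits` argument.

```python
def anyext8to16(x, high_bits):
    """Model anyext: the low 8 bits are x, the high 8 bits are unspecified."""
    return ((high_bits & 0xFF) << 8) | (x & 0xFF)


def zext8to16(x):
    """Model zext: the high 8 bits are zero."""
    return x & 0xFF


def compare_combines(x, c, high_bits):
    """Return (original, old combine, new combine) for a mask c below 0x100."""
    original = anyext8to16(x, high_bits) & c      # and(anyext(x), c)
    old_combine = anyext8to16(x & c, high_bits)   # anyext(and(x, c))
    new_combine = zext8to16(x & c)                # zext(and(x, c))
    return original, old_combine, new_combine
```

For any nonzero `high_bits`, the old combine differs from the original expression, while the zext form always matches it.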

Fixes https://github.com/llvm/llvm-project/issues/68783.
2023-10-12 09:32:17 +02:00
Nikita Popov
0ead1faef0 [PowerPC] Add test for #68783 (NFC) 2023-10-11 12:15:26 +02:00
Jay Foad
7b3bbd83c0 Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"
This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.

Reverted due to various buildbot failures.
2023-10-09 12:31:32 +01:00
Jay Foad
2501ae58e3
[CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
2023-10-09 11:44:41 +01:00
Lei
529ad40e05
[PowerPC] Fix missing kill flag update for XVCVDPSP transformations (#67997)
Add transformed register to kill flag work list for XVCVDPSP transformations.

Ref: reviews.llvm.org/D133103
2023-10-06 10:24:54 -04:00
Kishan Parmar
696ea67f19 Disable call to fma for soft-float
The PowerPC backend generates calls to libc functions
for soft-float, regardless of the -nostdlib/-ffreestanding flags.
fma is not a function provided by compiler-rt builtins and
thus should not be generated here.
PR: https://github.com/llvm/llvm-project/issues/55230 (#55230)

Below is patch given by @nemanjai

Reviewed By: jhibbits

Differential Revision: https://reviews.llvm.org/D156344
2023-09-28 14:06:54 +05:30
Qiu Chaofan
cc627828f5 Pre-commit some PowerPC test cases 2023-09-28 15:51:14 +08:00
Wael Yehia
da55b1b52f
[XCOFF] Do not generate the special .ref for zero-length sections (#66805)
Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
2023-09-28 01:33:41 -04:00
esmeyi
d7195c57d8 Reland https://reviews.llvm.org/D159073.
The patch failed in test-suite due to a liveness error after rebasing on https://reviews.llvm.org/D133103, and now it's fixed.

```
[PowerPC][Peephole] Combine rldicl/rldicr and andi/andis after isel.

Summary: rldicl/rldicr can be eliminated if it's used to clear the high-order or low-order n bits and all bits cleared will be ANDed with 0 by andi/andis. Or they can be folded to `andi 0` if all bits to AND are already zero in the input.

Reviewed By: qiucf, shchenz

Differential Revision: https://reviews.llvm.org/D159073
```
2023-09-26 06:24:47 -04:00
Kai Luo
5fabc8ba22 [PowerPC] Add test to show wrong target flags printed at MO_TLSGDM_FLAG operand. NFC. 2023-09-26 05:13:26 +00:00
esmeyi
77147a95b8 Revert "[PowerPC][Peephole] Combine rldicl/rldicr and andi/andis after isel."
This reverts commit 2de74e1bd4d540063d7495fa6254781abd41e179.

A test-suite failure occurs due to this commit, will fix soon.
2023-09-25 23:31:34 -04:00
esmeyi
2de74e1bd4 [PowerPC][Peephole] Combine rldicl/rldicr and andi/andis after isel.
Summary: rldicl/rldicr can be eliminated if it's used to clear the high-order or low-order n bits and all bits cleared will be ANDed with 0 by andi/andis. Or they can be folded to `andi 0` if all bits to AND are already zero in the input.
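
A minimal bit-level sketch of why both rewrites are safe, assuming the rotate amount is 0 so that rldicl/rldicr only clear bits, and modelling andi as an AND with a zero-extended 16-bit immediate. The helper names are hypothetical and this is not the peephole code itself.

```python
MASK64 = (1 << 64) - 1


def clear_high_bits(value, n):
    """Clear the n high-order bits of a 64-bit value (rldicl with rotate 0)."""
    return value & (MASK64 >> n)


def clear_low_bits(value, n):
    """Clear the n low-order bits of a 64-bit value (rldicr with rotate 0)."""
    return value & (MASK64 << n) & MASK64


def andi(value, imm16):
    """AND with a 16-bit immediate; all bits above bit 15 become zero."""
    return value & (imm16 & 0xFFFF)


def rldicl_is_redundant(value, n, imm16):
    """True if clearing up to 48 high bits before andi changes nothing."""
    return andi(clear_high_bits(value, n), imm16) == andi(value, imm16)
```

When at most 48 high-order bits are cleared, andi would have zeroed them anyway, so the clear can be dropped. When at least 16 low-order bits are cleared, every bit andi keeps is already zero, so the result is 0.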

Reviewed By: qiucf, shchenz

Differential Revision: https://reviews.llvm.org/D159073
2023-09-25 23:11:34 -04:00
Matthias Braun
740ee00a4c
PPCBranchCoalescing: Fix invalid branch weights (#67211)
Re-normalize branch-weights after removing a block successor to avoid
branch-weights not adding up to 100%. This changes MIR for the
`test/CodeGen/PowerPC/branch_coalesce.ll` test like this:

```diff
-  successors: %bb.6(0x40000000); %bb.6(50.00%)
+  successors: %bb.6(0x80000000); %bb.6(100.00%)
```

This doesn't affect codegen on its own but fixing this helps with
fluctuations I have with some of my upcoming changes.
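
The renormalization can be sketched as follows. This is an illustration rather than the pass's code: it assumes probabilities are stored as numerators over 2^31 (consistent with 0x80000000 being printed as 100.00% in the diff above), and the removed successor name `bb.7` in the usage note is made up.

```python
DENOMINATOR = 1 << 31  # 0x80000000 is printed as 100.00% in the diff above


def remove_and_renormalize(successors, removed):
    """Drop one successor and rescale the rest so they sum to 100% again."""
    remaining = {bb: w for bb, w in successors.items() if bb != removed}
    total = sum(remaining.values())
    if total == 0:
        # No weight information left: spread the probability evenly.
        return {bb: DENOMINATOR // len(remaining) for bb in remaining}
    # Integer division may leave a small rounding remainder in general.
    return {bb: w * DENOMINATOR // total for bb, w in remaining.items()}
```

For example, `remove_and_renormalize({"bb.6": 0x40000000, "bb.7": 0x40000000}, "bb.7")` leaves `bb.6` with 0x80000000.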
2023-09-25 10:41:04 -07:00
Nemanja Ivanovic
46d5d264fc [PowerPC] Improve kill flag computation and add verification after MI peephole
The MI Peephole pass has grown to include a large number of transformations over the years. Many of the transformations require re-computation of kill flags but don't do a good job of re-computing them. This causes us to have very common failures when the compiler is built with expensive checks. Over time, we added and augmented a function that is supposed to go and fix up kill flags after each transformation but we keep missing cases.
This patch does the following:
- Removes the function to re-compute kill flags
- Adds LiveVariables to compute and maintain kill flags while transforming code
- Adds re-computation of kill flags for the post-RA peepholes for each block that contains a transformed instruction

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D133103
2023-09-22 15:26:39 -04:00
Jay Foad
e0919b189b [CodeGen] Renumber slot indexes before register allocation (#66334)
RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate
the length of a live range for its heuristics. Renumbering all slot
indexes with the default instruction distance ensures that this estimate
will be as accurate as possible, and will not depend on the history of
how instructions have been added to and removed from SlotIndexes's maps.

This also means that enabling -early-live-intervals, which runs the
SlotIndexes analysis earlier, will not cause large amounts of churn due
to different register allocator decisions.
2023-09-19 11:18:12 +01:00
Craig Topper
f71a9e8bb7
[SelectionDAG][RISCV][PowerPC][X86] Use TargetConstant for immediates for ISD::PREFETCH. (#66601)
The intrinsic uses ImmArg so TargetConstant would be consistent
with how other intrinsics are handled.

This hides the constants from type legalization so we can remove
the promotion support.

isel patterns are updated accordingly.
2023-09-18 08:58:50 -07:00
Guozhi Wei
cbdccb30c2 [RA] Split a virtual register in cold blocks if it is not assigned preferred physical register
If a virtual register is not assigned its preferred physical register, some
COPY instructions will be changed to real register move instructions. In this
case we can try to split the virtual register in colder blocks. If that succeeds,
the original COPY instructions can be deleted, and the new COPY instructions in
colder blocks will be generated as register move instructions. This results in
fewer dynamic register move instructions being executed.

The new test case split-reg-with-hint.ll gives an example: the hot path contains
24 instructions without this patch and only 4 instructions with it.

Differential Revision: https://reviews.llvm.org/D156491
2023-09-15 19:52:50 +00:00
Maryam Moghadas
7b021f2e64 [PowerPC] Optimize VPERM and fix code order for swapping vector operands on LE
This patch reverts commit 7614ba0a5db8 to optimize VPERM when one of its
vector operands is XXSWAPD, similar to XXPERM. It also reorganizes the
little-endian swap code on LE, swapping the vector operand after
adjusting the mask operand. This ensures that the vector operand is
swapped at the correct point in the code, resulting in a valid
constant pool for the mask operand.

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D149083
2023-09-13 15:00:49 -05:00
Simon Pilgrim
e6b85c3027 [DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case (REAPPLIED)
Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction

As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future.

Reapplied after reversion at e1e3c75c7dad72 with a tweak to the pseudo-probe-peep.ll test

Differential Revision: https://reviews.llvm.org/D158068
2023-09-13 12:33:39 +01:00
Simon Pilgrim
e1e3c75c7d Revert rG6c56cf71ee82ec3a28e0dfc2b751bd10c16929da "[DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case"
Need to address a missed test change
2023-09-13 11:27:47 +01:00
Simon Pilgrim
6c56cf71ee [DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case
Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction

As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future.

Differential Revision: https://reviews.llvm.org/D158068
2023-09-13 11:01:58 +01:00
Qiu Chaofan
69b056d563 [PowerPC] Implement SchedModel for Power7
Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D158704
2023-09-13 14:55:07 +08:00
Qiu Chaofan
d4d0b5eaab Fix MIR failure after b922a362 2023-09-08 16:33:45 +08:00
Qiu Chaofan
b922a36211 [PowerPC] Define SchedModel for Power8
PowerPC subtargets prior to Power9 use the 'legacy' itinerary way to
provide scheduling information. This patch re-writes the tablegen file
to define the scheduling information in the new SchedModel way, which
can bring improvements to some benchmarks.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D154488
2023-09-08 15:43:21 +08:00
bzEq
d9efcb54c9
[PEI][PowerPC] Fix false alarm of stack size limit (#65559)
PPC64 allows stack size up to ((2^63)-1) bytes. Currently llc reports
```
warning: stack frame size (4294967568) exceeds limit (4294967295) in function 'main'
```
if the stack allocated is larger than 4G.
2023-09-08 15:16:00 +08:00
Amy Kwan
3f46e5453d [AIX][TLS] Produce a faster local-exec access sequence with -maix-small-local-exec-tls (And optimize when load/store offsets are 0)
This patch utilizes the -maix-small-local-exec-tls option added in
D155544 to produce a faster access sequence for the local-exec TLS
model, where loading from the TOC can be avoided.

The patch either produces an addi/la with a displacement off of r13
(the thread pointer) when the address is calculated, or it produces an
addi/la followed by a load/store when the address is calculated and
used for further accesses.

This patch also optimizes this sequence a bit more where we can remove
the addi/la when the load/store offset is 0. A follow up patch will
be posted to account for when the load/store offset is non-zero, and
currently in these situations we keep the addi/la that precedes the
load/store.

Furthermore, this access sequence is only performed for TLS variables
that are less than ~32KB in size.

Differential Revision: https://reviews.llvm.org/D155600
2023-09-07 20:05:29 -05:00
Amy Kwan
8bdbee8aaa [AIX][TLS] Add target attribute for -maix-small-local-exec-tls option.
This patch adds a target attribute for an AIX-specific option that
informs the compiler that it can use a faster access sequence for the
local-exec TLS model (formally named aix-small-local-exec-tls).

The Clang portion of this option is in D155544.
The initial implementation to generate the faster access sequence is in
D155600.

Differential Revision: https://reviews.llvm.org/D156203
2023-09-07 20:05:29 -05:00
stefanp-ibm
0a4a8bec34
[PowerPC] Turn string pooling on by default. (#65628)
This patch turns the string pooling pass on by default. Some tests are
updated as required.
2023-09-07 16:49:31 -04:00
Wael Yehia
11d5c7bd28 [AIX] Add threadId and use nanosecond timestamp in sinit/sterm symbols
With ThinLTO, when compiling SPEC 2017 omnetpp_r with -threads=4, two
small modules can end up with the same timestamp in their sinit symbols
when calculating time in seconds, creating duplicate definitions.

This patch uses a timestamp in nanoseconds.
Because the race can be between threads, embed the thread ID as well.
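
A sketch of the idea, with a hypothetical suffix format (the real sinit/sterm symbol layout is not shown in this message): two modules finishing within the same second get distinct names once nanoseconds are used, and the thread ID separates threads that happen to read the same timestamp.

```python
import threading
import time


def unique_suffix(timestamp_ns=None, thread_id=None):
    """Build a symbol suffix from a nanosecond timestamp and a thread ID."""
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    if thread_id is None:
        thread_id = threading.get_ident()
    return f"{timestamp_ns:x}_{thread_id:x}"


def same_second(ts_a_ns, ts_b_ns):
    """True if two nanosecond timestamps collide when truncated to seconds."""
    return ts_a_ns // 10**9 == ts_b_ns // 10**9
```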

Reviewed By: xingxue, daltenty

Differential Revision: https://reviews.llvm.org/D159319
2023-09-07 17:46:41 +00:00
Amy Kwan
f94f85348d Revert "[AIX][TLS] Generate .extern and .ref references to __tls_get_addr for local-exec accesses."
This reverts commit f0b2f6954101c9052763a99a1e7ac135770e779a.
The implementation is incorrect and breaks compiling local-exec programs.
2023-09-07 12:10:37 -05:00
esmeyi
b85a9b3093 [PowerPC] Try to use less instructions to materialize 64-bit constant when High32=Low32.
Summary: Materializing a 64-bit constant with High32=Low32 only requires 2 instructions instead of 3 when Low32 can be materialized in 1 instruction.
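
The High32=Low32 condition can be sketched as below. The helper names are hypothetical, and the sketch only shows the check and how the full constant is rebuilt from its low half; it does not model instruction selection.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def split_halves(value):
    """Return (High32, Low32) of a 64-bit constant."""
    value &= MASK64
    return value >> 32, value & MASK32


def has_equal_halves(value):
    """True if the constant satisfies High32 == Low32."""
    high, low = split_halves(value)
    return high == low


def rebuild_from_low(low):
    """Rebuild the 64-bit constant by copying Low32 into the high half."""
    low &= MASK32
    return (low << 32) | low
```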

Reviewed By: qiucf

Differential Revision: https://reviews.llvm.org/D158495
2023-09-07 13:03:17 -04:00
Stefan Pintilie
84e2fd7ee4 [PowerPC] Add a pass to merge all of the constant global arrays into one pool.
On PowerPC the number of TOC entries must be kept low for large
applications. In order to reduce the number of constant global arrays
we can pool them into one structure and then access them as the base
address of that structure plus some offset. The constant global arrays
may be arrays of `i8` which are constant strings but they may also be
arrays of `i32, i64, etc...`.
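
As a rough model of the pooling described above (ignoring alignment and element types, which a real pass has to handle), the arrays can be concatenated into one pool and each access rewritten as base plus offset. All names here are hypothetical.

```python
def pool_constant_arrays(arrays):
    """Concatenate byte arrays into one pool; return the pool and offsets."""
    pool = bytearray()
    offsets = {}
    for name, data in arrays.items():
        offsets[name] = len(pool)  # offset of this array from the pool base
        pool.extend(data)
    return bytes(pool), offsets


def load(pool, offsets, name, length):
    """Read an array back as pool base plus its recorded offset."""
    start = offsets[name]
    return pool[start:start + length]
```

With this layout only the pool itself needs an entry, instead of one entry per array.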

Reviewed By: lei, amyk

Differential Revision: https://reviews.llvm.org/D155730
2023-09-07 11:14:56 -04:00
Stefan Pintilie
492c1f3d7c [PowerPC] Merge rotate and clear into single instruction.
This patch tries to catch a codegen opportunity where the rotate and
mask can be merged into a single RLDCL instruction.

Reviewed By: lei, amyk

Differential Revision: https://reviews.llvm.org/D158328
2023-09-07 09:25:41 -04:00
Ting Wang
71be020dda [SelectionDAG][PowerPC] Memset reuse vector element for tail store
On PPC there are instructions that store an element from a vector (e.g.
stxsdx/stxsiwx), and these instructions can be leveraged to avoid the tail
constant in memset and constant splat array initialization.

This patch tries to explore these opportunities.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D138883
2023-09-06 01:52:38 -04:00
Amy Kwan
f0b2f69541 [AIX][TLS] Generate .extern and .ref references to __tls_get_addr for local-exec accesses.
Compiling with TLS variables requires -pthread, but if the user omits this
option, the compiler will not show any obvious indication during compilation
that -pthread is needed for programs using TLS variables. Instead, the user will
experience a segmentation fault when running programs with TLS variables in them
and without specifying -pthread.

This patch aims to generate .extern/.ref references to __tls_get_addr[DS] for
local-exec accesses, in order to trigger an error from the linker to indicate
that there is an undefined symbol to __tls_get_addr. Doing so will remind the
user to compile/link with -pthread.

Differential Revision: https://reviews.llvm.org/D151335
2023-09-05 12:15:14 -05:00