3681 Commits

Author SHA1 Message Date
Ting Wang
71be020dda [SelectionDAG][PowerPC] Memset reuse vector element for tail store
On PPC there are instructions that store an element from a vector (e.g.
stxsdx/stxsiwx), and these instructions can be leveraged to avoid the tail
constant in memset and constant splat array initialization.

This patch tries to explore these opportunities.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D138883
2023-09-06 01:52:38 -04:00
Amy Kwan
f0b2f69541 [AIX][TLS] Generate .extern and .ref references to __tls_get_addr for local-exec accesses.
Compiling with TLS variables requires -pthread, but if the user omits this
option, the compiler will not show any obvious indication during compilation
that -pthread is needed for programs using TLS variables. Instead, the user will
experience a segmentation fault when running programs with TLS variables in them
and without specifying -pthread.

This patch aims to generate .extern/.ref references to __tls_get_addr[DS] for
local-exec accesses, in order to trigger an error from the linker to indicate
that there is an undefined symbol to __tls_get_addr. Doing so will remind the
user to compile/link with -pthread.

Differential Revision: https://reviews.llvm.org/D151335
2023-09-05 12:15:14 -05:00
Qiu Chaofan
082c5d7f63 [PowerPC] Implement builtin for mffsl
mffsl is available since ISA 3.0. The builtin is named with ppc prefix
to follow our convention. For targets earlier than power9, GCC generates
extra code to support the functionality, while this patch does not
implement such behavior.

Reviewed By: nemanjai, tuliom

Differential Revision: https://reviews.llvm.org/D158065
2023-09-05 11:22:09 +08:00
Matt Arsenault
b14e83d1a4 IR: Add llvm.exp10 intrinsic
We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10
to fix this asymmetry. AMDGPU already has most of the code for f32
exp10 expansion implemented alongside exp, so the current
implementation is duplicating nearly identical effort between the
compiler and library which is inconvenient.
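
As a rough illustration of what an exp10 operation computes, the sketch below expresses it through exp2 using the identity 10^x = 2^(x * log2(10)). This is plain Python, not the AMDGPU expansion or any LLVM lowering; the helper name `exp10_via_exp2` is hypothetical.

```python
import math

def exp10_via_exp2(x: float) -> float:
    """Hypothetical helper: compute 10**x as 2**(x * log2(10))."""
    return 2.0 ** (x * math.log2(10.0))

# Compare against Python's own power operator for a few inputs.
for x in (0.0, 1.0, 2.5, -3.0):
    print(x, exp10_via_exp2(x), 10.0 ** x)
```

The two columns agree only up to floating-point rounding, which is one reason a dedicated intrinsic can be preferable to composing existing ones.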

https://reviews.llvm.org/D157871
2023-09-01 19:45:03 -04:00
Chen Zheng
a69cb20768 [NFC] Fix the PowerPC broken cases in D152215.
Reviewed By: qiucf

Differential Revision: https://reviews.llvm.org/D159052
2023-09-01 02:07:48 -04:00
Stephen Peckham
282da83756 [XCOFF][AIX] Issue an error when specifying an alias for a common symbol
Summary:

There is no support in XCOFF for labels on common symbols. Therefore, an alias for a common symbol is not supported. Issue an error in the front end when an aliasee is a common symbol. Issue a similar error in the back end in case an IR specifies an alias for a common symbol.

Reviewed by: hubert.reinterpretcast, DiggerLin

Differential Revision:  https://reviews.llvm.org/D158739
2023-08-31 11:43:47 -04:00
Qiu Chaofan
21bea1a208 [PowerPC] Support initial-exec TLS relocation on AIX
Add TLS_IE relocation type to XCOFF writer, and emit code sequence for
initial-exec TLS variables.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D156292
2023-08-30 16:22:16 +08:00
Chen Zheng
732f63d96d [PowerPC]set default min-jump-table-entries to 64 on PPC
Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D159050
2023-08-29 21:42:22 -04:00
Chen Zheng
833b1e307f [NFC] add testcase for MinimumJumpTableEntries change on PowerPC.
2023-08-29 21:13:50 -04:00
Serguei Katkov
a701b7e368 [CGP] Remove dead PHI nodes before elimination of mostly empty blocks
Before eliminating mostly empty blocks, it makes sense to remove dead PHI nodes.
This opens more opportunities for elimination and also eliminates the dead code itself.

This change caused many unit tests to fail. I've updated some of
them, and for another one I disabled this optimization.
The pattern I observed in the tests is an infinite loop
without side effects. As a result, after a dead PHI node is eliminated, all other
related instructions are also removed and the tests stop checking what they are expected to check.

Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D158503
2023-08-29 04:35:06 +00:00
esmeyi
8514d207ba [AIX] Handle ReadOnlyWithRel kind on AIX.
Summary: This patch handles the SectionKind of ReadOnlyWithRel on AIX. The failure was discovered during sanitizer enablement and occurred with the `-fsanitize-coverage` option.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D157483
2023-08-28 00:21:09 -04:00
Craig Topper
2ad50f354a [DAGCombiner][RISCV][AArch64][PowerPC] Restrict foldAndOrOfSETCC from using SMIN/SMAX where an OR/AND would do.
This removes some diffs created by D153502.

I'm assuming an AND/OR won't be worse than an SMIN/SMAX. For
RISC-V at least, AND/OR can be a shorter encoding than SMIN/SMAX.
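
One case where an OR/AND can stand in for SMIN/SMAX is a pair of sign tests. The sketch below is an assumption about the kind of pattern involved, not the actual DAGCombiner code: it checks exhaustively, over a small signed range, that `smin(a, b) < 0` matches `(a | b) < 0` and `smax(a, b) < 0` matches `(a & b) < 0`. The function name is hypothetical.

```python
from itertools import product

def sign_tests_agree(bits: int = 4) -> bool:
    """Exhaustively compare SMIN/SMAX and OR/AND forms of sign tests."""
    lo, hi = -(1 << (bits - 1)), 1 << (bits - 1)
    for a, b in product(range(lo, hi), repeat=2):
        # (a < 0) || (b < 0): SMIN form versus OR form.
        if (min(a, b) < 0) != ((a | b) < 0):
            return False
        # (a < 0) && (b < 0): SMAX form versus AND form.
        if (max(a, b) < 0) != ((a & b) < 0):
            return False
    return True

print(sign_tests_agree())
```

Python integers behave like two's complement values under `|` and `&`, so the sign of the result reflects the combined sign bits.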

It's weird that we have two different functions responsible for
folding logic of setccs, but I'm not ready to try to untangle that.

I'm unclear if the PowerPC change is a regression or not. It looks
like it might use more registers, but I don't understand PowerPC
registers so I'm not sure.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158292
2023-08-23 20:26:23 -07:00
Kai Luo
1ceaec3e81 [PowerPC][altivec] Optimize codegen of vec_promote
According to https://www.ibm.com/docs/en/xl-c-and-cpp-linux/16.1.1?topic=functions-vec-promote, elements not specified by the input index argument are undefined, so we don't need to set these elements to zero.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D158487
2023-08-24 02:10:13 +00:00
esmeyi
96b5ea6e00 [NFC][PowerPC] Add cases for 64-bit constants.
2023-08-23 04:10:16 -04:00
Stefan Pintilie
d0e1e7649b [NFC][PowerPC] Add a test case for rotate and clear.
Added a test case for situations where a rotate is followed by a clear.
NFC because only a test case is added.
2023-08-21 11:01:47 -04:00
Sean Fertile
cef56b9318 Revert "[XCOFF][AIX] Peephole optimization for toc-data."
This reverts commit 5e28d30f1fb10faf2db2f8bf0502e7fd72e6ac2e.
2023-08-15 10:40:35 -04:00
Sean Fertile
ce658829c9 Revert "[PPC][AIX] Fix toc-data peephole bug and some related cleanup."
This reverts commit b37c7ed0c95c7f24758b1532f04275b4bb65d3c1.
2023-08-15 10:40:35 -04:00
Nikita Popov
9deee6bffa [SDAG] Don't transfer !range metadata without !noundef to SDAG (PR64589)
D141386 changed the semantics of !range metadata to return poison
on violation. If !range is combined with !noundef, violation is
immediate UB instead, matching the old semantics.

In theory, these IR semantics should also carry over into SDAG.
In practice, DAGCombine has at least one key transform that is
invalid in the presence of poison, namely the conversion of logical
and/or to bitwise and/or (c7b537bf09/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (L11252)).
Ideally, we would fix this transform, but this will require
substantial work to avoid codegen regressions.

In the meantime, avoid transferring !range metadata without
!noundef, effectively restoring the old !range metadata semantics
on the SDAG layer.
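
To see why converting a logical and to a bitwise and is unsound in the presence of poison, here is a toy model. It is only a sketch: `POISON`, `logical_and` and `bitwise_and` are hypothetical names, and the model ignores everything about SDAG except poison propagation.

```python
POISON = object()  # stand-in for an IR poison value

def logical_and(a, b):
    """Models `select a, b, false`: b is only observed when a is true."""
    if a is POISON:
        return POISON
    return b if a else False

def bitwise_and(a, b):
    """Models `and a, b`: poison in either operand poisons the result."""
    if a is POISON or b is POISON:
        return POISON
    return a and b

# With a == False the logical form hides the poison in b,
# while the bitwise form propagates it.
print(logical_and(False, POISON) is False)
print(bitwise_and(False, POISON) is POISON)
```

If a !range violation were immediate UB, as with !noundef, the second operand could never be poison here and the two forms would agree.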

Fixes https://github.com/llvm/llvm-project/issues/64589.

Differential Revision: https://reviews.llvm.org/D157685
2023-08-14 09:04:27 +02:00
Sean Fertile
b37c7ed0c9 [PPC][AIX] Fix toc-data peephole bug and some related cleanup.
Set the ReplaceFlags variable to false, since there is code meant only
for the ADDItocHi/ADDItocL nodes. This has the side effect of disabling
the peephole when the load/store instruction has a non-zero offset.
This patch also fixes retrieving the `ImmOpnd` node from the AIX small
code model pseudos and does the same for the register operand node.
This allows cleaning up the later calls to replaceOperands.
Finally move calculating the MaxOffset into the code guarded by
ReplaceFlags as it is only used there and the comment is specific to the ELF
ABI.

Fixes https://github.com/llvm/llvm-project/issues/63927

Differential Revision: https://reviews.llvm.org/D155957
2023-08-10 10:23:15 -04:00
Wael Yehia
9d4e8c09f4 [XCOFF] Do not put MergeableCStrings in their own section
The current implementation generates a csect with a
".rodata.str.x.y" prefix for a MergeableCString variable definition.
However, a reference to such a variable does not get the prefix in its
name because there's not enough information in the containing IR.
In particular, without seeing the initializer and absent some other
indicators, we cannot tell that the referenced variable is a
null-terminated string.

When the AIX codegen in llvm was being developed, the prefixing was copied
from ELF without having the linker take advantage of the info.
Currently, the AIX linker does not have the capability to merge
MergeableCString variables. If such a feature is ever implemented,
the contract between the linker and compiler would have to be reconsidered.

Here's the before and after of this change:
```
@a = global i64 320255973571806, align 8
@strA = unnamed_addr constant [7 x i8] c"hello\0A\00", align 1  ;; Mergeable1ByteCString
@strB = unnamed_addr constant [8 x i8] c"Blahah\0A\00", align 1 ;; Mergeable1ByteCString
@strC = unnamed_addr constant [2 x i16] [i16 1, i16 0], align 2 ;; Mergeable2ByteCString
@strD = unnamed_addr constant [2 x i16] [i16 1, i16 1], align 2 ;; !isMergeableCString
@strE = external unnamed_addr constant [2 x i16], align 2

-fdata-sections:
  .text  extern        .rodata.str1.1strA        .text  extern        strA
    0    SD       RO                               0    SD       RO
  .text  extern        .rodata.str1.1strB        .text  extern        strB
    0    SD       RO                               0    SD       RO
  .text  extern        .rodata.str2.2strC  ===>  .text  extern        strC
    0    SD       RO                               0    SD       RO
  .text  extern        strD                      .text  extern        strD
    0    SD       RO                               0    SD       RO
  .data  extern        a                         .data  extern        a
    0    SD       RW                               0    SD       RW
  undef  extern        strE                      undef  extern        strE
    0    ER       UA                               0    ER       UA

-fno-data-sections:
  .text  unamex        .rodata.str1.1            .text  unamex        .rodata
    0    SD       RO                               0    SD       RO
  .text  extern        strA                      .text  extern        strA
    0    LD       RO                               0    LD       RO
  .text  extern        strB                      .text  extern        strB
    0    LD       RO                               0    LD       RO
  .text  unamex        .rodata.str2.2      ===>  .text  extern        strC
    0    SD       RO                               0    LD       RO
  .text  extern        strC                      .text  extern        strD
    0    LD       RO                               0    LD       RO
  .text  unamex        .rodata                   .data  unamex        .data
    0    SD       RO                               0    SD       RW
  .text  extern        strD                      .data  extern        a
    0    LD       RO                               0    LD       RW
  .data  unamex        .data                     undef  extern        strE
    0    SD       RW                               0    ER       UA
  .data  extern        a
    0    LD       RW
  undef  extern        strE
    0    ER       UA
```

Reviewed by: David Tenty, Fangrui Song

Differential Revision: https://reviews.llvm.org/D156202
2023-07-29 03:24:21 +00:00
Kevin P. Neal
7e0e8b7ace [FPEnv][PowerPC] Correct strictfp tests.
Correct PowerPC strictfp tests to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

Mostly these tests just needed the strictfp attribute on function
definitions.  I've also removed the strictfp attribute from uses
of the constrained intrinsics because it comes by default since
D154991, but I only did this in tests I was changing anyway.

I have removed attributes added to declare lines of intrinsics. The
attributes of intrinsics cannot be changed in a test so I eliminated
attempts to do so.

Test changes verified with D146845.
2023-07-26 09:12:29 -04:00
esmeyi
e83b8a5e71 [XCOFF] Enable available_externally linkage for functions.
Summary: D80642 added support for emitting AvailableExternally Linkage on AIX. However, an assertion of "Trying to get csect representation of this symbol but none was set." occurred when a function is declared as available_externally. This is because we fail to generate a csect for the function. This patch fixes it.

Reviewed By: hubert.reinterpretcast, shchenz

Differential Revision: https://reviews.llvm.org/D156213

Signed-off-by: Esme Yi <esme.yi@ibm.com>
2023-07-25 22:47:11 -04:00
Kai Luo
f26af16e2c [PowerPC][AIX] Enable quadword atomics by default for AIX
On AIX, a libatomic supporting inline quadword atomic operations has been released, so compatibility is no longer an issue and we can enable quadword atomics by default.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D151312
2023-07-25 08:21:07 +08:00
esmeyi
776195865d [XCOFF] Write source language ID and CPU version ID into C_FILE symbol.
Summary: The source language ID and CPU version ID are required by debuggers on AIX. AIX's system assembler determines the source language ID based on the source file's name suffix, and the behavior in this patch is consistent with it.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D155684
2023-07-24 00:35:24 -04:00
Kishan Parmar
41af6ece6c [PowerPC/SPE] powerpcspe load and store instructions have
an 8-bit offset instead of 16-bit, unlike other load/store instructions.
So if the stack grows beyond the range of an 8-bit offset, create one emergency slot
for spilling.
2023-07-23 13:24:35 +05:30
Jake Egan
311abf5fc0 Implement -frecord-command-line for XCOFF integrated assembler path
The patch D153600 implemented `-frecord-command-line` for the XCOFF direct assembly path. This patch adds support for the XCOFF integrated assembly path.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D154921
2023-07-20 09:45:37 -04:00
Konstantina Mitropoulou
4c42ab1199 [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns
CMP(A,C)||CMP(B,C) => CMP(MIN/MAX(A,B), C)
CMP(A,C)&&CMP(B,C) => CMP(MIN/MAX(A,B), C)

This first patch handles integer types.
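
The fold relies on simple integer identities: for `<`, an OR of two compares becomes a compare of the minimum and an AND becomes a compare of the maximum, and for `>` the roles of MIN and MAX swap. The sketch below checks these exhaustively over a small range; it is an illustration of the arithmetic only, and `fold_matches` is a hypothetical name, not part of the DAGCombiner.

```python
from itertools import product

def fold_matches(lo: int = -4, hi: int = 4) -> bool:
    """Check CMP(A,C) ||/&& CMP(B,C) against CMP(MIN/MAX(A,B), C)."""
    for a, b, c in product(range(lo, hi), repeat=3):
        if ((a < c) or (b < c)) != (min(a, b) < c):
            return False
        if ((a < c) and (b < c)) != (max(a, b) < c):
            return False
        if ((a > c) or (b > c)) != (max(a, b) > c):
            return False
        if ((a > c) and (b > c)) != (min(a, b) > c):
            return False
    return True

print(fold_matches())
```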

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D153502
2023-07-17 17:13:47 -07:00
Amy Kwan
8e0e442c1d [AIX][TLS] Account for local-exec accesses in XCOFFObjectWriter
This is a follow up to D149722 and aims to address https://github.com/llvm/llvm-project/issues/63885.
Local-exec accesses were not previously accounted for in XCOFFObjectWriter.
Specifically, the R_TLS_LE relocation was not previously handled, which led to
the incorrect value being written for the relocation target.

Within this patch, the value being written is set to the symbol's virtual
address and extra relocation tests are added.

Differential Revision: https://reviews.llvm.org/D155415
2023-07-17 12:15:44 -05:00
Stephen Peckham
ac5d5351d4 Use empty symbol name for XCOFF text csect
When generating XCOFF, the compiler generates a csect with an internal
name.  Each function results in a label within the csect.  This patch
replaces the internal name ".text" with an empty string "".  This avoids
adding special code to handle a function text() in the source file, and
works better with some XCOFF tools that are confused when the csect and
the first function have the same address.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D154854
2023-07-15 16:13:48 -04:00
Kamau Bridgeman
62c1cf7c63 [PowerPC][Future] Enable __builtin_mma_xxm[t|f]acc
Future cpu instructions dmxxinstdmr512 and dmxxextfdmr512 insert and extract
quad vectors from the new wide accumulator(wacc) register class.
The introduction of these new instructions renders the p10 instructions
xxmtacc and xxmfacc obsolete since the new wacc register class is a better
choice for handling quad vector operations. This patch ensures that, for
future cpu, instructions dmxxinstdmr512 and dmxxextfdmr512 are generated
by custom lowering the intrinsics for xxm[t|f]acc to produce no instructions.

Reviewed By: amyk, lei

Differential Revision: https://reviews.llvm.org/D153034
2023-07-14 13:38:40 -05:00
Sean Fertile
5e28d30f1f [XCOFF][AIX] Peephole optimization for toc-data.
Followup to D101178 - peephole optimization that converts a
load address instruction and a consuming load/store into just the
load/store when it's safe to do so.

eg: converts the 2 instruction code sequence
  la 4, i[TD](2)
  stw 3, 0(4)
to
  stw 3, i[TD](2)

Differential Revision: https://reviews.llvm.org/D101470
2023-07-13 20:40:09 -04:00
Nemanja Ivanovic
329b8cd3e3 [PowerPC] Improve code gen for vector add
Improve codegen for vector modulo additions.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D154447
2023-07-13 15:21:49 -04:00
Nikita Popov
edb2fc6dab [llvm] Remove explicit -opaque-pointers flag from tests (NFC)
Opaque pointers mode is enabled by default, no need to explicitly
enable it.
2023-07-12 14:35:55 +02:00
Jake Egan
bbd0d123d3 Implement -frecord-command-line for XCOFF
This patch extends support of the option `-frecord-command-line` to XCOFF. XCOFF doesn’t have custom sections like ELF, so the command line data is emitted to a .info section instead. A C_INFO symbol is generated with the .info section to preserve the command line data past the link step. Multiple command lines are separated by newlines and null bytes. The command line data can be retrieved on AIX with command `what file_name`.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D153600
2023-07-10 12:47:07 -04:00
Matt Arsenault
310f839612 DAG: Lower is.fpclass fcInf to fcmp of fabs
InstCombine should have taken care of this, but I think
this is more useful in the future when the expansion
tries to handle multiple cases at a time with fcmp.
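
The idea behind the lowering is that a value is infinite exactly when its absolute value compares equal to +inf. A minimal Python sketch of that equivalence follows, with a hypothetical helper name; it says nothing about how the DAG expansion is actually structured.

```python
import math

def is_inf_via_fabs(x: float) -> bool:
    """Hypothetical helper: test for +/-inf as fabs(x) == +inf."""
    return math.fabs(x) == math.inf

# NaN compares unequal to everything, so it is correctly rejected.
for x in (math.inf, -math.inf, 0.0, -1.5, math.nan):
    print(x, is_inf_via_fabs(x), math.isinf(x))
```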

x87 looks worse to me but the only thing I know about it is that
I aggressively do not care about it.

https://reviews.llvm.org/D143198
2023-07-07 17:00:10 -04:00
Nemanja Ivanovic
b0e249d5e2 Reland "[PowerPC] Remove extend between shift and and"
The commit originally caused a bootstrap failure on the big endian
PPC bot as the combine was interfering with the legalizer when
applied on illegal types. This update restricts the combine to
the only types for which it is actually needed. Tested on PPC BE
bootstrap locally.
2023-07-07 14:45:05 -04:00
Qiu Chaofan
a2b5117df7 [PowerPC] Update InputOps of Power10 SchedModel
The count of input operands affects pipeline forwarding in the scheduling model.
The previous Power10 model definition arranged some instructions into
incorrect groups by counting the wrong number of input operands.

This patch updates the model, setting the input operand count correctly
by excluding irrelevant immediate operands and counting the memory operands of
load instructions correctly.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D153842
2023-07-07 22:46:22 +08:00
zhijian
d6d7f7b1d2 [AIX][XCOFF] print out the traceback info
Summary:

  Adding a new option -traceback-table to print out the traceback info of an XCOFF object file.

Reviewers: James Henderson, Fangrui Song, Stephen Peckham, Xing Xue

Differential Revision: https://reviews.llvm.org/D89049
2023-07-06 11:47:08 -04:00
Amy Kwan
598cccea80 [AIX][TLS] Generate optimized local-exec access code sequence using X-Form loads/stores
This patch is a follow up to D149722, D152669 and D153645, where a slightly more
optimized code sequence is generated for 64-bit and 32-bit local-exec accesses
when optimizations are turned on.

Handling is added to PPCISelDAGToDAG.cpp in order to check if any D-form loads or
stores that follow a PPCISD::ADD_TLS can be optimized to use an X-Form load or
store. In this particular situation, this allows the ADD_TLS node to be removed
completely.

Differential Revision: https://reviews.llvm.org/D150367
2023-07-06 07:57:05 -05:00
Nemanja Ivanovic
7cd9084c69 Revert "[PowerPC] Remove extend between shift and and"
This reverts commit a57236de4eb8f38b4201647b10146941cbbb5c0b.
Causes a bootstrap failure on ppc64be.
2023-07-05 20:04:49 -04:00
Nemanja Ivanovic
a57236de4e [PowerPC] Remove extend between shift and and
The SDAG will sometimes insert an extend between
the shift and an and (immediate) even though the
immediate is narrower than the narrow size.
This does not allow us to produce a rotate
instruction (such as rlwinm).
This patch just adds a combine to move the extend
onto the and.
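
Why moving the extend is safe can be sketched with plain integers. The model below assumes a 32-bit shift result extended to 64 bits and an immediate that fits in 32 bits; the exact DAG pattern and extend kind are assumptions, and all function names are hypothetical. Whatever the extend puts in the high bits, the narrow mask discards it.

```python
import random

MASK32 = 0xFFFF_FFFF

def srl32(x: int, n: int) -> int:
    """Logical right shift of a 32-bit value."""
    return (x & MASK32) >> n

def any_extend(v: int, high_bits: int) -> int:
    """Extend to 64 bits with arbitrary (unspecified) high bits."""
    return ((high_bits & MASK32) << 32) | (v & MASK32)

def and_after_extend(x: int, n: int, imm: int, high_bits: int) -> int:
    return any_extend(srl32(x, n), high_bits) & imm

def extend_after_and(x: int, n: int, imm: int) -> int:
    # Zero-extending a masked 32-bit value leaves the high bits clear.
    return (srl32(x, n) & imm) & MASK32

def forms_agree(trials: int = 1000, seed: int = 0) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        x, hb = rng.getrandbits(32), rng.getrandbits(32)
        n, imm = rng.randrange(32), rng.getrandbits(16)
        if and_after_extend(x, n, imm, hb) != extend_after_and(x, n, imm):
            return False
    return True

print(forms_agree())
```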

Differential revision: https://reviews.llvm.org/D152911
2023-07-05 16:33:07 -04:00
esmeyi
2d74cf1f24 [XCOFF] Force recording a relocation for weak symbol label.
Summary: Currently, if there are multiple definitions of the same symbol declared with weak linkage, the linker may choose the wrong one when they are compiled with integrated-as. This patch fixes the issue. If the target symbol is a weak label, we must not attempt to resolve the fixup directly. Emit a relocation and leave resolution of the final target address to the linker.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D153839
2023-07-05 01:58:18 -04:00
Lei Huang
c7c3d71414 [PowerPC] add testcase for vector add and shift
2023-07-04 10:45:19 -04:00
Ting Wang
0b955fee90 [PowerPC][NFC] add SADDO/SSUBO test case
Differential Revision: https://reviews.llvm.org/D152339

Reviewed By: qiucf
2023-06-29 20:35:59 -04:00
Ting Wang
919588fd10 [PowerPC][NFC] expose issue on absol-jump-table-enabled.ll (relocation-model=pic + ppc-use-absolute-jumptables)
Differential Revision: https://reviews.llvm.org/D154047
2023-06-29 20:32:15 -04:00
Matt Arsenault
003b58f65b IR: Add llvm.frexp intrinsic
Add an intrinsic which returns the two pieces as multiple return
values. Alternatively could introduce a pair of intrinsics to
separately return the fractional and exponent parts.
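
For readers unfamiliar with frexp, Python's `math.frexp` shows the same shape of result, returning the fraction and exponent together as a pair. This is only an analogy for the "multiple return values" design; `split_and_rebuild` is a hypothetical helper and nothing here reflects the intrinsic's IR signature.

```python
import math

def split_and_rebuild(x: float) -> tuple[float, int, float]:
    """Split x into (fraction, exponent) and rebuild it with ldexp."""
    fraction, exponent = math.frexp(x)  # x == fraction * 2**exponent
    return fraction, exponent, math.ldexp(fraction, exponent)

for x in (8.0, 0.15625, -3.0):
    print(x, split_and_rebuild(x))
```

For finite nonzero inputs the fraction has magnitude in [0.5, 1), and rebuilding with `ldexp` recovers the original value exactly.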

AMDGPU has native instructions to return the two halves, but could use
some generic legalization and optimization handling. For example, we
should be able to handle legalization of f16 on older targets, and for
bf16. Additionally antique targets need a hardware workaround which
would be better handled in the backend rather than in library code
where it is now.
2023-06-28 14:50:16 -04:00
Amy Kwan
11b71ade51 [PowerPC][TLS] Add additional TLS X-Form loads/store instructions
This patch is a follow up to D43315, and adds the following new load/store
TLS specific instructions for integer and floating point scalar types:
```
LHAXTLS
LWAXTLS
LHAXTLS_32
LWAXTLS_32
LFSXTLS
LFDXTLS
STFSXTLS
STFDXTLS
```
These instructions can be used to optimize TLS sequences when D-Form
loads/stores follow an ADD_TLS instruction.

Duplicate versions of these instructions are also added within an isAsmParserOnly=1
block (similar to D47382) to allow llvm-mc to assemble these instructions.

Differential Revision: https://reviews.llvm.org/D153645
2023-06-27 11:33:38 -05:00
Matthias Braun
02ba5b8c6b Ignore load/store until stack address computation
No longer conservatively assume a load/store accesses the stack when we
can prove that we did not compute any stack-relative address up to this
point in the program.

We do this in a cheap not-quite-a-dataflow-analysis: Assume
`NoStackAddressUsed` when all predecessors of a block already guarantee
it. Process blocks in reverse post order to guarantee that except for
loop headers we have processed all predecessors of a block before
processing the block itself. For loops we accept the conservative answer
as they are unlikely to be shrink-wrappable anyway.
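
The single-pass scheme described above can be sketched as follows. This is a toy model over a dictionary-based CFG, not the pass itself: only the name `NoStackAddressUsed` comes from the description, while the function names, the CFG encoding and the `computes_stack_address` set are assumptions made for illustration.

```python
def reverse_post_order(succs, entry):
    """Return the blocks reachable from entry in reverse post order."""
    seen, order = set(), []
    def dfs(block):
        seen.add(block)
        for s in succs.get(block, ()):
            if s not in seen:
                dfs(s)
        order.append(block)
    dfs(entry)
    return order[::-1]

def no_stack_address_used(succs, entry, computes_stack_address):
    """Per-block NoStackAddressUsed state on block entry, in one RPO pass."""
    preds = {b: [] for b in succs}
    for b, ss in succs.items():
        for s in ss:
            preds.setdefault(s, []).append(b)
    on_entry, on_exit = {}, {}
    for b in reverse_post_order(succs, entry):
        # A predecessor not yet processed (a loop back edge) counts as
        # False, which is the conservative answer for loop headers.
        on_entry[b] = all(on_exit.get(p, False) for p in preds.get(b, []))
        on_exit[b] = on_entry[b] and b not in computes_stack_address
    return on_entry

# Diamond CFG where only block "b" computes a stack-relative address.
cfg = {"entry": ["a", "b"], "a": ["exit"], "b": ["exit"], "exit": []}
print(no_stack_address_used(cfg, "entry", {"b"}))
```

In the diamond, loads and stores in `entry`, `a` and `b` can be ignored, while `exit` must be treated conservatively because one of its predecessors computed a stack address.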

Differential Revision: https://reviews.llvm.org/D152213
2023-06-26 13:50:36 -07:00
Matthias Braun
759b217626 Switch tests to use update_llc_test_checks
Switch and update some tests to use `update_llc_test_checks` to reduce
clutter in upcoming change.

Differential Revision: https://reviews.llvm.org/D152215
2023-06-26 13:50:36 -07:00
Matt Arsenault
f2596b754c SeparateConstOffsetFromGEP: Don't use SCEV
This was only using the SCEV expressions as a map key, which we can do
just as well with the value pointers. This also allows it to handle
vectors.
2023-06-26 13:58:06 -04:00