The intrinsic uses ImmArg so TargetConstant would be consistent
with how other intrinsics are handled.
This hides the constants from type legalization so we can remove
the promotion support.
isel patterns are updated accordingly.
If a virtual register is not assigned preferred physical register, it means some
COPY instructions will be changed to real register move instructions. In this
case we can try to split the virtual register in colder blocks, if success, the
original COPY instructions can be deleted, and the new COPY instructions in
colder blocks will be generated as register move instructions. It results in
fewer dynamic register move instructions executed.
The new test case split-reg-with-hint.ll gives an example, the hot path contains
24 instructions without this patch, now it is only 4 instructions with this
patch.
Differential Revision: https://reviews.llvm.org/D156491
This patch reverts commit 7614ba0a5db8 to optimize VPERM when one of its
vector operands is XXSWAPD, similar to XXPERM. It also reorganizes the
little-endian swap code on LE, swapping the vector operand after
adjusting the mask operand. This ensures that the vector operand is
swapped at the correct point in the code, resulting in a valid
constant pool for the mask operand.
Reviewed By: stefanp
Differential Revision: https://reviews.llvm.org/D149083
Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction
As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future.
Reapplied after reversion at e1e3c75c7dad72 with a tweak to the pseudo-probe-peep.ll test
Differential Revision: https://reviews.llvm.org/D158068
Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction
As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future.
Differential Revision: https://reviews.llvm.org/D158068
PowerPC subtargets prior to Power9 use the 'legacy' itinerary way to
provide scheduling information. This patch re-writes the tablegen file
to define the scheduling information in the new SchedModel way, which
can bring improvements to some benchmarks.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D154488
PPC64 allows stack size up to ((2^63)-1) bytes. Currently llc reports
```
warning: stack frame size (4294967568) exceeds limit (4294967295) in function 'main'
```
if the stack allocated is larger than 4G.
This patch utilizes the -maix-small-local-exec-tls option added in
D155544 to produce a faster access sequence for the local-exec TLS
model, where loading from the TOC can be avoided.
The patch either produces an addi/la with a displacement off of r13
(the thread pointer) when the address is calculated, or it produces an
addi/la followed by a load/store when the address is calculated and
used for further accesses.
This patch also optimizes this sequence a bit more where we can remove
the addi/la when the load/store offset is 0. A follow up patch will
be posted to account for when the load/store offset is non-zero, and
currently in these situations we keep the addi/la that precedes the
load/store.
Furthermore, this access sequence is only performed for TLS variables
that are less than ~32KB in size.
Differential Revision: https://reviews.llvm.org/D155600
This patch adds a target attribute for an AIX-specific option that
informs the compiler that it can use a faster access sequence for the
local-exec TLS model (formally named aix-small-local-exec-tls).
The Clang portion of this option is in D155544.
The initial implementation to generate the faster access sequence is in
D155600.
Differential Revision: https://reviews.llvm.org/D156203
With ThinLTO, when compiling SPEC 2017 omnetpp_r with -threads=4, two
small modules can end up with the same timestamp in their sinit symbols
when calculating time in seconds, creating duplicate definitions.
This patch uses a timestamp in nanoseconds.
Because the race can be between threads, embed the thread ID as well.
Reviewed By: xingxue, daltenty
Differential Revision: https://reviews.llvm.org/D159319
Summary: Materialization a 64-bit constant with High32=Low32 only requires 2 instructions instead of 3 when Low32 can be materialized in 1 instruction.
Reviewed By: qiucf
Differential Revision: https://reviews.llvm.org/D158495
On PowerPC the number of TOC entries must be kept low for large
applications. In order to reduce the number of constant global arrays
we can pool them into one structure and then access them as the base
address of that structure plus some offset. The constant global arrays
may be arrays of `i8` which are constant strings but they may also be
arrays of `i32, i64, etc...`.
Reviewed By: lei, amyk
Differential Revision: https://reviews.llvm.org/D155730
This patch tries to catch a codegen opportunity where the rotate and
mask can be merged into a single RLDCL instruction.
Reviewed By: lei, amyk
Differential Revision: https://reviews.llvm.org/D158328
On PPC there are instructions to store element from vector(e.g.
stxsdx/stxsiwx), and these instructions can be leveraged to avoid tail
constant in memset and constant splat array initialization.
This patch tries to explore these opportunities.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D138883
Compiling with TLS variables requires -pthread, but if the user omits this
option, the compiler will not show any obvious indication during compilation
that -pthread is needed for programs using TLS variables. Instead, the user will
experience a segmentation fault when running programs with TLS variables in them
and without specifying -pthread.
This patch aims to generate .extern/.ref references to __tls_get_addr[DS] for
local-exec accesses, in order to trigger an error from the linker to indicate
that there is an undefined symbol to __tls_get_addr. Doing so will remind the
user to compile/link with -pthread.
Differential Revision: https://reviews.llvm.org/D151335
mffsl is available since ISA 3.0. The builtin is named with ppc prefix
to follow our convention. For targets earlier than power9, GCC generates
extra code to support the functionality, while this patch does not
implement such behavior.
Reviewed By: nemanjai, tuliom
Differential Revision: https://reviews.llvm.org/D158065
We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10
to fix this asymmetry. AMDGPU already has most of the code for f32
exp10 expansion implemented alongside exp, so the current
implementation is duplicating nearly identical effort between the
compiler and library which is inconvenient.
https://reviews.llvm.org/D157871
Summary:
There is no support in XCOFF for labels on common symbols. Therefore, an alias for a common symbol is not supported. Issue an error in the front end when an aliasee is a common symbol. Issue a similar error in the back end in case an IR specifies an alias for a common symbol.
Reviewed by: hubert.reinterpretcast, DiggerLin
Differential Revision: https://reviews.llvm.org/D158739
Before elimination of mostly empty block it makes sense to remove dead PHI nodes.
It open more opportunity for elimination plus eliminates dead code itself.
It appeared that change results in failing many unit tests and some of
them I've updated and for another one I disable this optimization.
The pattern I observed in the tests is that there is a infinite loop
without side effects. As a result after elimination of dead phi node all other
related instruction are also removed and tests stops to check what it is expected.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D158503
Summary: This patch handles the SectionKind of ReadOnlyWithRel on AIX. The failure was discovered during sanitizer enablement and occured with `-fsanitize-coverage` option.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D157483
This removes some diffs created by D153502.
I'm assuming an AND/OR won't be worse than an SMIN/SMAX. For
RISC-V at least, AND/OR can be a shorter encoding than SMIN/SMAX.
It's weird that we have two different functions responsible for
folding logic of setccs, but I'm not ready to try to untangle that.
I'm unclear if the PowerPC chang is a regression or not. It looks
like it might use more registers, but I don't understand PowerPC
register so I'm not sure.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D158292
D141386 changed the semantics of !range metadata to return poison
on violation. If !range is combined with !noundef, violation is
immediate UB instead, matching the old semantics.
In theory, these IR semantics should also carry over into SDAG.
In practice, DAGCombine has at least one key transform that is
invalid in the presence of poison, namely the conversion of logical
and/or to bitwise and/or (c7b537bf09/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (L11252)).
Ideally, we would fix this transform, but this will require
substantial work to avoid codegen regressions.
In the meantime, avoid transferring !range metadata without
!noundef, effectively restoring the old !range metadata semantics
on the SDAG layer.
Fixes https://github.com/llvm/llvm-project/issues/64589.
Differential Revision: https://reviews.llvm.org/D157685
Set the ReplaceFlags variable to false, since there is code meant only
for the ADDItocHi/ADDItocL nodes. This has the side effect of disabling
the peephole when the load/store instruction has a non-zero offset.
This patch also fixes retrieving the `ImmOpnd` node from the AIX small
code model pseduos and does the same for the register operand node.
This allows cleaning up the later calls to replaceOperands.
Finally move calculating the MaxOffset into the code guarded by
ReplaceFlags as it is only used there and the comment is specific to the ELF
ABI.
Fixes https://github.com/llvm/llvm-project/issues/63927
Differential Revision: https://reviews.llvm.org/D155957
The current implementation generates a csect with a
".rodata.str.x.y" prefix for a MergeableCString variable definition.
However, a reference to such variable does not get the prefix in its
name because there's not enough information in the containing IR.
In particular, without seeing the initializer and absent of some other
indicators, we cannot tell that the referenced variable is a null-
terminated string.
When the AIX codegen in llvm was being developed, the prefixing was copied
from ELF without having the linker take advantage of the info.
Currently, the AIX linker does not have the capability to merge
MergeableCString variables. If such feature would ever get implemented,
the contract between the linker and compiler would have to be reconsidered.
Here's the before and after of this change:
```
@a = global i64 320255973571806, align 8
@strA = unnamed_addr constant [7 x i8] c"hello\0A\00", align 1 ;; Mergeable1ByteCString
@strB = unnamed_addr constant [8 x i8] c"Blahah\0A\00", align 1 ;; Mergeable1ByteCString
@strC = unnamed_addr constant [2 x i16] [i16 1, i16 0], align 2 ;; Mergeable2ByteCString
@strD = unnamed_addr constant [2 x i16] [i16 1, i16 1], align 2 ;; !isMergeableCString
@strE = external unnamed_addr constant [2 x i16], align 2
-fdata-sections:
.text extern .rodata.str1.1strA .text extern strA
0 SD RO 0 SD RO
.text extern .rodata.str1.1strB .text extern strB
0 SD RO 0 SD RO
.text extern .rodata.str2.2strC ===> .text extern strC
0 SD RO 0 SD RO
.text extern strD .text extern strD
0 SD RO 0 SD RO
.data extern a .data extern a
0 SD RW 0 SD RW
undef extern strE undef extern strE
0 ER UA 0 ER UA
-fno-data-sections:
.text unamex .rodata.str1.1 .text unamex .rodata
0 SD RO 0 SD RO
.text extern strA .text extern strA
0 LD RO 0 LD RO
.text extern strB .text extern strB
0 LD RO 0 LD RO
.text unamex .rodata.str2.2 ===> .text extern strC
0 SD RO 0 LD RO
.text extern strC .text extern strD
0 LD RO 0 LD RO
.text unamex .rodata .data unamex .data
0 SD RO 0 SD RW
.text extern strD .data extern a
0 LD RO 0 LD RW
.data unamex .data undef extern strE
0 SD RW 0 ER UA
.data extern a
0 LD RW
undef extern strE
0 ER UA
```
Reviewed by: David Tenty, Fangrui Song
Differential Revision: https://reviews.llvm.org/D156202
Correct PowerPC strictfp tests to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics
Mostly these tests just needed the strictfp attribute on function
definitions. I've also removed the strictfp attribute from uses
of the constrained intrinsics because it comes by default since
D154991, but I only did this in tests I was changing anyway.
I have removed attributes added to declare lines of intrinsics. The
attributes of intrinsics cannot be changed in a test so I eliminated
attempts to do so.
Test changes verified with D146845.
Summary: D80642 added support for emitting AvailableExternally Linkage on AIX. However, an assertion of "Trying to get csect representation of this symbol but none was set." occurred when a function is declared as available_externally. This is due to we missing to generate a csect for the function. This patch fixes it.
Reviewed By: hubert.reinterpretcast, shchenz
Differential Revision: https://reviews.llvm.org/D156213
Signed-off-by: Esme Yi <esme.yi@ibm.com>
On AIX, a libatomic supporting inline quadword atomic operations has been released, so that compatibility is not an issue now, we can enable quadword atomics by default.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D151312
Summary: The source language ID and CPU version ID are required by debuggers on AIX. AIX's system assembler determines the source language ID based on the source file's name suffix, and the behavior in this patch is consistent with it.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D155684
The patch D153600 implemented `-frecord-command-line` for the XCOFF direct assembly path. This patch adds support for the XCOFF integrated assembly path.
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D154921
CMP(A,C)||CMP(B,C) => CMP(MIN/MAX(A,B), C)
CMP(A,C)&&CMP(B,C) => CMP(MIN/MAX(A,B), C)
This first patch handles integer types.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D153502
This is a follow up to D149722 and aims to address https://github.com/llvm/llvm-project/issues/63885.
Local-exec accesses were not previously accounted for in XCOFFObjectWriter.
Specifically, the R_TLS_LE relocation was not previously handled, which lead to
the incorrect value being written for the relocation target.
Within this patch, the value being written is set to the symbol's virtual
address and extra relocation tests are added.
Differential Revision: https://reviews.llvm.org/D155415
When generating XCOFF, the compiler generates a csect with an internal
name. Each function results in a label within the csect. This patch
replaces the internal name ".text" with an empty string "". This avoids
adding special code to handle a function text() in the source file, and
works better with some XCOFF tools that are confused when the csect and
the first function have the same address.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D154854
Future cpu instructions dmxxinstdmr512 and dmxxextfdmr512 insert and extract
quad vectors from the new wide accumulator(wacc) register class.
The introduction of these new instructions renders the p10 instructions
xxmtacc and xxmfacc obsolete since the new wacc register class is a better
choice for handing quad vector operations. This patch ensures that, for
future cpu, instructions dmxxinstdmr512 and dmxxextfdmr512 are generated
by custom lowering the intrinsics for xxm[t|f]acc to produce no instructions.
Reviewed By: amyk, lei
Differential Revision: https://reviews.llvm.org/D153034
Followup to D101178 - peephole optimization that converts a
load address instruction and a consuming load/store into just the
load/store when its safe to do so.
eg: converts the 2 instruction code sequence
la 4, i[TD](2)
stw 3, 0(4)
to
stw 3, i[TD](2)
Differential Revision: https://reviews.llvm.org/D101470