To store a byval parameter the existing code would store as many 8 byte elements
as was required to store the full size of the byval parameter.
For example, a paramter of size 16 would store two element of 8 bytes.
A paramter of size 12 would also store two elements of 8 bytes.
This would sometimes store too many bytes as the size of the paramter is not
always a factor of 8.
This patch fixes that issue and now byval paramters are stored with the correct
number of bytes.
Reviewed By: nemanjai, #powerpc, quinnp, amyk
Differential Revision: https://reviews.llvm.org/D121430
Add a compiler option and the instructions required to set the
special Data Stream Control Register (DSCR). The special register will
not be set by default.
Original patch by: Muhammad Usman
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D117013
All LLVM backends use MCDisassembler as a base class for their
instruction decoders. Use "const MCDisassembler *" for the decoder
instead of "const void *". Remove unnecessary static casts.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D122245
The BL8_NOTOC_RM instruction was incorrectly producing a relocation that reqired
a TOC restore after the call. This patch fixes that issue and the notoc
relocation is now used.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D122012
Rename file for PPCCTRLoopsVerify pass from PPCCTRLoops.cpp
to PPCCTRLoopsVerify.cpp.
There will be a new file PPCCTRLoops.cpp for PPC CTR loops
generation later.
After D108936, @llvm.smul.with.overflow.i64 was lowered to __multi3
instead of __mulodi4, which also doesn't exist on PowerPC 32-bit, not
even with compiler-rt. Block it as well so that we get inline code.
Because libgcc doesn't have __muloti4, we block that as well.
Fixes#54460.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D122090
Add the calling convention for the vector pair registers.
These registers overlap with the vector registers.
Part of an original patch by: Lei Huang
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D117225
We are going to remove the old 'perfect shuffle' optimization since it
brings performance penalty in hot loop around vectors. For example, in
following loop sharing the same mask:
%v.1 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27>
%v.2 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27>
The generated instructions will be `vmrglw-vmrghw-vmrglw-vmrghw` instead
of `vperm-vperm`. In some large loop cases, this causes 20%+ performance
penalty.
The original attempt to resolve this is to pre-record masks of every
shufflevector operation in DAG, but that is somewhat complex and brings
unnecessary computation (to scan all nodes) in optimization. Here we
disable it by default. There're indeed some cases becoming worse after
this, which will be fixed in a more careful way in future patches.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D121082
VSX introduced some permute instructions that are direct
replacements for Altivec ones except they can target all
the VSX registers. We have added code generation for most
of these but somehow missed the low/hi word merges (XXMRG[LH]W).
This caused some additional spills on some large
computationally intensive code.
This patch simply adds the missed patterns.
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
Currently in Clang, we have two types of builtins for fnmsub operation:
one for float/double vector, they'll be transformed into IR operations;
one for float/double scalar, they'll generate corresponding intrinsics.
But for the vector version of builtin, the 3 op chain may be recognized
as expensive by some passes (like early cse). We need some way to keep
the fnmsub form until code generation.
This patch introduces ppc.fnmsub.* intrinsic to unify four fnmsub
intrinsics.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D116015
This wraps up from D119053. The 2 headers are moved as described,
fixed file headers and include guards, updated all files where the old
paths were detected (simple grep through the repo), and `clang-format`-ed it all.
Differential Revision: https://reviews.llvm.org/D119876
There are two MMA patterns that have been added twice. This patch just removes
one set of petterns. Should not change the way MMA behaves.
Reviewed By: lei, #powerpc
Differential Revision: https://reviews.llvm.org/D120680
Currently all of the MMA instructions as well as the MMA related register info
is bundled with the Power 10 instructions. This patch just splits them out.
Reviewed By: lei
Differential Revision: https://reviews.llvm.org/D120515
Added the license info as well as description about how classes should be named
based on existing documentation.
Reviewed By: lei, #powerpc
Differential Revision: https://reviews.llvm.org/D120530
Add the Power 10 instruction LXVKQ.
This patch was taken from an original patch by: Yi-Hong Lyu
Reviewed By: lei
Differential Revision: https://reviews.llvm.org/D117507
The Linux kernel build uses absolute expressions suffixed with @lo/@ha
relocations. This currently doesn't work for DS/DQ form instructions and
there is no reason for it not to. It also works with GAS.
This patch allows this as long as the value is a multiple of 4/16
for DS/DQ form.
Differential revision: https://reviews.llvm.org/D115419
Perfect shuffle was introduced into PowerPC backend years ago, and only
available in big-endian subtargets. This optimization has good effects
in simple cases, but brings serious negative impact in large programs
with many shuffle instructions sharing the same mask.
Here introduces a temporary backend hidden option to control it until we
implemented better way to fix the gap in vectorshuffle decomposition.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D120072
This patch updates the handling of vectors in getPreferredVectorAction():
For single-element and scalable vectors, fall back to default vector legalization
handling. For vNi1 vectors, add handling to either split or promote them in
order to prevent the production of wide v256i1/v512i1 types.
The following assertion is fixed by this patch, as we ended up producing the
wide vector types (that are used for MMA) in the backend prior to this fix.
```
Assertion failed: VT.getSizeInBits() == Operand.getValueSizeInBits() &&
"Cannot BITCAST between types of different sizes!"
```
Differential Revision: https://reviews.llvm.org/D119521
The LDMX instruction was to be potentially added in P9 but it was never added
in either ISA 3.0 or ISA 3.1. This patch removes that instruction as it is
currently still an invalid instruction.
Reviewed By: lei
Differential Revision: https://reviews.llvm.org/D118074
Power ISA 3.1 adds xsmaxcqp/xsmincqp for quad-precision type-c max/min selection,
and this opens the opportunity to improve instruction selection on: llvm.maxnum.f128,
llvm.minnum.f128, and select_cc ordered gt/lt and (don't care) gt/lt.
Reviewed By: nemanjai, shchenz, amyk
Differential Revision: https://reviews.llvm.org/D117006
There's a few relevant forward declarations in there that may require downstream
adding explicit includes:
llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llvm/MC/MCSubtargetInfo.h, llvm/MC/MCTargetOptions.h
llvm/MC/MCObjectStreamer.h no longer include llvm/MC/MCAssembler.h
llvm/MC/MCAssembler.h no longer includes llvm/MC/MCFixup.h, llvm/MC/MCFragment.h
Counting preprocessed lines required to rebuild llvm-project on my setup:
before: 1052436830
after: 1049293745
Which is significant and backs up the change in addition to the usual benefits of
decreasing coupling between headers and compilation units.
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119244
For PGO on AIX, when we switch to the linux-style PGO variable access
(via _start and _stop labels), we need the compiler to generate a .ref
assembly for each of the three csects:
- __llvm_prf_data[RW]
- __llvm_prf_names[RO]
- __llvm_prf_vnds[RW]
We insert the .ref inside the __llvm_prf_cnts[RW] csect so that if it's
live then the 3 csects are live.
For example, for a testcase with at least one function definition, when
compiled with -fprofile-generate we should generate:
.csect __llvm_prf_cnts[RW],3
.ref __llvm_prf_data[RW] <<============ needs to be inserted
.ref __llvm_prf_names[RO] <<===========
the __llvm_prf_vnds is not always present, so we reference it only when
it's present.
Reviewed By: sfertile, daltenty
Differential Revision: https://reviews.llvm.org/D116607
This patch introduces the conversions from math function calls
to MASS library calls. To resolves calls generated with these conversions, one
need to link libxlopt.a library. This patch is tested on PowerPC Linux and AIX.
Differential: https://reviews.llvm.org/D101759
Reviewer: bmahjour
This patch updates the P10 patterns with a load feeding into an insertelt to
utilize the refactored load and store infrastructure, as well as updating any
tests that exhibit any codegen changes.
Furthermore, custom legalization is added for v4f32 on Power9 and above to not
only assist with adjusting the refactored load/stores for P10 vector insert,
but also it enables the utilization of direct moves.
Differential Revision: https://reviews.llvm.org/D115691
This patch updates how splat loads handled and is an extension of D106555.
Particularly, for v2i64/v4f32/v4i32 types, they are updated to handle only
non-extending loads. For v8i16/v16i8 types, they are updated to handle extending
loads only if the memory VT is the same vector element VT type.
A test case has been added to illustrate a scenario where a PPCISD::LD_SPLAT
node should not be produced. In this test, it depicts the following f64
extending load used in a v2f64 build vector, but the extending load is actually
used in more places other than the build vector (such as in t12 and t16).
```
Type-legalized selection DAG: %bb.0 'test:entry'
SelectionDAG has 20 nodes:
t0: ch = EntryToken
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t6: i64,ch = CopyFromReg t0, Register:i64 %2
t11: f64,ch = load<(load (s64) from %ir.b, !tbaa !7)> t0, t4, undef:i64
t16: f64 = fadd t31, t37
t34: ch = store<(store (s64) into %ir.c, !tbaa !7)> t31:1, t16, t6, undef:i64
t36: ch = TokenFactor t34, t37:1
t27: v2f64 = BUILD_VECTOR t37, t37
t22: ch,glue = CopyToReg t36, Register:v2f64 $v2, t27
t12: f64 = fadd t11, t37
t28: ch = store<(store (s64) into %ir.b, !tbaa !7)> t11:1, t12, t4, undef:i64
t31: f64,ch = load<(load (s64) from %ir.c, !tbaa !7)> t28, t6, undef:i64
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t37: f64,ch = load<(load (s32) from %ir.a, !tbaa !3), anyext from f32> t0, t2, undef:i64
t23: ch = PPCISD::RET_FLAG t22, Register:v2f64 $v2, t22:1
```
Differential Revision: https://reviews.llvm.org/D117803
This reverts commit ef8206320769ad31422a803a0d6de6077fd231d2.
- It conflicts with the existing llvm::size in STLExtras, which will now
never be called.
- Calling it without llvm:: breaks C++17 compat
In commit 1674d9b6b2da, I fixed the bug where we didn't consider
both words of the result of the comparison. However, the logic
needs to be different for eq and ne.
Namely for eq, we need both words of the doubleword to equal so it
is an AND. OTOH for ne, we need either word to be unequal so it
is an OR.