As we're going to replace this ambiguous option with a more precise
instruction-level fast-math description, some tests need to be updated,
and in some of them the option plays no role at all.
This patch simplifies the pattern (xxswap (vec-op (xxswap a) (xxswap b)))
into (vec-op a b) if vec-op is lane-insensitive. The motivating case
is the ScalarToVector-VecOp-ExtractElement sequence on little endian (LE),
but the peephole itself is not related to endianness, so big endian (BE)
may also benefit from it.
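The identity behind this peephole can be checked with a plain C++ model rather than the DAG combine itself. For an element-wise op, applying the same lane permutation to every operand and then to the result cancels out. The two-lane `V2` type and the helper names below are assumptions made for this illustration.

```cpp
#include <array>
#include <cassert>

// Model a 128-bit vector as two 64-bit lanes (e.g. v2f64).
using V2 = std::array<double, 2>;

// xxswap: exchange the two doublewords.
V2 xxswap(const V2 &v) { return {v[1], v[0]}; }

// A lane-insensitive op: lane i of the result depends only on lane i
// of each input (element-wise add here).
V2 vadd(const V2 &a, const V2 &b) { return {a[0] + b[0], a[1] + b[1]}; }

// The unsimplified pattern: (xxswap (vec-op (xxswap a) (xxswap b))).
V2 swappedForm(const V2 &a, const V2 &b) {
  return xxswap(vadd(xxswap(a), xxswap(b)));
}
```

For any inputs, `swappedForm(a, b)` equals `vadd(a, b)`. That equality is why the three swaps can be dropped. A lane-crossing op such as a horizontal add would not satisfy it.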
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D97658
This patch implements patterns that exploit the Load Rightmost Vector
Element instructions to load element 0 into v8i16 and v16i8 vector registers
for the i16 and i8 data types on little-endian PowerPC subtargets.
Differential Revision: https://reviews.llvm.org/D94816#inline-921403
Add support for the TLS general dynamic access model to assembly
files on AIX 64-bit.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D98078
Since P8 is the oldest machine supported by the MASSV pass, the
_massv placeholder is removed and the oldest version of the
MASSV functions is assumed. If P9 vector features are
detected during compilation, the P8 suffix is
updated to P9.
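Below is a minimal sketch of the renaming step. It assumes that MASSV entry names carry a trailing `_P8` CPU suffix (for example `__sind2_P8`). The helper `selectMASSVName` is hypothetical and is not the pass's actual API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: names are assumed to end in "_P8" (the oldest
// supported level). With P9 vector support the suffix becomes "_P9";
// otherwise, or if the suffix is absent, the name is returned unchanged.
std::string selectMASSVName(const std::string &name, bool hasP9Vector) {
  const std::string p8Suffix = "_P8";
  bool endsWithP8 = name.size() >= p8Suffix.size() &&
                    name.compare(name.size() - p8Suffix.size(),
                                 p8Suffix.size(), p8Suffix) == 0;
  if (!hasP9Vector || !endsWithP8)
    return name;
  return name.substr(0, name.size() - p8Suffix.size()) + "_P9";
}
```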
Differential Revision: https://reviews.llvm.org/D98064
Adds support for the TLS general dynamic access model to
assembly files on AIX 32-bit.
To generate the correct code sequence when accessing a TLS variable
`v`, we first create two TOC entry nodes, one for the variable offset and
one for the region handle. These nodes are followed by a `PPCISD::TLSGD_AIX`
node (a new node introduced by this patch).
The `PPCISD::TLSGD_AIX` node (`TLSGDAIX` pseudo instruction) is
expanded to two copies (to put the variable offset and region handle in
the right registers) and a call to `__tls_get_addr`.
This patch also changes the way TC entries are generated in asm files.
If the generated TC entry is for the region handle of a TLS variable,
we add the `@m` relocation and the `.` prefix to the entry name.
For example:
```
L..C0:
.tc .v[TC],v[TL]@m -> region handle
L..C1:
.tc v[TC],v[TL] -> variable offset
```
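The naming rule above can be sketched as a small formatter. `formatTLSTCEntry` is a hypothetical helper that only reproduces the two example lines; it is not the actual AsmPrinter code.

```cpp
#include <cassert>
#include <string>

// Hypothetical formatter: the region-handle entry gets a "." prefix on
// the entry name and the "@m" relocation; the variable-offset entry is
// emitted without either.
std::string formatTLSTCEntry(const std::string &var, bool isRegionHandle) {
  std::string name = (isRegionHandle ? "." : "") + var + "[TC]";
  std::string expr = var + "[TL]" + (isRegionHandle ? "@m" : "");
  return ".tc " + name + "," + expr;
}
```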
Reviewed By: nemanjai, sfertile
Differential Revision: https://reviews.llvm.org/D97948
This changes the target data layout to align the stack to 16 bytes
on Power10. Before this change, the stack was aligned to 32 bytes.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D96265
I noticed in https://llvm.org/PR49355 that we were not folding expressions
like `icmp ult (constexpr), null`, so we end up with extremely large
icmp instructions as the constant expressions pile up on each other.
There is no potential to mis-fold an unsigned boundary condition
with a zero/null, so this is just falling through a crack in the
pattern matching.
The more general case of comparisons between non-zero constants and
constexprs is trickier and may require the datalayout to know
how to cast to different types, etc. Negative tests verify that
we are only changing a subset of potential patterns.
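The boundary argument fits in a few lines. An unsigned value is never less than zero and is always greater than or equal to it, whatever the constant expression evaluates to. This sketch assumes the fold covers `ult` (to false) and, as the complementary case, `uge` (to true). The `Pred` enum and the function name are hypothetical.

```cpp
#include <cassert>
#include <optional>

enum class Pred { EQ, NE, ULT, ULE, UGT, UGE };

// Fold "icmp <p> X, null" for an opaque X (e.g. a constexpr whose value
// is unknown). Only predicates whose answer does not depend on X are
// folded; everything else returns std::nullopt (no fold).
std::optional<bool> foldUnsignedCmpWithNull(Pred p) {
  switch (p) {
  case Pred::ULT:
    return false; // X u< 0 is never true
  case Pred::UGE:
    return true;  // X u>= 0 is always true
  default:
    return std::nullopt;
  }
}
```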
Differential Revision: https://reviews.llvm.org/D98150
This patch adds support for passing vector call operands to variadic
functions. Fixed arguments shadow GPRs and stack space even when they
are passed in vector registers, while arguments passed through the
ellipsis are passed in properly aligned GPRs if available, and on the
stack once all GPR argument registers are consumed.
Differential Revision: https://reviews.llvm.org/D97956
This patch adds support for the default AltiVec ABI for AIX.
Vector registers 20 through 31 are marked as reserved and cannot
be used in the default ABI. This patch adds handling for this case
and also removes the default AltiVec ABI errors.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D96351
:: (store 1 + 4, addrspace 1)
->
:: (store 1 into undef + 4, addrspace 1)
An offset without a base isn't terribly useful, but it's convenient to be
able to update the offset without checking the value, for example when
breaking apart stores into smaller units.
Differential Revision: https://reviews.llvm.org/D97812
This patch adds support for passing vector arguments to variadic functions.
Fixed arguments shadow GPRs and stack space even when they are passed in
vector registers, while arguments passed through the ellipsis are passed
in properly aligned GPRs if available, and on the stack once all GPR
argument registers are consumed.
Differential Revision: https://reviews.llvm.org/D97485
VirtRegRewriter may sometimes fail to apply the kill flag where necessary,
which causes unnecessary code to be generated on PowerPC. This patch fixes the
way the masks for defined lanes and for used lanes are computed.
Contact albion.fung@ibm.com instead of the author for problems related to this commit.
Differential Revision: https://reviews.llvm.org/D92405
This patch allows generating TLS variables in assembly files on AIX.
Initialized and external uninitialized variables are generated with the
.csect pseudo-op and local uninitialized variables are generated with
the .comm/.lcomm pseudo-ops. The patch also adds a check to
explicitly say that TLS is not yet supported on AIX.
Reviewed by: daltenty, jasonliu, lei, nemanjai, sfertile
Originally patched by: bsaleil
Commandeered by: NeHuang
Differential Revision: https://reviews.llvm.org/D96184
This seems to be more of a Clang thing than a generic LLVM thing,
so this moves it out of the LLVM pipelines and into Clang extension hooks
on the LLVM pipelines.
Move the post-inline EEInstrumentation out of the backend pipeline and
into a late pass, similar to other sanitizer passes. It doesn't fit
into the codegen pipeline.
Also fix EntryExitInstrumentation not running at -O0 under the new
PM (PR49143).
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D97608
The code previously used two BUILD_PAIRs to concatenate the two UMULO
results with 0s in the lower bits to match the original VT. Then it created
an ADD and a UADDO with the original bit width. Each of those operations
needs to be expanded since they have illegal types.
Since we put 0s in the lower bits before the ADD, the lower half of the
ADD result will be 0, so the lower half of the UADDO result is
solely determined by the other operand. Since the UADDO needs to
be split in half, we don't really need an operation for the lower
bits. Unfortunately, type legalization doesn't see that, and we end up
creating something more complicated that DAG combine or
lowering aren't always able to recover.
This patch directly generates the narrower ADD and UADDO to avoid
needing to legalize them. Now only the MUL is done on the original
type.
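The arithmetic can be checked with a scaled-down model in which the original type is 64 bits wide and each half is 32 bits. `one` and `two` stand for the two half-width UMULO results, and `three` for the full-width product they are added to. These names and both functions are assumptions for illustration, not the legalizer code.

```cpp
#include <cassert>
#include <cstdint>

struct AddOResult {
  uint64_t value;
  bool carry;
};

// Old shape: shift both half-width results into the upper half (zeros
// below), ADD them at full width, then UADDO with `three`.
AddOResult wideForm(uint32_t one, uint32_t two, uint64_t three) {
  uint64_t s = (uint64_t(one) << 32) + (uint64_t(two) << 32); // low half is 0
  uint64_t r = s + three;
  return {r, r < s}; // unsigned wrap-around <=> carry out
}

// New shape: the low half is just the low half of `three`; only the
// upper half needs a narrow ADD followed by a narrow UADDO.
AddOResult narrowForm(uint32_t one, uint32_t two, uint64_t three) {
  uint32_t sumHi = one + two;                // narrow ADD
  uint32_t threeHi = uint32_t(three >> 32);
  uint32_t hi = sumHi + threeHi;             // narrow UADDO
  bool carry = hi < sumHi;
  uint64_t lo = uint32_t(three);             // passed through unchanged
  return {(uint64_t(hi) << 32) | lo, carry};
}
```

The low half of the wide ADD is always zero, so no carry can leave the low half. The narrow UADDO therefore sees exactly the carry that the wide one would.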
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D97440
Under -O3 and -Ofast, the MASSV conversion prevents the sqrt call from being inlined.
Inline sqrt is faster than a MASSV call on little-endian PowerPC (leppc).
Differential Revision: https://reviews.llvm.org/D97487
If a global object is listed in `@llvm.used`, place it in a unique section with
the `SHF_GNU_RETAIN` flag. The section is a GC root under `ld --gc-sections`
with LLD>=13 or GNU ld>=2.36.
For front ends which do not expect to see multiple sections of the same name,
consider emitting `@llvm.compiler.used` instead of `@llvm.used`.
SHF_GNU_RETAIN is restricted to ELFOSABI_GNU and ELFOSABI_FREEBSD in
binutils. We don't do the restriction - see the rationale in D95749.
The integrated assembler has supported SHF_GNU_RETAIN since D95730.
GNU as>=2.36 supports section flag 'R'.
We don't need to worry about GNU ld support because older GNU ld just ignores
the unknown SHF_GNU_RETAIN.
With this change, `__attribute__((retain))` functions/variables emitted
by clang will get the SHF_GNU_RETAIN flag.
Differential Revision: https://reviews.llvm.org/D97448
This is for XCOFF DWARF support.
It seems that when DWARF debug info is enabled, symbol 0 has a special
use for the AIX binder. At the least, symbol 0 cannot be the .text
section; otherwise, we get a binding-time error.
Add a correct C_FILE symbol at index 0 here to make the AIX binder
work.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D97117
This patch generates the vinsw, vinsd, vinsblx, vinshlx, vinswlx, vinsdlx,
vinsbrx, vinshrx, vinswrx and vinsdrx instructions for vector insertion on P10.
Differential Revision: https://reviews.llvm.org/D94454
Added -mrop-protection for PowerPC to turn on codegen that provides some
protection from ROP attacks.
The option is off by default and can be turned on for Power 8, Power 9 and
Power 10.
This patch is for the option only. The feature will be implemented by a later
patch.
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D96512
Summary:
Currently, shrink wrapping is not enabled on AIX.
This patch enables shrink wrapping on 32-bit and 64-bit AIX, and on 64-bit ELF.
Reviewed By: sfertile, nemanjai
Differential Revision: https://reviews.llvm.org/D95094
Do not defer to the base class when the register constraint is a
physical fpr. The base class will select SPILLTOVSRRC as the register
class and register allocation will fail on subtargets without VSX
registers.
Differential Revision: https://reviews.llvm.org/D91629
If we wait until the type is legalized, we'll lose information
about the original type and need to use larger magic constants.
This gets especially bad on RISCV64, where i64 is the only legal
type.
I've limited this to simple scalar types, so it only works for
i8/i16/i32, which are the most likely to occur. For less common types,
we might want to do a small promotion to a type where MULH is legal
instead.
Unfortunately, this does prevent some urem/srem+seteq matching, since
that still requires legal types.
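To see why the original width matters, consider the standard round-up reciprocal method for unsigned division by 3. This is a textbook technique, not necessarily the exact constants the DAG computes. When the value is known to be i16, a 16-bit magic constant and a 32-bit product are enough. A wider type needs a wider constant and a wider product. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// x / 3 for 16-bit x: 0xAAAB == ceil(2^17 / 3). The product fits in
// 32 bits, and the result is exact for every 16-bit input.
uint16_t udiv3_i16(uint16_t x) {
  return uint16_t((uint32_t(x) * 0xAAABu) >> 17);
}

// x / 3 for 32-bit x: 0xAAAAAAAB == ceil(2^33 / 3). Both the constant
// and the product are twice as wide.
uint32_t udiv3_i32(uint32_t x) {
  return uint32_t((uint64_t(x) * 0xAAAAAAABull) >> 33);
}
```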
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D96210
As of commit 284f2bffc9bc5, the DAG Combiner gets rid of the masking of the
input to this node if the mask only keeps the bottom 16 bits. This is because
the underlying library function does not use the high order bits. However, on
PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits
from the register. Therefore, the library implementation of __gnu_h2f_ieee will
return an incorrect result if the bits aren't cleared.
This combine is desired for ARM (and possibly other targets) so this patch adds
a query to Target Lowering to check if this zeroing needs to be kept.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092
Differential revision: https://reviews.llvm.org/D96283
NOTE: This patch was originally written by Anil Mahmud. His code has been
rebased but otherwise left mostly unchanged.
A new instruction on Power 10 allows for the materialization of 34-bit
immediate values. This patch allows the compiler to take advantage of
the new instruction in this situation.
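Here is a minimal range check for the new case. It assumes the instruction in question is the prefixed `pli` (`paddi`), whose immediate is a signed 34-bit field. The helper name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: does `imm` fit in a signed 34-bit immediate,
// i.e. -2^33 <= imm <= 2^33 - 1?
bool fitsInSigned34(int64_t imm) {
  const int64_t lo = -(int64_t(1) << 33);
  const int64_t hi = (int64_t(1) << 33) - 1;
  return imm >= lo && imm <= hi;
}
```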
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D92879
Some cases may be transformed into 32-bit splats before reaching the boolean condition, which can cause incorrect behaviour by providing XXSPLTI32DX with incorrect splat values. The condition was reversed so that the shortcut prevents this problem.
Differential Revision: https://reviews.llvm.org/D95634
If the APInt returned by BuildVectorSDNode::isConstantSplat() is narrower than
64 bits, the result produced by XXSPLTI32DX is incorrect. The result returned
by the function appears to be incorrect and we'll investigate/fix it in a
follow-up commit. However, since this causes miscompiles, we must
temporarily disable emitting this instruction for such values.
This intrinsic is supposed to have the permute control vector complemented on
little endian systems (as the ABI specifies and GCC implements). With the
current code gen, the result vector is byte-reversed.
Differential revision: https://reviews.llvm.org/D95004
Reassociating some patterns to generate more fma instructions to
reduce register pressure.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92071
PowerPC has its own custom scheduler heuristic. Its overriding version of
tryCandidate calls the parent class's tryCandidate, but since that function
returns void, the call doesn't actually help. This patch duplicates code from
the base scheduler into the PPC machine scheduler class, which does what we want.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D94464
Exploits the instruction xxsplti32dx.
It can be used to materialize any 64-bit scalar/vector splat by using two instances, one for the upper 32 bits and the other for the lower 32 bits. It should not be used for cases that can be materialized with the instruction xxspltidp.
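The sketch below shows how a 64-bit splat splits across the two instances. It assumes each instance writes one 32-bit immediate (upper or lower word) into both doublewords. The first helper, also hypothetical, widens a narrower splat element by replication before the split. Treating, say, a 32-bit element as a zero-extended 64-bit value would put the wrong bits in the upper word.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper: replicate a splat element of `bits` bits (a
// divisor of 64) until it fills a 64-bit doubleword.
uint64_t replicateTo64(uint64_t elt, unsigned bits) {
  assert(bits > 0 && bits <= 64 && 64 % bits == 0);
  if (bits == 64)
    return elt;
  uint64_t mask = (uint64_t(1) << bits) - 1;
  uint64_t out = 0;
  for (unsigned i = 0; i < 64; i += bits)
    out |= (elt & mask) << i;
  return out;
}

// Hypothetical helper: the two 32-bit immediates, one per instance.
// first = upper word of each doubleword, second = lower word.
std::pair<uint32_t, uint32_t> splitForXXSPLTI32DX(uint64_t splat64) {
  return {uint32_t(splat64 >> 32), uint32_t(splat64)};
}
```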
Differential Revision: https://reviews.llvm.org/D90173
When performing peephole optimization to simplify the code, after removing
an FRSP/XSRSP instruction we set any uses of that FRSP/XSRSP to the
source of the FRSP/XSRSP.
We find the machine instruction that uses the virtual register holding the
FRSP/XSRSP result by searching all following instructions, and we encountered
an issue where the first use of the virtual register is a debug MI, causing:
1. the virtual register in the debug MI to be removed unexpectedly.
2. the virtual register used in the non-debug MI not to be replaced with the
source of the FRSP/XSRSP, leaving it in an undef state.
This patch fixes the issue by searching only for the non-debug machine
instruction that uses the virtual register holding the FRSP/XSRSP result, when
that virtual register has exactly one non-debug use.
Differential Revision: https://reviews.llvm.org/D94711
Reviewed by: nemanjai
As of 8dacca943af8a53a23b1caf3142d10fb4a77b645, we sign extend the atomic loaded
operand for signed subword comparisons. However, the assumption that the other
operand is correctly sign extended doesn't always hold. This patch sign extends
the other operand if it needs to be sign extended.
This is a second fix for https://bugs.llvm.org/show_bug.cgi?id=30451
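Here is a register-level model of the problem. Both operands live in 32-bit registers, and a signed byte comparison is only correct if both are sign-extended. `sext8` and `signedLessI8` are hypothetical helpers that mimic an `extsb`-style extension; they do not reproduce the actual atomic expansion.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low 8 bits of a 32-bit register value (two's
// complement assumed, as on all LLVM targets).
int32_t sext8(uint32_t reg) { return int32_t(int8_t(uint8_t(reg))); }

// Signed i8 "less than" performed on full registers. The loaded value
// is always sign-extended; `extendOther` controls the other operand.
bool signedLessI8(uint32_t loadedReg, uint32_t otherReg, bool extendOther) {
  int32_t a = sext8(loadedReg);
  int32_t b = extendOther ? sext8(otherReg) : int32_t(otherReg);
  return a < b;
}
```

With the loaded byte 1 and the other operand holding the i8 value -1 zero-extended (0xFF), only the fully extended compare gives the correct answer (1 < -1 is false).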
Differential revision: https://reviews.llvm.org/D94058
A bug in the system assembler can assemble the xxspltd extended
mnemonic into the wrong instruction (extracting the wrong element).
Emit the full xxpermdi with all operands to work around the problem.
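The workaround relies on the standard extended-mnemonic mapping `xxspltd XT,XA,UIM` = `xxpermdi XT,XA,XA,UIM*3`. A DM value of 0b00 or 0b11 selects the same doubleword from both sources. The string-building helper below is hypothetical and only illustrates that mapping.

```cpp
#include <cassert>
#include <string>

// Spell out "xxspltd xt, xa, uim" as the equivalent full xxpermdi.
// DM = 0b00 splats doubleword 0, DM = 0b11 splats doubleword 1.
std::string expandXXSPLTD(const std::string &xt, const std::string &xa,
                          unsigned uim) {
  assert(uim <= 1 && "xxspltd selects doubleword 0 or 1");
  unsigned dm = uim * 3;
  return "xxpermdi " + xt + ", " + xa + ", " + xa + ", " + std::to_string(dm);
}
```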
Differential Revision: https://reviews.llvm.org/D94419
This patch promotes the result integer type of FP_TO_XINT during expansion,
which fixes a crash in the conversion from ppc_fp128 to i1.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D92473
As part of the effort to improve AIX support: regression test coverage
is missing in many places for the AIX subtarget. This patch adds the AIX
triple to the tests that don't need extra changes, and we can cover more
cases in follow-up commits.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D94159
This is a follow-up fix to commit 03c8d6a0c4bd0016bdfd1e5.
It seems we now end up with NeedInvert being set in the result
from LegalizeSetCCCondCode more often than in the past, so we
need to handle NeedInvert when expanding BR_CC.
I'm not sure how to deal with the "Tmp4.getNode()" case properly,
but the current assumption is that that code path isn't impacted
by the changes in 03c8d6a0c4bd0016bdfd1e5, so we can simply move
the old assert into the if-branch and only handle NeedInvert in the
else-branch.
I think that the test case added here, for PowerPC, might also have
failed before commit 03c8d6a0c4bd0016bdfd1e5, but we started
to hit the assert more often downstream after merging that
commit.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94762
If SETO/SETUO aren't legal, they'll be expanded and we'll end up
with 3 comparisons.
SETONE is equivalent to (SETOGT || SETOLT)
so if one of those operations is supported use that expansion. We
don't need both since we can commute the operands to make the other.
SETUEQ can be implemented with !(SETOGT || SETOLT) or (SETULE && SETUGE).
I've only implemented the first because it didn't look like most of the
affected targets had legal SETULE/SETUGE.
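The two expansions can be checked against the reference semantics in plain C++. Here `>` on doubles is an ordered comparison, so it is false when either operand is NaN. Only `>` is used; the `<` case comes from commuting the operands, as described above. The function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Reference semantics: ONE = ordered and not equal,
// UEQ = unordered or equal.
bool setONE(double a, double b) {
  return !std::isnan(a) && !std::isnan(b) && a != b;
}
bool setUEQ(double a, double b) {
  return std::isnan(a) || std::isnan(b) || a == b;
}

// Expansions using only an ordered greater-than; SETOLT(a, b) is
// written as SETOGT(b, a).
bool oneViaOGT(double a, double b) { return (a > b) || (b > a); }
bool ueqViaOGT(double a, double b) { return !((a > b) || (b > a)); }
```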
Reviewed By: frasercrmck, tlively, nemanjai
Differential Revision: https://reviews.llvm.org/D94450
PowerPC cores like e200z759n3 [1] using an efpu2 only support single precision
hardware floating point instructions. The single precision instructions efs*
and evfs* are identical to the SPE float instructions, while efd* and evfd*
instructions trigger a not-implemented exception.
This patch introduces a new command line option -mefpu2 which leads to
single-hardware / double-software code generation.
[1] Core reference:
https://www.nxp.com/files-static/32bit/doc/ref_manual/e200z759CRM.pdf
Differential revision: https://reviews.llvm.org/D92935
Memory operands store a base alignment that does not factor in
the effect of the offset on the alignment.
Previously the printing code only printed the base alignment if
it was different than the size. If there is an offset, the reader
would need to figure out the effective alignment themselves. This
has confused me before and someone else was recently confused on
IRC.
This patch prints the possibly offset-adjusted alignment if it is
different from the size, and prints the base alignment if it is
different from the alignment. The MIR parser has been updated to
read basealign in addition to align.
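The effective alignment is the largest power of two that divides both the base alignment and the offset. LLVM's `commonAlignment` in `llvm/Support/Alignment.h` computes the same quantity on `Align`. The standalone function below is a hypothetical sketch of that rule.

```cpp
#include <cassert>
#include <cstdint>

// Effective alignment of an access `offset` bytes past a base aligned to
// `baseAlign` (a power of two): the smaller of the base alignment and
// the lowest set bit of the offset.
uint64_t effectiveAlign(uint64_t baseAlign, uint64_t offset) {
  assert(baseAlign != 0 && (baseAlign & (baseAlign - 1)) == 0);
  if (offset == 0)
    return baseAlign;
  uint64_t offsetAlign = offset & (~offset + 1); // lowest set bit
  return offsetAlign < baseAlign ? offsetAlign : baseAlign;
}
```

For example, a base alignment of 8 with an offset of 4 gives an effective alignment of 4. Printing only the base alignment in that case was the source of the confusion.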
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D94344
This patch finishes addressing unused prefixes under CodeGen: it fixes the
2 remaining tests, undoes the lit.local.cfg changes under various
subdirectories, and moves the policy up to CodeGen.
Differential Revision: https://reviews.llvm.org/D94430