Add td definitions and asm/disasm tests for the quantum conversion
instructions in ISA 3.1 section 5.6.5
Reviewed By: stefanp
Differential Revision: https://reviews.llvm.org/D154394
The build failure should be fixed by de681d53. Follow-up refactor will
be done in future patches.
This reverts commit e7c5ced0b9f0551ea17e1d2b48be86f03a772c59.
On PowerPC VSX targets, fp-to-int will be transformed into xscv with
mfvsr. When the result is to be stored, mfvsr can be replaced by a
direct store.
This change simplifies the optimization by using existing fp-to-int
code, which helps CSE and handling strictfp cases.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D141473
A move towards using the generic ISD::ABDU nodes on more backends
Also support ISD::ABDS for v4i32 types using the existing signbit flip trick
PowerPC has a select(icmp_ugt(x,y),sub(x,y),sub(y,x)) -> abdu(x,y) combine that I intend to move to DAGCombiner in a future patch.
The ABS(SUB(X,Y)) -> PPCISD::VABSD(X,Y,1) v4i32 combine wasn't legal (https://alive2.llvm.org/ce/z/jc2hLU) - so I've removed it, having already added the legal sub nsw tests equivalent.
Differential Revision: https://reviews.llvm.org/D142313
Two of the float materialization patterns use the VSSRC regsiter class. This
register class is not available before Power 8. The patterns will stay the same
for Power 8 and up but must use the class F4RC for Power 7 and earlier.
This patch fixes those patterns.
Reviewed By: nemanjai, amyk, #powerpc
Differential Revision: https://reviews.llvm.org/D142120
This is a follow-on to https://reviews.llvm.org/D134073.
Currently, all of the "memri"-style complex operands, which contain
both a register and an immediate, are encoded into a single field in
the instruction definition. This requires complex encoders/decoders,
and instruction definitions that insert and extract the correct parts
of the bits.
Now, switch to naming and encoding/decoding the sub-operands
separately.
Thus, we can now disable useDeprecatedPositionallyEncodedOperands.
Reviewed By: barannikov88
Differential Revision: https://reviews.llvm.org/D137670
This is a follow-on to https://reviews.llvm.org/D134073.
After https://reviews.llvm.org/D137653 we can now switch the PPC
target away from positional operand matching.
This patch fixes all of the "easy" cases. While this changes a large
number of lines of tablegen source, it results in only a single
non-comment change in the code generated by tablegen: the (unused)
codegen-only "MTVRSAVEv" instruction was previously incorrectly
encoding operand 0, and now encodes (correctly) operand 1.
Changes which result in generated-code changes have been split off
into the next (smaller) patch, for ease of review.
Reviewed By: barannikov88
Differential Revision: https://reviews.llvm.org/D137661
Previous to this patch we only materialized 0.0 and all other floating point
values would be loaded from the TOC. This patch adds materialization for the
floating point values that can be represented as integers in [-16.0, 15.0].
For example we will now materialize 3.0 and -5.0 but not 4.7.
Reviewed By: nemanjai, lei, #powerpc
Differential Revision: https://reviews.llvm.org/D138844
vperm instruction requires the data to be in the Altivec registers, if one of
the vector operands is not used after this vperm instruction then it can be
substituted by xxperm which doubles the number of available registers.
Reviewed By: stefanp
Differential Revision: https://reviews.llvm.org/D133700
This patch fixes the following two bugs in `PPCInstrInfo::isSignOrZeroExtended` helper, which is used from sign-/zero-extension elimination in PPCMIPeephole pass.
- Registers defined by load with update (e.g. LBZU) were identified as already sign or zero-extended. But it is true only for the first def (loaded value) and not for the second def (i.e. updated pointer).
- Registers defined by ORIS/XORIS were identified as already sign-extended. But, it is not true for sign extension depending on the immediate (while it is ok for zero extension).
To handle the first case, the parameter for the helpers is changed from `MachineInstr` to a register number to distinguish first and second defs. Also, this patch moves the initialization of PPCMIPeepholePass to allow mir test case.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D40554
This patch updates two patterns involving `scalar_to_vector` and
`SCALAR_TO_VECTOR_PERMUTED` nodes to be safe for both 64-bit and 32-bit by
pulling the patterns out of the 64-bit specific guard. These patterns are
matched on POWER8 and above.
Differential Revision: https://reviews.llvm.org/D125389
This patch implements the following floating point negative absolute value
builtins that required for compatibility with the XL compiler:
```
double __fnabs(double);
float __fnabss(float);
```
These builtins will emit :
- fnabs on PWR6 and below, or if VSX is disabled.
- xsnabsdp on PWR7 and above, if VSX is enabled.
Differential Revision: https://reviews.llvm.org/D125506
Currently the regsiter operand definitions are found in three separate files.
This patch moves all of the definitions into PPCRegisterInfo.td.
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D123543
VSX introduced some permute instructions that are direct
replacements for Altivec ones except they can target all
the VSX registers. We have added code generation for most
of these but somehow missed the low/hi word merges (XXMRG[LH]W).
This caused some additional spills on some large
computationally intensive code.
This patch simply adds the missed patterns.
Currently in Clang, we have two types of builtins for fnmsub operation:
one for float/double vector, they'll be transformed into IR operations;
one for float/double scalar, they'll generate corresponding intrinsics.
But for the vector version of builtin, the 3 op chain may be recognized
as expensive by some passes (like early cse). We need some way to keep
the fnmsub form until code generation.
This patch introduces ppc.fnmsub.* intrinsic to unify four fnmsub
intrinsics.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D116015
A big-endian version of vpermxor, named vpermxor_be, is added to LLVM
and Clang. vpermxor_be can be called directly on both the little-endian
and the big-endian platforms.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D114540
Currently, the floating point instructions that depend on
rounding mode are correctly marked in the PPC back end with
an implicit use of the RM register. Similarly, instructions
that explicitly define the register are marked with an
implicit def of the same register. So for the most part,
RM-using code won't be moved across RM-setting instructions.
However, calls are not marked as RM-setting instructions so
code can be moved across calls. This is generally desired,
but so is the ability to turn off this behaviour with an
appropriate option - and -frounding-math really should be
that option.
This patch provides a set of call instructions (for direct
and indirect calls) that are marked with an implicit def of
the RM register. These will be used for calls that are marked
with the strictfp attribute.
Differential revision: https://reviews.llvm.org/D111433
Add comments to explain why XXPERMDIs and XXPERMDI have different input register
classes, vsfrc for XXPERMDIs and vsrc for XXPERMDI.
This addresses the comments in abandoned patch D113178, we keep using `f0` instead
of using `vs0` for XXPERMDIs on purpose.
This patch removes the uneccessary mf/mtvsr generated in conjunction
with xscvdpsxws/xscvdpuxws.
Differential revision: https://reviews.llvm.org/D109902
This patch is in a series of patches to provide
builtins for compatibility with the XL compiler.
This patch adds builtins related to floating point
operations
Reviewed By: #powerpc, nemanjai, amyk, NeHuang
Differential Revision: https://reviews.llvm.org/D103986
This patch implements store, load, move from and to registers related
builtins, as well as the builtin for stfiw. The patch aims to provide
feature parady with xlC on AIX.
Differential revision: https://reviews.llvm.org/D105946
Implement a subset of builtins required for compatiblilty with AIX XL compiler.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D105930
This patch includes the following updates to the load/store refactoring effort introduced in D93370:
- Update various VSX patterns that use to "force" an XForm, to instead just XForm.
This allows the ability for the patterns to compute the most optimal addressing
mode (and to produce a DForm instruction when possible)
- Update pattern and test case for the LXVD2X/STXVD2X intrinsics
- Update LIT test cases that use to use the XForm instruction to use the DForm instruction
Differential Revision: https://reviews.llvm.org/D95115
The code gen for f32 to i32 bitcast is not currently the most efficient;
this patch removes some unneccessary instructions gerneated.
Differential revision: https://reviews.llvm.org/D100782
These patterns are missing even though the underlying instruction
doesn't really care about the type. Added these patterns to resolve
https://bugs.llvm.org/show_bug.cgi?id=50084
There are two reasons this shouldn't be restricted to Power8 and up:
1. For XL compatibility
2. Because clang will expand comparison operators to these intrinsics*
*Without this patch, the following causes a selection error:
int test(vector signed long a, vector signed long b) {
return a < b;
}
This patch provides the handling for the intrinsics in the back
end and removes the Power8 guards from the predicate functions
(vec_{all|any}_{eq|ne|gt|ge|lt|le}).
When an integer is converted into floating point in subword vector extract,
it can be done in 2 instructions instead of the 3+ instructions it generates
right now. This patch removes the uncessary generation.
Differential: https://reviews.llvm.org/D100604
This is a simple fix on LE. On BE, vector shuffles are categorized into
different ops. We may need more work to eliminate these in
tablegen/pre-isel.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D101605
This patch introduces a new infrastructure that is used to select the load and
store instructions in the PPC backend.
The primary motivation is that the current implementation of selecting load/stores
is dependent on the ordering of patterns in TableGen. Given this limitation, we
are not able to easily and reliably generate the P10 prefixed load and stores
instructions (such as when the immediates that fit within 34-bits). This
refactoring is meant to provide us with more control over the patterns/different
forms to exploit, as well as eliminating dependency of pattern declaration in TableGen.
The idea of this refactoring is that it introduces a set of addressing modes that
correspond to different instruction formats of a particular load and store
instruction, along with a set of common flags that describes a load/store.
Whenever a load/store instruction is being selected, we analyze the instruction
and compute a set of flags for it. The computed flags are then used to
select the most optimal load/store addressing mode.
This patch is the first of a series of patches to be committed - it contains the
initial implementation of the refactored load/store selection infrastructure and
also updates P8/P9 patterns to adopt this infrastructure. The idea is that
incremental patches will add more implementation and support, and eventually
the old implementation will be removed.
Differential Revision: https://reviews.llvm.org/D93370
These are added for compatibility with XLC. They are similar to
vec_cts and vec_ctu except that the result is a doubleword vector
regardless of the parameter type.
We currently do not utilize instructions that convert single
precision vectors to doubleword integer vectors. These conversions
come up in code occasionally and this improvement allows us to
open code some functions that need to be added to altivec.h.
Extend shuffle canonicalization and conversion of shuffles fed by vectorized
scalars to big endian subtargets. For big endian subtargets, loads and direct
moves of scalars into vector registers put the data in the correct element for
SCALAR_TO_VECTOR if the data type is 8 bytes wide. However, if the data type is
narrower, the value still ends up in the wrong place - althouth a different
wrong place than on little endian targets.
This patch extends the combine that keeps values where they are if they feed a
shuffle to big endian targets.
Differential revision: https://reviews.llvm.org/D100478
The VSX tablegen file has some rather eggregious uses of
COPY_TO_REGCLASS even in situations where it needs to use
SUBREG_TO_REG. While this produces correct code, it often doesn't
allow the register coalescer to coalesce copies and the resulting
code ends up being suboptimal. This patch just changes over
patterns that should use SUBREG_TO_REG.
This pull request implements patterns to exploit the load rightmost vector
element instructions for loading element 0 on little endian PowerPC subtargets
into v8i16 and v16i8 vector registers for i16 and i8 data types.
Differential Revision: https://reviews.llvm.org/D94816#inline-921403
This patch generates the vinsw, vinsd, vinsblx, vinshlx, vinswlx, vinsdlx,
vinsbrx, vinshrx, vinswrx and vinsdrx instructions for vector insertion on P10.
Differential Revision: https://reviews.llvm.org/D94454
This intrinsic is supposed to have the permute control vector complemented on
little endian systems (as the ABI specifies and GCC implements). With the
current code gen, the result vector is byte-reversed.
Differential revision: https://reviews.llvm.org/D95004