10 Commits

Author SHA1 Message Date
Maryam Moghadas
7614ba0a5d [PowerPC] Fix vperm codegen
Commit rG934d5fa2b8672695c335deed0e19d0e777c98403 changed the vperm codegen
for cases that vperm is not replaced by xxperm, this patch is to revert that.

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D138736
2022-11-29 15:47:32 -06:00
Maryam Moghadas
934d5fa2b8 [PowerPC] Exploit xxperm, check for dead vectors and substitute vperm with xxperm
vperm instruction requires the data to be in the Altivec registers, if one of
the vector operands is not used after this vperm instruction then it can be
substituted by xxperm which doubles the number of available registers.

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D133700
2022-11-23 13:28:12 -06:00
Kai Nacke
5403c59c60 [PPC] Opaque pointer migration, part 2.
The LIT test cases were migrated with the script provided by
Nikita Popov. Due to the size of the change it is split into
several parts.

Reviewed By: nemanja, nikic

Differential Revision: https://reviews.llvm.org/D135474
2022-10-11 17:24:06 +00:00
Quinn Pham
335e8bf100 [PowerPC] emit VSX instructions instead of VMX instructions for vector loads and stores
This patch changes the PowerPC backend to generate VSX load/store instructions
for all vector loads/stores on Power8 and earlier  (LE) instead of VMX
load/store instructions. The reason for this change is because VMX instructions
require the vector to be 16-byte aligned. So, a vector load/store will fail with
VMX instructions if the vector is misaligned. Also, `gcc` generates VSX
instructions in this situation which allow for unaligned access but require a
swap instruction after loading/before storing. This is not an issue for BE
because we already emit VSX instructions since no swap is required. And this is
not an issue on Power9 and up since we have access to `lxv[x]`/`stxv[x]` which
allow for unaligned access and do not require swaps.

This patch also delays the VSX load/store for LE combines until after
LegalizeOps to prioritize other load/store combines.

Reviewed By: #powerpc, stefanp

Differential Revision: https://reviews.llvm.org/D127309
2022-06-15 12:06:04 -05:00
David Green
4388f4f776 [DAG] Don't convert undef to 0 when creating buildvector
When inserting undef into buildvectors created from shuffles of
buildvectors, we convert elements to the largest needed type. This had
the effect of converting undef into 0, which isn't needed as the
buildvector implicitly truncates and trunc(zext(undef)) == undef.

Differential Revision: https://reviews.llvm.org/D121002
2022-03-06 18:35:34 +00:00
Amy Kwan
ba627a32e1 [PowerPC] Update Refactored Load/Store Implementation, XForm VSX Patterns, and Tests
This patch includes the following updates to the load/store refactoring effort introduced in D93370:
 - Update various VSX patterns that use to "force" an XForm, to instead just XForm.
   This allows the ability for the patterns to compute the most optimal addressing
   mode (and to produce a DForm instruction when possible)
- Update pattern and test case for the LXVD2X/STXVD2X intrinsics
- Update LIT test cases that use to use the XForm instruction to use the DForm instruction

Differential Revision: https://reviews.llvm.org/D95115
2021-07-16 09:28:48 -05:00
Nemanja Ivanovic
03e7fefff8 [PowerPC] Canonicalize shuffles on big endian targets as well
Extend shuffle canonicalization and conversion of shuffles fed by vectorized
scalars to big endian subtargets. For big endian subtargets, loads and direct
moves of scalars into vector registers put the data in the correct element for
SCALAR_TO_VECTOR if the data type is 8 bytes wide. However, if the data type is
narrower, the value still ends up in the wrong place - althouth a different
wrong place than on little endian targets.

This patch extends the combine that keeps values where they are if they feed a
shuffle to big endian targets.

Differential revision: https://reviews.llvm.org/D100478
2021-04-20 07:29:47 -05:00
QingShan Zhang
8b6674e64f [NFC][Test] Update the test with update_llc_test_checks.py 2020-10-09 02:26:03 +00:00
QingShan Zhang
f24ec7bdd0 [Power9] Enable the Out-of-Order scheduling model for P9 hw
When switched to the MI scheduler for P9, the hardware is modeled as out of order.
However, inside the MI Scheduler algorithm, we still use the in-order scheduling model
as the MicroOpBufferSize isn't set. The MI scheduler take it as the hw cannot buffer
the op. So, only when all the available instructions issued, the pending instruction
could be scheduled. That is not true for our P9 hw in fact.

This patch is trying to enable the Out-of-Order scheduling model. The buffer size 44 is
picked from the P9 hw spec, and the perf test indicate that, its value won't hurt the cpu2017.

With this patch, there are 3 specs improved over 3% and 1 spec deg over 3%. The detail is as follows:

x264_r: +6.95%
cactuBSSN_r: +6.94%
lbm_r: +4.11%
xz_r: -3.85%

And the GEOMEAN for all the C/C++ spec in spec2017 is about 0.18% improved. 

Reviewer: Nemanjai
Differential Revision: https://reviews.llvm.org/D55810

llvm-svn: 350285
2019-01-03 05:04:18 +00:00
Nemanja Ivanovic
6a74bfba20 [PowerPC] Keep vector int to fp conversions in vector domain
At present a v2i16 -> v2f64 convert is implemented by extracts to scalar,
scalar converts, and merge back into a vector. Use vector converts instead,
with the int data permuted into the proper position and extended if necessary.

Patch by RolandF.

Differential revision: https://reviews.llvm.org/D53346

llvm-svn: 345361
2018-10-26 03:19:13 +00:00