3277 Commits

Author SHA1 Message Date
Stefan Pintilie
78406ac898 [PowerPC][P10] Add Vector pair calling convention
Add the calling convention for the vector pair registers.
These registers overlap with the vector registers.

Part of an original patch by: Lei Huang

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D117225
2022-03-15 14:08:42 -05:00
Qiu Chaofan
300e1293de [PowerPC] Disable perfect shuffle by default
We are going to remove the old 'perfect shuffle' optimization since it
brings performance penalty in hot loop around vectors. For example, in
following loop sharing the same mask:

  %v.1 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27>
  %v.2 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27>

The generated instructions will be `vmrglw-vmrghw-vmrglw-vmrghw` instead
of `vperm-vperm`. In some large loop cases, this causes 20%+ performance
penalty.

The original attempt to resolve this is to pre-record masks of every
shufflevector operation in DAG, but that is somewhat complex and brings
unnecessary computation (to scan all nodes) in optimization. Here we
disable it by default. There're indeed some cases becoming worse after
this, which will be fixed in a more careful way in future patches.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D121082
2022-03-15 15:52:24 +08:00
Nemanja Ivanovic
766ca2c59e [PowerPC] Add missed VSX shuffles instead of Altivec ones
VSX introduced some permute instructions that are direct
replacements for Altivec ones except they can target all
the VSX registers. We have added code generation for most
of these but somehow missed the low/hi word merges (XXMRG[LH]W).
This caused some additional spills on some large
computationally intensive code.

This patch simply adds the missed patterns.
2022-03-14 10:11:54 -05:00
Xiang1 Zhang
c31014322c TLS loads opimization (hoist)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D120000
2022-03-10 09:29:06 +08:00
Masoud Ataei
30f30e1c12 [PowerPC] Fix the none tail call in scalar MASS conversion
This patch is proposing a fix for patch https://reviews.llvm.org/D101759
on none tail call math function conversion to MASS call.

Differential: https://reviews.llvm.org/D121016

reviewer: @nemanjai
2022-03-08 08:59:17 -08:00
Qiu Chaofan
b2497e5435 [PowerPC] Add generic fnmsub intrinsic
Currently in Clang, we have two types of builtins for fnmsub operation:
one for float/double vector, they'll be transformed into IR operations;
one for float/double scalar, they'll generate corresponding intrinsics.

But for the vector version of builtin, the 3 op chain may be recognized
as expensive by some passes (like early cse). We need some way to keep
the fnmsub form until code generation.

This patch introduces ppc.fnmsub.* intrinsic to unify four fnmsub
intrinsics.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D116015
2022-03-07 13:00:06 +08:00
David Green
4388f4f776 [DAG] Don't convert undef to 0 when creating buildvector
When inserting undef into buildvectors created from shuffles of
buildvectors, we convert elements to the largest needed type. This had
the effect of converting undef into 0, which isn't needed as the
buildvector implicitly truncates and trunc(zext(undef)) == undef.

Differential Revision: https://reviews.llvm.org/D121002
2022-03-06 18:35:34 +00:00
Kai Luo
1cfcbf197c [PowerPC][atomics] Precommit test cases for i128 cmpxchg. NFC. 2022-03-03 10:47:52 +08:00
Xiang1 Zhang
65588a0776 Revert "TLS loads opimization (hoist)"
Revert for more reviews

This reverts commit 30e612ebdfb0f243eb63d93487790a53c26ae873.
2022-03-02 14:10:11 +08:00
Xiang1 Zhang
30e612ebdf TLS loads opimization (hoist)
Reviewed By: Wang Pheobe, Topper Craig

Differential Revision: https://reviews.llvm.org/D120000
2022-03-02 10:37:24 +08:00
Jay Foad
719bac55df [MIRParser] Diagnose too large align values in MachineMemOperands
When parsing MachineMemOperands, MIRParser treated the "align" keyword
the same as "basealign". Really "basealign" should specify the
alignment of the MachinePointerInfo base value, and "align" should
specify the alignment of that base value plus the offset.

This worked OK when the specified alignment was no larger than the
alignment of the offset, but in cases like this it just caused
confusion:

    STW killed %18, 4, %stack.1.ap2.i.i :: (store (s32) into %stack.1.ap2.i.i + 4, align 8)

MIRPrinter would never have printed this, with an offset of 4 but an
align of 8, so it must have been written by hand. MIRParser would
interpret "align 8" as "basealign 8", but I think it is better to give
an error and force the user to write "basealign 8" if that is what they
really meant.

Differential Revision: https://reviews.llvm.org/D120400

Change-Id: I7eeeefc55c2df3554ba8d89f8809a2f45ada32d8
2022-02-24 15:32:08 +00:00
Stefan Pintilie
b3e63ee2e5 [NFC][PowerPC] Fix the check-cpu.ll test case.
This test doesn't work because the CHECK-NOT line is actually checking
something that only exists on stderr and not stdout.
Changed the test so that we now check both stderr and stdout.
Changed the test so that we check pwr9, pwr10, and future. The cpu names of
power9 or power10 are not supported in the llc backend.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D120349
2022-02-23 14:09:34 -06:00
Craig Topper
440c4b705a [SelectionDAG][RISCV][ARM][PowerPC][X86][WebAssembly] Change default abs expansion to use sra (X, size(X)-1); sub (xor (X, Y), Y).
Previous we used sra (X, size(X)-1); xor (add (X, Y), Y).

By placing sub at the end, we allow RISCV to combine sign_extend_inreg
with it to form subw.

Some X86 tests for Z - abs(X) seem to have improved as well.

Other targets look to be a wash.

I had to modify ARM's abs matching code to match from sub instead of
xor. Maybe instead ISD::ABS should be made legal. I'll try that in
parallel to this patch.

This is an alternative to D119099 which was focused on RISCV only.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D119171
2022-02-20 21:11:23 -08:00
esmeyi
7b67d2e398 Reland [XCOFF][llvm-objdump] change the priority of symbols with the same address by symbol types.
Fix the Buildbot failure #19373.

Differential Revision: https://reviews.llvm.org/D117642
2022-02-20 21:51:10 -05:00
esmeyi
0bf3fec4cd Revert "[XCOFF][llvm-objdump] change the priority of symbols with"
This reverts commit 2ad662172cbbd1ca53489bf8bddb0183d7692708.

Buildbot failure #19373
2022-02-18 04:12:32 -05:00
esmeyi
2ad662172c [XCOFF][llvm-objdump] change the priority of symbols with
the same address by symbol types.

Summary: In XCOFF, each section comes with a default symbol
         with the same name as the section. It doesn't bind
         to code locations and it may cause incorrect display
         of symbol names under `llvm-objdump -d`.
         This patch changes the priority of symbols with the
         same address by symbol type.

Reviewed By: jhenderson, shchenz

Differential Revision: https://reviews.llvm.org/D117642
2022-02-18 00:29:10 -05:00
Amy Kwan
5dc0a1657b [PowerPC] Fix __builtin_pdepd and __builtin_pextd to be 64-bit and P10 only.
The `__builtin_pdepd` and `__builtin_pextd` are P10 builtins that are meant to
be used under 64-bit only. For instance, when the builtins are compiled under
32-bit mode:
```
$ cat t.c
unsigned long long foo(unsigned long long a, unsigned long long b) {
  return __builtin_pextd(a,b);
}

$ clang -c t.c -mcpu=pwr10 -m32
ExpandIntegerResult #0: t31: i64 = llvm.ppc.pextd TargetConstant:i32<6928>, t28, t29

fatal error: error in backend: Do not know how to expand the result of this operator!
```
This patch adds sema checking for these builtins to compile under 64-bit
mode only and on P10. The builtins will emit a diagnostic when they are compiled on
non-P10 compilations and on 32-bit mode.

Differential Revision: https://reviews.llvm.org/D118753
2022-02-15 12:30:50 -06:00
Amy Kwan
ac5a5a9cfe [PowerPC] Add default handling for single element vectors, and split/promote vNi1 vectors.
This patch updates the handling of vectors in getPreferredVectorAction():

For single-element and scalable vectors, fall back to default vector legalization
handling. For vNi1 vectors, add handling to either split or promote them in
order to prevent the production of wide v256i1/v512i1 types.

The following assertion is fixed by this patch, as we ended up producing the
wide vector types (that are used for MMA) in the backend prior to this fix.

```
Assertion failed: VT.getSizeInBits() == Operand.getValueSizeInBits() &&
"Cannot BITCAST between types of different sizes!"
```

Differential Revision: https://reviews.llvm.org/D119521
2022-02-15 08:44:08 -06:00
Roman Lebedev
9ff087598e
[NFC][CodeGen][PPC] Autogenerate checklines in a test to simplify further updates 2022-02-11 01:21:45 +03:00
Ting Wang
097a95f2df [PowerPC] Add custom lowering for SELECT_CC fp128 using xsmaxcqp
Power ISA 3.1 adds xsmaxcqp/xsmincqp for quad-precision type-c max/min selection,
and this opens the opportunity to improve instruction selection on: llvm.maxnum.f128,
llvm.minnum.f128, and select_cc ordered gt/lt and (don't care) gt/lt.

Reviewed By: nemanjai, shchenz, amyk

Differential Revision: https://reviews.llvm.org/D117006
2022-02-09 21:48:28 -05:00
Wael Yehia
addd073325 [AIX][PowerPC][PGO] Generate .ref for some PGO sections
For PGO on AIX, when we switch to the linux-style PGO variable access
(via _start and _stop labels), we need the compiler to generate a .ref
assembly for each of the three csects:

 -   __llvm_prf_data[RW]
 -   __llvm_prf_names[RO]
 -   __llvm_prf_vnds[RW]

We insert the .ref inside the __llvm_prf_cnts[RW] csect so that if it's
live then the 3 csects are live.

For example, for a testcase with at least one function definition, when
compiled with -fprofile-generate we should generate:

        .csect __llvm_prf_cnts[RW],3
        .ref __llvm_prf_data[RW]   <<============ needs to be inserted
        .ref __llvm_prf_names[RO]  <<===========

the __llvm_prf_vnds is not always present, so we reference it only when
it's present.

Reviewed By: sfertile, daltenty

Differential Revision: https://reviews.llvm.org/D116607
2022-02-05 06:34:20 -05:00
Masoud Ataei
8ce13bc93b [PowerPC] Option controling scalar MASS convertion
differential: https://reviews.llvm.org/D119035

reviewer: bmahjour
2022-02-04 13:24:22 -08:00
Masoud Ataei
256d253332 [PowerPC] Scalar IBM MASS library conversion pass
This patch introduces the conversions from math function calls
to MASS library calls. To resolves calls generated with these conversions, one
need to link libxlopt.a library. This patch is tested on PowerPC Linux and AIX.

Differential: https://reviews.llvm.org/D101759

Reviewer: bmahjour
2022-02-02 07:54:19 -08:00
Amy Kwan
0d6e64755a [PowerPC] Update P10 vector insert patterns to use refactored load/stores, and update handling of v4f32 vector insert.
This patch updates the P10 patterns with a load feeding into an insertelt to
utilize the refactored load and store infrastructure, as well as updating any
tests that exhibit any codegen changes.

Furthermore, custom legalization is added for v4f32 on Power9 and above to not
only assist with adjusting the refactored load/stores for P10 vector insert,
but also it enables the utilization of direct moves.

Differential Revision: https://reviews.llvm.org/D115691
2022-02-01 08:48:37 -06:00
Amy Kwan
9cc5b064f1 [PowerPC] Update handling of splat loads for v4i32/v4f32/v2i64 to require non-extending loads.
This patch updates how splat loads handled and is an extension of D106555.

Particularly, for v2i64/v4f32/v4i32 types, they are updated to handle only
non-extending loads. For v8i16/v16i8 types, they are updated to handle extending
loads only if the memory VT is the same vector element VT type.

A test case has been added to illustrate a scenario where a PPCISD::LD_SPLAT
node should not be produced. In this test, it depicts the following f64
extending load used in a v2f64 build vector, but the extending load is actually
used in more places other than the build vector (such as in t12 and t16).
```
Type-legalized selection DAG: %bb.0 'test:entry'
SelectionDAG has 20 nodes:
  t0: ch = EntryToken
  t4: i64,ch = CopyFromReg t0, Register:i64 %1
  t6: i64,ch = CopyFromReg t0, Register:i64 %2
  t11: f64,ch = load<(load (s64) from %ir.b, !tbaa !7)> t0, t4, undef:i64
        t16: f64 = fadd t31, t37
      t34: ch = store<(store (s64) into %ir.c, !tbaa !7)> t31:1, t16, t6, undef:i64
    t36: ch = TokenFactor t34, t37:1
    t27: v2f64 = BUILD_VECTOR t37, t37
  t22: ch,glue = CopyToReg t36, Register:v2f64 $v2, t27
      t12: f64 = fadd t11, t37
    t28: ch = store<(store (s64) into %ir.b, !tbaa !7)> t11:1, t12, t4, undef:i64
  t31: f64,ch = load<(load (s64) from %ir.c, !tbaa !7)> t28, t6, undef:i64
    t2: i64,ch = CopyFromReg t0, Register:i64 %0
  t37: f64,ch = load<(load (s32) from %ir.a, !tbaa !3), anyext from f32> t0, t2, undef:i64
  t23: ch = PPCISD::RET_FLAG t22, Register:v2f64 $v2, t22:1
```

Differential Revision: https://reviews.llvm.org/D117803
2022-01-28 08:23:01 -06:00
Yousuf Ali
dad2b6e797 [PowerPC][AIX] Support toc-data attribute for read-only globals.
The patch handles the addition of constant global variables to the table
of contents.

Differential Revision: https://reviews.llvm.org/D116181
2022-01-27 10:47:22 -05:00
Nemanja Ivanovic
0c56bc92e4 [PowerPC] Fix eq/ne comparison of v2i64 pre-Power8
In commit 1674d9b6b2da, I fixed the bug where we didn't consider
both words of the result of the comparison. However, the logic
needs to be different for eq and ne.
Namely for eq, we need both words of the doubleword to equal so it
is an AND. OTOH for ne, we need either word to be unequal so it
is an OR.
2022-01-26 08:59:08 -06:00
Qiu Chaofan
ad0345aed1 [PowerPC] Emit gnu_attribute according to float-abi metadata
According to GNU as documentation, PowerPC supports some .gnu_attribute
tags to represent the vector and float ABI type in the object file.
Some linkers like GNU ld respects the attribute and will prevent objects
with conflicting ABIs being linked.

This patch emits gnu_attribute value in assembly printer according to
the float-abi metadata. More attributes for soft-fp, hard single/double
and even vector ABI need to be supported in the future.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D117193
2022-01-26 13:28:50 +08:00
Sean Fertile
a2505bd063 [PowerPC][AIX] Override markFunctionEnd()
During fast-isel calling 'markFunctionEnd' in the base class will call
tidyLandingPads. This can cause an issue where we have determined that
we need ehinfo and emitted a traceback table with the bits set to
indicate that we will be emitting the ehinfo, but the tidying deletes
all landing pads. In this case we end up emitting a reference to
__ehinfo.N symbol, but not emitting a definition to said symbol and the
resulting file fails to assemble.

Differential Revision: https://reviews.llvm.org/D117040
2022-01-25 10:08:53 -05:00
Bjorn Pettersson
109cc5adcc [DAGCombine] Fold SRA of a load into a narrower sign-extending load
An sra is basically sign-extending a narrower value. Fold away the
shift by doing a sextload of a narrower value, when it is legal to
reduce the load width accordingly.

Differential Revision: https://reviews.llvm.org/D116930
2022-01-25 12:14:48 +01:00
Quinn Pham
6a028296fe [PowerPC] Emit warning when SP is clobbered by asm
This patch emits a warning when the stack pointer register (`R1`) is found in
the clobber list of an inline asm statement. Clobbering the stack pointer is
not supported.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D112073
2022-01-24 15:12:23 -06:00
Sander de Smalen
4f8fdf7827 [ISEL] Canonicalise constant splats to RHS.
SelectionDAG::getNode() canonicalises constants to the RHS if the
operation is commutative, but it doesn't do so for constant splat
vectors. Doing this early helps making certain folds on vector types,
simplifying the code required for target DAGCombines that are enabled
before Type legalization.

Somewhat to my surprise, DAGCombine doesn't seem to traverse the
DAG in a post-order DFS, so at the time of doing some custom fold where
the input is a MUL, DAGCombiner::visitMUL hasn't yet reordered the
constant splat to the RHS.

This patch leads to a few improvements, but also a few  minor regressions,
which I traced down to D46492. When I tried reverting this change to see
if the changes were still necessary, I ran into some segfaults. Not sure
if there is some latent bug there.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117794
2022-01-24 09:38:36 +00:00
Qiu Chaofan
8dedf9b58b [PowerPC] Change CTR clobber estimation for 128-bit floating types
Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D117459
2022-01-22 23:20:14 +08:00
Fangrui Song
e6cdef187e [XRay][test] Clean up llc RUN lines 2022-01-21 17:00:03 -08:00
Mircea Trofin
e67430cca4 [MLGO] ML Regalloc Eviction Advisor
The bulk of the implementation is common between 'release' mode (==AOT-ed
model) and 'development' mode (for training), the main difference is
that in development mode, we may also log features (for training logs),
inject scoring information (currently after the Virtual Register
Rewriter) and then produce the log file.

This patch also introduces the score injection pass, 'Register
Allocation Pass Scoring', which is trivially just logging the score in
development mode.

Differential Revision: https://reviews.llvm.org/D117147
2022-01-19 11:00:32 -08:00
Stefan Pintilie
1324bb29f7 [PowerPC] Fix issue with strict float to int conversion.
When doing the float to int conversion the strict conversion also needs to
retun a chain. This patch fixes that.

Reviewed By: nemanjai, #powerpc, qiucf

Differential Revision: https://reviews.llvm.org/D117464
2022-01-19 10:57:22 -06:00
Sean Fertile
10d3bf9518 [PowerPC][AIX] Fallback to DAG-ISEL if global has toc-data attribute.
FAST-ISEL should fall back to DAG-ISEL when a global variable has the
toc-data attribute. A number of the checks were duplicated in the lit
test becuase of
1) Slightly different output between -O0 and -O2 due to FAST-ISEL vs
   DAG-ISEL codegen.
2) In preperation of a peephole optimization that will run when
   optimizations are enabled.

Differential Revision: https://reviews.llvm.org/D115373
2022-01-17 16:21:38 -05:00
Sanjay Patel
fe17ce0fa6 [PowerPC] add RUN lines for both endians to test; NFC
The load narrowing transform works for both targets,
so we might as well test both with simple examples
like this.
2022-01-13 10:49:23 -05:00
Nick Desaulniers
79ebc3b0dd [llvm][test] rewrite callbr to use i rather than X constraint NFC
In D115311, we're looking to modify clang to emit i constraints rather
than X constraints for callbr's indirect destinations. Prior to doing
so, update all of the existing tests in llvm/ to match.

Reviewed By: void, jyknight

Differential Revision: https://reviews.llvm.org/D115410
2022-01-11 11:31:08 -08:00
Nick Desaulniers
9c4b49db19 [ShrinkWrap] check for PPC's non-callee-saved LR
As pointed out in https://reviews.llvm.org/D115688#inline-1108193, we
don't want to sink the save point past an INLINEASM_BR, otherwise
prologepilog may incorrectly sink a prolog past the MBB containing an
INLINEASM_BR and into the wrong MBB.

ShrinkWrap is getting this wrong because LR is not in the list of callee
saved registers. Specifically, ShrinkWrap::useOrDefCSROrFI calls
RegisterClassInfo::getLastCalleeSavedAlias which reads
CalleeSavedAliases which was populated by
RegisterClassInfo::runOnMachineFunction by iterating the list of
MCPhysReg returned from MachineRegisterInfo::getCalleeSavedRegs.

Because PPC's LR is non-allocatable, it's NOT considered callee saved.
Add an interface to TargetRegisterInfo for such a case and use it in
Shrinkwrap to ensure we don't sink a prolog past an INLINEASM or
INLINEASM_BR that clobbers LR.

Reviewed By: jyknight, efriedma, nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D116424
2022-01-11 10:01:34 -08:00
Nadav Rotem
e2cc091a7d Fix a missed opportunity to merge stores.
This commit fixes a missed opportunity in merging consecutive stores.
The code that searches for stores skipped the case of stores that
directly connect to the root. The comment above the implementation lists
this case but the code did not handle it. I found this pattern when
looking into the shared_ptr destructor. GCC generates the right
sequence. Here is a small repo:

    int foo(int* buff) {
        buff[0] = 0;
        int x = buff[1];
        buff[1] = 0;
        return x;
    }

Differential Revision: https://reviews.llvm.org/D116895
2022-01-10 13:49:02 -08:00
Chen Zheng
2c46ca96e2 [PowerPC] fast isel can lower intrinsics call on AIX.
Reviewed By: qiucf

Differential Revision: https://reviews.llvm.org/D114778
2022-01-10 02:30:05 +00:00
Qiu Chaofan
c9e8a516df [NFC] Pre-commit case for PowerPC perfect shuffle 2022-01-07 18:07:26 +08:00
Nikita Popov
f430c1eb64 [Tests] Add elementtype attribute to indirect inline asm operands (NFC)
This updates LLVM tests for D116531 by adding elementtype attributes
to operands that correspond to indirect asm constraints.
2022-01-06 14:23:51 +01:00
Stefan Pintilie
04496201e0 [PowerPC] Add support for ROP protection for 32 bit.
Add support for Return Oriented Programming (ROP) protection for 32 bit.
This patch also adds a testing for AIX on both 64 and 32 bit.

Reviewed By: amyk

Differential Revision: https://reviews.llvm.org/D111362
2022-01-05 15:15:53 -06:00
Philip Reames
b061d86c69 [SCEV] Compute exit count from overflow check expressed w/ x.with.overflow intrinsics
This ports the logic we generate in instcombine for a single use x.with.overflow check for use in SCEV's analysis. The result is that we can prove trip counts for many checks, and (through existing logic) often discharge them.

Motivation comes from compiling a simple example with -ftrapv.

Differential Revision: https://reviews.llvm.org/D116499
2022-01-04 09:44:23 -08:00
Nemanja Ivanovic
de4e0195ae [PowerPC] Add missed test case updates
In commit 1674d9b6b2da914619c7c197336bb74f7988cf38,
I missed adding the updates to existing test cases.
This should bring the bots back to green.
2021-12-21 14:55:19 -06:00
Nemanja Ivanovic
1674d9b6b2 [PowerPC] Fix vector equality comparison for v2i64 pre-Power8
The current code makes the assumption that equality
comparison can be performed with a word comparison
instruction. While this is true if the entire 64-bit
results are used, it does not generally work. It is
possible that the low order words and high order
words produce different results and a user of only
one will get the wrong result.

This patch adds an and of the result words so that
each word has the result of the comparison of the
entire doubleword that contains it.

Differential revision: https://reviews.llvm.org/D115678
2021-12-21 14:28:41 -06:00
Mircea Trofin
09103807e7 [NFC][regalloc] Introduce the RegAllocEvictionAdvisorAnalysis
This patch introduces the eviction analysis and the eviction advisor,
the default implementation, and the scaffolding for introducing the
other implementations of the advisor.

Differential Revision: https://reviews.llvm.org/D115707
2021-12-16 17:56:46 -08:00
Florian Hahn
59a85a7a52
[PPC] Update test after f5f421e0eefa492. 2021-12-16 11:28:54 +00:00