2489 Commits

Author SHA1 Message Date
Amy Kwan
b631f86ac5 [TLI][PowerPC] Introduce TLI query to check if MULH is cheaper than MUL + SHIFT
This patch introduces a TargetLowering query, isMulhCheaperThanMulShift.

Currently in DAG Combine, it will transform mulhs/mulhu into a
wider multiply and a shift if the wide multiply is legal.

This TLI function is implemented on 64-bit PowerPC, as it is more desirable to
have multiply-high over multiply + shift for words and doublewords. Having
multiply-high can also aid in further transformations that can be done.

Differential Revision: https://reviews.llvm.org/D78271
2020-05-23 16:47:12 -05:00
Xiangling_Liao
2419dce5d1 [NFC][AIX] Remove spaces after the comma for '.csect' directive
To be consistent with other directives like '.comm', '.lcomm', we remove
the spaces after the comma for '.csect' on AIX.

Differential Revision: https://reviews.llvm.org/D80247
2020-05-22 11:10:32 -04:00
Nemanja Ivanovic
1a493b0fa5 [PowerPC] Add missing handling for half precision
The fix for PR39865 took care of some of the handling for half precision
but it missed a number of issues that still exist. This patch fixes the
remaining issues that cause crashes in the PPC back end.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=45776

Differential revision: https://reviews.llvm.org/D79283
2020-05-22 07:50:11 -05:00
Jon Roelofs
5a8db275f8 Revert "[llvm][test] Add COM: directives before colon-less non-CHECKs in comments. NFC"
This reverts commit 183d6af081899973f00fc24aeafcfc32de732f02.

Revert pending further consensus building: https://reviews.llvm.org/D79963#2050521
2020-05-22 05:36:15 -06:00
QingShan Zhang
d1076d729a [NFC][Test] Add test coverage for fsqrt on PowerPC 2020-05-22 10:59:27 +00:00
Jon Roelofs
183d6af081 [llvm][test] Add COM: directives before colon-less non-CHECKs in comments. NFC
Differential Revision: https://reviews.llvm.org/D79963
2020-05-21 09:29:27 -06:00
Chen Zheng
8086cdd1b0 [PowerPC] add more high latency opcodes for machine combiner pass
Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D80097
2020-05-21 02:39:20 -04:00
Kang Zhang
58684fbb6f [NFC][PowerPC] Add 2 new cases to test livevars pass 2020-05-20 05:32:09 +00:00
Qiu Chaofan
ad4f196e25 [NFC] [PowerPC] Refresh fma-mutate.ll using script
This is a clean-up after D78989. The old comments are out of date.
2020-05-19 13:39:58 +08:00
Chen Zheng
a6be4d17e3 [PowerPC-QPX] adjust operands order of qpx fma instructions.
convert
  %3 = QVFMADD %2, %0, %1, implicit $rm
to
  %3 = QVFMADD %2, %1, %0, implicit $rm

Reviewed By: hfinkel, steven.zhang

Differential Revision: https://reviews.llvm.org/D78986
2020-05-18 22:59:51 -04:00
Chen Zheng
4a69eda6f3 [PowerPC][MachineCombiner] add testcase for reassociating FMA - NFC 2020-05-18 21:18:01 -04:00
Chen Zheng
455ccde137 [PowerPC] add more high latency opcodes for machinecombiner - NFC 2020-05-17 21:02:06 -04:00
Li Rong Yi
80173566f4 [PowerPC] Add an intrinsic for Popcntb
Summary: This patch adds the intrinsic llvm.ppc.popcntb for the HW
instruction POPCNTB

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D79703
2020-05-15 15:19:12 +08:00
Sean Fertile
ce4ebc14a8 [PowerPC] Remove support for SplitCSR.
SplitCSR was only suppored for functions with CXX_FAST_TLS calling
convention. Clang only emits that calling convention for Darwin which is
no longer supported by the PowerPC backend. Another IR producer could
use the calling convention, but considering the calling convention is
meant to be an optimization and the codegen for SplitCSR can be
attrocious on Power (see the modifed lit test) it is best to remove it
and codegen CXX_FAST_TLS same as the C calling convention.

Differential Revision: https://reviews.llvm.org/D79018
2020-05-14 10:32:17 -04:00
Qiu Chaofan
2866c6cad4 [NFC] [PowerPC] Narrow fast-math flags in tests
A lot of tests under PowerPC are using fast flag, while fast is just
alias of 7 fast-math flags. This change makes test points clearer.

mc-instrlat.ll and sms-iterator.ll keeps unchanged since they are not
testing fast-math behavior. (one for machine combiner crash, one for
machine pipeliner bug)

Reviewed By: steven.zhang, spatel

Differential Revision: https://reviews.llvm.org/D78989
2020-05-13 17:22:45 +08:00
Qiu Chaofan
8ffe8891cd [PowerPC] Exploit VSX neg, abs and nabs for f32
xsnegdp, xsabsdp and xsnabsdp can be used to operate on f32 operand.

This patch adds the missing patterns since we prefer VSX instructions
when available.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D75344
2020-05-13 14:28:50 +08:00
Qiu Chaofan
e9753822b5 [PowerPC] Respect SDNodeFlags in lowering SELECT_CC
Legalizer should respect both command-line options or SDNode-level
fast-math flags.

Also, this patch propagates other flags during custom simplifying.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D79074
2020-05-13 14:05:47 +08:00
Kang Zhang
782a4dd1a4 [PowerPC] Use add instead of addReg in ppc-early-ret pass
Summary:
The ppc-early-ret pass use the addReg() to add operand to the new
instruction, it can't reserve the flag of old operand. This has caused
machine verfications failed.
This patch use add() to instead of addReg().

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D77997
2020-05-13 05:59:52 +00:00
Justin Hibbits
0138cc0125 PowerPC: Treat llvm.fma.f* intrinsic as using CTR with SPE
Summary:
The SPE doesn't have a 'fma' instruction, so the intrinsic becomes a
libcall.  It really should become an expansion to two instructions, but
for some reason the compiler doesn't think that's as optimal as a
branch.  Since this lowering is done after CTR is allocated for loops,
tell the optimizer that CTR may be used in this case.  This prevents a
"Invalid PPC CTR loop!" assertion in the case that a fma() function call
is used in a C/C++ file, and clang converts it into an intrinsic.

Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D78668
2020-05-12 17:19:43 -05:00
Kamau Bridgeman
cd83333fc8 [PowerPC] Fold redundant load immediates of zero and delete if possible
This patch folds redundant load immediates into a zero for instructions
which recognise this as the value zero and not the register. If the load
immediate is no longer in use it is then deleted.

This is already done in earlier passes but the ppc-mi-peephole allows for
a more general implementation.

Differential Revision: https://reviews.llvm.org/D69168
2020-05-12 13:15:06 -05:00
Qiu Chaofan
e8d2ff22f0 [PowerPC] Add fma/fsqrt/fmax strict-fp intrinsics
This patch adds strict-fp intrinsics support for fma, fsqrt, fmaxnum and
fminnum on PowerPC.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D72749
2020-05-12 13:44:09 +08:00
jasonliu
a1b04aaea2 Move PowerPC specific test under PowerPC directive to fix build break
Fix build break in x86 platform which introduced by
https://reviews.llvm.org/D79127
2020-05-11 20:05:05 +00:00
jasonliu
51e6fc44d0 [XCOFF][AIX] Emit correct alignment for csect
Summary:
This patch tries to emit the correct alignment result for both
object file generation path and assembly path.

Reviewed by: hubert.reinterpretcast, DiggerLin, daltenty

Differential Revision: https://reviews.llvm.org/D79127
2020-05-11 19:43:10 +00:00
Kang Zhang
dcc5ff3bc2 [PowerPC] Use PredictableSelectIsExpensive to enable select to branch in CGP
Summary:
This patch will set the variable PredictableSelectIsExpensive to do the
select to if based on BranchProbability in CodeGenPrepare.

When the BranchProbability more than MinPercentageForPredictableBranch,
PPC will convert SELECT to branch.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D71883
2020-05-11 15:02:09 +00:00
Hubert Tong
601d5bd516 [Target][XCOFF] Correctly halt when mixing AIX or XCOFF with ppc64le
The code to prevent using `PPCXCOFFMCAsmInfo` with little-endian targets
used an incorrect check. Also, there does not appear to be sufficient
earlier checking to prevent failing this check, so the check here is
upgraded to be a `report_fatal_error`.

`PPCAIXAsmPrinter` was also missing a check against use with
little-endian targets. This patch adds such a check in.
2020-05-08 16:51:34 -04:00
Hubert Tong
b116ded57d [AIX] Avoid structor alias; die before bad alias codegen
Summary:
`AsmPrinter::emitGlobalIndirectSymbol` is dependent on
`MCStreamer::emitAssignment` to produce `.set` directives for alias
symbols; however, the `.set` pseudo-op on AIX is documented as not
usable with external relocatable terms or expressions, which limits its
applicability in generating alias symbols.

Disable generating aliases on AIX until a different implementation
strategy is available.

Reviewers: cebowleratibm, jasonliu, sfertile, daltenty, DiggerLin

Reviewed By: jasonliu

Differential Revision: https://reviews.llvm.org/D79044
2020-05-08 16:51:34 -04:00
Jinsong Ji
80b78a47e5 [MachinePipeliner] Add ORE for MachinePipeliner
This patch adds ORE for MachinePipeliner, so that people can anaylyze
their code using opt-viewer or other tools, then optimize the code to
catch more piplining opportunities.

Reviewed By: bcahoon

Differential Revision: https://reviews.llvm.org/D79368
2020-05-05 16:04:53 +00:00
Sean Fertile
47f9e71ac7 [PowerPC][AIX][NFC] Remove spills and reloads from arg passing test. 2020-05-04 14:26:33 -04:00
diggerlin
a2c8cd1812 [AIX] emit .extern and .weak directive linkage
SUMMARY:

emit .extern and .weak directive linkage

Reviewers: hubert.reinterpretcast, Jason Liu
Subscribers: wuzish, nemanjai, hiraditya

Differential Revision: https://reviews.llvm.org/D76932
2020-04-30 09:54:10 -04:00
Chen Zheng
957c5dd78b [PowerPC-QPX] add more test for QPX madd/msub operands order - NFC 2020-04-29 01:17:14 -04:00
QingShan Zhang
b5f89744cc [DAGCombine] Checking the cost directly to improve the code readability
Call getNegatedExpression(Cost) and check the Cost to make the code more clear.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D78347
2020-04-29 01:49:39 +00:00
Sean Fertile
2a3cf5e583 [PowerPC][AIX] Pass ByVal formal args that span registers and stack.
Implement passing of ByVal formal arguments when the argument is passed
partly in the argument registers, with the remainder of the argument
passed on the stack.

Differential Revision: https://reviews.llvm.org/D78515
2020-04-28 14:57:14 -04:00
Chen Zheng
22fdbd01a3 [Powerpc] add triple for new added qpx test case - NFC 2020-04-28 05:32:10 -04:00
Ng Zhi An
500b4ad5f4 [PowerPC] Fix downcast from nullptr for target streamer
getTargetStreamer() might return null (e.g. when running inlined-strings.ll test),
downcasting to a reference will be wrong. This is detectable with -fsanitize=null.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D78686
2020-04-28 09:20:10 +00:00
Chen Zheng
949018cc27 [PowerPC] add test case for reorder operands of qpx fma instr - nfc. 2020-04-28 04:43:32 -04:00
Chen Zheng
45d92806ea [PowerPC] use inst-level fast-math-flags to drive MachineCombiner
Currently, on PowerPC target, it uses function scope UnsafeFPMath
option to drive Machine Combiner pass.

This is not accurate in two ways:
1: the scope is not accurate. Machine Combiner pass only requires
   instruction-level flags instead of the function scope.
2: the float point flag is not accurate. Machine Combiner pass
   only requires float point flags reassoc and nsz.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D78183
2020-04-28 03:31:12 -04:00
Kang Zhang
4bb0a1cb70 [PowerPC] Fix the liveins for ppc-expand-isel pass
Summary:
In the ppc-expand-isel pass, we use stepForward() to update the
liveins, this function is not recommended, because it needs the
accurate kill info.

This patch uses the function computeAndAddLiveIns() to update the
liveins, it's the recommended method and can fix the liveins bug for
ppc-expand-isel pass..

Reviewed By: efriedma, lkail

Differential Revision: https://reviews.llvm.org/D78657
2020-04-28 03:22:48 +00:00
LemonBoy
f30416fdde [AsmPrinter] Fix emission of non-standard integer constants for BE targets
The code assumed that zero-extending the integer constant to the
designated alloc size would be fine even for BE targets, but that's not
the case as that pulls in zeros from the MSB side while we actually
expect the padding zeros to go after the LSB.

I've changed the codepath handling the constant integers to use the
store size for both small(er than u64) and big constants and then add
zero padding right after that.

Differential Revision: https://reviews.llvm.org/D78011
2020-04-27 14:57:29 -07:00
Stefan Pintilie
1354a03e74 [PowerPC][Future] Implement PC Relative Tail Calls
Tail Calls were initially disabled for PC Relative code because it was not safe
to make certain assumptions about the tail calls (namely that all compiled
functions no longer used the TOC pointer in R2). However, once all of the
TOC pointer references have been removed it is safe to tail call everything
that was tail called prior to the PC relative additions as well as a number of
new cases.
For example, it is now possible to tail call indirect functions as there is no
need to save and restore the TOC pointer for indirect functions if the caller
is marked as may clobber R2 (st_other=1). For the same reason it is now also
possible to tail call functions that are external.

Differential Revision: https://reviews.llvm.org/D77788
2020-04-27 12:55:08 -05:00
Kang Zhang
f85e35d2a3 [NFC][PowerPC] Add the killed flag for the case expand-isel-liveness.mir 2020-04-26 04:40:20 +00:00
Kang Zhang
fe2a522533 [NFC][PowerPC] Add a new test case in expand-isel-liveness.mir 2020-04-26 03:15:54 +00:00
Fangrui Song
10bc12588d [XRay] Change Sled.Function to PC-relative for sled version 2 and make llvm-xray support sled version 2 addresses
Follow-up of D78082 and D78590.

Otherwise, because xray_instr_map is now read-only, the absolute
relocation used for Sled.Function will cause a text relocation.
2020-04-24 14:41:56 -07:00
Fangrui Song
25e22613df [XRay] Change ARM/AArch64/powerpc64le to use version 2 sled (PC-relative address)
Follow-up of D78082 (x86-64).

This change avoids dynamic relocations in `xray_instr_map` for ARM/AArch64/powerpc64le.

MIPS64 cannot use 64-bit PC-relative addresses because R_MIPS_PC64 is not defined.
Because MIPS32 shares the same code, for simplicity, we don't use PC-relative addresses for MIPS32 as well.

Tested on AArch64 Linux and ppc64le Linux.

Reviewed By: ianlevesque

Differential Revision: https://reviews.llvm.org/D78590
2020-04-24 08:35:43 -07:00
Kang Zhang
302e11cd97 [NFC][PowerPC] Fix the liveins for 3 mir test cases 2020-04-24 08:03:02 +00:00
Victor Huang
a60ca4b4e9 [PowerPC][Future] Initial support for PCRel addressing to get block address
Add initial support for PCRelative addressing to get block address
instead of using TOC.

Differential Revision: https://reviews.llvm.org/D76294
2020-04-22 15:01:29 -05:00
Victor Huang
02141a17ae [PowerPC][Future] Remove redundant r2 save and restore for indirect call
Currently an indirect call produces the following sequence on PCRelative mode:

extern void function( );
extern void (*ptrfunc) ( );

void g() {
    ptrfunc=function;
}

void f() {
    (*ptrfunc) ( );
}

Producing

paddi 3, 0, .LC0@PCREL, 1
ld 3, 0(3)
std 2, 24(1)
ld 12, 0(3)
mtctr 12
bctrl
ld 2, 24(1)

Though the caller does not use or preserve r2, it is still saved and restored
across a function call. This patch is added to remove these redundant save and
restores for indirect calls.

Differential Revision: https://reviews.llvm.org/D77749
2020-04-22 12:05:51 -05:00
Victor Huang
43abef06f4 [PowerPC][Future] Initial support for PCRel addressing for jump tables.
Add initial support for PC Relative addressing to get jump table base
address instead of using TOC.

Differential Revision: https://reviews.llvm.org/D75931
2020-04-22 10:45:01 -05:00
Qiu Chaofan
c12722cde8 [PowerPC] Exploit RLDIMI for OR with large immediates
This patch exploits rldimi instruction for patterns like
`or %a, 0b000011110000`, which saves number of instructions when the
operand has only one use, compared with `li-ori-sldi-or`.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D77850
2020-04-22 14:16:52 +08:00
Stefan Pintilie
a92ee77d85 [PowerPC][Future] Add offsets to PC Relative relocations.
This is an optimization that applies to global addresses and
allows for the following transformation:
Convert this:

paddi r3, 0, symbol@PCREL, 1
ld r4, 8(r3)

To this:

pld r4, symbol@PCREL+8(0), 1

An instruction is saved and the linker can do the addition when
the symbol is resolved.

Differential Revision: https://reviews.llvm.org/D76160
2020-04-21 11:08:19 -05:00
Kang Zhang
e477915bfe [PowerPC] Add a new test case expand-isel-liveness.mir 2020-04-21 16:00:34 +00:00