2540 Commits

Author SHA1 Message Date
Nemanja Ivanovic
1fed131660 [PowerPC] Canonicalize shuffles to match more single-instruction masks on LE
We currently miss a number of opportunities to emit single-instruction
VMRG[LH][BHW] instructions for shuffles on little endian subtargets. Although
this in itself is not a huge performance opportunity since loading the permute
vector for a VPERM can always be pulled out of loops, producing such merge
instructions is useful to downstream optimizations.
Since VPERM is essentially opaque to all subsequent optimizations, we want to
avoid it as much as possible. Other permute instructions have semantics that can
be reasoned about much more easily in later optimizations.

This patch does the following:
- Canonicalize shuffles so that the first element comes from the first vector
  (since that's what most of the mask matching functions want)
- Switch the elements that come from splat vectors so that they match the
  corresponding elements from the other vector (to allow for merges)
- Adds debugging messages for when a shuffle is matched to a VPERM so that
  anyone interested in improving this further can get the info for their code

Differential revision: https://reviews.llvm.org/D77448
2020-06-18 21:54:22 -05:00
Amy Kwan
c45c161130 [PowerPC][Power10] Implement Parallel Bits Deposit/Extract Builtins in LLVM/Clang
This patch implements builtins for the following prototypes:

vector unsigned long long vec_pdep(vector unsigned long long, vector unsigned long long);
vector unsigned long long vec_pext(vector unsigned long long, vector unsigned long long __b);
unsigned long long __builtin_pdepd (unsigned long long, unsigned long long);
unsigned long long __builtin_pextd (unsigned long long, unsigned long long);

Revision Depends on D80758

Differential Revision: https://reviews.llvm.org/D80935
2020-06-18 16:23:56 -05:00
Kang Zhang
58e19d465a [PowerPC] Don't convert Loop to CTR Loop for fp128 BinaryOperator
Summary:
For PPC BinaryOperator of fp128 will become libcall, we shouldn't
convert loop to CTR loop if the loop contain libCall.

But currently, in the PPCTTIImpl::mightUseCTR() function, we only deal
with BinaryOperator for ppc_fp128, don't deal with the fp128.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D81353
2020-06-18 02:54:19 +00:00
Esme-Yi
ad6024e29f [PowerPC] Custom lower rotl v1i128 to vector_shuffle.
Summary: A bug is reported in bugzilla-45628, where the swap_with_shift case can’t be matched to a single HW instruction xxswapd as expected.
In fact the case matches the idiom of rotate. We have MatchRotate to handle an ‘or’ of two operands and generate a rot[lr] if the case matches the idiom of rotate. While PPC doesn’t support ROTL v1i128. We can custom lower ROTL v1i128 to the vector_shuffle. The vector_shuffle will be matched to a single HW instruction during the phase of instruction selection.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D81076
2020-06-18 01:32:23 +00:00
Kang Zhang
c2574dc9f7 [NFC]][PowerPC] Remove unused intrinsic for old CTR loop pass
Summary:

In the patch D62907 the PPC CTRLoops pass has been replaced by Generic
Hardware Loop pass, and it has imported some new intrinsic for Generic
Hardware Loop.

The old intrinsic used in PPC CTRLoops int_ppc_mtctr and
int_ppc_is_decremented_ctr_nonzero is been replaced by
int_set_loop_iterations and loop_decrement.

This patch is to remove above unused two instrinsic.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D81539
2020-06-17 07:06:46 +00:00
Chen Zheng
50155bcd46 [PowerPC] remove wrong added FIXME in testcases, NFC
remove the wrong added comments as xsmaddasp is introduced in PWR8
2020-06-16 22:10:48 -04:00
Nick Desaulniers
2d8e105db6 [PPCAsmPrinter] support 'L' output template for memory operands
Summary:
L is meant to support the second word used by 32b calling conventions for 64b arguments.

This is required for build 32b PowerPC Linux kernels after upstream
commit 334710b1496a ("powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'")

Thanks for the report from @nathanchance, and reference to GCC's
implementation from @segher.

Fixes: pr/46186
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1044

Reviewers: echristo, hfinkel, MaskRay

Reviewed By: MaskRay

Subscribers: MaskRay, wuzish, nemanjai, hiraditya, kbarton, steven.zhang, llvm-commits, segher, nathanchance, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81767
2020-06-15 14:31:44 -07:00
Stefan Pintilie
57c9dc0521 [PowerPC] Do not add the relocation addend to the instruction encoding
We should not be adding the relocation addend to the instruction encoding.
This patch removes that and sets those bits to zero.

Differential Revision: https://reviews.llvm.org/D81082
2020-06-15 09:51:34 -05:00
Chen Zheng
bd7096b977 [PowerPC] fma chain break to expose more ILP
This patch tries to reassociate two patterns related to FMA to expose
more ILP on PowerPC.

// Pattern 1:
//   A =  FADD X,  Y          (Leaf)
//   B =  FMA  A,  M21,  M22  (Prev)
//   C =  FMA  B,  M31,  M32  (Root)
// -->
//   A =  FMA  X,  M21,  M22
//   B =  FMA  Y,  M31,  M32
//   C =  FADD A,  B

// Pattern 2:
//   A =  FMA  X,  M11,  M12  (Leaf)
//   B =  FMA  A,  M21,  M22  (Prev)
//   C =  FMA  B,  M31,  M32  (Root)
// -->
//   A =  FMUL M11,  M12
//   B =  FMA  X,  M21,  M22
//   D =  FMA  A,  M31,  M32
//   C =  FADD B,  D

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D80175
2020-06-15 00:00:04 -04:00
Chen Zheng
163162a0a4 [PowerPC] fold a bug for rlwinm folding when with full mask.
Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D81006
2020-06-14 21:27:01 -04:00
Qiu Chaofan
13edcd696e [PowerPC] Support constrained rounding operations
This patch adds handling of constrained FP intrinsics about round,
truncate and extend for PowerPC target, with necessary tests.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D64193
2020-06-14 23:43:31 +08:00
Qiu Chaofan
7315d221a2 [PowerPC] Exploit vnmsubfp instruction
On PowerPC, we have vnmsubfp Altivec instruction for fnmsub operation on
v4f32 type. Default pattern for this instruction never works since we
don't have legal fneg for v4f32 when VSX disabled.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D80617
2020-06-14 23:19:17 +08:00
Qiu Chaofan
f8ef7c99a0 [DAGCombiner] Require ninf for division estimation
Current implementation of division estimation isn't correct for some
cases like 1.0/0.0 (result is nan, not expected inf).

And this change exposes a potential infinite loop: we use
isConstOrConstSplatFP in combineRepeatedFPDivisors to look up if the
divisor is some constant. But it doesn't work after legalized on some
platforms. This patch restricts the method to act before LegalDAG.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D80542
2020-06-14 22:58:22 +08:00
Masoud Ataei
2d038370bb DAGCombiner optimization for pow(x,0.75) and pow(x,0.25) on double and single precision even in case massv function is asked
Here, I am proposing to add an special case for massv powf4/powd2 function (SIMD counterpart of powf/pow function in MASSV library) in MASSV pass to get later optimizations like conversion from pow(x,0.75) and pow(x,0.25) for double and single precision to sequence of sqrt's in the DAGCombiner in vector float case. My reason for doing this is: the optimized pow(x,0.75) and pow(x,0.25) for double and single precision to sequence of sqrt's is faster than powf4/powd2 on P8 and P9.

In case MASSV functions is called, and if the exponent of pow is 0.75 or 0.25, we will get the sequence of sqrt's and if exponent is not 0.75 or 0.25 we will get the appropriate MASSV function.

Reviewed By: steven.zhang

Tags: #LLVM #PowerPC

Differential Revision: https://reviews.llvm.org/D80744
2020-06-12 10:02:16 -04:00
Esme-Yi
af9f8c24a0 Revert "[PowerPC][NFC] Testing ROTL of v1i128."
This reverts commit 174192af0106be9764aeda34988f27dc2c1bd4c4.
2020-06-12 02:23:52 +00:00
Esme-Yi
174192af01 [PowerPC][NFC] Testing ROTL of v1i128.
Summary: Add RUN lines for pwr8.
2020-06-11 07:45:31 +00:00
diggerlin
2a3f5021f5 Added test case for the patch D75866 "supporting the visibility attribute for aix assembly"
The test case has been reviewed in the patch D75866

Reviewers: Jason Liu ,hubert.reinterpretcast,James Henderson

Differential Revision: https://reviews.llvm.org/D75866
2020-06-09 16:29:28 -04:00
David Green
2fea3fe41c [MachineScheduler] Update available queue on the first mop of a new cycle
If a resource can be held for multiple cycles in the schedule model
then an instruction can be placed into the available queue, another
instruction can be scheduled, but the first will not be taken back out if
the two instructions hazard. To fix this make sure that we update the
available queue even on the first MOp of a cycle, pushing available
instructions back into the pending queue if they now conflict.

This happens with some downstream schedules we have around MVE
instruction scheduling where we use ResourceCycles=[2] to show the
instruction executing over two beats. Apparently the test changes here
are OK too.

Differential Revision: https://reviews.llvm.org/D76909
2020-06-09 19:13:53 +01:00
Kang Zhang
1b6602275d [MachineVerifier] Add TiedOpsRewritten flag to fix verify two-address error
Summary:
Currently, MachineVerifier will attempt to verify that tied operands
satisfy register constraints as soon as the function is no longer in
SSA form. However, PHIElimination will take the function out of SSA
form while TwoAddressInstructionPass will actually rewrite tied operands
to match the constraints. PHIElimination runs first in the pipeline.
Therefore, whenever the MachineVerifier is run after PHIElimination,
it will encounter verification errors on any tied operands.

This patch adds a function property called TiedOpsRewritten that will be
set by TwoAddressInstructionPass and will control when the verifier checks
tied operands.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D80538
2020-06-09 07:39:42 +00:00
Anil Mahmud
246d106094 [PowerPC] Fix pattern for DCBFL/DCBFLP instrinsics.
The previous implementation used "asm parser only" pseudo-instructions in their
output patterns. Those are not meant to emit code and will caused crashes when
built with -filetype=obj.

Differential Revision: https://reviews.llvm.org/D80151
2020-06-08 20:54:59 -05:00
Anil Mahmud
c9790d54f8 [PowerPC] Remove extra instruction left by emitRLDICWhenLoweringJumpTables
The function emitRLDICWhenLoweringJumpTables in PPCMIPeephole.cpp
was supposed to convert a pair of RLDICL and RLDICR to a single RLDIC,
but it was leaving out the RLDICL instruction. This PR fixes the bug.

Differential Revision: https://reviews.llvm.org/D78063
2020-06-08 20:43:56 -05:00
jasonliu
775ef44514 [XCOFF][AIX] report_fatal_error when an overflow section is needed
If there are more than 65534 relocation entries in a single section,
we should generate an overflow section.
Since we don't support overflow section for now, we should generate
an error.

Differential revision: https://reviews.llvm.org/D81104
2020-06-08 19:59:04 +00:00
Kang Zhang
47dff1881f [NFC][PowerPC] Modify the test case to test RM 2020-06-08 08:55:31 +00:00
Nemanja Ivanovic
a56d057dfe [PowerPC] Do not assume operand of ADDI is an immediate
After pseudo-expansion, we may end up with ADDI (add immediate)
instructions where the operand is not an immediate but a relocation.
For such instructions, attempts to get the immediate result in
assertion failures for obvious reasons.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=45432
2020-06-07 22:18:31 -05:00
QingShan Zhang
f8eabd6d01 [Power9] Add addi post-ra scheduling heuristic
The instruction addi is usually used to post increase the loop indvar, which looks like this:

label_X:
 load x, base(i)
 ...
 y = op x
 ...
 i = addi i, 1
 goto label_X

However, for PowerPC, if there are too many vsx instructions that between y = op x and  i = addi i, 1,
it will use all the hw resource that block the execution of i = addi, i, 1, which result in the stall
of the load instruction in next iteration. So, a heuristic is added to move the addi as early as possible
to have the load hide the latency of vsx instructions, if other heuristic didn't apply to avoid the starve.

Reviewed By: jji

Differential Revision: https://reviews.llvm.org/D80269
2020-06-08 01:31:07 +00:00
Kang Zhang
c3f5ceefb8 [NFC][PowerPC] Add a new case to test ctrloop for fp128 2020-06-07 16:35:32 +00:00
Stefan Pintilie
8dbf5a9501 [PowerPC] Remove extra nop after notoc call
Calls that are marked as @notoc do not require the extra nop after the call
for the TOC restore.

Differential Revision: https://reviews.llvm.org/D81081
2020-06-05 06:47:44 -05:00
Stefan Pintilie
05e21f8cea [PowerPC][NFC] Add more PC Relative tests
Modify the pcrel.ll test file to add more testing for PC Relative.
2020-06-05 05:55:03 -05:00
Esme-Yi
1b6cccba3e [PowerPC][NFC] Testing ROTL of v1i128.
Summary: A bug is reported in bugzilla-45628, where the swap_with_shift case can’t be matched to a single HW instruction xxswapd as expected. In fact the case matches the idiom of rotate, but PPC doesn’t support ROTL v1i128.
This is a NFC patch for testing ROTL with v1i128 at master.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D81073
2020-06-04 10:09:06 +00:00
Qiu Chaofan
7a001a2d92 [PowerPC] Require nsz flag for c-a*b to FNMSUB
On PowerPC, FNMSUB (both VSX and non-VSX version) means -(a*b-c). But
the backend used to generate these instructions regardless whether nsz
flag exists or not. If a*b-c==0, such transformation changes sign of
zero.

This patch introduces PPC specific FNMSUB ISD opcode, which may help
improving combined FMA code sequence.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D76585
2020-06-04 16:41:27 +08:00
Matt Arsenault
66251f7e1d RegAllocFast: Record internal state based on register units
Record internal state based on register units. This is often more
efficient as there are typically fewer register units to update
compared to iterating over all the aliases of a register.

Original patch by Matthias Braun, but I've been rebasing and fixing it
for almost 2 years and fixed a few bugs causing intermediate failures
to make this patch independent of the changes in
https://reviews.llvm.org/D52010.
2020-06-03 16:51:46 -04:00
jasonliu
f5415f7c5a [XCOFF][AIX] Use 'L..' instead of 'L' for PrivateGlobalPrefix
Without this change, names start with 'L' will get created as
temporary symbol in MCContext::createSymbol.

Some other potential prefix considered:
.L, does not work for AIX, as a function start with L will end
up with .L as prefix for its function entry point.

..L could work, but it does not play well with the convention
on AIX that anything start with '.' are considered as entry point.

L. could work, but not sure if it's safe enough, as it's possible
to have suffixes like .something append to a plain L, giving L.something
which is not necessarily a temporary.

That's why we picked L.. for now.

Differential Revision: https://reviews.llvm.org/D80831
2020-06-03 17:18:11 +00:00
Victor Huang
3abe7aca45 [CodeGen] Enable tail call position check for speculatable functions
In the function "Analysis.cpp:isInTailCallPosition", it only checks whether
a call is in a tail call position if the call has side effects, access memory
or it is not safe to speculative execute. Therefore, a speculatable function
will not go through tail call position check and improperly tail called when
it is not in a tail-call position. This patch enables tail call position check
for speculatable functions.

Differential Revision: https://reviews.llvm.org/D80661
2020-06-03 10:37:45 -05:00
David Tenty
d20fdcabf8 [AIX] Update data directives for AIX assembly
Summary:
The standard data emission directives (e.g. .short, .long) in the AIX assembler
have the unintended consequence of aligning their output to the natural byte
boundary. This cause problems because we aren't expecting behavior from the
Data*bitsDirectives, so the final alignment of data isn't correct in some cases
on AIX.

This patch updated the Data*bitsDirectives to use .vbyte pseudo-ops instead to emit the
data, since we will emit the .align directives as needed. We update the existing
testcases and add a test for emission of struct data.

Reviewers: hubert.reinterpretcast, Xiangling_L, jasonliu

Reviewed By: hubert.reinterpretcast, jasonliu

Subscribers: wuzish, nemanjai, hiraditya, kbarton, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80934
2020-06-03 10:55:59 -04:00
Amy Kwan
a3ada630d8 [DAGCombiner] Combine shifts into multiply-high
This patch implements a target independent DAG combine to produce multiply-high
instructions from shifts. This DAG combine will combine shifts for any type as
long as the MULH on the narrow type is legal.

For now, it is enabled on PowerPC as PowerPC is the only target that has an
implementation of the isMulhCheaperThanMulShift TLI hook introduced in
D78271.

Moreover, this DAG combine focuses on catching the pattern:
(shift (mul (ext <narrow_type>:$a to <wide_type>), (ext <narrow_type>:$b to <wide_type>)), <narrow_width>)
to produce mulhs when we have a sign-extend, and mulhu when we have
a zero-extend.

The patch performs the following checks:
- Operation is a right shift arithmetic (sra) or logical (srl)
- Input to the shift is a multiply
- Both operands to the shift are sext/zext nodes
- The extends into the multiply are both the same
- The narrow type is half the width of the wide type
- The shift amount is the width of the narrow type
- The respective mulh operation is legal

Differential Revision: https://reviews.llvm.org/D78272
2020-06-02 15:22:48 -05:00
Li Rong Yi
3101601b54 [PowerPC] Exploit vabsd on P9
Summary: Exploit vabsd* for for absolute difference of vectors on P9,
for example:
void foo (char *restrict p, char *restrict q, char *restrict t)
{
  for (int i = 0; i < 16; i++)
     t[i] = abs (p[i] - q[i]);
}
this case should be matched to the HW instruction vabsdub.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D80271
2020-06-01 02:30:27 +00:00
Kang Zhang
bfdf9ef009 Revert "[NFC][PowerPC] Add a new case to test phi-node-elimination pass"
This case wll be failed on some machines which enable expensive-checks.

This reverts commit af3abbf7bd2213003a133c361c212ac6efb1bd2b.
2020-05-31 09:24:21 +00:00
Kang Zhang
af3abbf7bd [NFC][PowerPC] Add a new case to test phi-node-elimination pass 2020-05-31 08:05:27 +00:00
Zequan Wu
80e107ccd0 Add NoMerge MIFlag to avoid MIR branch folding
Let the codegen recognized the nomerge attribute and disable branch folding when the attribute is given

Differential Revision: https://reviews.llvm.org/D79537
2020-05-29 12:31:06 -07:00
Xiangling Liao
26604d06b6 [AIX] Emit AvailableExternally Linkage on AIX
Since on AIX, our strategy is to not use -u to suppress any undefined
symbols, we need to emit .extern for the symbols with AvailableExternally
linkage.

Differential Revision: https://reviews.llvm.org/D80642
2020-05-29 13:12:59 -04:00
Kevin P. Neal
66d1899e2f Fix errors in use of strictfp attribute.
Errors spotted with use of: https://reviews.llvm.org/D68233
2020-05-29 12:25:13 -04:00
Lei Huang
2368bf52cd [PowerPC] Add support for -mcpu=pwr10 in both clang and llvm
Summary:
This patch simply adds support for the new CPU in anticipation of
Power10. There isn't really any functionality added so there are no
associated test cases at this time.

Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc

Reviewed By: stefanp, nemanjai, amyk, #powerpc

Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo

Tags: #clang, #powerpc, #llvm

Differential Revision: https://reviews.llvm.org/D80020
2020-05-27 13:14:25 -05:00
Lei Huang
559845f8fe Revert "[PowerPC] Add support for -mcpu=pwr10 in both clang and llvm"
This reverts commit 7eb666b1556b86503f2f386bf921186cdbb2d22a.
2020-05-27 09:40:21 -05:00
Kang Zhang
23a2f45214 [NFC][PowerPC] Modify the test case two-address-crash.mir 2020-05-27 02:35:45 +00:00
Lei Huang
7eb666b155 [PowerPC] Add support for -mcpu=pwr10 in both clang and llvm
Summary:
This patch simply adds support for the new CPU in anticipation of
Power10. There isn't really any functionality added so there are no
associated test cases at this time.

Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc

Reviewed By: stefanp, nemanjai, amyk, #powerpc

Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo

Tags: #clang, #powerpc, #llvm

Differential Revision: https://reviews.llvm.org/D80020
2020-05-26 13:48:22 -05:00
Nemanja Ivanovic
6e9223a2c6 [PowerPC][NFC] Update test to prevent DCE from causing failures
The test case provided in PR45709 can be simplified by DCE to an
empty function. To prevent this from happening if DCE is run prior
to ISEL in the back end, just add optnone to the function. The
behaviour it is testing for is in the SDAG legalization and is
not sensitive to optnone so the test case still achieves its desired
objective.
2020-05-26 13:37:48 -05:00
Sean Fertile
d6c8736287 [PowerPC][AIX] Spill CSRs to the ABI specified stack offsets.
Extend the CSR save/restore insertion code to support both 32-bit and
64-bit AIX.

Differential Revision: https://reviews.llvm.org/D79252
2020-05-26 12:24:29 -04:00
Nemanja Ivanovic
099a875f28 [PowerPC] Unaligned FP default should apply to scalars only
As reported in PR45186, we could be in a situation where we don't
want to handle unaligned memory accesses for FP scalars but still
have VSX (which allows unaligned access for vectors). Change the
default to only apply to scalars.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=45186
2020-05-26 10:19:06 -05:00
Kang Zhang
e6e89875b0 [NFC][PowerPC] Add a new case to test two-address verification 2020-05-26 06:14:08 +00:00
Nemanja Ivanovic
793cc518b9 [PowerPC] Prevent legalization loop from promoting SELECT_CC from v4i32 to v4i32
As reported in https://bugs.llvm.org/show_bug.cgi?id=45709 we can hit an
infinite loop in legalization since we set the legalization action for
ISD::SELECT_CC for all fixed length vector types to Promote. Without some
different legalization action for the type being promoted to, the legalizer
simply loops. Since we don't have patterns to match the node, the right
legalization action should be Expand.

Differential revision: https://reviews.llvm.org/D79854
2020-05-25 20:09:07 -05:00