llvm-project

Author	SHA1	Message	Date
Nemanja Ivanovic	1fed131660	[PowerPC] Canonicalize shuffles to match more single-instruction masks on LE We currently miss a number of opportunities to emit single-instruction VMRG[LH][BHW] instructions for shuffles on little endian subtargets. Although this in itself is not a huge performance opportunity since loading the permute vector for a VPERM can always be pulled out of loops, producing such merge instructions is useful to downstream optimizations. Since VPERM is essentially opaque to all subsequent optimizations, we want to avoid it as much as possible. Other permute instructions have semantics that can be reasoned about much more easily in later optimizations. This patch does the following: - Canonicalize shuffles so that the first element comes from the first vector (since that's what most of the mask matching functions want) - Switch the elements that come from splat vectors so that they match the corresponding elements from the other vector (to allow for merges) - Adds debugging messages for when a shuffle is matched to a VPERM so that anyone interested in improving this further can get the info for their code Differential revision: https://reviews.llvm.org/D77448	2020-06-18 21:54:22 -05:00
Amy Kwan	c45c161130	[PowerPC][Power10] Implement Parallel Bits Deposit/Extract Builtins in LLVM/Clang This patch implements builtins for the following prototypes: vector unsigned long long vec_pdep(vector unsigned long long, vector unsigned long long); vector unsigned long long vec_pext(vector unsigned long long, vector unsigned long long __b); unsigned long long __builtin_pdepd (unsigned long long, unsigned long long); unsigned long long __builtin_pextd (unsigned long long, unsigned long long); Revision Depends on D80758 Differential Revision: https://reviews.llvm.org/D80935	2020-06-18 16:23:56 -05:00
Kang Zhang	58e19d465a	[PowerPC] Don't convert Loop to CTR Loop for fp128 BinaryOperator Summary: On PPC, a BinaryOperator on fp128 will become a libcall, and we shouldn't convert a loop to a CTR loop if the loop contains a libcall. Currently, the PPCTTIImpl::mightUseCTR() function only handles BinaryOperator for ppc_fp128, not for fp128. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D81353	2020-06-18 02:54:19 +00:00
Esme-Yi	ad6024e29f	[PowerPC] Custom lower rotl v1i128 to vector_shuffle. Summary: A bug is reported in bugzilla-45628, where the swap_with_shift case can’t be matched to a single HW instruction xxswapd as expected. In fact the case matches the idiom of rotate. We have MatchRotate to handle an ‘or’ of two operands and generate a rot[lr] if the case matches the idiom of rotate. However, PPC doesn’t support ROTL v1i128, so we custom lower ROTL v1i128 to a vector_shuffle. The vector_shuffle will be matched to a single HW instruction during instruction selection. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D81076	2020-06-18 01:32:23 +00:00
Kang Zhang	c2574dc9f7	[NFC][PowerPC] Remove unused intrinsic for old CTR loop pass Summary: In patch D62907 the PPC CTRLoops pass was replaced by the Generic Hardware Loop pass, which introduced new intrinsics for Generic Hardware Loop. The old intrinsics used in PPC CTRLoops, int_ppc_mtctr and int_ppc_is_decremented_ctr_nonzero, have been replaced by int_set_loop_iterations and loop_decrement. This patch removes these two unused intrinsics. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D81539	2020-06-17 07:06:46 +00:00
Chen Zheng	50155bcd46	[PowerPC] remove wrong added FIXME in testcases, NFC remove the wrong added comments as xsmaddasp is introduced in PWR8	2020-06-16 22:10:48 -04:00
Nick Desaulniers	2d8e105db6	[PPCAsmPrinter] support 'L' output template for memory operands Summary: L is meant to support the second word used by 32b calling conventions for 64b arguments. This is required for building 32b PowerPC Linux kernels after upstream commit 334710b1496a ("powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'") Thanks for the report from @nathanchance, and reference to GCC's implementation from @segher. Fixes: pr/46186 Fixes: https://github.com/ClangBuiltLinux/linux/issues/1044 Reviewers: echristo, hfinkel, MaskRay Reviewed By: MaskRay Subscribers: MaskRay, wuzish, nemanjai, hiraditya, kbarton, steven.zhang, llvm-commits, segher, nathanchance, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D81767	2020-06-15 14:31:44 -07:00
Stefan Pintilie	57c9dc0521	[PowerPC] Do not add the relocation addend to the instruction encoding We should not be adding the relocation addend to the instruction encoding. This patch removes that and sets those bits to zero. Differential Revision: https://reviews.llvm.org/D81082	2020-06-15 09:51:34 -05:00
Chen Zheng	bd7096b977	[PowerPC] fma chain break to expose more ILP This patch tries to reassociate two patterns related to FMA to expose more ILP on PowerPC. // Pattern 1: // A = FADD X, Y (Leaf) // B = FMA A, M21, M22 (Prev) // C = FMA B, M31, M32 (Root) // --> // A = FMA X, M21, M22 // B = FMA Y, M31, M32 // C = FADD A, B // Pattern 2: // A = FMA X, M11, M12 (Leaf) // B = FMA A, M21, M22 (Prev) // C = FMA B, M31, M32 (Root) // --> // A = FMUL M11, M12 // B = FMA X, M21, M22 // D = FMA A, M31, M32 // C = FADD B, D Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D80175	2020-06-15 00:00:04 -04:00
Chen Zheng	163162a0a4	[PowerPC] fold a bug for rlwinm folding when with full mask. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D81006	2020-06-14 21:27:01 -04:00
Qiu Chaofan	13edcd696e	[PowerPC] Support constrained rounding operations This patch adds handling of constrained FP intrinsics about round, truncate and extend for PowerPC target, with necessary tests. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D64193	2020-06-14 23:43:31 +08:00
Qiu Chaofan	7315d221a2	[PowerPC] Exploit vnmsubfp instruction On PowerPC, we have vnmsubfp Altivec instruction for fnmsub operation on v4f32 type. Default pattern for this instruction never works since we don't have legal fneg for v4f32 when VSX disabled. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D80617	2020-06-14 23:19:17 +08:00
Qiu Chaofan	f8ef7c99a0	[DAGCombiner] Require ninf for division estimation Current implementation of division estimation isn't correct for some cases like 1.0/0.0 (result is nan, not expected inf). And this change exposes a potential infinite loop: we use isConstOrConstSplatFP in combineRepeatedFPDivisors to look up if the divisor is some constant. But it doesn't work after legalized on some platforms. This patch restricts the method to act before LegalDAG. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D80542	2020-06-14 22:58:22 +08:00
Masoud Ataei	2d038370bb	DAGCombiner optimization for pow(x,0.75) and pow(x,0.25) on double and single precision even in case massv function is asked Here, I am proposing to add a special case for the massv powf4/powd2 functions (SIMD counterparts of the powf/pow functions in the MASSV library) in the MASSV pass, so that later optimizations, such as the DAGCombiner conversion of pow(x,0.75) and pow(x,0.25) for double and single precision to a sequence of sqrt's, still apply in the vector float case. My reason for doing this is that the optimized sequence of sqrt's for pow(x,0.75) and pow(x,0.25) is faster than powf4/powd2 on P8 and P9. When MASSV functions are requested and the exponent of pow is 0.75 or 0.25, we will get the sequence of sqrt's; if the exponent is not 0.75 or 0.25, we will get the appropriate MASSV function. Reviewed By: steven.zhang Tags: #LLVM #PowerPC Differential Revision: https://reviews.llvm.org/D80744	2020-06-12 10:02:16 -04:00
Esme-Yi	af9f8c24a0	Revert "[PowerPC][NFC] Testing ROTL of v1i128." This reverts commit 174192af0106be9764aeda34988f27dc2c1bd4c4.	2020-06-12 02:23:52 +00:00
Esme-Yi	174192af01	[PowerPC][NFC] Testing ROTL of v1i128. Summary: Add RUN lines for pwr8.	2020-06-11 07:45:31 +00:00
diggerlin	2a3f5021f5	Added test case for the patch D75866 "supporting the visibility attribute for aix assembly" The test case has been reviewed in the patch D75866 Reviewers: Jason Liu ,hubert.reinterpretcast,James Henderson Differential Revision: https://reviews.llvm.org/D75866	2020-06-09 16:29:28 -04:00
David Green	2fea3fe41c	[MachineScheduler] Update available queue on the first mop of a new cycle If a resource can be held for multiple cycles in the schedule model then an instruction can be placed into the available queue, another instruction can be scheduled, but the first will not be taken back out if the two instructions hazard. To fix this make sure that we update the available queue even on the first MOp of a cycle, pushing available instructions back into the pending queue if they now conflict. This happens with some downstream schedules we have around MVE instruction scheduling where we use ResourceCycles=[2] to show the instruction executing over two beats. Apparently the test changes here are OK too. Differential Revision: https://reviews.llvm.org/D76909	2020-06-09 19:13:53 +01:00
Kang Zhang	1b6602275d	[MachineVerifier] Add TiedOpsRewritten flag to fix verify two-address error Summary: Currently, MachineVerifier will attempt to verify that tied operands satisfy register constraints as soon as the function is no longer in SSA form. However, PHIElimination will take the function out of SSA form while TwoAddressInstructionPass will actually rewrite tied operands to match the constraints. PHIElimination runs first in the pipeline. Therefore, whenever the MachineVerifier is run after PHIElimination, it will encounter verification errors on any tied operands. This patch adds a function property called TiedOpsRewritten that will be set by TwoAddressInstructionPass and will control when the verifier checks tied operands. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D80538	2020-06-09 07:39:42 +00:00
Anil Mahmud	246d106094	[PowerPC] Fix pattern for DCBFL/DCBFLP intrinsics. The previous implementation used "asm parser only" pseudo-instructions in their output patterns. Those are not meant to emit code and will cause crashes when built with -filetype=obj. Differential Revision: https://reviews.llvm.org/D80151	2020-06-08 20:54:59 -05:00
Anil Mahmud	c9790d54f8	[PowerPC] Remove extra instruction left by emitRLDICWhenLoweringJumpTables The function emitRLDICWhenLoweringJumpTables in PPCMIPeephole.cpp was supposed to convert a pair of RLDICL and RLDICR to a single RLDIC, but it was leaving out the RLDICL instruction. This PR fixes the bug. Differential Revision: https://reviews.llvm.org/D78063	2020-06-08 20:43:56 -05:00
jasonliu	775ef44514	[XCOFF][AIX] report_fatal_error when an overflow section is needed If there are more than 65534 relocation entries in a single section, we should generate an overflow section. Since we don't support overflow section for now, we should generate an error. Differential revision: https://reviews.llvm.org/D81104	2020-06-08 19:59:04 +00:00
Kang Zhang	47dff1881f	[NFC][PowerPC] Modify the test case to test RM	2020-06-08 08:55:31 +00:00
Nemanja Ivanovic	a56d057dfe	[PowerPC] Do not assume operand of ADDI is an immediate After pseudo-expansion, we may end up with ADDI (add immediate) instructions where the operand is not an immediate but a relocation. For such instructions, attempts to get the immediate result in assertion failures for obvious reasons. Fixes: https://bugs.llvm.org/show_bug.cgi?id=45432	2020-06-07 22:18:31 -05:00
QingShan Zhang	f8eabd6d01	[Power9] Add addi post-ra scheduling heuristic The addi instruction is usually used to post-increment the loop indvar, which looks like this: label_X: load x, base(i) ... y = op x ... i = addi i, 1 goto label_X However, on PowerPC, if there are too many vsx instructions between y = op x and i = addi i, 1, they will use all the hw resources and block the execution of i = addi i, 1, which results in a stall of the load instruction in the next iteration. So, a heuristic is added to move the addi as early as possible so that the load hides the latency of the vsx instructions, if no other heuristic applies, to avoid the starvation. Reviewed By: jji Differential Revision: https://reviews.llvm.org/D80269	2020-06-08 01:31:07 +00:00
Kang Zhang	c3f5ceefb8	[NFC][PowerPC] Add a new case to test ctrloop for fp128	2020-06-07 16:35:32 +00:00
Stefan Pintilie	8dbf5a9501	[PowerPC] Remove extra nop after notoc call Calls that are marked as @notoc do not require the extra nop after the call for the TOC restore. Differential Revision: https://reviews.llvm.org/D81081	2020-06-05 06:47:44 -05:00
Stefan Pintilie	05e21f8cea	[PowerPC][NFC] Add more PC Relative tests Modify the pcrel.ll test file to add more testing for PC Relative.	2020-06-05 05:55:03 -05:00
Esme-Yi	1b6cccba3e	[PowerPC][NFC] Testing ROTL of v1i128. Summary: A bug is reported in bugzilla-45628, where the swap_with_shift case can’t be matched to a single HW instruction xxswapd as expected. In fact the case matches the idiom of rotate, but PPC doesn’t support ROTL v1i128. This is a NFC patch for testing ROTL with v1i128 at master. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D81073	2020-06-04 10:09:06 +00:00
Qiu Chaofan	7a001a2d92	[PowerPC] Require nsz flag for c-a*b to FNMSUB On PowerPC, FNMSUB (both VSX and non-VSX version) means -(a*b-c). But the backend used to generate these instructions regardless of whether the nsz flag exists or not. If a*b-c==0, such a transformation changes the sign of zero. This patch introduces a PPC specific FNMSUB ISD opcode, which may help improve the combined FMA code sequence. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D76585	2020-06-04 16:41:27 +08:00
Matt Arsenault	66251f7e1d	RegAllocFast: Record internal state based on register units Record internal state based on register units. This is often more efficient as there are typically fewer register units to update compared to iterating over all the aliases of a register. Original patch by Matthias Braun, but I've been rebasing and fixing it for almost 2 years and fixed a few bugs causing intermediate failures to make this patch independent of the changes in https://reviews.llvm.org/D52010.	2020-06-03 16:51:46 -04:00
jasonliu	f5415f7c5a	[XCOFF][AIX] Use 'L..' instead of 'L' for PrivateGlobalPrefix Without this change, names starting with 'L' will get created as temporary symbols in MCContext::createSymbol. Some other potential prefixes considered: .L does not work for AIX, as a function starting with L will end up with .L as the prefix for its function entry point. ..L could work, but it does not play well with the convention on AIX that anything starting with '.' is considered an entry point. L. could work, but we are not sure if it's safe enough, as it's possible to have suffixes like .something appended to a plain L, giving L.something, which is not necessarily a temporary. That's why we picked L.. for now. Differential Revision: https://reviews.llvm.org/D80831	2020-06-03 17:18:11 +00:00
Victor Huang	3abe7aca45	[CodeGen] Enable tail call position check for speculatable functions The function "Analysis.cpp:isInTailCallPosition" only checks whether a call is in a tail call position if the call has side effects, accesses memory, or is not safe to speculatively execute. Therefore, a speculatable function will not go through the tail call position check and may be improperly tail called when it is not in a tail-call position. This patch enables the tail call position check for speculatable functions. Differential Revision: https://reviews.llvm.org/D80661	2020-06-03 10:37:45 -05:00
David Tenty	d20fdcabf8	[AIX] Update data directives for AIX assembly Summary: The standard data emission directives (e.g. .short, .long) in the AIX assembler have the unintended consequence of aligning their output to the natural byte boundary. This causes problems because we aren't expecting that behavior from the DatabitsDirectives, so the final alignment of data isn't correct in some cases on AIX. This patch updates the DatabitsDirectives to use .vbyte pseudo-ops instead to emit the data, since we will emit the .align directives as needed. We update the existing testcases and add a test for emission of struct data. Reviewers: hubert.reinterpretcast, Xiangling_L, jasonliu Reviewed By: hubert.reinterpretcast, jasonliu Subscribers: wuzish, nemanjai, hiraditya, kbarton, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80934	2020-06-03 10:55:59 -04:00
Amy Kwan	a3ada630d8	[DAGCombiner] Combine shifts into multiply-high This patch implements a target independent DAG combine to produce multiply-high instructions from shifts. This DAG combine will combine shifts for any type as long as the MULH on the narrow type is legal. For now, it is enabled on PowerPC as PowerPC is the only target that has an implementation of the isMulhCheaperThanMulShift TLI hook introduced in D78271. Moreover, this DAG combine focuses on catching the pattern: (shift (mul (ext <narrow_type>:$a to <wide_type>), (ext <narrow_type>:$b to <wide_type>)), <narrow_width>) to produce mulhs when we have a sign-extend, and mulhu when we have a zero-extend. The patch performs the following checks: - Operation is a right shift arithmetic (sra) or logical (srl) - Input to the shift is a multiply - Both operands to the multiply are sext/zext nodes - The extends into the multiply are both of the same kind - The narrow type is half the width of the wide type - The shift amount is the width of the narrow type - The respective mulh operation is legal Differential Revision: https://reviews.llvm.org/D78272	2020-06-02 15:22:48 -05:00
Li Rong Yi	3101601b54	[PowerPC] Exploit vabsd on P9 Summary: Exploit vabsd* for the absolute difference of vectors on P9, for example: void foo (char *restrict p, char *restrict q, char *restrict t) { for (int i = 0; i < 16; i++) t[i] = abs (p[i] - q[i]); } this case should be matched to the HW instruction vabsdub. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D80271	2020-06-01 02:30:27 +00:00
Kang Zhang	bfdf9ef009	Revert "[NFC][PowerPC] Add a new case to test phi-node-elimination pass" This case will fail on some machines that enable expensive-checks. This reverts commit af3abbf7bd2213003a133c361c212ac6efb1bd2b.	2020-05-31 09:24:21 +00:00
Kang Zhang	af3abbf7bd	[NFC][PowerPC] Add a new case to test phi-node-elimination pass	2020-05-31 08:05:27 +00:00
Zequan Wu	80e107ccd0	Add NoMerge MIFlag to avoid MIR branch folding Let the codegen recognize the nomerge attribute and disable branch folding when the attribute is given Differential Revision: https://reviews.llvm.org/D79537	2020-05-29 12:31:06 -07:00
Xiangling Liao	26604d06b6	[AIX] Emit AvailableExternally Linkage on AIX Since on AIX, our strategy is to not use -u to suppress any undefined symbols, we need to emit .extern for the symbols with AvailableExternally linkage. Differential Revision: https://reviews.llvm.org/D80642	2020-05-29 13:12:59 -04:00
Kevin P. Neal	66d1899e2f	Fix errors in use of strictfp attribute. Errors spotted with use of: https://reviews.llvm.org/D68233	2020-05-29 12:25:13 -04:00
Lei Huang	2368bf52cd	[PowerPC] Add support for -mcpu=pwr10 in both clang and llvm Summary: This patch simply adds support for the new CPU in anticipation of Power10. There isn't really any functionality added so there are no associated test cases at this time. Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc Reviewed By: stefanp, nemanjai, amyk, #powerpc Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo Tags: #clang, #powerpc, #llvm Differential Revision: https://reviews.llvm.org/D80020	2020-05-27 13:14:25 -05:00
Lei Huang	559845f8fe	Revert "[PowerPC] Add support for -mcpu=pwr10 in both clang and llvm" This reverts commit 7eb666b1556b86503f2f386bf921186cdbb2d22a.	2020-05-27 09:40:21 -05:00
Kang Zhang	23a2f45214	[NFC][PowerPC] Modify the test case two-address-crash.mir	2020-05-27 02:35:45 +00:00
Lei Huang	7eb666b155	[PowerPC] Add support for -mcpu=pwr10 in both clang and llvm Summary: This patch simply adds support for the new CPU in anticipation of Power10. There isn't really any functionality added so there are no associated test cases at this time. Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc Reviewed By: stefanp, nemanjai, amyk, #powerpc Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo Tags: #clang, #powerpc, #llvm Differential Revision: https://reviews.llvm.org/D80020	2020-05-26 13:48:22 -05:00
Nemanja Ivanovic	6e9223a2c6	[PowerPC][NFC] Update test to prevent DCE from causing failures The test case provided in PR45709 can be simplified by DCE to an empty function. To prevent this from happening if DCE is run prior to ISEL in the back end, just add optnone to the function. The behaviour it is testing for is in the SDAG legalization and is not sensitive to optnone so the test case still achieves its desired objective.	2020-05-26 13:37:48 -05:00
Sean Fertile	d6c8736287	[PowerPC][AIX] Spill CSRs to the ABI specified stack offsets. Extend the CSR save/restore insertion code to support both 32-bit and 64-bit AIX. Differential Revision: https://reviews.llvm.org/D79252	2020-05-26 12:24:29 -04:00
Nemanja Ivanovic	099a875f28	[PowerPC] Unaligned FP default should apply to scalars only As reported in PR45186, we could be in a situation where we don't want to handle unaligned memory accesses for FP scalars but still have VSX (which allows unaligned access for vectors). Change the default to only apply to scalars. Fixes: https://bugs.llvm.org/show_bug.cgi?id=45186	2020-05-26 10:19:06 -05:00
Kang Zhang	e6e89875b0	[NFC][PowerPC] Add a new case to test two-address verification	2020-05-26 06:14:08 +00:00
Nemanja Ivanovic	793cc518b9	[PowerPC] Prevent legalization loop from promoting SELECT_CC from v4i32 to v4i32 As reported in https://bugs.llvm.org/show_bug.cgi?id=45709 we can hit an infinite loop in legalization since we set the legalization action for ISD::SELECT_CC for all fixed length vector types to Promote. Without some different legalization action for the type being promoted to, the legalizer simply loops. Since we don't have patterns to match the node, the right legalization action should be Expand. Differential revision: https://reviews.llvm.org/D79854	2020-05-25 20:09:07 -05:00
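
The pattern named in the a3ada630d8 entry above, (shift (mul (ext a), (ext b)), narrow_width), can be checked numerically. The following is a minimal sketch in Python, not LLVM code: every helper name is hypothetical, and it assumes the simplified pairing of sext with sra and zext with srl. It models the wide multiply-and-shift and compares it with the high half of the exact product, which is what mulhs/mulhu compute.

```python
# Illustrative model only; these helpers are hypothetical and not part of LLVM.

def _mask(bits):
    return (1 << bits) - 1

def sext(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement signed integer."""
    value &= _mask(bits)
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value

def zext(value, bits):
    """Interpret the low `bits` bits of value as an unsigned integer."""
    return value & _mask(bits)

def mulh(a, b, bits, signed):
    """High `bits` bits of the exact product (models mulhs / mulhu)."""
    ext = sext if signed else zext
    exact = ext(a, bits) * ext(b, bits)  # Python integers do not overflow
    return (exact >> bits) & _mask(bits)

def wide_shift_pattern(a, b, bits, signed):
    """(shift (mul (ext a), (ext b)), bits) evaluated in the 2*bits-wide type."""
    wide = 2 * bits
    ext = sext if signed else zext
    product = (ext(a, bits) * ext(b, bits)) & _mask(wide)  # mul in the wide type
    if signed:
        shifted = sext(product, wide) >> bits  # arithmetic shift (sra)
    else:
        shifted = product >> bits              # logical shift (srl)
    return shifted & _mask(wide)

def pattern_matches_mulh(a, b, bits, signed):
    """True if the wide pattern equals mulh extended back to the wide type."""
    ext = sext if signed else zext
    expected_wide = ext(mulh(a, b, bits, signed), bits) & _mask(2 * bits)
    return wide_shift_pattern(a, b, bits, signed) == expected_wide
```

For example, `pattern_matches_mulh(0xFFFFFFF9, 3, 32, signed=True)` compares the two forms for -7 * 3 on 32-bit inputs. The sketch does not model the legality checks or the isMulhCheaperThanMulShift hook mentioned in the commit message.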


2540 Commits