llvm-project

Author	SHA1	Message	Date
alex-t	0a488cba2c	[AMDGPU] use scalar shift for SALU users in frame index elimination In the frame index lowering we have to insert shift and add instructions to adjust stack object access. We need to take care of the stack object user kind and use scalar shift/add for scalar users. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D121524	2022-03-22 11:43:23 +01:00
Zakk Chen	9ab18cc535	[RISCV] Add policy operand for masked vid and viota IR intrinsics. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D120227	2022-03-22 02:32:31 -07:00
Alex Bradbury	86cc731f4c	[WebAssembly] Always emit functype directives for defined functions This fixes bug <https://github.com/llvm/llvm-project/issues/54022>. For now this means that defined functions will have two .functype directives emitted. Given discussion in that bug has suggested interest in moving towards using something other than .functype to mark the beginning of a function (which would, as a side-effect, solve this issue), this patch doesn't attempt to avoid that duplication. Some test cases that used CHECK-LABEL: foo rather than CHECK-LABEL: foo: are broken by this change. This patch updates those test cases to always have a colon at the end of the CHECK-LABEL string. Differential Revision: https://reviews.llvm.org/D122134	2022-03-22 09:24:58 +00:00
Zakk Chen	abb5a985e9	[RISCV] Support mask policy for RVV IR intrinsics. Add the UsesMaskPolicy flag to indicate that the operation's result would be affected by the mask policy (ex. mask operations). It means RISCVInsertVSETVLI should decide the mask policy according to the mask policy operand or passthru operand. If UsesMaskPolicy is false (ex. unmasked, store, and reduction operations), the mask policy could be either mask undisturbed or agnostic. Currently, RISCVInsertVSETVLI defaults UsesMaskPolicy operations to MA and all others to MU, so that the current mask policy is not changed for unmasked operations. Add masked-tama, masked-tamu, masked-tuma and masked-tumu test cases. I didn't add all operations because most implementations use the same pseudo multiclass. Some tests may be duplicated across test files (ex. masked vmacc with tumu appears in both vmacc-rv32.ll and masked-tumu). I think having separate tests only for policy would make the testing clearer. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120226	2022-03-22 01:19:16 -07:00
Lian Wang	0ff19b1905	[RISCV][NFC] Add some check prefixes to remove redundant checks in some IR tests Reviewed By: frasercrmck, jacquesguan Differential Revision: https://reviews.llvm.org/D122211	2022-03-22 08:14:08 +00:00
jacquesguan	f863df9a05	[RISCV][NFC] Add common check prefix to reduce duplicate check lines. Differential Revision: https://reviews.llvm.org/D122120	2022-03-22 11:06:52 +08:00
Carl Ritson	8e64d84995	[MachineSink] Check block prologue interference Sinking must check for interference between the block prologue and the instruction being sunk. Specifically check for clobbering of uses by the prologue, and overwrites to prologue defined registers by the sunk instruction. Reviewed By: rampitec, ruiling Differential Revision: https://reviews.llvm.org/D121277	2022-03-22 11:15:37 +09:00
Craig Topper	cc5b0868ff	Revert "[RISCV] Special case sign extended scalars when type legalizing nxvXi64 .vx instrinsics on RV32." This reverts commit 8c4937b33fe9090546f6dc834e174177075b5084. Committed by mistake.	2022-03-21 14:58:11 -07:00
Craig Topper	8c4937b33f	[RISCV] Special case sign extended scalars when type legalizing nxvXi64 .vx instrinsics on RV32. On RV32, we need to type legalize i64 scalar arguments to intrinsics. We usually do this by splatting the value into a vector separately. If the scalar happens to be sign extended, we can continue using a .vx intrinsic. We already special cased sign extended constants, this extends it to any sign extended value. I've only added tests for one case of vadd. Most intrinsics go through the same check. I can add more tests if we're concerned. Differential Revision: https://reviews.llvm.org/D122186	2022-03-21 14:50:55 -07:00
Simon Pilgrim	438ac282db	[X86] combineAddOrSubToADCOrSBB - Fold ADD/SUB + (AND(SRL(X,Y),1) -> ADC/SBB+BT(X,Y) (REAPPLIED) As suggested on PR35908, if we are adding/subtracting an extracted bit, attempt to use BT instead to fold the op and use a ADC/SBB op. Reapply with extra type legality checks - LowerAndToBT was originally only used during lowering, now that it can occur earlier we might encounter illegal types that we can either promote to i32 or just bail. Differential Revision: https://reviews.llvm.org/D122084	2022-03-21 21:37:42 +00:00
Nikita Popov	ff3f279dac	[X86] Regenerate test checks Update test checks after the revert in 15336828395792bfc818e6fcd3d951cba1b8477b.	2022-03-21 22:13:19 +01:00
Nikita Popov	1533682839	Revert "[X86] combineAddOrSubToADCOrSBB - Fold ADD/SUB + (AND(SRL(X,Y),1) -> ADC/SBB+BT(X,Y)" This reverts commit 81569f5b6ef531a48023f28133481262ee1509a3. This causes a segfault when building consumer-typeset in ReleaseLTO-g configuration: https://llvm-compile-time-tracker.com/show_error.php?commit=81569f5b6ef531a48023f28133481262ee1509a3	2022-03-21 21:52:36 +01:00
Stefan Pintilie	4275d7e65a	[PowerPC][NFC] Add test case for byval argument passing Add a test case for byval argument passing where the argument size is more than 8 bytes and is not a factor of 8 bytes.	2022-03-21 15:14:28 -05:00
alex-t	a0ea7ec90f	[AMDGPU] divergence patterns for the BUILD_VECTOR i16, undef expansion. BUILD_VECTOR of i16 and undef gets expanded to COPY_TO_REGCLASS. The latter is further lowered to copy instructions. We need to provide the correct register class for the uniform and divergent BUILD_VECTOR nodes to avoid VGPR to SGPR copies. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D122068	2022-03-21 21:11:20 +01:00
Simon Pilgrim	5fd9451668	[X86][AVX512] lower1BitShuffle - fold broadcast(setcc(x,y)) -> setcc(broadcast(x),broadcast(y)) (PR52500) AVX512 has excellent broadcast ops for everything but vXi1 bool vectors - so if we're broadcasting a comparison result, see if we can broadcast the comparison operands instead.	2022-03-21 17:42:49 +00:00
Simon Pilgrim	8692e27ad6	[X86][AVX512] Add PR52500 vXi1 broadcast test case	2022-03-21 17:25:29 +00:00
Simon Pilgrim	21378593fb	[X86] Add PR34666 redundant broadcast test case	2022-03-21 16:10:06 +00:00
Simon Pilgrim	b6e2832fc2	[X86] Don't fold SUB(X,SBB(0,0,W)) -> SUB(ADC(0,0,W),Y) This will further fold to a AND(SETCC_CARRY(),1) pattern which tends to prevent further folds.	2022-03-21 15:54:48 +00:00
Simon Pilgrim	58dda03f7c	[X86] Add ((z & m) >> s) - (x + y)) sub -> sbb test case Another variant based off the PR35908 test cases	2022-03-21 15:54:47 +00:00
zhongyunde	828b89bc0b	[AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads Trying to reduce the number of masked loads in favour of more unpklo/hi instructions. Both ISD::ZEXTLOAD and ISD::SEXTLOAD are supported for extensions from legal types. Test cases for both normal and masked loads are added to guard against compile crashes. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D120953	2022-03-21 23:47:33 +08:00
Simon Pilgrim	315896d3ac	[X86] Fold SUB(X,SBB(Y,Z,W)) -> SUB(ADC(X,Z,W),Y) Prefer the commutable ADC over SBB to improve load folding opportunities	2022-03-21 14:20:46 +00:00
Alex Bradbury	da9ba89d48	[WebAssembly][NFC] Add test case for functype emission This test aims to demonstrate the WebAssembly backend's behaviour around emission of the .functype directive. It covers defined and declared functions as well as libcalls. It currently fails to emit functypes for all defined functions at the head of the file, causing issues with the type checker <https://github.com/llvm/llvm-project/issues/54022>. The patch in <https://reviews.llvm.org/D122134> is a proposal to fix this issue.	2022-03-21 14:04:32 +00:00
Simon Pilgrim	ed51e26ab4	[X86] combineAddOrSubToADCOrSBB - commute + neg subtraction patterns Handle SUB(AND(SRL(Y,Z),1),X) -> NEG(SBB(X,0,BT(Y,Z))) folds I'll address the X86 lost folded-load regressions in a follow-up patch	2022-03-21 13:55:35 +00:00
Simon Pilgrim	5e9365c5eb	[X86] combineAddOrSubToADCOrSBB - bail for illegal types Ensure we don't attempt to fold illegal types to ADC/SBB nodes. After D122084 it's possible for ADD(X,AND(SRL(Y,Z),1)) patterns to be matched before type legalization.	2022-03-21 13:31:21 +00:00
Simon Pilgrim	35a7be6ccb	[SDAG] enable binop identity constant folds for shifts Add shl/srl/sra to the list of ops that we canonicalize with a select to expose an identity merge Differential Revision: https://reviews.llvm.org/D122070	2022-03-21 13:02:50 +00:00
Simon Pilgrim	76cbfd949d	[X86] Add nounwind to adc/sbb tests to prevent cfi noise	2022-03-21 11:44:22 +00:00
Jay Foad	321c8ab81b	[AMDGPU] Add an agpr copy propagation test	2022-03-21 11:42:57 +00:00
Jay Foad	692341e998	[AMDGPU] Update checks in agpr-copy-propagation.mir	2022-03-21 11:42:56 +00:00
Simon Pilgrim	81569f5b6e	[X86] combineAddOrSubToADCOrSBB - Fold ADD/SUB + (AND(SRL(X,Y),1) -> ADC/SBB+BT(X,Y) As suggested on PR35908, if we are adding/subtracting an extracted bit, attempt to use BT instead to fold the op and use a ADC/SBB op. Differential Revision: https://reviews.llvm.org/D122084	2022-03-21 10:57:12 +00:00
Simon Pilgrim	65cf643073	[X86] Add (x - y - ((z & m) >> s)) sub -> sbb test case for D122084	2022-03-21 10:44:17 +00:00
Thomas Symalla	7de6107dce	Revert "[AMDGPU] Improve v_cmpx usage on GFX10.3." This reverts commit 011c64191ef9ccc6538d52f4b57f98f37d4ea36e and e725e2afe02e18398525652c9bceda1eb055ea64. Differential Revision: https://reviews.llvm.org/D122117	2022-03-21 09:50:44 +01:00
Thomas Symalla	011c64191e	[AMDGPU] Improve v_cmpx usage on GFX10.3. On GFX10.3 targets, the following instruction sequence v_cmp_* SGPR, ... s_and_saveexec ..., SGPR leads to a fairly long stall caused by a VALU write to a SGPR and having the following SALU wait for the SGPR. An equivalent sequence is to save the exec mask manually instead of letting s_and_saveexec do the work and use a v_cmpx instruction instead to do the comparison. This patch modifies the SIOptimizeExecMasking pass as this is the last position where s_and_saveexec instructions are inserted. It does the transformation by trying to find the pattern, extracting the operands and generating the new instruction sequence. It also changes some existing lit tests and introduces a few new tests to show the changed behavior on GFX10.3 targets. Reviewed By: sebastian-ne, critson Differential Revision: https://reviews.llvm.org/D119696	2022-03-21 09:31:59 +01:00
Aaron Puchert	c1a31ee65b	[PPCISelLowering] Avoid emitting calls to __multi3, __muloti4 After D108936, @llvm.smul.with.overflow.i64 was lowered to __multi3 instead of __mulodi4, which also doesn't exist on PowerPC 32-bit, not even with compiler-rt. Block it as well so that we get inline code. Because libgcc doesn't have __muloti4, we block that as well. Fixes #54460. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D122090	2022-03-20 20:59:30 +01:00
Chen Zheng	973b02b6f1	[PowerPC][NFC] use right hardware loop intrinsics in test case	2022-03-20 10:00:57 -04:00
esmeyi	de20a3b677	[XCOFF] support XCOFFObjectWriter for fileHeader and sectionHeaders in 64-bit XCOFF. This is the first patch to enable the XCOFF64 object writer. Currently only fileHeader and sectionHeaders are supported. Reviewed By: jhenderson, DiggerLin Differential Revision: https://reviews.llvm.org/D120861	2022-03-20 09:31:29 -04:00
Luo, Yuanke	10bb623192	enable binop identity constant folds for add Differential Revision: https://reviews.llvm.org/D119654	2022-03-20 19:07:16 +08:00
Simon Pilgrim	06fa67dc0a	[X86] Add test add with bit0 extraction and improve comments Based on feedback from D122084	2022-03-20 09:31:52 +00:00
Craig Topper	4eb59f0179	[SelectionDAG][RISCV] Make RegsForValue::getCopyToRegs explicitly zero_extend constants. ComputePHILiveOutRegInfo assumes that constant incoming values to Phis will be zero extended if they aren't a legal type. To guarantee that we should zero_extend rather than any_extend constants. This fixes a bug for RISCV where any_extend of constants can be treated as a sign_extend. Differential Revision: https://reviews.llvm.org/D122053	2022-03-19 18:43:14 -07:00
Craig Topper	268371cf7b	[RISCV] Add test case for miscompile caused by treating ANY_EXTEND of constants as SIGN_EXTEND. The code that inserts AssertZExt based on predecessor information assumes constants are zero extended for phi incoming values; this allows AssertZExt to be created in blocks consuming a Phi. SelectionDAG::getNode treats any_extend of i32 constants as sext for RISCV. The code that creates phi incoming values in the predecessors creates an any_extend for the constants, which then gets treated as a sext by getNode. This makes the AssertZExt incorrect and can cause zexts to be incorrectly removed. This bug was introduced by D105918. Differential Revision: https://reviews.llvm.org/D122052	2022-03-19 18:43:14 -07:00
Simon Pilgrim	b929db5968	[X86] Add some initial test coverage for PR35908 add/sub + bittest patterns	2022-03-19 19:20:19 +00:00
Simon Pilgrim	b90478d422	[X86] createShuffleMaskFromVSELECT - handle BLENDV constant masks as well as VSELECT constant masks Handle constant masks for both vselect nodes (mask != 0) and blendv nodes (mask < 0)	2022-03-19 16:51:07 +00:00
Simon Pilgrim	a6c18bfbe3	[X86] combineSelect - don't constant fold BLENDV nodes like VSELECT If a X86ISD::BLENDV op appears before legalization (in this test case due to the icmp_slt x, 0) its constant mask was being treated as a vselect mask (mask != 0) instead of blendv (mask < 0) This just prevents constant folding entirely for non-VSELECT ops.	2022-03-19 16:31:19 +00:00
Simon Pilgrim	33d2c00814	[X86] Add test showing a bug where a BLENDV mask is being constant folded as VSELECT mask combineSelect doesn't expect X86ISD::BLENDV ops to appear before legalization and is treating the constant mask as a vselect mask (mask != 0) instead of blendv (mask < 0)	2022-03-19 16:31:19 +00:00
Simon Pilgrim	2dacd0d9c3	[X86] Update remaining AVX512 VBMI2 VL intrinsic tests to avoid adds As noticed in D119654, by adding the masked intrinsic results together we can end up with the selects being canonicalized away from the intrinsic - this isn't what we want to test here, so replace with an insertvalue chain into an aggregate instead to retain all the results.	2022-03-19 15:41:25 +00:00
Simon Pilgrim	56ad791f46	[X86] LowerAndToBT - fold BT(NOT(X),Y) -> BT(X,Y) and flip the CondCode	2022-03-19 14:03:03 +00:00
Simon Pilgrim	c7ba5a9aff	[X86][SSE] Add initial support for extracting non-constant bool vector elements We can use MOVMSK+TEST/BT to extract individual bool elements even if the index isn't constant This relies on combineBitcastvxi1 so some AVX512 cases still aren't optimized as they avoid MOVMSK usage.	2022-03-19 13:31:05 +00:00
Simon Pilgrim	abb9cbb22e	[X86][SSE] Add tests for non-constant bool vector extractions We should be able to perform this with MOVMSK+TEST/BT instead of spilling to stack	2022-03-19 13:25:21 +00:00
chenglin.bi	dd3b90e4d7	[AArch64] Combine ISD::SETCC into AArch64ISD::ANDS When N > 12, (2^N -1) is not a legal add immediate (isLegalAddImmediate will return false). If the SetCC input uses this number, the DAG combiner will generate one more SRL instruction. So combine [setcc (srl x, imm), 0, ne] to [setcc (and x, (-1 << imm)), 0, ne] to get better optimization in emitComparison. Fix https://github.com/llvm/llvm-project/issues/54283 Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D121449	2022-03-19 13:04:16 +00:00
Paul Walker	f46fe36d59	[AArch64] Fix incorrect getSetCCInverse usage within trySwapVSelectOperands. When inverting the compare predicate trySwapVSelectOperands is incorrectly using the type of the select's cond operand rather than the type of cond's operands. This means we're treating all inversions as if they're integer. Differential Revision: https://reviews.llvm.org/D121968	2022-03-19 12:36:14 +00:00
Eli Friedman	ddca66622c	[ARM] Fix shouldExpandAtomicLoadInIR for subtargets without ldrexd. Regression from 2f497ec3; we should not try to generate ldrexd on targets that don't have it. Also, while I'm here, fix shouldExpandAtomicStoreInIR, for consistency. That doesn't really have any practical effect, though. On Thumb targets where we need to use __sync_* libcalls, there is no libcall for stores, so SelectionDAG calls __sync_lock_test_and_set_8 anyway.	2022-03-18 15:54:38 -07:00

42666 Commits