42666 Commits

Author SHA1 Message Date
alex-t
0a488cba2c [AMDGPU] use scalar shift for SALU users in frame index elimination
In the frame index lowering we have to insert shift and add
instructions to adjust stack object access.  We need to take care of the stack
object user kind and use scalar shift/add for scalar users.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D121524
2022-03-22 11:43:23 +01:00
Zakk Chen
9ab18cc535 [RISCV] Add policy operand for masked vid and viota IR intrinsics.
Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D120227
2022-03-22 02:32:31 -07:00
Alex Bradbury
86cc731f4c [WebAssembly] Always emit functype directives for defined functions
This fixes bug <https://github.com/llvm/llvm-project/issues/54022>. For
now this means that defined functions will have two .functype directives
emitted. Given discussion in that bug has suggested interest in moving
towards using something other than .functype to mark the beginning of a
function (which would, as a side-effect, solve this issue), this patch
doesn't attempt to avoid that duplication.

Some test cases that used CHECK-LABEL: foo rather than CHECK-LABEL: foo:
are broken by this change. This patch updates those test cases to always
have a colon at the end of the CHECK-LABEL string.

Differential Revision: https://reviews.llvm.org/D122134
2022-03-22 09:24:58 +00:00
Zakk Chen
abb5a985e9 [RISCV] Support mask policy for RVV IR intrinsics.
Add the UsesMaskPolicy flag to indicate the operations result
would be effected by the mask policy. (ex. mask operations).

It means RISCVInsertVSETVLI should decide the mask policy according
by mask policy operand or passthru operand.
If UsesMaskPolicy is false (ex. unmasked, store, and reduction operations),
the mask policy could be either mask undisturbed or agnostic.
Currently, RISCVInsertVSETVLI sets UsesMaskPolicy operations default to
MA, otherwise to MU to keep the current mask policy would not be changed
for unmasked operations.

Add masked-tama, masked-tamu, masked-tuma and masked-tumu test cases.
I didn't add all operations because most of implementations are using
the same pseudo multiclass. Some tests maybe be duplicated in different
tests. (ex. masked vmacc with tumu shows in vmacc-rv32.ll and masked-tumu)
I think having different tests only for policy would make the testing
clear.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D120226
2022-03-22 01:19:16 -07:00
Lian Wang
0ff19b1905 [RISCV][NFC] Add some check prefixes to remove redundant checks in some IR tests
Reviewed By: frasercrmck, jacquesguan

Differential Revision: https://reviews.llvm.org/D122211
2022-03-22 08:14:08 +00:00
jacquesguan
f863df9a05 [RISCV][NFC] Add common check prefix to reduce duplicate check lines.
Differential Revision: https://reviews.llvm.org/D122120
2022-03-22 11:06:52 +08:00
Carl Ritson
8e64d84995 [MachineSink] Check block prologue interference
Sinking must check for interference between the block prologue
and the instruction being sunk.
Specifically check for clobbering of uses by the prologue, and
overwrites to prologue defined registers by the sunk instruction.

Reviewed By: rampitec, ruiling

Differential Revision: https://reviews.llvm.org/D121277
2022-03-22 11:15:37 +09:00
Craig Topper
cc5b0868ff Revert "[RISCV] Special case sign extended scalars when type legalizing nxvXi64 .vx instrinsics on RV32."
This reverts commit 8c4937b33fe9090546f6dc834e174177075b5084.

Committed by mistake.
2022-03-21 14:58:11 -07:00
Craig Topper
8c4937b33f [RISCV] Special case sign extended scalars when type legalizing nxvXi64 .vx instrinsics on RV32.
On RV32, we need to type legalize i64 scalar arguments to intrinsics.
We usually do this by splatting the value into a vector separately.
If the scalar happens to be sign extended, we can continue using a .vx
intrinsic.

We already special cased sign extended constants, this extends it
to any sign extended value.

I've only added tests for one case of vadd. Most intrinsics go
through the same check. I can add more tests if we're concerned.

Differential Revision: https://reviews.llvm.org/D122186
2022-03-21 14:50:55 -07:00
Simon Pilgrim
438ac282db [X86] combineAddOrSubToADCOrSBB - Fold ADD/SUB + (AND(SRL(X,Y),1) -> ADC/SBB+BT(X,Y) (REAPPLIED)
As suggested on PR35908, if we are adding/subtracting an extracted bit, attempt to use BT instead to fold the op and use a ADC/SBB op.

Reapply with extra type legality checks - LowerAndToBT was originally only used during lowering, now that it can occur earlier we might encounter illegal types that we can either promote to i32 or just bail.

Differential Revision: https://reviews.llvm.org/D122084
2022-03-21 21:37:42 +00:00
Nikita Popov
ff3f279dac [X86] Regenerate test checks
Update test checks after the revert in
15336828395792bfc818e6fcd3d951cba1b8477b.
2022-03-21 22:13:19 +01:00
Nikita Popov
1533682839 Revert "[X86] combineAddOrSubToADCOrSBB - Fold ADD/SUB + (AND(SRL(X,Y),1) -> ADC/SBB+BT(X,Y)"
This reverts commit 81569f5b6ef531a48023f28133481262ee1509a3.

This causes a segfault when building consumer-typeset in
ReleaseLTO-g configuration:
https://llvm-compile-time-tracker.com/show_error.php?commit=81569f5b6ef531a48023f28133481262ee1509a3
2022-03-21 21:52:36 +01:00
Stefan Pintilie
4275d7e65a [PowerPC][NFC] Add test case for byval argument passing
Add a test case for byval argument passing where the argument size is more than
8 bytes and is not a factor of 8 bytes.
2022-03-21 15:14:28 -05:00
alex-t
a0ea7ec90f [AMDGPU] divergence patterns for the BUILD_VECTOR i16, undef expansion.
BUILD_VECTOR of i16 and undef gets expanded to the COPY_TO_REGCLASS.
         The latter is further lowererd to the copy instructions.
	 We need to provide the correct register class for the uniform and divergent BUILD_VECTOR nodes
	 to avoid VGPR to SGPR copies.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D122068
2022-03-21 21:11:20 +01:00
Simon Pilgrim
5fd9451668 [X86][AVX512] lower1BitShuffle - fold broadcast(setcc(x,y)) -> setcc(broadcast(x),broadcast(y)) (PR52500)
AVX512 has excellent broadcast ops for everything but vXi1 bool vectors - so if we're broadcasting a comparison result, see if we can broadcast the comparison operands instead.
2022-03-21 17:42:49 +00:00
Simon Pilgrim
8692e27ad6 [X86][AVX512] Add PR52500 vXi1 broadcast test case 2022-03-21 17:25:29 +00:00
Simon Pilgrim
21378593fb [X86] Add PR34666 redundant broadcast test case 2022-03-21 16:10:06 +00:00
Simon Pilgrim
b6e2832fc2 [X86] Don't fold SUB(X,SBB(0,0,W)) -> SUB(ADC(0,0,W),Y)
This will further fold to a AND(SETCC_CARRY(),1) pattern which tends to prevent further folds.
2022-03-21 15:54:48 +00:00
Simon Pilgrim
58dda03f7c [X86] Add ((z & m) >> s) - (x + y)) sub -> sbb test case
Another variant based off the PR35908 test cases
2022-03-21 15:54:47 +00:00
zhongyunde
828b89bc0b [AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads
Trying to reduce the number of masked loads in favour of more unpklo/hi
instructions. Both ISD::ZEXTLOAD and ISD::SEXTLOAD are supported to extensions
from legal types.

Both of normal and masked loads test cases added to guard compile crash.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D120953
2022-03-21 23:47:33 +08:00
Simon Pilgrim
315896d3ac [X86] Fold SUB(X,SBB(Y,Z,W)) -> SUB(ADC(X,Z,W),Y)
Prefer the commutable ADC over SBB to improve load folding opportunities
2022-03-21 14:20:46 +00:00
Alex Bradbury
da9ba89d48 [WebAssembly][NFC] Add test case for functype emission
This test aims to demonstrate the WebAssembly backend's behaviour around
emission of the .functype directive. It covers defined and declared
functions as well as libcalls.

It currently fails to emit functypes for all defined functions at the
head of the file, causing issues with the type checker
<https://github.com/llvm/llvm-project/issues/54022>. The patch in
<https://reviews.llvm.org/D122134> is a proposal to fix this issue.
2022-03-21 14:04:32 +00:00
Simon Pilgrim
ed51e26ab4 [X86] combineAddOrSubToADCOrSBB - commute + neg subtraction patterns
Handle SUB(AND(SRL(Y,Z),1),X) -> NEG(SBB(X,0,BT(Y,Z))) folds

I'll address the X86 lost folded-load regressions in a follow-up patch
2022-03-21 13:55:35 +00:00
Simon Pilgrim
5e9365c5eb [X86] combineAddOrSubToADCOrSBB - bail for illegal types
Ensure we don't attempt to fold to illegal types to ADC/SBB nodes.

After D122084 its possible for ADD(X,AND(SRL(Y,Z),1) patterns to be matched before type legalization.
2022-03-21 13:31:21 +00:00
Simon Pilgrim
35a7be6ccb [SDAG] enable binop identity constant folds for shifts
Add shl/srl/sra to the list of ops that we canonicalize with a select to expose an identity merge

Differential Revision: https://reviews.llvm.org/D122070
2022-03-21 13:02:50 +00:00
Simon Pilgrim
76cbfd949d [X86] Add nounwind to adc/sbb tests to prevent cfi noise 2022-03-21 11:44:22 +00:00
Jay Foad
321c8ab81b [AMDGPU] Add an agpr copy propagation test 2022-03-21 11:42:57 +00:00
Jay Foad
692341e998 [AMDGPU] Update checks in agpr-copy-propagation.mir 2022-03-21 11:42:56 +00:00
Simon Pilgrim
81569f5b6e [X86] combineAddOrSubToADCOrSBB - Fold ADD/SUB + (AND(SRL(X,Y),1) -> ADC/SBB+BT(X,Y)
As suggested on PR35908, if we are adding/subtracting an extracted bit, attempt to use BT instead to fold the op and use a ADC/SBB op.

Differential Revision: https://reviews.llvm.org/D122084
2022-03-21 10:57:12 +00:00
Simon Pilgrim
65cf643073 [X86] Add (x - y - ((z & m) >> s)) sub -> sbb test case for D122084 2022-03-21 10:44:17 +00:00
Thomas Symalla
7de6107dce Revert "[AMDGPU] Improve v_cmpx usage on GFX10.3."
This reverts commit 011c64191ef9ccc6538d52f4b57f98f37d4ea36e and
e725e2afe02e18398525652c9bceda1eb055ea64.

Differential Revision: https://reviews.llvm.org/D122117
2022-03-21 09:50:44 +01:00
Thomas Symalla
011c64191e [AMDGPU] Improve v_cmpx usage on GFX10.3.
On GFX10.3 targets, the following instruction sequence

v_cmp_* SGPR, ...
s_and_saveexec ..., SGPR

leads to a fairly long stall caused by a VALU write to a SGPR and having the
following SALU wait for the SGPR.

An equivalent sequence is to save the exec mask manually instead of letting
s_and_saveexec do the work and use a v_cmpx instruction instead to do the
comparison.

This patch modifies the SIOptimizeExecMasking pass as this is the last position
where s_and_saveexec instructions are inserted. It does the transformation by
trying to find the pattern, extracting the operands and generating the new
instruction sequence.

It also changes some existing lit tests and introduces a few new tests to show
the changed behavior on GFX10.3 targets.

Reviewed By: sebastian-ne, critson

Differential Revision: https://reviews.llvm.org/D119696
2022-03-21 09:31:59 +01:00
Aaron Puchert
c1a31ee65b [PPCISelLowering] Avoid emitting calls to __multi3, __muloti4
After D108936, @llvm.smul.with.overflow.i64 was lowered to __multi3
instead of __mulodi4, which also doesn't exist on PowerPC 32-bit, not
even with compiler-rt. Block it as well so that we get inline code.

Because libgcc doesn't have __muloti4, we block that as well.

Fixes #54460.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D122090
2022-03-20 20:59:30 +01:00
Chen Zheng
973b02b6f1 [PowerPC][NFC] use right hardware loop intrinsics in test case 2022-03-20 10:00:57 -04:00
esmeyi
de20a3b677 [XCOFF] support XCOFFObjectWriter for fileHeader and sectionHeaders in 64-bit XCOFF.
This is the first patch to enable the XCOFF64 object writer.
Currently only fileHeader and sectionHeaders are supported.

Reviewed By: jhenderson, DiggerLin

Differential Revision: https://reviews.llvm.org/D120861
2022-03-20 09:31:29 -04:00
Luo, Yuanke
10bb623192 enable binop identity constant folds for add
Differential Revision: https://reviews.llvm.org/D119654
2022-03-20 19:07:16 +08:00
Simon Pilgrim
06fa67dc0a [X86] Add test add with bit0 extraction and improve comments
Based on feedback from D122084
2022-03-20 09:31:52 +00:00
Craig Topper
4eb59f0179 [SelectionDAG][RISCV] Make RegsForValue::getCopyToRegs explicitly zero_extend constants.
ComputePHILiveOutRegInfo assumes that constant incoming values to
Phis will be zero extended if they aren't a legal type. To guarantee
that we should zero_extend rather than any_extend constants.

This fixes a bug for RISCV where any_extend of constants can be
treated as a sign_extend.

Differential Revision: https://reviews.llvm.org/D122053
2022-03-19 18:43:14 -07:00
Craig Topper
268371cf7b [RISCV] Add test case for miscompile caused by treating ANY_EXTEND of constants as SIGN_EXTEND.
The code that inserts AssertZExt based on predecessor information assumes
constants are zero extended for phi incoming values this allows
AssertZExt to be created in blocks consuming a Phi.
SelectionDAG::getNode treats any_extend of i32 constants as sext for RISCV.
The code that creates phi incoming values in the predecessors creates an
any_extend for the constants which then gets treated as a sext by getNode.
This makes the AssertZExt incorrect and can cause zexts to be
incorrectly removed.

This bug was introduced by D105918

Differential Revision: https://reviews.llvm.org/D122052
2022-03-19 18:43:14 -07:00
Simon Pilgrim
b929db5968 [X86] Add some initial test coverage for PR35908 add/sub + bittest patterns 2022-03-19 19:20:19 +00:00
Simon Pilgrim
b90478d422 [X86] createShuffleMaskFromVSELECT - handle BLENDV constant masks as well as VSELECT constant masks
Handle constant masks for both vselect nodes (mask != 0) and blendv nodes (mask < 0)
2022-03-19 16:51:07 +00:00
Simon Pilgrim
a6c18bfbe3 [X86] combineSelect - don't constant fold BLENDV nodes like VSELECT
If a X86ISD::BLENDV op appears before legalization (in this test case due to the icmp_slt x, 0) its constant mask was being treated as a vselect mask (mask != 0) instead of blendv (mask < 0)

This just prevents constant folding entirely for non-VSELECT ops.
2022-03-19 16:31:19 +00:00
Simon Pilgrim
33d2c00814 [X86] Add test showing a bug where a BLENDV mask is being constant folded as VSELECT mask
combineSelect doesn't expect X86ISD::BLENDV ops to appear before legalization and is treating the constant mask as a vselect mask (mask != 0) instead of blendv (mask < 0)
2022-03-19 16:31:19 +00:00
Simon Pilgrim
2dacd0d9c3 [X86] Update remaining AVX512 VBMI2 VL intrinsic tests to avoid adds
As noticed in D119654, by adding the masked intrinsics results together we can end up with the selects being canonicalized away from the intrinsic - this isn't what we want to test here so replace with a insertvalue chain into a aggregate instead to retain all the results.
2022-03-19 15:41:25 +00:00
Simon Pilgrim
56ad791f46 [X86] LowerAndToBT - fold BT(NOT(X),Y) -> BT(X,Y) and flip the CondCode 2022-03-19 14:03:03 +00:00
Simon Pilgrim
c7ba5a9aff [X86][SSE] Add initial support for extracting non-constant bool vector elements
We can use MOVMSK+TEST/BT to extract individual bool elements even if the index isn't constant

This relies on combineBitcastvxi1 so some AVX512 cases still aren't optimized as they avoid MOVMSK usage.
2022-03-19 13:31:05 +00:00
Simon Pilgrim
abb9cbb22e [X86][SSE] Add tests for non-constant bool vector extractions
We should be able to perform this with MOVMSK+TEST/BT instead of spilling to stack
2022-03-19 13:25:21 +00:00
chenglin.bi
dd3b90e4d7 [AArch64] Combine ISD::SETCC into AArch64ISD::ANDS
When N > 12, (2^N -1) is not a legal add immediate (isLegalAddImmediate will return false).
ANd if SetCC input use this number, DAG combiner will generate one more SRL instruction.
So combine [setcc (srl x, imm), 0, ne] to [setcc (and x, (-1 << imm)), 0, ne] to get better optimization in emitComparison
Fix https://github.com/llvm/llvm-project/issues/54283

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D121449
2022-03-19 13:04:16 +00:00
Paul Walker
f46fe36d59 [AArch64] Fix incorrect getSetCCInverse usage within trySwapVSelectOperands.
When inverting the compare predicate trySwapVSelectOperands is
incorrectly using the type of the select's cond operand rather
than the type of cond's operands. This means we're treating all
inversions as if they're integer.

Differential Revision: https://reviews.llvm.org/D121968
2022-03-19 12:36:14 +00:00
Eli Friedman
ddca66622c [ARM] Fix shouldExpandAtomicLoadInIR for subtargets without ldrexd.
Regression from 2f497ec3; we should not try to generate ldrexd on
targets that don't have it.

Also, while I'm here, fix shouldExpandAtomicStoreInIR, for consistency.
That doesn't really have any practical effect, though.  On Thumb targets
where we need to use __sync_* libcalls, there is no libcall for stores,
so SelectionDAG calls __sync_lock_test_and_set_8 anyway.
2022-03-18 15:54:38 -07:00