42676 Commits

Author SHA1 Message Date
Craig Topper
9933015fdd [X86] Fold MMX_MOVD64from64rr + store to MMX_MOVQ64mr instead of MMX_MOVD64from64mr.
MMX_MOVD64from64rr moves an MMX register to a 64-bit GPR.

MMX_MOVD64from64mr is the memory version of moving an MMX register to a
64-bit GPR. It requires the REX.W bit to be set. There are no isel
patterns that use this instruction.

MMX_MOVQ64mr is the MMX register store instruction. It doesn't
require a REX.W prefix. This makes it one byte shorter to encode
than MMX_MOVD64from64mr in many cases.
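
As a rough illustration of the size difference, the sketch below hand-assembles both forms of a store of %mm0 to the address in %rax. The opcode bytes (0F 7F /r for the MMX store, REX.W + 0F 7E /r for the other form) are my reading of the x86-64 opcode map and are not taken from the patch; the helper names are hypothetical and this is not LLVM code.

```python
# Hypothetical hand-assembler for the two store forms (illustration only).
REX_W = 0x48  # REX prefix with only the W bit set


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack a ModRM byte from its three fields."""
    return (mod << 6) | (reg << 3) | rm


def encode_movq64mr(mm: int, base: int) -> bytes:
    """movq %mmN, (%base) encoded as MMX_MOVQ64mr: 0F 7F /r."""
    return bytes([0x0F, 0x7F, modrm(0b00, mm, base)])


def encode_movd64from64mr(mm: int, base: int) -> bytes:
    """The same store encoded as MMX_MOVD64from64mr: REX.W + 0F 7E /r."""
    return bytes([REX_W, 0x0F, 0x7E, modrm(0b00, mm, base)])


# Only a plain base register without SIB byte or displacement is modelled.
RAX = 0
short_form = encode_movq64mr(0, RAX)
long_form = encode_movd64from64mr(0, RAX)
print(short_form.hex(" "), "vs", long_form.hex(" "))
```

The "in many cases" qualifier presumably covers addresses that use r8-r15: those need a REX prefix in either form, so the two encodings would then have the same length.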

Both store instructions print the same mnemonic string, and the assembler
would choose MMX_MOVQ64mr if it were to parse the output, which is
another reason using it is the correct thing to do.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122241
2022-03-22 14:21:55 -07:00
Stanislav Mekhanoshin
72c1a0d9c2 [AMDGPU] Allow v_accvgpr_write to use SGPR on gfx90a
This is undocumented, but it should work.

Differential Revision: https://reviews.llvm.org/D122252
2022-03-22 13:52:29 -07:00
Stanislav Mekhanoshin
631a643940 [AMDGPU] Update mfma test to run gfx940 checks. NFC. 2022-03-22 12:43:57 -07:00
Simon Pilgrim
7f8572b8c3 [ARM] select_xform.ll - re-add and fix missing CHECK prefixes
We were still checking test results with the CHECK prefix, but the checks had bit-rotted since the prefix was removed from the --check-prefixes list
2022-03-22 17:35:10 +00:00
Craig Topper
51940d69cb [RISCV] Special case sign extended scalars when type legalizing nxvXi64 .vx instrinsics on RV32.
On RV32, we need to type legalize i64 scalar arguments to intrinsics.
We usually do this by splatting the value into a vector separately.
If the scalar happens to be sign extended, we can continue using a .vx
intrinsic.

We already special cased sign extended constants; this extends that
to any sign extended value.

I've only added tests for one case of vadd. Most intrinsics go
through the same check.
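
The condition can be sketched as follows. This is a minimal model, assuming the .vx forms sign extend the XLEN-bit scalar to the element width when the element is wider than XLEN; the function names are hypothetical and do not come from the patch.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sext32_to_64(low: int) -> int:
    """Sign extend the low 32 bits of `low` to a 64-bit pattern."""
    low &= MASK32
    if low & 0x80000000:
        return low | 0xFFFFFFFF00000000
    return low


def is_sign_extended_i32(value: int) -> bool:
    """True if the i64 `value` is fully determined by its low 32 bits,
    so passing only the low half to a .vx form would reproduce it."""
    value &= MASK64
    return sext32_to_64(value) == value


# -1 and small positive values qualify; 0x80000000 as an i64 does not,
# because its upper half is zero while bit 31 is set.
for candidate in (-1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF80000000):
    print(hex(candidate & MASK64), is_sign_extended_i32(candidate))
```

A value that fails this check still needs the separate splat described above.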

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D122186
2022-03-22 10:29:06 -07:00
Craig Topper
9b0f227d7b [TableGen][RISCV] Add InstAliases with zero_reg to cover unmasked vnot.v, vncvt.x.x.w, vneg.v, etc.
The mask being NoRegister prevented the existing aliases from matching
since NoRegister isn't in the VMV0 register class.

To work around this, I've added new aliases that look for zero_reg.
I had to modify tablegen to generate matching code for zero_reg.
As a consequence, I had to change the EmitPriority for an ARM
alias that used zero_reg and started printing.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D121496
2022-03-22 10:14:43 -07:00
Craig Topper
49c2206b3b [VP] Preserve address space of pointer for strided load/store intrinsics.
This adds LLVMAnyPointerToElt to use instead of LLVMPointerToElt.
This allows us to preserve the address space as part of the type
overload for the intrinsic, but still require the vector element
type to match the pointer type.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D122042
2022-03-22 09:52:54 -07:00
Zakk Chen
10fd2822b7 [RISCV] Add policy operand for masked compare and vmsbf/vmsif/vmsof IR intrinsics.

Those operations are updated under a tail agnostic policy, but they
could be either mask agnostic or mask undisturbed.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D120228
2022-03-22 07:47:21 -07:00
Igor Kudrin
d7681d9f77 [NVPTX] Avoid a crash when 'llc' is called with '-filetype=null'
For '-filetype=null', 'NVPTXTargetStreamer' is not created, so the
return value of 'OutStreamer->getTargetStreamer()' should be checked
before calling the methods.

Differential Revision: https://reviews.llvm.org/D122001
2022-03-22 16:46:47 +04:00
Igor Kudrin
2881696b40 [tests] Force (some) X86-specific tests to use an explicit triple
These tests are located in 'X86' subfolders which means that they should
be compiled for that target. As they did not have the target specified
explicitly, they in fact were compiled for a default target triple. Not
all targets support all required features for these tests; for example,
if NVPTX is used as a default triple, the tests fail. The patch makes the
tests run for 'x86_64', thus they pass regardless of the default target.

Differential Revision: https://reviews.llvm.org/D121998
2022-03-22 16:46:47 +04:00
alex-t
0a488cba2c [AMDGPU] use scalar shift for SALU users in frame index elimination
In the frame index lowering we have to insert shift and add
instructions to adjust stack object access. We need to take the kind of
stack object user into account and use scalar shift/add for scalar users.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D121524
2022-03-22 11:43:23 +01:00
Zakk Chen
9ab18cc535 [RISCV] Add policy operand for masked vid and viota IR intrinsics.
Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D120227
2022-03-22 02:32:31 -07:00
Alex Bradbury
86cc731f4c [WebAssembly] Always emit functype directives for defined functions
This fixes bug <https://github.com/llvm/llvm-project/issues/54022>. For
now this means that defined functions will have two .functype directives
emitted. Given discussion in that bug has suggested interest in moving
towards using something other than .functype to mark the beginning of a
function (which would, as a side-effect, solve this issue), this patch
doesn't attempt to avoid that duplication.

Some test cases that used CHECK-LABEL: foo rather than CHECK-LABEL: foo:
are broken by this change. This patch updates those test cases to always
have a colon at the end of the CHECK-LABEL string.

Differential Revision: https://reviews.llvm.org/D122134
2022-03-22 09:24:58 +00:00
Zakk Chen
abb5a985e9 [RISCV] Support mask policy for RVV IR intrinsics.
Add the UsesMaskPolicy flag to indicate that the operation's result
would be affected by the mask policy (e.g. masked operations).

This means RISCVInsertVSETVLI should decide the mask policy according
to the mask policy operand or the passthru operand.
If UsesMaskPolicy is false (e.g. unmasked, store, and reduction operations),
the mask policy could be either mask undisturbed or agnostic.
Currently, RISCVInsertVSETVLI defaults to MA for operations with
UsesMaskPolicy set and to MU otherwise, so that the current mask policy
is not changed for unmasked operations.

Add masked-tama, masked-tamu, masked-tuma and masked-tumu test cases.
I didn't add all operations because most implementations use
the same pseudo multiclass. Some tests may be duplicated across different
test files (e.g. masked vmacc with tumu appears in vmacc-rv32.ll and masked-tumu).
I think having separate tests just for the policy makes the testing
clearer.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D120226
2022-03-22 01:19:16 -07:00
Lian Wang
0ff19b1905 [RISCV][NFC] Add some check prefixes to remove redundant checks in some IR tests
Reviewed By: frasercrmck, jacquesguan

Differential Revision: https://reviews.llvm.org/D122211
2022-03-22 08:14:08 +00:00
jacquesguan
f863df9a05 [RISCV][NFC] Add common check prefix to reduce duplicate check lines.
Differential Revision: https://reviews.llvm.org/D122120
2022-03-22 11:06:52 +08:00
Carl Ritson
8e64d84995 [MachineSink] Check block prologue interference
Sinking must check for interference between the block prologue
and the instruction being sunk.
Specifically check for clobbering of uses by the prologue, and
overwrites to prologue defined registers by the sunk instruction.

Reviewed By: rampitec, ruiling

Differential Revision: https://reviews.llvm.org/D121277
2022-03-22 11:15:37 +09:00
Craig Topper
cc5b0868ff Revert "[RISCV] Special case sign extended scalars when type legalizing nxvXi64 .vx instrinsics on RV32."
This reverts commit 8c4937b33fe9090546f6dc834e174177075b5084.

Committed by mistake.
2022-03-21 14:58:11 -07:00
Craig Topper
8c4937b33f [RISCV] Special case sign extended scalars when type legalizing nxvXi64 .vx instrinsics on RV32.
On RV32, we need to type legalize i64 scalar arguments to intrinsics.
We usually do this by splatting the value into a vector separately.
If the scalar happens to be sign extended, we can continue using a .vx
intrinsic.

We already special cased sign extended constants; this extends that
to any sign extended value.

I've only added tests for one case of vadd. Most intrinsics go
through the same check. I can add more tests if we're concerned.

Differential Revision: https://reviews.llvm.org/D122186
2022-03-21 14:50:55 -07:00
Simon Pilgrim
438ac282db [X86] combineAddOrSubToADCOrSBB - Fold ADD/SUB + (AND(SRL(X,Y),1) -> ADC/SBB+BT(X,Y) (REAPPLIED)
As suggested on PR35908, if we are adding/subtracting an extracted bit, attempt to use BT instead to fold the op and use an ADC/SBB op.

Reapply with extra type legality checks. LowerAndToBT was originally only used during lowering; now that it can occur earlier, we might encounter illegal types that we can either promote to i32 or just bail on.
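
The arithmetic behind the fold can be checked with a small 32-bit model. This is a sketch under the assumption that BT leaves the selected bit in the carry flag and that ADC/SBB add or subtract that carry; the helper names are hypothetical and the shift amount is restricted to 0..31.

```python
MASK32 = 0xFFFFFFFF


def bt(x: int, y: int) -> int:
    """Carry produced by BT: bit y of x, for y in 0..31."""
    return (x >> y) & 1


def adc(a: int, b: int, carry: int) -> int:
    """32-bit add with carry in."""
    return (a + b + carry) & MASK32


def sbb(a: int, b: int, carry: int) -> int:
    """32-bit subtract with borrow in."""
    return (a - b - carry) & MASK32


def add_bit_before(a: int, x: int, y: int) -> int:
    """ADD(A, AND(SRL(X, Y), 1))"""
    return (a + ((x >> y) & 1)) & MASK32


def add_bit_after(a: int, x: int, y: int) -> int:
    """ADC(A, 0, BT(X, Y))"""
    return adc(a, 0, bt(x, y))


def sub_bit_before(a: int, x: int, y: int) -> int:
    """SUB(A, AND(SRL(X, Y), 1))"""
    return (a - ((x >> y) & 1)) & MASK32


def sub_bit_after(a: int, x: int, y: int) -> int:
    """SBB(A, 0, BT(X, Y))"""
    return sbb(a, 0, bt(x, y))


# Spot check both folds, including wrap-around at 0 and 0xFFFFFFFF.
for a in (0, 1, 0x7FFFFFFF, MASK32):
    for x in (0, 0xDEADBEEF, MASK32):
        for y in range(32):
            assert add_bit_before(a, x, y) == add_bit_after(a, x, y)
            assert sub_bit_before(a, x, y) == sub_bit_after(a, x, y)
```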

Differential Revision: https://reviews.llvm.org/D122084
2022-03-21 21:37:42 +00:00
Nikita Popov
ff3f279dac [X86] Regenerate test checks
Update test checks after the revert in
15336828395792bfc818e6fcd3d951cba1b8477b.
2022-03-21 22:13:19 +01:00
Nikita Popov
1533682839 Revert "[X86] combineAddOrSubToADCOrSBB - Fold ADD/SUB + (AND(SRL(X,Y),1) -> ADC/SBB+BT(X,Y)"
This reverts commit 81569f5b6ef531a48023f28133481262ee1509a3.

This causes a segfault when building consumer-typeset in
ReleaseLTO-g configuration:
https://llvm-compile-time-tracker.com/show_error.php?commit=81569f5b6ef531a48023f28133481262ee1509a3
2022-03-21 21:52:36 +01:00
Stefan Pintilie
4275d7e65a [PowerPC][NFC] Add test case for byval argument passing
Add a test case for byval argument passing where the argument size is more than
8 bytes and is not a multiple of 8 bytes.
2022-03-21 15:14:28 -05:00
alex-t
a0ea7ec90f [AMDGPU] divergence patterns for the BUILD_VECTOR i16, undef expansion.
BUILD_VECTOR of i16 and undef gets expanded to COPY_TO_REGCLASS.
The latter is further lowered to copy instructions.
We need to provide the correct register class for the uniform and divergent BUILD_VECTOR nodes
to avoid VGPR to SGPR copies.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D122068
2022-03-21 21:11:20 +01:00
Simon Pilgrim
5fd9451668 [X86][AVX512] lower1BitShuffle - fold broadcast(setcc(x,y)) -> setcc(broadcast(x),broadcast(y)) (PR52500)
AVX512 has excellent broadcast ops for everything but vXi1 bool vectors - so if we're broadcasting a comparison result, see if we can broadcast the comparison operands instead.
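
The equivalence the fold relies on can be modelled with plain lists. This is a toy sketch: vectors are Python lists, the comparison is fixed to signed less-than for illustration, and all names are hypothetical.

```python
def broadcast(value, lanes: int) -> list:
    """Replicate a scalar across `lanes` vector elements."""
    return [value] * lanes


def setcc_lt_scalar(x: int, y: int) -> bool:
    """Scalar compare producing an i1."""
    return x < y


def setcc_lt_vector(xs: list, ys: list) -> list:
    """Lane-wise compare producing a vXi1 mask."""
    return [a < b for a, b in zip(xs, ys)]


def before(x: int, y: int, lanes: int) -> list:
    """broadcast(setcc(x, y)): broadcast of a bool, the awkward case."""
    return broadcast(setcc_lt_scalar(x, y), lanes)


def after(x: int, y: int, lanes: int) -> list:
    """setcc(broadcast(x), broadcast(y)): broadcasts of ordinary scalars."""
    return setcc_lt_vector(broadcast(x, lanes), broadcast(y, lanes))


for x, y in ((1, 2), (2, 1), (3, 3)):
    assert before(x, y, 8) == after(x, y, 8)
```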
2022-03-21 17:42:49 +00:00
Simon Pilgrim
8692e27ad6 [X86][AVX512] Add PR52500 vXi1 broadcast test case 2022-03-21 17:25:29 +00:00
Simon Pilgrim
21378593fb [X86] Add PR34666 redundant broadcast test case 2022-03-21 16:10:06 +00:00
Simon Pilgrim
b6e2832fc2 [X86] Don't fold SUB(X,SBB(0,0,W)) -> SUB(ADC(0,0,W),Y)
This will further fold to an AND(SETCC_CARRY(),1) pattern, which tends to prevent further folds.
2022-03-21 15:54:48 +00:00
Simon Pilgrim
58dda03f7c [X86] Add ((z & m) >> s) - (x + y)) sub -> sbb test case
Another variant based off the PR35908 test cases
2022-03-21 15:54:47 +00:00
zhongyunde
828b89bc0b [AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads
Try to reduce the number of masked loads in favour of more unpklo/hi
instructions. Both ISD::ZEXTLOAD and ISD::SEXTLOAD extensions from
legal types are supported.

Test cases for both normal and masked loads are added to guard against compile crashes.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D120953
2022-03-21 23:47:33 +08:00
Simon Pilgrim
315896d3ac [X86] Fold SUB(X,SBB(Y,Z,W)) -> SUB(ADC(X,Z,W),Y)
Prefer the commutable ADC over SBB to improve load folding opportunities
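The rewrite is a plain re-association, which a 32-bit model can confirm. A minimal sketch, assuming SBB(Y,Z,W) computes Y - Z - W and ADC(X,Z,W) computes X + Z + W modulo 2^32; the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def adc(a: int, b: int, carry: int) -> int:
    """32-bit add with carry in."""
    return (a + b + carry) & MASK32


def sbb(a: int, b: int, carry: int) -> int:
    """32-bit subtract with borrow in."""
    return (a - b - carry) & MASK32


def sub_of_sbb(x: int, y: int, z: int, w: int) -> int:
    """SUB(X, SBB(Y, Z, W)) = X - (Y - Z - W)"""
    return (x - sbb(y, z, w)) & MASK32


def sub_of_adc(x: int, y: int, z: int, w: int) -> int:
    """SUB(ADC(X, Z, W), Y) = (X + Z + W) - Y"""
    return (adc(x, z, w) - y) & MASK32


values = (0, 1, 0x7FFFFFFF, 0x80000000, MASK32)
for x in values:
    for y in values:
        for z in values:
            for w in (0, 1):
                assert sub_of_sbb(x, y, z, w) == sub_of_adc(x, y, z, w)
```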
2022-03-21 14:20:46 +00:00
Alex Bradbury
da9ba89d48 [WebAssembly][NFC] Add test case for functype emission
This test aims to demonstrate the WebAssembly backend's behaviour around
emission of the .functype directive. It covers defined and declared
functions as well as libcalls.

It currently fails to emit functypes for all defined functions at the
head of the file, causing issues with the type checker
<https://github.com/llvm/llvm-project/issues/54022>. The patch in
<https://reviews.llvm.org/D122134> is a proposal to fix this issue.
2022-03-21 14:04:32 +00:00
Simon Pilgrim
ed51e26ab4 [X86] combineAddOrSubToADCOrSBB - commute + neg subtraction patterns
Handle SUB(AND(SRL(Y,Z),1),X) -> NEG(SBB(X,0,BT(Y,Z))) folds

I'll address the X86 lost folded-load regressions in a follow-up patch
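The commuted form works because bit - X equals -(X - 0 - bit). A small 32-bit check, with hypothetical helper names and the shift amount restricted to 0..31:

```python
MASK32 = 0xFFFFFFFF


def bt(y: int, z: int) -> int:
    """Carry produced by BT: bit z of y, for z in 0..31."""
    return (y >> z) & 1


def sbb(a: int, b: int, carry: int) -> int:
    """32-bit subtract with borrow in."""
    return (a - b - carry) & MASK32


def neg(a: int) -> int:
    """32-bit two's complement negation."""
    return (-a) & MASK32


def bit_minus_x_before(x: int, y: int, z: int) -> int:
    """SUB(AND(SRL(Y, Z), 1), X)"""
    return (((y >> z) & 1) - x) & MASK32


def bit_minus_x_after(x: int, y: int, z: int) -> int:
    """NEG(SBB(X, 0, BT(Y, Z)))"""
    return neg(sbb(x, 0, bt(y, z)))


for x in (0, 1, 0x80000000, MASK32):
    for y in (0, 0xA5A5A5A5, MASK32):
        for z in range(32):
            assert bit_minus_x_before(x, y, z) == bit_minus_x_after(x, y, z)
```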
2022-03-21 13:55:35 +00:00
Simon Pilgrim
5e9365c5eb [X86] combineAddOrSubToADCOrSBB - bail for illegal types
Ensure we don't attempt to fold illegal types to ADC/SBB nodes.

After D122084 it's possible for ADD(X,AND(SRL(Y,Z),1)) patterns to be matched before type legalization.
2022-03-21 13:31:21 +00:00
Simon Pilgrim
35a7be6ccb [SDAG] enable binop identity constant folds for shifts
Add shl/srl/sra to the list of ops that we canonicalize with a select to expose an identity merge
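
A shift by zero returns its first operand unchanged, so zero is the identity constant for the shift amount. The sketch below shows the two equivalent forms for shl as I understand the fold; which side is the canonical one is my assumption rather than something stated here, and the names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def shl(x: int, amount: int) -> int:
    """32-bit logical shift left, amount in 0..31."""
    return (x << amount) & MASK32


def binop_of_select(cond: bool, x: int, y: int) -> int:
    """shl X, (select Cond, Y, 0): the select feeds the shift amount."""
    return shl(x, y if cond else 0)


def select_of_binop(cond: bool, x: int, y: int) -> int:
    """select Cond, (shl X, Y), X: the false arm is just X itself."""
    return shl(x, y) if cond else x


for cond in (False, True):
    for x in (0, 1, 0xDEADBEEF, MASK32):
        for y in range(32):
            assert binop_of_select(cond, x, y) == select_of_binop(cond, x, y)
```

The same reasoning applies to srl and sra, since shifting right by zero is also a no-op.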

Differential Revision: https://reviews.llvm.org/D122070
2022-03-21 13:02:50 +00:00
Simon Pilgrim
76cbfd949d [X86] Add nounwind to adc/sbb tests to prevent cfi noise 2022-03-21 11:44:22 +00:00
Jay Foad
321c8ab81b [AMDGPU] Add an agpr copy propagation test 2022-03-21 11:42:57 +00:00
Jay Foad
692341e998 [AMDGPU] Update checks in agpr-copy-propagation.mir 2022-03-21 11:42:56 +00:00
Simon Pilgrim
81569f5b6e [X86] combineAddOrSubToADCOrSBB - Fold ADD/SUB + (AND(SRL(X,Y),1) -> ADC/SBB+BT(X,Y)
As suggested on PR35908, if we are adding/subtracting an extracted bit, attempt to use BT instead to fold the op and use an ADC/SBB op.

Differential Revision: https://reviews.llvm.org/D122084
2022-03-21 10:57:12 +00:00
Simon Pilgrim
65cf643073 [X86] Add (x - y - ((z & m) >> s)) sub -> sbb test case for D122084 2022-03-21 10:44:17 +00:00
Thomas Symalla
7de6107dce Revert "[AMDGPU] Improve v_cmpx usage on GFX10.3."
This reverts commit 011c64191ef9ccc6538d52f4b57f98f37d4ea36e and
e725e2afe02e18398525652c9bceda1eb055ea64.

Differential Revision: https://reviews.llvm.org/D122117
2022-03-21 09:50:44 +01:00
Thomas Symalla
011c64191e [AMDGPU] Improve v_cmpx usage on GFX10.3.
On GFX10.3 targets, the following instruction sequence

v_cmp_* SGPR, ...
s_and_saveexec ..., SGPR

leads to a fairly long stall caused by a VALU write to a SGPR and having the
following SALU wait for the SGPR.

An equivalent sequence is to save the exec mask manually instead of letting
s_and_saveexec do the work and use a v_cmpx instruction instead to do the
comparison.

This patch modifies the SIOptimizeExecMasking pass as this is the last position
where s_and_saveexec instructions are inserted. It does the transformation by
trying to find the pattern, extracting the operands and generating the new
instruction sequence.

It also changes some existing lit tests and introduces a few new tests to show
the changed behavior on GFX10.3 targets.

Reviewed By: sebastian-ne, critson

Differential Revision: https://reviews.llvm.org/D119696
2022-03-21 09:31:59 +01:00
Aaron Puchert
c1a31ee65b [PPCISelLowering] Avoid emitting calls to __multi3, __muloti4
After D108936, @llvm.smul.with.overflow.i64 was lowered to __multi3
instead of __mulodi4, which also doesn't exist on PowerPC 32-bit, not
even with compiler-rt. Block it as well so that we get inline code.

Because libgcc doesn't have __muloti4, we block that as well.

Fixes #54460.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D122090
2022-03-20 20:59:30 +01:00
Chen Zheng
973b02b6f1 [PowerPC][NFC] use right hardware loop intrinsics in test case 2022-03-20 10:00:57 -04:00
esmeyi
de20a3b677 [XCOFF] support XCOFFObjectWriter for fileHeader and sectionHeaders in 64-bit XCOFF.
This is the first patch to enable the XCOFF64 object writer.
Currently only fileHeader and sectionHeaders are supported.

Reviewed By: jhenderson, DiggerLin

Differential Revision: https://reviews.llvm.org/D120861
2022-03-20 09:31:29 -04:00
Luo, Yuanke
10bb623192 enable binop identity constant folds for add
Differential Revision: https://reviews.llvm.org/D119654
2022-03-20 19:07:16 +08:00
Simon Pilgrim
06fa67dc0a [X86] Add test add with bit0 extraction and improve comments
Based on feedback from D122084
2022-03-20 09:31:52 +00:00
Craig Topper
4eb59f0179 [SelectionDAG][RISCV] Make RegsForValue::getCopyToRegs explicitly zero_extend constants.
ComputePHILiveOutRegInfo assumes that constant incoming values to
Phis will be zero extended if they aren't a legal type. To guarantee
that, we should zero_extend rather than any_extend constants.

This fixes a bug for RISCV where any_extend of constants can be
treated as a sign_extend.

Differential Revision: https://reviews.llvm.org/D122053
2022-03-19 18:43:14 -07:00
Craig Topper
268371cf7b [RISCV] Add test case for miscompile caused by treating ANY_EXTEND of constants as SIGN_EXTEND.
The code that inserts AssertZExt based on predecessor information assumes
constants are zero extended for phi incoming values. This allows
AssertZExt to be created in blocks consuming a Phi.
SelectionDAG::getNode treats any_extend of i32 constants as sext for RISCV.
The code that creates phi incoming values in the predecessors creates an
any_extend for the constants which then gets treated as a sext by getNode.
This makes the AssertZExt incorrect and can cause zexts to be
incorrectly removed.
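
The mismatch is easy to see on a single constant. A minimal sketch, assuming an i32 constant widened to i64; the helper names are hypothetical and `assert_zext_holds` only models the claim that the upper 32 bits are zero.

```python
MASK32 = 0xFFFFFFFF


def zext32_to_64(value: int) -> int:
    """Zero extend an i32 bit pattern to i64."""
    return value & MASK32


def sext32_to_64(value: int) -> int:
    """Sign extend an i32 bit pattern to i64."""
    value &= MASK32
    if value & 0x80000000:
        return value | 0xFFFFFFFF00000000
    return value


def assert_zext_holds(wide: int) -> bool:
    """True if the upper 32 bits are zero, as AssertZExt from i32 claims."""
    return (wide >> 32) == 0


constant = 0x80000000  # i32 constant with the sign bit set

# What the phi analysis assumes versus what an any_extend treated as a
# sign extension actually materialises.
assumed = zext32_to_64(constant)
actual = sext32_to_64(constant)
print(hex(assumed), assert_zext_holds(assumed))
print(hex(actual), assert_zext_holds(actual))
```

Constants with bit 31 clear extend identically both ways, so only constants with the sign bit set expose the problem.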

This bug was introduced by D105918

Differential Revision: https://reviews.llvm.org/D122052
2022-03-19 18:43:14 -07:00
Simon Pilgrim
b929db5968 [X86] Add some initial test coverage for PR35908 add/sub + bittest patterns 2022-03-19 19:20:19 +00:00