47118 Commits

Author SHA1 Message Date
Vikram
64fc892cda [AMDGPU] Autogenerate carryout-selection.ll, uaddo.ll, usubo.ll (NFC)
Differential Revision: https://reviews.llvm.org/D143987
2023-02-24 02:03:56 -05:00
Min-Yih Hsu
058f7449cf [M68k] Provide exception pointer and selector registers
Using d0 for exception pointer and d1 for selector, as suggested by GCC.
2023-02-23 16:25:34 -08:00
Manolis Tsamis
7b79e8d455 [RISCV] Add vendor-defined XTheadFMemIdx (FP Indexed Memory Operations) extension
The vendor-defined XTHeadFMemIdx (no comparable standard extension exists
at the time of writing) extension adds indexed load/store instructions
for floating-point registers.

It is supported by the C9xx cores (e.g., found in the wild in the
Allwinner D1) by Alibaba T-Head.

The current (as of this commit) public documentation for this
extension is available at:
  https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.2.2/xthead-2023-01-30-2.2.2.pdf

Support for these instructions has already landed in GNU Binutils:
  https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=f511f80fa3fcaf6bcbe727fb902b8bd5ec8f9c20

Depends on D144249

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144647
2023-02-24 00:35:37 +01:00
Manolis Tsamis
f6262201d8 [RISCV] Add vendor-defined XTheadMemIdx (Indexed Memory Operations) extension
The vendor-defined XTHeadMemIdx (no comparable standard extension exists
at the time of writing) extension adds indexed load/store instructions
as well as load/store and update register instructions.

It is supported by the C9xx cores (e.g., found in the wild in the
Allwinner D1) by Alibaba T-Head.

The current (as of this commit) public documentation for this
extension is available at:
  https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.2.2/xthead-2023-01-30-2.2.2.pdf

Support for these instructions has already landed in GNU Binutils:
  https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=27cfd142d0a7e378d19aa9a1278e2137f849b71b

Depends on D144002

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144249
2023-02-24 00:17:58 +01:00
Samuel Parker
f48d3b6f46 Revert "[DAGCombine] Fold redundant select"
This reverts commit c7f9344d0f8f6a00adab138037e2e7b406ef2b69.
2023-02-23 17:59:41 +00:00
Craig Topper
230e61658b [LegalizeTypes] Add a special case for (add X, 1) to ExpandIntRes_ADDSUB.
On targets without ADDCARRY or ADDE, we need to emit a separate
SETCC to determine carry from the low half to the high half. Usually
we do (setult Lo, LHSLo). If RHSLo is 1 we can instead do (seteq Lo, 0).
This can reduce the live range of LHSLo.
2023-02-23 09:47:42 -08:00
Craig Topper
2fc5a5117c [LegalizeTypes][RISCV] Add a special case to ExpandIntRes_UADDSUBO for (uaddo X, 1).
On targets that lack ADDCARRY support we split a wide uaddo into
an ADD and a SETCC that both need to be split.

For (uaddo X, 1) we can observe that when the add overflows the result
will be 0. We can emit (seteq (or Lo, Hi), 0) to detect this.

This improves D142071.

There is an alternative here. We could use either ~(lo(X) & hi(X)) == 0 or
(lo(X) & hi(X)) == -1 before the addition. That would be closer to the
code before D142071.

Reviewed By: liaolucy

Differential Revision: https://reviews.llvm.org/D144614
2023-02-23 09:16:54 -08:00
Luke Lau
8d15e7275f [RISCV] Lower interleave and deinterleave intrinsics
Lower the two intrinsics introduced in D141924.

These intrinsics can be combined with loads and stores into the much more efficient segmented load and store instructions in a following patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D144092
2023-02-23 16:23:02 +00:00
Kerry McLaughlin
6c82d16d60 [SME2][AArch64] Add multi-vector rounding shift left intrinsics
Adds intrinsics for the following SME2 instructions:
 - srshl (single, 2 & 4 vector)
 - srshl (multi, 2 & 4 vector)
 - urshl (single, 2 & 4 vector)
 - urshl (multi, 2 & 4 vector)

NOTE: These intrinsics are still in development and are subject to future changes.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D144118
2023-02-23 14:33:33 +00:00
Piotr Sobczak
ab174c57f4 [AMDGPU] Add more tests for buffer intrinsics
Add more tests for buffer intrinsics with large voffsets.
2023-02-23 14:39:12 +01:00
Mirko Brkusanin
926746d22a [AMDGPU][GFX11] Legalize and select partial NSA MIMG instructions
If more registers are needed for VAddr then the NSA format allows then the
final register can act as a contigous set of remaining addresses. Update
legalizer to pack register for this new format and allow instruction
selection to use NSA encoding when number of addresses exceeds max size.
Also update SIShrinkInstructions to handle partial NSA.

Differential Revision: https://reviews.llvm.org/D144034
2023-02-23 13:33:34 +01:00
Diana Picus
da629d3381 [AMDGPU] Add GISel RUN lines to 2 existing tests. NFC
This adds a bit of coverage for GlobalISel.

Differential Revision: https://reviews.llvm.org/D144555
2023-02-23 09:46:54 +01:00
Piotr Sobczak
1b9b4f3bfa [AMDGPU][NFC] Convert llvm.amdgcn tests to autogen 2023-02-23 08:21:12 +01:00
Yeting Kuo
419948fe67 [VP] Reorder is_int_min_poison/is_zero_poison operand before mask for vp.abs/ctlz/cttz.
The patch ensures last two operands of vp.abs/ctlz/cttz are mask and evl.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144536
2023-02-23 13:58:21 +08:00
Leonard Chan
cdb9a0c086 [MC][CodeGen] Define R_RISCV_PLT32 and lower dso_local_equivalent to it
This introduces R_RISCV_PLT32, PC-relative data relocation that takes
the 32-bit relative offset to a function or its PLT entry from its
relocation location.

This is needed to support relative vtables on RISCV.

Github PR: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/363

The lld handling of this reloc is D143115.

Differential Revision: https://reviews.llvm.org/D143226
2023-02-23 01:26:27 +00:00
David Green
74b67e53c6 [LSR] Fix incorrect check in 73cd3d4391ad47ae7
I missed that the test needed a icelake-server cpu to fail, and left a testing
"false &&" in the if condition. Hopefully this is now the correct fix.
2023-02-22 23:42:21 +00:00
David Green
73cd3d4391 [LSR] Prevent creating SCEVs of addrecs from mismatching loops
LSR can include Regs of AddRec SCEVs from different loops, which do not combine
well when added in Scalar Evolution. As they should never produce constant
differences so we can just guard against trying to create them.

Fixes #60927
2023-02-22 22:50:37 +00:00
Cameron McInally
af4c4f4e21 [DAGCombine] Fix an ICE in combineMinNumMaxNum(...)
65420c8041f4 introduced an ICE in combineMinNumMaxNum(...) when
combineMinNumMaxNumImpl(...) returns an SDValue(). Make sure to check that a
value is returned before trying to perform an FNEG on it.

GitHub Issue: #60924

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144571
2023-02-22 11:00:51 -08:00
Manolis Tsamis
a6446668a3 [RISCV] XTHeadMemPair: Fix invalid mempair combine for types other than i32/i64
A mistake in the control flow of performMemPairCombine resulted in paired
loads/stores for types that were not supported by the instructions (i8/i16).
These loads/stores could not match the constraints of the patterns defined
in the THead td file and the compiler would throw a 'Cannot select' error.

This is now fixed and two new test functions have been added in xtheadmempair.ll
which would previously crash the compiler. The compiler was additionally tested
with a wide range of benchmarks and no issues were observed.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144559
2023-02-22 19:57:37 +01:00
Konstantina Mitropoulou
944f429b21 [AMDGPU] Improve the lowering of raw_buffer_load_{i8,i16} and struct_buffer_load_{i8,i16} intrinsics
Currently, raw_buffer_load_{i8,i16} and struct_buffer_load_{i8,i16}
intrinsics are lowered as buffer_load_{u8,u16}. This patch combines
buffer_load_{u8,u16} and sign extension instructions in order to
generate buffer_load_{i8,i16} instructions.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D144313
2023-02-22 09:01:33 -08:00
Joe Nash
80a8e6805a [AMDGPU] Don't set src mods on permlane16
v_permlane16_b32 and v_permlanex16_b32 should not set abs and neg src
modifiers on any input, but they can set op_sel on src0 or src1 to
represent fi or bc when desired. The ISel patterns were setting
the src_modifier bits to -1, effectively setting abs and neg as well,
whenever it was intended to set op_sel, due to an error in ISel. ISel
should now correctly only set the op_sel bits.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144519
2023-02-22 11:41:52 -05:00
Jessica Del
fc672b6a8b [AMDGPU] Improved wide multiplies
These checks show optimized instructions if an operand is known to be
(partially) zero.

Change-Id: Ie2f6d0d3ee9d5b279d1f4c1dd0787492e39cc77a

Differential Revision: https://reviews.llvm.org/D140208
2023-02-22 16:39:06 +01:00
Jay Foad
e2eee902a4 [AMDGPU] Fix an assertion failure when folding into src2 of V_FMAC_F16
D139469 "[AMDGPU] Enable OMod on more VOP3 instructions" caused an
assertion failure when trying to fold into src2 of V_FMAC_F16. It would
temporarily convert the instruction to V_FMA_F16_gfx9 and add an opsel
operand, but if the fold still failed then it would forget to remove the
opsel operand.

Differential Revision: https://reviews.llvm.org/D144558
2023-02-22 14:26:03 +00:00
Piyou Chen
3b8c0b342e [RISCV] Add new pass to transform undef to pseudo for vector values.
RISC-V vector instruction has register overlapping constraint for certain
instructions, and will cause illegal instruction trap if violated, we use
early clobber to model this constraint, but it can't prevent register allocator
allocated same or overlapped if the input register is undef value, so convert
IMPLICIT_DEF to temporary pseudo could prevent that happen, it's not best way
to resolve this. Ideally we should model the constraint right, but before we
model the constraint right, it's the approach to prevent that happen.

See also: https://github.com/llvm/llvm-project/issues/50157

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D129735
2023-02-22 04:03:22 -08:00
David Green
c33fd3b47f [AArch64] Lower all fp zero buildvectors through BUILD_VECTOR.
Just like with integers, we can treat zero fp buildvector as legal so that they
can be recognized in tablegen patterns using immAllZerosV.
2023-02-22 11:26:41 +00:00
Manolis Tsamis
16a6cf6a99 [RISCV] Add vendor-defined XTheadSync (Multi-core synchronization) extension
The vendor-defined XTheadSync (no comparable standard extension exists
at the time of writing) extension adds multi-core synchronization
instructions.

It is supported by the C9xx cores (e.g., found in the wild in the
Allwinner D1) by Alibaba T-Head.

The current (as of this commit) public documentation for this
extension is available at:
  https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.2.2/xthead-2023-01-30-2.2.2.pdf

Support for these instructions has already landed in GNU Binutils:
  https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=547c18d9bb95571261dbd17f4767194037eb82bd

Depends on D144496

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144501
2023-02-22 11:15:40 +01:00
Samuel Parker
28ee604071 [WebAssembly] pmin/pmax fixes
Reverse the operand ordering to ? rhs : lhs.

Differential Revision: https://reviews.llvm.org/D144466
2023-02-22 10:02:16 +00:00
Manolis Tsamis
f5b484c56f [RISCV] Add vendor-defined XTheadCmo (Cache Management Operations) extension
The vendor-defined XTHeadCmo (there are some similarities with the
Zicbom standard extension) extension adds cache management instructions.

It is supported by the C9xx cores (e.g., found in the wild in the
Allwinner D1) by Alibaba T-Head.

The current (as of this commit) public documentation for this
extension is available at:
  https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.2.2/xthead-2023-01-30-2.2.2.pdf

Support for these instructions has already landed in GNU Binutils:
  https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a9ba8bc2d396fb8ae2b892f3bc6be8cdfe4b555c

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144496
2023-02-22 10:57:48 +01:00
Ricardo Jesus
272bd573dc [AArch64] Fix abs(sub nsw) -> absd
This partially reverts a regression introduced in 8f25e382c5b1 for
AArch64 targets. In particular, we restore the logic of `(abs (sub nsw
x, y)) -> abds(x, y)` for all targets except X86, which keeps the logic
introduced in 8f25e382c5b1. See also https://reviews.llvm.org/D142288.

Differential Revision: https://reviews.llvm.org/D144379
2023-02-22 09:17:25 +00:00
Jun Ma
e9d7f96a11 [WebAssembly] Add more combine pattern for vector shift
After change with D144169, the codegen generates redundant instructions
like and and wrap. This fixes it.

Differential Revision: https://reviews.llvm.org/D144360
2023-02-22 09:53:00 +08:00
Ting Wang
00ed95c3a2 [PowerPC][NFC] add const-splat-array-init.ll
Add test case and will show combiner can improve these.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D144235
2023-02-21 20:24:12 -05:00
Krzysztof Parzyszek
a069eda1ba [Hexagon] Improve selection algorithm in HvxSelector::select
The previous algorithm could order nodes incorrectly, this one strictly
follows the topological order.
2023-02-21 12:56:33 -08:00
Michal Paszkowski
b8435e392c [SPIR-V] Emit spv_undef intrinsic for aggregate undef operands
This change adds a new spv_undef intrinsic which is emitted in place of
aggregate undef operands and later selected to single OpUndef SPIR-V
instruction. The behavior matches that of Khronos SPIR-V Translator and
should support nested aggregates.

Differential Revision: https://reviews.llvm.org/D143107
2023-02-21 21:17:33 +01:00
David Green
afa557fad6 [AArch64] Add a test for loading into a zerovector. NFC 2023-02-21 14:42:53 +00:00
Jessica Del
c9fd858172 [AMDGPU] MIR-Tests for Multiplication using KBA
These tests show inefficient behavior that will be optimized by a
later change.

By using Known Bits Analysis, we can avoid unnecessary multiplications
or additions with 0.
2023-02-21 14:47:56 +01:00
Manolis Tsamis
bbb58a2302 [RISCV] Add vendor-defined XTheadMemPair (two-GPR Memory Operations) extension
The vendor-defined XTHeadMemPair (no comparable standard extension exists
at the time of writing) extension adds two-GPR load/store pair instructions.

It is supported by the C9xx cores (e.g., found in the wild in the
Allwinner D1) by Alibaba T-Head.

The current (as of this commit) public documentation for this
extension is available at:
  https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.2.2/xthead-2023-01-30-2.2.2.pdf

Support for these instructions has already landed in GNU Binutils:
  https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=6e17ae625570ff8f3c12c8765b8d45d4db8694bd

Depends on D143847

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144002
2023-02-21 12:21:49 +01:00
Luke Lau
b486246135 [RISCV] Use a smaller VL when interleaving fixed vectors
Interleaves generated with vwaddu.vv and vwmaccu.vx were using VLs that
were twice the number of elements actually needed in the vector.
This also pulls the interleaving logic out into its own function so it
can be reused by later patches, and adapts it for scalable vectors.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144386
2023-02-21 09:46:23 +00:00
pvanhout
8e68c12045 [AMDGPU] Remove function with incompatible features
Adds a new pass that removes functions
if they use features that are not supported on the current GPU.

This change is aimed at preventing crashes when building code at O0 that
uses idioms such as `if (ISA_VERSION >= N) intrinsic_a(); else intrinsic_b();`
where ISA_VERSION is not constexpr, and intrinsic_a is not selectable
on older targets.
This is a pattern that's used all over the ROCm device libs. The main
motive behind this change is to allow code using ROCm device libs
to be built at O0.

Note: the feature checking logic is done ad-hoc in the pass. There is no other
pass that needs (or will need in the foreseeable future) to do similar
feature-checking logic so I did not see a need to generalize the feature
checking logic yet. It can (and should probably) be generalized later and
moved to a TargetInfo-like class or helper file.

Reviewed By: arsenm, Joe_Nash

Differential Revision: https://reviews.llvm.org/D139000
2023-02-21 10:42:39 +01:00
Kazu Hirata
250620ab19 [X86] Precommit a test
This patch precommits a test for:

https://github.com/llvm/llvm-project/issues/60374
2023-02-21 00:01:43 -08:00
Jessica Del
959216f9b1 [AMDGPU] MIR-Tests for Multiplication using KBA
These tests show inefficient behavior that will be optimized by a
later change.

By using Known Bits Analysis, we can avoid unnecessary multiplications
or additions with 0.
2023-02-21 08:41:56 +01:00
Konstantina Mitropoulou
a0e258da19 [AMDGPU] Add tests for future commit
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D144312
2023-02-20 21:36:25 -08:00
esmeyi
fd226142fc [AIX] Lower some memory intrinsics to millicode functions on AIX
Summary: Currently we lower MEMCPY/MEMMOVE/MEMSET/BZERO to the corresponding libc functions. And the libc functions call the millicode functions on AIX. We can lower these intrinsics directly to save one call layer.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D143997
2023-02-20 22:25:49 -05:00
Kazu Hirata
fd5d92e622 [X86] Precommit a test
This is for:

https://github.com/llvm/llvm-project/issues/60854
2023-02-20 17:00:03 -08:00
Kazu Hirata
a942a94424 [X86] Improve (select carry C1+1 C1)
Without this patch:

  return X < 4 ? 3 : 2;

  return X < 9 ? 7 : 6;

are compiled as:

  31 c0                   xor    %eax,%eax
  83 ff 04                cmp    $0x4,%edi
  0f 93 c0                setae  %al
  83 f0 03                xor    $0x3,%eax

  31 c0                   xor    %eax,%eax
  83 ff 09                cmp    $0x9,%edi
  0f 92 c0                setb   %al
  83 c8 06                or     $0x6,%eax

respectively.  With this patch, we generate:

  31 c0                   xor    %eax,%eax
  83 ff 04                cmp    $0x4,%edi
  83 d0 02                adc    $0x2,%eax

  31 c0                   xor    %eax,%eax
  83 ff 04                cmp    $0x4,%edi
  83 d0 02                adc    $0x2,%eax

respectively, saving 3 bytes while reducing the tree height.

This patch recognizes the equivalence of OR and ADD
(if bits do not overlap) and delegates to combineAddOrSubToADCOrSBB
for further processing.  The same applies to the equivalence of XOR
and SUB.

Differential Revision: https://reviews.llvm.org/D143838
2023-02-20 16:38:21 -08:00
Luo, Yuanke
c09e224c25 [X86] Add test case that clobber base pointer register. 2023-02-21 07:49:51 +08:00
Brad Smith
4b09cb2b16 [PowerPC] Correctly use ELFv2 ABI on all OS's that use the ELFv2 ABI
Add a member function isPPC64ELFv2ABI() to determine what ABI is used on the
64-bit PowerPC big endian operating environment.

Reviewed By: nemanjai, dim, pkubaj

Differential Revision: https://reviews.llvm.org/D144321
2023-02-20 18:11:24 -05:00
Tiwari Abhinav Ashok Kumar
bfb1559fbe [NFC] Fix missing colon in CHECK directives
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D144412
2023-02-21 00:13:04 +05:30
Ricardo Jesus
4077e8ab2f [AArch64] Add tests for saba (NFC)
Tests in sve-saba.ll currently exhibit inefficient codegen.

Differential Revision: https://reviews.llvm.org/D144399
2023-02-20 17:41:00 +00:00
David Green
c6c6723189 [AArch64] More consistently use buildvector for zero and all-ones constants
The AArch64 backend will use legal BUILDVECTORs for zero vectors or all-ones
vectors, so during selection tablegen patterns get rely on immAllZerosV and
immAllOnesV pattern frags in patterns like vnot. It was not always consistent
though, which this patch attempt to fix by recognizing where constant splat +
insert vector element is used. The main outcome of this will be that full
vector movi v0.2d, #0000000000000000 will be used as opposed to movi d0, #0, as
per https://reviews.llvm.org/D53579. This helps simplify what tablegen will
see, to make pattern matching simpler.

Differential Revision: https://reviews.llvm.org/D144018
2023-02-20 14:13:53 +00:00
Kerry McLaughlin
028c722ac8 [SME2][AArch64] Add multi-multi multiply-add long long intrinsics
Adds intrinsics for the following SME2 instructions (2 & 4 vectors):
 - smlall
 - smlsll
 - umlall
 - umlsll
 - usmlall

NOTE: These intrinsics are still in development and are subject to future changes.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D143277
2023-02-20 14:01:56 +00:00