llvm-project

Author	SHA1	Message	Date
Vikram	64fc892cda	[AMDGPU] Autogenerate carryout-selection.ll, uaddo.ll, usubo.ll (NFC) Differential Revision: https://reviews.llvm.org/D143987	2023-02-24 02:03:56 -05:00
Min-Yih Hsu	058f7449cf	[M68k] Provide exception pointer and selector registers Using d0 for exception pointer and d1 for selector, as suggested by GCC.	2023-02-23 16:25:34 -08:00
Manolis Tsamis	7b79e8d455	[RISCV] Add vendor-defined XTheadFMemIdx (FP Indexed Memory Operations) extension The vendor-defined XTHeadFMemIdx (no comparable standard extension exists at the time of writing) extension adds indexed load/store instructions for floating-point registers. It is supported by the C9xx cores (e.g., found in the wild in the Allwinner D1) by Alibaba T-Head. The current (as of this commit) public documentation for this extension is available at: https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.2.2/xthead-2023-01-30-2.2.2.pdf Support for these instructions has already landed in GNU Binutils: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=f511f80fa3fcaf6bcbe727fb902b8bd5ec8f9c20 Depends on D144249 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D144647	2023-02-24 00:35:37 +01:00
Manolis Tsamis	f6262201d8	[RISCV] Add vendor-defined XTheadMemIdx (Indexed Memory Operations) extension The vendor-defined XTHeadMemIdx (no comparable standard extension exists at the time of writing) extension adds indexed load/store instructions as well as load/store and update register instructions. It is supported by the C9xx cores (e.g., found in the wild in the Allwinner D1) by Alibaba T-Head. The current (as of this commit) public documentation for this extension is available at: https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.2.2/xthead-2023-01-30-2.2.2.pdf Support for these instructions has already landed in GNU Binutils: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=27cfd142d0a7e378d19aa9a1278e2137f849b71b Depends on D144002 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D144249	2023-02-24 00:17:58 +01:00
Samuel Parker	f48d3b6f46	Revert "[DAGCombine] Fold redundant select" This reverts commit c7f9344d0f8f6a00adab138037e2e7b406ef2b69.	2023-02-23 17:59:41 +00:00
Craig Topper	230e61658b	[LegalizeTypes] Add a special case for (add X, 1) to ExpandIntRes_ADDSUB. On targets without ADDCARRY or ADDE, we need to emit a separate SETCC to determine carry from the low half to the high half. Usually we do (setult Lo, LHSLo). If RHSLo is 1 we can instead do (seteq Lo, 0). This can reduce the live range of LHSLo.	2023-02-23 09:47:42 -08:00
Craig Topper	2fc5a5117c	[LegalizeTypes][RISCV] Add a special case to ExpandIntRes_UADDSUBO for (uaddo X, 1). On targets that lack ADDCARRY support we split a wide uaddo into an ADD and a SETCC that both need to be split. For (uaddo X, 1) we can observe that when the add overflows the result will be 0. We can emit (seteq (or Lo, Hi), 0) to detect this. This improves D142071. There is an alternative here. We could use either ~(lo(X) & hi(X)) == 0 or (lo(X) & hi(X)) == -1 before the addition. That would be closer to the code before D142071. Reviewed By: liaolucy Differential Revision: https://reviews.llvm.org/D144614	2023-02-23 09:16:54 -08:00
Luke Lau	8d15e7275f	[RISCV] Lower interleave and deinterleave intrinsics Lower the two intrinsics introduced in D141924. These intrinsics can be combined with loads and stores into the much more efficient segmented load and store instructions in a following patch. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D144092	2023-02-23 16:23:02 +00:00
Kerry McLaughlin	6c82d16d60	[SME2][AArch64] Add multi-vector rounding shift left intrinsics Adds intrinsics for the following SME2 instructions: - srshl (single, 2 & 4 vector) - srshl (multi, 2 & 4 vector) - urshl (single, 2 & 4 vector) - urshl (multi, 2 & 4 vector) NOTE: These intrinsics are still in development and are subject to future changes. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D144118	2023-02-23 14:33:33 +00:00
Piotr Sobczak	ab174c57f4	[AMDGPU] Add more tests for buffer intrinsics Add more tests for buffer intrinsics with large voffsets.	2023-02-23 14:39:12 +01:00
Mirko Brkusanin	926746d22a	[AMDGPU][GFX11] Legalize and select partial NSA MIMG instructions If more registers are needed for VAddr than the NSA format allows, then the final register can act as a contiguous set of the remaining addresses. Update the legalizer to pack registers for this new format, and allow instruction selection to use the NSA encoding when the number of addresses exceeds the maximum size. Also update SIShrinkInstructions to handle partial NSA. Differential Revision: https://reviews.llvm.org/D144034	2023-02-23 13:33:34 +01:00
Diana Picus	da629d3381	[AMDGPU] Add GISel RUN lines to 2 existing tests. NFC This adds a bit of coverage for GlobalISel. Differential Revision: https://reviews.llvm.org/D144555	2023-02-23 09:46:54 +01:00
Piotr Sobczak	1b9b4f3bfa	[AMDGPU][NFC] Convert llvm.amdgcn tests to autogen	2023-02-23 08:21:12 +01:00
Yeting Kuo	419948fe67	[VP] Reorder is_int_min_poison/is_zero_poison operand before mask for vp.abs/ctlz/cttz. The patch ensures last two operands of vp.abs/ctlz/cttz are mask and evl. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D144536	2023-02-23 13:58:21 +08:00
Leonard Chan	cdb9a0c086	[MC][CodeGen] Define R_RISCV_PLT32 and lower dso_local_equivalent to it This introduces R_RISCV_PLT32, a PC-relative data relocation that takes the 32-bit relative offset to a function or its PLT entry from its relocation location. This is needed to support relative vtables on RISCV. Github PR: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/363 The lld handling of this reloc is D143115. Differential Revision: https://reviews.llvm.org/D143226	2023-02-23 01:26:27 +00:00
David Green	74b67e53c6	[LSR] Fix incorrect check in 73cd3d4391ad47ae7 I missed that the test needed an icelake-server CPU to fail, and left a testing "false &&" in the if condition. Hopefully this is now the correct fix.	2023-02-22 23:42:21 +00:00
David Green	73cd3d4391	[LSR] Prevent creating SCEVs of addrecs from mismatching loops LSR can include Regs of AddRec SCEVs from different loops, which do not combine well when added in Scalar Evolution. As they should never produce constant differences, we can just guard against trying to create them. Fixes #60927	2023-02-22 22:50:37 +00:00
Cameron McInally	af4c4f4e21	[DAGCombine] Fix an ICE in combineMinNumMaxNum(...) 65420c8041f4 introduced an ICE in combineMinNumMaxNum(...) when combineMinNumMaxNumImpl(...) returns an SDValue(). Make sure to check that a value is returned before trying to perform an FNEG on it. GitHub Issue: #60924 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144571	2023-02-22 11:00:51 -08:00
Manolis Tsamis	a6446668a3	[RISCV] XTHeadMemPair: Fix invalid mempair combine for types other than i32/i64 A mistake in the control flow of performMemPairCombine resulted in paired loads/stores for types that were not supported by the instructions (i8/i16). These loads/stores could not match the constraints of the patterns defined in the THead td file and the compiler would throw a 'Cannot select' error. This is now fixed and two new test functions have been added in xtheadmempair.ll which would previously crash the compiler. The compiler was additionally tested with a wide range of benchmarks and no issues were observed. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D144559	2023-02-22 19:57:37 +01:00
Konstantina Mitropoulou	944f429b21	[AMDGPU] Improve the lowering of raw_buffer_load_{i8,i16} and struct_buffer_load_{i8,i16} intrinsics Currently, raw_buffer_load_{i8,i16} and struct_buffer_load_{i8,i16} intrinsics are lowered as buffer_load_{u8,u16}. This patch combines buffer_load_{u8,u16} and sign extension instructions in order to generate buffer_load_{i8,i16} instructions. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D144313	2023-02-22 09:01:33 -08:00
Joe Nash	80a8e6805a	[AMDGPU] Don't set src mods on permlane16 v_permlane16_b32 and v_permlanex16_b32 should not set abs and neg src modifiers on any input, but they can set op_sel on src0 or src1 to represent fi or bc when desired. The ISel patterns were setting the src_modifier bits to -1, effectively setting abs and neg as well, whenever it was intended to set op_sel, due to an error in ISel. ISel should now correctly only set the op_sel bits. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144519	2023-02-22 11:41:52 -05:00
Jessica Del	fc672b6a8b	[AMDGPU] Improved wide multiplies These checks show optimized instructions if an operand is known to be (partially) zero. Change-Id: Ie2f6d0d3ee9d5b279d1f4c1dd0787492e39cc77a Differential Revision: https://reviews.llvm.org/D140208	2023-02-22 16:39:06 +01:00
Jay Foad	e2eee902a4	[AMDGPU] Fix an assertion failure when folding into src2 of V_FMAC_F16 D139469 "[AMDGPU] Enable OMod on more VOP3 instructions" caused an assertion failure when trying to fold into src2 of V_FMAC_F16. It would temporarily convert the instruction to V_FMA_F16_gfx9 and add an opsel operand, but if the fold still failed then it would forget to remove the opsel operand. Differential Revision: https://reviews.llvm.org/D144558	2023-02-22 14:26:03 +00:00
Piyou Chen	3b8c0b342e	[RISCV] Add new pass to transform undef to pseudo for vector values. Certain RISC-V vector instructions have a register-overlap constraint, and violating it causes an illegal-instruction trap. We use early clobber to model this constraint, but that cannot stop the register allocator from assigning the same or an overlapping register when the input register is an undef value. Converting IMPLICIT_DEF to a temporary pseudo prevents this. It is not the best way to resolve the problem: ideally we should model the constraint correctly, but until we do, this approach prevents the bad allocation. See also: https://github.com/llvm/llvm-project/issues/50157 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D129735	2023-02-22 04:03:22 -08:00
David Green	c33fd3b47f	[AArch64] Lower all fp zero buildvectors through BUILD_VECTOR. Just like with integers, we can treat zero fp buildvector as legal so that they can be recognized in tablegen patterns using immAllZerosV.	2023-02-22 11:26:41 +00:00
Manolis Tsamis	16a6cf6a99	[RISCV] Add vendor-defined XTheadSync (Multi-core synchronization) extension The vendor-defined XTheadSync (no comparable standard extension exists at the time of writing) extension adds multi-core synchronization instructions. It is supported by the C9xx cores (e.g., found in the wild in the Allwinner D1) by Alibaba T-Head. The current (as of this commit) public documentation for this extension is available at: https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.2.2/xthead-2023-01-30-2.2.2.pdf Support for these instructions has already landed in GNU Binutils: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=547c18d9bb95571261dbd17f4767194037eb82bd Depends on D144496 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D144501	2023-02-22 11:15:40 +01:00
Samuel Parker	28ee604071	[WebAssembly] pmin/pmax fixes Reverse the operand ordering to ? rhs : lhs. Differential Revision: https://reviews.llvm.org/D144466	2023-02-22 10:02:16 +00:00
Manolis Tsamis	f5b484c56f	[RISCV] Add vendor-defined XTheadCmo (Cache Management Operations) extension The vendor-defined XTHeadCmo (there are some similarities with the Zicbom standard extension) extension adds cache management instructions. It is supported by the C9xx cores (e.g., found in the wild in the Allwinner D1) by Alibaba T-Head. The current (as of this commit) public documentation for this extension is available at: https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.2.2/xthead-2023-01-30-2.2.2.pdf Support for these instructions has already landed in GNU Binutils: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a9ba8bc2d396fb8ae2b892f3bc6be8cdfe4b555c Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D144496	2023-02-22 10:57:48 +01:00
Ricardo Jesus	272bd573dc	[AArch64] Fix abs(sub nsw) -> absd This partially reverts a regression introduced in 8f25e382c5b1 for AArch64 targets. In particular, we restore the logic of `(abs (sub nsw x, y)) -> abds(x, y)` for all targets except X86, which keeps the logic introduced in 8f25e382c5b1. See also https://reviews.llvm.org/D142288. Differential Revision: https://reviews.llvm.org/D144379	2023-02-22 09:17:25 +00:00
Jun Ma	e9d7f96a11	[WebAssembly] Add more combine patterns for vector shift After the change in D144169, codegen generates redundant instructions such as `and` and `wrap`. This patch removes them. Differential Revision: https://reviews.llvm.org/D144360	2023-02-22 09:53:00 +08:00
Ting Wang	00ed95c3a2	[PowerPC][NFC] add const-splat-array-init.ll Add a test case that will show the combiner can improve these. Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D144235	2023-02-21 20:24:12 -05:00
Krzysztof Parzyszek	a069eda1ba	[Hexagon] Improve selection algorithm in HvxSelector::select The previous algorithm could order nodes incorrectly; this one strictly follows the topological order.	2023-02-21 12:56:33 -08:00
Michal Paszkowski	b8435e392c	[SPIR-V] Emit spv_undef intrinsic for aggregate undef operands This change adds a new spv_undef intrinsic, which is emitted in place of aggregate undef operands and later selected to a single OpUndef SPIR-V instruction. The behavior matches that of the Khronos SPIR-V Translator and should support nested aggregates. Differential Revision: https://reviews.llvm.org/D143107	2023-02-21 21:17:33 +01:00
David Green	afa557fad6	[AArch64] Add a test for loading into a zerovector. NFC	2023-02-21 14:42:53 +00:00
Jessica Del	c9fd858172	[AMDGPU] MIR-Tests for Multiplication using KBA These tests show inefficient behavior that will be optimized by a later change. By using Known Bits Analysis, we can avoid unnecessary multiplications or additions with 0.	2023-02-21 14:47:56 +01:00
Manolis Tsamis	bbb58a2302	[RISCV] Add vendor-defined XTheadMemPair (two-GPR Memory Operations) extension The vendor-defined XTHeadMemPair (no comparable standard extension exists at the time of writing) extension adds two-GPR load/store pair instructions. It is supported by the C9xx cores (e.g., found in the wild in the Allwinner D1) by Alibaba T-Head. The current (as of this commit) public documentation for this extension is available at: https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.2.2/xthead-2023-01-30-2.2.2.pdf Support for these instructions has already landed in GNU Binutils: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=6e17ae625570ff8f3c12c8765b8d45d4db8694bd Depends on D143847 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D144002	2023-02-21 12:21:49 +01:00
Luke Lau	b486246135	[RISCV] Use a smaller VL when interleaving fixed vectors Interleaves generated with vwaddu.vv and vwmaccu.vx were using VLs that were twice the number of elements actually needed in the vector. This also pulls the interleaving logic out into its own function so it can be reused by later patches, and adapts it for scalable vectors. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D144386	2023-02-21 09:46:23 +00:00
pvanhout	8e68c12045	[AMDGPU] Remove function with incompatible features Adds a new pass that removes functions if they use features that are not supported on the current GPU. This change is aimed at preventing crashes when building code at O0 that uses idioms such as `if (ISA_VERSION >= N) intrinsic_a(); else intrinsic_b();` where ISA_VERSION is not constexpr, and intrinsic_a is not selectable on older targets. This is a pattern that's used all over the ROCm device libs. The main motive behind this change is to allow code using ROCm device libs to be built at O0. Note: the feature checking logic is done ad-hoc in the pass. There is no other pass that needs (or will need in the foreseeable future) to do similar feature-checking logic, so I did not see a need to generalize the feature checking logic yet. It can (and probably should) be generalized later and moved to a TargetInfo-like class or helper file. Reviewed By: arsenm, Joe_Nash Differential Revision: https://reviews.llvm.org/D139000	2023-02-21 10:42:39 +01:00
Kazu Hirata	250620ab19	[X86] Precommit a test This patch precommits a test for: https://github.com/llvm/llvm-project/issues/60374	2023-02-21 00:01:43 -08:00
Jessica Del	959216f9b1	[AMDGPU] MIR-Tests for Multiplication using KBA These tests show inefficient behavior that will be optimized by a later change. By using Known Bits Analysis, we can avoid unnecessary multiplications or additions with 0.	2023-02-21 08:41:56 +01:00
Konstantina Mitropoulou	a0e258da19	[AMDGPU] Add tests for future commit Reviewed By: foad Differential Revision: https://reviews.llvm.org/D144312	2023-02-20 21:36:25 -08:00
esmeyi	fd226142fc	[AIX] Lower some memory intrinsics to millicode functions on AIX Summary: Currently we lower MEMCPY/MEMMOVE/MEMSET/BZERO to the corresponding libc functions. And the libc functions call the millicode functions on AIX. We can lower these intrinsics directly to save one call layer. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D143997	2023-02-20 22:25:49 -05:00
Kazu Hirata	fd5d92e622	[X86] Precommit a test This is for: https://github.com/llvm/llvm-project/issues/60854	2023-02-20 17:00:03 -08:00
Kazu Hirata	a942a94424	[X86] Improve (select carry C1+1 C1) Without this patch: return X < 4 ? 3 : 2; return X < 9 ? 7 : 6; are compiled as: 31 c0 xor %eax,%eax 83 ff 04 cmp $0x4,%edi 0f 93 c0 setae %al 83 f0 03 xor $0x3,%eax 31 c0 xor %eax,%eax 83 ff 09 cmp $0x9,%edi 0f 92 c0 setb %al 83 c8 06 or $0x6,%eax respectively. With this patch, we generate: 31 c0 xor %eax,%eax 83 ff 04 cmp $0x4,%edi 83 d0 02 adc $0x2,%eax 31 c0 xor %eax,%eax 83 ff 09 cmp $0x9,%edi 83 d0 06 adc $0x6,%eax respectively, saving 3 bytes while reducing the tree height. This patch recognizes the equivalence of OR and ADD (if bits do not overlap) and delegates to combineAddOrSubToADCOrSBB for further processing. The same applies to the equivalence of XOR and SUB. Differential Revision: https://reviews.llvm.org/D143838	2023-02-20 16:38:21 -08:00
Luo, Yuanke	c09e224c25	[X86] Add a test case that clobbers the base pointer register.	2023-02-21 07:49:51 +08:00
Brad Smith	4b09cb2b16	[PowerPC] Correctly use ELFv2 ABI on all OS's that use the ELFv2 ABI Add a member function isPPC64ELFv2ABI() to determine what ABI is used on the 64-bit PowerPC big endian operating environment. Reviewed By: nemanjai, dim, pkubaj Differential Revision: https://reviews.llvm.org/D144321	2023-02-20 18:11:24 -05:00
Tiwari Abhinav Ashok Kumar	bfb1559fbe	[NFC] Fix missing colon in CHECK directives Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D144412	2023-02-21 00:13:04 +05:30
Ricardo Jesus	4077e8ab2f	[AArch64] Add tests for saba (NFC) Tests in sve-saba.ll currently exhibit inefficient codegen. Differential Revision: https://reviews.llvm.org/D144399	2023-02-20 17:41:00 +00:00
David Green	c6c6723189	[AArch64] More consistently use buildvector for zero and all-ones constants The AArch64 backend will use legal BUILDVECTORs for zero vectors or all-ones vectors, so during selection tablegen patterns can rely on the immAllZerosV and immAllOnesV pattern frags in patterns like vnot. It was not always consistent though, which this patch attempts to fix by recognizing where a constant splat + insert vector element is used. The main outcome of this will be that the full vector movi v0.2d, #0000000000000000 will be used as opposed to movi d0, #0, as per https://reviews.llvm.org/D53579. This helps simplify what tablegen will see, to make pattern matching simpler. Differential Revision: https://reviews.llvm.org/D144018	2023-02-20 14:13:53 +00:00
Kerry McLaughlin	028c722ac8	[SME2][AArch64] Add multi-multi multiply-add long long intrinsics Adds intrinsics for the following SME2 instructions (2 & 4 vectors): - smlall - smlsll - umlall - umlsll - usmlall NOTE: These intrinsics are still in development and are subject to future changes. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D143277	2023-02-20 14:01:56 +00:00
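Commits 230e61658b and 2fc5a5117c add two special cases for targets without ADDCARRY/ADDE. For (add X, 1), the carry from the low half can be computed as (seteq Lo, 0). For (uaddo X, 1), the overflow can be computed as (seteq (or Lo, Hi), 0). The small model below checks both. It is a hypothetical Python illustration, not LLVM's ExpandIntRes_ADDSUB or ExpandIntRes_UADDSUBO code. The half width N = 8 and the helper names split, expand_add_generic, expand_add_one and expand_uaddo_one are assumptions, chosen so that every 16-bit input can be tested exhaustively.

```python
"""Model of carry/overflow detection when a wide add is split into halves.

A 2*N-bit value is represented as (hi, lo) N-bit halves, as type legalization
does on targets without ADDCARRY/ADDE. N is illustrative and kept small so
every input can be checked.
"""

N = 8                        # bits per half (hypothetical; e.g. i64 -> 2 x i32 in practice)
MASK = (1 << N) - 1


def split(x):
    """Return the (hi, lo) halves of a 2*N-bit value."""
    return (x >> N) & MASK, x & MASK


def expand_add_generic(lhs, rhs):
    """(add lhs, rhs): carry out of the low half via (setult Lo, LHSLo)."""
    lhs_hi, lhs_lo = split(lhs)
    rhs_hi, rhs_lo = split(rhs)
    lo = (lhs_lo + rhs_lo) & MASK
    carry = int(lo < lhs_lo)                 # keeps lhs_lo live until here
    hi = (lhs_hi + rhs_hi + carry) & MASK
    return (hi << N) | lo


def expand_add_one(lhs):
    """(add lhs, 1): the low half carries exactly when it wraps to 0."""
    lhs_hi, lhs_lo = split(lhs)
    lo = (lhs_lo + 1) & MASK
    carry = int(lo == 0)                     # (seteq Lo, 0); lhs_lo is not needed
    hi = (lhs_hi + carry) & MASK
    return (hi << N) | lo


def expand_uaddo_one(lhs):
    """(uaddo lhs, 1): returns (result, overflow)."""
    result = expand_add_one(lhs)
    hi, lo = split(result)
    overflow = int((lo | hi) == 0)           # (seteq (or Lo, Hi), 0)
    return result, overflow


if __name__ == "__main__":
    wide = 1 << (2 * N)
    for x in range(wide):
        ref = (x + 1) % wide
        assert expand_add_generic(x, 1) == ref
        assert expand_add_one(x) == ref
        assert expand_uaddo_one(x) == (ref, int(x + 1 >= wide))
    print("all", wide, "inputs agree")
```

The loop compares each shortcut against plain modular arithmetic. Adding 1 overflows only when every bit of X is set, so the full result is then 0. That is why checking (Lo | Hi) == 0 after the add is sufficient.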


47118 Commits
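Commit a942a94424 relies on two facts. First, `cmp $C, %edi` leaves CF = (edi < C) as an unsigned compare, so with %eax zeroed, `adc $C1, %eax` computes X < C ? C1 + 1 : C1. Second, the older setb/or and setae/xor sequences produce the same values, because OR acts as ADD when the bits do not overlap and XOR acts as SUB when the subtracted bits are already set. The sketch below is a hypothetical Python model over 32-bit values. It assumes X is unsigned, which the setae/setb flag usage implies. It is not the X86 DAG combine itself, and all helper names are illustrative.

```python
"""Model of the (select carry, C1+1, C1) -> ADC identities on 32-bit values.

Hypothetical helpers for illustration only; X is treated as unsigned,
matching the unsigned setae/setb/adc flag usage.
"""

MASK32 = 0xFFFFFFFF


def cmp_carry(x, c):
    """CF after `cmp $c, x`: 1 when the unsigned subtraction x - c borrows."""
    return int((x & MASK32) < (c & MASK32))


def select_form(x, c, c1):
    """Source form: x < c ? c1 + 1 : c1."""
    return (c1 + 1) & MASK32 if cmp_carry(x, c) else c1 & MASK32


def setae_xor_form(x, c, k):
    """Old codegen for x < c ? k : k - 1 (k odd): setae; xor $k."""
    ae = 1 - cmp_carry(x, c)      # setae: 1 when x >= c
    return k ^ ae                 # XOR acts as SUB because bit 0 of k is set


def setb_or_form(x, c, k):
    """Old codegen for x < c ? k + 1 : k (k even): setb; or $k."""
    b = cmp_carry(x, c)           # setb: 1 when x < c
    return k | b                  # OR acts as ADD because bit 0 of k is clear


def adc_form(x, c, c1):
    """New codegen: xor %eax,%eax; cmp $c,x; adc $c1,%eax."""
    eax = 0
    return (eax + c1 + cmp_carry(x, c)) & MASK32


if __name__ == "__main__":
    samples = list(range(20)) + [MASK32 - 1, MASK32]
    for x in samples:
        # return X < 4 ? 3 : 2;
        assert select_form(x, 4, 2) == setae_xor_form(x, 4, 3) == adc_form(x, 4, 2)
        # return X < 9 ? 7 : 6;
        assert select_form(x, 9, 6) == setb_or_form(x, 9, 6) == adc_form(x, 9, 6)
    print("select, old, and adc forms agree on", len(samples), "inputs")
```

With this model, each select of the form X < C ? C1 + 1 : C1 reduces to a single compare followed by an add-with-carry of C1, whichever of OR or XOR the older lowering used.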