llvm-project

Author	SHA1	Message	Date
David Green	8a701024f3	[ARM] Lower i1 concat via MVETRUNC The MVETRUNC operation can perform the same truncate of two vectors, without requiring lane inserts/extracts from every vector lane. This moves the concat i1 lowering to use it for v8i1 and v16i1 result types, trading a bit of extra stack space for fewer instructions.	2023-10-18 19:40:11 +01:00
Stanislav Mekhanoshin	84f398af74	[AMDGPU] Add missing test checks. NFC. (#69484)	2023-10-18 11:26:39 -07:00
Ilya Leoshkevich	8e810dc7d9	[SystemZ] Support builtin_{frame,return}_address() with non-zero argument (#69405) When the code is built with -mbackchain, it is possible to retrieve the caller's frame and return addresses. GCC can already do this; add this support to Clang as well. Use RISCVTargetLowering and GCC's s390_return_addr_rtx() as inspiration. Add tests based on what GCC is emitting.	2023-10-18 19:05:31 +02:00
Stanislav Mekhanoshin	47ed921985	[AMDGPU] Add legality check when folding short 64-bit literals (#69391) We can only fold it if it fits into 32 bits. I believe it did not trigger yet because we do not select 64-bit literals generally.	2023-10-18 09:22:23 -07:00
Sirish Pande	28e4f97320	[AMDGPU] Save/Restore SCC bit across waterfall loop. (#68363) The waterfall loop overwrites the SCC bit of the status register. Make sure the SCC bit is saved and restored across the loop. We need to save/restore only in cases where SCC is live across the waterfall loop. Co-authored-by: Sirish Pande <sirish.pande@amd.com>	2023-10-18 08:43:29 -05:00
David Green	c060757bcc	[ARM] Correct v2i1 concat extract types. For two v2i1 concat into a v4i1, we cannot extract each i64 element as an i32. This casts to a v4i32 instead and extracts the correct vector lanes.	2023-10-18 13:40:38 +01:00
pvanhout	868abf0961	Revert "[AMDGPU] Remove Code Object V3 (#67118)" This reverts commit 544d91280c26fd5f7acd70eac4d667863562f4cc.	2023-10-18 12:55:36 +02:00
Jay Foad	104db26004	[AMDGPU] Fix image intrinsic optimizer on loads from different resources (#69355) The image intrinsic optimizer pass was neglecting to check any arguments of the load intrinsic after the VAddr arguments. For example multiple loads from different resources should not have been combined but were, because the pass was not checking the resource argument.	2023-10-18 11:08:01 +01:00
Paul Walker	675231eb09	[SVE ACLE] Allow default zero initialisation for svcount_t. (#69321) This matches the behaviour of the other SVE ACLE types.	2023-10-18 10:40:07 +01:00
Amara Emerson	e93bddb287	[AArch64][GlobalISel] Precommit indexed sextload/zextload tests.	2023-10-18 00:23:20 -07:00
Shao-Ce SUN	f48dab5237	Add RV64 constraint to SRLIW (#69416) Fixes #69408	2023-10-18 15:01:17 +08:00
Noah Goldstein	112e49b381	[DAGCombiner] Transform `(icmp eq/ne (and X,C0),(shift X,C1))` to use rotate or to get better constants. If `C0` is a mask and `C1` shifts out all the masked bits (to essentially compare two subsets of `X`), we can arbitrarily re-order shift as `srl` or `shl`. If `C1` (shift amount) is a power of 2, we can replace the and+shift with a rotate. Otherwise, based on target preference we can arbitrarily swap `srl` and `shl` in/out to get better constants. On x86 we can use this re-ordering to: 1) get better `and` constants for `C0` (zero extended moves or avoid imm64). 2) convert `srl` to `shl` if `shl` will be implementable with `lea` or `add` (both of which can be preferable). Proofs: https://alive2.llvm.org/ce/z/qzGM_w Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D152116	2023-10-18 01:16:55 -05:00
Noah Goldstein	0c2d28a448	[X86] Add tests for transform `(icmp eq/ne (and X, C0), (shift X, C1))`; NFC Differential Revision: https://reviews.llvm.org/D152115	2023-10-18 01:16:55 -05:00
Pierre van Houtryve	c464fea779	[DAG] Constant fold FMAD (#69324) This has very little effect on codegen in practice, but is nice to have, I think. See #68315	2023-10-18 07:46:24 +02:00
Kai Luo	b42738805a	[PowerPC] Auto gen test checks for #69299. NFC.	2023-10-18 02:21:22 +00:00
Nitin John Raj	ae3ba725b7	[RISCV][GlobalISel] Select G_FRAME_INDEX (#68254) This patch is a bandage to get G_FRAME_INDEX working. We could import the SelectionDAG patterns for the ComplexPattern FrameAddrRegImm, and perhaps we will do that in the future. For now we just select it as an addition with 0.	2023-10-17 17:56:42 -07:00
Mircea Trofin	ab91e05e48	[mlgo] Fix tests post 760e7d0	2023-10-17 12:19:54 -07:00
Artem Belevich	b33723710f	[NVPTX] Fixed a few more corner cases for v4i8 lowering. (#69263) Fixes https://github.com/llvm/llvm-project/issues/69124	2023-10-17 11:06:11 -07:00
Stanislav Mekhanoshin	a22a1fe151	[AMDGPU] support 64-bit immediates in SIInstrInfo::FoldImmediate (#69260) This is a part of https://github.com/llvm/llvm-project/issues/67781. Until we select more 64-bit move immediates the impact is minimal.	2023-10-17 10:53:22 -07:00
David Green	4266815f4d	[AArch64] Convert negative constant aarch64_neon_sshl to VASHR (#68918) In replacing shifts by splat with constant shifts, we can handle negative shifts by flipping the sign and using a VASHR or VLSHR.	2023-10-17 18:41:23 +01:00
David Green	658ed58de6	[AArch64] Add additional tests for fptosi/fptoui. NFC	2023-10-17 18:39:37 +01:00
akirchhoff-modular	4480e650b3	[YAMLParser] Improve plain scalar spec compliance (#68946) The `YAMLParser.h` header file claims support for YAML 1.2 with a few deviations, but our plain scalar parsing failed to parse some valid YAML according to the spec. This change puts us more in compliance with the YAML spec, now letting us parse plain scalars containing additional special characters in cases where they are not ambiguous.	2023-10-17 11:28:14 -06:00
Guozhi Wei	760e7d00d1	[X86, Peephole] Enable FoldImmediate for X86 Enable FoldImmediate for X86 by implementing X86InstrInfo::FoldImmediate. Also enhanced peephole by deleting identical instructions after FoldImmediate. Differential Revision: https://reviews.llvm.org/D151848	2023-10-17 16:22:42 +00:00
Vladislav Dzhidzhoev	abd0d5d262	Reland: [AArch64][GlobalISel] Adopt dup(load) -> LD1R patterns from SelectionDAG This relands the fb8f59156f0f208f6192ed808fc223eda6c0e7ec and makes isAArch64FrameOffsetLegal function recognize LD1R instructions. Original PR: https://github.com/llvm/llvm-project/pull/66914 PR of the fix: https://github.com/llvm/llvm-project/pull/69003	2023-10-17 17:40:05 +02:00
Phoebe Wang	3d6e4160d5	[X86] Enable bfloat type support in inline assembly constraints (#68469) Similar to FP16 but we don't have native scalar instruction support, so limit it to vector types only. Fixes #68149	2023-10-17 22:56:25 +08:00
Weining Lu	b2773d170c	[LoongArch] Precommit a test for atomic cmpxchg optimization	2023-10-17 22:29:51 +08:00
Simon Pilgrim	be9bc54218	[X86] vselect.ll - add vXi8 select-by-constant tests with repeated/broadcastable shuffle mask	2023-10-17 11:34:08 +01:00
Momchil Velikov	bea3684944	[AArch64] Allow only LSL to be folded into addressing mode (#69235) There was an error in decoding shift type, which permitted shift types other than LSL to be (incorrectly) folded into the addressing mode of a load/store instruction.	2023-10-17 11:30:14 +01:00
Zhaoxuan Jiang	041a786c78	[AArch64] Fix pairing different types of registers when computing CSRs. (#66642) If a function has an odd number of registers of the same type to save, and the calling convention also requires an odd number of CSRs of that type, an FP register would be accidentally marked as saved when producePairRegisters returns true. This patch also fixes the AArch64LowerHomogeneousPrologEpilog pass not handling AArch64::NoRegister; actually this pass must be fixed along with the register pairing so I can write a test for it.	2023-10-16 23:34:04 -07:00
Shao-Ce SUN	5a6ef95a1c	[RISCV][GISel] Add legalizer for G_UMAX, G_UMIN, G_SMAX, G_SMIN (#69150) Similar to #67577, Lower G_UMAX, G_UMIN, G_SMAX, G_SMIN.	2023-10-17 10:36:24 +08:00
Jianjian Guan	b0eba8e209	[RISCV] Support STRICT_FP_ROUND and STRICT_FP_EXTEND when only have Zvfhmin (#68559) This patch supports STRICT_FP_ROUND and STRICT_FP_EXTEND when we only have Zvfhmin but no Zvfh.	2023-10-17 10:10:19 +08:00
Michael Maitland	c319c74146	[RISCV] Improve performCONCAT_VECTORCombine stride matching If the load ptrs can be decomposed into a common (Base + Index) with a common constant stride, then return the constant stride.	2023-10-16 16:45:26 -07:00
Michael Maitland	30ca258614	[RISCV] Pre-commit concat-vectors-constant-stride.ll This patch commits tests that can be optimized by improving performCONCAT_VECTORCombine to do a better job at decomposing the base pointer and recognizing a constant offset.	2023-10-16 16:45:16 -07:00
Pierre van Houtryve	cc3d2533cc	[AMDGPU] Add i1 mul patterns (#67291) i1 muls can sometimes happen after SCEV. They resulted in ISel failures because we were missing the patterns for them. Solves SWDEV-423354	2023-10-16 16:18:27 +02:00
Pierre van Houtryve	4d6fc88946	[AMDGPU] Add patterns for V_CMP_O/U (#69157) Fixes SWDEV-427162	2023-10-16 13:07:56 +02:00
Nikita Popov	a72d88fb4f	Revert "Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size" This reverts commit 8840da2db237cd714d975c199d5992945d2b71e9. This results in verifier failures during LTO, see #68929.	2023-10-16 12:17:24 +02:00
chuongg3	dad563e3c2	[AArch64][GlobalISel] Add legalization for G_VECREDUCE_MUL (#68398)	2023-10-16 11:02:03 +01:00
Phoebe Wang	0ddca87b79	[X86][FP16] Do not combine to ADDSUB if target doesn't support FP16 (#69109) Fix crash when building code with `-mattr=f16c,fma` or `-mattr=avx512vl`.	2023-10-16 16:27:15 +08:00
Pierre van Houtryve	544d91280c	[AMDGPU] Remove Code Object V3 (#67118) V3 has been deprecated for a while as well, so it can safely be removed like V2 was removed. - [Clang] Set minimum code object version to 4 - [lld] Fix tests using code object v3 - Remove code object V3 from the AMDGPU backend, and delete or port v3 tests to v4. - Update docs to make it clear V3 can no longer be emitted.	2023-10-16 08:21:48 +02:00
Freddy Ye	819ac45d1c	[X86] Add USER_MSR instructions. (#68944) For more details about this instruction, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html	2023-10-16 10:12:53 +08:00
Craig Topper	0ae4622126	[RISCV][GISel] Move variadic-call.ll from call-lowering directory to irtranslator. NFC Keeps it consistent with the other call tests.	2023-10-15 18:16:38 -07:00
Min-Yih Hsu	fd84b1a99d	[M68k] Add new calling convention M68k_RTD `M68k_RTD` is really similar to X86's stdcall, in which callee pops the arguments from stack. In LLVM IR it can be written as `m68k_rtdcc`. This patch also improves how ExpandPseudo Pass handles popping stack at function returns in the absence of the RTD instruction. Differential Revision: https://reviews.llvm.org/D149864	2023-10-15 16:12:31 -07:00
Amara Emerson	1950507212	Revert "Re-apply '[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432)'" This reverts commit dbb9faedec5e28ab3f584f5e14d31e475ac268ac. This seems to cause miscompiles on CTMark/sqlite3 and others with GISel.	2023-10-15 14:16:37 -07:00
Markus Böck	0ad92c0cbb	[StatepointLowering] Take return attributes of `gc.result` into account (#68439) The current lowering of statepoints does not take into account return attributes present on the `gc.result` leading to different code being generated than if one were to not use statepoints. These return attributes can affect the ABI which is why it is important that they are applied in the lowering.	2023-10-14 18:38:18 +02:00
David Green	5e1c2bf3e6	[AArch64][GlobalISel] Expand coverage of FMA. This moves the legalization of G_FMA to the action builder that can handle more types. The existing arm64-vfloatintrinsics.ll has been removed as its tests are covered in other test files.	2023-10-14 13:24:28 +01:00
David Green	a502dddfd0	[AArch64] Additional GISel test for FMA. NFC	2023-10-14 12:34:54 +01:00
LiqinWeng	64e7207ea5	[Test] Pre-submit tests for #68972 (#69040)	2023-10-14 12:18:43 +08:00
Craig Topper	3750558ee1	[RISCV][GISel] Legalize G_SMULO/G_UMULO (#67635) Update `LegalizerHelper::widenScalarMulo` to not create a mulo if we aren't going to use the overflow flag. This prevents needing to legalize the widened operation. This generates better code when we need to make a libcall for multiply.	2023-10-13 20:34:45 -07:00
Amara Emerson	25d93f3f68	NFC: Precommit GISel checks for arm64-indexed-memory.ll	2023-10-13 16:51:39 -07:00
Amara Emerson	2f80dfc079	[GlobalISel][NFC] Add distinct CHECK/SDAG/GISEL run lines to test.	2023-10-13 16:21:52 -07:00
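
The rotate rewrite described in the DAGCombiner commit 112e49b381 listed above can be sanity-checked by brute force. The following is only an illustrative sketch, not the LLVM implementation: it assumes a hypothetical 16-bit value, takes `C0` as the low-half mask (0xFF) and `C1` as a half-width shift (8), and all helper names are invented for this example.

```python
# Assumed parameters for illustration only (not taken from the commit):
WIDTH = 16                    # hypothetical bit width, small enough to enumerate
HALF = WIDTH // 2             # C1: shift amount, half the width
MASK = (1 << WIDTH) - 1       # keeps results inside WIDTH bits
LOW_MASK = (1 << HALF) - 1    # C0: mask selecting the low half


def and_vs_shift(x):
    """Original form: (x & C0) == (x >> C1), i.e. low half equals high half."""
    return (x & LOW_MASK) == (x >> HALF)


def rotate_form(x):
    """Rewritten form: rotating x left by C1 bits leaves it unchanged."""
    rotated = ((x << HALF) | (x >> HALF)) & MASK
    return rotated == x


def forms_agree():
    """Exhaustively compare both predicates over every WIDTH-bit value."""
    return all(and_vs_shift(x) == rotate_form(x) for x in range(1 << WIDTH))
```

Calling `forms_agree()` enumerates all 65536 inputs, so it confirms the equivalence only for this one width and constant pair; the linked Alive2 proofs remain the authoritative argument for the general transform.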

50419 Commits