llvm-project

Author	SHA1	Message	Date
Vladislav Dzhidzhoev	abd0d5d262	Reland: [AArch64][GlobalISel] Adopt dup(load) -> LD1R patterns from SelectionDAG This relands the fb8f59156f0f208f6192ed808fc223eda6c0e7ec and makes isAArch64FrameOffsetLegal function recognize LD1R instructions. Original PR: https://github.com/llvm/llvm-project/pull/66914 PR of the fix: https://github.com/llvm/llvm-project/pull/69003	2023-10-17 17:40:05 +02:00
Phoebe Wang	3d6e4160d5	[X86] Enable bfloat type support in inline assembly constraints (#68469 ) Similar to FP16 but we don't have native scalar instruction support, so limit it to vector types only. Fixes #68149	2023-10-17 22:56:25 +08:00
Weining Lu	b2773d170c	[LoongArch] Precommit a test for atomic cmpxchg optimization	2023-10-17 22:29:51 +08:00
Simon Pilgrim	be9bc54218	[X86] vselect.ll - add vXi8 select-by-constant tests with repeated/broadcastable shuffle mask	2023-10-17 11:34:08 +01:00
Momchil Velikov	bea3684944	[AArch64] Allow only LSL to be folded into addressing mode (#69235 ) There was an error in decoding shift type, which permitted shift types other than LSL to be (incorrectly) folded into the addressing mode of a load/store instruction.	2023-10-17 11:30:14 +01:00
Zhaoxuan Jiang	041a786c78	[AArch64] Fix pairing different types of registers when computing CSRs. (#66642 ) If a function has an odd number of same type of registers to save, and the calling convention also requires an odd number of such type of CSRs, an FP register would be accidentally marked as saved when producePairRegisters returns true. This patch also fixes the AArch64LowerHomogeneousPrologEpilog pass not handling AArch64::NoRegister; actually this pass must be fixed along with the register pairing so I can write a test for it.	2023-10-16 23:34:04 -07:00
Shao-Ce SUN	5a6ef95a1c	[RISCV][GISel] Add legalizer for G_UMAX, G_UMIN, G_SMAX, G_SMIN (#69150 ) Similar to #67577, Lower G_UMAX, G_UMIN, G_SMAX, G_SMIN.	2023-10-17 10:36:24 +08:00
Jianjian Guan	b0eba8e209	[RISCV] Support STRICT_FP_ROUND and STRICT_FP_EXTEND when only have Zvfhmin (#68559 ) This patch supports STRICT_FP_ROUND and STRICT_FP_EXTEND when we only have Zvfhmin but no Zvfh.	2023-10-17 10:10:19 +08:00
Michael Maitland	c319c74146	[RISCV] Improve performCONCAT_VECTORCombine stride matching If the load ptrs can be decomposed into a common (Base + Index) with a common constant stride, then return the constant stride.	2023-10-16 16:45:26 -07:00
Michael Maitland	30ca258614	[RISCV] Pre-commit concat-vectors-constant-stride.ll This patch commits tests that can be optimized by improving performCONCAT_VECTORCombine to do a better job at decomposing the base pointer and recognizing a constant offset.	2023-10-16 16:45:16 -07:00
Pierre van Houtryve	cc3d2533cc	[AMDGPU] Add i1 mul patterns (#67291 ) i1 muls can sometimes happen after SCEV. They resulted in ISel failures because we were missing the patterns for them. Solves SWDEV-423354	2023-10-16 16:18:27 +02:00
Pierre van Houtryve	4d6fc88946	[AMDGPU] Add patterns for V_CMP_O/U (#69157 ) Fixes SWDEV-427162	2023-10-16 13:07:56 +02:00
Nikita Popov	a72d88fb4f	Revert "Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size" This reverts commit 8840da2db237cd714d975c199d5992945d2b71e9. This results in verifier failures during LTO, see #68929.	2023-10-16 12:17:24 +02:00
chuongg3	dad563e3c2	[AArch64][GlobalISel] Add legalization for G_VECREDUCE_MUL (#68398 )	2023-10-16 11:02:03 +01:00
Phoebe Wang	0ddca87b79	[X86][FP16] Do not combine to ADDSUB if target doesn't support FP16 (#69109 ) Fix crash when building code with `-mattr=f16c,fma` or `-mattr=avx512vl`.	2023-10-16 16:27:15 +08:00
Pierre van Houtryve	544d91280c	[AMDGPU] Remove Code Object V3 (#67118 ) V3 has been deprecated for a while as well, so it can safely be removed like V2 was removed. - [Clang] Set minimum code object version to 4 - [lld] Fix tests using code object v3 - Remove code object V3 from the AMDGPU backend, and delete or port v3 tests to v4. - Update docs to make it clear V3 can no longer be emitted.	2023-10-16 08:21:48 +02:00
Freddy Ye	819ac45d1c	[X86] Add USER_MSR instructions. (#68944 ) For more details about this instruction, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html	2023-10-16 10:12:53 +08:00
Craig Topper	0ae4622126	[RISCV][GISel] Move variadic-call.ll from call-lowering directory to irtranslator. NFC Keeps it consistent with the other call tests.	2023-10-15 18:16:38 -07:00
Min-Yih Hsu	fd84b1a99d	[M68k] Add new calling convention M68k_RTD `M68k_RTD` is really similar to X86's stdcall, in which callee pops the arguments from stack. In LLVM IR it can be written as `m68k_rtdcc`. This patch also improves how ExpandPseudo Pass handles popping stack at function returns in the absence of the RTD instruction. Differential Revision: https://reviews.llvm.org/D149864	2023-10-15 16:12:31 -07:00
Amara Emerson	1950507212	Revert "Re-apply '[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432 )'" This reverts commit dbb9faedec5e28ab3f584f5e14d31e475ac268ac. This seems to cause miscompiles on CTMark/sqlite3 and others with GISel.	2023-10-15 14:16:37 -07:00
Markus Böck	0ad92c0cbb	[StatepointLowering] Take return attributes of `gc.result` into account (#68439 ) The current lowering of statepoints does not take into account return attributes present on the `gc.result` leading to different code being generated than if one were to not use statepoints. These return attributes can affect the ABI which is why it is important that they are applied in the lowering.	2023-10-14 18:38:18 +02:00
David Green	5e1c2bf3e6	[AArch64][GlobalISel] Expand coverage of FMA. This moves the legalization of G_FMA to the action builder that can handle more types. The existing arm64-vfloatintrinsics.ll has been removed as they are covered in other test files.	2023-10-14 13:24:28 +01:00
David Green	a502dddfd0	[AArch64] Additional GISel test for FMA. NFC	2023-10-14 12:34:54 +01:00
LiqinWeng	64e7207ea5	[Test] Pre-submit tests for #68972 (#69040 )	2023-10-14 12:18:43 +08:00
Craig Topper	3750558ee1	[RISCV][GISel] Legalize G_SMULO/G_UMULO (#67635 ) Update `LegalizerHelper::widenScalarMulo` to not create a mulo if we aren't going to use the overflow flag. This prevents needing to legalize the widened operation. This generates better code when we need to make a libcall for multiply.	2023-10-13 20:34:45 -07:00
Amara Emerson	25d93f3f68	NFC: Precommit GISel checks for arm64-indexed-memory.ll	2023-10-13 16:51:39 -07:00
Amara Emerson	2f80dfc079	[GlobalISel][NFC] Add distinct CHECK/SDAG/GISEL run lines to test.	2023-10-13 16:21:52 -07:00
Yingwei Zheng	53c81a8c16	[RISCV][SDAG] Fix constant narrowing when narrowing loads (#69015 ) When narrowing logic ops(OR/XOR) with constant rhs, `DAGCombiner` will fixup the constant rhs node. It is incorrect when lhs is also a constant. For example, we will incorrectly replace `xor OpaqueConstant:i64<8191>, Constant:i64<-1>` with `xor (and OpaqueConstant:i64<8191>, Constant:i64<65535>), Constant:i64<-1>`. Fixes #68855.	2023-10-14 06:38:17 +08:00
Momchil Velikov	dbb9faedec	Re-apply '[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432 )' This re-applies commit a9d0ab2ee572f179f80483f3ebbbcdd03c3b4481, which was reverted by 8abb2ace888bdd04a1bdb4ac2f2fc25d57a5760a. The issue was fixed by 7510f32f906ab4e583542eae2611b020f88629af	2023-10-13 12:14:22 +01:00
Maurice Heumann	187e02fa2d	[CodeGenPrepare] Check types when unmerging GEPs across indirect branches (#68587 ) The optimization in CodeGenPrepare, where GEPs are unmerged across indirect branches must respect the types of both GEPs and their sizes when adjusting the indices. The sample here shows the bug: https://godbolt.org/z/8e9o5sYPP The value `%elementValuePtr` addresses the second field of the `%struct.Blub`. It is therefore a GEP with index 1 and type i8. The value `%nextArrayElement` addresses the next array element. It is therefore a GEP with index 1 and type `%struct.Blub`. Both values point to completely different addresses, even if the indices are the same, due to the types being different. However, after CodeGenPrepare has run, `%nextArrayElement` is a bitcast from `%elementValuePtr`, meaning both were treated as equal. The cause for this is that the unmerging optimization does not take types into consideration. It sees both GEPs have `%currentArrayElement` as source operand and therefore tries to rewrite `%nextArrayElement` in terms of `%elementValuePtr`. It changes the index to the difference of the two GEPs. As both indices are `1`, the difference is `0`. As the indices are `0` the GEP is later replaced with a simple bitcast in CodeGenPrepare. Before adjusting the indices, the types of the GEPs would have to be aligned and the indices scaled accordingly for the optimization to be correct. Due to the size of the struct being `16` and the `%elementValuePtr` pointing to offset `1`, the correct index for the unmerged `%nextArrayElement` would be 15. I assume this bug emerged from the opaque pointer change as GEPs like `%elementValuePtr` that access the struct field based of type i8 did not naturally occur before. In light of future migration to ptradd, simply not performing the optimization if the types mismatch should be sufficient.	2023-10-13 09:47:47 +02:00
Kai Luo	3104681686	[PowerPC][Atomics] Remove redundant block to clear reservation (#68430 ) This PR is following what https://reviews.llvm.org/D134783 does for quadword CAS.	2023-10-13 10:59:27 +08:00
john-brawn-arm	a574ef6176	[AArch64] Fix incorrect big-endian spill in foldMemoryOperandImpl (#65601 ) When an sreg sub-register of a q register was spilled, AArch64InstrInfo::foldMemoryOperandImpl would emit a spill of a d register, which gives the wrong result when the target is big-endian as the following q register fill will put the value in the top half. Fix this by greatly simplifying the existing code for widening the spill to only handle wzr to xzr widening, as the default result we get if the function returns nullptr is already that a widened spill will be emitted.	2023-10-12 16:10:28 +01:00
Yusra Syeda	6cf41ada44	[SystemZ][z/OS] Add vararg support to z/OS (#68834 ) This PR adds vararg support to z/OS and updates the call-zos-vararg.ll lit test. Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>	2023-10-12 12:42:55 +02:00
Nikita Popov	127ed9ae26	[PowerPC] Use zext instead of anyext in custom and combine (#68784 ) This custom combine currently converts `and(anyext(x),c)` into `anyext(and(x,c))`. This is not correct, because the original expression guaranteed that the high bits are zero, while the new one sets them to undef. Emit `zext(and(x,c))` instead. Fixes https://github.com/llvm/llvm-project/issues/68783.	2023-10-12 09:32:17 +02:00
WANG Xuerui	956482de13	[LoongArch] Support finer-grained DBAR hints for LA664+ (#68787 ) These are treated as DBAR 0 on older uarchs, so we can start to unconditionally emit the new hints right away. Co-authored-by: WANG Rui <wangrui@loongson.cn>	2023-10-12 15:04:51 +08:00
Rahman Lavaee	28b9126879	[BasicBlockSections] Introduce the path cloning profile format to BasicBlockSectionsProfileReader. (#67214 ) Following up on prior RFC (https://lists.llvm.org/pipermail/llvm-dev/2020-September/145357.html) we can now improve above our highly-optimized basic-block-sections binary (e.g., 2% for clang) by applying path cloning. Cloning can improve performance by reducing taken branches. This patch prepares the profile format for applying cloning actions. The basic block cloning profile format extends the basic block sections profile in two ways. 1. Specifies the cloning paths with a 'p' specifier. For example, `p 1 4 5` specifies that blocks with BB ids 4 and 5 must be cloned along the edge 1 --> 4. 2. For each cloned block, it will appear in the cluster info as `<bb_id>.<clone_id>` where `clone_id` is the id associated with this clone. For example, the following profile specifies one cloned block (2) and determines its cluster position as well. ``` f foo p 1 2 c 0 1 2.1 3 2 5 ``` This patch keeps backward-compatibility (retains the behavior for old profile formats). This feature is only introduced for profile version >= 1.	2023-10-11 22:47:13 -07:00
weiguozhi	b6043f9867	[RA] Disable split around hint register if optimize for size (#68619 ) Split a virtual register with hint may generate COPY instructions in multiple cold basic blocks, and increase code size. So disable this split when the function is optimized for size.	2023-10-11 14:57:15 -07:00
Harald van Dijk	8d520973b0	[X86] Use indirect addressing for high 2GB of x32 address space Instructions that take immediate addresses sign-extend their operands, so cannot be used when we actually need zero extension. Use indirect addressing to avoid problems. The functions in the test are modified versions of the functions by the same names in large-constants.ll, with i64 types changed to i32. Fixes #55061 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124406	2023-10-11 19:20:36 +01:00
chuongg3	d88d9834e9	[AArch64][GlobalISel] Support more types for TRUNC (#66927 ) G_TRUNC will get lowered into trunc(merge(trunc(unmerge), trunc(unmerge))) if the source is larger than 128 bits or the truncation is more than half of the current bit size. Now mirrors ZEXT/SEXT code more closely for vector types.	2023-10-11 16:05:25 +01:00
Anatoly Trosinenko	1d2b558265	[AArch64][PAC] Check authenticated LR value during tail call When performing a tail call, check the value of LR register after authentication to prevent the callee from signing and spilling an untrusted value. This commit implements a few variants of check, more can be added later. If it is safe to assume that executable pages are always readable, LR can be checked just by dereferencing the LR value via LDR. As an alternative, LR can be checked as follows: ; lowered AUT* instruction ; <some variant of check that LR contains a valid address> b.cond break_block ret_block: ; lowered TCRETURN break_block: brk 0xc471 As the existing methods either break the compatibility with execute-only memory mappings or can degrade the performance, they are disabled by default and can be explicitly enabled with a command line option. Individual subtargets can opt-in to use one of the available methods by updating AArch64FrameLowering::getAuthenticatedLRCheckMethod(). Reviewed By: kristof.beyls Differential Revision: https://reviews.llvm.org/D156716	2023-10-11 17:38:17 +03:00
Stephen Thomas	720be6c535	[AMDGPU] Add encoding/decoding support for non-result-returning ATOMIC_CSUB instructions (#68684 ) The BUFFER_ATOMIC_CSUB and GLOBAL_ATOMIC_CSUB instructions have encodings for non-value-returning forms, although actually using them isn't supported by hardware. However, these encodings aren't supported by the backend, meaning that they can't even be assembled or disassembled. Add support for the non-returning encodings, but gate actually using them in instruction selection behind a new feature FeatureAtomicCSubNoRtnInsts, which no target uses. This does allow the non-returning instructions to be tested manually and llvm.amdgcn.atomic.csub.ll is extended to cover them. The feature does not gate assembling or disassembling them, this is now not an error, and encoding and decoding tests have been adapted accordingly.	2023-10-11 11:37:27 +01:00
hev	37b93f07cd	[LoongArch] Add some atomic tests (#68766 )	2023-10-11 18:28:04 +08:00
Nikita Popov	0ead1faef0	[PowerPC] Add test for #68783 (NFC)	2023-10-11 12:15:26 +02:00
Harald van Dijk	a21abc782a	[X86] Align i128 to 16 bytes in x86 datalayouts This is an attempt at rebooting https://reviews.llvm.org/D28990 I've included AutoUpgrade changes to modify the data layout to satisfy the compatible layout check. But this does mean alloca, loads, stores, etc in old IR will automatically get this new alignment. This should fix PR46320. Reviewed By: echristo, rnk, tmgross Differential Revision: https://reviews.llvm.org/D86310	2023-10-11 10:23:38 +01:00
Piotr Sobczak	2888fa4313	[AMDGPU] Update test remat-smrd.mir Update test/CodeGen/AMDGPU/remat-smrd.mir: * Convert a negative case of non-dereferenceable invariant load to positive one. * Add new cases for subreg.	2023-10-11 10:19:22 +02:00
Sacha Coppey	776889bc1c	[RISCV] Add Stackmap/Statepoint/Patchpoint support without targets This patch adds stackmap support for RISC-V without targets (i.e. the nop patchable forms). Reviewed By: reames Differential Revision: https://reviews.llvm.org/D123496	2023-10-11 09:18:55 +05:30
Evgenii Kudriashov	255f826d6f	[X86] Fix value-extending/truncating loads and stores of __ptr32/__ptr64 pointers (#67168 ) The value extension and truncation were missed during casting __ptr32/__ptr64 pointers to the default address space. Closes #66873	2023-10-11 05:19:36 +02:00
Wang Pengcheng	f3c92a06b9	[RISCV] Make PostRAScheduler a target feature (#68692 ) This is what AArch64 has done in https://reviews.llvm.org/D20762. Tests are added in macro fusion tests, which uncover a bug that DAG mutations don't take effect.	2023-10-11 10:51:03 +08:00
hev	203ba238e3	[LoongArch] Improve codegen for atomic ops (#67391 ) This PR improves memory barriers generated by atomic operations. Memory barrier semantics of LL/SC: ``` LL: <memory-barrier> + <load-exclusive> SC: <store-conditional> + <memory-barrier> ``` Changes: * Remove unnecessary memory barriers before LL and between LL/SC. * Fix acquire semantics. (If the SC instruction is not executed, then the guarantee of acquiring semantics cannot be ensured. Therefore, an acquire barrier needs to be generated when memory ordering includes an acquire operation.)	2023-10-11 10:24:18 +08:00
Philip Reames	3a6cc52fe3	Revert "[RISCV] Shrink vslideup's LMUL when lowering fixed insert_subvector (#65997 )" This reverts commit b5ff71e261b637ab7088fb5c3314bf71d6e01da7. As described in https://github.com/llvm/llvm-project/issues/68730, this appears to have exposed an existing liveness issue. Revert to green until we can figure out how to address the root cause. Note: This was not a clean revert. I ended up doing it by hand.	2023-10-10 15:13:57 -07:00

52796 Commits