llvm-project

Author	SHA1	Message	Date
Kai Luo	56414220df	[PowerPC] Use 'sync; ld; cmp; bc; isync' for atomic load seq-cst on 32-bit platform (#75905 ) `cmp; bc; isync` is more performant than `lwsync` theoretically. 64-bit platform already features it, now implement it for 32-bit platform.	2023-12-20 10:01:02 +08:00
Jeffrey Byrnes	f1156fb622	[AMDGPU][IGLP]: Add SchedGroupMask::TRANS (#75416 ) Makes constructing SchedGroups of this type easier, and provides ability to create them with __builtin_amdgcn_sched_group_barrier	2023-12-19 16:54:18 -08:00
Craig Topper	05abe8a7e8	[RISCV] Remove Zfbfmin dependency from Zvfbfmin. (#75851 ) Zvfbfmin does not have any scalar operands making this an unnecessary dependency. The spec was just updated to remove this. See `86d7a74f4b` This fixes a correctness issue where Xsfvfwmaccqqq was incorrectly depending on Zfbfmin. The SiFive CPUs that support Xsfvfwmaccqqq do not implement Zfbfmin, but do implement Zvfbfmin based on a previous understanding that it only requires Zve32f. I've added tests for this feature to raise the bar for adding dependencies to it in the future.	2023-12-19 15:07:38 -08:00
Yusra Syeda	0768253c20	[SystemZ][z/OS] Add exception handling for XPLINK (#74638 ) Adds emitting the exception table and the EH registers for XPLINK. Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>	2023-12-19 13:58:33 -05:00
Michael Maitland	571d151dec	[RISCV][MISched] Set EnableIntervals to true for SiFive7 (#75681 ) The SiFive7 scheduler model has been using AcquireAtCycles and ReleaseAtCycles for some time. Without EnableIntervals, the scheduler was not making decisions based on this information. This patch sets EnableIntervals to true, and the test case demonstrates that the VADD instructions can be issued one cycle earlier since the VCQ is not reserved. This leads to better saturation of the SiFive7VA.	2023-12-19 11:03:03 -05:00
Jonas Paulsson	e32e147d6c	[DAGCombiner] Don't drop alignment info of original load. (#75626 ) Pass the original MMO instead of different individual values. getAlign() was used before where actually getOriginalAlign() would have been better, and this patch has the same effect.	2023-12-19 16:30:47 +01:00
Rin	0894c2ee5f	[DAGCombiner] Avoid the pre-truncate of BUILD_VECTOR sources. (#75792 ) Avoid the pre-truncate of BUILD_VECTOR sources when there is more than one use. This can avoid using unnecessary movs later down the instruction selection pipeline.	2023-12-19 15:25:38 +00:00
Antonio Frighetto	9aeb3336fd	[AArch64] Ensure `SplatBitSize` conforms with the original lane width A miscompilation issue has been addressed with improved checking. Fixes: https://github.com/llvm/llvm-project/issues/75822.	2023-12-19 16:03:56 +01:00
Kerry McLaughlin	e9af57dfea	[Clang][SME2] Add builtins for moving multi-vectors to/from ZA (#71191 ) Adds the following SME2 builtins: - svread_hor/ver, - svwrite_hor/ver, - svread_za64, - svwrite_za64 See https://github.com/ARM-software/acle/pull/217	2023-12-19 13:51:10 +00:00
Matt Arsenault	1196975286	AMDGPU: Add gfx11 run line to bf16 test	2023-12-19 17:12:52 +07:00
Mariusz Sikora	a018c8cdbb	GFX12: Add LoopDataPrefetchPass (#75625 ) It is currently disabled by default. It will need experiments on a real HW to tune and decide on the profitability. Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-19 08:32:16 +01:00
Eric Biggers	09058654f6	[RISCV] Remove experimental from Vector Crypto extensions (#74213 ) The RISC-V vector crypto extensions have been ratified. This patch updates the Clang and LLVM support for these extensions to be non-experimental, while leaving the C intrinsics as experimental since the C intrinsics are not yet standardized. Co-authored-by: Brandon Wu <brandon.wu@sifive.com>	2023-12-18 22:04:22 -08:00
James Y Knight	137f785fa6	[AMDGPU] Set MaxAtomicSizeInBitsSupported. (#75185 ) This will result in larger atomic operations getting expanded to `__atomic_*` libcalls via AtomicExpandPass, which matches what Clang already does in the frontend. While AMDGPU currently disables the use of all libcalls, I've changed it to instead disable all of them _except_ the atomic ones. Those are already emitted by the Clang frontend, and enabling them in the backend allows the same behavior there.	2023-12-18 16:51:06 -05:00
Justin Bogner	4f54d71501	[HLSL][DirectX] Move handling of resource element types into the frontend Rather than shepherding a type name all the way to the backend as a string and attempting to parse it, get the element type out of the AST and store that in the resource annotation metadata directly. Pull Request: https://github.com/llvm/llvm-project/pull/75674	2023-12-18 11:43:52 -07:00
Simon Pilgrim	7b1e4239b3	[DAG] Fold (vt trunc (extload (vt x))) -> (vt load x) (#75229 ) We were only folding cases which remained extloads, but DAG.getExtLoad can also handle the cases which don't need to extend at all (we just can't do truncloads). reduceLoadWidth can handle this for scalar loads, but not for vectors. Noticed while triaging D152928	2023-12-18 16:21:11 +00:00
Nathan Sidwell	d0285a31c8	aarch64: fix testcase (#75723 ) Add missing < %s to RUN line.	2023-12-18 11:02:44 -05:00
Momchil Velikov	fd527def7e	[Clang][SVE2.1] Add floating-point variants of `svrevd_XX` (#75117 )	2023-12-18 15:52:28 +00:00
Ulrich Weigand	82a1bffd34	[SelectionDAG] Do not crash on large integers in CheckInteger (#75787 ) The CheckInteger routine called from TableGen-generated selection logic uses getSExtValue - which will abort if the underlying APInt does not fit into an int64_t. This case is now triggered by the SystemZ back-end since i128 is a legal type on certain machines. While we do not have any regular instructions that take 128-bit immediates (like most other platforms), there are patterns in the .td files that recognize an i128 "xor ..., -1" as a "not". These patterns cause code to be generated that calls the CheckInteger routine on some i128-valued integer, which may trigger the assert. Fix by using trySExtValue instead. Fixes https://github.com/llvm/llvm-project/issues/75710	2023-12-18 14:03:57 +01:00
Serge Pavlov	2f81788067	[ARM][FPEnv] Lowering of fpmode intrinsics (#74054 ) LLVM intrinsics `get_fpmode`, `set_fpmode` and `reset_fpmode` operate control modes, the bits of FP environment that affect FP operations. On ARM these bits are in FPSCR together with the status bits. The implementation of these intrinsics produces code close to that of functions `fegetmode` and `fesetmode` from GLIBC. Pull request: https://github.com/llvm/llvm-project/pull/74054	2023-12-18 18:57:36 +07:00
Ulrich Weigand	a00c4220be	[SystemZ] Fix complex address matching when i128 is legal Complex address matching currently handles truncations, under the assumption that those are no-ops. This is no longer true when i128 is legal. Change the code to only handle actual no-op truncations. Fixes https://github.com/llvm/llvm-project/issues/75708 Fixes https://github.com/llvm/llvm-project/issues/75714	2023-12-18 12:47:45 +01:00
Yeting Kuo	b83b28779e	[RISCV] Make Zhinx and Zvfh imply Zhinxmin and Zvfhmin respectively (#75735 ) Zhinxmin is a subset of Zhinx and Zvfhmin is also a subset of Zvfh.	2023-12-18 11:46:22 +08:00
Arthur Eubanks	68c976bf64	[X86] Fix referencing local tagged globals We should treat the medium code model like the small code model. Classifying non-local references already properly handled this.	2023-12-17 13:49:50 -08:00
melonedo	3eaed9e6f5	[RISCV] Implement intrinsics for XCVbitmanip extension in CV32E40P (#74993 ) Implement XCVbitmanip intrinsics for CV32E40P according to the specification. This commit is part of a patch-set to upstream the vendor specific extensions of CV32E40P that need LLVM intrinsics to implement Clang builtins. Contributors: @CharKeaney, @ChunyuLiao, @jeremybennett, @lewis-revill, @NandniJamnadas, @PaoloS02, @simonpcook, @xingmingjie. Spec: `05481cf0ef/specifications/corev-builtin-spec.md (listing-of-pulp-bit-manipulation-builtins-xcvbitmanip)`. Previously reviewed on Phabricator: https://reviews.llvm.org/D157510. Parallel GCC patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635795.html. Co-authored-by: melonedo <funanzeng@gmail.com>	2023-12-17 19:29:40 +08:00
Carl Ritson	5139299618	[AMDGPU] Track physical VGPRs used for SGPR spills (#75573 ) Physical VGPRs used for SGPR spills need to be tracked independent of WWM reserved registers. The WWM reserved set contains extra registers allocated during WWM pre-allocation pass. This causes SGPR spills allocated after WWM pre-allocation to overlap with WWM register usage, e.g. if frame pointer is spilt during prologue/epilog insertion.	2023-12-17 16:44:16 +09:00
Craig Topper	c26510a2bf	[RISCV] Fix intrinsic names in sf_vfwmacc_4x4x4.ll. NFC The type strings in the intrinsic name were using f16 instead of bf16 for float types. Nothing really checks these strings so everything still worked.	2023-12-16 14:54:50 -08:00
Stefan Pintilie	c398fa009a	Revert "Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"" This reverts commit f4b5be1ecdc85ca4257b739afb8d57e23c7a8030. The above change was breaking the clang-ppc64le-linux-test-suite bot.	2023-12-16 07:30:53 -06:00
Yeting Kuo	5545b25452	[RISCV] Make Zfh imply Zfhmin. (#75576 ) According to spec, the Zfhmin extension is a subset of the Zfh extension.	2023-12-16 11:22:07 +08:00
Arthur Eubanks	b3e353d263	[X86] Don't use rip-relative lea to get a function address in medium static mode (#75656 ) This essentially reverts https://reviews.llvm.org/D140593. Somewhere along the line we properly fixed the medium code model to assume functions are small, so now we get a 32-bit movl as desired.	2023-12-15 15:15:18 -08:00
Paul Kirth	9a578a9f60	Revert "[StackColoring] Delete dead stack slots (#75351 )" (#75655 ) This reverts commit 08b306dc8e7c0b2498f4f194a3c51686d56dbd20. it causes the following assertion failure: llvm/include/llvm/CodeGen/MachineFrameInfo.h:530: int64_t llvm::MachineFrameInfo::getObjectOffset(int) const: Assertion `!isDeadObjectIndex(ObjectIdx) && "Getting frame offset for a dead object?"' failed.	2023-12-15 13:32:39 -08:00
Arthur Eubanks	809ee6cfcf	[X86][test] Update tagged-globals*.ll tests Use update_llc_test_checks.py. Split out jump table tests into separate file since we don't want to check the exact instruction sequence for it.	2023-12-15 12:54:55 -08:00
Ulrich Weigand	59f7f35a90	[SystemZ] ABI support for single-element vector types Support passing and returning values of single-element vector types (i.e. <1 x i128> and <1 x fp128>). Now that i128 is a legal type, supporting these types can be done simply by providing a getRegisterTypeForCallingConv implementation that handles them. Fixes https://github.com/llvm/llvm-project/issues/61291	2023-12-15 19:31:00 +01:00
Philip Reames	e8a15eca92	[RISCV] Prefer whole register loads and stores when VL=VLMAX (#75531 ) If we're lowering a fixed length vector load or store which happens to be exactly VLEN in size (when VLEN is exactly known), we can use a whole register load or store instead of the unit strided variants. This doesn't require a vsetvli in some cases, allows additional flexibility of vsetvli cases in others, and doesn't have a runtime dependency on the value of VL.	2023-12-15 09:26:57 -08:00
Craig Topper	93b14c3df1	[RISCV] Add some vsetvli insertion test cases with vmv.s.x+reduction. NFC (#75544 ) These test cases were intended to get a single vsetvli by using the vmv.s.x intrinsic with the same LMUL as the reduction. This works for FP, but does not work for integer. I believe #71501 will break this for FP too. Hopefully the vsetvli pass can be taught to fix this.	2023-12-15 08:50:54 -08:00
Mariusz Sikora	414d27419f	[AMDGPU] GFX12: select @llvm.prefetch intrinsic (#74576 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-15 17:15:55 +01:00
Jessica Del	32f9983c06	[AMDGPU] - Add address space for strided buffers (#74471 ) This is an experimental address space for strided buffers. These buffers can have structs as elements and a stride > 1. These pointers allow the indexed access in units of stride, i.e., they point at `buffer[index * stride]`. Thus, we can use the `idxen` modifier for buffer loads. We assign address space 9 to 192-bit buffer pointers which contain a 128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially, they are fat buffer pointers with an additional 32-bit index.	2023-12-15 15:49:25 +01:00
Mirko Brkušanin	07a6d73664	[AMDGPU] CodeGen for GFX12 VFLAT, VSCRATCH and VGLOBAL instructions (#75493 )	2023-12-15 15:01:40 +01:00
Mirko Brkušanin	5879162f7f	[AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492 )	2023-12-15 13:45:03 +01:00
Ulrich Weigand	a65ccc1b9f	[SystemZ] Support i128 as legal type in VRs (#74625 ) On processors supporting vector registers and SIMD instructions, enable i128 as legal type in VRs. This allows many operations to be implemented via native instructions directly in VRs (including add, subtract, logical operations and shifts). For a few other operations (e.g. multiply and divide, as well as atomic operations), we need to move the i128 value back to a GPR pair to use the corresponding instruction there. Overall, this is still beneficial. The patch includes the following LLVM changes: - Enable i128 as legal type - Set up legal operations (in SystemZInstrVector.td) - Custom expansion for i128 add/subtract with carry - Custom expansion for i128 comparisons and selects - Support for moving i128 to/from GPR pairs when required - Handle 128-bit integer constant values everywhere - Use i128 as intrinsic operand type where appropriate - Updated and new test cases In addition, clang builtins are updated to reflect the intrinsic operand type changes (which also improves compatibility with GCC).	2023-12-15 12:55:15 +01:00
Mirko Brkušanin	26b14aedb7	[AMDGPU] CodeGen for GFX12 VIMAGE and VSAMPLE instructions (#75488 )	2023-12-15 12:40:23 +01:00
Pierre van Houtryve	ef067f5204	[AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already (#72830 ) Co-authored-by: Juan Manuel MARTINEZ CAAMAÑO <juamarti@amd.com>	2023-12-15 12:33:32 +01:00
Mirko Brkušanin	a278ac577e	[AMDGPU] CodeGen for SMEM instructions (#75579 )	2023-12-15 12:10:33 +01:00
chuongg3	70579c95bd	[AArch64][GlobalISel] Look into array's element (#74109 ) In AArch64RegisterBankInfo, IsFPOrFPType() does not work correctly with ArrayTypes and StructTypes as it does not look at their elements. This caused some registers to be selected as gpr instead of fpr.	2023-12-15 10:46:57 +00:00
Mariusz Sikora	229273f538	[AMDGPU] Update permlane test for GFX12 (#75572 )	2023-12-15 11:18:23 +01:00
mohammed-nurulhoque	08b306dc8e	[StackColoring] Delete dead stack slots (#75351 ) deletes slots that have lifetime markers and the lifetime ranges are empty.	2023-12-15 09:58:19 +00:00
Mirko Brkušanin	569ef8ddd9	[AMDGPU] Add pseudo scalar trans instructions for GFX12 (#75204 )	2023-12-15 10:41:05 +01:00
Carl Ritson	0ed0b7458a	[AMDGPU] Pre-commit test for #75573 . NFC Shows spill allocation overlapping with WWM register use.	2023-12-15 18:29:08 +09:00
Mariusz Sikora	966416b9e8	[AMDGPU][GFX12] Add new v_permlane16 variants (#75475 )	2023-12-15 10:14:38 +01:00
Pierre van Houtryve	f1ea77f7be	[AMDGPU][SIInsertWaitcnts] Set initial state for VS_CNT in non-kernel functions (#75436 ) Split from #72830	2023-12-15 08:31:14 +01:00
Wang Yaduo	c532ba4edd	[RISCV] Support printing immediate of RISCV MCInst in hexadecimal format (#74053 ) Enable the llvm-objdump to disassemble the immediate of RISCV instruction in hexadecimal format with --print-imm-hex flag.	2023-12-14 22:42:11 -08:00
Vitaly Buka	fc3adf74d3	Revert "[RISCV] Support printing immediate of RISCV MCInst in hexadecimal format" (#75561 ) Reverts llvm/llvm-project#74053 Breaks https://lab.llvm.org/buildbot/#/builders/5/builds/39291 Co-authored-by: Wang Yaduo <wangyaduo@linux.alibaba.com> Issue #75563	2023-12-14 22:05:47 -08:00

52796 Commits