llvm-project

Author	SHA1	Message	Date
Jianjian Guan	fd50151180	[RISCV] Only support SPLAT_VECTOR for Zvfhmin when also enable the scalar extension of half fp (#88275 )	2024-04-11 10:23:26 +08:00
Matthias Braun	acb7ddc5cf	[WebAssembly] Remove threadlocal.address when disabling TLS (#88209 ) Remove `llvm.threadlocal.address` intrinsic usage when disabling TLS. This fixes errors revealed by the stricter IR verification introduced in PR #87841.	2024-04-10 16:24:02 -07:00
Farzon Lotfi	05093e2438	[Spirv][HLSL] Add OpAll lowering and float vec support (#87952 ) The main point of this change was to add support for HLSL's all intrinsic. In the process of doing that I found a few issues around creating an `OpConstantComposite` via `buildZerosVal`. First the current code didn't support floats so the process of adding `buildZerosValF` meant I needed a float version of `getOrCreateIntConstVector`. After doing so I renamed both versions to `getOrCreateConstVector`. That meant I needed to create a float type version of `getOrCreateIntCompositeOrNull`. Luckily the type information was low for this function so I was able to split it out into a helper and rename `getOrCreateIntCompositeOrNull` to `getOrCreateCompositeOrNull`. With the exception of type handling differences of the code and Null vs 0 Constant Op codes these functions should be identical. To handle scalar floats I could not use `buildConstantFP` like this PR did: https://github.com/llvm/llvm-project/commit/0a2aaab5aba46#diff-733a189c5a8c3211f3a04fd6e719952a3fa231eadd8a7f11e6ecf1e584d57411R1603 because that would create too many superfluous registers (that causes problems in the validator), I had to create a float version of `getOrCreateConstInt` which I called `getOrCreateConstFP`. Similar problems with doing it like this: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/SPIRV/SPIRVBuiltins.cpp#L1540. `buildZerosValF` also has a use of a function `getZeroFP`. This is because half, float, and double scalar values of 0 would collide in `SPIRVDuplicatesTracker<Constant> CT` if you use `APFloat(0.0f)`. `getOrCreateConstFP` needed its own version of `getOrCreateConstIntReg` which I called `getOrCreateConstFloatReg`. The one difference in this function is `getOrCreateConstFloatReg` returns a bit width so we don't have to call `getScalarOrVectorBitWidth` twice, i.e. when it is used again in `getOrCreateConstFP` for `OpConstantF` `addNumImm`. 
`getOrCreateConstFloatReg` needed an `assignFloatTypeToVReg` helper which called a `getOrCreateSPIRVFloatType` helper. There was no equivalent IntegerType::get for floats so I handled this with a switch statement on bit widths to get the right LLVM float type. Finally, there is the use of `bool ZeroAsNull = STI.isOpenCLEnv();` This is partly a cosmetic change. When Zeros are treated as nulls, we don't create `OpConstantComposite` vectors which is something we do in the DXCs SPIRV backend. The DXC SPIRV backend also does not use `OpConstantNull`. Finally, I needed a means to test the behavior of the OpConstantNull and `OpConstantComposite` changes and this was one way I could do that via the same tests.	2024-04-10 16:27:44 -04:00
shamithoke	e3ef4612c1	Perform bitreverse using AVX512 GFNI for i32 and i64. (#81764 ) Currently, the lowering operation for bitreverse using Intel AVX512 GFNI only supports byte vectors. Extend the operation to i32 and i64. --------- Co-authored-by: shami <shami_thoke@yahoo.com>	2024-04-10 20:22:44 +01:00
Jun Wang	86842e1f72	[AMDGPU] New clang option for emitting a waitcnt instruction after each memory instruction (#79236 ) This patch introduces a new command-line option for clang, namely, amdgpu-precise-mem-op (or precise-memory in the backend). When this option is specified, a waitcnt instruction is generated after each memory load/store instruction. The counter values are always 0, but which counters are involved depends on the memory instruction. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2024-04-10 10:47:04 -07:00
Craig Topper	f27f369710	[RISCV] Remove interrupt handler special case from RISCVFrameLowering::determineCalleeSaves. (#88069 ) This code was trying to save temporary argument registers in interrupt handler functions that contain calls. With the exception that all FP registers are saved including the normally callee saved registers. If all of the callees use an FP ABI and the interrupt handler doesn't touch the normally callee saved FP registers, we don't need to save them. It doesn't appear that we need to special case functions with calls. The normal callee saved register handling will already check each of the calls and consider a register clobbered if the call doesn't explicitly say it is preserved. All of the test changes are from the removal of the FP callee saved registers. There are tests for interrupt handlers with F and D extension that use ilp32 or lp64 ABIs that are not affected by this change. They still save the FP callee saved registers as they should. gcc appears to have a bug where the D extension being enabled with the ilp32f or lp64f ABI does not save the FP callee saved regs. The callee would only save/restore the lower 32 bits and clobber the upper bits. LLVM saves the FP callee saved regs in this case and there is an unchanged test for it. The unnecessary save/restore was raised in this thread https://discourse.llvm.org/t/has-bugs-when-optimizing-save-restore-csrs-by-changing-csr-xlen-f32-interrupt/78200/1	2024-04-10 10:28:54 -07:00
David Green	4dcf33b6c2	[AArch64] Cleanup and GISel coverage for lrint tests. NFC	2024-04-10 18:13:57 +01:00
Vyacheslav Levytskyy	335d5d5f47	[SPIRV] Tweak parsing of base type name in builtins (#88255 ) This PR is a small improvement to the parsing of base type names in builtins, allowing it to understand `unsigned ...` types. The test case that fails without the fix is attached.	2024-04-10 19:04:31 +02:00
Craig Topper	323d3ab257	[RISCV] Optimize undef Even vector in getWideningInterleave. (#88221 ) We recently optimized the code when the Odd vector was undef to fix a poison bug. There are additional optimizations we can do if the even vector is undef. With Zvbb, we can use a single vwsll. Without Zvbb, we can use a vzext.vf2 and a vsll.	2024-04-10 09:08:50 -07:00
Craig Topper	7f1b9adfc8	[RISCV] Add MachineCombiner to fold (sh3add Z, (add X, (slli Y, 6))) -> (sh3add (sh3add Y, Z), X). (#87884 ) This improves a pattern that occurs in 531.deepsjeng_r, reducing the dynamic instruction count by 0.5%. This may be possible to improve in SelectionDAG, but given the special cases around shXadd formation, it's not obvious it can be done in a robust way without adding multiple special cases. I've used a GEP with 2 indices because that most closely resembles the motivating case. Most of the test cases are the simplest GEP case. One test has a logical right shift on an index which is closer to the deepsjeng code. This requires special handling in isel to reverse a DAGCombiner canonicalization that turns a pair of shifts into (srl (and X, C1), C2).	2024-04-10 08:39:56 -07:00
Dinar Temirbulatov	990c4bc95f	[AArch64][SVE2] Generate SVE2 BSL instruction in LLVM for bit-twiddling. (#83514 ) Allow folding or/and-and into a BSL instruction for scalable vectors.	2024-04-10 11:07:59 +01:00
Simon Pilgrim	0e7d14d2e8	[X86] Regenerate mmx-intrinsics.ll test checks	2024-04-10 10:42:01 +01:00
hev	0d17e1f0e5	[LoongArch] Revert `sp` adjustment in prologue (#88110 ) After commit 18c5f3c3 ("[RegisterScavenger][RISCV] Don't search for FrameSetup instrs if we were searching from Non-FrameSetup instrs"), we can revert the `sp` adjustment 4e2364a2 ("[LoongArch] Add emergency spill slot for GPR for large frames") to generate better code, as the issue with `RegScavenger` has been resolved. Fixes #88109	2024-04-10 17:13:25 +08:00
Chia	469caa31e7	[RISCV] Use vwadd.vx for splat vector with extension (#87249 ) This patch allows `combineBinOp_VLToVWBinOp_VL` to handle patterns like `(splat_vector (sext op))` or `(splat_vector (zext op))`. Then we can use `vwadd.vx` and `vwadd.w` for such a case. ### Source code ``` define <vscale x 8 x i64> @vwadd_vx_splat_sext(<vscale x 8 x i32> %va, i32 %b) { %sb = sext i32 %b to i64 %head = insertelement <vscale x 8 x i64> poison, i64 %sb, i32 0 %splat = shufflevector <vscale x 8 x i64> %head, <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer %vc = sext <vscale x 8 x i32> %va to <vscale x 8 x i64> %ve = add <vscale x 8 x i64> %vc, %splat ret <vscale x 8 x i64> %ve } ``` ### Before this patch [Compiler Explorer](https://godbolt.org/z/sq191PsT4) ``` vwadd_vx_splat_sext: sext.w a0, a0 vsetvli a1, zero, e64, m8, ta, ma vmv.v.x v16, a0 vsetvli zero, zero, e32, m4, ta, ma vwadd.wv v16, v16, v8 vmv8r.v v8, v16 ret ``` ### After this patch ``` vwadd_vx_splat_sext vsetvli a1, zero, e32, m4, ta, ma vwadd.vx v16, v8, a0 vmv8r.v v8, v16 ret ```	2024-04-10 15:26:17 +09:00
Noah Goldstein	6c40d463c2	[X86] Use `nneg` flag when trying to convert `uitofp` -> `sitofp` Closes #86694	2024-04-09 23:06:55 -05:00
Noah Goldstein	84a5332a68	[X86] Add tests for `uitofp nneg` -> `sitofp`; NFC	2024-04-09 23:06:55 -05:00
Dinar Temirbulatov	528943f153	[AArch64][SME] Allow memory operations lowering to custom SME functions. (#79263 ) This change allows lowering memcpy, memset, and memmove to custom SME versions provided by LibRT.	2024-04-09 17:27:46 +01:00
Peter Lafreniere	614a578034	[M68k] Add support for bitwise NOT instruction (#88049 ) Currently the bitwise NOT instruction is not recognized. Add support for using NOT on data registers. This is a partial implementation that puts NOT at the same level of support as NEG currently enjoys. Using not rather than eori cuts the length of the encoded instruction in half or in thirds, leading to a reduction of 4-10 cycles per instruction, on the original 68000. This change includes tests for both bitwise and arithmetic negation.	2024-04-09 09:07:26 -07:00
Sam Tebbs	fb8dbd1fb6	[AArch64] Remove copy in SVE/SME predicate spill and fill (#81716 ) 7dc20ab introduced an extra COPY when spilling and filling a PNR register, which can't be elided as the input (PNR predicate) and output (PPR predicate) register classes differ. The patch adds a new register class that covers both PPR and PNR so that STR_PXI and LDR_PXI can take either of them, removing the need for the copy.	2024-04-09 16:17:27 +01:00
Philip Reames	e47fd09f8e	[RISCV] Use shNadd for scalable stack offsets (#88062 ) If we need to multiply VLENB by 2, 4, or 8 and add it to the stack pointer, we can do so with a shNadd instead of separate shift and add instructions.	2024-04-09 07:29:10 -07:00
Vyacheslav Levytskyy	23b058cb7f	[SPIR-V] Re-implement switch and improve validation of forward calls (#87823 ) This PR fixes issue https://github.com/llvm/llvm-project/issues/87763 and preserves valid CFG in cases when previous scheme failed to generate valid code for a switch statement. The PR hardens one existing test case and adds one more test case as a validation of a new switch generation. Tests are passing spirv-val now. This PR also improves validation of forward calls.	2024-04-09 16:15:44 +02:00
Natalie Chouinard	1e44d9ac5e	[SPIR-V] Map llvm.{min,max}num to GL::N{Min,Max} (#88009 ) SPIR-V instruction selection was mapping the LLVM float min/max intrinsics to FMin and FMax respectively for GL/Vulkan environments, which does not match the intrinsics' documented treatment of NaN operands. This patch switches the mapping to the correctly matched NMin and NMax operations. Fixes #87072	2024-04-09 09:41:47 -04:00
Simon Pilgrim	961d91abd3	[X86] shuffle-vs-trunc-128.ll - add common AVX2 check prefix	2024-04-09 14:14:01 +01:00
Simon Pilgrim	a4cf479cdf	[X86] shuffle-vs-trunc-128.ll - add BWVL-ONLY/VBMI/VBMI-FAST/VBMI-SLOW check prefixes to recover missing test checks It is VERY annoying that update_llc_test_checks.py silently fails instead of correctly warning when this happens :(	2024-04-09 13:44:01 +01:00
Simon Pilgrim	866a1bc814	[X86] Add test coverage for #88030	2024-04-09 13:23:44 +01:00
Simon Pilgrim	4023329bbf	[X86] collectConcatOps - add ability to recurse through insert_subvector chains Allows us to match insert_subvector(insert_subvector(undef, insert_subvector(insert_subvector(undef, x, 0), y, 1), 0), 0), insert_subvector(insert_subvector(undef, z, 0), w, 1), 2)	2024-04-09 13:23:44 +01:00
Simon Pilgrim	0bbe953aa3	[X86] Fold extract_subvector(cvtps2dq(x),c) -> cvtps2dq(extract_subvector(x,c)) Help unblock #83402	2024-04-09 11:06:18 +01:00
Luke Lau	24e8c6a09b	[RISCV] Convert remaining constant splats in tests to use splat shorthand. NFC (#88099 ) This follows on from #87616, but includes the tests with codegen differences. These are presumably due to the fact that the splat is now a constant expression. They don't seem to affect anything that we were specifically testing for.	2024-04-09 17:15:15 +08:00
Qiu Chaofan	a4558a4a53	[PowerPC] Implement 32-bit expansion for rldimi (#86783 ) rldimi is a 64-bit instruction; for backward compatibility, it needs to be expanded into a series of rotate and mask operations in 32-bit environments. In the future, we may improve the bit permutation selector and remove such direct codegen.	2024-04-09 16:43:49 +08:00
Jay Foad	9c58f3a234	[AMDGPU] Fix implicit $vcc operands after parsing MIR (#87781 ) MIParser checks that implicit operands match the instruction definition, so they have to be $vcc even in wave32 mode. Use the mirFileLoaded hook to fix them after MIParser's checks, converting them to $vcc_lo which is what the rest of CodeGen expects. This is all just extending the fixImplicitOperands hack which was introduced with GFX10, but at least it makes it possible to write a MIR test which creates the same instructions that normal CodeGen would generate.	2024-04-09 09:10:45 +01:00
Luke Lau	9c660362c4	[RISCV] Support vwsll in combineBinOp_VLToVWBinOp_VL (#87620 ) If the subtarget has +zvbb then we can attempt folding shl and shl_vl to vwsll nodes. There are a few test cases where we still don't pick up the vwsll: - For fixed vector vwsll.vi on RV32, see the FIXME for VMV_V_X_VL in fillUpExtensionSupport for supporting implicit sign extension - For scalable vector vwsll.vi we need to support ISD::SPLAT_VECTOR, see #87249	2024-04-09 16:10:35 +08:00
Luke Lau	0f20b9b92f	[RISCV] Don't require mask or VL to be the same in combineBinOp_VLToVWBinOp_VL (#87997 ) In NodeExtensionHelper we keep track of the VL and mask of the operand being extended and check that they are the same as the root node's. However for the nodes that we support, none of them have a passthru operand with the exception of RISCV::VMV_V_X_VL, but we check that its passthru is undef anyway. So it's safe to just discard the extend node's VL and mask and just use the root's instead. (This is the same type of reasoning we use to treat any vmset_vl as an all ones mask.) This allows us to match some more cases where we mix VP/non-VP/VL nodes, but these don't seem to appear in practice. The main benefit from this would be to simplify the code.	2024-04-09 16:04:10 +08:00
Luke Lau	d8d131dfa9	[RISCV] Convert more constant splats in tests to splat shorthand. NFC (#87616 ) A handy shorthand for specifying the shufflevector(insertelement(poison, foo, 0), poison, zeroinitializer) splat pattern was introduced in #74620. Some of the RISC-V tests were converted over to use this new form in dbb65dd330cc1696d7ca3dedc7aa9fa12c55a075, this patch handles the rest which didn't have any codegen diffs. This not only converts some constant expressions to the new form, but also instruction sequences that weren't previously constant expressions to constant expressions as well. In some cases this affects codegen, but these have been omitted here and will be handled in a separate PR.	2024-04-09 15:46:38 +08:00
Qiu Chaofan	71eda17a06	[Legalizer] Soften EXTRACT_ELEMENT on ppcf128 (#77412 ) ppc_fp128 values are always split into two f64. Implement soften operation in soft-float mode to handle output f64 correctly.	2024-04-09 10:26:24 +08:00
Alexandre Ganea	ec1af63dde	[Codegen][X86] Fix /HOTPATCH with clang-cl and inline asm (#87639 ) This fixes an edge case where functions starting with inline assembly would assert while trying to lower that inline asm instruction. After this PR, for now we always add a no-op (xchgw in this case) without considering the size of the next inline asm instruction. We might want to revisit this in the future. This fixes Unreal Engine 5.3.2 compilation with clang-cl and /HOTPATCH. Should close https://github.com/llvm/llvm-project/issues/56234	2024-04-08 20:02:19 -04:00
Craig Topper	4e98adf677	[RISCV] Add tests for F/D with non-FP ABI to interrupt-attr.ll. NFC Without a floating point aware ABI for callees, an interrupt handler needs to save all floating point registers even normally callee saved. We are currently unnecessarily saving callee saved FP registers when a floating point ABI is used by the callee. This is different than gcc as noted in this discourse post https://discourse.llvm.org/t/has-bugs-when-optimizing-save-restore-csrs-by-changing-csr-xlen-f32-interrupt/78200/1	2024-04-08 16:12:36 -07:00
Craig Topper	472ea6e015	[RISCV] Resolve CHECK prefix conflict in fixed-vectors-vitofp-constrained-sdnode.ll. NFC	2024-04-08 16:01:18 -07:00
Craig Topper	afc7cc7b12	[RISCV] Fix missing CHECK prefixes in vector lrint test files. NFC All of these test cases had iXLen in their name which got replaced by sed. This prevented FileCheck from finding the function. The other test cases in these files do not have that issue.	2024-04-08 16:01:18 -07:00
Arthur Eubanks	922700df44	Revert "[X86] Change how we treat functions with explicit sections as small/large (#87838 )" This reverts commit e27c3736f975ca463476223c465e4777186f603f. Breaks ExecutionEngine/MCJIT/test-global-ctors.ll on windows, e.g. https://lab.llvm.org/buildbot/#/builders/117/builds/18749.	2024-04-08 23:00:01 +00:00
Craig Topper	89ebb56152	[RISCV] Resolve CHECK prefix conflict in fixed-vectors-vwsll.ll. NFC riscv32 and riscv64 generate different code for one test case so we need RV32 and RV64 CHECK lines.	2024-04-08 15:45:07 -07:00
Arthur Eubanks	e27c3736f9	[X86] Change how we treat functions with explicit sections as small/large (#87838 ) Following #78348, we should treat functions with an explicit section as small, unless the section name is (or has the prefix) ".ltext". Clang emits global initializers into a ".text.startup" section on Linux. If we mix small/medium code model object files with large code model object files, we'll end up mixing sections with and without the large section flag.	2024-04-08 15:40:19 -07:00
Eli Friedman	7ad481e76c	Revert "[AArch64] Add support for -ffixed-x30" (#88019 ) This reverts commit e770153865c53c4fd72a68f23acff33c24e42a08. This wasn't reviewed, and the functionality in question was intentionally rejected the last time it was discussed in https://reviews.llvm.org/D56305 .	2024-04-08 15:16:00 -07:00
Philip Reames	eb26edbbf8	[RISCV] Exploit sh3add/sh2add for stack offsets by shifted 12-bit constants (#87950 ) If we're falling back to generic constant formation in a register + add/sub, we can check if we have a constant which is 12-bits but left shifted by 2 or 3. If so, we can use a sh2add or sh3add to perform the shift and add in a single instruction. This is profitable when the unshifted constant would require two instructions (LUI/ADDI) to form, but is never harmful since we're going to need at least two instructions regardless of the constant value. Since stacks are aligned to 16 bytes by default, sh3add allows addressing (aligned) data out to 2^14 (i.e. 16kb) in at most two instructions w/zba.	2024-04-08 14:53:21 -07:00
Philip Reames	f5cf98c026	[RISCV] Improve test coverage for #87950 Noticed in review that we want both the LUI and LUI/ADDI cases with different behavior for each.	2024-04-08 14:39:37 -07:00
Leonard Grey	c23135c548	-fsanitize=function: fix .subsections_via_symbols (#87527 ) -fsanitize=function emits a signature and function hash before a function. Similar to 7f6e2c9, these can be sheared off when `.subsections_via_symbols` is used. This change uses the same technique 7f6e2c9 introduced for prefixes: emitting a symbol for the metadata, then marking the actual function entry as an .alt_entry symbol.	2024-04-08 16:05:52 -04:00
Daniil Kovalev	89eb1a5a8e	[test][AArch64][CodeGen] Delete redundant check lines (#87965 ) llvm/test/CodeGen/AArch64/elf-globals-pic.ll: Since https://reviews.llvm.org/D91734, elf-globals-static.ll test contains several `CHECK-PIC` lines. They do not seem to bring any value since there are no FileCheck run lines checking against this prefix. The right place for such tests should be elf-globals-pic.ll, which already contains check lines being deleted in this commit. Both elf-globals-pic.ll and elf-globals-static.ll were created after splitting arm64-elf-globals.ll in 6dbd0ea, and having `CHECK-PIC` lines in elf-globals-static.ll seems like an issue that occurred because of git thinking that elf-globals-pic.ll is a new file and elf-globals-static.ll is a rename of arm64-elf-globals.ll. llvm/test/CodeGen/AArch64/tagged-globals-pic.ll: Similar to elf-globals-pic.ll, contains unneeded `CHECK-SELECTIONDAGISEL` and `CHECK-GLOBALISEL` directives not checked by any FileCheck invocation. These directives are present in tagged-globals-static.ll. Both tests are present in the code tree since fd32639 when tagged-globals.ll was split into tagged-globals-{pic|static}.ll.	2024-04-08 22:27:50 +03:00
Kevin P. Neal	eeedb1e962	[FPEnv][X86] Correct one more strictfp test. Correct a strictfp test to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics This test needed the strictfp attribute added to some function definitions. FP wait instructions now appear as a result. The need for the wait instructions is explained by Andy Kaylor in PR#87791: https://github.com/llvm/llvm-project/pull/87791 Test changes verified with D146845.	2024-04-08 14:39:08 -04:00
Kevin P. Neal	8ccf1c117b	[FPEnv][X86] Correct strictfp tests. (#87791 ) Correct strictfp tests to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics These tests needed the strictfp attribute added to some function definitions. FP wait instructions now appear as a result. Test changes verified with D146845.	2024-04-08 10:14:02 -04:00
Matt Arsenault	8cb642bf18	GlobalISel: Regenerate test checks	2024-04-08 08:32:04 -04:00
Matt Arsenault	acb2a47576	AMDGPU: Regenerate test checks	2024-04-08 08:17:09 -04:00


52796 Commits