llvm-project

Author	SHA1	Message	Date
Philip Reames	9cf675923a	[RISCVRVVInitUndef] Remove implicit single use assumption for IMPLICIT_DEF The code was written with the implicit assumption that each IMPLICIT_DEF is either a) the tied operand, or b) an untied source, but not both. This is true right now, but an upcoming change may allow CSE of IMPLICIT_DEFs in some cases, so let's rewrite the code to handle that possibility. I added an MIR case which demonstrates the multiple use IMPLICIT_DEF. To my knowledge, this is not a reachable configuration from IR right now. As an aside, this makes the structure a much closer match with the sub-reg liveness case, and we can probably just merge these routines. (Future work.) Differential Revision: https://reviews.llvm.org/D156477	2023-07-27 16:25:56 -07:00
Jon Roelofs	3e0cdf332f	Upgrade a rdar://5907648 link to a github issue https://github.com/llvm/llvm-project/issues/64174	2023-07-27 13:37:48 -07:00
Hiroshi Yamauchi	a90228b911	[AArch64][Windows] Fix the slot offset of the swift async context register. This fixes a code gen issue where saving the swift async context register (x22) accidentally overwrites the saved value of another callee-saved register, corrupting its value and causing a crash. Differential Revision: https://reviews.llvm.org/D156391	2023-07-27 12:32:43 -07:00
Craig Topper	a81e1f0fb2	[RISCV] When using vror.vi for left rotate, mask the inverted immediate to SEW. This makes the assembly more readable. Reviewed By: luke Differential Revision: https://reviews.llvm.org/D156348	2023-07-27 12:16:21 -07:00
Matt Arsenault	e5f04830c5	ARM: Use explicit triple in a test to avoid inheriting windows from the host	2023-07-27 13:18:50 -04:00
Daniel Hoekwater	3435a6a0bb	[AArch64] [XRay] Account for XRay event instrs in Branch Relaxation PATCHABLE_TYPED_EVENT_CALL and PATCHABLE_EVENT_CALL are pseudo instructions that expand to XRay sleds, so getInstSizeInBytes should reflect the size of the sleds, not the pseudo-instructions. Differential Revision: https://reviews.llvm.org/D156272	2023-07-27 17:10:58 +00:00
Matt Arsenault	95e5a461f5	AMDGPU: Always custom lower extract_subvector The patterns were ripped out in a4a3ac10cb1a40ccebed4e81cd7e94f1eb71602d so this always needs to be custom lowered. I absolutely hate how difficult it is to write tests for these, I have no doubt there are more of these hidden. Fixes #64142	2023-07-27 08:46:44 -04:00
Jay Foad	2dcf051259	[CodeGen] Store call frame size in MachineBasicBlock Record the call frame size on entry to each basic block. This is usually zero except when a basic block has been split in the middle of a call sequence. This simplifies PEI::replaceFrameIndices which previously had to visit basic blocks in a specific order and had special handling for unreachable blocks. More importantly it paves the way for an equally simple implementation of a backwards version of replaceFrameIndices, which is required to fully convert PrologEpilogInserter to backwards register scavenging, which is preferred because it does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D156113	2023-07-27 10:32:00 +01:00
David Green	beabfe747b	[AArch64] Sink splat to fmlal intrinsics Similar to other neon index instructions, it is beneficial to sink the splat to the instruction for fmlal in order for it to create the index.	2023-07-27 10:07:01 +01:00
David Green	509cb33469	[AArch64] Correct the regtype of indexed fmlal The indexed fmlal should use a low numbered register for the index operand, which this fixes by making it V128_lo. Fixes #64104 Differential Revision: https://reviews.llvm.org/D156296	2023-07-27 08:27:03 +01:00
David Green	e012c5cfac	[AArch64] Add test showing incorrect register usage of FMLAL. NFC See D156296	2023-07-27 07:39:10 +01:00
Vitaly Buka	a496c8be6e	Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" And dependent commits. Details in D150388. This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c. This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e. This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826. This reverts commit b7836d856206ec39509d42529f958c920368166b. No conflicts in the code, few tests had conflicts in autogenerated CHECKs: llvm/test/CodeGen/Thumb2/mve-float32regloops.ll llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll Reviewed By: alexfh Differential Revision: https://reviews.llvm.org/D156381	2023-07-26 22:13:32 -07:00
Pravin Jagtap	1462053608	[AMDGPU] Propagate constants for llvm.amdgcn.wave.reduce.umin/umax Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D156077	2023-07-26 23:46:01 -04:00
Nitin John Raj	474cf4feb7	[RISCV][GlobalISel] Test legalization of binary logical instructions with wider types Without any additional tweaking, we can successfully legalize for wider types (i64, i96 for rv32; i128, i192 for rv64) that are integer multiples of XLen. Reviewed By: arsenm, craig.topper Differential Revision: https://reviews.llvm.org/D155639	2023-07-26 15:37:13 -07:00
Matthew Voss	380dbfd8ca	Revert "Reapply [IR] Mark and/or constant expressions as undesirable" This reverts commit 0cab8d20417c0e2ccc1ffc5505e080126f5de8e6. Reverted due to an LTO crash. I've put a reduced test case here: https://github.com/llvm/llvm-project/issues/64114	2023-07-26 12:54:07 -07:00
Luke Lau	33a93a41df	[RISCV] Add SDNode patterns for vwsll.[vv,vx,vi] This reuses the patterns introduced to help lower vnsr[a,l].vx in D155698. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D155936	2023-07-26 20:26:35 +01:00
Luke Lau	ce8f094da8	[RISCV] Add patterns for vnsrl.vx where shift amount is truncated Similar to D155698 where the shift amount is extended, this patch extends the ComplexPattern to handle the case where the shift amount has been truncated. Truncations are custom lowered to truncate_vector_vl, and in cases like i64 -> i16 they are truncated by one power of two at a time, so we need to unravel nested layers of them. The pattern can also be reused for Zvbb's vwsll.vx in an upcoming patch. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D155928	2023-07-26 20:26:32 +01:00
Luke Lau	7c652feb95	[RISCV] Add tests for vnsrl.vx where shift amount is truncated Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D155927	2023-07-26 20:26:27 +01:00
Kevin P. Neal	33e25cdd48	[FPEnv][X86] Correct strictfp tests. Recommit only the tests that look good this time. Correct X86 strictfp tests to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics Mostly these tests just needed the strictfp attribute on function definitions. After D154991 the constrained intrinsics have the strictfp attribute by default so they don't need it here, but other functions do. Test changes verified with D146845.	2023-07-26 13:35:58 -04:00
Craig Topper	e28307e93a	[RISCV] Handle seteq/setne conditions for CZERO_NEZ/CZERO_EQZ during isel. This removes selectSETCC and adds isel patterns for seteq/setne conditions. This removes the duplication of selectSETCC between lowering and isel. This also gets some cases in xaluo.ll that we missed previously. Reviewed By: wangpc Differential Revision: https://reviews.llvm.org/D156250	2023-07-26 10:06:08 -07:00
Yonghong Song	6c412b6c6f	[BPF] Add a few new insns under cpu=v4 In [1], a few new insns are proposed to expand BPF ISA to . fixing the limitation of existing insn (e.g., 16bit jmp offset) . adding new insns which may improve code quality (sign_ext_ld, sign_ext_mov, st) . feature complete (sdiv, smod) . better user experience (bswap) This patch implemented insn encoding for . sign-extended load . sign-extended mov . sdiv/smod . bswap insns . unconditional jump with 32bit offset The new bswap insns are generated under cpu=v4 for __builtin_bswap. For cpu=v3 or earlier, for __builtin_bswap, be or le insns are generated which is not intuitive for the user. To support 32-bit branch offset, a 32-bit ja (JMPL) insn is implemented. For conditional branch which is beyond 16-bit offset, llvm will do some transformation 'cond_jmp' -> 'cond_jmp + jmpl' to simulate 32bit conditional jmp. See BPFMIPeephole.cpp for details. The algorithm is heuristic-based. I have tested bpf selftest pyperf600 with unroll count 600 which can indeed generate 32-bit jump insn, e.g., 13: 06 00 00 00 9b cd 00 00 gotol +0xcd9b <LBB0_6619> Eduard is working on adding the 'st' insn to cpu=v4. A list of llc flags: disable-ldsx, disable-movsx, disable-bswap, disable-sdiv-smod, disable-gotol can be used to disable a particular insn for cpu v4. For example, user can do: llc -march=bpf -mcpu=v4 -disable-movsx t.ll to enable cpu v4 without movsx insns. References: [1] https://lore.kernel.org/bpf/4bfe98be-5333-1c7e-2f6d-42486c8ec039@meta.com/ Differential Revision: https://reviews.llvm.org/D144829	2023-07-26 08:37:30 -07:00
Simon Pilgrim	4aa06ba39e	[X86] Cleanup vector-trunc-* test prefixes Add missing SSE2-SSSE3 common prefix to vector-trunc-ssat.ll	2023-07-26 16:27:56 +01:00
Alexander Kornienko	0def4e6b0f	Revert "[AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre" This reverts commit b0093e13fcfdd4eea5bbd7ae57d3d1b82f4135c3 due to a miscompile under MSan. See https://reviews.llvm.org/D152407#4533478 for more details. Reviewed By: asmok-g Differential Revision: https://reviews.llvm.org/D156328	2023-07-26 16:22:24 +02:00
Kevin P. Neal	3a5f8c3af8	Revert "[FPEnv][X86] Correct strictfp tests." This reverts commit d6857060a3b7428d1e9319d85fcef44e4b6b8db7. I'm getting build bot failures due to i128-fpconv-win64-strict.ll.	2023-07-26 09:18:32 -04:00
Kevin P. Neal	f57fb82e0f	[FPEnv][AArch64] Correct strictfp tests. Correct AArch64 strictfp tests to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics Mostly these tests just needed the strictfp attribute on function definitions. I've also removed the strictfp attribute from uses of the constrained intrinsics because it comes by default since D154991, but I only did this in tests I was changing anyway. I have removed attributes added to declare lines of intrinsics. The attributes of intrinsics cannot be changed in a test so I eliminated attempts to do so. Test changes verified with D146845.	2023-07-26 09:14:25 -04:00
Kevin P. Neal	7e0e8b7ace	[FPEnv][PowerPC] Correct strictfp tests. Correct PowerPC strictfp tests to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics Mostly these tests just needed the strictfp attribute on function definitions. I've also removed the strictfp attribute from uses of the constrained intrinsics because it comes by default since D154991, but I only did this in tests I was changing anyway. I have removed attributes added to declare lines of intrinsics. The attributes of intrinsics cannot be changed in a test so I eliminated attempts to do so. Test changes verified with D146845.	2023-07-26 09:12:29 -04:00
Kevin P. Neal	5ad2760ad9	[FPEnv][RISC-V] Correct a strictfp test. Correct a RISC-V strictfp test to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics All function calls in a strictfp function require the strictfp attribute. Test changes verified with D146845.	2023-07-26 09:10:19 -04:00
Kevin P. Neal	58ad5699e7	[FPEnv][SystemZ] Correct strictfp tests. Correct a SystemZ strictfp test to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics This test, like many others, just needed the function definition corrected. Test changes verified with D146845.	2023-07-26 09:08:46 -04:00
Kevin P. Neal	d6857060a3	[FPEnv][X86] Correct strictfp tests. Correct X86 strictfp tests to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics Mostly these tests just needed the strictfp attribute on function definitions. After D154991 the constrained intrinsics have the strictfp attribute by default so they don't need it here, but other functions do. Test changes verified with D146845.	2023-07-26 09:07:03 -04:00
pvanhout	a8aabba587	[AMDGPU] Fix PromoteAlloca Subvector Stores for Single Elements The previous condition was incorrect in some cases, like storing <2 x i32> into a double. If IndexVal was >0, we ended up never storing anything. Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D156308	2023-07-26 13:21:21 +02:00
pvanhout	6a767fbc36	[AMDGPU] Precommit tests for D156308 Also includes another testcase that's unrelated, it's just a sanity check. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156309	2023-07-26 13:21:20 +02:00
Jolanta Jensen	c67e443895	[AArch64][NFC] Expand coverage of ReplaceWithVeclib testing using SLEEF vector library This patch expands testing coverage for the ReplaceWithVeclib pass when the SLEEF vector library is used. It adds testing for all LLVM intrinsics which correspond to math functions from libm. llrint, llround and lrint are not included because the IR verifier pass currently does not allow vector types with them. Differential Revision: https://reviews.llvm.org/D155623	2023-07-26 08:57:56 +00:00
Jim Lin	2f726c22ce	[RISCV] Merge rv32/rv64 vector slideup and slidedown intrinsic tests that have the same content. NFC.	2023-07-26 13:13:55 +08:00
esmeyi	e83b8a5e71	[XCOFF] Enable available_externally linkage for functions. Summary: D80642 added support for emitting AvailableExternally Linkage on AIX. However, an assertion of "Trying to get csect representation of this symbol but none was set." occurred when a function is declared as available_externally. This happens because no csect is generated for the function. This patch fixes it. Reviewed By: hubert.reinterpretcast, shchenz Differential Revision: https://reviews.llvm.org/D156213 Signed-off-by: Esme Yi <esme.yi@ibm.com>	2023-07-25 22:47:11 -04:00
Weining Lu	c56514f21b	Reland "[LoongArch] Support -march=native and -mtune=" As described in [1][2], `-mtune=` is used to select the type of target microarchitecture, defaults to the value of `-march`. The set of possible values should be a superset of `-march` values. Currently possible values of `-march=` and `-mtune=` are `native`, `loongarch64` and `la464`. D136146 has supported `-march={loongarch64,la464}` and this patch adds support for `-march=native` and `-mtune=`. A new ProcessorModel called `loongarch64` is defined in LoongArch.td to support `-mtune=loongarch64`. `llvm::sys::getHostCPUName()` returns `generic` on unknown or future LoongArch CPUs, e.g. the not yet added `la664`, leading to `llvm::LoongArch::isValidArchName()` failing to parse the arch name. In this case, use `loongarch64` as the default arch name for 64-bit CPUs. And these two preprocessor macros are defined: - __loongarch_arch - __loongarch_tune [1]: https://github.com/loongson/LoongArch-Documentation/blob/2023.04.20/docs/LoongArch-toolchain-conventions-EN.adoc [2]: https://github.com/loongson/la-softdev-convention/blob/v0.1/la-softdev-convention.adoc Reviewed By: xen0n, wangleiat Differential Revision: https://reviews.llvm.org/D155824	2023-07-26 10:26:38 +08:00
Qi Hu	ddd7d35c6c	[RegAlloc] Fix assertion failure caused by inline assembly When inline assembly code requests more registers than available, the MachineInstr::emitError function in the RegAllocFast pass emits an error but doesn't stop the pass, and then the compiler crashes later with an assertion failure. This commit, mimicking the RegAllocGreedy pass, assigns a random physical register, and therefore avoids the crash after producing the diagnostic. This problem has been observed for both rustc and clang, while it doesn't occur in gcc.	2023-07-25 19:21:03 -04:00
Corbin Robeck	7a4968b5a3	[AMDGPU] Add dynamic stack bit info to kernel-resource-usage Rpass output In code object 5 (https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata) the AMDGPU backend added the .uses_dynamic_stack bit to the kernel metadata to identify kernels which have compile time indeterminable stack usage (indirect function calls and recursion mainly). This patch adds this information to the output of the kernel-resource-usage remarks. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156040 Author: Corbin Robeck <corbin.robeck@amd.com>	2023-07-25 12:20:13 -07:00
Kevin P. Neal	76c22b18ea	[FPEnv][AMDGPU] Correct strictfp tests. Correct AMDGPU strictfp tests to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics Mostly these tests just needed the strictfp attribute on function definitions. I've also removed the strictfp attribute from uses of the constrained intrinsics because it comes by default since D154991, but I only did this in tests I was changing anyway. I also removed attributes added to declare lines of intrinsics. The attributes of intrinsics cannot be changed in a test so I eliminated attempts to do so. Test changes verified with D146845.	2023-07-25 13:24:46 -04:00
Craig Topper	f6dc75cdd8	[RISCV] Add DAG combine to pull xor with 1 through select idiom that uses czero_eqz/nez. If we are selecting between two setccs that need to be legalized with xor, the select will be legalized first. Detect this pattern so we can pull the xor through to expose it to additional optimizations. We could generalize this to other operations, but those normally get handled in DAG combine before select legalization. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D156159	2023-07-25 09:13:24 -07:00
Craig Topper	b34a8b3a52	[RISCV] Generalize combineAddOfBooleanXor to support any boolean not just setcc. Instead of checking for setcc, look for any 0/1 value. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D156153	2023-07-25 09:04:49 -07:00
Craig Topper	5ff5dac852	[RISCV] Add simple DAG combine to pull xor with 1 through select_cc. If we're selecting the result of two setccs that have been legalized by introducing an xor with 1, we can pull the xor with 1 through the select to enable more optimizations. We could generalize this to other binary operators with identical conditions, but those are usually caught before we legalize the select. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D156144	2023-07-25 09:03:45 -07:00
Weining Lu	212d6aa0da	Revert "[LoongArch] Support -march=native and -mtune=" This reverts commit 92c06114b2ea9900a3364fb395988dfb065758f7.	2023-07-25 23:32:15 +08:00
Nikita Popov	0cab8d2041	Reapply [IR] Mark and/or constant expressions as undesirable This reapplies the change for and, but also marks or as undesirable at the same time. Only handling one of them can cause infinite combine loops due to the asymmetric handling. ----- In preparation for removing support for and/or expressions, mark them as undesirable. As such, we will no longer implicitly create such expressions, but they still exist.	2023-07-25 15:31:45 +02:00
Weining Lu	92c06114b2	[LoongArch] Support -march=native and -mtune= As described in [1][2], `-mtune=` is used to select the type of target microarchitecture, defaults to the value of `-march`. The set of possible values should be a superset of `-march` values. Currently possible values of `-march=` and `-mtune=` are `native`, `loongarch64` and `la464`. D136146 has supported `-march={loongarch64,la464}` and this patch adds support for `-march=native` and `-mtune=`. A new ProcessorModel called `loongarch64` is defined in LoongArch.td to support `-mtune=loongarch64`. `llvm::sys::getHostCPUName()` returns `generic` on unknown or future LoongArch CPUs, e.g. the not yet added `la664`, leading to `llvm::LoongArch::isValidArchName()` failing to parse the arch name. In this case, use `loongarch64` as the default arch name for 64-bit CPUs. And these two preprocessor macros are defined: - __loongarch_arch - __loongarch_tune [1]: https://github.com/loongson/LoongArch-Documentation/blob/2023.04.20/docs/LoongArch-toolchain-conventions-EN.adoc [2]: https://github.com/loongson/la-softdev-convention/blob/v0.1/la-softdev-convention.adoc Differential Revision: https://reviews.llvm.org/D155824	2023-07-25 21:01:51 +08:00
Matt Arsenault	e3fd8f83a8	AMDGPU: Correctly expand f64 sqrt intrinsic rocm-device-libs and llpc were avoiding using f64 sqrt intrinsics in favor of their own expansions. Port the expansion into the backend. Both of these users should be updated to call the intrinsic instead. The library and llpc expansions are slightly different. llpc uses an ldexp to do the scale; the library uses a multiply. Use ldexp to do the scale instead of the multiply. I believe v_ldexp_f64 and v_mul_f64 are always the same number of cycles, but it's cheaper to materialize the 32-bit integer constant than the 64-bit double constant. The libraries have another fast version of sqrt which will be handled separately. I am tempted to do this in an IR expansion instead. In the IR we could take advantage of computeKnownFPClass to avoid the 0-or-inf argument check.	2023-07-25 07:54:11 -04:00
Matt Arsenault	47b3ada432	AMDGPU: Add more sqrt f64 lowering tests Almost all permutations of the flags are potentially relevant.	2023-07-25 07:54:11 -04:00
Paul Walker	74445d652d	[SVE] Add vselect(mla/mls) patterns for cases where a multiplicand is used for the false lanes. Differential Revision: https://reviews.llvm.org/D155972	2023-07-25 10:02:32 +00:00
Jim Lin	4eff7fae60	[RISCV] Merge rv32/rv64 vector narrowing integer right shift intrinsic tests that have the same content. NFC.	2023-07-25 16:09:21 +08:00
LiaoChunyu	620e61c518	[RISCV] Match ext_vl+sra_vl/srl_vl+trunc_vector_vl to vnsra.wv/vnsrl.wv similar to D117454, try to add vl patterns and testcases. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D155466	2023-07-25 14:21:02 +08:00
Freddy Ye	6d23a3faa4	[X86] Support -march=graniterapids-d and update -march=graniterapids Reviewed By: pengfei, RKSimon, skan Differential Revision: https://reviews.llvm.org/D155798	2023-07-25 13:48:31 +08:00

49231 Commits