llvm-project

Author	SHA1	Message	Date
Craig Topper	13fe673301	[RISCV] Move NTLH hint emission into RISCVAsmPrinter.cpp. Rather than having a separate pass to add the hint instructions, emit them directly into the streamer during asm printing. Reviewed By: BeMg, kito-cheng Differential Revision: https://reviews.llvm.org/D149511	2023-05-01 12:05:18 -07:00
Craig Topper	e56c6f3a8c	[RISCV] Prevent lowerVectorStrictFSetcc from creating an ISD::AND with identical operands. This AND immediately gets legalized to RISCVISD::VMAND_VL and we don't yet have a DAG combine to optimize that away. So this is a quick fix to improve generated code.	2023-04-29 21:42:45 -07:00
Ian Douglas Scott	34b37c00ab	[M68k] Add instruction selection support for zext with PCD addressing Instruction selection was failing when trying to zero extend a value loaded from a PC-relative address. This adds support for zero extension using the "program counter indirect with displacement" addressing mode. It also adds a test with code that was previously failing to compile. This fixes a compile error in Rust's libcore. Differential Revision: https://reviews.llvm.org/D149034	2023-04-29 16:27:16 -07:00
David Green	f1961153c2	[ARM] Add predicated shift patterns This uses the patterns defined in MVE_TwoOpPattern to add predicated patterns for vshls/u instructions. Differential Revision: https://reviews.llvm.org/D149366	2023-04-29 20:32:54 +01:00
Craig Topper	df017ba9d3	[TargetLowering] Don't use ISD::SELECT_CC in expandFP_TO_INT_SAT. This function gets called for vectors and ISD::SELECT_CC was never intended to support vectors. Some updates were made to support it when this function started getting used for vectors. Overall, using separate ISD::SETCC and ISD::SELECT looks like an improvement even for scalar. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D149481	2023-04-29 10:23:08 -07:00
Joseph Huber	a1da746157	[AMDGPU] Place global constructors in .init_array and .fini_array For the GPU, we emit external kernels that call the initializers and constructors, however if we had a persistent kernel like in the `_start` kernel for the `libc` project, we could initialize the standard way of calling constructors. This patch adds new global variables containing pointers to the constructors to be called. If these are placed in the `.init_array` and `.fini_array` sections, then the backend will handle them specially. The linker will then provide the `__init_array_` and `__fini_array_` sections to traverse them. An implementation would look like this. ``` extern uintptr_t __init_array_start[]; extern uintptr_t __init_array_end[]; extern uintptr_t __fini_array_start[]; extern uintptr_t __fini_array_end[]; using InitCallback = void(int, char **, char **); using FiniCallback = void(void); extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void _start(int argc, char **argv, char **envp) { uint64_t init_array_size = __init_array_end - __init_array_start; for (uint64_t i = 0; i < init_array_size; ++i) reinterpret_cast<InitCallback *>(__init_array_start[i])(argc, argv, envp); uint64_t fini_array_size = __fini_array_end - __fini_array_start; for (uint64_t i = 0; i < fini_array_size; ++i) reinterpret_cast<FiniCallback *>(__fini_array_start[i])(); } ``` Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D149340	2023-04-29 08:40:19 -05:00
Matt Arsenault	bc37be1855	LangRef: Add "dynamic" option to "denormal-fp-math" This is stricter than the default "ieee", and should probably be the default. This patch leaves the default alone. I can change this in a future patch. There are non-reversible transforms I would like to perform which are legal under IEEE denormal handling, but illegal with flushing zero behavior. Namely, conversions between llvm.is.fpclass and fcmp with zeroes. Under "ieee" handling, it is legal to translate between llvm.is.fpclass(x, fcZero) and fcmp x, 0. Under "preserve-sign" handling, it is legal to translate between llvm.is.fpclass(x, fcSubnormal|fcZero) and fcmp x, 0. I would like to compile and distribute some math library functions in a mode where it's callable from code with and without denormals enabled, which requires not changing the compares with denormals or zeroes. If an IEEE function transforms an llvm.is.fpclass call into an fcmp 0, it is no longer possible to call the function from code with denormals enabled, or write an optimization to move the function into a denormal flushing mode. For the original function, if x was a denormal, the class would evaluate to false. If the function compiled with denormal handling was converted to or called from a preserve-sign function, the fcmp now evaluates to true. This could also be of use for strictfp handling, where code may be changing the denormal mode. Alternative name could be "unknown". Replaces the old AMDGPU custom inlining logic with more conservative logic which tries to permit inlining for callees with dynamic handling and avoids inlining other mismatched modes.	2023-04-29 08:44:59 -04:00
Luo, Yuanke	40222ddcf8	[X86] Fix the vnni machine combine issue. The previous patch (D148980) didn't set the InstrIdxForVirtReg correctly in genAlternativeDpCodeSequence(). It causes vnni lit test failure when LLVM_ENABLE_EXPENSIVE_CHECKS is on.	2023-04-29 13:51:08 +08:00
Craig Topper	578413751c	[RISCV] Add a DAG combine to fold (add (xor (setcc X, Y), 1) -1)->(neg (setcc X, Y)).	2023-04-28 16:52:55 -07:00
Philip Reames	d636bcb6ae	[RISCV] Introduce unaligned-vector-mem feature This allows us to model and thus test transforms which are legal only when a vector load with less than element alignment is supported. This was originally part of D126085, but was split out as we didn't have a good example of such a transform. As can be seen in the test diffs, we have the recently added concat_vector(loads) -> strided_load transform (from D147713) which now benefits from the unaligned support. While making this change, I realized that we actually do support unaligned vector loads and stores of all types via conversion to i8 element type. For contiguous loads and stores without masking, we actually already implement this in the backend - though we don't tell the optimizer that. For indexed, lowering to i8 requires complicated addressing. For indexed and segmented, we'd have to use indexed. All around, doesn't seem worthwhile pursuing, but makes for an interesting observation. Differential Revision: https://reviews.llvm.org/D149375	2023-04-28 08:28:08 -07:00
David Green	d321f3aa64	[ARM] Enable shouldFoldSelectWithIdentityConstant for MVE We already have tablegen patterns for a lot of these, but performing the combine earlier in DAG can help in a few extra cases. Differential Revision: https://reviews.llvm.org/D149269	2023-04-28 14:57:51 +01:00
Jay Foad	56af0e913c	[EarlyCSE] Do not CSE convergent calls in different basic blocks "convergent" is documented as meaning that the call cannot be made control-dependent on more values, but in practice we also require that it cannot be made control-dependent on fewer values, e.g. it cannot be hoisted out of the body of an "if" statement. In code like this, if we allow CSE to combine the two calls: x = convergent_call(); if (cond) { y = convergent_call(); use y; } then we get this: x = convergent_call(); if (cond) { use x; } This is conceptually equivalent to moving the second call out of the body of the "if", up to the location of the first call, so it should be disallowed. Differential Revision: https://reviews.llvm.org/D149348	2023-04-28 14:50:48 +01:00
Jay Foad	5534d1d834	[CSE] Precommit an AMDGPU test case for D149348 Differential Revision: https://reviews.llvm.org/D149349	2023-04-28 14:50:48 +01:00
Daniel Kiss	d75e70d7ae	[AArch64] Add preserve_all calling convention. Clang accepts preserve_all for AArch64 while it is missing from the backend. Fixes #58145 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D135652	2023-04-28 14:55:38 +02:00
David Green	5ff493df29	[ARM] Update and regenerate pred-selectop test. NFC Shift and fdiv tests have been added to show the reverse transform.	2023-04-28 13:47:14 +01:00
Nikita Popov	0659000ff7	[LICM] Don't duplicate instructions just because they're free D37076 makes LICM duplicate instructions into exit blocks if the instruction is free. For GEPs, the motivation appears to be that this allows the GEP to be folded into addressing modes, while non-foldable users outside the loop might prevent this. TBH I don't think LICM is the place to do this (why doesn't CGP apply this heuristic itself?) but at least I understand the motivation. However, the transform is also applied to all other "free" instructions, which are just that (removed during lowering and not "folded" in some way). For such instructions, this transform seems somewhere between useless, counter-productive (undoing CSE/GVN) and actively incorrect. For example, this transform can duplicate freeze instructions, which is illegal. This patch limits the transform to just foldable GEPs, though we might want to drop it from LICM entirely as a followup. This is a small compile-time improvement, because querying TTI cost model for every single instruction is expensive. Differential Revision: https://reviews.llvm.org/D149136	2023-04-28 14:31:23 +02:00
Luke Lau	32dbe0f5c0	[RISCV] Fix labels in fixed-vectors-fp test	2023-04-28 12:01:46 +01:00
Lawrence Benson	cd68e17bc2	[AArch64] Add support for efficient bitcast in vector truncate store. Following the changes in D145301, we now also support the efficient bitcast when storing the bool vector. Previously, this was expanded. Differential Revision: https://reviews.llvm.org/D148316	2023-04-28 11:19:45 +01:00
Alexis Engelke	ab21beaccc	[AArch64][FastISel] Handle CRC32 intrinsics With a similar reason as D148023; some applications make heavy use of the CRC32 intrinsic (e.g., as part of a hash function) and therefore benefit from avoiding frequent SelectionDAG fallbacks. In our application, we get a 2% compile-time improvement. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D148917	2023-04-28 11:29:23 +02:00
Luke Lau	bd6fa8656a	[RISCV] Add tests for illegal fixed length vectors that need widened Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D148518	2023-04-28 10:19:01 +01:00
Enna1	d961f66b28	[hwasan] fix false positive when hwasan-match-all-tag flag is enabled and short granules are used When hwasan-match-all-tag flag is enabled and short granules are used, at the point checking if this is a short tag case, the tag from pointer is stored in X16 register, which breaks the assumption that tag from shadow memory is stored in X16 register, this will cause a false positive. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D149252	2023-04-28 17:00:26 +08:00
Enna1	9baa85271d	[hwasan][test] add test for hwasan-check-memaccess when hwasan-match-all-tag flag and short granules both used Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D149399	2023-04-28 16:57:31 +08:00
Jordan Rupprecht	fbf42f1fe2	Revert "[CodeGenPrepare] Estimate liveness of loop invariants when checking for address folding profitability" This reverts commit 5344d8e10bb7d8672d4bfae8adb010465470d51b. It causes non-determinism when building clang. See the review thread on D143897.	2023-04-27 19:16:32 -07:00
Jeffrey Byrnes	7f0a881e6c	[AMDGPU] Track liveins for max-ilp-sched-strategy Even if optimizing for ILP, it is still useful to track RP to avoid spilling. Given that, we need to maintain consistent liveness state with the RP tracker. This patch makes RP tracking consistent by updating for liveins. Otherwise, we should completely eliminate RP tracking for this scheduler (checkScheduling, initCandidate). Differential Revision: https://reviews.llvm.org/D149358	2023-04-27 16:45:45 -07:00
Nick Desaulniers	012ea747ed	[CodeGen][MachineLastInstrsCleanup] fix INLINEASM_BR hazard If the removable definition resides in an INLINEASM_BR target, the reusable candidate might not dominate the INLINEASM_BR. bb0: INLINEASM_BR &"" %bb.1 renamable $x8 = MOVi64imm 29273397577910035 B %bb.2 ... bb1: renamable $x8 = MOVi64imm 29273397577910035 renamable $x8 = ADDXri killed renamable $x8, 2048, 0 bb2: Removing the second mov is a hazard when the inline asm branches to bb1. Skip such replacements when the to-be-removed instruction is in the target of such an INLINEASM_BR instruction. We could get more aggressive about this in the future, but for now simply abort. This is causing a boot failure on linux-4.19.y branches of the LTS Linux kernel for ARCH=arm64 with CONFIG_RANDOMIZE_BASE=y (KASLR) and CONFIG_UNMAP_KERNEL_AT_EL0=y (KPTI). Link: https://reviews.llvm.org/D123394 Link: https://github.com/ClangBuiltLinux/linux/issues/1837 Thanks to @nathanchance for the report, and @ardb for debugging. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D149191	2023-04-27 13:40:00 -07:00
Nick Desaulniers	095a0c67bb	[CodeGen] precommit machine-latecleanup test Demonstrates a hazard in machine-latecleanup. Differential Revision: https://reviews.llvm.org/D149190	2023-04-27 13:39:48 -07:00
Changpeng Fang	1ab8b9ae15	AMDGPU: Define sub-class of SGPR_64 for tail call return Summary: Registers for tail call return should not be clobbered by callee. So we need a sub-class of SGPR_64 (excluding callee saved registers (CSR)) to hold the tail call return address. Because GFX and C calling conventions have different CSR, we need to define the sub-class separately. This work is an extension of D147096 with the consideration of GFX calling convention. Based on the calling conventions, different instructions will be selected with different sub-class of SGPR_64 as the input. Reviewers: arsenm, cdevadas and sebastian-ne Differential Revision: https://reviews.llvm.org/D148824	2023-04-27 10:45:11 -07:00
David Green	4249d609ac	[AArch64] Regenerate trunc-to-tbl and zext-to-tbl tests. NFC The -mattr=+global-isel is not valid syntax, so those lines have been removed. With Global-ISel there is currently missing vector legalization for wide G_EXT, and it does not support BE.	2023-04-27 17:21:13 +01:00
skc7	e016fb57b3	[AMDGPU] Legalize soffset of buffer instructions. Use Waterfall loop logic. Legalize soffset of buffer instructions using waterfall loop. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D141030	2023-04-27 19:36:50 +05:30
Jingu Kang	044f27f62f	[AArch64] Precommit tests for VECTOR_SHUFFLE	2023-04-27 14:44:09 +01:00
ManuelJBrito	8b56da5e9f	[IR] Change shufflevector undef mask to poison With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for undefined mask elements. Differential Revision: https://reviews.llvm.org/D149210	2023-04-27 14:41:10 +01:00
Alexis Engelke	7751a91465	[AArch64][FastISel] Handle call with multiple return regs The code closely follows the X86 back-end. Applications that make heavy use of {i64, i64} returns to use two registers strongly benefit from the reduced number of SelectionDAG fallbacks. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D148346	2023-04-27 11:59:33 +02:00
Luo, Yuanke	8f7f9d86a7	[X86] Machine combine vnni instruction. "vpmaddwd + vpaddd" can be combined to vpdpwssd and the latency is reduced after combination. However, when vpdpwssd is in a critical path the combination gets less ILP. It happens when vpdpwssd is in a loop: the vpmaddwd can be executed in parallel across multiple iterations while vpdpwssd has a data dependency on each iteration. If vpaddd is in a critical path while vpmaddwd is not, it is profitable to split vpdpwssd into "vpmaddwd + vpaddd". This patch is based on the machine combiner framework to achieve the decision on "vpmaddwd + vpaddd" combination. The typical example code is as below. ``` __m256i foo(int cnt, __m256i c, __m256i b, __m256i *p) { for (int i = 0; i < cnt; ++i) { __m256i a = p[i]; __m256i m = _mm256_madd_epi16 (b, a); c = _mm256_add_epi32(m, c); } return c; } ``` Differential Revision: https://reviews.llvm.org/D148980	2023-04-27 16:42:04 +08:00
Jay Foad	47d3cbcf84	[BranchFolder] Skip redundant IMPLICIT_DEFs of subregs Differential Revision: https://reviews.llvm.org/D148509	2023-04-27 09:40:06 +01:00
Jay Foad	12b70ad68c	[BranchFolder] Precommit AMDGPU test case for D148509	2023-04-27 09:40:06 +01:00
Nicolai Hähnle	1e63f8272e	AMDGPU: Fix an assertion in SIOptimizeVGPRLiveRange As the comment notes, the shader results in an INSERT_SUBREG with "undef" (dead) operand in the Endif block. The same can happen with REG_SEQUENCE. The register is considered dead from a liveness analysis perspective. The correct thing to do seems to be nothing: we keep the undef use of the register, the register allocator should still be able to take the liveness into account correctly. Differential Revision: https://reviews.llvm.org/D149161	2023-04-27 09:39:44 +02:00
Noah Goldstein	ddfee6d0b6	[X86] Support `X86ISD::PCMPEQ` and `X86ISD::PCMPGT` in ComputeKnownBits These functions were missing support but are used enough that it makes sense to track them. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D148963	2023-04-26 23:48:24 -05:00
Yeting Kuo	1855c0a82a	[RISCV] Support vector strict rounding operations. The patch basically models custom lowering of base rounding operations to expand rounding by converting to integer and converting back to FP. The other thing the patch does is convert sNaN of the source to qNaN. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D148519	2023-04-27 11:35:34 +08:00
Craig Topper	5854b39847	[RISCV] Remove the uret instruction. This was part of the N extension which did not make it into version 1.12 of the privilege specification. Reviewed By: jrtc27 Differential Revision: https://reviews.llvm.org/D149308	2023-04-26 17:11:58 -07:00
Matt Arsenault	5b7fa4a48d	VE: Register null MCTargetStreamer	2023-04-26 19:27:11 -04:00
Brad Smith	c30c291887	[SPARC] Lower BR_CC to BPr on 64-bit target whenever possible On 64-bit target, when doing i64 BR_CC where one of the comparison operands is a constant zero, try to fold the compare and BPcc into a BPr instruction. For all integers, EQ and NE comparisons are available; additionally, for signed integers, GT, GE, LT, and LE are also available. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D142461	2023-04-26 18:56:00 -04:00
Matt Arsenault	bbc7b30fbf	AMDGPU: Remove invalid testcase for enqueue kernel The call didn't have the right calling convention, but calls to kernels are supposed to be illegal anyway.	2023-04-26 17:25:30 -04:00
David Green	d340ef697d	[AArch64][SVE] Generate smull/umull instead of sve v2i64 mul A neon smull/umull should be preferred over a sve v2i64 mul with two extends. It will be both fewer instructions and a lower cost multiply instruction. Differential Revision: https://reviews.llvm.org/D148248	2023-04-26 22:12:00 +01:00
Craig Topper	3ce3ee6169	[RISCV] Make Zicntr and Zihpm imply Zicsr. Zicntr and Zihpm are names for groups of CSRs so they should imply that CSRs exist. Reviewed By: asb, kito-cheng Differential Revision: https://reviews.llvm.org/D148962	2023-04-26 10:11:14 -07:00
Craig Topper	236898f619	[RISCV] Accept zicntr and zihpm command line options This change adds the definition of the two extensions, but does not either a) make any register definitions conditional on them or b) enable the extensions by default. This is somewhat analogous to https://reviews.llvm.org/D143953, but with some key differences. The best discussion I can find on status is here: https://github.com/riscv/riscv-profiles/issues/43. These were removed between document version 2.1 and 2.2, but were not defined as new extensions in 2.2. That addition came later - in March 2022. According to https://drive.google.com/file/d/1qa57pePesOiDOrNzxuuGFhCL4Rbi9AYB/view these were ratified in March 2023. Reviewed By: asb, reames Differential Revision: https://reviews.llvm.org/D144215	2023-04-26 10:11:07 -07:00
Mingming Liu	9879e5865a	[InlineAsm][AArch64]Add backend support for flag output parameters - The set of flag is from https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Flag-Output-Operands Before: - ARM64 GCC supports flag output constraints, while Clang doesn't parse condition code, as shown in https://gcc.godbolt.org/z/7jzMEK796 - LLVM ISel won't lower them either (as shown in https://gcc.godbolt.org/z/Pv4PPf56c) After: - Given flag output constraints in LLVM IR, condition code is parsed and flag output is lowered to 'cset'. - Clang parse is not added in this patch. Differential Revision: https://reviews.llvm.org/D149032	2023-04-26 09:18:41 -07:00
Jay Foad	22516593ae	[AMDGPU] Add GFX11 ds_min_f32 / ds_max_f32 tests	2023-04-26 17:09:12 +01:00
Paul Kirth	bface3947e	[RISCV] Make SCS prologue interrupt safe on RISC-V Prior to this patch the SCS prologue used the following instruction sequence. ``` s[w|d] ra, 0(gp) addi gp, gp, [4|8] ``` The problem with this sequence is that an interrupt occurring between the store and the increment could clobber the value just written to the SCS. https://reviews.llvm.org/D84414#inline-813203 pointed out a similar issue that could have affected the epilogue. This patch changes the instruction sequence in the prologue to: ``` addi gp, gp, [4|8] s[w|d] ra, -[4|8](gp) ``` The downside to this is that there is now a data dependency between the add and the store. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D149099	2023-04-26 15:58:09 +00:00
Joe Nash	f8ec7a0944	[AMDGPU] Delete test for illegal v_cndmask_b16_dpp There are no VOP2 or VOP2 with dpp forms of v_cndmask_b16. Delete the test. NFC. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D149184	2023-04-26 09:50:44 -04:00
Janek van Oirschot	124acb7ca3	[AMDGPU] Fix negative offset values interpretation in getMemOperandsWithOffset for DS The offset values may result in an erroneous scheduling of a load before write for a memory location if the offset values are represented as negative values in MIR, despite actually being unsigned values. This representation in MIR happens as SelectionDAG::getConstant could go through APInt to represent the encoding which assumes the MSB of the encoding as a sign-bit, regardless of whether it is supposed to be a signed value. The 8-bit negative (interpreted) value gets cast to an unsigned 32 bit value in getMemOperandsWithOffset used for comparisons in areMemAccessesTriviallyDisjoint eventually leading to an erroneous schedule in the machine scheduler. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D149080	2023-04-26 14:10:25 +01:00
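
The offset misinterpretation described in commit 124acb7ca3 above can be illustrated with a small standalone sketch. This is not the LLVM code: `widenAsSigned`, `widenAsUnsigned`, and `triviallyDisjoint` are hypothetical helpers, the 8-bit field width follows the commit message, and the disjointness test is a simplified assumption about what a check like areMemAccessesTriviallyDisjoint compares.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; none of these names come from LLVM.

// Problematic widening: the 8-bit offset encoding is first read as a signed
// value, so any encoding with its top bit set turns into a huge 32-bit number.
constexpr uint32_t widenAsSigned(uint8_t encoded) {
  return static_cast<uint32_t>(static_cast<int8_t>(encoded));
}

// Intended widening: the encoding is an unsigned offset, so zero-extend it.
constexpr uint32_t widenAsUnsigned(uint8_t encoded) {
  return static_cast<uint32_t>(encoded);
}

// Simplified disjointness test on the byte ranges [off, off + width).
// The sums are done in 64 bits so they cannot wrap around.
constexpr bool triviallyDisjoint(uint32_t offA, uint32_t widthA,
                                 uint32_t offB, uint32_t widthB) {
  return uint64_t{offA} + widthA <= offB || uint64_t{offB} + widthB <= offA;
}

// Encoding 0xF0 is offset 240 when read as unsigned.
static_assert(widenAsUnsigned(0xF0) == 240u, "zero-extended offset");
// Read as signed it is -16, which becomes 2^32 - 16 after the cast.
static_assert(widenAsSigned(0xF0) == 4294967280u, "sign-extended offset");

// A 4-byte store at offset 240 overlaps a 4-byte load at the same offset...
static_assert(!triviallyDisjoint(240, 4, widenAsUnsigned(0xF0), 4),
              "same location must not look disjoint");
// ...but with the sign-extended offset the two accesses look disjoint, which
// is the kind of wrong answer that could let a load be scheduled before the write.
static_assert(triviallyDisjoint(240, 4, widenAsSigned(0xF0), 4),
              "misread offset hides the overlap");
```

All checks run at compile time, so compiling the file is enough to confirm the arithmetic; the functions can also be called at run time with other encodings.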

47896 Commits