llvm-project

Author	SHA1	Message	Date
Guillaume Chatelet	12ccdd67aa	[NFC] Use proper getSliceAlign type in SROA	2022-06-10 12:37:41 +00:00
Sanjay Patel	6fedc6a2b4	Revert "[InstCombine] add narrowing transform for low-masked binop with zext operand" This reverts commit afa192cfb6049a15c5542d132d500b910b802c74. This can cause an infinite loop as shown with an example in the post-commit thread.	2022-06-10 08:25:10 -04:00
Ivan Kosarev	60d6fbb621	[AMDGPU][GFX9][GFX10] Support base+soffset+offset SMEM atomics. Resolves a part of https://github.com/llvm/llvm-project/issues/38652 Reviewed By: dp Differential Revision: https://reviews.llvm.org/D127314	2022-06-10 13:22:41 +01:00
David Sherwood	8daaea206b	[InstCombine] Use +0.0 instead of -0.0 as the FP identity for some folds In foldSelectIntoOp we sometimes transform a select of a fadd into a fadd of a select, where we select between data and an identity value. For both fadd and fsub the identity is always -0.0, but if the nsz flag is set on the select instruction we can use +0.0 instead. Doing so then triggers other optimisations, such as when folding the select of masked load into a new masked load. Differential Revision: https://reviews.llvm.org/D126774	2022-06-10 12:42:34 +01:00
Bin Cheng	8b360c69e9	[FuncSpec]Fix assertion failure when value is not added to solver This patch improves the fix in D110529 to prevent crashing on a value with the byval attribute that is not added to the SCCP solver. Authored-by: sinan.lin@linux.alibaba.com Reviewed By: ChuanqiXu Differential Revision: https://reviews.llvm.org/D126355	2022-06-10 18:45:53 +08:00
Dmitry Preobrazhensky	f8aba9995a	[AMDGPU][MC][GFX1013] Enable image_msaa_load Differential Revision: https://reviews.llvm.org/D127198	2022-06-10 13:42:05 +03:00
David Sherwood	007917b95c	[MVE] Fold fadd(select(..., +0.0)) into a predicated fadd We already have patterns for matching fadd(select(..., -0.0)), but an upcoming patch will lead to patterns using +0.0 as the identity instead of -0.0. I'm adding support for these patterns now to avoid any regressions for MVE. Differential Revision: https://reviews.llvm.org/D127275	2022-06-10 11:09:55 +01:00
Nikita Popov	d77f944832	[LoopInfo] Add getOutermostLoop() (NFC) This is a recurring pattern, add an API function for it.	2022-06-10 11:48:21 +02:00
David Green	4a5cb957a1	[AggressiveInstcombine] Conditionally fold saturated fptosi to llvm.fptosi.sat This adds a fold for aggressive instcombine that converts smin(smax(fptosi(x))) into a llvm.fptosi.sat, provided that the saturation constants are correct and the cost of the llvm.fptosi.sat is lower. Unfortunately, a llvm.fptosi.sat cannot always be converted back to a smin/smax/fptosi. The llvm.fptosi.sat intrinsic is more defined than the original, which produces poison if the original fptosi was out of range. The llvm.fptosi.sat will saturate any value, so it needs to be expanded to a fptosi(fpmin(fpmax(x))), which can be worse for code generation depending on the target. So this change is conditional on the backend reporting that the llvm.fptosi.sat is cheaper than the original smin+smax+fptosi. This is a change to the way that AggressiveInstCombine has worked in the past. Instead of just being a canonicalization pass, that canonicalization can be dependent on the target in certain specific cases. Differential Revision: https://reviews.llvm.org/D125755	2022-06-10 09:36:09 +01:00
Nikita Popov	c10921fa1a	[CGP] Also freeze ctlz/cttz operand when despeculating D125887 changed the ctlz/cttz despeculation transform to insert a freeze for the introduced branch on zero. While this does fix the "branch on poison" issue, we may still get in trouble if we pick a different value for the branch and for the ctz argument (i.e. non-zero for the branch, but zero for the ctz). To avoid this, we should use the same frozen value in both positions. This does cause a regression in RISCV codegen by introducing an additional sext. The DAG looks like this: t0: ch = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %3 t4: i64 = AssertSext t2, ValueType:ch:i32 t23: i64 = freeze t4 t9: ch = CopyToReg t0, Register:i64 %0, t23 t16: ch = CopyToReg t0, Register:i64 %4, Constant:i64<32> t18: ch = TokenFactor t9, t16 t25: i64 = sign_extend_inreg t23, ValueType:ch:i32 t24: i64 = setcc t25, Constant:i64<0>, seteq:ch t28: i64 = and t24, Constant:i64<1> t19: ch = brcond t18, t28, BasicBlock:ch<cond.end 0x8311f68> t21: ch = br t19, BasicBlock:ch<cond.false 0x8311e80> I don't see a really obvious way to improve this, as we can't push the freeze past the AssertSext (which may produce poison). Differential Revision: https://reviews.llvm.org/D126638	2022-06-10 09:46:10 +02:00
Jay Foad	6c372daa84	[AMDGPU] New GFX11 intrinsic llvm.amdgcn.s.sendmsg.rtn Add new intrinsic and codegen support for the s_sendmsg_rtn_b32 and s_sendmsg_rtn_b64 instructions. Differential Revision: https://reviews.llvm.org/D127315	2022-06-10 08:15:23 +01:00
Jay Foad	b0a3849439	[AMDGPU] Update dlc usage for GFX11 In GFX10 dlc controlled L1 cache bypass. In GFX11 it has been repurposed to control MALL NOALLOC, and glc controls L1 as well as L0 cache bypass. Update the documentation and SIMemoryLegalizer accordingly. Set dlc for nontemporal and volatile accesses. Differential Revision: https://reviews.llvm.org/D127405	2022-06-10 08:10:34 +01:00
Sunho Kim	6d67f7a329	[JITLink][EHFrameSupport] Remove CodeAlignmentFactor and DataAlignmentFactor validation. Removes CodeAlignmentFactor and DataAlignmentFactor validation in EHFrameEdgeFixer. I observed that some aarch64 ELF files generated by clang contain CIE records with code_alignment_factor = 4 or data_alignment_factor = -8. code_alignment_factor and data_alignment_factor are used by call frame instructions, which should be handled correctly by libunwind. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D127062	2022-06-10 15:29:20 +09:00
Yeting Kuo	f68cad9087	[RISCV] Lower VLEFF/VLSEGFF SDNodes to MachineInstrs with VL outputs. The patch is a replacement of D125199. PseudoReadVL with a vtype raises the concern of computing the same vtypes of VLEFF/VLSEGFF in two different places, DAGToDAG and InsertVSETVLI. A VLEFF/VLSEGFF MI with a VL output can still provide the vtype of VLEFF/VLSEGFF to the users of its VL. The patch names the new pseudos after the original VLEFF/VLSEGFF names, suffixed with "_VL", and expands them in the RISCVInsertVSETVLI pass. This patch also reverts commit 4537aae0d57e17c217c192d8977012ba475b130c, "[RISCV] Make PseudoReadVL have the vtypes of the corresponding VLEFF/VLSEGFF.". Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126794	2022-06-10 13:57:10 +08:00
Peter S. Housel	2be5abb7e9	[ORC][ORC_RT] Handle ELF .init_array with non-default priority ELF-based platforms currently support defining multiple static initializer table sections with differing priorities, for example .init_array.0 or .init_array.100; the default .init_array corresponds to a priority of 65535. When building a shared library or executable, the system linker normally sorts these sections and combines them into a single .init_array section. This change adds the capability to recognize ELF static initializers with priorities other than the default, and to properly sort them by priority, to Orc and the Orc runtime. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D127056	2022-06-09 22:47:58 -07:00
Peter S. Housel	1aa71f8679	[ORC][ORC_RT] Integrate ORC platforms with LLJIT and lli This change enables integrating orc::LLJIT with the ORCv2 platforms (MachOPlatform and ELFNixPlatform) and the compiler-rt orc runtime. Changes include: - Adding SPS wrapper functions for the orc runtime's dlfcn emulation functions, allowing initialization and deinitialization to be invoked by LLJIT. - Changing the LLJIT code generation default to add UseInitArray so that .init_array constructors are generated for ELF platforms. - Integrating the ORCv2 Platforms into lli, and adding a PlatformSupport implementation to the LLJIT instance used by lli which implements initialization and deinitialization by calling the new wrapper functions in the runtime. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D126492	2022-06-09 22:47:58 -07:00
Sunho Kim	87c4268329	[JITLink][ELF][AArch64] Implement Procedure Linkage Table. Implements Procedure Linkage Table (PLT) for ELF/AARCH64. The aarch64 Linux calling convention also uses r16 as the intra-procedure-call scratch register, the same as MachO/ARM64, so we can use the same stub sequence. Also, the BR instruction doesn't touch the X30 register. An external function call by a BL instruction (touched by a CALL26 relocation) will set X30 to the original PC + 4, which is the intended behavior. An external function call by a B instruction (touched by a JUMP26 relocation) doesn't require setting X30, so the patch will be correct in this case too. Reference: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#611general-purpose-registers Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D127061	2022-06-10 14:44:33 +09:00
Sunho Kim	e093e42107	[ORC][AArch64] Add initial support for aarch64 in ELFNixPlatform. Adds aarch64 support to ELFNixPlatform. These are a few simple changes, but they allow us to use the orc runtime in the ELF/AARCH64 backend. It successfully runs the static initializers of libstdc++ iostream, so that the "cout << Hello world" testcase starts to work. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D127060	2022-06-10 13:37:36 +09:00
Sunho Kim	175f22d6c3	[JITLink][ELF][AArch64] Implement R_AARCH64_JUMP26 Implements R_AARCH64_JUMP26. We can use the same generic aarch64 Branch26 edge since the B and BL instructions have an immediate field of the same size and offset, and the relocation address calculation is the same. Reference: ELF for the ARM ® 64-bit Architecture Table 4-10, ARM Architecture Reference Manual ® ARMv8, for ARMv8-A architecture profile C6.2.24, C6.2.31 Reviewed By: sgraenitz Differential Revision: https://reviews.llvm.org/D127059	2022-06-10 11:35:42 +09:00
chenglin.bi	de7a6ae1ff	[InstCombine] Optimize shl+lshr+and conversion pattern if `C1` and `C3` are pow2 and `Log2(C3)+C2 < BitWidth`: ((C1 << X) >> C2) & C3 -> X == (Log2(C3)+C2-Log2(C1)) ? C3 : 0; https://alive2.llvm.org/ce/z/Pus5bd Fix issue https://github.com/llvm/llvm-project/issues/55739 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D126617	2022-06-10 09:36:58 +08:00
Sunho Kim	51a41f23b6	[JITLink][AArch64] Fix overflow range of Page21 fixup edge. Allowed range for Page21 relocation is -2^32 <= X < 2^32 in both ELF and MachO. `09c2b7c35a/llvm/lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldMachOAArch64.h (L210)` (MachO) ELF for the ARM ® 64-bit Architecture (AArch64) Table 4-9 (ELF) Reviewed By: sgraenitz Differential Revision: https://reviews.llvm.org/D126387	2022-06-10 10:30:19 +09:00
Philip Reames	28be4b7454	[RISCV] Simplify InstrInfo access in doPeepholeMaskedRVV [nfc]	2022-06-09 17:02:40 -07:00
Craig Topper	8bbcb98848	[RISCV] Teach RISCVMergeBaseOffset about cases where we use SHXADD to add some immediates. For an addition with simm14 and simm15 immediates with 2 or 3 trailing bits, we can use a shXadd instruction and an addi to do the addition. This patch teaches RISCVMergeBaseOffset to see through this pattern. I don't think the sh1add case occurs because we use two addis for that, but I implemented it for completeness. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127376	2022-06-09 16:07:35 -07:00
Philip Reames	206f10d3f6	Plumb InstructionCost through unroll costing Teach the unroller(s) how to handle an invalid cost. This avoids crashes when the backend can't provide a cost due to either a fundamental limitation or an unimplemented cost model case. Differential Revision: https://reviews.llvm.org/D127305	2022-06-09 15:42:53 -07:00
Philip Reames	f85c5079b8	Pipe potentially invalid InstructionCost through CodeMetrics Per the documentation in Support/InstructionCost.h, the purpose of an invalid cost is so that clients can change behavior on impossible to cost inputs. CodeMetrics was instead asserting that invalid costs never occurred. On a target with an incomplete cost model - e.g. RISCV - this means that transformations would crash on (falsely) invalid constructs - e.g. scalable vectors. While we certainly should improve the cost model - and I plan to do so in the near future - we also shouldn't be crashing. This violates the explicitly stated purpose of an invalid InstructionCost. I updated all of the "easy" consumers where bailouts were locally obvious. I plan to follow up with loop unroll in a following change. Differential Revision: https://reviews.llvm.org/D127131	2022-06-09 15:17:24 -07:00
Sanjay Patel	afa192cfb6	[InstCombine] add narrowing transform for low-masked binop with zext operand https://alive2.llvm.org/ce/z/hRy3rE As shown in D123408, we can produce this pattern when moving cast around, and we already have a related fold for a binop with a constant operand.	2022-06-09 16:59:26 -04:00
Simon Pilgrim	7ac33b8aac	[X86] Remove !VT.is128BitVector() check. NFCI. The code is inside a if(VT.is256BitVector() || VT.is512BitVector()) condition	2022-06-09 21:39:45 +01:00
Jay Foad	ffe86e3bdd	[AMDGPU] Update SIInsertHardClauses for GFX11 Changes for GFX11: - Clauses may not mix instructions of different types, and there are more types. For example image instructions with and without a sampler are now different types. - The max size of a clause is explicitly documented as 63 instructions. Previously it was implicitly assumed to be 64. This is such a tiny difference that it does not seem worth making it conditional on the subtarget. - It can be beneficial to clause stores as well as loads. Differential Revision: https://reviews.llvm.org/D127391	2022-06-09 21:29:56 +01:00
Simon Pilgrim	72a049d778	[X86][AVX2] LowerINSERT_VECTOR_ELT - support v4i64 insertion as BLENDI(X, SCALAR_TO_VECTOR(Y))	2022-06-09 21:18:10 +01:00
Pengxuan Zheng	064db24311	[Object][COFF] Fix section name parsing error when the name field is not null-padded Some object files produced by Microsoft tools contain sections whose name field is not fully null-padded at the end. Microsoft's dumpbin is able to print the section name correctly, but this causes parsing errors with LLVM tools. So far, this issue only seems to happen when the section name is longer than 8 bytes. In this case, the section name field contains a slash (/) followed by the offset into the string table, but the name field is not fully null-padded at the end. Reviewed By: mstorsjo Differential Revision: https://reviews.llvm.org/D127369	2022-06-09 12:58:28 -07:00
Joe Nash	be1082c6d5	[AMDGPU] gfx11 VOPC instructions Supports encoding existing instructions on gfx11 and MC support for the new VOPC dpp instructions. Patch 19/N for upstreaming of AMDGPU gfx11 architecture Depends on D126978 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126989	2022-06-09 15:22:42 -04:00
Tim Northover	b89bcefa62	Reapply: Add an error message to the default SIGPIPE handler UNIX03 conformance requires utilities to flush stdout before exiting and raise an error if writing fails. Flushing already happens on a call to exit and thus automatically on a return from main. Write failure is then detected by LLVM's default SIGPIPE handler. The handler already exits with a non-zero code, but conformance additionally requires an error message. First reapply attempt I hadn't noticed the test had changed, hopefully this goes better.	2022-06-09 20:13:45 +01:00
Stanislav Mekhanoshin	23db8e4b43	[AMDGPU] Use v_mad_u64_u32 for IMAD32 Nic Curtis did the experiments to prove it is faster than a separate mul and add. Fixes: SWDEV-332806 Differential Revision: https://reviews.llvm.org/D127253	2022-06-09 11:39:49 -07:00
Tim Northover	4badd4d40d	Revert "Add an error message to the default SIGPIPE handler" It broke PPC bots.	2022-06-09 19:01:28 +01:00
Stanislav Mekhanoshin	5c974d086c	[AMDGPU] Fix hazard handling of v_cmpx to permlane - VOP3 and SDWA forms of V_CMPX were not handled - Hazard only exists if the compare defines EXEC (i.e. V_CMPX) forwarded to the permlane. Differential Revision: https://reviews.llvm.org/D127344	2022-06-09 10:33:54 -07:00
Ahmed Bougacha	c68b469e07	[AArch64][SVE] Don't crash on pre-legalizer types in extload combine. This was assuming the vector types were MVTs, but they don't have to be. Note that the concrete output of the test isn't very useful, since it's dominated by nonsensical calling convention lowering for the weird types. Differential Revision: https://reviews.llvm.org/D126505	2022-06-09 10:33:21 -07:00
Kito Cheng	4b11f90903	[RISCV] Fix missing stack pointer recover To make sure the stack pointer is correct throughout the EH region, we also need to restore the stack pointer from the frame pointer if we don't preserve stack space within the prologue/epilogue for outgoing variables. Normally, checking whether a variable-sized object is present is enough, but we also don't preserve that space in the prologue/epilogue when there are vector objects on the stack. Example to show what happened: ``` try { sp adjust for outgoing args. // 1. Sp changed. func_call // 2. Exception raised sp restore // Oh, not restored } catch { // 3. And now we are here. } // 4. Prepare to return!, restore return address from stack, but...sp is wrong. // 5. Screw up! ``` Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D126861	2022-06-09 23:38:50 +08:00
Johannes Doerfert	6555558a80	Revert "[Attributor] Replace AAValueSimplify with AAPotentialValues" This reverts commit da50dab1ae111e9e6cb0248a47a038b17f798705. Patch broke AMD GPU OpenMP offload buildbots. https://lab.llvm.org/buildbot/#/builders/193/builds/13246	2022-06-09 17:04:01 +02:00
Johannes Doerfert	da50dab1ae	[Attributor] Replace AAValueSimplify with AAPotentialValues For the longest time we used `AAValueSimplify` and `genericValueTraversal` to determine "potential values". This was problematic for many reasons: - We recomputed the result a lot as there was no caching for the 9 locations calling `genericValueTraversal`. - We added the idea of "intra" vs. "inter" procedural simplification only as an afterthought. `genericValueTraversal` did offer an option but `AAValueSimplify` did not. Thus, we might end up with "too much" simplification in certain situations and then gave up on it. - Because `genericValueTraversal` was not a real `AA` we ended up with problems like the infinite recursion bug (#54981) as well as code duplication. This patch introduces `AAPotentialValues` and replaces the `AAValueSimplify` uses with it. `genericValueTraversal` is folded into `AAPotentialValues` as are the instruction simplifications performed in `AAValueSimplify` before. We further distinguish "intra" and "inter" procedural simplification now. `AAValueSimplify` was not deleted as we haven't ported the re-materialization of instructions yet. There are other differences over the former handling, e.g., we may not fold trivially foldable instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2` but if an operand would be simplified to `i32 1` we would fold it still. We are also even more aware of function/SCC boundaries in CGSCC passes, which is good. Fixes: https://github.com/llvm/llvm-project/issues/54981	2022-06-09 16:48:53 +02:00
Johannes Doerfert	94841c713f	[Attributor] Try to delete stores and simplify stored values By default we should try to eliminate unused stores and simplify values stored while we are at it.	2022-06-09 16:48:53 +02:00
Johannes Doerfert	a3273c0c06	[Attributor] Ensure to use the proper liveness AA When determining liveness via Attributor::isAssumedDead(...) we might end up without a liveness AA or with one pointing into another function. Neither is helpful and we will avoid both from now on.	2022-06-09 16:48:53 +02:00
Philip Reames	0e29a80fdc	[RISCV] Add cost model for reverse shuffle The majority of the cost appears to be forming the indices vector. Differential Revision: https://reviews.llvm.org/D127141	2022-06-09 07:21:40 -07:00
Florian Hahn	20d798bd47	Recommit "[SCEV] Look through single value PHIs." (take 3) This reverts commit 1fbdbb559569641f6d509b569966901c8fb02b63. All known issues surfaced by this patch should have been fixed now. The fixes included fixing issues with SCEV expansion in LV and DA's reliance on LCSSA phis.	2022-06-09 15:20:10 +01:00
Simon Moll	b8c2781ff6	[NFC] format InstructionSimplify & lowerCaseFunctionNames Clang-format InstructionSimplify and convert all "FunctionName"s to "functionName". This patch does touch a lot of files but gets done with the cleanup of InstructionSimplify in one commit. This is the alternative to the less invasive clang-format only patch: D126783 Reviewed By: spatel, rengolin Differential Revision: https://reviews.llvm.org/D126889	2022-06-09 16:10:08 +02:00
Simon Pilgrim	7dbfcfa735	[DAG] combineInsertEltToShuffle - if EXTRACT_VECTOR_ELT fails to match an existing shuffle op, try to replace an undef op if there is one. This should fix a number of shuffle regressions in D127115 where the re-ordered combines mean we fail to fold a EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT sequence into a BUILD_VECTOR if we extract from more than one vector source.	2022-06-09 14:56:14 +01:00
Johannes Doerfert	ae10b8a582	[Attributor][FIX] Give registered simplification callbacks precedence We accidentally checked for constants before we looked for registered simplification callbacks. The latter needs to take precedence though.	2022-06-09 15:31:53 +02:00
Benjamin Kramer	0abb472fff	AMDGPU/GISel: Remove unused variable. NFC.	2022-06-09 13:43:47 +02:00
Johannes Doerfert	982053e85e	[Attributor][NFC] Improve debug code and comments	2022-06-09 13:41:23 +02:00
Johannes Doerfert	0ece283f03	[Attributor] Add checks needed as we strengthen value simplify	2022-06-09 13:41:23 +02:00
Johannes Doerfert	393be12b74	[Attributor] Look at base values for align, nonnull, and deref Stripping bitcasts and 0-geps helps normalization and minimizes the impact of a follow up change.	2022-06-09 13:41:23 +02:00
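
The shl+lshr+and fold described in commit de7a6ae1ff above can be sanity-checked by brute force on a small bit width. The following is a minimal sketch in Python, not LLVM code: the helper names (`original`, `folded`, `check_fold`) are hypothetical, and it assumes shift amounts stay below the bit width (larger shifts are poison in LLVM IR) and that `shl` truncates to the bit width.

```python
def original(c1, x, c2, c3, bits):
    """Model ((C1 << X) >> C2) & C3 on a `bits`-wide unsigned integer."""
    mask = (1 << bits) - 1
    shifted_left = (c1 << x) & mask  # shl drops bits shifted past the width
    return (shifted_left >> c2) & c3  # lshr is a logical right shift


def folded(c1, x, c2, c3):
    """Model X == (Log2(C3) + C2 - Log2(C1)) ? C3 : 0 for power-of-two C1, C3."""
    log2_c1 = c1.bit_length() - 1
    log2_c3 = c3.bit_length() - 1
    return c3 if x == log2_c3 + c2 - log2_c1 else 0


def check_fold(bits=8):
    """Compare both forms for every power-of-two C1, C3 with Log2(C3) + C2 < bits."""
    checked = 0
    for log2_c1 in range(bits):
        for log2_c3 in range(bits):
            for c2 in range(bits):
                if log2_c3 + c2 >= bits:
                    continue  # precondition stated in the commit message
                for x in range(bits):
                    c1, c3 = 1 << log2_c1, 1 << log2_c3
                    assert original(c1, x, c2, c3, bits) == folded(c1, x, c2, c3)
                    checked += 1
    return checked


if __name__ == "__main__":
    print("combinations checked:", check_fold(8))
```

This only exercises one small width exhaustively; the Alive2 link in the commit message remains the authoritative proof of the transform.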

158830 Commits