llvm-project

Author	SHA1	Message	Date
Sameer Sahasrabuddhe	bd7a4d7b27	Restore "[LLVM] move verification of convergence control to a class template" The refactored template can now be used with MachineVerifier. Resubmitted after fixing build errors: - Shared libraries build failed with undefined references due to "extern template" declarations. - Modules build failed due to a cycle dependence between llvm/ADT and llvm/IR. The Generic*Impl.h files should be in llvm/IR to prevent this. Differential Revision: https://reviews.llvm.org/D156522 This restores commit 93a3706711fd46d4d487640d91b16c2eec747c9e. Originally reverted in 466bd9981150906552a1f2308e3c9065bfcb6741.	2023-08-03 10:36:57 +05:30
Philip Reames	660b740e4b	[DAG] Support store merging of vector constant stores Ran across this when making a change to RISCV memset lowering. Seems very odd that manually merging a store into a vector prevents it from being further merged. Differential Revision: https://reviews.llvm.org/D156349	2023-08-02 14:41:46 -07:00
Danila Kutenin	49d41de578	MachineSink: Fix strict weak ordering in GetAllSortedSuccessors CodeGen/X86/pseudo_cmov_lower2.ll fails using libc++ debug mode (D150264) without this change. Reviewed By: MaskRay, aeubanks Differential Revision: https://reviews.llvm.org/D155811	2023-08-02 12:52:55 -07:00
Jay Foad	8f973d5c45	[DebugInfo] Fix crash when printing malformed DBG machine instructions MachineVerifier does not check that DBG_VALUE, DBG_VALUE_LIST and DBG_INSTR_REF have the expected number of operands, so printing them (e.g. with -print-after-all) should not crash. Differential Revision: https://reviews.llvm.org/D156226	2023-08-02 08:28:20 +01:00
Jordan Rupprecht	f5b5a30858	Revert "[CodeGenPrepare][NFC] Update the dominator tree instead of rebuilding it" This reverts commit 0b1d1cdb89322c277baf5221218a830195fef9d4. It causes a clang crash. Details will be posted to D153638.	2023-08-01 23:08:55 -07:00
Jon Roelofs	ed83797f3c	[Intrinsics][ObjC] Mark objc_retain and friends as thisreturn. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-retain rdar://79869679 Differential revision: https://reviews.llvm.org/D105671	2023-08-01 18:02:00 -07:00
Momchil Velikov	0b1d1cdb89	[CodeGenPrepare][NFC] Update the dominator tree instead of rebuilding it Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D153638	2023-08-01 18:07:03 +01:00
Jay Foad	11fbdd27fd	[CodeGen] Make use of isSubRegisterEq and isSuperRegisterEq. NFC.	2023-08-01 14:46:26 +01:00
Francesco Petrogalli	cd921e0fd7	[MISched] Do not erase resource booking history for subunits. When dealing with the subunits of a resource group, we should reset the subunits' availability at the first available cycle of the resource that contains the subunits. Previously, the reset operation was returning cycle 0, effectively erasing the booking history of the subunits. Without this change, when using intervals for models that make use of subunits, the erasing of resource booking for subunits can raise the assertion "A resource is being overwritten" in `ResourceSegments::add`. The test added in the patch is one such case. Reviewed By: andreadb Differential Revision: https://reviews.llvm.org/D156530	2023-08-01 14:00:37 +02:00
Sameer Sahasrabuddhe	466bd99811	Revert "[LLVM] move verification of convergence control to a class template" This reverts commit 93a3706711fd46d4d487640d91b16c2eec747c9e. The "extern template" declaration of CycleInfo caused problems in a shared build when CycleInfo was removed from Verifier.cpp. There needs to be an explicit instantiation corresponding to an extern template in every SO.	2023-08-01 17:00:39 +05:30
Jay Foad	eaca8c2edf	[PEI][PowerPC] Switch to backwards frame index elimination This adds support for reprocessing new instructions that were generated by the target's eliminateFrameIndex. Backwards frame index elimination uses backwards register scavenging, which is preferred because it does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D156690	2023-08-01 07:56:11 +01:00
Sameer Sahasrabuddhe	93a3706711	[LLVM] move verification of convergence control to a class template The refactored template can now be used with MachineVerifier. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156522	2023-08-01 11:21:48 +05:30
Matt Arsenault	4d42e8b5d1	Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd. The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work around the underlying problem with SUBREG_TO_REG.	2023-07-31 20:15:45 -04:00
Matt Arsenault	161c0d506b	RegisterCoalescer: Remove dubious dropping of implicit virtual register defs Don't understand why this would either be OK or necessary, but doesn't appear to happen in any tests. This was introduced way back in 76e66c31a0481e72d1ff86c56028d850b6c33cff https://reviews.llvm.org/D156265	2023-07-31 19:16:11 -04:00
David Green	778fa4edaf	[AArch64] Add some basic handling for bf16 constants. This adds some basic handling for bf16 constants, attempting to treat them a lot like fp16 constants where it can. Zero immediates get lowered to FMOVH0, others either get lowered to FMOVWHr(MOVi32imm) or use FMOVHi if they can. Without fp16 they get expanded. This may not always be optimal, but fixes a gap in our lowering. See llvm/test/CodeGen/AArch64/f16-imm.ll for the equivalent fp16 test. Differential Revision: https://reviews.llvm.org/D156649	2023-07-31 21:31:56 +01:00
David Blaikie	4e429fd2a7	Few linter fixes: size() > 0 -> !empty; indentation; mismatched names on parameters in decls/defs; const on value return types	2023-07-31 18:52:57 +00:00
Matt Arsenault	d6f9428e46	GlobalISel: Pass MachineIRBuilder to applyMappingImpl The target should not have to construct MachineIRBuilders during RegBankSelect (we should perhaps hide the constructors for it). The pass should own the builder setup with the desired CSE configuration (although currently the pass does not use the CSE builder, which is what I want to fix). https://reviews.llvm.org/D156479	2023-07-31 10:03:38 -04:00
Jay Foad	0ef39e33d7	[StackColoring] Fix typo in comment	2023-07-31 11:35:57 +01:00
Simon Pilgrim	076bee1020	[DAG] getNode() - fold (zext (trunc (assertzext x))) -> (assertzext x) If the pre-truncated value was the same width as the extension, and the assertzext guarantees that the extended bits are already zero, then skip the zext/trunc 'zero_extend_inreg' pattern. Addresses several regressions noticed in D155472	2023-07-31 10:43:11 +01:00
Simon Tatham	60b98363c7	Retain all jump table range checks when using BTI. This modifies the switch-statement generation in SelectionDAGBuilder, specifically the part that generates case clusters of type CC_JumpTable. A table-based branch of any kind is at risk of being a JOP gadget, if it doesn't range-check the offset into the table. For some types of table branch, such as Arm TBB/TBH, the impact of this is limited because the value loaded from the table is a relative offset of limited size; for others, such as a MOV PC,Rn computed branch into a table of further branch instructions, the gadget is fully general. When compiling for branch-target enforcement via Arm's BTI system, many of these table branch idioms use branch instructions of types that do not require a BTI instruction at the branch destination. This avoids the need to put a BTI at the start of each case handler, reducing the number of available gadgets //with// BTIs (i.e. ones which could be used by a JOP attack in spite of the BTI system). But without a range check, the use of a non-BTI-requiring branch also opens up a larger range of followup gadgets for an attacker's use. A defence against this is to avoid optimising away the range check on the table offset, even if the compiler believes that no out-of-range value should be able to reach the table branch. (Rationale: that may be true for values generated legitimately by the program, but not those generated maliciously by attackers who have already corrupted the control flow.) The effect of keeping the range check and branching to an unreachable block is that no actual code is generated at that block, so it will typically point at the end of the function. That may still cause some kind of unpredictable code execution (such as executing data as code, or falling through to the next function in the code section), but even if so, there will only be //one// possible invalid branch target, rather than giving an attacker the choice of many possibilities. This defence is enabled only when branch target enforcement is in use. Without branch target enforcement, the range check is easily bypassed anyway, by branching in to a location just after it. But with enforcement, the attacker will have to enter the jump table dispatcher at the initial BTI and then go through the range check. (Or, if they don't, it's because they //already// have a general BTI-bypassing gadget.) Reviewed By: MaskRay, chill Differential Revision: https://reviews.llvm.org/D155485	2023-07-31 10:39:50 +01:00
Francesco Petrogalli	c4b21d57bc	[llc] Add the command line option `-sched-model-force-enable-intervals`. The option is used to force the use of resource intervals in the machine scheduler, effectively ignoring the value of `EnableIntervals` in the instance of the `SchedMachineModel`. Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D156540	2023-07-31 10:10:18 +02:00
Sameer Sahasrabuddhe	d9847cde48	[GlobalISel] convergent intrinsics Introduced the convergent equivalent of the existing G_INTRINSIC opcodes: - G_INTRINSIC_CONVERGENT - G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS Out of the targets that currently have some support for GlobalISel, the patch assumes that the convergent intrinsics are only relevant to SPIRV and AMDGPU. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D154766	2023-07-31 12:15:39 +05:30
Jay Foad	e2e3f06813	Revert "[MachineScheduler] Track physical register dependencies per-regunit" This reverts commit 1a54671d5405a39de362e9692ce963c0638023bc. It was causing lit test failures in a LLVM_ENABLE_EXPENSIVE_CHECKS build.	2023-07-29 18:05:25 +01:00
Jay Foad	1a54671d54	[MachineScheduler] Track physical register dependencies per-regunit Change the scheduler's physical register dependency tracking from registers-and-their-aliases to regunits. This has a couple of advantages when subregisters are used: - The dependency tracking is more accurate and creates fewer useless edges in the dependency graph. An AMDGPU example, edited for clarity: SU(0): $vgpr1 = V_MOV_B32 $sgpr0 SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1 SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0 There is a data dependency on $vgpr1 from SU(0) to SU(1) and from SU(1) to SU(2). But the old dependency tracking code also added a useless edge from SU(0) to SU(2) because it thought that SU(0)'s def of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1. - On targets like AMDGPU that make heavy use of subregisters, each register can have a huge number of aliases - it can be quadratic in the size of the largest defined register tuple. There is a much lower bound on the number of regunits per register, so iterating over regunits is faster than iterating over aliases. The LLVM compile-time tracker shows a tiny overall improvement of 0.03% on X86. I expect a larger compile-time improvement on targets like AMDGPU. Differential Revision: https://reviews.llvm.org/D156552	2023-07-29 15:34:53 +01:00
Wael Yehia	9d4e8c09f4	[XCOFF] Do not put MergeableCStrings in their own section The current implementation generates a csect with a ".rodata.str.x.y" prefix for a MergeableCString variable definition. However, a reference to such variable does not get the prefix in its name because there's not enough information in the containing IR. In particular, without seeing the initializer and absent of some other indicators, we cannot tell that the referenced variable is a null- terminated string. When the AIX codegen in llvm was being developed, the prefixing was copied from ELF without having the linker take advantage of the info. Currently, the AIX linker does not have the capability to merge MergeableCString variables. If such feature would ever get implemented, the contract between the linker and compiler would have to be reconsidered. Here's the before and after of this change: ``` @a = global i64 320255973571806, align 8 @strA = unnamed_addr constant [7 x i8] c"hello\0A\00", align 1 ;; Mergeable1ByteCString @strB = unnamed_addr constant [8 x i8] c"Blahah\0A\00", align 1 ;; Mergeable1ByteCString @strC = unnamed_addr constant [2 x i16] [i16 1, i16 0], align 2 ;; Mergeable2ByteCString @strD = unnamed_addr constant [2 x i16] [i16 1, i16 1], align 2 ;; !isMergeableCString @strE = external unnamed_addr constant [2 x i16], align 2 -fdata-sections: .text extern .rodata.str1.1strA .text extern strA 0 SD RO 0 SD RO .text extern .rodata.str1.1strB .text extern strB 0 SD RO 0 SD RO .text extern .rodata.str2.2strC ===> .text extern strC 0 SD RO 0 SD RO .text extern strD .text extern strD 0 SD RO 0 SD RO .data extern a .data extern a 0 SD RW 0 SD RW undef extern strE undef extern strE 0 ER UA 0 ER UA -fno-data-sections: .text unamex .rodata.str1.1 .text unamex .rodata 0 SD RO 0 SD RO .text extern strA .text extern strA 0 LD RO 0 LD RO .text extern strB .text extern strB 0 LD RO 0 LD RO .text unamex .rodata.str2.2 ===> .text extern strC 0 SD RO 0 LD RO .text extern strC .text extern strD 0 LD RO 0 LD RO .text unamex .rodata .data unamex .data 0 SD RO 0 SD RW .text extern strD .data extern a 0 LD RO 0 LD RW .data unamex .data undef extern strE 0 SD RW 0 ER UA .data extern a 0 LD RW undef extern strE 0 ER UA ``` Reviewed by: David Tenty, Fangrui Song Differential Revision: https://reviews.llvm.org/D156202	2023-07-29 03:24:21 +00:00
Arthur Eubanks	f800c1f3b2	[PEI] Don't zero out noreg operands A tail call may have $noreg operands. Fixes a crash. Reviewed By: xgupta Differential Revision: https://reviews.llvm.org/D156485	2023-07-28 10:23:17 -07:00
Jay Foad	945123384e	[PEI][ARM] Switch to backwards frame index elimination This adds better support for call frame pseudos that adjust SP in PEI::replaceFrameIndicesBackward. Running frame index elimination backwards is preferred because it can do backwards register scavenging (on targets that require scavenging) which does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D156434	2023-07-28 17:32:51 +01:00
Evgenii Kudriashov	c13e310fa7	[DAGCombine] Support truncated constants for fptosi.sat combining Closes https://github.com/llvm/llvm-project/issues/56779 Reviewed By: RKSimon, dmgreen Differential Revision: https://reviews.llvm.org/D152926	2023-07-28 18:54:39 +03:00
Jay Foad	a54320392c	[CodeGen] Clean up ScheduleDAGInstrs::addPhysRegDeps Small refactorings, cosmetic changes, clean up some naming. NFCI.	2023-07-28 15:37:19 +01:00
Jay Foad	2dcf051259	[CodeGen] Store call frame size in MachineBasicBlock Record the call frame size on entry to each basic block. This is usually zero except when a basic block has been split in the middle of a call sequence. This simplifies PEI::replaceFrameIndices which previously had to visit basic blocks in a specific order and had special handling for unreachable blocks. More importantly it paves the way for an equally simple implementation of a backwards version of replaceFrameIndices, which is required to fully convert PrologEpilogInserter to backwards register scavenging, which is preferred because it does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D156113	2023-07-27 10:32:00 +01:00
Vitaly Buka	a496c8be6e	Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" And dependent commits. Details in D150388. This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c. This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e. This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826. This reverts commit b7836d856206ec39509d42529f958c920368166b. No conflicts in the code, few tests had conflicts in autogenerated CHECKs: llvm/test/CodeGen/Thumb2/mve-float32regloops.ll llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll Reviewed By: alexfh Differential Revision: https://reviews.llvm.org/D156381	2023-07-26 22:13:32 -07:00
Sameer Sahasrabuddhe	b14e30f10d	[LLVM] refactor GenericSSAContext and its specializations Fix the GenericSSAContext template so that it actually declares all the necessary typenames and the methods that must be implemented by its specializations SSAContext and MachineSSAContext. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156288	2023-07-27 09:54:50 +05:30
Pranav Kant	6f305e0658	[DAGCombiner] Limit graph traversal to cap compile times The hasPredecessorHelper method, which is used by DAGCombiner to combine to pre-indexed and post-indexed load/stores, is a major source of slowdown while compiling a large function with MSan enabled on Arm. This patch caps the DFS graph traversal for this method at 8192, which cuts compile time by 50% (4m -> 2m) at the cost of fewer nodes combined overall. Here's a summary of the pre-index DAG nodes created and the time it took to compile the pathological case with different MaxDepth limits: 1. With MaxDepth = 0 (unlimited): 1800, took 4m 2. With MaxDepth = 32k: 560, took 2m31s 3. With MaxDepth = 8k: 139, took 2m. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D154885	2023-07-26 17:29:38 +00:00
DianQK	30f2170a78	Revert "[DebugInfo] Fix potential CU mismatch for attachRangesOrLowHighPC" This reverts commit d20e4a1d68aa8e14c4e524e4d4eeb4445acac401. After committing 2ee4d0386c783f58abe708298228de648239b435, we don't support subprogram definitions nested within `DICompositeType` when doing LTO builds. For a detailed discussion, see https://reviews.llvm.org/D152095.	2023-07-26 19:58:00 +08:00
Zhongyunde	05aae0839f	Reland [AArch64][NFC] Call the API getVScaleRange directly Use the maximum 64 for BitWidth of getVScaleRange to avoid returning an empty range. The previous change caused a Buildbot failure because of the self-assignment MinSVEVectorSize = MinSVEVectorSize. error: explicitly assigning value of variable of type 'unsigned int' to itself [-Werror,-Wself-assign] Reviewed By: sdesmalen, nikic, dmgreen Differential Revision: https://reviews.llvm.org/D155708	2023-07-26 18:55:31 +08:00
Jay Foad	6fcad9cf93	[DAGCombiner] Simplify foldAndOrOfSETCC. NFC. Pull out repeated hasOneUse checks. Simplify some conditions. Reduce indentation. Differential Revision: https://reviews.llvm.org/D156220	2023-07-26 10:22:55 +01:00
Zhongyunde	ebaac2b2d6	Revert "[AArch64][NFC] Call the API getVScaleRange directly" This reverts commit 67005c8e6fa9464f8bc436305a422071013ae499.	2023-07-26 16:44:14 +08:00
Zhongyunde	67005c8e6f	[AArch64][NFC] Call the API getVScaleRange directly Use the maximum 64 for BitWidth of getVScaleRange to avoid returning an empty range. Reviewed By: sdesmalen, nikic, dmgreen Differential Revision: https://reviews.llvm.org/D155708	2023-07-26 15:54:04 +08:00
esmeyi	e83b8a5e71	[XCOFF] Enable available_externally linkage for functions. Summary: D80642 added support for emitting AvailableExternally Linkage on AIX. However, an assertion of "Trying to get csect representation of this symbol but none was set." occurred when a function is declared as available_externally. This is because we fail to generate a csect for the function. This patch fixes it. Reviewed By: hubert.reinterpretcast, shchenz Differential Revision: https://reviews.llvm.org/D156213 Signed-off-by: Esme Yi <esme.yi@ibm.com>	2023-07-25 22:47:11 -04:00
Qi Hu	ddd7d35c6c	[RegAlloc] Fix assertion failure caused by inline assembly When inline assembly code requests more registers than available, the MachineInstr::emitError function in the RegAllocFast pass emits an error but doesn't stop the pass, and then the compiler crashes later with an assertion failure. This commit, mimicking the RegAllocGreedy pass, assigns a random physical register, and therefore avoids the crash after producing the diagnostic. This problem has been observed for both rustc and clang, while it doesn't occur in gcc.	2023-07-25 19:21:03 -04:00
Craig Topper	1f5a1b8952	[DAGCombiner] Minor improvements to foldAndOrOfSETCC. NFC Reduce the scope of some variables. Replace an if with an assertion. Reviewed By: kmitropoulou Differential Revision: https://reviews.llvm.org/D156140	2023-07-25 00:20:06 -07:00
Matt Arsenault	0d797b71eb	RegisterCoalescer: Fix empty subrange verifier error In this example an implicit def had live-out undef subrange defs. After coalescing with the def from a previous block, the undef-defed lanes are no longer live out of the block in the new interval. An empty subrange was tentatively created for these lanes, but it must be deleted.	2023-07-24 12:18:34 -04:00
Matt Arsenault	2a53b6c06b	RegisterCoalescer: Fix verifier error on redef of subregister for live out implicit_defs A live out implicit_def wasn't deleted, but the subranges weren't correctly updated. The main range was correct but the def corresponding to the initial main range def instruction was missing from the lanes redefined in another block. The written lanes are not quite the same as the valid lanes in the case of an implicit_def. Fixes verifier error in blender. There is an additional verifier error in some of the testcase variants where an empty subrange remains.	2023-07-24 12:18:34 -04:00
WANG Rui	595d5f36f4	[DAGCombine] Canonicalize operands for visitANDLike During the construction of SelectionDAG, there are no explicit canonicalization rules to adjust the order of operands for AND nodes. This may prevent the optimization in DAGCombiner::visitANDLike from being triggered. This patch canonicalizes the operands before matches, which can be observed to improve optimization on the RISC-V target architecture. Canonicalize: ``` and(x, add) -> and(add, x) ``` Signed-off-by: WANG Rui <wangrui@loongson.cn> Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D154760	2023-07-24 16:52:04 +08:00
Antonio Frighetto	2dea969d83	[clang][CodeGen] Introduce `-frecord-command-line` for MachO Allow clang driver command-line recording when targeting MachO object files as well. Reviewed-by: sgraenitz Differential Revision: https://reviews.llvm.org/D155716	2023-07-24 09:24:59 +02:00
David Green	6edc9a7662	[AArch64][GISel] Additional FPExt vector lowering Similar to D155311, this adds lowering for more vector cases for FPExt Differential Revision: https://reviews.llvm.org/D155601	2023-07-23 16:58:13 +01:00
Amaury Séchet	88452508f3	[DAG] Improve carry reconstruction in combineCarryDiamond. The gain is usually sufficient to go the extra mile and reconstruct a carry in some cases. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D154533	2023-07-22 22:49:48 +00:00
Craig Topper	a815f03f9b	[LegalizeTypes] Use report_fatal_error instead of llvm_unreachable in the default case of some type legalization handlers. These can be triggered in various ways when intrinsics are used incorrectly or a target doesn't correctly reject something it does not support. Using a fatal error prevents strange behavior like infinite loops. We already do this for some of the vector type legalization handlers.	2023-07-22 11:05:24 -07:00
Daniel Hoekwater	0315fca912	[AArch64] Move branch relaxation after bbsection assignment Because branch relaxation needs to factor in whether branches target a block in the same section or a different one, it needs to run after the Basic Block Sections / Machine Function Splitting passes. Because jump table compression relies on block offsets remaining fixed after the table is compressed, we must also move the JT compression pass. The only tests affected are ones enforcing just the ordering and a few that have basic block ids changed because RenumberBlocks hasn't run yet. Differential Revision: https://reviews.llvm.org/D153829	2023-07-21 20:24:52 +00:00
Simon Pilgrim	ae60706da0	[DAG] SimplifyDemandedBits - call ComputeKnownBits for constant non-uniform ISD::SRL shift amounts We only attempted to determine KnownBits for uniform constant shift amounts, but ComputeKnownBits is able to handle some non-uniform cases as well that we can use as a fallback.	2023-07-21 14:52:57 +01:00
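
The strict weak ordering requirement named in the MachineSink commit above (49d41de578) can be illustrated with a small sketch. This is not the LLVM code or its fix: `mixed_less`, `lexicographic_less`, and the `(frequency, depth)` tuples are hypothetical stand-ins, chosen only to show how a comparator that switches keys depending on its inputs can break transitivity, which sort routines are allowed to assume.

```python
from itertools import product


def mixed_less(lhs, rhs):
    """Hypothetical comparator: compare by frequency only when both
    frequencies are known (nonzero), otherwise fall back to depth."""
    (lf, ld), (rf, rd) = lhs, rhs
    if lf != 0 and rf != 0:
        return lf < rf
    return ld < rd


def lexicographic_less(lhs, rhs):
    """Hypothetical alternative: always compare the same key tuple."""
    return lhs < rhs  # tuple comparison is lexicographic


def violates_strict_weak_ordering(items, less):
    """Brute-force check of the strict weak ordering axioms on a sample."""
    # Irreflexivity: less(a, a) must be false.
    if any(less(a, a) for a in items):
        return True
    # Asymmetry: less(a, b) and less(b, a) must not both hold.
    if any(less(a, b) and less(b, a) for a, b in product(items, repeat=2)):
        return True

    def equiv(x, y):
        return not less(x, y) and not less(y, x)

    for a, b, c in product(items, repeat=3):
        # Transitivity of the ordering itself.
        if less(a, b) and less(b, c) and not less(a, c):
            return True
        # Transitivity of incomparability.
        if equiv(a, b) and equiv(b, c) and not equiv(a, c):
            return True
    return False


# (frequency, depth) pairs; a frequency of 0 means "unknown" in this sketch.
blocks = [(5, 1), (0, 2), (3, 3)]
```

With this sample, `mixed_less` orders `(5, 1)` before `(0, 2)` and `(0, 2)` before `(3, 3)` by depth, yet orders `(3, 3)` before `(5, 1)` by frequency, so the checker flags it. `lexicographic_less` passes because it compares the same key for every pair.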


34405 Commits