llvm-project

Author	SHA1	Message	Date
Sameer Sahasrabuddhe	9c1b82599d	[AAPointerInfo] handle multiple offsets in PHI Previously reverted in 8b446ea2ba39e406bcf940ea35d6efb4bb9afe95 Reapplying because this commit is NOT DEPENDENT on the reverted commit fc21f2d7bae2e0be630470cc7ca9323ed5859892, which broke the ASAN buildbot. See https://reviews.llvm.org/rGfc21f2d7bae2e0be630470cc7ca9323ed5859892 for more information. The arguments to a PHI may represent a recurrence by eventually using the output of the PHI itself. This is now handled by checking for cycles in the control flow. If a PHI is not in a recurrence, it is now able to report multiple offsets instead of conservatively reporting unknown. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D138991	2022-12-18 10:51:20 +05:30
Christudasan Devadasan	40ba0942e2	[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector regclass allocation. This imposes many restrictions, so SGPR spilling ends up failing when there aren't enough VGPRs and we are forced to spill the leftover into memory during PEI. The custom spill handling during PEI has many edge cases and breaks the compiler from time to time. This patch implements spilling SGPRs into virtual VGPR lanes. Since we now split the register allocation for SGPRs and VGPRs, the virtual registers introduced for the spill lanes would get allocated automatically in the subsequent regalloc invocation for VGPRs. Spilling to virtual registers will always succeed, even in high-pressure situations, and hence it avoids most of the edge cases during PEI. We are now left with only the custom SGPR spills during PEI for special registers like the frame pointer, which is an unproblematic case. This patch also implements the whole wave spills which might occur if RA spills any live range of virtual registers involved in the whole wave operations. Earlier, we had been hand-picking registers for such machine operands. But now with SGPR spills into virtual VGPR lanes, we are exposing them to the allocator. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D124196	2022-12-17 11:56:32 +05:30
Christudasan Devadasan	29247824f5	[AMDGPU][SIFrameLowering] Use the right frame register in CSR spills Unlike the callee-saved VGPR spill instructions emitted by `PEI::spillCalleeSavedRegs`, the CS VGPR spills inserted during emitPrologue/emitEpilogue require the exec bits flipping to avoid clobbering the inactive lanes of VGPRs used for SGPR spilling. Currently, these spill instructions are referenced from the SP at function entry and when the callee performs a stack realignment, they ended up getting incorrect stack offsets. Even if we try to adjust the offsets, the FP-SP becomes a runtime entity with dynamic stack realignment and the offsets would still be inaccurate. To fix it, use FP as the frame base in the spill instructions whenever the function has FP. The offsets obtained for the CS objects would always be the right values from FP. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134949	2022-12-17 11:52:36 +05:30
Christudasan Devadasan	7a72a93580	[AMDGPU] Preserve only the inactive lanes of scratch vgprs In general, a callee is free to use a scratch register without preserving its previous state. However, the VGPR used for SGPR spilling can potentially have its inactive lanes overwritten by the writelane instructions. When the function returns, it can cause unexpected behavior if the VGPR value is not preserved appropriately. The current scheme for preserving the inactive lanes of such scratch VGPRs is incorrect: it preserves all lanes, so the outgoing values (if any) get overwritten by the epilog restores, corrupting the return value. To avoid such a situation with scratch VGPRs, this patch ensures we preserve only their inactive lanes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134526	2022-12-17 11:51:43 +05:30
Christudasan Devadasan	20a940f1e2	[AMDGPU][SIFrameLowering] Unify PEI SGPR spill saves and restores There is a lot of customization and eventually code duplication in the frame lowering that handles special SGPR spills like the one needed for the Frame Pointer. This currently makes it difficult to incorporate any additional SGPR spill during PEI. This patch introduces a new spill builder to efficiently handle such spill requirements. Various spill methods are handled specially using a separate class. Reviewed By: sebastian-ne, scott.linder Differential Revision: https://reviews.llvm.org/D132436	2022-12-17 11:50:25 +05:30
Christudasan Devadasan	b25b4c0ab4	[AMDGPU] Separate out SGPR spills to VGPR lanes during PEI SILowerSGPRSpills pass handles the lowering of SGPR spills into VGPR lanes. Some SGPR spills are handled later during PEI. There is a common function used in both places to find the free VGPR lane. This patch eliminates that dependency to find the free VGPR by handling it separately for PEI. It is a prerequisite patch for a future work to allow SGPR spills to virtual VGPR lanes during SILowerSGPRSpills. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D124195	2022-12-17 11:49:41 +05:30
Christudasan Devadasan	5ebe91fcb2	[AMDGPU] Correctly set IsKill flag for VGPR spills in the prolog We always assume the vector register is dead or killed while inserting the VGPR spills in the prolog. This is not always true. Use the entry block liveIn data when setting the flag. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D124194	2022-12-17 11:48:44 +05:30
Christudasan Devadasan	5692a7e84e	[AMDGPU] Callee must always spill writelane VGPRs Since the writelane instruction used for SGPR spills can modify inactive lanes, the callee must preserve the VGPR this instruction modifies even if it was marked Caller-saved. Reviewed By: arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D124192	2022-12-17 11:11:42 +05:30
Mitch Phillips	525d6c54b5	Revert "[AAPointerInfo] handle multiple offsets in PHI" This reverts commit 88db516af69619d4326edea37e52fc7321c33bb5. Reason: This change is dependent on a commit that needs to be rolled back because it broke the ASan buildbot. See https://reviews.llvm.org/rGfc21f2d7bae2e0be630470cc7ca9323ed5859892 for more information.	2022-12-16 17:55:48 -08:00
Mitch Phillips	7928a6387f	Revert "Revert "[AAPointerInfo] handle multiple offsets in PHI"" This reverts commit 12696d302d146ffe616eecab3feceba9d29be2db. Reason: This change is dependent on a commit that needs to be rolled back because it broke the ASan buildbot. See https://reviews.llvm.org/rGfc21f2d7bae2e0be630470cc7ca9323ed5859892 for more information.	2022-12-16 17:55:38 -08:00
Mitch Phillips	8b446ea2ba	Revert "[AAPointerInfo] handle multiple offsets in PHI" This reverts commit 179ed8871101cd197e0a719a3629cd5077b1a999. Reason: This change is dependent on a commit that needs to be rolled back because it broke the ASan buildbot. See https://reviews.llvm.org/rGfc21f2d7bae2e0be630470cc7ca9323ed5859892 for more information.	2022-12-16 17:54:44 -08:00
Jeffrey Byrnes	4d2faf043b	[AMDGPU][SIFrameLowering] Mark VGPR used for AGPR spills as reserved Presently, there is an issue on MI100 (and probably other architectures) where the VGPR used for AGPR copies clobbers the VGPR used for AGPR spills. AFAICT this is because in processFunctionBeforeFrameIndicesReplaced we think the VGPR register for AGPR spill is unused. This patch aims to correct that. This is a WIP while I work out issues with producing a good test. For now, I'm curious if this is generally a good / bad idea. Differential Revision: https://reviews.llvm.org/D139673	2022-12-16 12:00:51 -08:00
Roman Lebedev	96d3c82645	Revert "[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 3)" While the PPC little-endian miscompile did get addressed by https://reviews.llvm.org/D140046, the PPC big-endian bots are still unhappy. https://lab.llvm.org/buildbot/#/builders/93/builds/12560 This reverts commit 7bd358bcb4e358b4351c69e02ef76939e08acdc7.	2022-12-16 22:58:41 +03:00
Roman Lebedev	cfd594f8bb	[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 3) * This is a recommit of 3c4d2a03968ccf5889bacffe02d6fa2443b0260f, * which was reverted in 25f01d593ce296078f57e872778b77d074ae5888, because it exposed a miscompile in the PPC backend, which was resolved in https://reviews.llvm.org/D140089 / cb3f415cd2019df7d14683842198bc4b7a492bc5. * which was a recommit of cf624b23bc5d5a6161706d1663def49380ff816a, * which was reverted in 5cfc22cafe3f2465e0bb324f8daba82ffcabd0df, because the cut-off on the number of vector elements was not low enough, and it triggered both SDAG SDNode operand number assertions, and caused compile time explosions in some cases. Let's try with something really REALLY conservative first, just to get somewhere, and try to bump it later. FIXME: should this respect TTI reg width * num vec regs? Original commit message: Now, there's a big caveat here - these bytes are abstract bytes, not the i8 we have in LLVM, so strictly speaking this is not exactly legal, see e.g. https://github.com/AliveToolkit/alive2/issues/860 ^ the "bytes" "could" have been a pointer, and loading it as an integer inserts an implicit ptrtoint. But at the same time, InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()` would expand a memtransfer of 1/2/4/8 bytes into integer-typed load+store, so this isn't exactly a new problem. Note that in memory, poison is byte-wise, so we really can't widen elements, but SROA seems to be inconsistent here. Fixes #59116.	2022-12-16 19:27:38 +03:00
Jay Foad	e2bcdca527	[AMDGPU] Generate permlane test checks	2022-12-16 11:30:01 +00:00
Matt Arsenault	b5edd522d1	AMDGPU/GlobalISel: Do not create readfirstlane with non-s32 type We should probably handle any 32-bit type here, but the intrinsic definition and selection pattern currently do not. Avoids a few lit test failures when switched on by default.	2022-12-15 21:44:07 -05:00
Alexander Timofeev	2877b87666	[AMDGPU] Lower VGPR to physical SGPR COPY to S_MOV_B32 if VGPR contains the compile time constant Sometimes we have a constant value loaded to VGPR. If we later need to rematerialize it in a physical scalar register, we can avoid the VGPR to SGPR copy by replacing it with S_MOV_B32. Reviewed By: JonChesterfield, arsenm Differential Revision: https://reviews.llvm.org/D139874	2022-12-16 00:38:10 +01:00
Christudasan Devadasan	229c466bc8	[AMDGPU] Test fixup Changing cast_lds_gv into a kernel function to lower the LDS usage appropriately. LDS lowering currently doesn't happen for orphan device functions.	2022-12-15 23:36:55 +05:30
Ron Lieberman	38f1abef86	Revert "[SelectionDAG] Do not second-guess alignment for alloca" Breaks amdgpu buildbot https://lab.llvm.org/buildbot/#/builders/193 23491 This reverts commit ffedf47d8b793e07317f82f9c2a5f5425ebb71ad.	2022-12-15 10:55:18 -06:00
Andrew Savonichev	ffedf47d8b	[SelectionDAG] Do not second-guess alignment for alloca Alignment of an alloca in IR can be lower than the preferred alignment on purpose, but this override essentially treats the preferred alignment as the minimum alignment. The patch changes this behavior to always use the specified alignment. If alignment is not set explicitly in LLVM IR, it is set to DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign. Tests are changed as well: explicit alignment is increased to match the preferred alignment if it changes output, or omitted when it is hard to determine the right value (e.g. for pointers, some structs, or weird types). Differential Revision: https://reviews.llvm.org/D135462	2022-12-15 18:18:12 +03:00
Juan Manuel MARTINEZ CAAMAÑO	4d852374b1	[DAGCombine] Fix always true condition in combineShiftToMULH Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D139550	2022-12-15 13:04:42 +01:00
Sameer Sahasrabuddhe	179ed88711	[AAPointerInfo] handle multiple offsets in PHI Previously reverted in 12696d302d146ffe616eecab3feceba9d29be2db The arguments to a PHI may represent a recurrence by eventually using the output of the PHI itself. This is now handled by checking for cycles in the control flow. If a PHI is not in a recurrence, it is now able to report multiple offsets instead of conservatively reporting unknown. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D138991	2022-12-15 12:23:50 +05:30
Sameer Sahasrabuddhe	12696d302d	Revert "[AAPointerInfo] handle multiple offsets in PHI" This reverts commit 88db516af69619d4326edea37e52fc7321c33bb5.	2022-12-15 10:14:39 +05:30
Sameer Sahasrabuddhe	88db516af6	[AAPointerInfo] handle multiple offsets in PHI The arguments to a PHI may represent a recurrence by eventually using the output of the PHI itself. This is now handled by checking for cycles in the control flow. If a PHI is not in a recurrence, it is now able to report multiple offsets instead of conservatively reporting unknown. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D138991	2022-12-15 08:48:38 +05:30
Matt Arsenault	fb639b0ce0	AMDGPU: Update test	2022-12-14 16:47:25 -05:00
Matt Arsenault	c16a58b36c	Attributes: Add function getter to parse integer string attributes The most common case for string attributes parses them as integers. We don't have a convenient way to do this, and as a result we have inconsistent missing attribute and invalid attribute handling scattered around. We also have inconsistent radix usage to getAsInteger; some places use the default 0 and others use base 10. Update a few of the uses, but there are quite a lot of these.	2022-12-14 13:12:35 -05:00
Jay Foad	113aafbf23	[AMDGPU] Clean up SReg classes Remove unused LO16 classes SReg_LO16_XM0_XEXEC, SReg_LO16_XEXEC_HI and SReg_LO16_XM0. Simplify the definition of SReg_32. Add SReg_32_XEXEC and use it to improve SReg_1_XEXEC which previously excluded M0 for no good reason. Improve SReg_1 which previously excluded EXEC_HI for no good reason. Differential Revision: https://reviews.llvm.org/D140012	2022-12-14 15:57:56 +00:00
Sameer Sahasrabuddhe	6a2305484e	[AAPointerInfo] track multiple constant offsets for each use An expression of the form `gep(base, select(pred, const1, const2))` can result in a set of offsets instead of just one. PointerInfo can now track these sets instead of conservatively modeling them as Unknown. In general, AAPointerInfo now uses AAPotentialConstantValues to examine the operands of the GEP. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D138646	2022-12-13 22:27:25 +05:30
Pierre van Houtryve	9fa46200ea	[AMDGPU] Add `.workgroup_processor_mode` to v5 MD Adds Workgroup Processor Mode (WGP) to the HSA Metadata for Code Object v5/GFX10+. The field is already present as an asm directive and in the compute program resource register but is also needed in the MD. Reviewed By: kzhuravl Differential Revision: https://reviews.llvm.org/D139931	2022-12-13 10:44:52 -05:00
Pierre van Houtryve	678d8946ba	[AMDGPU] Add bf16 storage support - [Clang] Declare AMDGPU target as supporting BF16 for storage-only purposes on amdgcn - Add Sema & CodeGen tests cases. - Also add cases that D138651 would have covered as this patch replaces it. - [AMDGPU] Add BF16 storage-only support - Support legalization/dealing with bf16 operations in DAGIsel. - bf16 as a type remains illegal and is represented as i16 for storage purposes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139398	2022-12-13 10:34:26 -05:00
Matt Arsenault	4f63f9739c	AMDGPU: Add sanity test if amdgcn.device.{init|fini} already exists	2022-12-12 22:18:37 -05:00
Matt Arsenault	d647e252b8	InstSimplify: Add basic folding of llvm.is.fpclass intrinsic Copied from the existing llvm.amdgcn.class handling; eventually I will fold that to the generic intrinsic when legal. The tests should probably move into an instsimplify only test.	2022-12-12 21:54:04 -05:00
Doru Bercea	aea5980e26	Emit CAS loop for min/max atomics.	2022-12-12 11:42:30 -06:00
Jay Foad	81084bfa2c	[AMDGPU] Make use of !listremove. NFCI. This only affects the order of implicit operands in some MIR tests. Differential Revision: https://reviews.llvm.org/D139829	2022-12-12 17:01:04 +00:00
Sameer Sahasrabuddhe	2fdeb27790	Revert "[AAPointerInfo] track multiple constant offsets for each use" Assertion fired in openmp-offload-amdgpu-runtime: https://lab.llvm.org/buildbot/#/builders/193/builds/23177 This reverts commit c2a0baad1fbb21fe111fef83ec93c2d7923b9b0c.	2022-12-12 15:39:18 +05:30
Nikita Popov	243acd5dcb	[BasicAA] Remove support for PhiValues analysis BasicAA currently has an optional dependency on the PhiValues analysis. However, at least with our current pipeline setup, we never actually make use of it. It's possible that this used to work with the legacy pass manager, but I'm not sure of that either. Given that this analysis has not actually been in use for a long time, and nobody noticed or complained, I think we should drop support for it and focus on one code path. It is worth noting that analysis quality for the non-PhiValues case has significantly improved in the meantime. If we really wanted to make use of PhiValues, the right way would probably be to pass it in via AAQI in places we want to use it, rather than using an optional pass manager dependency (which are an unpredictable PITA and should really only ever be used for analyses that are only preserved and not used). Differential Revision: https://reviews.llvm.org/D139719	2022-12-12 09:47:30 +01:00
Sameer Sahasrabuddhe	c2a0baad1f	[AAPointerInfo] track multiple constant offsets for each use An expression of the form `gep(base, select(pred, const1, const2))` can result in a set of offsets instead of just one. PointerInfo can now track these sets instead of conservatively modeling them as Unknown. In general, AAPointerInfo now uses AAPotentialConstantValues to examine the operands of the GEP. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D138646	2022-12-12 13:36:45 +05:30
Austin Kerbow	f9c76a1198	[AMDGPU] Update MFMASmallGemmOpt with better performing strategy Based on experiments, this does better with the target small GEMM kernels. Reviewed By: jrbyrnes Differential Revision: https://reviews.llvm.org/D139227	2022-12-09 19:03:51 -08:00
Ariel Burton	6b2829dd87	Allow epilogue_begin to be emitted when generating DWARF We identify epilogue code by looking for instructions tagged with FrameDestroy. A function may have more than one epilogue, e.g., because of early returns or code duplicated during optimization. We need only track the current block, and emit epilogue_begin at most once per block. We reduce the number of entries in the line table by combining epilogue_begin with other flags instead of emitting a separate entry just for epilogue_begin. Reviewed By: dblaikie, aprantl Differential Revision: https://reviews.llvm.org/D133376	2022-12-09 20:17:37 +00:00
Matt Arsenault	9cc0779c4e	AMDGPU: Erase llvm.global_ctors/global_dtors after lowering We should be able to run the pass multiple times without breaking anything. If we still need to track these for some reason, we could replace with new entries for the kernels.	2022-12-09 14:25:32 -05:00
Matt Arsenault	f23f26032d	AMDGPU: Port AMDGPUCtorDtorLowering to new PM	2022-12-09 13:43:38 -05:00
Matt Arsenault	41c96e9483	AMDGPU: Fix not emitting code for exotic constructor types This was simply ignoring any entries that weren't direct function calls. This really should have been erroring on anything unexpected. We should be able to handle calling just about anything these days, so just call anything.	2022-12-09 13:22:12 -05:00
Pierre van Houtryve	3612d9eaac	[GISel] Rework trunc/shl combine in a generic trunc/shift combine This combine only handled left shifts, but now it can handle right shifts as well. It handles right shifts conservatively and only truncates them to the size returned by TLI. AMDGPU benefits from always lowering shifts to 32 bits for instance, but AArch64 would rather keep them at 64 bits. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D136319	2022-12-09 04:46:45 -05:00
Roman Lebedev	d4c4bd6b20	[NFC] Port codegen AMDGPU tests that invoke opt to `-passes=` syntax	2022-12-09 01:04:46 +03:00
Roman Lebedev	b1a9584818	[opt] Disincentivize new tests from using old pass syntax Over the past day or so, I've taken a large swing at our tests, and reduced the number of tests that were still using the old syntax from ~1800 to just 200. Left to handle: (as it is seen in this patch) * Transforms/LSR * Transforms/CGP * Transforms/TypePromotion * Transforms/HardwareLoops * Analysis/* * some misc. I think this is the right point to start actively refusing to honor the old syntax, except for the old tests, to prevent the old syntax from creeping back in. Thus, let's add a temporary default-off flag and, if it is not passed, refuse to accept the old syntax. The tests that still need porting are annotated with this flag. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D139647	2022-12-08 23:54:03 +03:00
Sebastian Neubauer	1fe65d866c	[AMDGPU] Add test that spills WWM CSRs twice Add a test to show a deficit in the current wwm/spilling code that creates double saves and restores for v40 and v41. This case came up in D124193. Differential Revision: https://reviews.llvm.org/D139626	2022-12-08 14:25:41 +01:00
Bjorn Pettersson	51ee10747d	[test] Remove duplicate RUN lines A few more that I missed in commit 3528e63d89305907b3d6e. There could be more duplicates remaining, since I've only focused on exactly duplicated "RUN: opt" lines (ignoring multi line RUN lines ending with '\').	2022-12-08 12:47:24 +01:00
Johannes Doerfert	f6e3a89cc0	[AMDGPU] Annotate the intrinsics to be default and nocallback Differential Revision: https://reviews.llvm.org/D135155	2022-12-07 14:25:25 -08:00
Jon Chesterfield	d77ae7f251	[amdgpu] Reimplement LDS lowering Renames the current lowering scheme to "module" and introduces two new ones, "kernel" and "table", plus a "hybrid" that chooses between those three on a per-variable basis. Unit tests are set up to pass with the default lowering of "module" or "hybrid" with this patch defaulting to "module", which will be a less dramatic codegen change relative to the current. This reflects the sparsity of test coverage for the table lowering method. Hybrid is better than module in every respect and will be default in a subsequent patch. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139433	2022-12-07 22:02:54 +00:00
Joe Nash	bbfbec94b1	[AMDGPU] Enable OMod on more VOP3 instructions OMod was disabled if OpSel was enabled, but that restriction is more specific than necessary. Any VOP3 with float operands can use OMod. On GFX11, FMAC_F16_e64 can use op_sel. Previously, SIFoldOperands and convertToThreeAddress were accidentally correct when they reinterpreted the zero OMod operand on V_FMAC_F16_e64 as the OpSel operand on V_FMA_F16_gfx9_e64. Now we explicitly add op_sel if required. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139469	2022-12-07 13:30:33 -05:00
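
Several AAPointerInfo entries above describe tracking a set of constant offsets instead of a single one. The following is a toy model in Python, not the Attributor's actual data structures. `None` stands in for the conservative Unknown state. The helper names `gep_select` and `phi` and the 4-byte element size are hypothetical. The model shows how `gep(base, select(pred, const1, const2))` yields two offsets. It also shows how a PHI outside a recurrence can report the union of its incoming offsets. For a PHI inside a recurrence, the model assumes a fall back to Unknown.

```python
from typing import FrozenSet, List, Optional

# Toy model: None means "Unknown" (the conservative state);
# a frozenset holds the known constant byte offsets.
Offsets = Optional[FrozenSet[int]]


def gep_select(base: Offsets, elem_size: int, const1: int, const2: int) -> Offsets:
    """Offsets of gep(base, select(pred, const1, const2)) for a fixed element size."""
    if base is None:
        return None
    # Either index may be chosen at run time, so keep both results.
    return frozenset(o + i * elem_size for o in base for i in (const1, const2))


def phi(incoming: List[Offsets], in_recurrence: bool) -> Offsets:
    """Merge the offsets flowing into a PHI node."""
    # Assumption: a recurrence (the PHI eventually feeds itself) or any
    # Unknown input makes the result Unknown.
    if in_recurrence or any(s is None for s in incoming):
        return None
    merged: FrozenSet[int] = frozenset()
    for s in incoming:
        merged |= s
    return merged


base = frozenset({0})
left = gep_select(base, 4, 1, 3)   # offsets {4, 12}
right = gep_select(base, 4, 0, 2)  # offsets {0, 8}
print(sorted(phi([left, right], in_recurrence=False)))  # [0, 4, 8, 12]
print(phi([left, right], in_recurrence=True))           # None
```

Keeping a set rather than collapsing to Unknown lets later queries still reason about each reachable offset individually.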

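The GISel trunc/shift combine entry relies on a bit-level identity. In the illustrative Python sketch below, the helper names are hypothetical, and the sketch does not reproduce the combine's actual legality checks. It shows that truncating a 64-bit left shift to 32 bits gives the same result as shifting the truncated value in 32 bits, provided the shift amount is below 32. A logical right shift pulls high bits down, so the same rewrite is not valid unconditionally.

```python
import random


def trunc(x: int, bits: int) -> int:
    """Keep only the low `bits` bits, like an integer truncation."""
    return x & ((1 << bits) - 1)


def shl(x: int, c: int, bits: int) -> int:
    """Left shift within a `bits`-wide integer; bits shifted out are dropped."""
    return trunc(x << c, bits)


def lshr(x: int, c: int) -> int:
    """Logical right shift of a non-negative value."""
    return x >> c


def check_shl(trials: int = 1000) -> bool:
    """Randomly test trunc(shl64(x, c)) == shl32(trunc(x), c) for c < 32."""
    rng = random.Random(0)
    for _ in range(trials):
        x = rng.getrandbits(64)
        c = rng.randrange(32)  # shift amount must be below the narrow width
        if trunc(shl(x, c, 64), 32) != shl(trunc(x, 32), c, 32):
            return False
    return True


x = 1 << 32  # only bit 32 set: it lies above the 32-bit truncation point
print(check_shl())               # True
print(trunc(lshr(x, 1), 32))     # 2147483648: bit 32 was shifted into range
print(lshr(trunc(x, 32), 1))     # 0: truncating first discarded that bit
```

This is one reason right shifts need more care than left shifts; the commit message says the combine handles them conservatively.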

6024 Commits