llvm-project

Author	SHA1	Message	Date
Matt Arsenault	b4d44322d9	AMDGPU/GlobalISel: Add missing test for implicit_def regbankselect	2023-01-06 08:58:10 -05:00
Matt Arsenault	6fe85933d4	AMDGPU/GlobalISel: Add wave32 checks to bool test	2023-01-06 08:58:10 -05:00
Nikita Popov	60442f0d44	[CodeGen] Convert some tests to opaque pointers (NFC) These are mostly MIR tests, which I did not handle during previous conversions.	2023-01-05 13:21:20 +01:00
Jay Foad	0d518ae50c	[GlobalISel] New combine to commute constant operands to the RHS Differential Revision: https://reviews.llvm.org/D140907	2023-01-05 11:12:40 +00:00
Diana Picus	6ee4f253b2	[GlobalISel] Add G_BUILD_VECTOR[_TRUNC] to CSE Add G_BUILD_VECTOR and G_BUILD_VECTOR_TRUNC to the list of opcodes in `shouldCSEOpc`. This simplifies the code generated for vector splats. Differential Revision: https://reviews.llvm.org/D140965	2023-01-05 10:15:31 +01:00
Diana Picus	61c5775b36	[GlobalISel] Precommit a test for D140965 Add a test for CSE-ing G_BUILD_VECTOR. This will be enabled in D140965.	2023-01-05 09:59:27 +01:00
Mirko Brkusanin	a80edb7fc9	[AMDGPU][GlobalISel] Fix mapping G_FREEZE Differential Revision: https://reviews.llvm.org/D140416	2022-12-21 15:25:04 +01:00
Christudasan Devadasan	a3028239a7	Revert "[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs" This reverts commit 40ba0942e2ab1107f83aa5a0ee5ae2980bf47b1a.	2022-12-21 16:17:42 +05:30
Leon Clark	daa022ca57	Enable roundeven. Add support for roundeven and implement appropriate tests. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D137954	2022-12-20 15:40:20 +00:00
Jessica Del	5ee13e6c65	[AMDGPU] Wide multiplies tests for D140208 These tests show suboptimal code generation that will be improved by the changes in D140208	2022-12-20 12:08:36 +01:00
Matt Arsenault	0dc4bdd888	GlobalISel: Enable CSE of G_SELECT Stop trying to delete a select in one combine since it would be deleting the CSE'd instruction if that happened.	2022-12-19 21:26:47 -05:00
Nikita Popov	bdf2fbba9c	[AMDGPU] Convert some tests to opaque pointers (NFC)	2022-12-19 12:41:13 +01:00
Matt Arsenault	012a85296b	AMDGPU/GlobalISel: Use ptrtoint to legalize constant 32-bit addrspacecast This was trying to merge 2 32-bit pointers into a 64-bit pointer. The artifact combiner was assuming merges to pointers use scalar sources, and ended up inserting invalid bitcast from a pointer to a scalar. It should probably be a verifier error to have pointer merge sources with a pointer result. Fixes verifier errors with EXPENSIVE_CHECKS.	2022-12-18 13:15:58 -05:00
Matt Arsenault	9d6003c764	AMDGPU: Lower addrspacecast on gfx6 Fixes inconsistent handling of constant-32bit case. Turns out we can lower all the casts just fine, it's just accessing the flat results that's a problem.	2022-12-18 08:02:45 -05:00
Christudasan Devadasan	40ba0942e2	[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector regclass allocation. This imposes many restrictions, and SGPR spilling ends up failing when there are not enough VGPRs, forcing us to spill the leftover into memory during PEI. The custom spill handling during PEI has many edge cases and breaks the compiler from time to time. This patch implements spilling SGPRs into virtual VGPR lanes. Since we now split the register allocation for SGPRs and VGPRs, the virtual registers introduced for the spill lanes would get allocated automatically in the subsequent regalloc invocation for VGPRs. Spill to virtual registers will always be successful, even in the high-pressure situations, and hence it avoids most of the edge cases during PEI. We are now left with only the custom SGPR spills during PEI for special registers like the frame pointer, which is an unproblematic case. This patch also implements the whole wave spills which might occur if RA spills any live range of virtual registers involved in the whole wave operations. Earlier, we had been hand-picking registers for such machine operands. But now with SGPR spills into virtual VGPR lanes, we are exposing them to the allocator. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D124196	2022-12-17 11:56:32 +05:30
Christudasan Devadasan	29247824f5	[AMDGPU][SIFrameLowering] Use the right frame register in CSR spills Unlike the callee-saved VGPR spill instructions emitted by `PEI::spillCalleeSavedRegs`, the CS VGPR spills inserted during emitPrologue/emitEpilogue require flipping the exec bits to avoid clobbering the inactive lanes of VGPRs used for SGPR spilling. Currently, these spill instructions are referenced from the SP at function entry, and when the callee performs a stack realignment, they end up getting incorrect stack offsets. Even if we try to adjust the offsets, the FP-SP becomes a runtime entity with dynamic stack realignment and the offsets would still be inaccurate. To fix it, use FP as the frame base in the spill instructions whenever the function has FP. The offsets obtained for the CS objects would always be the right values from FP. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134949	2022-12-17 11:52:36 +05:30
Christudasan Devadasan	7a72a93580	[AMDGPU] Preserve only the inactive lanes of scratch vgprs In general, a callee is free to use a scratch register without preserving its previous state. However, the VGPR used for SGPR spilling can potentially have its inactive lanes overwritten by the writelane instructions. When the function returns, it can cause unexpected behavior if the VGPR value is not preserved appropriately. The current scheme to preserve the inactive lanes of such scratch VGPRs is incorrect. It preserves all lanes, which causes the outgoing values (if any) to be overwritten by the epilog restores, corrupting the return value. To avoid this situation with scratch VGPRs, this patch ensures we preserve only their inactive lanes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134526	2022-12-17 11:51:43 +05:30
Christudasan Devadasan	b25b4c0ab4	[AMDGPU] Separate out SGPR spills to VGPR lanes during PEI SILowerSGPRSpills pass handles the lowering of SGPR spills into VGPR lanes. Some SGPR spills are handled later during PEI. There is a common function used in both places to find the free VGPR lane. This patch eliminates that dependency to find the free VGPR by handling it separately for PEI. It is a prerequisite patch for a future work to allow SGPR spills to virtual VGPR lanes during SILowerSGPRSpills. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D124195	2022-12-17 11:49:41 +05:30
Christudasan Devadasan	5692a7e84e	[AMDGPU] Callee must always spill writelane VGPRs Since the writelane instruction used for SGPR spills can modify inactive lanes, the callee must preserve the VGPR this instruction modifies even if it was marked Caller-saved. Reviewed By: arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D124192	2022-12-17 11:11:42 +05:30
Matt Arsenault	b5edd522d1	AMDGPU/GlobalISel: Do not create readfirstlane with non-s32 type We should probably handle any 32-bit type here, but the intrinsic definition and selection pattern currently do not. Avoids a few lit test failures when switched on by default.	2022-12-15 21:44:07 -05:00
Jay Foad	113aafbf23	[AMDGPU] Clean up SReg classes Remove unused LO16 classes SReg_LO16_XM0_XEXEC, SReg_LO16_XEXEC_HI and SReg_LO16_XM0. Simplify the definition of SReg_32. Add SReg_32_XEXEC and use it to improve SReg_1_XEXEC which previously excluded M0 for no good reason. Improve SReg_1 which previously excluded EXEC_HI for no good reason. Differential Revision: https://reviews.llvm.org/D140012	2022-12-14 15:57:56 +00:00
Pierre van Houtryve	3612d9eaac	[GISel] Rework trunc/shl combine in a generic trunc/shift combine This combine only handled left shifts, but now it can handle right shifts as well. It handles right shifts conservatively and only truncates them to the size returned by TLI. AMDGPU benefits from always lowering shifts to 32 bits for instance, but AArch64 would rather keep them at 64 bits. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D136319	2022-12-09 04:46:45 -05:00
Jon Chesterfield	d77ae7f251	[amdgpu] Reimplement LDS lowering Renames the current lowering scheme to "module" and introduces two new ones, "kernel" and "table", plus a "hybrid" that chooses between those three on a per-variable basis. Unit tests are set up to pass with the default lowering of "module" or "hybrid" with this patch defaulting to "module", which will be a less dramatic codegen change relative to the current. This reflects the sparsity of test coverage for the table lowering method. Hybrid is better than module in every respect and will be default in a subsequent patch. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139433	2022-12-07 22:02:54 +00:00
Joe Nash	bbfbec94b1	[AMDGPU] Enable OMod on more VOP3 instructions OMod was disabled if OpSel was enabled, but that restriction is more specific than necessary. Any VOP3 with float operands can use OMod. On GFX11, FMAC_F16_e64 can use op_sel. Previously, SIFoldOperands and convertToThreeAddress were accidentally correct when they reinterpreted the zero OMod operand on V_FMAC_F16_e64 as the OpSel operand on V_FMA_F16_gfx9_e64. Now we explicitly add op_sel if required. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139469	2022-12-07 13:30:33 -05:00
Mirko Brkusanin	fe42ebe442	[AMDGPU][GlobalISel] Fix legalizing image intrinsics for new types We no longer need to increase vector size to 16 for intrinsics that use more than 8 vgprs for addr. There is no image intrinsic that needs more than 12 so all currently existing cases will be covered. Using incorrect size was causing an error in instruction selection because instructions were updated to require new types (9x32, 10x32, 11x32, 12x32). Differential Revision: https://reviews.llvm.org/D139546	2022-12-07 18:20:58 +01:00
Justin Bogner	bcfdaa96f5	[AMDGPU] Handle `min(max(x, y), max(min(x, y), z))` in med3 combines Differential Revision: https://reviews.llvm.org/D139508	2022-12-06 22:59:43 -08:00
Justin Bogner	916ae0a060	[AMDGPU] Handle nnan and fast on the call in fpmed3 patterns We were only allowing these med3 patterns if the operands were known to not be NaN, but we should also allow it if the calls to max/min have the `nnan` or `fast` flags. Differential Revision: https://reviews.llvm.org/D139506	2022-12-06 22:57:52 -08:00
Justin Bogner	f1353d2f1a	[AMDGPU] Precommit GISel test for min(max(x, y), max(min(x, y), z)) -> med3 These combines will be added by https://reviews.llvm.org/D139508	2022-12-06 22:57:39 -08:00
Justin Bogner	6ec24c1d5c	[AMDGPU] Precommit GISel test showing missing med3 combines These combines will be added by https://reviews.llvm.org/D139506	2022-12-06 22:22:47 -08:00
Nico Weber	a862d09a92	Revert "[amdgpu] Reimplement LDS lowering" This reverts commit 982017240d7f25a8a6969b8b73dc51f9ac5b93ed. Breaks check-llvm, see https://reviews.llvm.org/D139433#3974862	2022-12-06 12:01:36 -05:00
Jon Chesterfield	982017240d	[amdgpu] Reimplement LDS lowering Renames the current lowering scheme to "module" and introduces two new ones, "kernel" and "table", plus a "hybrid" that chooses between those three on a per-variable basis. Unit tests are set up to pass with the default lowering of "module" or "hybrid" with this patch defaulting to "module", which will be a less dramatic codegen change relative to the current. This reflects the sparsity of test coverage for the table lowering method. Hybrid is better than module in every respect and will be default in a subsequent patch. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139433	2022-12-06 16:28:15 +00:00
Jonas Paulsson	5ecd363295	Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions." This reverts commit 122efef8ee9be57055d204d52c38700fe933c033. - Patch fixed to not reuse definitions from predecessors in EH landing pads. - Late review suggestions (by MaskRay) have been addressed. - M68k/pipeline.ll test updated. - Init captures added in processBlock() to avoid capturing structured bindings. - RISCV has this disabled for now. Original commit message: A new pass MachineLateInstrsCleanup is added to be run after PEI. This is a simple pass that removes redundant and identical instructions whenever found by scanning the MF once while keeping track of register definitions in a map. These instructions are typically immediate loads resulting from rematerialization, and address loads emitted by target in eliminateFrameIndex(). This is enabled by default, but a target could easily disable it by means of 'disablePass(&MachineLateInstrsCleanupID);'. This late cleanup is naturally not "optimal" in removing instructions as it is done by looking at phys-regs, but still quite effective. It would be desirable to improve other parts of CodeGen and avoid these redundant instructions in the first place, but there are no ideas for this yet. Differential Revision: https://reviews.llvm.org/D123394 Reviewed By: RKSimon, foad, craig.topper, arsenm, asb	2022-12-05 12:53:50 -06:00
Jonas Paulsson	122efef8ee	Revert "Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."" This reverts commit 17db0de330f943833296ae72e26fa988bba39cb3. Some more bots got broken - need to investigate.	2022-12-05 00:52:00 +01:00
Jonas Paulsson	17db0de330	Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions." Init captures added in processBlock() to avoid capturing structured bindings, which caused the build problems (with clang). RISCV has this disabled for now until problems relating to post RA pseudo expansions are resolved.	2022-12-03 14:15:15 -06:00
Matt Arsenault	0fb74d0ff8	AMDGPU: Fix broken attribute usage in test	2022-12-01 21:32:20 -05:00
Jonas Paulsson	8ef4632681	Revert "[CodeGen] Add new pass for late cleanup of redundant definitions." Temporarily revert and fix buildbot failure. This reverts commit 6d12599fd4134c1da63198c74a25490d28c733f6.	2022-12-01 13:29:24 -05:00
Jonas Paulsson	6d12599fd4	[CodeGen] Add new pass for late cleanup of redundant definitions. A new pass MachineLateInstrsCleanup is added to be run after PEI. This is a simple pass that removes redundant and identical instructions whenever found by scanning the MF once while keeping track of register definitions in a map. These instructions are typically immediate loads resulting from rematerialization, and address loads emitted by target in eliminateFrameIndex(). This is enabled by default, but a target could easily disable it by means of 'disablePass(&MachineLateInstrsCleanupID);'. This late cleanup is naturally not "optimal" in removing instructions as it is done by looking at phys-regs, but still quite effective. It would be desirable to improve other parts of CodeGen and avoid these redundant instructions in the first place, but there are no ideas for this yet. Differential Revision: https://reviews.llvm.org/D123394 Reviewed By: RKSimon, foad, craig.topper, arsenm, asb	2022-12-01 13:21:35 -05:00
Pierre van Houtryve	a88deb4b65	[AMDGPU] Use aperture registers instead of S_GETREG Fixes a longstanding TODO in the codebase where we were using S_GETREG + shift to do something that could simply be done with an inline constant (register). Patch based on D31874 by @kzhuravl Depends on D137767 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D137542	2022-11-30 12:25:10 +00:00
Nicolai Hähnle	b7f44f7cf9	AMDGPU: Remove ImagePSV and move images to addrspace 7 Following up on the removal of BufferPSV in commit 43b86bf992 ("AMDGPU: Remove BufferPseudoSourceValue") It is unclear what exactly the right address space for images should be. They seem morally closest to buffers, so that's what I went with. In practical terms, address space 7 is better than address space 0 because it can't alias with LDS. Differential Revision: https://reviews.llvm.org/D138949	2022-11-30 11:32:34 +01:00
Ron Lieberman	ca856fff1c	Revert "enable code-object-version=5" very sorry wrong repo. This reverts commit d882ba7aeac4b496dccd1b10cb58bd691786b691.	2022-11-29 15:21:09 -06:00
Nicolai Hähnle	43b86bf992	AMDGPU: Remove BufferPseudoSourceValue The use of a PSV for buffer intrinsics is misleading because it may be misinterpreted as all buffer intrinsics accessing the same address in memory, which is clearly not true. Instead, build MachineMemOperands without a pointer value but with an address space, so that address space-based alias analysis can still work. There is a lot of test churn because previously address space 4 (constant address space) was used as an address space for buffer intrinsics. This doesn't make much sense and seems to have been an accident -- see the change in AMDGPUTargetMachine::getAddressSpaceForPseudoSourceKind. Differential Revision: https://reviews.llvm.org/D138711	2022-11-29 22:15:11 +01:00
Ron Lieberman	d882ba7aea	enable code-object-version=5	2022-11-29 15:11:57 -06:00
Brendon Cahoon	b32a5666a8	[AMDGPU] Unify uniform return and divergent unreachable blocks This patch fixes a "failed to annotate CFG" error in SIAnnotateControlFlow. The problem occurs when there are divergent and uniform unreachable/return blocks in the same region. In this case, AMDGPUUnifyDivergentExitNodes does not create a unified block so the region contains multiple exits. StructurizeCFG does not work properly when there are multiple exits, so the necessary CFG transformations do not occur along divergent control flow. Subsequently, SIAnnotateControlFlow processes the path to the divergent exit block, but may only partially process blocks along a uniform control flow path to another exit block. This patch fixes the bug by creating a single exit block when there is a divergent exit block in the function. Differential revision: https://reviews.llvm.org/D136892	2022-11-29 13:25:56 -06:00
Mateja Marjanovic	595a08847a	[AMDGPU] Add support for new LLVM vector types Add VReg, AReg and SReg on AMDGPU for bit widths: 288, 320, 352 and 384. Differential Revision: https://reviews.llvm.org/D138205	2022-11-29 17:02:04 +01:00
Matt Arsenault	8e0fadda10	AMDGPU: Bulk update all GlobalISel tests to use opaque pointers	2022-11-28 11:51:36 -05:00
Matt Arsenault	1ab9fa6f0d	AMDGPU/GlobalISel: Fix hardcoded virtual register numbers in test	2022-11-28 08:41:31 -05:00
Ivan Kosarev	ec8ede8177	[AMDGPU][CodeGen] Support raw format TFE buffer loads other than byte, short and d16 ones. Differential Revision: https://reviews.llvm.org/D138215	2022-11-24 10:50:26 +00:00
Ruiling Song	0eaf6759ae	[AMDGPU][InsertWaits] No wait for WAW for global/scratch_load global/scratch_load will return in order they are issued. No need to insert a s_waitcnt for WAW hazard. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D138476	2022-11-23 09:57:50 +08:00
Pierre van Houtryve	9e7febb4f7	[AMDGPU][GISel] Select llvm.amdgcn.fcmp intrinsics Adds FP CCs opcodes/selection logic, including src mods selection Depends on D136591, D136448 Resolves #58326 (https://github.com/llvm/llvm-project/issues/58326) Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D136592	2022-11-22 14:18:58 +00:00
Pierre van Houtryve	a751676f98	[AMDGPU][GISel] Add llvm.amdgcn.icmp selection Add missing logic to select i16 variants and enable GISel testing. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D136448	2022-11-22 08:26:50 +00:00
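
Note on commit bcfdaa96f5: its message names the pattern `min(max(x, y), max(min(x, y), z))` as one that the med3 combines now handle. The sketch below is not code from the patch. It is a small Python check, under the assumption that med3 means "median of three", that this min/max expression does yield the median for integer inputs. The function names are hypothetical, and floating-point NaN behaviour (the subject of the `nnan`/`fast` handling in 916ae0a060) is deliberately not modelled.

```python
from itertools import product


def med3_reference(x, y, z):
    """Median of three values, computed by sorting (reference definition)."""
    return sorted((x, y, z))[1]


def med3_via_min_max(x, y, z):
    """The min/max expression quoted in the commit message for bcfdaa96f5."""
    return min(max(x, y), max(min(x, y), z))


def check_identity(values):
    """Return True if both forms agree on every triple drawn from `values`."""
    return all(
        med3_via_min_max(x, y, z) == med3_reference(x, y, z)
        for x, y, z in product(values, repeat=3)
    )


if __name__ == "__main__":
    # Exhaustive check over a small integer range, including ties.
    print(check_identity(range(-3, 4)))
```

The check is exhaustive only over the small range given, so it illustrates the identity rather than proving it. The case analysis behind it is short: with a = min(x, y) and b = max(x, y), the expression returns a when z <= a, z when a <= z <= b, and b when z >= b, which is the median in each case.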

1776 Commits