llvm-project

Author	SHA1	Message	Date
Austin Kerbow	62bcfcb5a5	[AMDGPU] Add llvm.amdgcn.s.setprio intrinsic Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D120976	2022-03-12 22:15:42 -08:00
Stanislav Mekhanoshin	31f215ab0c	[AMDGPU] Support v_mov_b64 in dpp combine Differential Revision: https://reviews.llvm.org/D121411	2022-03-11 11:37:32 -08:00
Nikita Popov	3ed643ea76	[AMDGPUPromoteAlloca] Make compatible with opaque pointers This mainly changes the handling of bitcasts to not check the types being casted from/to -- we should only care about the actual load/store types. The GEP handling is also changed to not care about types, and just make sure that we get an offset corresponding to a vector element. This was a bit of a struggle for me, because this code seems to be pretty sensitive to small changes. The end result seems to produce strictly better results for the existing test coverage though, because we can now deal with more situations involving bitcasts. Differential Revision: https://reviews.llvm.org/D121371	2022-03-11 09:20:51 +01:00
Stanislav Mekhanoshin	c7f25b6fd4	[AMDGPU] Updated some tests to run on gfx940. NFC.	2022-03-10 12:34:24 -08:00
alex-t	d159b4444c	[AMDGPU] Enable divergence predicates for negative inline constant subtraction We have a pattern that undoes the sub x, c -> add x, -c canonicalization since c is more likely an inline immediate than -c. This patch enables it to select scalar or vector subtraction by the input node divergence. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D121360	2022-03-10 15:03:22 +03:00
Nikita Popov	eb4037ff42	[AMDGPU] Fix regenerated test checks (NFC) I used the wrong build to generate the checks, sorry :(	2022-03-10 11:56:17 +01:00
Nikita Popov	611da6b582	[AMDGPU] Regenerate test checks (NFC)	2022-03-10 11:53:45 +01:00
Nikita Popov	eaac3484ab	[AMDGPU] Regenerate test checks (NFC) Also rename variables to avoid file check clash.	2022-03-10 11:32:45 +01:00
Carl Ritson	3cb9af1be2	[MachineSink] Pre-commit test for D121277. NFC.	2022-03-10 11:33:06 +09:00
Xiang1 Zhang	c31014322c	TLS loads opimization (hoist) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120000	2022-03-10 09:29:06 +08:00
Stanislav Mekhanoshin	0be6fd44f3	[SDAG] Use MMO flags in MemSDNode folding SDNodes with different target flags may now be folded together rightfully resulting in the assertion in the refineAlignment. Folding nodes with different target flags may result in the wrong load instructions produced at least on the AMDGPU. Fixes: SWDEV-326805 Differential Revision: https://reviews.llvm.org/D121335	2022-03-09 14:25:22 -08:00
Changpeng Fang	0f20a35b9e	AMDGPU: Set up User SGPRs for queue_ptr only when necessary Summary: In general, we need queue_ptr for aperture bases and trap handling, and user SGPRs have to be set up to hold queue_ptr. In the current implementation, user SGPRs are set up unnecessarily for some cases. If the target has aperture registers, queue_ptr is not needed to reference aperture bases. For trap handling, if the target supports getDoorbellID, queue_ptr is also not necessary. Further, code object version 5 introduces a new kernel ABI which passes queue_ptr as an implicit kernel argument, so user SGPRs are no longer necessary for queue_ptr. Based on the trap handling document: https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table, llvm.debugtrap does not need queue_ptr, so we remove queue_ptr support for llvm.debugtrap in the backend. Reviewers: sameerds, arsenm Fixes: SWDEV-307189 Differential Revision: https://reviews.llvm.org/D119762	2022-03-09 10:14:05 -08:00
Stanislav Mekhanoshin	33fb23f728	[AMDGPU] Merge flat with global in the SILoadStoreOptimizer Flat can be merged with flat global since address cast is a no-op. A combined memory operation needs to be promoted to flat. Differential Revision: https://reviews.llvm.org/D120431	2022-03-09 10:04:37 -08:00
Vang Thao	28322c2514	[AMDGPU] Add scheduler pass to rematerialize trivial defs Add a new pass in the pre-ra AMDGPU scheduler to check if sinking trivially rematerializable defs that have only one use outside of the defining block will increase occupancy. If we can determine that occupancy can be increased, then rematerialize only the minimum amount of defs required to increase occupancy. Also re-schedule all regions that had occupancy matching the previous min occupancy using the new occupancy. This is based off of the discussion in https://reviews.llvm.org/D117562. The logic to determine the defs we should collect and determining if sinking would be beneficial is mostly the same. The main differences are that we are no longer limiting it to immediate defs and the def and use do not have to be part of a loop. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D119475	2022-03-09 09:34:33 -08:00
Jay Foad	28f67aed9d	[AMDGPU] Fix some confusing check prefixes. NFC. Tahiti is SI/GFX6. Kaveri and Hawaii are CI/GFX7. Fiji is VI/GFX8.	2022-03-09 17:05:49 +00:00
Venkata Ramanaiah Nalamothu	04fff547e2	[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve their value when we have calls so that it gets saved and restored around the calls. But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvement of understanding the control flow. This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering. And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`. As an added benefit, this patch simplifies overall return instruction handling. Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code. Reviewed By: arsenm, ronlieb Differential Revision: https://reviews.llvm.org/D114652	2022-03-09 12:18:02 +05:30
Arthur Eubanks	b81d5baa0f	[test] Use new PM for -aa-eval tests	2022-03-08 14:15:53 -08:00
Stanislav Mekhanoshin	9eabea3968	[AMDGPU] Set noclobber metadata on loads instead of cast to constant A load via pointer cast to constant will return true from pointsToConstantMemory which is not necessarily so. Fixes: SWDEV-326463 Differential Revision: https://reviews.llvm.org/D121172	2022-03-07 23:13:02 -08:00
Christudasan Devadasan	0d849b8249	AMDGPU: Skip folding REG_SEQUENCE if found unknown regclasses for its users Use TII::getRegClass to return a valid regclass or a nullptr if the RC is unknown for a given OpIdx. This fixes a potential crash occurred while getting the RC from a variadic instruction. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D120813	2022-03-08 10:11:57 +05:30
Stanislav Mekhanoshin	932f628121	[AMDGPU] new gfx940 fp atomics Differential Revision: https://reviews.llvm.org/D121028	2022-03-07 12:32:02 -08:00
Stanislav Mekhanoshin	e7b362d75d	[AMDGPU] Add v_mov_b64 gfx940 opcode Differential Revision: https://reviews.llvm.org/D121023	2022-03-07 12:07:12 -08:00
Stanislav Mekhanoshin	2c830c8fab	[AMDGPU] gfx940: support V_FMAMK_F32 and V_FMAAK_F32 Differential Revision: https://reviews.llvm.org/D120769	2022-03-07 11:31:01 -08:00
Venkata Ramanaiah Nalamothu	e1069c1288	[AMDGPU] Ensure return address is save/restored if clobbered or when function has calls This test is to make sure the return address registers, if clobbered in the function or when the function has calls, are save/restored irrespective of whether the IPRA is enabled/disabled. This test was found not to save/restore the return address registers, when clobbered in the function, with the corresponding downstream changes of D114652. The test could not be reduced further as the register allocator needs enough register pressure so that it allocates the return address registers as well. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D120922	2022-03-07 22:01:32 +05:30
Austin Kerbow	8d0c34fd4f	[AMDGPU] Omit unnecessary waitcnt before barriers It is not necessary to wait for all outstanding memory operations before barriers on hardware that can back off of the barrier in the event of an exception when traps are enabled. Add a new subtarget feature which tracks which HW has this ability. Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D120544	2022-03-07 08:23:53 -08:00
Jay Foad	e8e301ed92	[AMDGPU] Extra test cases in hard-clauses.mir Add some cases where different kinds of instruction might be combined in the same hard clause.	2022-03-04 12:46:59 +00:00
Jay Foad	b79840a472	[AMDGPU] Regenerate checks in hard-clauses.mir	2022-03-04 12:46:59 +00:00
Aakanksha	840695814a	[AMDGPU] Add gfx1036 target Differential Revision: https://reviews.llvm.org/D120846	2022-03-02 23:26:38 +00:00
Stanislav Mekhanoshin	ad786f5a5c	[AMDGPU] Fix 3 tests with expensive checks. NFC. Image instructions are now not available for all targets anymore, so a generic target cannot use it.	2022-03-02 14:56:48 -08:00
Stanislav Mekhanoshin	2e2e64df4a	[AMDGPU] Add gfx940 target This is target definition only. Differential Revision: https://reviews.llvm.org/D120688	2022-03-02 13:54:48 -08:00
Jay Foad	5ddfedc956	[AMDGPU] Fix deleting of move-immediate instructions after folding SIInstrInfo::FoldImmediate tried to delete move-immediate instructions after folding them into their only use. This did not work because it was checking hasOneNonDBGUse after doing the fold, at which point there should be no uses. This seems to have no effect on codegen, it just means less stuff for DCE to clean up later. Differential Revision: https://reviews.llvm.org/D120815	2022-03-02 16:11:16 +00:00
Jay Foad	8bed52c9eb	[AMDGPU] Make more use of madmk/fmamk instructions In convertToThreeAddress handle VOP2 mac/fmac instructions with a literal src0 operand, since these are prime candidates for converting to madmk/fmamk. Previously this would only happen if src0 (or src1) was a register defined by a move-immediate instruction, but in many cases these operands have already been folded because SIFoldOperands runs before TwoAddressInstructionPass. Differential Revision: https://reviews.llvm.org/D120736	2022-03-02 10:22:10 +00:00
Xiang1 Zhang	65588a0776	Revert "TLS loads opimization (hoist)" Revert for more reviews This reverts commit 30e612ebdfb0f243eb63d93487790a53c26ae873.	2022-03-02 14:10:11 +08:00
Abinav Puthan Purayil	8b4ab01c38	[AMDGPU] Select no-return atomic ops in BUFInstructions.td This change adds the selection of no-return buffer_* instructions in tblgen. The motivation for this is to get the no-return atomic isel working without relying on post-isel hooks so that GlobalISel can start selecting them (once GlobalISelEmitter allows no return atomic patterns like how DAGISel does). This change handles the selection of no-return mubuf_atomic_cmpswap in tblgen without changing the extract_subreg generation for the return variant. This handling was done by the post-isel hook. Differential Revision: https://reviews.llvm.org/D120538	2022-03-02 08:25:28 +05:30
Xiang1 Zhang	30e612ebdf	TLS loads opimization (hoist) Reviewed By: Wang Pheobe, Topper Craig Differential Revision: https://reviews.llvm.org/D120000	2022-03-02 10:37:24 +08:00
Jay Foad	289339140e	[AMDGPU] Handle legacy multiply-accumulate opcodes in convertToThreeAddress Handle V_MAC_LEGACY_F32 and V_FMAC_LEGACY_F32 in convertToThreeAddress, to avoid the need for an extra mov instruction in some cases. Differential Revision: https://reviews.llvm.org/D120704	2022-03-01 16:58:00 +00:00
Jay Foad	f9c545e1e2	[AMDGPU] Fix test_fmaak_otherimm_src0_f64 test Judging by the name, and comparing with the f32 version, this was supposed to be testing that FMAC with a non-inlinable constant operand did not get converted to FMA.	2022-03-01 16:35:19 +00:00
Joe Nash	fa55ac6c27	[UpdateTestChecks][AMDGPU] Run test update script NFC. Run the mir test auto-update script. These tests haven't been updated since the script changed from inserting CHECK to CHECK-NEXT.	2022-03-01 10:45:03 -05:00
Jay Foad	68895098d1	[AMDGPU] Preserve src2_modifiers in convertToThreeAddress Found by code inspection. I don't think it makes a difference with current codegen, because if any source modifiers were present we would have selected mad/fma instead of mac/fmac in the first place. Differential Revision: https://reviews.llvm.org/D120709	2022-03-01 14:48:25 +00:00
Jay Foad	3a32a445ae	[AMDGPU] Precommit tests for D120709	2022-03-01 11:15:33 +00:00
Stanislav Mekhanoshin	517171ce20	[AMDGPU] Extend SILoadStoreOptimizer to handle flat load/stores TODO: merge flat with global promoting to flat. Differential Revision: https://reviews.llvm.org/D120351	2022-02-28 11:27:30 -08:00
Changpeng Fang	ca62b1db9f	[AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg Summary: Emit metadata for hidden_heap_v1 kernarg Reviewers: sameerds, b-sumner Fixes: SWDEV-307188 Differential Revision: https://reviews.llvm.org/D119027	2022-02-25 10:45:35 -08:00
Carl Ritson	565af157ef	[AMDGPU] Extend pre-emit peephole to redundantly masked VCC Extend pre-emit peephole for S_CBRANCH_VCC[N]Z to eliminate redundant S_AND operations against EXEC for V_CMP results in VCC. These occur after register allocation when VCC has been selected as the comparison destination. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D120202	2022-02-25 10:18:31 +09:00
Jay Foad	05d79e3562	[AMDGPU] Divergence-driven instruction selection for bitreverse Differential Revision: https://reviews.llvm.org/D119702	2022-02-24 20:21:59 +00:00
Stanislav Mekhanoshin	3279e44063	[AMDGPU] Extend SILoadStoreOptimizer to handle global stores TODO: merge flat load/stores. TODO: merge flat with global promoting to flat. Differential Revision: https://reviews.llvm.org/D120346	2022-02-24 11:09:51 -08:00
Stanislav Mekhanoshin	cefa1c5ca9	[AMDGPU] Fix combined MMO in load-store merge Loads and stores can be out of order in the SILoadStoreOptimizer. When combining MachineMemOperands of two instructions operands are sent in the IR order into the combineKnownAdjacentMMOs. At the moment it picks the first operand and just replaces its offset and size. This essentially loses alignment information and may generally result in an incorrect base pointer to be used. Use a base pointer in memory addresses order instead and only adjust size. Differential Revision: https://reviews.llvm.org/D120370	2022-02-24 10:47:57 -08:00
Jay Foad	719bac55df	[MIRParser] Diagnose too large align values in MachineMemOperands When parsing MachineMemOperands, MIRParser treated the "align" keyword the same as "basealign". Really "basealign" should specify the alignment of the MachinePointerInfo base value, and "align" should specify the alignment of that base value plus the offset. This worked OK when the specified alignment was no larger than the alignment of the offset, but in cases like this it just caused confusion: STW killed %18, 4, %stack.1.ap2.i.i :: (store (s32) into %stack.1.ap2.i.i + 4, align 8) MIRPrinter would never have printed this, with an offset of 4 but an align of 8, so it must have been written by hand. MIRParser would interpret "align 8" as "basealign 8", but I think it is better to give an error and force the user to write "basealign 8" if that is what they really meant. Differential Revision: https://reviews.llvm.org/D120400 Change-Id: I7eeeefc55c2df3554ba8d89f8809a2f45ada32d8	2022-02-24 15:32:08 +00:00
Jay Foad	aa1e5fbc9b	[AMDGPU] Fix permissions on test files	2022-02-24 12:17:54 +00:00
Nikita Popov	a266af7211	[InstCombine] Canonicalize SPF to min/max intrinsics Now that integer min/max intrinsics have good support in both InstCombine and other passes, start canonicalizing SPF min/max to intrinsic min/max. Once this sticks, we can stop matching SPF min/max in various places, and can remove hacks we have for preventing infinite loops and breaking of SPF canonicalization. Differential Revision: https://reviews.llvm.org/D98152	2022-02-24 09:01:20 +01:00
Jay Foad	e66b1b7385	[AMDGPU] Split fp min/max atomics test. NFC. Split out f32 buffer, f64 buffer and image atomics. This just makes it easier to test subtargets that only have some of these instructions. Differential Revision: https://reviews.llvm.org/D120407	2022-02-23 15:00:49 +00:00
Stanislav Mekhanoshin	939d62c185	[AMDGPU] Pre-commit load/store combine tests. NFC.	2022-02-22 16:28:44 -08:00

5335 Commits