llvm-project

Author	SHA1	Message	Date
Joe Nash	b67ea3d0c9	[AMDGPU] Allow no-modifier operands in cvtDPP NFC, since no instructions have their AsmMatchConverter changed, but prepares for that to happen. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103046 Change-Id: I6afefad899076de7b9a412374d09b95b29e012fa	2021-05-25 10:58:06 -04:00
Joe Nash	67c3707b31	[AMDGPU] More accurate names for dpp operand types NFC. Renames the variable in the dpp input operand generators from DstRC to OldRC, because that is what it actually sets. Also documents the importance of setting HasModifiers = 0 in the dpp8 asm string. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D103047 Change-Id: Ice69ae38f644de7f228a75ca47c43e88b1f7d9e1	2021-05-25 10:35:25 -04:00
Christudasan Devadasan	e3b8e6d482	[AMDGPU] Remove dead declaration (NFC).	2021-05-25 16:04:04 +05:30
Christudasan Devadasan	90d784053f	AMDGPU/GlobalISel: Legalize G_[SU]DIVREM instructions Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D100726	2021-05-25 10:51:07 +05:30
Stanislav Mekhanoshin	4902885863	[AMDGPU] Request module used variables from LDS lowering as internal I do not see any practical difference but technically used.* variables are internal and a call to getGlobalVariable misses true as a second argument. NFC as far as I can tell. Differential Revision: https://reviews.llvm.org/D102884	2021-05-20 20:55:47 -07:00
Stanislav Mekhanoshin	748db5bfac	[AMDGPU] Fix module LDS selection Accesses to global module LDS variable start from null, but kernel also thinks its variables start address is null. Fixed by not using a null as an address. Differential Revision: https://reviews.llvm.org/D102882	2021-05-20 15:59:01 -07:00
Jay Foad	092a3ce569	[AMDGPU] Fix typo in comment	2021-05-18 10:15:49 +01:00
Stanislav Mekhanoshin	45764efb69	[AMDGPU] Do not check denorm for LDS FP atomic with unsafe flag This is already how it is handled for global and flat atomics. Differential Revision: https://reviews.llvm.org/D102366	2021-05-17 16:53:09 -07:00
Stanislav Mekhanoshin	f4c0fdc6c9	[AMDGPU] Set unused dst_sel to '?' in the encoding This is to allow disasm with any bits in the unused fields. Differential Revision: https://reviews.llvm.org/D102526	2021-05-17 08:38:52 -07:00
Jay Foad	472f856714	[AMDGPU] Tweak VOP3_INTERP16 profile Set the output register class based on the output type, instead of hard-coding VGPR_32. I think this is more correct. It doesn't make any difference at the moment because we use the same class for 16- and 32-bit results, but it might in future if we make more use of true 16-bit register classes. Differential Revision: https://reviews.llvm.org/D102622	2021-05-17 15:28:00 +01:00
Brendon Cahoon	3f7b7e7393	[AMDGPU] Update SCC defs to VCC when uses are changed to VCC The FixSGPRCopies pass converts instructions to VALU when removing illegal VGPR to SGPR copies. Instructions that use SCC are changed to use VCC instead. When that happens, the pass must also change instructions that define SCC to define VCC. The pass was not changing the SCC definition when an ADDC is converted due to an input that is a VGPR to SGPR copy. But, the initial ADD instruction, which defines SCC, is not converted. This causes a compilation failure due to a use of an undefined physical register. This patch adds code that inserts the SCC definition in the MoveToVALU worklist when a SCC use is converted to a VCC use. Differential Revision: https://reviews.llvm.org/D102111	2021-05-14 18:05:05 -04:00
Stanislav Mekhanoshin	6fb02596a2	[AMDGPU] Add support for architected flat scratch Add support for the readonly flat Scratch register initialized by the SPI. Differential Revision: https://reviews.llvm.org/D102432	2021-05-14 10:53:48 -07:00
Matt Arsenault	c7cff08f79	AMDGPU: Fix assert when rewriting saddr d16 loads moveOperands does not handle moving tied operands since it would generally have to fix up the tied operand references. Avoid the assert by untying and retying after the modification. These in place modifications really aren't manageable.	2021-05-14 13:24:19 -04:00
Jay Foad	7f81c5a5ba	[AMDGPU] getMemOperandsWithOffset: add vaddr operand for stack access BUF instructions A consequence is that checkInstOffsetsDoNotOverlap can now distinguish sp+offset from fp+offset, so it knows that it shouldn't try to work out whether the accesses overlap just by comparing the offsets. For example in these two instructions: MIR: BUFFER_STORE_DWORD_OFFSET %0:vgpr_32(s32), $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into stack + 4, addrspace 5) %4:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0.alloca, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4 from `i8 addrspace(5)* undef`, addrspace 5) ISA: buffer_store_dword v0, off, s[0:3], s32 offset:4 buffer_load_dword v0, off, s[0:3], s34 Differential Revision: https://reviews.llvm.org/D73957	2021-05-14 10:10:43 +01:00
David Stuttard	31b62aa162	[AMDGPU] Fix codegen of image intrinsics for g16 and a16 For gfx10 gradient (g16) and address (a16) can be independent. Previous implementation assumed that a16 implied g16. There are some other changes that fix the verification (as well as asm/disasm) that are required for the included test to pass - the XFAIL will be removed in those changes. This also includes required fixes for GlobalISel Differential Revision: https://reviews.llvm.org/D102066 Change-Id: I7d171cc90994de05f41669b66a6d0ffa2ed05d09	2021-05-14 09:28:15 +01:00
David Stuttard	72d570ca08	[AMDGPU][AsmParser/Disassembler] Correct A16 and G16 handling A16 support for image instructions assembly/disassembly (gfx10) was missing Also refactor MIMG op addr size calcs to common function We'd got 3 places where the same operation was being done. One test is now marked XFAIL until a related codegen patch is in place Differential Revision: https://reviews.llvm.org/D102231 Change-Id: I7e86e730ef8c71901457855cba570581f4f576bb	2021-05-14 09:25:44 +01:00
Carl Ritson	9cf6ff7aff	[AMDGPU] Do not clause NSA instructions To ensure correct behaviour NSA instructions should not be claused. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D102211	2021-05-14 12:54:56 +09:00
Matt Arsenault	85394d9ed7	AMDGPU/GlobalISel: Don't hardcode stack alignment in assert message	2021-05-13 19:00:13 -04:00
Matt Arsenault	6a70874d27	AMDGPU/GlobalISel: Implement tail calls Or at least the sibling call cases which the DAG already handles.	2021-05-13 18:57:42 -04:00
Aakanksha Patil	464e4dc50f	[AMDGPU] Add gfx1034 target Differential Revision: https://reviews.llvm.org/D102306	2021-05-13 14:25:18 -04:00
Stanislav Mekhanoshin	8f98356bb5	[AMDGPU] Only allow global fp atomics with unsafe option Previously we allowed FP atomics without the -amdgpu-unsafe-fp-atomics option if the scope was less than system. This is just as unsafe if we have UC memory. This change only allows global and flat FP atomics with the unsafe option. Consequently that makes a check for denorm mode redundant since we skip it with the unsafe option and do not have a way to produce these instructions without it anyway. Differential Revision: https://reviews.llvm.org/D102347	2021-05-13 08:52:20 -07:00
Stanislav Mekhanoshin	bd00106d1e	[AMDGPU] Refactor shouldExpandAtomicRMWInIR(). NFC. This is logic simplification for better readability. Differential Revision: https://reviews.llvm.org/D102371	2021-05-12 16:39:03 -07:00
Baptiste Saleil	5885f1a4cb	[AMDGPU] Disable the SIFormMemoryClauses pass at -O1 This patch disables the SIFormMemoryClauses pass at -O1. This pass has a significant impact on compilation time, so we only want it to be enabled starting from -O2. Differential Revision: https://reviews.llvm.org/D101939	2021-05-12 11:51:59 -04:00
Craig Topper	44e0e91db0	[ValueTypes] Rename MVT::getVectorNumElements() to MVT::getVectorMinNumElements(). Fix some misuses of getVectorNumElements() getVectorNumElements() returns a value for scalable vectors without any warning so it is effectively getVectorMinNumElements(). By renaming it and making getVectorNumElements() forward to it, we can insert a check for scalable vectors into getVectorNumElements() similar to EVT. I didn't do that in this patch because there are still more fixes needed, but I was able to temporarily do it and passed the RISCV lit tests with these changes. The changes to isPow2VectorType and getPow2VectorType are copied from EVT. The change to TypeInfer::EnforceSameNumElts reduces the size of AArch64's isel table. We're now considering SameNumElts to require the scalable property to match which removes some unneeded type checks. This was motivated by the bug I fixed yesterday in 80b9510806cf11c57f2dd87191d3989fc45defa8 Reviewed By: frasercrmck, sdesmalen Differential Revision: https://reviews.llvm.org/D102262	2021-05-12 07:46:45 -07:00
Julien Pagès	46adccc5cc	[AMDGPU] Improve Codegen for build_vector Improve the code generation of build_vector. Use the v_pack_b32_f16 instruction instead of v_and_b32 + v_lshl_or_b32 Differential Revision: https://reviews.llvm.org/D98081 Patch by Julien Pagès!	2021-05-12 14:17:44 +01:00
Piotr Sobczak	a4db7025a9	[AMDGPU] Remove assert Remove assert introduced in D101177, following post-commit feedback.	2021-05-12 14:52:37 +02:00
Piotr Sobczak	68137ef568	[AMDGPU] Skip invariant loads when avoiding WAR conflicts No need to handle invariant loads when avoiding WAR conflicts, as there cannot be a vector store to the same memory location. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D101177	2021-05-12 10:57:05 +02:00
Matt Arsenault	cc79aaced0	AMDGPU: Fix SILoadStoreOptimizer for gfx90a This was hardcoding the register class to use for the newly created pointer registers, violating the aligned VGPR requirement.	2021-05-11 21:26:43 -04:00
Matt Arsenault	a15ed701ab	AMDGPU: Fix assert on constant load from addrspacecasted pointer This was trying to create a bitcast between different address spaces.	2021-05-11 20:12:20 -04:00
Matt Arsenault	24e2e5df0e	GlobalISel: Split ValueHandler into assignment and emission classes Currently the ValueHandler handles both selecting the type and location for arguments, as well as inserting instructions needed to handle them. Split this so that the determination of the argument handling is independent of the function state. Currently the checks for tail call compatibility do not follow the full assignment logic, so it misses cases where arguments require nontrivial legalization. This should help avoid targets ending up in a buggy state where the argument evaluation may change in different contexts.	2021-05-11 19:50:12 -04:00
Austin Kerbow	4433f4601e	[AMDGPU] Fix extra waitcnt being added with BUFFER_INVL2 The waitcnt pass would increment the number of vmem events for some buffer invalidates that were not handled by the pass. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D102252	2021-05-11 13:17:33 -07:00
Aakanksha Patil	c58912eca7	Fix typo "Execpt" in comments Differential Revision: https://reviews.llvm.org/D101858	2021-05-11 10:47:01 -04:00
Piotr Sobczak	09fe84abb4	[AMDGPU] Move code sinking before structurizer Moving code sinking pass before structurizer creates more sinking opportunities. The extra flow edges introduced by the structurizer can have adverse effects on sinking, because the sinking pass prefers moving instructions to blocks with unique predecessors and the structurizer destroys that property in some cases. A notable example is moving high-latency image instructions across kills. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D101115	2021-05-11 14:07:23 +02:00
Petar Avramovic	f6985a197e	AMDGPU/GlobalISel: Use destination register bank in applyMappingLoad Large loads on target that does not useFlatForGlobal have to be split in regbankselect. This did not happen in case when destination had vgpr bank and address had sgpr bank. Instead of checking if address bank is sgpr check bank of the destination. Differential Revision: https://reviews.llvm.org/D101992	2021-05-10 10:18:30 +02:00
Arthur Eubanks	34a8a437bf	[NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose Printing pass manager invocations is fairly verbose and not super useful. This allows us to remove DebugLogging from pass managers and PassBuilder since all logging (aside from analysis managers) goes through instrumentation now. This has the downside of never being able to print the top level pass manager via instrumentation, but that seems like a minor downside. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D101797	2021-05-07 21:51:47 -07:00
Sebastian Neubauer	13c0316239	[AMDGPU] Restrict immediate scratch offsets gfx9 does not work with negative offsets, gfx10 works only with aligned negative offsets, but not with unaligned negative offsets. This is slightly more conservative than needed, gfx9 does support negative offsets when a VGPR address is used and gfx10 supports negative, unaligned offsets when an SGPR address is used, but we do not make use of that with this patch. Differential Revision: https://reviews.llvm.org/D101292	2021-05-07 14:51:32 +02:00
David Stuttard	606d4e8061	AMDGPU: Correct const_index_stride for wave 32 for PAL ABI Retrying after revert and fix (removed implicit def flag from operand). Now passes with expensive_checks enabled. Since there is a single scratch resource descriptor for all shaders, if there is a wave32 and a wave64 shader (for instance for VsFs pairs) then the const_index_stride will be incorrect for wave32 shaders. Differential Revision: https://reviews.llvm.org/D101830 Change-Id: Ie3b8b2921237968caca91527dd0c97b1b0cc0360	2021-05-07 13:42:57 +01:00
Simon Pilgrim	280aa3415e	[DAG] Add a generic expansion for SHIFT_PARTS opcodes using funnel shifts Based off a discussion on D89281 - where the AARCH64 implementations were being replaced to use funnel shifts. Any target that has efficient funnel shift lowering can handle the shift parts expansion using the same expansion, avoiding a lot of duplication. I've generalized the X86 implementation and moved it to TargetLowering - so far I've found that AARCH64 and AMDGPU benefit, but many other targets (ARM, PowerPC + RISCV in particular) could easily use this with a few minor improvements to their funnel shift lowering (or the folding of their target ops that funnel shifts lower to). NOTE: I'm trying to avoid adding full SHIFT_PARTS legalizer handling as I think it might actually be possible to remove these opcodes in the medium-term and use funnel shift / libcall expansion directly. Differential Revision: https://reviews.llvm.org/D101987	2021-05-07 13:12:30 +01:00
David Stuttard	793b4b2603	Revert "AMDGPU: Correct const_index_stride for wave 32 for PAL ABI" This reverts commit 442de0c1adf36bfddb5fb66b442bba8999fa733b.	2021-05-07 12:49:17 +01:00
David Stuttard	442de0c1ad	AMDGPU: Correct const_index_stride for wave 32 for PAL ABI Since there is a single scratch resource descriptor for all shaders, if there is a wave32 and a wave64 shader (for instance for VsFs pairs) then the const_index_stride will be incorrect for wave32 shaders. Differential Revision: https://reviews.llvm.org/D101830 Change-Id: Id8de5566b0d1a07a814e2e7db016df9d20bf6d2c	2021-05-07 12:19:49 +01:00
Sebastian Neubauer	98e5ede604	[AMDGPU] Serialize MFInfo::ScavengeFI Serialize ScavengeFI from SIMachineFunctionInfo into yaml. ScavengeFI is not used outside of the PrologEpilogInserter, so this shouldn't change anything. Differential Revision: https://reviews.llvm.org/D101367	2021-05-07 11:15:25 +02:00
Stanislav Mekhanoshin	c714d03785	[AMDGPU] Expose __builtin_amdgcn_perm for v_perm_b32 Differential Revision: https://reviews.llvm.org/D102022	2021-05-06 16:17:33 -07:00
Stanislav Mekhanoshin	28f1d018b1	[AMDGPU] Fix 64 bit DPP validation AMDGPUAsmParser::isSupportedDPPCtrl() was failing to correctly find a DPP register operand; regardless of the position, it is always src0. Moved this check into a new validateDPP() method where we have the full instruction already. In particular it was failing to reject this case: v_cvt_u32_f64 v5, v[0:1] quad_perm:[0,2,1,1] row_mask:0xf bank_mask:0xf Essentially it was broken for any case where the sizes of dst and src0 differ. It also improves the diagnostics with a proper error message. The check in the InstPrinter also drops verification of the dst register as it does not have anything to do with the dpp operand. Differential Revision: https://reviews.llvm.org/D101930	2021-05-06 08:40:26 -07:00
Austin Kerbow	172d746e16	[AMDGPU][NFC] Fix typos in SIFormMemoryClauses description NFC.	2021-05-06 07:47:39 -07:00
Jay Foad	9e026273b0	[AMDGPU] SIInsertHardClauses: move more stuff into the class. NFC.	2021-05-06 14:47:54 +01:00
Carl Ritson	67cfefebbb	[AMDGPU] Fix WQM failure with single block inactive demote Instruction test for inactive kill/demote needs to be based on actual opcode not whether instruction would be lowered to demote. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D101966	2021-05-06 21:02:26 +09:00
Jay Foad	7c706af03b	[AMDGPU] SIFoldOperands: clean up tryConstantFoldOp First clean up the strange API of tryConstantFoldOp where it took an immediate operand value, but no indication of which operand it was the value for. Second clean up the loop that calls tryConstantFoldOp so that it does not have to restart from the beginning every time it folds an instruction. This is NFCI but there are some minor changes caused by the order in which things are folded. Differential Revision: https://reviews.llvm.org/D100031	2021-05-06 09:55:22 +01:00
Stanislav Mekhanoshin	ab90ae6f47	[AMDGPU] Switch AnnotateUniformValues to MemorySSA This should speed up compilation and also remove threshold limitations used by memory dependency analysis. It also seems to fix the bug in coalescer_remat.ll where an SMRD load was used in the presence of a potentially clobbering store. Fixes: SWDEV-272132 Differential Revision: https://reviews.llvm.org/D101962	2021-05-05 18:34:41 -07:00
Austin Kerbow	6617a5a5ea	[AMDGPU] Move insertion of function entry waitcnt later This allows tracking these as preexisting waitcnt. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D101380	2021-05-05 17:58:38 -07:00
Austin Kerbow	f5199d7ae0	[AMDGPU] Revise handling of preexisting waitcnt Preexisting waitcnt may not update the scoreboard if the instruction being examined needed to wait on fewer counters than what was encoded in the old waitcnt instruction. Fixing this results in the elimination of some redundant waitcnt. These changes also enable combining consecutive waitcnt into a single S_WAITCNT or S_WAITCNT_VSCNT instruction. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100281	2021-05-05 17:21:33 -07:00


6209 Commits