llvm-project

Author	SHA1	Message	Date
Petar Avramovic	cbc378ecb8	GlobalISel: Artifact combine merge-like and unmerges into merge-like Recognize when sub-vectors have been split into elements which are then used to build a larger vector. This happens when instructions have different vector sizes available. For example, a few arithmetic instructions are required to process all elements of a larger vector that can be stored using one instruction. Differential Revision: https://reviews.llvm.org/D109242	2022-10-24 13:33:06 +02:00
Petar Avramovic	e6c778f861	GlobalISel: Artifact combine merge-like and unmerge into unmerge Recognize when the source could have been unmerged into pieces of DstTy without having to split the source into smaller elements and then merge those small elements into DstTy pieces. This happens when a vector was meant to be split into sub-vectors but there was a leftover. At this point the artifact combiner has already dealt with the leftover and we can continue to use sub-vectors. Differential Revision: https://reviews.llvm.org/D109241	2022-10-24 13:33:05 +02:00
Petar Avramovic	f1aa598046	GlobalISel: Artifact combine merge-like and unmerge into copy Recognize copy that is represented as split of a source register to elements that were reassembled to another register with the same type. Differential Revision: https://reviews.llvm.org/D109240	2022-10-24 13:33:05 +02:00
Petar Avramovic	51b98db487	GlobalISel: Precommit for artifact combine patches Differential Revision: https://reviews.llvm.org/D117655	2022-10-24 13:33:05 +02:00
Pierre van Houtryve	ed5fe7f3a1	[AMDGPU][GISel] Re-enable some working tests These tests had been commented out but seem to not be crashing. Not sure if codegen is perfect in each of them, but even if it's not I think it's better to put a TODO to fix codegen than remove the test outright, unless codegen is plain wrong (then I'd still rather XFAIL rather than hide it) Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D136341	2022-10-21 06:39:40 +00:00
Peter Rong	c2e7c9cb33	[CodeGen] Using ZExt for extractelement indices. In https://github.com/llvm/llvm-project/issues/57452, we found that IRTranslator is translating `i1 true` into `i32 -1`. This is because IRTranslator uses SExt for indices. In this fix, we change the expected behavior of extractelement's index, moving from SExt to ZExt. This change covers the documentation, SelectionDAG and IRTranslator. We also included a test for AMDGPU and updated tests for AArch64, Mips, PowerPC, RISCV, VE, WebAssembly and X86. This patch fixes issue #57452. Differential Revision: https://reviews.llvm.org/D132978	2022-10-15 15:45:35 -07:00
Jessica Paquette	0f1a51e173	[GlobalISel] Allow vectors in redundant or + add combines We support KnownBits for vectors, so we can enable these. https://godbolt.org/z/r9a9W4Gj1 Differential Revision: https://reviews.llvm.org/D135719	2022-10-11 15:31:09 -07:00
Pierre van Houtryve	4d815bfae0	[GISel] Add redundant bitcast folding combine Simply folds away bitcasts that cancel each other. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D135146	2022-10-11 15:03:08 +00:00
Weining Lu	42b70793a1	Reland "[Clang][LoongArch] Add inline asm support for constraints k/m/ZB/ZC" Reference: https://gcc.gnu.org/onlinedocs/gccint/Machine-Constraints.html k: A memory operand whose address is formed by a base register and (optionally scaled) index register. m: A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode as st.w and ld.w. ZB: An address that is held in a general-purpose register. The offset is zero. ZC: A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode as ll.w and sc.w. Note: The INLINEASM SDNode flags in the tests below are updated because the newly introduced enum `Constraint_k` is added before `Constraint_m`: llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll llvm/test/CodeGen/X86/callbr-asm-kill.mir This patch passes `ninja check-all` on an X86 machine with all official targets and the LoongArch target enabled. Differential Revision: https://reviews.llvm.org/D134638	2022-10-11 19:51:48 +08:00
Pierre van Houtryve	36c3833783	[GISel] Add Trunc/Lshr/BuildVector Folding Similar to the current "Trunc/BuildVector" folding, which folds low-element extracts of BuildVectors, this folds high-element extracts done using bit shifts. For D134354 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D135148	2022-10-07 08:44:03 +00:00
Pierre van Houtryve	bb71079e30	[AMDGPU][GISel] Add missing V2S16 BUILD_VECTOR_TRUNC legalization Previously we would be unable to legalize V2S16 BUILD_VECTOR_TRUNC on GFX8 & below as the custom legalization was missing. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D135149	2022-10-06 06:48:53 +00:00
Carl Ritson	c316332e17	[Sink] Allow sinking of invariant loads across critical edges Invariant loads can always be sunk. Reviewed By: foad, arsenm Differential Revision: https://reviews.llvm.org/D135133	2022-10-06 09:21:12 +09:00
Pierre van Houtryve	c93104073c	[AMDGPU] Always lower SHUFFLE_VECTOR Make it illegal, remove InstructionSelector logic for it Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134967	2022-10-04 14:23:17 +00:00
jeff	f4e6149d82	[AMDGPU] Use V_PERM to match buildvectors when inputs are not canonicalized (i.e. can't use V_PACK) If we cannot prove that the f16 operands of a buildvector are canonicalized, then we cannot lower into a V_PACK. In this scenario, we would previously lower into some combination of and(sdwa), shr, or. This patch allows for matching into V_PERM instead. Change-Id: Ifa4a74fdb81ef44f22ba490c7fdf81ec8aebc945	2022-10-03 12:58:29 -07:00
Pierre van Houtryve	d8258508d4	[AMDGPU][GISel] Update `isCanonicalized` Recognize more opcodes in the function. Fixes some regressions introduced in D134857 for fdiv.f16 too. Depends on D134857 Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D134862	2022-09-30 14:13:35 +00:00
Pierre van Houtryve	7388520d1c	[GISel] Add more cases to isKnownNeverNaN Make it even with the DAG implementation as of D134854 Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D134857	2022-09-30 14:10:56 +00:00
Pierre van Houtryve	653beae5a1	[AMDGPU][GISel] Add Identity BUILD_VECTOR Combines Folds away BUILD_VECTOR-related no-ops in the post-legalizer combiner. Depends on D134433 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134953	2022-09-30 14:07:13 +00:00
Pierre van Houtryve	9a67a6b72a	[AMDGPU][GISel] Legalize V2S16 G_BUILD_VECTOR Preparation patch for D134354 to make V2S16 G_BUILD_VECTOR legal. Also removes RegBankInfo's scalarization of small BUILD_VECTORs, replacing it with InstructionSelector logic instead. This allows for V2S16 BUILD_VECTOR instructions to survive all the way to ISel so we can select FMA/MAD_MIX instructions in D134354. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134433	2022-09-30 14:04:53 +00:00
Jessica Paquette	1eb49bbab6	[GlobalISel][CallLowering] Use hasRetAttr for return flags on CallBases Given something like this: `declare signext i16 @signext_callee()` and `define i32 @caller() { %res = call i16 @signext_callee() ... }`, CallLowering would miss that signext_callee's return value is sign extended, because the signext attribute isn't on the call. Use hasRetAttr on the CallBase to allow us to catch this. (This now inserts G_ASSERT_SEXT/G_ASSERT_ZEXT like in the original review.) Differential Revision: https://reviews.llvm.org/D86228	2022-09-28 19:38:24 -07:00
Jay Foad	ddfa0f62d8	[AMDGPU] Add GFX11 feature for subtargets with more VGPRs The full complement of physical VGPRs for GFX11 is 50% more than GFX10. Some subtargets have this, others stay the same as GFX10. This affects occupancy calculations. Differential Revision: https://reviews.llvm.org/D134522	2022-09-23 20:18:23 +01:00
Petar Avramovic	6db7921b65	AMDGPU: Use tablegen patterns for buffer global and flat atomic fadd Remove manual selection for atomic fadd from global-isel. Stop pre-isel translation to AtomicLoadFAdd/G_ATOMICRMW_FADD, which corresponds to LLVM IR's atomicrmw fadd instruction. Global and flat atomic fadd pattern changes: split rtn/no-rtn patterns; add missing patterns or fix predicates; remove atomicrmw patterns for v2f16 (atomicrmw doesn't support vectors). Patterns now check the addrspace of the pointer; added patterns for the flat intrinsic with a global addrspace pointer that select into the global atomic instruction. Buffer atomic fadd pattern changes: edit patterns to import into global-isel; remove gfx6/gfx7 _addr64 and _offset patterns; remove patterns that can't be reached (same pattern but different feature). Differential Revision: https://reviews.llvm.org/D130579	2022-09-23 17:52:10 +02:00
Petar Avramovic	5cee9047d5	AMDGPU: Improve atomicrmw fadd selection Use same atomicrmw fadd expansion rules for gfx908, gfx940 and gfx11 as for gfx90a. Add missing globalisel legalizer support for flat atomicrmw fadd f32 on gfx940 and gfx11. Isel support for gfx11 will be added in D130579. Differential Revision: https://reviews.llvm.org/D131560	2022-09-23 17:52:10 +02:00
Petar Avramovic	48968c47b0	AMDGPU: Add detailed buffer, global and flat atomic fadd tests Precommit for D130579 that will remove manual selection and use patterns from td files. Tests are grouped based on target features. All patterns have rtn and no-rtn versions. Buffer atomic patterns are selected based on the intrinsic used (raw or struct) and the offset operand (imm or vgpr): _offset: raw with imm offset; _offen: raw with vgpr offset (or large imm offset); _idxen: struct with imm offset; _bothen: struct with vgpr offset (or large imm offset). Global and flat atomics are selected via an intrinsic or atomicrmw fadd. The atomicrmw tests use amdgpu-unsafe-fp-atomics=true and non-system scope, since they get expanded otherwise. atomicrmw fadd does not support vector types, so only float and double are tested. Global atomic patterns are selected based on address type, via the (global or flat) intrinsic or atomicrmw fadd with a global address (addrspace(1)): 'no suffix': vgpr addrspace(1) address; _saddr: sgpr addrspace(1)* address. Flat atomic patterns are selected via the (flat) intrinsic or atomicrmw fadd with a flat address (* - address space 0). Differential Revision: https://reviews.llvm.org/D131561	2022-09-23 17:52:10 +02:00
Joe Nash	b982ba2a6e	[AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C Due to the encoding changes in GFX11, we had a hack in place that disables the use of VGPRs above 128. This patch removes the need for that hack. We introduce a new register class VGPR_32_Lo128 which is used for 16-bit operands of VOP1, VOP2, and VOPC instructions. This register class only has the low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1, VOP2, and VOPC instructions are correctly limited to use the first 128 VGPRs, while the other instructions can freely use all 256. We introduce new pseudo-instructions used on GFX11 which have the suffix t16 (True 16) to use the VGPR_32_Lo128 register class. Reviewed By: foad, rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D133723	2022-09-20 09:56:28 -04:00
Matt Arsenault	69153d6c0a	AMDGPU: Use GlobalPriority for largest register tuples Only do this for 16 and 32 register tuples, although we might want to extend to 8 tuples. It's incredibly expensive to spill these, and doing so severely interferes with the ability to allocate anything else in the function. The lit tests show mostly sizeable improvements with a handful of tiny regressions with large vectors.	2022-09-15 11:45:02 -04:00
Jay Foad	3743f9afeb	[AMDGPU] Add GFX11 globalisel test coverage for fptosi/fptoui	2022-09-13 10:51:02 +01:00
Jay Foad	210e6a993d	[GlobalISel] Simplify extended add/sub to add/sub with carry Simplify extended add/sub (with carry-in and carry-out) to add/sub with carry (with carry-out only) if carry-in is known to be zero. Differential Revision: https://reviews.llvm.org/D133702	2022-09-12 17:05:44 +01:00
Joe Nash	8604904e68	[AMDGPU] Separate check lines for some GFX11 16-bit codegen tests NFC. Pre-commits test changes to have a separate CHECK line where GFX11 behavior will diverge from previous subtargets in a future patch.	2022-09-12 09:38:34 -04:00
Matt Arsenault	bb70b5d406	CodeGen: Set MODereferenceable from isDereferenceableAndAlignedPointer Previously this was assuming pointsToConstantMemory implies dereferenceable.	2022-09-12 08:38:35 -04:00
Jay Foad	8901f7cebc	[AMDGPU] Fix crash legalizing G_EXTRACT_VECTOR_ELT with negative index Fixes https://github.com/llvm/llvm-project/issues/57408 Differential Revision: https://reviews.llvm.org/D132938	2022-09-09 15:53:34 +01:00
Justin Bogner	a81c7dbf0d	[AMDGPU] Drop _oneuse checks from med3 patterns We use _oneuse checks to make sure combines won't accidentally increase code size, but this prevents the optimization in cases where we happen to want to clamp multiple values to the same range. It's safe to drop these checks for two reasons: 1. The pattern of max/min operations for med3 is complicated enough that it's unlikely to come up by accident, so this will still only fire when appropriate to do so. 2. Even if every intermediate is used and we don't save a single operation, we still won't end up with more operations, since the med3 replaces the final max/min. In pathological cases we could potentially end up with a larger encoding size or possibly slightly increased vgpr pressure, but the risk of that is low, especially considering the upside. Differential Revision: https://reviews.llvm.org/D132621	2022-09-07 16:31:49 -07:00
Justin Bogner	f9433161f5	[AMDGPU] Precommit two tests showing missed combines to v_med3	2022-08-30 11:56:09 -07:00
Pierre van Houtryve	59cf9dd923	[AMDGPU][GISel] Enable Selection of ADD3 for G_PTR_ADD Allows things like `(G_PTR_ADD (G_PTR_ADD a, b), c)` to be simplified into a single ADD3 instruction instead of two adds. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D131254	2022-08-24 14:44:19 +00:00
Luo, Yuanke	5159be3c9b	(Reland) [fastalloc] Support allocating specific register class in fastalloc This reverts commit 853bb192c407f5d9e75a5fd55cc089151530cbd3.	2022-08-20 13:25:34 +08:00
Luo, Yuanke	853bb192c4	Revert "(Reland) [fastalloc] Support allocating specific register class in fastalloc" This reverts commit 30f9e6ebd30b79d13f99eaca4d829e0da07186b3.	2022-08-15 20:33:15 +08:00
Luo, Yuanke	30f9e6ebd3	(Reland) [fastalloc] Support allocating specific register class in fastalloc Reland commit 719658d078c4 The base RA supports infrastructure that allows only a specific register class to be allocated in an RA pass. Since greedy RA and basic RA are derived from base RA, they both allow allocating a specific register class. Fast RA doesn't support allocating registers for a specific register class. This patch enables ShouldAllocateClass in fast RA, so that it can support allocating registers for a specific register class. Differential Revision: https://reviews.llvm.org/D131825	2022-08-13 13:57:34 +08:00
Yaxun (Sam) Liu	e780648a15	[AMDGPU] Unify unreachable intrinsics si-annotate-control-flow does a depth-first traversal of the BBs of a function to insert amdgcn if intrinsics for conditional branches, so that isel can generate correct instructions later. si-annotate-control-flow checks whether the successor BB for the 'else' branch of a conditional branch has been visited. If it has been visited, si-annotate-control-flow assumes the conditional branch has been handled and will not try to insert an if intrinsic for it. This assumption is not correct when the IR contains multiple unreachable BBs. In that case 'if' intrinsics are not inserted and incorrect ISA is generated. This patch fixes the issue by letting amdgpu-unify-divergent-exit-nodes unify unreachables even if they are uniformly reached. This way the IR will not contain multiple exits, and the structurizer is able to structurize the IR containing one unified exit. Reviewed by: Ruiling Song, Matt Arsenault Differential Revision: https://reviews.llvm.org/D131181 Fixes: SWDEV-343244	2022-08-09 10:23:32 -04:00
Carl Ritson	4c4db81630	[AMDGPU] Extend SILoadStoreOptimizer to s_load instructions Apply merging to s_load as is done for s_buffer_load. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D130742	2022-07-30 11:38:39 +09:00
Austin Kerbow	ba0d079c7a	[AMDGPU] Aggressively schedule to reduce RP in occupancy limited regions By not clustering loads and adjusting heuristics to more aggressively reduce register pressure we may be able to increase occupancy for the function if it was dropped in a first pass scheduling. Similarly, try to reduce spilling if register usage exceeds lower bound occupancy. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130329	2022-07-27 22:34:37 -07:00
Jay Foad	716ca2e3ef	[AMDGPU] Pre-sink IR input for some tests Edit the IR input for some codegen tests to simulate what the IR code sinking pass would do to it. This makes the tests immune to the presence or absence of the code sinking pass in the codegen pass pipeline, which does not belong there. Differential Revision: https://reviews.llvm.org/D130169	2022-07-21 14:25:44 +01:00
Thomas Symalla	fd64a857ee	[AMDGPU] Combine s_or_saveexec, s_xor instructions. This patch merges a consecutive sequence of `s_or_saveexec s_o, s_i` followed by `s_xor exec, exec, s_o` into a single `s_andn2_saveexec s_o, s_i` instruction. This patch also cleans up the SIOptimizeExecMasking pass a bit. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D129073	2022-07-21 14:16:37 +02:00
Jay Foad	9383b09858	[AMDGPU][GlobalISel] Fix subtarget checks for combining to v_med3_i16 Differential Revision: https://reviews.llvm.org/D130243	2022-07-21 11:41:31 +01:00
Jon Chesterfield	3a20597776	[amdgpu] Implement lds kernel id intrinsic Implement an intrinsic for use in lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it. There are a number of implicit arguments already accessed by intrinsics, so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue. In the general case it is necessary to put variables at different addresses so that they can be compactly allocated, and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel (in this implementation, based on metadata associated with that kernel), which is then passed on to indirect call sites, is sufficient to determine the variable address. The intent is to emit a __const array of LDS addresses and index into it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D125060	2022-07-19 17:46:19 +01:00
Matt Arsenault	8d0383eb69	CodeGen: Remove AliasAnalysis from regalloc This was stored in LiveIntervals, but not actually used for anything related to LiveIntervals. It was only used in one check for if a load instruction is rematerializable. I also don't think this was entirely correct, since it was implicitly assuming constant loads are also dereferenceable. Remove this and rely only on the invariant+dereferenceable flags in the memory operand. Set the flag based on the AA query upfront. This should have the same net benefit, but has the possible disadvantage of making this AA query nonlazy. Preserve the behavior of assuming pointsToConstantMemory implying dereferenceable for now, but maybe this should be changed.	2022-07-18 17:23:41 -04:00
Ivan Kosarev	432cbd7827	[AMDGPU][CodeGen] Support (register + immediate) SMRD offsets. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D129381	2022-07-18 11:29:31 +01:00
Matt Arsenault	e9a45d45d0	GlobalISel: Allow forming atomic/volatile G_SEXTLOAD Mirror the change to G_ZEXTLOAD.	2022-07-08 11:55:08 -04:00
Matt Arsenault	1ee6ce9bad	GlobalISel: Allow forming atomic/volatile G_ZEXTLOAD SelectionDAG has a target hook, getExtendForAtomicOps, which it uses in the computeKnownBits implementation for ATOMIC_LOAD. This is pretty ugly (as is having a separate load opcode for atomics), so instead allow making use of atomic zextload. Enable this for AArch64 since the DAG path defaults to the zext behavior. The tablegen changes are pretty ugly, but partially help migrate SelectionDAG from using ISD::ATOMIC_LOAD to regular ISD::LOAD with atomic memory operands. For now the DAG emitter will emit matchers for patterns which the DAG will not produce. I'm still a bit confused by the intent of the isLoad/isStore/isAtomic bits. The DAG implementation rejects trying to use any of these in combination. For now I've opted to make the isLoad checks also check isAtomic, although I think having isLoad and isAtomic set on these makes the most sense.	2022-07-08 11:55:08 -04:00
Jay Foad	8fc8bf59f2	[AMDGPU] Add GFX11 test coverage sharing checks with GFX10	2022-07-08 11:56:49 +01:00
Jay Foad	de3b5d7316	[AMDGPU] More GFX11 coverage for tests with generated checks	2022-07-08 11:06:02 +01:00
Jay Foad	a59c3eb2f3	[AMDGPU] Add GFX11 coverage to shared sdag/gisel tests	2022-07-08 09:40:20 +01:00


1698 Commits