llvm-project

Author	SHA1	Message	Date
pvanhout	16e1f8e970	Revert "[AMDGPU] Select v_sat_pk_u8_i16" This reverts commit 64b45db34a0cd979dae9ca3016e9da517e57b987. Reason: the patterns are wrong which can result in a miscompilation. However, fixing the pattern is not trivial due to how i8 values are handled, and due to the additional type-checking performed by D147127: trunc/smax/smin are all defined as int ops in the DAG despite them working on vectors too. As this is not a much-needed pattern, I prefer reverting for now until I can find time to properly rewrite the pattern.	2023-03-31 12:39:49 +02:00
Ivan Kosarev	32f46ef09f	[AMDGPU][AsmParser][NFC] Refine immediate operand definitions. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D144959	2023-03-30 15:11:34 +01:00
pvanhout	64b45db34a	[AMDGPU] Select v_sat_pk_u8_i16 The backend knew about `v_sat_pk_u8_i16` but never made use of it. This patch adds selection patterns (DAG/GISel) for that instruction. I think it'll be very rarely used, but at least it's possible to use it. Solves #58266 (https://github.com/llvm/llvm-project/issues/58266) Reviewed By: foad Differential Revision: https://reviews.llvm.org/D144729	2023-03-15 09:36:12 +01:00
Ivan Kosarev	dbbab71b76	[AMDGPU][NFC] Eliminate the u32imm operand definition. It is only used to infer the types of offset parameters in isel patterns, which we can specify directly. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D144890	2023-02-28 12:23:47 +00:00
Pierre van Houtryve	70924673af	[RFC][GISel] Add a way to ignore COPY instructions in InstructionSelector RFC to add a way to ignore COPY instructions when pattern-matching MIR in GISel. - Add a new "GISelFlags" class to TableGen. Both `Pattern` and `PatFrags` defs can use it to alter matching behaviour. - Flags start at zero and are scoped: the setter returns a `SaveAndRestore` object so that when the current scope ends, the flags are restored to their previous values. This allows child patterns to modify the flags without affecting the parent pattern. - Child patterns always reuse the parent's pattern, but they can override its values. For more examples, see `GlobalISelEmitterFlags.td` tests. - [AMDGPU] Use the IgnoreCopies flag in BFI patterns, which are known to be bothered by cross-regbank copies. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D136234	2023-02-10 08:37:42 +01:00
Ivan Kosarev	3d6b108a87	[AMDGPU] Remove the unused u8imm operand definition. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D142193	2023-02-07 11:48:38 +00:00
Matt Arsenault	93ec3fa402	AMDGPU: Support atomicrmw uinc_wrap/udec_wrap For now keep the existing intrinsics working.	2023-01-27 22:17:16 -04:00
Jay Foad	81084bfa2c	[AMDGPU] Make use of !listremove. NFCI. This only affects the order of implicit operands in some MIR tests. Differential Revision: https://reviews.llvm.org/D139829	2022-12-12 17:01:04 +00:00
Justin Bogner	916ae0a060	[AMDGPU] Handle nnan and fast on the call in fpmed3 patterns We were only allowing these med3 patterns if the operands were known to not be NaN, but we should also allow it if the calls to max/min have the `nnan` or `fast` flags. Differential Revision: https://reviews.llvm.org/D139506	2022-12-06 22:57:52 -08:00
jeff	f4e6149d82	[AMDGPU] Use V_PERM to match buildvectors when inputs are not canonicalized (i.e. can't use V_PACK) If we can not prove that f16 operands of a buildvector are canonicalized, then we can not lower into a V_PACK. In this scenario, we would previously lower into some combination of and(sdwa), shr, or. This patch allows for matching into V_PERM instead. Change-Id: Ifa4a74fdb81ef44f22ba490c7fdf81ec8aebc945	2022-10-03 12:58:29 -07:00
Petar Avramovic	dcc756d03e	[AMDGPU] Pattern for flat atomic fadd f64 intrinsic with local addr Fix regression from clang opencl test in builtins-fp-atomics-gfx90a.cl test_flat_add_local_f64 caused by D130579 Revert a3becb333d7faae695e18728e9b8fa3a3579a240. Differential Revision: https://reviews.llvm.org/D134568	2022-09-25 13:25:41 +02:00
Petar Avramovic	6db7921b65	AMDGPU: Use tablegen patterns for buffer global and flat atomic fadd Remove manual selection for atomic fadd from global-isel. Stop pre-isel translation to AtomicLoadFAdd/G_ATOMICRMW_FADD which corresponds to llvm-ir's atomicrmw fadd instruction. global and flat atomic fadd patterns changes: Split rtn/no-rtn patterns. Add missing patterns or fix predicates. Remove atomicrmw patterns for v2f16 (atomic rmw doesn't support vectors). Patterns now check addrspace of pointer; added patterns for flat intrinsic with global addrspace pointer that selects into global atomic instruction. buffer atomic fadd patterns changes: Edit patterns to import into global-isel. Remove gfx6/gfx7 _addr64 and _offset patterns. Remove patterns that can't be reached (same pattern but different feature). Differential Revision: https://reviews.llvm.org/D130579	2022-09-23 17:52:10 +02:00
Abinav Puthan Purayil	17a81ecf85	[AMDGPU] Use the HasNoUse predicate for no-ret atomic op selection This change replaces the C++ predicates with the HasNoUse builtin predicate that would enable the no-ret atomic op selection in GlobalISel. Differential Revision: https://reviews.llvm.org/D125213	2022-07-08 09:47:33 +05:30
Abinav Puthan Purayil	7504c7a877	[AMDGPU] Use AddedComplexity for ret and noret atomic ops selection This patch removes the predicate for return atomic ops and uses AddedComplexity to distinguish its selection from its no return variant. This will produce better matchers that don't unnecessarily check for the negated predicate if the initial predicate failed. Also, it simplifies the enabling of no return atomic ops selection in GlobalISel. Differential Revision: https://reviews.llvm.org/D128241	2022-07-08 09:47:33 +05:30
Joe Nash	07b7fada73	[AMDGPU] gfx11 VOPD instructions MC support VOPD is a new encoding for dual-issue instructions for use in wave32. This patch includes MC layer support only. A VOPD instruction is constituted of an X component (for which there are 13 possible opcodes) and a Y component (for which there are the 13 X opcodes plus 3 more). Most of the complexity in defining and parsing a VOPD operation arises from the possible different total numbers of operands and deferred parsing of certain operands depending on the constituent X and Y opcodes. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D128218	2022-06-24 11:08:39 -04:00
Joe Nash	e243ead6fc	Reland [AMDGPU] gfx11 vop3dpp instructions There was an issue with encoding wide (>64 bit) instructions on BigEndian hosts, which is fixed in D127195. Therefore reland this. gfx11 adds the ability to use dpp modifiers on vop3 instructions. This patch adds machine code layer support for that. The MCCodeEmitter is changed to use APInt instead of uint64_t to support these wider instructions. Patch 16/N for upstreaming of AMDGPU gfx11 architecture Differential Revision: https://reviews.llvm.org/D126483	2022-06-07 14:49:13 -04:00
Joe Nash	eaed07eb7e	Revert "[AMDGPU] gfx11 vop3dpp instructions" This reverts commit 99a83b1286748501e0ccf199a582dc3ec5451ef5.	2022-06-06 17:12:09 -04:00
Joe Nash	99a83b1286	[AMDGPU] gfx11 vop3dpp instructions gfx11 adds the ability to use dpp modifiers on vop3 instructions. This patch adds machine code layer support for that. The MCCodeEmitter is changed to use APInt instead of uint64_t to support these wider instructions. Patch 16/N for upstreaming of AMDGPU gfx11 architecture Depends on D126475 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126483	2022-06-06 09:34:59 -04:00
Matt Arsenault	0ecbb683a2	TableGen/GlobalISel: Make address space/align predicates consistent The builtin predicate handling has a strange behavior where the code assumes that a PatFrag is a stack of PatFrags, and each level adds at most one predicate. I don't think this particularly makes sense, especially without a diagnostic to ensure you aren't trying to set multiple at once. This wasn't followed for address spaces and alignment, which could potentially fall through to report no builtin predicate was added. Just switch these to follow the existing convention for now.	2022-04-22 15:48:07 -04:00
Abinav Puthan Purayil	45ca94334e	[AMDGPU] Select no-return atomic intrinsics in tblgen This is to avoid relying on the post-isel hook. This change also enables the saddr pattern selection for atomic intrinsics in GlobalISel. Differential Revision: https://reviews.llvm.org/D123583	2022-04-22 09:37:40 +05:30
Abinav Puthan Purayil	b7df71524e	[AMDGPU][GlobalISel] Force return atomic selection for now	2022-04-20 16:00:08 +05:30
Abinav Puthan Purayil	f4e8cf25af	[AMDGPU] Select no-return ds_* atomic ops in tblgen. SelectionDAG relies on MachineInstr's HasPostISelHook for selecting the no-return atomic ops. GlobalISel, at the moment, doesn't handle HasPostISelHook. This change adds the selection for no-return ds_* atomic ops in tblgen so that it can work with both GlobalISel and SelectionDAG. I couldn't add the predicates for GlobalISel in this change since there's a restriction in GlobalISelEmitter that disallows selecting generic atomic ops that return with instructions that don't return. We can't remove the HasPostISelHook code that selects the no return atomic ops in SelectionDAG yet since we still need to cover selections in FLATInstructions.td, BUFInstructions.td. Differential Revision: https://reviews.llvm.org/D115881	2022-02-10 09:26:37 +05:30
Abinav Puthan Purayil	d8b690409d	[AMDGPU] Set MemoryVT for truncstores in tblgen. GlobalISelEmitter was skipping these patterns when its predicates were checked. This patch should allow us to select d16_hi stores in GlobalISel. Differential Revision: https://reviews.llvm.org/D117762	2022-01-20 19:05:12 +05:30
Matt Arsenault	9392b40d4b	AMDGPU/GlobalISel: Fix selection of constant 32-bit addrspace loads Unfortunately the selection patterns still rely on the address space from the memory operand instead of using the pointer type. Add this address space to the list of cases supported by global-like loads. Alternatively we would have to adjust the address space of the memory operand to deviate from the underlying IR value, which looks ugly and is more work in the legalizer. This doesn't come up in the DAG path because it uses a different selection strategy where the cast is inserted during the addressing mode matching.	2022-01-17 10:06:33 -05:00
Mateja Marjanovic	ca57b80cd6	Code quality: Combine V_RSQ Combine V_RCP and V_SQRT into V_RSQ on AMDGPU for GlobalISel. Change-Id: I93c5dcb412483156a6e8b68c4085cbce83ac9703	2021-11-30 17:17:15 +01:00
Abinav Puthan Purayil	14c4051122	[AMDGPU][NFC] Remove unused defvar in AMDGPUInstructions.td.	2021-11-30 17:03:37 +05:30
Abinav Puthan Purayil	078da26b1c	[AMDGPU] Check for unneeded shift mask in shift PatFrags. The existing constrained shift PatFrags only dealt with masked shift from OpenCL front-ends. This change copies the X86DAGToDAGISel::isUnneededShiftMask() function to AMDGPU and uses it in the shift PatFrag predicates. Differential Revision: https://reviews.llvm.org/D113448	2021-11-24 10:53:12 +05:30
Abinav Puthan Purayil	61e3b9fefe	[AMDGPU] Add constrained shift pattern matches. The motivation for this is due to clang's conformance to https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#operators-shift which makes clang emit (<shift> a, (and b, <width> - 1)) for `a <shift> b` in OpenCL where a is an int of bit width <width>. Differential revision: https://reviews.llvm.org/D110231	2021-10-26 19:07:19 +05:30
Piotr Sobczak	d869921004	[AMDGPU] Add patterns for i8/i16 local atomic load/store Add patterns for i8/i16 local atomic load/store. Added tests for new patterns. Copied atomic_[store/load]_local.ll to GlobalISel directory. Differential Revision: https://reviews.llvm.org/D111869	2021-10-18 11:23:10 +02:00
Jay Foad	ce098ccc1c	[AMDGPU] Simplify tablegen files. NFC. There is no need to cast records to strings before comparing them.	2021-07-07 09:19:23 +01:00
Julien Pagès	46adccc5cc	[AMDGPU] Improve Codegen for build_vector Improve the code generation of build_vector. Use the v_pack_b32_f16 instruction instead of v_and_b32 + v_lshl_or_b32 Differential Revision: https://reviews.llvm.org/D98081 Patch by Julien Pagès!	2021-05-12 14:17:44 +01:00
Jay Foad	d92b4956d6	[AMDGPU] Inline FSHRPattern into its only use. NFC.	2021-03-26 09:32:02 +00:00
Paul C. Anagnostopoulos	91d2e5c81a	[TableGen] Add the !filter bang operator. Add a test. Update the Programmer's Reference. Use it in some TableGen files. Differential Revision: https://reviews.llvm.org/D91008	2020-11-09 10:56:55 -05:00
Jay Foad	0d5989bb24	[AMDGPU] Split R600 and GCN bfe patterns This is in preparation for making the GCN patterns divergence-aware. NFC. Differential Revision: https://reviews.llvm.org/D88579	2020-10-05 09:55:10 +01:00
Jay Foad	286d3fc750	[AMDGPU] Split R600 and GCN bfi patterns This is in preparation for making the GCN patterns divergence-aware. NFC. Differential Revision: https://reviews.llvm.org/D88244	2020-09-28 10:16:51 +01:00
Stanislav Mekhanoshin	277de43d88	[AMDGPU] Unify intrinsic ret/nortn interface We have a single noret intrinsic and a lot of special handling around it. Declare it just as any other but do not define rtn instructions itself instead. Differential Revision: https://reviews.llvm.org/D87719	2020-09-15 15:26:42 -07:00
Mirko Brkusanin	d17ea67b92	[AMDGPU][GlobalISel] Fix 96 and 128 local loads and stores Fix local ds_read/write_b96/b128 so they can be selected if the alignment allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break them down. Differential Revision: https://reviews.llvm.org/D81638	2020-08-21 12:26:31 +02:00
Jay Foad	ecac951be9	[AMDGPU] Fix and simplify AMDGPUTargetLowering::LowerUDIVREM Use the algorithm from AMDGPUCodeGenPrepare::expandDivRem32. Differential Revision: https://reviews.llvm.org/D83382	2020-07-08 19:14:49 +01:00
Matt Arsenault	9e03bdebc1	AMDGPU: Add llvm.amdgcn.sqrt intrinsic I spread the GlobalISel test into the regular one, which I've been avoiding so far.	2020-06-26 15:07:07 -04:00
Matt Arsenault	89c8c80bd5	AMDGPU: Change pre-gfx9 implementation of fcanonicalize to mul If f32 denormals were enabled pre-gfx9, we would still try to implement this with v_max_f32. Pre-gfx9, these instructions ignored the denormal mode and did not flush. Switch to the multiply form for f32 as a workaround which should always work in any case. This fixes conformance failures when the library implementation of fmin/fmax were accidentally not inlined, forcing the assumption of no flushing on targets where denormals are not enabled by default. This is a workaround, since really we should not be mixing code with different FP mode expectations, but prefer the lowering that will work in any mode. Now this will always use max to implement canonicalize on gfx9+. This is only really beneficial for f64. For f32/f16 it's a neutral choice (and worse in terms of code size in 1 case), but possibly worse for the compiler since it does add an extra register use operand. Leave this change for later.	2020-04-23 15:24:13 -04:00
Matt Arsenault	5660bb6bc9	AMDGPU: Remove denormal subtarget features Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.	2020-04-02 17:17:12 -04:00
Simon Pilgrim	e91feeed21	[AMDGPU] Add ISD::FSHR -> ALIGNBIT support This patch allows ISD::FSHR(i32) patterns to lower to ALIGNBIT instructions. This improves test coverage of ISD::FSHR matching - x86 has both FSHL/FSHR instructions and we prefer FSHL by default. Differential Revision: https://reviews.llvm.org/D76070	2020-03-12 20:16:57 +00:00
Matt Arsenault	1024b73ef5	AMDGPU: Split denormal mode tracking bits Prepare to accurately track the future denormal-fp-math attribute changes. The way to actually set these separately is not wired in yet. This is just a mechanical change, and mostly still assumes the input and output mode match. This should be refined for some cases. For example, fcanonicalize lowering should use the flushing variant if either input or output flushing is enabled	2020-02-04 10:44:21 -08:00
Matt Arsenault	84e035d8f1	AMDGPU: Don't check constant address space for atomic stores We define a separate list for storable address spaces. This saves an entry in the matcher table address space list.	2020-01-24 12:15:09 -08:00
Matt Arsenault	9ffd0ed838	AMDGPU/GlobalISel: Fix import of integer med3 This isn't too useful now, since nothing is currently trying to form min/max from cmp+select.	2020-01-09 10:29:32 -05:00
Matt Arsenault	c66b2e1c87	AMDGPU: Eliminate more legacy codepred address space PatFrags These should now be limited to R600 code.	2020-01-09 10:29:32 -05:00
Matt Arsenault	3766f4bacc	AMDGPU: Use new PatFrag system for d16 stores	2020-01-09 10:29:32 -05:00
Matt Arsenault	22700f68e1	AMDGPU: Annotate EXTRACT_SUBREGs with source register classes This partially fixes GlobalISel import of the patterns, but removes a lot of entries from the end of the skipped pattern log.	2020-01-07 21:56:16 -05:00
Matt Arsenault	bd8d696c14	AMDGPU: Use ImmLeaf	2020-01-07 15:10:07 -05:00
Matt Arsenault	e4464bf3d4	AMDGPU/GlobalISel: Select scalar v2s16 G_BUILD_VECTOR	2020-01-06 11:19:33 -05:00

139 Commits