llvm-project

Author	SHA1	Message	Date
Matt Arsenault	52ec7379ad	AMDGPU/GlobalISel: Fold add of constant into G_INSERT_VECTOR_ELT Move the subregister base like in the extract case.	2020-01-22 11:09:15 -05:00
Matt Arsenault	d1dbb5e471	AMDGPU/GlobalISel: Select G_INSERT_VECTOR_ELT	2020-01-22 11:00:49 -05:00
Matt Arsenault	e3d352c541	AMDGPU/GlobalISel: Fold constant offset vector extract indexes Handle dynamic vector extracts that use an index that's an add of a constant offset into moving the base subregister of the indexing operation. Force the add into the loop in regbankselect, which will be recognized when selected.	2020-01-22 10:50:59 -05:00
Matt Arsenault	a722cbf77c	AMDGPU/GlobalISel: Handle atomic_inc/atomic_dec The intermediate instruction drops the extra volatile argument. We are missing an atomic ordering on these.	2020-01-22 09:26:17 -05:00
Matt Arsenault	592de0009f	AMDGPU/GlobalISel: Select llvm.amdgcn.update.dpp The existing test is overly reliant on -mattr=-flat-for-global, and some missing optimizations to re-use.	2020-01-17 20:09:53 -05:00
Matt Arsenault	ec9628318d	AMDGPU/GlobalISel: Select DS append/consume	2020-01-17 20:09:53 -05:00
Matt Arsenault	9b2f3532c7	AMDGPU/GlobalISel: Select DS GWS intrinsics	2020-01-16 11:25:10 -05:00
Matt Arsenault	711a17afaf	AMDGPU/GlobalISel: Select exp with patterns This does produce slightly different code. Now a unique IMPLICIT_DEF is emitted for each of the implicit_def operands, rather than reusing the same one.	2020-01-15 18:33:15 -05:00
Matt Arsenault	203801425d	AMDGPU/GlobalISel: Select llvm.amdgcn.ds.ordered.{add\|swap}	2020-01-13 13:09:38 -05:00
Matt Arsenault	35c3d101ae	AMDGPU/GlobalISel: Select G_EXTRACT_VECTOR_ELT Doesn't try to do the fold into the base register of an add of a constant in the index like the DAG path does.	2020-01-09 19:52:24 -05:00
Matt Arsenault	b4a647449f	TableGen/GlobalISel: Add way for SDNodeXForm to work on timm The current implementation assumes there is an instruction associated with the transform, but this is not the case for timm/TargetConstant/immarg values. These transforms should directly operate on a specific MachineOperand in the source instruction. TableGen would assert if you attempted to define an equivalent GISDNodeXFormEquiv using timm when it failed to find the instruction matcher. Specially recognize SDNodeXForms on timm, and pass the operand index to the render function. Ideally this would be a separate render function type that looks like void renderFoo(MachineInstrBuilder, const MachineOperand&), but this proved to be somewhat mechanically painful. Add an optional operand index which will only be passed if the transform should only look at the one source operand. Theoretically it would also be possible to only ever pass the MachineOperand, and the existing renderers would check the parent. I think that would be somewhat ugly for the standard usage which may want to inspect other operands, and I also think MachineOperand should eventually not carry a pointer to the parent instruction. Use it in one sample pattern. This isn't a great example, since the transform exists to satisfy DAG type constraints. This could also be avoided by just changing the MachineInstr's arbitrary choice of operand type from i16 to i32. Other patterns have nontrivial uses, but this serves as the simplest example. One flaw this still has is if you try to use an SDNodeXForm defined for imm, but the source pattern uses timm, you still see the "Failed to lookup instruction" assert. However, there is now a way to avoid it.	2020-01-09 17:37:52 -05:00
Matt Arsenault	7d67742160	AMDGPU/GlobalISel: Fix import of zext of s16 op patterns	2020-01-09 10:29:32 -05:00
Matt Arsenault	e71af77568	AMDGPU/GlobalISel: Add IMMPopCount xform Partially fixes BFE pattern import.	2020-01-09 10:29:32 -05:00
Matt Arsenault	79450a4ea2	AMDGPU/GlobalISel: Add selectVOP3Mods_nnan This doesn't enable any new imports yet, but moves the fmed patterns from failing on this to hitting the "complex suboperand referenced more than once" limitation in tablegen.	2020-01-09 10:29:32 -05:00
Matt Arsenault	d964086c62	AMDGPU/GlobalISel: Add equiv xform for bitcast_fpimm_to_i32 Only partially fixes one pattern import.	2020-01-09 10:29:31 -05:00
Matt Arsenault	3952748ffd	AMDGPU/GlobalISel: Fix add of neg inline constant pattern	2020-01-09 10:29:31 -05:00
Matt Arsenault	c3a10faadc	AMDGPU: Remove VOP3Mods0Clamp0OMod Now that overridable default operands work, there's no reason to use complex patterns to just produce 0s.	2020-01-07 15:10:08 -05:00
Matt Arsenault	d4c9e13324	AMDGPU/GlobalISel: Select G_UADDE/G_USUBE	2020-01-06 18:27:52 -05:00
Matt Arsenault	4e85ca9562	AMDGPU/GlobalISel: Replace handling of boolean values This solves selection failures with generated selection patterns, which would fail due to inferring the SGPR reg bank for virtual registers with a set register class instead of VCC bank. Selection of the use instruction would constrain the virtual register to a specific class, so when the def was selected later the bank no longer was set to VCC. Remove the SCC reg bank. SCC isn't directly addressable, so it requires copying from SCC to an allocatable 32-bit register during selection, so these might as well be treated as 32-bit SGPR values. Now any scalar boolean value that will produce an output in SCC should be widened during RegBankSelect to s32. Any s1 value should be a vector boolean during selection. This makes the vcc register bank unambiguous with a normal SGPR during selection. Summary of how this should now work: - G_TRUNC is always a no-op, and never should use a vcc bank result. - SALU boolean operations should be promoted to s32 in RegBankSelect apply mapping - An s1 value means vcc bank at selection. The exception is for legalization artifacts that use s1, which are never VCC. All other contexts should infer the VCC register classes for s1 typed registers. The LLT for the register is now needed to infer the correct register class. Extensions with vcc sources should be legalized to a select of constants during RegBankSelect. - Copy from non-vcc to vcc ensures high bits of the input value are cleared during selection. - SALU boolean inputs should ensure the inputs are 0/1. This includes select, conditional branches, and carry-ins. There are a few somewhat dirty details. One is that G_TRUNC/G_*EXT selection ignores the usual register-bank from register class functions, and can't handle truncates with VCC result banks. I think this is OK, since the artifacts are specially treated anyway. This does require some care to avoid producing cases with vcc. 
There will also be no 100% reliable way to verify this rule is followed in selection in case of register classes, and violations manifest themselves as invalid copy instructions much later. Standard phi handling also only considers the bank of the result register, and doesn't insert copies to make the source banks match. This doesn't work for vcc, so we have to manually correct phi inputs in this case. We should add a verifier check to make sure there are no phis with mixed vcc and non-vcc register bank inputs. There's also some duplication with the LegalizerHelper, and some code which should live in the helper. I don't see a good way to share special knowledge about what types to use for intermediate operations depending on the bank for example. Using the helper to replace extensions with selects also seems somewhat awkward to me. Another issue is there are some contexts calling getRegBankFromRegClass that apparently don't have the LLT type for the register, but I haven't yet run into a real issue from this. This also introduces new unnecessary instructions in most cases, since we don't yet try to optimize out the zext when the source is known to come from a compare.	2020-01-06 18:26:42 -05:00
Matt Arsenault	14d25052a2	AMDGPU: Use ImmLeaf for inline immediate predicates	2020-01-06 17:21:51 -05:00
Matt Arsenault	e4464bf3d4	AMDGPU/GlobalISel: Select scalar v2s16 G_BUILD_VECTOR	2020-01-06 11:19:33 -05:00
Matt Arsenault	f1c85ecdfc	AMDGPU/GlobalISel: Select more G_EXTRACTs correctly This assumed a 32-bit extract size, which would produce invalid copies with 64-bit extracts. Handle the easy case. Ideally we would have a way to get the proper subreg index for any 32-bit offset, but there should probably be a tablegenerated way of getting the subreg index for any size and offset.	2020-01-06 11:10:13 -05:00
Matt Arsenault	9861a8538c	AMDGPU/GlobalISel: Add new utils file There are some things that are shareable between the legalizer, regbankselect, and the selector that don't have an obvious place to go.	2020-01-03 15:25:50 -05:00
Matt Arsenault	53fc484067	AMDGPU/GlobalISel: Fix off by one in operand index This should be looking at the RHS of the add for a constant.	2020-01-03 10:30:30 -05:00
Matt Arsenault	25e7da0c24	AMDGPU/GlobalISel: Remove manual G_FENCE selection The tablegen emitter now handles the immediate operand correctly, so let the generated matcher work.	2020-01-02 17:16:10 -05:00
Matt Arsenault	48e0e68edb	AMDGPU/GlobalISel: Re-use MRI available in selector	2019-12-30 13:00:17 -05:00
Matt Arsenault	dff3f8d742	AMDGPU/GlobalISel: Fix missing scc imp-def on scalar and/or/xor	2019-12-21 04:55:36 -05:00
Matt Arsenault	bc276c6379	GlobalISel: Lower s1 source G_SITOFP/G_UITOFP	2019-11-15 13:37:20 +05:30
Daniel Sanders	e74c5b9661	[globalisel] Rename G_GEP to G_PTR_ADD Summary: G_GEP is rather poorly named. It's a simple pointer+scalar addition and doesn't support any of the complexities of getelementptr. I therefore propose that we rename it. There's a G_PTR_MASK so let's follow that convention and go with G_PTR_ADD Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69734	2019-11-05 10:31:17 -08:00
Matt Arsenault	f9a42ed0a7	AMDGPU: Relax 32-bit SGPR register class Mostly use SReg_32 instead of SReg_32_XM0 for arbitrary values. This will allow the register coalescer to do a better job eliminating copies to m0. For GlobalISel, as a terrible hack, use SGPR_32 for things that should use SCC until booleans are solved. llvm-svn: 375267	2019-10-18 18:26:37 +00:00
Matt Arsenault	538b73b797	AMDGPU/GlobalISel: Handle more G_INSERT cases Start manually writing a table to get the subreg index. TableGen should probably generate this, but I'm not sure what it looks like in the arbitrary case where subregisters are allowed to not fully cover the super-registers. llvm-svn: 373947	2019-10-07 19:16:26 +00:00
Matt Arsenault	0b2ea91d6d	AMDGPU/GlobalISel: Use S_MOV_B64 for inline constants This hides some defects in SIFoldOperands when the immediates are split. llvm-svn: 373943	2019-10-07 19:07:19 +00:00
Matt Arsenault	b4cbf9862c	AMDGPU/GlobalISel: Select more G_INSERT cases At minimum handle the s64 insert type, which is emitted in real cases during legalization. We really need TableGen to emit something like the inverse of composeSubRegIndices to determine the subreg index to use. llvm-svn: 373938	2019-10-07 18:43:31 +00:00
Matt Arsenault	27269054d2	GlobalISel: Add target pre-isel instructions Allows targets to introduce regbankselectable pseudo-instructions. Currently the closest feature to this is an intrinsic. However this requires creating a public intrinsic declaration. This litters the public intrinsic namespace with operations we don't necessarily want to expose to IR producers, and would rather leave as private to the backend. Use a new instruction bit. A previous attempt tried to keep using enum value ranges, but it turned into a mess. llvm-svn: 373937	2019-10-07 18:43:29 +00:00
Matt Arsenault	e59296a051	AMDGPU/GlobalISel: Fall back on weird G_EXTRACT offsets llvm-svn: 373842	2019-10-06 01:41:22 +00:00
Matt Arsenault	412e0bf8f3	AMDGPU/GlobalISel: Select G_PTRTOINT llvm-svn: 373715	2019-10-04 08:35:37 +00:00
Piotr Sobczak	265e94e657	[AMDGPU] Extend buffer intrinsics with swizzling Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this information to MIR. Intrinsics updated: int_amdgcn_raw_buffer_load int_amdgcn_raw_buffer_load_format int_amdgcn_raw_buffer_store int_amdgcn_raw_buffer_store_format int_amdgcn_raw_tbuffer_load int_amdgcn_raw_tbuffer_store int_amdgcn_struct_buffer_load int_amdgcn_struct_buffer_load_format int_amdgcn_struct_buffer_store int_amdgcn_struct_buffer_store_format int_amdgcn_struct_tbuffer_load int_amdgcn_struct_tbuffer_store Furthermore, disable merging of VMEM buffer instructions in SI Load/Store optimizer, if the "swizzled" bit on the instruction is on. The default value of the bit is 0, meaning that data in buffer is linear and buffer instructions can be merged. There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with correct "swizzled" bit set. Reviewers: arsenm, nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68200 llvm-svn: 373491	2019-10-02 17:22:36 +00:00
Matt Arsenault	86f864dace	AMDGPU/GlobalISel: Use getIntrinsicID helper llvm-svn: 373417	2019-10-02 01:02:27 +00:00
Matt Arsenault	fdea5e02ce	AMDGPU/GlobalISel: Select s1 src G_SITOFP/G_UITOFP llvm-svn: 373298	2019-10-01 02:23:20 +00:00
Matt Arsenault	59b91aa93e	AMDGPU/GlobalISel: Add support for init.exec intrinsics The existing wave32 behavior seems broken and incomplete, but this reproduces it. llvm-svn: 373296	2019-10-01 02:07:25 +00:00
Matt Arsenault	54167ea316	AMDGPU/GlobalISel: Select G_UADDO/G_USUBO llvm-svn: 373288	2019-10-01 01:23:13 +00:00
Matt Arsenault	76f44f6b53	AMDGPU/GlobalISel: Avoid getting MRI in every function Store it in AMDGPUInstructionSelector to avoid boilerplate in nearly every select function. llvm-svn: 373139	2019-09-28 03:41:13 +00:00
Bjorn Pettersson	169cb63478	[AMDGPU] Use std::make_tuple to make some toolchains happy again My toolchain stopped working (LLVM 8.0 , libstdc++ 5.4.0) after r372338. The same problem was seen in clang-cuda-build buildbots: clang-cuda-build/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:763:12: error: chosen constructor is explicit in copy-initialization return {Reg, 0, nullptr}; ^~~~~~~~~~~~~~~~~ /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here constexpr tuple(_UElements&&... __elements) ^ This commit adds explicit calls to std::make_tuple to work around the problem. llvm-svn: 372384	2019-09-20 12:13:12 +00:00
Matt Arsenault	3ecab8e455	Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338	2019-09-19 16:26:14 +00:00
Hans Wennborg	13bdae8541	Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC. > > Since now intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them. 
> > Most of the work here is to clean up target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, similar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_ instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to use "imm" in an output with a "timm" pattern source. llvm-svn: 372314	2019-09-19 12:33:07 +00:00
Matt Arsenault	494243597b	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store.format This needs special handling due to some subtargets that have a nonstandard register layout for f16 vectors Also reject some illegal types on other targets. llvm-svn: 372293	2019-09-19 02:35:08 +00:00
Matt Arsenault	67f1f6ff8c	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store llvm-svn: 372292	2019-09-19 02:30:27 +00:00
Matt Arsenault	d8399d12cd	GlobalISel: Don't materialize immarg arguments to intrinsics Encode them directly as an imm argument to G_INTRINSIC. Since intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU. This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time. SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them. Most of the work here is to clean up target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory. The GlobalISelEmitter needs to treat timm as a special case of a leaf node, similar to MachineBasicBlock operands. 
This should also enable handling of patterns for some G_ instructions with immediates, like G_FENCE or G_EXTRACT. This does include a workaround for a crash in GlobalISelEmitter when ARM tries to use "imm" in an output with a "timm" pattern source. llvm-svn: 372285	2019-09-19 01:33:14 +00:00
Matt Arsenault	fb51e64eac	AMDGPU/GlobalISel: Fail select of G_INSERT non-32-bit source This was producing an illegal copy which would hit an assert later. Error on selection for now until this is implemented. llvm-svn: 371993	2019-09-16 14:26:14 +00:00
Matt Arsenault	3b7ffc6ae7	AMDGPU/GlobalISel: Fix assert on multi-return side effect intrinsics llvm.amdgcn.else hits this. llvm-svn: 371812	2019-09-13 04:12:12 +00:00
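
Note on 169cb63478: the toolchain error quoted in that message (a braced return rejected because the chosen std::tuple constructor is explicit) can be reproduced and avoided outside of LLVM. The following is a minimal sketch, not the code from AMDGPUInstructionSelector.cpp. RegStub, InstrStub and getBaseWithZeroOffset are hypothetical names made up for illustration; only std::tuple and std::make_tuple are real.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical stand-ins for the selector's real types; illustration only.
using RegStub = unsigned;
struct InstrStub {};

// Returns (base register, constant offset, defining instruction).
std::tuple<RegStub, int, InstrStub *> getBaseWithZeroOffset(RegStub Reg) {
  // The form the older standard library rejected, because its tuple
  // constructor was explicit and a braced return is copy-initialization:
  //   return {Reg, 0, nullptr};
  // Building the tuple explicitly sidesteps copy-list-initialization.
  return std::make_tuple(Reg, 0, static_cast<InstrStub *>(nullptr));
}
```

Newer standard libraries accept the braced form as well, so the explicit call is only needed where older toolchains such as the one named in the commit message must still build the code.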


295 Commits