llvm-project

Author	SHA1	Message	Date
Tim Renouf	a37679d67b	[AMDGPU] Fix for negative offsets in buffer/tbuffer intrinsics Summary: The new buffer/tbuffer intrinsics handle an out-of-range immediate offset by moving/adding offset&-4096 to a vgpr, leaving an in-range immediate offset, with a chance of the move/add being CSEd for similar loads/stores. However it turns out that a negative offset in a vgpr is illegal, even if adding the immediate offset makes it legal again. Therefore, this commit disables the offset&-4096 thing if the offset is negative. Differential Revision: https://reviews.llvm.org/D52683 Change-Id: Ie02f0a74f240a138dc2a29d17cfbd9e350e4ed13 llvm-svn: 343672	2018-10-03 10:29:43 +00:00
Fangrui Song	3d76d36059	[AMDGPU] Rename pass "isel" to "amdgpu-isel" Summary: The AMDGPU target specific pass "isel" is a misleading name. Reviewers: tstellar, echristo, javed.absar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D52759 llvm-svn: 343659	2018-10-03 03:38:22 +00:00
Matt Arsenault	635d479322	AMDGPU: Always run AMDGPUAlwaysInline Even if calls are enabled, it still needs to be run for forcing inline of functions that use LDS. llvm-svn: 343657	2018-10-03 02:47:25 +00:00
Matt Arsenault	ab41193312	AMDGPU: Expand atomicrmw nand in IR llvm-svn: 343559	2018-10-02 03:50:56 +00:00
Bjorn Pettersson	c2fc53ac90	[PHIElimination] Lower a PHI node with only undef uses as IMPLICIT_DEF Summary: The lowering of PHI nodes used to detect if all inputs originated from IMPLICIT_DEF's. If so the PHI node was replaced by an IMPLICIT_DEF. Now we also consider undef uses when checking the inputs. So if all inputs are implicitly defined or undef we lower the PHI to an IMPLICIT_DEF. This makes PHIElimination::LowerPHINode more consistent as it checks both implicit and undef properties at later stages. Reviewers: MatzeB, tstellar Reviewed By: MatzeB Subscribers: jvesely, nhaehnle, llvm-commits Differential Revision: https://reviews.llvm.org/D52558 llvm-svn: 343417	2018-09-30 17:26:58 +00:00
Stanislav Mekhanoshin	b080adfc0c	[AMDGPU] Fold copy (copy vgpr) This allows to reduce a number of used VGPRs in some cases. Differential Revision: https://reviews.llvm.org/D52577 llvm-svn: 343249	2018-09-27 18:55:20 +00:00
Stanislav Mekhanoshin	8dfcd83371	[AMDGPU] Fix ds combine with subregs Differential Revision: https://reviews.llvm.org/D52522 llvm-svn: 343047	2018-09-25 23:33:18 +00:00
Changpeng Fang	6f4922ccc9	AMDGPU: Add Selection patterns to support add of one bit. Summary: We generate s_xor to lower add of i1s in general cases, and s_not to lower add with a one-bit imm of -1 (true). Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D52518 llvm-svn: 343030	2018-09-25 21:21:18 +00:00
Daniil Fukalov	349b5943b4	[RegAllocGreedy] avoid using physreg candidates that cannot be correctly spilled For the AMDGPU target if a MBB contains exec mask restore preamble, SplitEditor may get state when it cannot insert a spill instruction. E.g. for a MIR bb.100: %1 = S_OR_SAVEEXEC_B64 %2, implicit-def $exec, implicit-def $scc, implicit $exec and if the regalloc will try to allocate a virtreg to the physreg already assigned to virtreg %1, it should insert spill instruction before the S_OR_SAVEEXEC_B64 instruction. But it is not possible since can generate incorrect code in terms of exec mask. The change makes regalloc to ignore such physreg candidates. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D52052 llvm-svn: 343004	2018-09-25 18:37:38 +00:00
Sameer Sahasrabuddhe	b4f2d1cb68	[AMDGPU] restore r342722 which was reverted with r342743 [AMDGPU] lower-switch in preISel as a workaround for legacy DA Summary: The default target of the switch instruction may sometimes be an "unreachable" block, when it is guaranteed that one of the cases is always taken. The dominator tree concludes that such a switch instruction does not have an immediate post dominator. This confuses divergence analysis, which is unable to propagate sync dependence to the targets of the switch instruction. As a workaround, the AMDGPU target now invokes lower-switch as a preISel pass. LowerSwitch is designed to handle the unreachable default target correctly, allowing the divergence analysis to locate the correct immediate dominator of the now-lowered switch. llvm-svn: 342956	2018-09-25 09:39:21 +00:00
Stanislav Mekhanoshin	14fefe7f8e	[AMDGPU] Remove useless check from test. NFC. The check for assignment of zero is practically useless while the assignment moves around with different scheduling. llvm-svn: 342935	2018-09-25 01:24:54 +00:00
Matt Arsenault	f432011d33	AMDGPU: Fix private handling for allowsMisalignedMemoryAccesses If the alignment is at least 4, this should report true. Something still seems off with how < 4-byte types are handled here though. Fixing this seems to change how some combines get to where they get, but somehow isn't changing the net result. llvm-svn: 342879	2018-09-24 13:18:15 +00:00
Sameer Sahasrabuddhe	0807e94951	revert changes from r342722 "[AMDGPU] lower-switch in preISel as a workaround for legacy DA" This broke regression tests. The first breakage was noticed here: http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/23549 llvm-svn: 342743	2018-09-21 16:31:51 +00:00
Sameer Sahasrabuddhe	2de7653fd5	[AMDGPU] lower-switch in preISel as a workaround for legacy DA Summary: The default target of the switch instruction may sometimes be an "unreachable" block, when it is guaranteed that one of the cases is always taken. The dominator tree concludes that such a switch instruction does not have an immediate post dominator. This confuses divergence analysis, which is unable to propagate sync dependence to the targets of the switch instruction. As a workaround, the AMDGPU target now invokes lower-switch as a preISel pass. LowerSwitch is designed to handle the unreachable default target correctly, allowing the divergence analysis to locate the correct immediate dominator of the now-lowered switch. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, simoll Differential Revision: https://reviews.llvm.org/D52221 llvm-svn: 342722	2018-09-21 11:26:55 +00:00
Alexander Timofeev	36617f0160	[AMDGPU] Divergence driven instruction selection. Part 1. Summary: This change is the first part of the AMDGPU target description change. The aim of it is the effective splitting the vector and scalar flows at the selection stage. Selection uses predicate functions based on the framework implemented earlier - https://reviews.llvm.org/D35267 Differential revision: https://reviews.llvm.org/D52019 Reviewers: rampitec llvm-svn: 342719	2018-09-21 10:31:22 +00:00
Carl Ritson	6b8d75425e	[AMDGPU] Add instruction selection for i1 to f16 conversion Summary: This is required for GPUs with 16 bit instructions where f16 is a legal register type and hence int_to_fp i1 to f16 is not lowered by legalizing. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52018 Change-Id: Ie4c0fd6ced7cf10ad612023c6879724d9ded5851 llvm-svn: 342558	2018-09-19 16:32:12 +00:00
Farhana Aleen	f5a2848376	[AMDGPU] Match udot8 pattern Summary: D.u32 = S0.u4[0] * S1.u4[0] + S0.u4[1] * S1.u4[1] + S0.u4[2] * S1.u4[2] + S0.u4[3] * S1.u4[3] + S0.u4[4] * S1.u4[4] + S0.u4[5] * S1.u4[5] + S0.u4[6] * S1.u4[6] + S0.u4[7] * S1.u4[7] + S2.u32 Author: FarhanaAleen Reviewed By: arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D51947 llvm-svn: 342497	2018-09-18 16:59:48 +00:00
Matt Arsenault	ebf46143ea	AMDGPU: Don't form fmed3 if it will require materialization If there is a single use constant, it can be folded into the min/max, but not into med3. llvm-svn: 342443	2018-09-18 02:34:54 +00:00
Matt Arsenault	9d49c449ec	AMDGPU: Expand vector canonicalizes llvm-svn: 342439	2018-09-18 01:51:33 +00:00
David Stuttard	20de3e99b5	[AMDGPU] Ensure trig range reduction only used for subtargets that require it Summary: GFX9 and above support sin/cos instructions with a greater range and thus don't require a fract instruction prior to invocation. Added a subtarget feature to reflect this and added code to take advantage of expanded range on GFX9+ Also updated the tests to check correct behaviour Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51933 Change-Id: I1c1f1d3726a5ae32116646ca5cfa1ab4ef69e5b0 llvm-svn: 342222	2018-09-14 10:27:19 +00:00
Matt Arsenault	ff987ac6ea	AMDGPU: Fix not preserving alignment in call setups If an argument was passed on the stack, this was using the default alignment. I'm not sure there's an observable change from this. This was observable due to bugs in expansion of unaligned loads and stores, but since that is fixed I don't think this matters much. llvm-svn: 342133	2018-09-13 12:14:31 +00:00
Matt Arsenault	842cda6312	DAG: Fix expansion of unaligned FP loads and stores This was trying to scalarize a scalar FP type, resulting in an assert. Fixes unaligned f64 stack stores for AMDGPU. llvm-svn: 342132	2018-09-13 12:14:23 +00:00
Matt Arsenault	9de2fb58fa	AMDGPU: Fix some outdated datalayouts in tests llvm-svn: 342131	2018-09-13 11:56:28 +00:00
Alexander Timofeev	2fb44808b1	[AMDGPU] Preliminary patch for divergence driven instruction selection. Load offset inlining pattern changed. Differential revision: https://reviews.llvm.org/D51975 Reviewers: rampitec llvm-svn: 342115	2018-09-13 06:34:56 +00:00
Konstantin Zhuravlyov	71e43ee47d	AMDGPU: Re-apply r341982 after fixing the layering issue Move isa version determination into TargetParser. Also switch away from target features to CPU string when determining isa version. This fixes an issue when we output wrong isa version in the object code when features of a particular CPU are altered (i.e. gfx902 w/o xnack used to result in gfx900). llvm-svn: 342069	2018-09-12 18:50:47 +00:00
Ilya Biryukov	95066496d0	Revert "AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into TargetParser." This reverts commit r341982. The change introduced a layering violation. Reverting to unbreak our integrate. llvm-svn: 342023	2018-09-12 07:05:30 +00:00
Konstantin Zhuravlyov	941615e4c8	AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into TargetParser. Also switch away from target features to CPU string when determining isa version. This fixes an issue when we output wrong isa version in the object code when features of a particular CPU are altered (i.e. gfx902 w/o xnack used to result in gfx900). Differential Revision: https://reviews.llvm.org/D51890 llvm-svn: 341982	2018-09-11 18:56:51 +00:00
Alexander Timofeev	db7ee7660a	[AMDGPU] Preliminary patch for divergence driven instruction selection. Immediate selection predicate changed Differential revision: https://reviews.llvm.org/D51734 Reviewers: rampitec llvm-svn: 341928	2018-09-11 11:56:50 +00:00
Matt Arsenault	d0cf1b26d4	AMDGPU: Fix r600 test llvm-svn: 341898	2018-09-11 04:39:16 +00:00
Matt Arsenault	99c780159d	AMDGPU: Don't error on out of bounds address spaces We should never abort on valid IR. The most reasonable interpretation of an arbitrary address space pointer is probably some kind of special subset of global memory. llvm-svn: 341894	2018-09-11 04:00:41 +00:00
Alexander Timofeev	20cbe6f319	[AMDGPU] Preliminary patch for divergence driven instruction selection. Inline immediate move to V_MADAK_F32. Differential revision: https://reviews.llvm.org/D51586 Reviewer: rampitec llvm-svn: 341843	2018-09-10 16:42:49 +00:00
Matt Arsenault	7f6dc597d3	AMDGPU: Stop reporting is-noop addrspacecast for constant 32-bit This will require something to cast. Before this would eliminate the cast, which would result in copies of $noreg. llvm-svn: 341803	2018-09-10 11:59:27 +00:00
Matt Arsenault	57b5966dad	DAG: Handle odd vector sizes in calling conv splitting This already worked if only one register piece was used, but didn't if a type was split into multiple, unequal sized pieces. Fixes not splitting 3i16/v3f16 into two registers for AMDGPU. This will also allow fixing the ABI for 16-bit vectors in a future commit so that it's the same for all subtargets. llvm-svn: 341801	2018-09-10 11:49:23 +00:00
Carl Ritson	f898edd117	[AMDGPU] Prevent sequences of non-instructions disrupting GCNHazardRecognizer wait state counting Summary: This fixes a bug where a large number of implicit def instructions can fill the GCNHazardRecognizer lookahead buffer causing required NOPs to not be inserted. Reviewers: nhaehnle, arsenm Reviewed By: arsenm Subscribers: sheredom, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51726 Change-Id: Ie75338f94de704ee5816b05afd0c922c6748a95b llvm-svn: 341798	2018-09-10 10:14:48 +00:00
Matt Arsenault	72d27f5525	AMDGPU: Fix tests using old number for constant address space llvm-svn: 341770	2018-09-10 02:54:25 +00:00
Matt Arsenault	d77fcc2a92	AMDGPU: Use GOT PSV since it has an address space now llvm-svn: 341768	2018-09-10 02:23:39 +00:00
Matt Arsenault	b998674610	AMDGPU: Don't abort on unknown addrspace argument llvm-svn: 341767	2018-09-10 02:23:30 +00:00
Alexander Timofeev	a805c96c65	[AMDGPU] Preliminary patch for divergence driven instruction selection. Fold immediate SMRD offset. Differential revision: https://reviews.llvm.org/D51610 Reviewer: rampitec llvm-svn: 341636	2018-09-07 09:05:34 +00:00
Scott Linder	834cbc645c	Revert r341413 Causes a regression in expensive checks. llvm-svn: 341589	2018-09-06 21:38:56 +00:00
Scott Linder	dfe089dfd1	[AMDGPU] Legalize VGPR Rsrc operands for MUBUF instructions Emit a waterfall loop in the general case for a potentially-divergent Rsrc operand. When practical, avoid this by using Addr64 instructions. Differential Revision: https://reviews.llvm.org/D50982 llvm-svn: 341413	2018-09-04 21:50:47 +00:00
Matt Arsenault	813613c494	AMDGPU: Fix DAG divergence not reporting flat loads Match behavior in DAG of r340343 llvm-svn: 341393	2018-09-04 18:58:19 +00:00
Matt Arsenault	ca25b58957	DAG: Handle extract_vector_elt in isKnownNeverNaN llvm-svn: 341317	2018-09-03 14:01:03 +00:00
Tom Stellard	ffc6bd6f3d	AMDGPU/GlobalISel: Define instruction mapping for G_SELECT Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D49737 llvm-svn: 341271	2018-09-01 02:41:19 +00:00
Stanislav Mekhanoshin	44451b3344	[AMDGPU] Split v32i32 loads Differential Revision: https://reviews.llvm.org/D51555 llvm-svn: 341266	2018-08-31 22:43:36 +00:00
Matt Arsenault	bf07a50a98	AMDGPU: Restrict extract_vector_elt combine to loads The intention is to enable the extract_vector_elt load combine, and doing this for other operations interferes with more useful optimizations on vectors. Handle any type of load since in principle we should do the same combine for the various load intrinsics. llvm-svn: 341219	2018-08-31 15:39:52 +00:00
Matt Arsenault	6f35f0c212	AMDGPU: Actually commit re-run of update_llc_test_checks llvm-svn: 341218	2018-08-31 15:05:06 +00:00
Matt Arsenault	28c16bd534	AMDGPU: Fix broken generated check lines This was incorrectly using the same check prefix for multiple lines llvm-svn: 341214	2018-08-31 14:34:22 +00:00
Matt Arsenault	65e43cade8	AMDGPU: Remove obsolete tests llvm-svn: 341169	2018-08-31 06:07:45 +00:00
Matt Arsenault	988df63525	AMDGPU: Stop forcing internalize at -O0 This doesn't really matter if clang is always emitting the visibility as hidden by default. llvm-svn: 341168	2018-08-31 06:02:36 +00:00
Matt Arsenault	0da6350dc8	AMDGPU: Remove remnants of old address space mapping llvm-svn: 341165	2018-08-31 05:49:54 +00:00
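
Commit a37679d67b in the list above describes splitting an out-of-range constant offset into a register part (offset&-4096) and an in-range immediate part, and skipping that split when the offset is negative. The following is a minimal Python sketch of that arithmetic, not the LLVM implementation. The name `split_offset`, the constant `MAX_IMM`, and the choice to put a negative offset entirely in the register part with a zero immediate are assumptions made for illustration; the commit message only says the split is disabled for negative offsets.

```python
MAX_IMM = 4095  # assumed largest encodable immediate offset (12 bits)


def split_offset(offset: int) -> tuple[int, int]:
    """Return (register_part, immediate_part) for a constant byte offset.

    Hypothetical helper: models the split described in the commit message,
    with the negative case handled by an assumed fallback.
    """
    if 0 <= offset <= MAX_IMM:
        # Already in range: everything fits in the immediate field.
        return 0, offset
    if offset < 0:
        # offset & -4096 would be negative here, e.g. -4 & -4096 == -4096,
        # which the commit message says is illegal in a vgpr on its own.
        # Assumption: skip the split and fold the whole constant into the
        # register part, where it is added to any existing variable offset.
        return offset, 0
    # Out of range and non-negative: the high bits go to the register part,
    # which similar loads/stores can share, and the low 12 bits stay immediate.
    overflow = offset & -4096
    return overflow, offset - overflow


print(split_offset(100))   # (0, 100)
print(split_offset(5000))  # (4096, 904)
print(split_offset(-4))    # (-4, 0)
```

The point of the split is that nearby accesses such as offsets 5000 and 5004 share the same register part (4096), so the move/add that materializes it has a chance of being CSEd.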


1822 Commits
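
The udot8 formula quoted in commit f5a2848376 (eight 4-bit unsigned lane products from S0 and S1, summed and added to S2) can be checked with a small reference model. This is a sketch in Python rather than the pattern-matching code from the commit. The function name `udot8` is taken from the commit title, and wrapping the result to 32 bits is an assumption based on the `D.u32` destination in the formula.

```python
def udot8(s0: int, s1: int, s2: int) -> int:
    """Reference model of D.u32 = sum(S0.u4[i] * S1.u4[i] for i in 0..7) + S2.u32."""
    acc = s2
    for lane in range(8):
        a = (s0 >> (4 * lane)) & 0xF  # S0.u4[lane]
        b = (s1 >> (4 * lane)) & 0xF  # S1.u4[lane]
        acc += a * b
    # Assumption: the 32-bit destination wraps on overflow.
    return acc & 0xFFFFFFFF


print(udot8(0x21, 0x43, 5))                # 1*3 + 2*4 + 5 = 16
print(udot8(0xFFFFFFFF, 0xFFFFFFFF, 0))    # 8 * 15 * 15 = 1800
```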