llvm-project

Author	SHA1	Message	Date
Thomas Symalla	718aec209c	[AMDGPU] Improve v_cmpx usage on GFX10.3. On GFX10.3 targets, the following instruction sequence v_cmp_* SGPR, ... s_and_saveexec ..., SGPR leads to a fairly long stall caused by a VALU write to a SGPR and having the following SALU wait for the SGPR. An equivalent sequence is to save the exec mask manually instead of letting s_and_saveexec do the work and use a v_cmpx instruction instead to do the comparison. This patch modifies the SIOptimizeExecMasking pass as this is the last position where s_and_saveexec instructions are inserted. It does the transformation by trying to find the pattern, extracting the operands and generating the new instruction sequence. It also changes some existing lit tests and introduces a few new tests to show the changed behavior on GFX10.3 targets. Same as D119696 including a buildbot and MIR test fix. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D122332	2022-03-25 11:40:18 +01:00
Thomas Symalla	7de6107dce	Revert "[AMDGPU] Improve v_cmpx usage on GFX10.3." This reverts commit 011c64191ef9ccc6538d52f4b57f98f37d4ea36e and e725e2afe02e18398525652c9bceda1eb055ea64. Differential Revision: https://reviews.llvm.org/D122117	2022-03-21 09:50:44 +01:00
Thomas Symalla	011c64191e	[AMDGPU] Improve v_cmpx usage on GFX10.3. On GFX10.3 targets, the following instruction sequence v_cmp_* SGPR, ... s_and_saveexec ..., SGPR leads to a fairly long stall caused by a VALU write to a SGPR and having the following SALU wait for the SGPR. An equivalent sequence is to save the exec mask manually instead of letting s_and_saveexec do the work and use a v_cmpx instruction instead to do the comparison. This patch modifies the SIOptimizeExecMasking pass as this is the last position where s_and_saveexec instructions are inserted. It does the transformation by trying to find the pattern, extracting the operands and generating the new instruction sequence. It also changes some existing lit tests and introduces a few new tests to show the changed behavior on GFX10.3 targets. Reviewed By: sebastian-ne, critson Differential Revision: https://reviews.llvm.org/D119696	2022-03-21 09:31:59 +01:00
Carl Ritson	565af157ef	[AMDGPU] Extend pre-emit peephole to redundantly masked VCC Extend pre-emit peephole for S_CBRANCH_VCC[N]Z to eliminate redundant S_AND operations against EXEC for V_CMP results in VCC. These occur after after register allocation when VCC has been selected as the comparison destination. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D120202	2022-02-25 10:18:31 +09:00
Ruiling Song	0719c43735	AMDGPU: Don't clobber source register for V_SET_INACTIVE_* The WWM register has unmodeled register liveness, For v_set_inactive_*, clobberring source register is dangerous because it will overwrite the inactive lanes. When the source vgpr is dead at v_set_inactive_lane, the inactive lanes may be not really dead. This may make common optimizations doing wrong. For example in a simple if-then cfg in Machine IR: bb.if: %src = bb.then: %src1 = COPY %src %dst = V_SET_INACTIVE %src1(tied-def 0), %inactive bb.end ... = PHI [0, %bb.then] [%src, %bb.if] The register coalescer will think it is safe to optimize "%src1 = COPY %src" in bb.then. And at the same time, there is no interference for the PHI in bb.end. The source and destination values of the PHI will be assigned the same register. The single PHI register will be overwritten by the v_set_inactive, then we would get wrong value in bb.end. With this change, we will copy the content of the source register before setting inactive lanes after register allocation. Yes, this will sacrifice the WWM code generation a little, but I don't have any better idea to do things correctly. Differential Revision: https://reviews.llvm.org/D117482	2022-02-06 12:38:26 +08:00
Konstantina	aa418b9133	[AMDGPU][SIWholeQuadMode] Use the right VCC register to activate the correct lanes. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D118096	2022-01-26 08:54:39 -08:00
Jay Foad	0bc14a0a98	[AMDGPU] Tweak some compares in wqm.ll test This prevents the compares from being optimized away when D86578 lands, which seems unintended. Also fixed some unused results.	2022-01-19 12:42:56 +00:00
Carl Ritson	ed4d8fdafd	[AMDGPU] Autogenerate wqm.ll Switch wqm.ll to be autogenerated. Replace gfx6 and gfx8 targets with gfx9 (wave64) and gfx10 (wave32). Reviewed By: kmitropoulou Differential Revision: https://reviews.llvm.org/D117455	2022-01-18 16:04:40 +09:00
RamNalamothu	18f9351223	[AMDGPU] Do not generate ELF symbols for the local branch target labels The compiler was generating symbols in the final code object for local branch target labels. This bloats the code object, slows down the loader, and is only used to simplify disassembly. Use '--symbolize-operands' with llvm-objdump to improve readability of the branch target operands in disassembly. Fixes: SWDEV-312223 Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D114273	2021-11-20 10:32:41 +05:30
Carl Ritson	13877db2fa	[AMDGPU] Fix shortfalls in WQM marking When tracking defined lanes through phi nodes in the live range graph each branch of the phi must be handled independently. Also rewrite the marking algorithm to reduce unnecessary operations. Previously a shared set of defined lanes was used which caused marking to stop prematurely. This was observable in existing lit tests, but test patterns did not cover this detail. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D98614	2021-03-15 21:44:15 +09:00
Piotr Sobczak	4672bac177	[AMDGPU] Introduce Strict WQM mode * Add amdgcn_strict_wqm intrinsic. * Add a corresponding STRICT_WQM machine instruction. * The semantic is similar to amdgcn_strict_wwm with a notable difference that not all threads will be forcibly enabled during the computations of the intrinsic's argument, but only all threads in quads that have at least one thread active. * The difference between amdgc_wqm and amdgcn_strict_wqm, is that in the strict mode an inactive lane will always be enabled irrespective of control flow decisions. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D96258	2021-03-03 14:19:16 +01:00
Piotr Sobczak	c3ce7bae80	[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm * Introduce the new intrinsic amdgcn_strict_wwm * Deprecate the old intrinsic amdgcn_wwm The change is done for consistency as the "strict" prefix will become an important, distinguishing factor between amdgcn_wqm and amdgcn_strictwqm in the future. The "strict" prefix indicates that inactive lanes do not take part in control flow, specifically an inactive lane enabled by a strict mode will always be enabled irrespective of control flow decisions. The amdgcn_wwm will be removed, but doing so in two steps gives users time to switch to the new name at their own pace. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D96257	2021-03-03 09:33:57 +01:00
Carl Ritson	c16f776028	[AMDGPU] Move kill lowering to WQM pass and add live mask tracking Move implementation of kill intrinsics to WQM pass. Add live lane tracking by updating a stored exec mask when lanes are killed. Use live lane tracking to enable early termination of shader at any point in control flow. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D94746	2021-02-11 20:31:29 +09:00
Mircea Trofin	ee57d30f44	[NFC] Removed unused prefixes from CodeGen/AMDGPU Last bulk batch. Differential Revision: https://reviews.llvm.org/D94236	2021-01-07 09:48:14 -08:00
Carl Ritson	7a880ab388	[AMDGPU] Move WQM Pass after MI Scheduler Exec mask manipulation inserted by SIWholeQuadMode barriers to instruction scheduling. Move the entire pass after the machine instruction scheduler and make changes so pass is correct for non-SSA operation. These changes should leave the pass still usable pre-scheduler, although tests have be updated to reflect post-scheduler results. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D88081	2020-10-27 10:25:53 +09:00
Matt Arsenault	27df165270	Revert "[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel." This reverts commit c3492a1aa1b98c8d81b0969d52cea7681f0624c2. I think this is the wrong strategy and wrong place to do this transform anyway. Also reverts follow up commit 7d593d0d6905b55ca1124fca5e4d1ebb17203138.	2020-09-18 09:48:33 -04:00
Michael Liao	c3492a1aa1	[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel. - Need to lower COPY from SGPR to VGPR to a real instruction as the standard COPY is used where the source and destination are from the same register bank so that we potentially coalesc them together and save one COPY. Considering that, backend optimizations, such as CSE, won't handle them. However, the copy from SGPR to VGPR always needs materializing to a native instruction, it should be lowered into a real one before other backend optimizations. Differential Revision: https://reviews.llvm.org/D87556	2020-09-17 11:04:17 -04:00
Carl Ritson	674226126d	[AMDGPU] Apply pre-emit s_cbranch_vcc optimation to more patterns Add handling of s_andn2 and mask of 0. This eliminates redundant instructions from uniform control flow. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D83641	2020-07-15 11:02:35 +09:00
Konstantin Pyzhov	72e8754916	[AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU. Reviewers: sameerds, dstuttard Differential Revision: https://reviews.llvm.org/D77228	2020-04-06 09:05:58 -04:00
Konstantin Pyzhov	51dc028314	Revert e1730cfeb3588f20dcf4a96b181ad52761666e52	2020-04-06 05:56:11 -04:00
Konstantin Pyzhov	e1730cfeb3	[AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU. Reviewers: sameerds, dstuttard Differential Revision: https://reviews.llvm.org/D77228	2020-04-06 05:10:37 -04:00
Scott Linder	60b1967c39	[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to removes the scratch wave offset register from the calling convention ABI. As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative. Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first. Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention. Tags: #llvm Differential Revision: https://reviews.llvm.org/D75138	2020-03-19 15:35:16 -04:00
cdevadas	e53a9d96e6	Resubmit: [AMDGPU] Invert the handling of skip insertion. The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a sideeffect and the skip branch is really necessary. This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass. Differential revision: https://reviews.llvm.org/D68092	2020-01-22 13:18:32 +09:00
Nicolai Hähnle	a80291ce10	Revert "[AMDGPU] Invert the handling of skip insertion." This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd. The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for Mesa.	2020-01-21 09:17:25 +01:00
Matt Arsenault	20ca49b646	AMDGPU: Update tests to use modern buffer intrinsics	2020-01-16 13:49:43 -05:00
cdevadas	0dc6c249bf	[AMDGPU] Invert the handling of skip insertion. The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a sideeffect and the skip branch is really necessary. This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass. Differential revision: https://reviews.llvm.org/D68092	2020-01-15 15:18:16 +05:30
Michael Bedy	4a32cd11ac	[AMDGPU] Remove unnecessary v_mov from a register to itself in WQM lowering. Summary: - SI Whole Quad Mode phase is replacing WQM pseudo instructions with v_mov instructions. While this is necessary for the special handling of moving results out of WWM live ranges, it is not necessary for WQM live ranges. The result is a v_mov from a register to itself after every WQM operation. This change uses a COPY psuedo in these cases, which allows the register allocator to coalesce the moves away. Reviewers: tpr, dstuttard, foad, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71386	2020-01-10 23:01:19 -05:00
Jordan Rupprecht	f9f81289e6	Revert [MBP] Disable aggressive loop rotate in plain mode This reverts r369664 (git commit 51f48295cbe8fa3a44db263b528dd9f7bae7bf9a) It causes many benchmark regressions, internally and in llvm's benchmark suite. llvm-svn: 370398	2019-08-29 19:03:58 +00:00
Guozhi Wei	51f48295cb	[MBP] Disable aggressive loop rotate in plain mode Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse. To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true. Differential Revision: https://reviews.llvm.org/D65673 llvm-svn: 369664	2019-08-22 16:21:32 +00:00
Hans Wennborg	a45f301f7a	Revert r368339 "[MBP] Disable aggressive loop rotate in plain mode" It caused assertions to fire when building Chromium: lib/CodeGen/LiveDebugValues.cpp:331: bool {anonymous}::LiveDebugValues::OpenRangesSet::empty() const: Assertion `Vars.empty() == VarLocs.empty() && "open ranges are inconsistent"' failed. See https://crbug.com/992871#c3 for how to reproduce. > Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse. > > To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true. > > Differential Revision: https://reviews.llvm.org/D65673 llvm-svn: 368579	2019-08-12 14:23:13 +00:00
Guozhi Wei	80347c3acc	[MBP] Disable aggressive loop rotate in plain mode Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse. To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true. Differential Revision: https://reviews.llvm.org/D65673 llvm-svn: 368339	2019-08-08 20:25:23 +00:00
Guozhi Wei	d2210af332	[MBP] Move a latch block with conditional exit and multi predecessors to top of loop Current findBestLoopTop can find and move one kind of block to top, a latch block has one successor. Another common case is: * a latch block * it has two successors, one is loop header, another is exit * it has more than one predecessors If it is below one of its predecessors P, only P can fall through to it, all other predecessors need a jump to it, and another conditional jump to loop header. If it is moved before loop header, all its predecessors jump to it, then fall through to loop header. So all its predecessors except P can reduce one taken branch. Differential Revision: https://reviews.llvm.org/D43256 llvm-svn: 363471	2019-06-14 23:08:59 +00:00
Stanislav Mekhanoshin	971cb8b633	[AMDGPU] gfx1010: prefer V_MUL_LO_U32 over V_MUL_LO_I32 GFX10 deprecates v_mul_lo_i32 instruction, so choose u32 form for all targets. Differential Revision: https://reviews.llvm.org/D61525 llvm-svn: 360094	2019-05-06 22:27:05 +00:00
Rhys Perry	c2814e12e7	AMDGPU: Force skip over SMRD, VMEM and s_waitcnt instructions Summary: This fixes a large Dawn of War 3 performance regression with RADV from Mesa 19.0 to master which was caused by creating less code in some branches. Reviewers: arsen, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60824 llvm-svn: 358592	2019-04-17 16:31:52 +00:00
Matt Arsenault	aa9bcd56b1	AMDGPU: Remove llvm.AMDGPU.kill This is the last of the old AMDGPU intrinsics. llvm-svn: 348615	2018-12-07 17:46:16 +00:00
Alexander Timofeev	db7ee7660a	[AMDGPU] Preliminary patch for divergence driven instruction selection. Immediate selection predicate changed Differential revision: https://reviews.llvm.org/D51734 Reviewers: rampitec llvm-svn: 341928	2018-09-11 11:56:50 +00:00
Tim Renouf	abd85fb1f5	[AMDGPU] Reworked SIFixWWMLiveness Summary: I encountered some problems with SIFixWWMLiveness when WWM is in a loop: 1. It sometimes gave invalid MIR where there is some control flow path to the new implicit use of a register on EXIT_WWM that does not pass through any def. 2. There were lots of false positives of registers that needed to have an implicit use added to EXIT_WWM. 3. Adding an implicit use to EXIT_WWM (and adding an implicit def just before the WWM code, which I tried in order to fix (1)) caused lots of the values to be spilled and reloaded unnecessarily. This commit is a rework of SIFixWWMLiveness, with the following changes: 1. Instead of considering any register with a def that can reach the WWM code and a def that can be reached from the WWM code, it now considers three specific cases that need to be handled. 2. A register that needs liveness over WWM to be synthesized now has it done by adding itself as an implicit use to defs other than the dominant one. Also added the following fixmes: FIXME: We should detect whether a register in one of the above categories is already live at the WWM code before deciding to add the implicit uses to synthesize its liveness. FIXME: I believe this whole scheme may be flawed due to the possibility of the register allocator doing live interval splitting. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46756 Change-Id: Ie7fba0ede0378849181df3f1a9a7a39ed1a94a94 llvm-svn: 338783	2018-08-02 23:31:32 +00:00
Nicolai Haehnle	1045928aab	AMDGPU: Convert test cases to the dimension-aware intrinsics Summary: Also explicitly port over some tests in llvm.amdgcn.image.* that were missing. Some tests are removed because they no longer apply (i.e. explicitly testing building an address vector via insertelement). This is in preparation for the eventual removal of the old-style intrinsics. Some additional notes: - constant-address-space-32bit.ll: change some GCN-NEXT to GCN because the instruction schedule was subtly altered - insert_vector_elt.ll: the old test didn't actually test anything, because %tmp1 was not used; remove the load, because it doesn't work (Because of the amdgpu_ps calling convention? In any case, it's orthogonal to what the test claims to be testing.) Change-Id: Idfa99b6512ad139e755e82b8b89548ab08f0afcf Reviewers: arsenm, rampitec Subscribers: MatzeB, qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D48018 llvm-svn: 335229	2018-06-21 13:37:19 +00:00
Tim Renouf	364edcd2e5	[AMDGPU] Fixed WWM bug in block otherwise entirely in WQM Summary: For a block with WQM on entry and exit and containing no exact mode code, but containing some WWM code, the WQM pass forgot to process the block at all and so did not insert code to enter and leave WWM. This commit fixes that. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47027 Change-Id: I044792eead1293bed4203fb26ce75f47878afeb6 llvm-svn: 333362	2018-05-27 17:26:11 +00:00
Yaxun Liu	2a22c5deff	[AMDGPU] Switch to the new addr space mapping by default This requires corresponding clang change. Differential Revision: https://reviews.llvm.org/D40955 llvm-svn: 324101	2018-02-02 16:07:16 +00:00
Dmitry Preobrazhensky	a0342dc9eb	[AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev} See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765 Reviewers: tamazov, SamWot, arsenm, vpykhtin Differential Revision: https://reviews.llvm.org/D40088 llvm-svn: 318675	2017-11-20 18:24:21 +00:00
Stanislav Mekhanoshin	d4ae470d2e	[AMDGPU] Prevent post-RA scheduler from breaking memory clauses The pre-RA scheduler does load/store clustering, but post-RA scheduler undoes it. Add mutation to prevent it. Differential Revision: https://reviews.llvm.org/D38014 llvm-svn: 313670	2017-09-19 20:54:38 +00:00
Tim Renouf	660ba2b8af	[AMDGPU] exp should not be in WQM mode A mrt exp with vm=1 must be in exact (non-WQM) mode, as it also exports the exec mask as the valid mask to determine which pixels to render. This commit marks any exp as needing to be in exact mode. Actually, if there are multiple mrt exps, only one needs to have vm=1, and only that one needs to be in exact mode. But that is an optimization for another day. Differential Revision: https://reviews.llvm.org/D36305 llvm-svn: 312915	2017-09-11 13:55:39 +00:00
Connor Abbott	66b9bd6e50	[AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic Summary: This intrinsic lets us set inactive lanes to an identity value when implementing wavefront reductions. In combination with Whole Wavefront Mode, it lets inactive lanes be skipped over as required by GLSL/Vulkan. Lowering the intrinsic needs to happen post-RA so that RA knows that the destination isn't completely overwritten due to the EXEC shenanigans, so we need another pseudo-instruction to represent the un-lowered intrinsic. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34719 llvm-svn: 310088	2017-08-04 18:36:54 +00:00
Connor Abbott	92638ab625	[AMDGPU] Add support for Whole Wavefront Mode Summary: Whole Wavefront Wode (WWM) is similar to WQM, except that all of the lanes are always enabled, regardless of control flow. This is required for implementing wavefront reductions in non-uniform control flow, where we need to use the inactive lanes to propagate intermediate results, so they need to be enabled. We need to propagate WWM to uses (unless they're explicitly marked as exact) so that they also propagate intermediate results correctly. We do the analysis and exec mask munging during the WQM pass, since there are interactions with WQM for things that require both WQM and WWM. For simplicity, WWM is entirely block-local -- blocks are never WWM on entry or exit of a block, and WWM is not propagated to the block level. This means that computations involving WWM cannot involve control flow, but we only ever plan to use WWM for a few limited purposes (none of which involve control flow) anyways. Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There isn't yet a way to turn WWM off -- that will be added in a future change. Finally, it turns out that turning on inactive lanes causes a number of problems with register allocation. While the best long-term solution seems like teaching LLVM's register allocator about predication, for now we need to add some hacks to prevent ourselves from getting into trouble due to constraints that aren't currently expressed in LLVM. For the gory details, see the comments at the top of SIFixWWMLiveness.cpp. Reviewers: arsenm, nhaehnle, tpr Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D35524 llvm-svn: 310087	2017-08-04 18:36:52 +00:00
Connor Abbott	8c217d0a29	[AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM Summary: Previously, we assumed that certain types of instructions needed WQM in pixel shaders, particularly DS instructions and image sampling instructions. This was ok because with OpenGL, the assumption was correct. But we want to start using DPP instructions for derivatives as well as other things, so the assumption that we can infer whether to use WQM based on the instruction won't continue to hold. This intrinsic lets frontends like Mesa indicate what things need WQM based on their knowledge of the API, rather than second-guessing them in the backend. We need to keep around the old method of enabling WQM, but eventually we should remove it once Mesa catches up. For now, this will let us use DPP instructions for computing derivatives correctly. Reviewers: arsenm, tpr, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D35167 llvm-svn: 310085	2017-08-04 18:36:49 +00:00
Stanislav Mekhanoshin	56ea488d8b	[AMDGPU] Allow SDWA in instructions with immediates and SGPRs An encoding does not allow to use SDWA in an instruction with scalar operands, either literals or SGPRs. That is however possible to copy these operands into a VGPR first. Several copies of the value are produced if multiple SDWA conversions were done. To cleanup MachineLICM (to hoist copies out of loops), MachineCSE (to remove duplicate copies) and SIFoldOperands (to replace SGPR to VGPR copy with immediate copy right to the VGPR) runs are added after the SDWA pass. Differential Revision: https://reviews.llvm.org/D33583 llvm-svn: 304219	2017-05-30 16:49:24 +00:00
Matt Arsenault	964a848514	AMDGPU: Convert image intrinsic uses in tests llvm-svn: 298386	2017-03-21 16:24:12 +00:00
Matt Arsenault	707780b420	AMDGPU: Always allocate emergency stack slot at offset 0 This allows us to ensure that 0 is never a valid pointer to a user object, and ensures that the offset is always legal without needing a register to access it. This comes at the cost of usable offsets and wasted stack space. llvm-svn: 295877	2017-02-22 21:05:25 +00:00
Matt Arsenault	3ea06336fc	AMDGPU: Remove some uses of llvm.SI.export in tests Merge some of the old, smaller tests into more complete versions. llvm-svn: 295792	2017-02-22 00:02:21 +00:00

1 2

71 Commits