llvm-project

Author	SHA1	Message	Date
Jon Chesterfield	253f804deb	[amdgpu] Update med3 combine to skip i64 [amdgpu] Update med3 combine to skip i64 Fixes an assumption that a type which is not i32 will be i16. This asserts when trying to sign/zero extend an i64 to i32. Test case was cut down from an openmp application. Variations on it are hit by other combines before reaching the problematic one, e.g. replacing the immediate values with other function arguments changes the codegen path and misses this combine. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D98872	2021-03-18 15:56:41 +00:00
Jay Foad	078b338ba6	[AMDGPU] Add some gfx1010 test coverage. NFC.	2021-03-18 14:00:07 +00:00
Matt Arsenault	b9a0384983	GlobalISel: Preserve source value information for outgoing byval args Pass through the original argument IR value in order to preserve the aliasing information in the memcpy memory operands.	2021-03-18 09:16:54 -04:00
Matt Arsenault	61f834cc09	GlobalISel: Insert memcpy for outgoing byval arguments byval requires an implicit copy between the caller and callee such that the callee may write into the stack area without it modifying the value in the parent. Previously, this was passing through the raw pointer value which would break if the callee wrote into it. Most of the time, this copy can be optimized out (however we don't have the optimization SelectionDAG does yet). This will trigger more fallbacks for AMDGPU now, since we don't have legalization for memcpy yet (although we should stop using byval anyway).	2021-03-18 09:16:54 -04:00
Simon Pilgrim	388fbefb4f	[AMDGPU] Regenerate atomic_optimizations_global_pointer.ll tests	2021-03-18 11:15:44 +00:00
Simon Pilgrim	cfc256ba9f	[DAG] TargetLowering::isBinOp() - add ISD::SSUBSAT/USUBSAT Add to the generic non-commutative binop list.	2021-03-17 14:51:00 +00:00
Simon Pilgrim	4a68740547	Revert rG3b635253ddd0106c88051cff3540d8eb90bee22f "[AMDGPU] Regenerate wave32.ll test checks" Breaks on some buildbots.	2021-03-17 11:47:09 +00:00
Simon Pilgrim	3b635253dd	[AMDGPU] Regenerate wave32.ll test checks This is to help simplify the diff on an upcoming patch	2021-03-17 11:27:11 +00:00
Stanislav Mekhanoshin	bc27a31801	[AMDGPU] Fix copyPhysReg to not produce unalined vgpr access RA can insert something like a sub1_sub2 COPY of a wide VGPR tuple which results in the unaligned acces with v_pk_mov_b32 after the copy is expanded. This is regression after D97316. Differential Revision: https://reviews.llvm.org/D98549	2021-03-15 14:14:30 -07:00
Stanislav Mekhanoshin	3bffb1cd0e	[AMDGPU] Use single cache policy operand Replace individual operands GLC, SLC, and DLC with a single cache_policy bitmask operand. This will reduce the number of operands in MIR and I hope the amount of code. These operands are mostly 0 anyway. Additional advantage that parser will accept these flags in any order unlike now. Differential Revision: https://reviews.llvm.org/D96469	2021-03-15 13:00:59 -07:00
Jon Chesterfield	13e49dcee4	[amdgpu] Implement lower function LDS pass [amdgpu] Implement lower function LDS pass Local variables are allocated at kernel launch. This pass collects global variables that are used from non-kernel functions, moves them into a new struct type, and allocates an instance of that type in every kernel. Uses are then replaced with a constantexpr offset. Prior to this pass, accesses from a function are compiled to trap. With this pass, most such accesses are removed before reaching codegen. The trap logic is left unchanged by this pass. It is still reachable for the cases this pass misses, notably the extern shared construct from hip and variables marked constant which survive the optimizer. This is of interest to the openmp project because the deviceRTL runtime library uses cuda shared variables from functions that cannot be inlined. Trunk llvm therefore cannot compile some openmp kernels for amdgpu. In addition to the unit tests attached, this patch applied to ROCm llvm with fixed-abi enabled and the function pointer hashing scheme deleted passes the openmp suite. This lowering will use more LDS than strictly necessary. It is intended to be a functionally correct fallback for cases that are difficult to target from future optimisation passes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D94648	2021-03-15 15:24:01 +00:00
Carl Ritson	13877db2fa	[AMDGPU] Fix shortfalls in WQM marking When tracking defined lanes through phi nodes in the live range graph each branch of the phi must be handled independently. Also rewrite the marking algorithm to reduce unnecessary operations. Previously a shared set of defined lanes was used which caused marking to stop prematurely. This was observable in existing lit tests, but test patterns did not cover this detail. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D98614	2021-03-15 21:44:15 +09:00
Roman Lebedev	78b8ce40ef	Reland [SCEV] Improve modelling for (null) pointer constants This reverts commit 329aeb5db43f5e69df038fb20d2def77fe6f8595, and relands commit 61f006ac655431bd44b9e089f74c73bec0c1a48c. This is a continuation of D89456. As it was suggested there, now that SCEV models `PtrToInt`, we can try to improve SCEV's pointer handling. In particular, i believe, i will need this in the future to further fix `SCEVAddExpr`operation type handling. This removes special handling of `ConstantPointerNull` from `ScalarEvolution::createSCEV()`, and add constant folding into `ScalarEvolution::getPtrToIntExpr()`. This way, `null` constants stay as such in SCEV's, but gracefully become zero integers when asked. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D98147	2021-03-13 16:05:34 +03:00
Roman Lebedev	329aeb5db4	Temporairly evert "[SCEV] Improve modelling for (null) pointer constants" This appears to have broken ubsan bot: https://lab.llvm.org/buildbot/#/builders/85/builds/3062 https://reviews.llvm.org/D98147#2623549 It looks like LSR needs some kind of a change around insertion point handling. Reverting until i have a fix. This reverts commit 61f006ac655431bd44b9e089f74c73bec0c1a48c.	2021-03-13 09:10:28 +03:00
Roman Lebedev	61f006ac65	[SCEV] Improve modelling for (null) pointer constants This is a continuation of D89456. As it was suggested there, now that SCEV models `PtrToInt`, we can try to improve SCEV's pointer handling. In particular, i believe, i will need this in the future to further fix `SCEVAddExpr`operation type handling. This removes special handling of `ConstantPointerNull` from `ScalarEvolution::createSCEV()`, and add constant folding into `ScalarEvolution::getPtrToIntExpr()`. This way, `null` constants stay as such in SCEV's, but gracefully become zero integers when asked. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D98147	2021-03-12 22:11:58 +03:00
Simonas Kazlauskas	a2eca31da2	Test cases for rem-seteq fold with illegal types This also briefly tests a larger set of architectures than the more exhaustive functionality tests for AArch64 and x86. As requested in D88785 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D98339	2021-03-12 16:28:04 +02:00
Matt Arsenault	6b76d82853	GlobalISel: Fix marking byval arguments as immutable byval arguments need to be assumed writable. Only implicitly stack passed arguments which aren't addressable in the IR can be assumed immutable. Mips is still broken since for some reason its doing its own thing with the ValueHandlers (and x86 doesn't actually handle byval arguments now, although some of the code is there).	2021-03-12 09:01:53 -05:00
Matt Arsenault	34471c3060	GlobalISel: Partially fix handling of byval arguments This was essentially ignoring byval and treating them as a pointer argument which needed to be loaded from. This should copy the frame index value to the virtual register, not insert a load from the frame index into the pointer value. For AMDGPU, this was producing a load from the byval pointer argument, to a pointer used for the byval arguments. I do not understand how AArch64 managed to work before since it appears to be similarly broken. We could also change the ValueHandler API to avoid the extra copy from the frame index, since currently it returns a new register. I believe there is still an issue with outgoing byval arguments. These should have a copy inserted in case the callee decided to overwrite the memory.	2021-03-12 09:01:53 -05:00
Carl Ritson	f08dadd242	[AMDGPU] Do not annotate an else branch if there is a kill As llvm.amdgcn.kill is lowered to a terminator it can cause else branch annotations to end up in the wrong block. Do not annotate conditionals as else branches where there is a kill to avoid this. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D97427	2021-03-12 11:52:08 +09:00
Carl Ritson	c07f2025e4	[AMDGPU] Restrict image_msaa_load to MSAA dimension types This instruction is only valid on 2D MSAA and 2D MSAA Array surfaces. Remove intrinsic support for other dimension types, and block assembly for unsupported dimensions. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D98397	2021-03-12 09:47:24 +09:00
Ruiling Song	e8e6817d00	[AMDGPU] Don't check hasStackObjects() when reserving VGPR We have amdgpu_gfx functions that have high register pressure. If we do not reserve VGPR for SGPR spill, we will fall into the path to spill the SGPR to memory, which does not only have correctness issue, but also have really bad performance. I don't know why there is the check for hasStackObjects(), in our case, we don't have stack objects at the time of finalizeLowering(). So just remove the check that we always reserve a VGPR for possible SGPR spill in non-entry functions. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D98345	2021-03-12 08:11:14 +08:00
Matt Arsenault	70cb57d7da	AMDGPU/GlobalISel: Improve private addressing mode matching This enables the look-through-copy to hack around not correctly regbankselecting constants to match the use bank.	2021-03-11 10:23:35 -05:00
Matt Arsenault	cf5ecd5644	GlobalISel: Fix off by one in finding explicit byval alignment For attribute sets, the return index is at 0, and arguments start at 1. getParamAlignment adds the offset of 1, so we need to convert from attribute index back to IR index.	2021-03-11 10:23:08 -05:00
Matt Arsenault	0e0c7ef8e4	AMDGPU/GlobalISel: Add more tests for byval arguments	2021-03-11 10:23:08 -05:00
Ruiling Song	66340846b3	[AMDGPU] Always create Stack Object for reserved VGPR As we may overwrite inactive lanes of a caller-save-vgpr, we should always save/restore the reserved vgpr for sgpr spill. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D98319	2021-03-11 10:06:07 +08:00
Daniel Sanders	134a179dee	[mir] Change 'undef' for MMO base addresses to 'unknown-address' Differential Revision: https://reviews.llvm.org/D98100	2021-03-10 16:46:44 -08:00
Stanislav Mekhanoshin	574a9dabc6	[AMDGPU] Always expand system scope fp atomics on gfx90a FP atomics in system scope cannot be used and shall always be expanded in a CAS loop. Differential Revision: https://reviews.llvm.org/D98085	2021-03-10 12:35:23 -08:00
Jay Foad	70f013fd3b	[AMDGPU] Fix isReallyTriviallyReMaterializable for V_MOV_* D57708 changed SIInstrInfo::isReallyTriviallyReMaterializable to reject V_MOVs with extra implicit operands, but it accidentally rejected all V_MOVs because of their implicit use of exec. Fix it but avoid adding a moderately expensive call to MI.getDesc().getNumImplicitUses(). In real graphics shaders this changes quite a few vgpr copies into move- immediates, which is good for avoiding stalls on GFX10. Differential Revision: https://reviews.llvm.org/D98347	2021-03-10 16:18:12 +00:00
Christudasan Devadasan	4c6ab48fb1	GlobalISel: Try to combine G_[SU]DIV and G_[SU]REM It is good to have a combined `divrem` instruction when the `div` and `rem` are computed from identical input operands. Some targets can lower them through a single expansion that computes both division and remainder. It effectively reduces the number of instructions than individually expanding them. Reviewed By: arsenm, paquette Differential Revision: https://reviews.llvm.org/D96013	2021-03-10 18:46:07 +05:30
Christudasan Devadasan	24c0ad7143	[AMDGPU] Fix the dead frame indices during custom spill lowering. AMDGPU target tries to handle the SGPR and VGPR spills in a custom pass before the actual frame lowering pass. Once they are handled and the respective frames are eliminated in the custom pass, certain uses of them still remain. For instance, the DBG_VALUE instructions inserted by the allocator alongside the spill instruction will use the corresponding frame index. They become dead later during PEI and causes a crash while trying to replace the frame indices. We should possibly avoid this custom pass. For now, replacing such dead references with null register value. Reviewed By: arsenm, scott.linder Differential Revision: https://reviews.llvm.org/D98038	2021-03-09 23:22:49 +05:30
Ruiling Song	f0ccdde3c9	[AMDGPU] Remove SI_MASK_BRANCH This is already deprecated, so remove code working on this. Also update the tests by using S_CBRANCH_EXECZ instead of SI_MASK_BRANCH. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D97545	2021-03-09 09:13:23 +08:00
Craig Topper	0eb405c3b8	[SelectionDAG] Add computeKnownBits support for ISD::USUBSAT. The result of ISD::USUBSAT will never be larger than the LHS. We can use this to put a bound on the number of leading zeros. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D98133	2021-03-07 09:48:42 -08:00
Jay Foad	99682bc039	Revert "Revert "[AMDGPU] Restore the s_memtime instruction in gfx1030"" This reverts commit e58d68fcd06ddc7743e0419c0b364df3d44121b6. This reinstates commit fc28f600e558c1344618bda149a068d6162b6f0b with a fix to initialize HasShaderCyclesRegister. See https://reviews.llvm.org/D97928.	2021-03-06 09:00:01 +00:00
Mitch Phillips	e58d68fcd0	Revert "[AMDGPU] Restore the s_memtime instruction in gfx1030" Broke the ASan/MSan buildbots. See more comments in the original patch, https://reviews.llvm.org/D97928. Build failure at http://lab.llvm.org:8011/#/builders/5/builds/5327 This reverts commit fc28f600e558c1344618bda149a068d6162b6f0b.	2021-03-05 18:24:59 -08:00
Jay Foad	fc28f600e5	[AMDGPU] Restore the s_memtime instruction in gfx1030 gfx1030 added a new way to implement readcyclecounter using the SHADER_CYCLES hardware register, but the s_memtime instruction still exists, so the MC layer should still accept it and the llvm.amdgcn.s.memtime intrinsic should still work. Differential Revision: https://reviews.llvm.org/D97928	2021-03-05 20:19:11 +00:00
RamNalamothu	3998a8e797	[AMDGPU] Do not attempt sgpr spills to vgpr, when it is disabled This covers a path missed in https://reviews.llvm.org/D95768. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D98013	2021-03-05 22:47:21 +05:30
Sebastian Neubauer	e0e73714fb	[AMDGPU] Keep skip branch for ds instructions Same as other memory instructions, ds instructions add latency even if exec is zero. Jumping over them if exec=0 is cheaper than executing them. With this change, the branch instruction that skips over a basic block if exec=0 is not removed when the block contains a ds instruction. Differential Revision: https://reviews.llvm.org/D97922	2021-03-05 12:34:09 +01:00
Petar Avramovic	36beaa3ba3	Reland AMDGPU/GlobalISel: Combine zext(trunc x) to x after RegBankSelect Recommit bf5a5826504754788a8f1e3fec7a7dc95cda5782. Depends on 4c8fb7ddd6fa49258e0e9427e7345fb56ba522d4 which was reverted. RegBankSelect creates zext and trunc when it selects banks for uniform i1. Add zext_trunc_fold from generic combiner to post RegBankSelect combiner. Differential Revision: https://reviews.llvm.org/D95432	2021-03-05 11:05:37 +01:00
Petar Avramovic	d44f61f81c	Reland [GlobalISel] Combine zext(trunc x) to x Recommit 4112299ee761a9b6a309c8ff4a7e75f8c8d8851b. Depends on 4c8fb7ddd6fa49258e0e9427e7345fb56ba522d4 which was reverted. Combine zext(trunc x) to x when truncated bits are known to be zero. Differential Revision: https://reviews.llvm.org/D96031	2021-03-05 11:05:37 +01:00
Jay Foad	ed7458398a	[AMDGPU] Don't check for VMEM hazards on GFX10 The hazard where a VMEM reads an SGPR written by a VALU counts as a data dependency hazard, so no nops are required on GFX10. Tested with Vulkan CTS on GFX10.1 and GFX10.3. Differential Revision: https://reviews.llvm.org/D97926	2021-03-04 21:44:56 +00:00
Petar Avramovic	d7834556b7	Reland [GlobalISel] Start using vectors in GISelKnownBits This is recommit of 4c8fb7ddd6fa49258e0e9427e7345fb56ba522d4. MIR in one unit test had mismatched types. For vectors we consider a bit as known if it is the same for all demanded vector elements (all elements by default). KnownBits BitWidth for vector type is size of vector element. Add support for G_BUILD_VECTOR. This allows combines of urem_pow2_to_mask in pre-legalizer combiner. Differential Revision: https://reviews.llvm.org/D96122	2021-03-04 21:47:13 +01:00
Daniel Sanders	9fc2be6f28	[mir] Fix confusing MIR when MMO's value is nullptr but offset is non-zero :: (store 1 + 4, addrspace 1) -> :: (store 1 into undef + 4, addrspace 1) An offset without a base isn't terribly useful but it's convenient to update the offset without checking the value. For example, when breaking apart stores into smaller units Differential Revision: https://reviews.llvm.org/D97812	2021-03-04 10:34:30 -08:00
Nico Weber	e68de60bc4	Revert "AMDGPU/GlobalISel: Combine zext(trunc x) to x after RegBankSelect" This reverts commit bf5a5826504754788a8f1e3fec7a7dc95cda5782. Also depends on now-reverted 4c8fb7ddd6fa49258e0e9427e7345fb56ba522d4	2021-03-04 10:16:11 -05:00
Nico Weber	59beb1ef6d	Revert "[GlobalISel] Combine zext(trunc x) to x" This reverts commit 4112299ee761a9b6a309c8ff4a7e75f8c8d8851b. Seems to depend on 4c8fb7ddd6fa49258e0e9427e7345fb56ba522d4 which is being reverted.	2021-03-04 10:13:40 -05:00
Nico Weber	4b1015361c	Revert "[GlobalISel] Start using vectors in GISelKnownBits" This reverts commit 4c8fb7ddd6fa49258e0e9427e7345fb56ba522d4. Breaks check-llvm everywhere, see https://reviews.llvm.org/D96122	2021-03-04 10:13:40 -05:00
Petar Avramovic	bf5a582650	AMDGPU/GlobalISel: Combine zext(trunc x) to x after RegBankSelect RegBankSelect creates zext and trunc when it selects banks for uniform i1. Add zext_trunc_fold from generic combiner to post RegBankSelect combiner. Differential Revision: https://reviews.llvm.org/D95432	2021-03-04 15:05:24 +01:00
Petar Avramovic	4112299ee7	[GlobalISel] Combine zext(trunc x) to x Combine zext(trunc x) to x when truncated bits are known to be zero. Differential Revision: https://reviews.llvm.org/D96031	2021-03-04 15:05:23 +01:00
Petar Avramovic	4c8fb7ddd6	[GlobalISel] Start using vectors in GISelKnownBits For vectors we consider a bit as known if it is the same for all demanded vector elements (all elements by default). KnownBits BitWidth for vector type is size of vector element. Add support for G_BUILD_VECTOR. This allows combines of urem_pow2_to_mask in pre-legalizer combiner. Differential Revision: https://reviews.llvm.org/D96122	2021-03-04 15:05:23 +01:00
Baptiste Saleil	54c0f520c7	[VirtRegRewriter] Insert missing killed flags when tracking subregister liveness VirtRegRewriter may sometimes fail to correctly apply the kill flag where necessary, which causes unecessary code gen on PowerPC. This patch fixes the way masks for defined lanes are computed and the way mask for used lanes is computed. Contact albion.fung@ibm.com instead of author for problems related to this commit. Differential Revision: https://reviews.llvm.org/D92405	2021-03-03 12:02:04 -05:00
Matt Arsenault	78dcff4841	GlobalISel: Add default implementation of assignValueToReg Refactor insertion of the asserting ops. This enables using them for AMDGPU. This code should essentially be the same for every target. Mips, X86 and ARM all have different code there now, but this seems to be an accident. The assignment functions are called with different types than they would be in the DAG, so this is all likely an assortment of hacks to get around that.	2021-03-03 09:29:53 -05:00

1 2 3 4 5 ...

4381 Commits