llvm-project

Author	SHA1	Message	Date
Carl Ritson	fbaa35e169	[AMDGPU] Add SelectionDAG support for insert_subvector on v4f64 Enable custom insert_subvector for larger vector types. This is necessary now that SelectionDAG can attempt v3f64 insert to v4f64, etc. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D105385	2021-07-27 10:11:34 +09:00
Michael Liao	b0402a35fc	[amdgpu] Add 64-bit PC support when expanding unconditional branches. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D106445	2021-07-26 14:50:30 -04:00
Jay Foad	59f6865231	[AMDGPU][GISel] Fix MMO for raw/struct buffer access with non-constant offset Codegen for the raw/struct buffer access intrinsics would update the offset in the MMO to reflect the combined offset, if it was known to be constant. If the combined offset was not known to be constant, or if there was an index, it would set the offset in the MMO to 0. This is unsafe because it makes it look like the access does not alias with another access with a fixed non-zero offset. Fix these cases by setting the pointer in the MMO to null, to reflect the fact that we do not have any known IR value pointer + constant offset for the access. D106284 did this for SelectionDAG. This is the corresponding fix for GlobalISel. Differential Revision: https://reviews.llvm.org/D106451	2021-07-26 14:27:30 +01:00
Jay Foad	9ac10658ae	[AMDGPU] Fix MMO for raw/struct buffer access with non-constant offset Codegen for the raw/struct buffer access intrinsics would update the offset in the MMO to reflect the combined offset, if it was known to be constant. If the combined offset was not known to be constant, or if there was an index, it would set the offset in the MMO to 0. This is unsafe because it makes it look like the access does not alias with another access with a fixed non-zero offset. Fix these cases by setting the pointer in the MMO to null, to reflect the fact that we do not have any known IR value pointer + constant offset for the access. Differential Revision: https://reviews.llvm.org/D106284	2021-07-26 14:27:30 +01:00
David Sherwood	0aff1798b5	[Analysis] Add simple cost model for strict (in-order) reductions I have added a new FastMathFlags parameter to getArithmeticReductionCost to indicate what type of reduction we are performing: 1. Tree-wise. This is the typical fast-math reduction that involves continually splitting a vector up into halves and adding each half together until we get a scalar result. This is the default behaviour for integers, whereas for floating point we only do this if reassociation is allowed. 2. Ordered. This now allows us to estimate the cost of performing a strict vector reduction by treating it as a series of scalar operations in lane order. This is the case when FP reassociation is not permitted. For scalable vectors this is more difficult because at compile time we do not know how many lanes there are, and so we use the worst case maximum vscale value. I have also fixed getTypeBasedIntrinsicInstrCost to pass in the FastMathFlags, which meant fixing up some X86 tests where we always assumed the vector.reduce.fadd/mul intrinsics were 'fast'. New tests have been added here: Analysis/CostModel/AArch64/reduce-fadd.ll Analysis/CostModel/AArch64/sve-intrinsics.ll Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll Differential Revision: https://reviews.llvm.org/D105432	2021-07-26 10:26:06 +01:00
Alexander Belyaev	edb05d555e	[llvm] Inline getAssociatedFunction() in LLVM_DEBUG. Function* F is used only inside LLVM_DEBUG, so it causes an unused variable warning.	2021-07-24 11:49:21 +02:00
Kuter Dinel	96709823ec	[AMDGPU] Deduce attributes with the Attributor This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes. Reviewed By: jdoerfert, arsenm Differential Revision: https://reviews.llvm.org/D104997	2021-07-24 06:07:15 +03:00
David Truby	1528a4d400	[llvm][sve] Lowering for VLS truncating stores This adds custom lowering for truncating stores when operating on fixed length vectors in SVE. It also includes a DAG combine to fold extends followed by truncating stores into non-truncating stores in order to prevent this pattern appearing once truncating stores are supported. Currently truncating stores are not used in certain cases where the size of the vector is larger than the target vector width. Differential Revision: https://reviews.llvm.org/D104471	2021-07-23 14:04:55 +01:00
Sebastian Neubauer	2f15319968	[AMDGPU] Fix running ResourceUsageAnalysis Clear the map when running the analysis multiple times. The assertion that should ensure that every function is only analyzed once triggered sometimes (once every ~70 compiles of some graphics pipelines) when two functions of subsequent runs were allocated at the same address. Differential Revision: https://reviews.llvm.org/D106452	2021-07-23 09:25:15 +02:00
Carl Ritson	7d4baf25aa	[AMDGPU] Add maximum NSA size limit ISA feature Add maximum NSA size limit as an ISA feature. Use this to reduce NSA usage on GFX10.1 to avoid stability issues with 4 and 5 dwords NSA instructions. Maintain use of longer NSA instructions on GFX10.3. Note: this also contains some minor fixes for GlobalISel which did not work correctly with non-NSA form instructions on GFX10. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D103348	2021-07-23 16:16:06 +09:00
Carl Ritson	6efb3220b4	[AMDGPU] Add VReg_192/VReg_224 support for MIMG instructions Allow MIMG instructions to be selected with 6/7 VGPRs for vaddr. Previously these were rounded up to VReg_256; this change saves VGPRs. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D103800	2021-07-22 10:42:15 +09:00
Carl Ritson	9dcd75f86f	[AMDGPU] Allow frontends to disable null export for pixel shaders Disable null export (for kills) when a frontend defines a pixel shader as not exporting using amdgpu-color-export and amdgpu-depth-export function attributes. This allows the generation of export-free pixel shaders. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D105683	2021-07-22 10:20:46 +09:00
Stanislav Mekhanoshin	fe197ef9f1	[AMDGPU] Mark relevant rematerializable VOP3 instructions Differential Revision: https://reviews.llvm.org/D106110	2021-07-21 14:44:13 -07:00
Stanislav Mekhanoshin	9625ca5b60	[AMDGPU] Mark relevant rematerializable VOP2 instructions Differential Revision: https://reviews.llvm.org/D106023	2021-07-21 14:24:59 -07:00
Stanislav Mekhanoshin	4eb24817ec	[AMDGPU] Mark all relevant VOP1 instructions rematerializable Differential Revision: https://reviews.llvm.org/D105919	2021-07-21 14:05:32 -07:00
Stanislav Mekhanoshin	d01b34ed31	[AMDGPU] Move perfhint analysis This is an SCC pass, so moving it to the end of the SCC PM saves one Function PM. This needs the analysis to take into account memory access width since it is now placed after the load/store optimizer (D105651). Differential Revision: https://reviews.llvm.org/D105652	2021-07-21 13:06:49 -07:00
Stanislav Mekhanoshin	a397c1c82f	[AMDGPU] Tune perfhint analysis to account access width A function with fewer memory instructions but wider accesses is the same as a function with more but narrower accesses in terms of memory boundness. In fact, the pass would give different answers before and after vectorization without this change. Differential Revision: https://reviews.llvm.org/D105651	2021-07-21 12:46:10 -07:00
Sebastian Neubauer	b642d01fa8	[AMDGPU] Improve killed check for vgpr optimization The killed flag is not always set. E.g. when a variable is used in a loop, it is never marked as killed, although it is unused in following basic blocks. Also, we try to deprecate kill flags and not use them. Check if the register is live in the endif block. If not, consider it killed in the then and else blocks. The vgpr-liverange tests have two new tests with loops (pre-committed, so the diff is visible). I also needed to change the subtarget to gfx10.1, otherwise calls are not working. Differential Revision: https://reviews.llvm.org/D106291	2021-07-21 15:24:59 +02:00
Jay Foad	3ed29f960c	[AMDGPU] NFC refactoring in isel for buffer access intrinsics Rename getBufferOffsetForMMO to updateBufferMMO and pass in the MMO to be updated, in preparation for the bug fix in D106284. Call updateBufferMMO consistently for all buffer intrinsics, even the ones that use setBufferOffsets to decompose a combined offset expression. Add a getIdxEn helper function. Differential Revision: https://reviews.llvm.org/D106354	2021-07-21 11:12:49 +01:00
Sebastian Neubauer	2b08f6af62	[AMDGPU] Improve register computation for indirect calls First, collect the register usage in each function, then apply the maximum register usage of all functions to functions with indirect calls. This is more accurate than guessing the maximum register usage without looking at the actual usage. As before, assume that indirect calls will hit a function in the current module. Differential Revision: https://reviews.llvm.org/D105839	2021-07-20 13:48:50 +02:00
Stanislav Mekhanoshin	9dc2636623	[AMDGPU] Disable LDS lowering for GFX shaders Apparently these need external LDS symbols to remain. Fixes: SC1-3279 Differential Revision: https://reviews.llvm.org/D106288	2021-07-20 02:55:25 -07:00
Amy Huang	fd972bb9fd	Revert "[llvm][sve] Lowering for VLS truncating stores" because it causes a seg fault (see https://reviews.llvm.org/D104471). This reverts commit c305557acdaad453e32309d575fe9c6c7090c099.	2021-07-19 11:03:33 -07:00
Jay Foad	96d8f2a1e0	[AMDGPU] Fix typo in comments idexen -> idxen	2021-07-19 13:39:30 +01:00
Nikita Popov	2c68ecccc9	[OpaquePtr] Remove uses of CreateGEP() without element type Remove uses of to-be-deprecated API. In cases where the correct element type was not immediately obvious to me, fall back to explicit getPointerElementType().	2021-07-17 22:56:27 +02:00
Nikita Popov	357756ecf6	[OpaquePtr] Remove uses of CreateConstGEP1_64() without element type Remove uses of to-be-deprecated API.	2021-07-17 16:43:20 +02:00
Carl Ritson	c7f2f81f5e	[AMDGPU] Tidy SReg/SGPR definitions using template class Use a multiclass to consistently define SReg/SGPR/TTMP register classes. Add missing TTMP registers for 96b, 160b, 192b, 224b. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D105800	2021-07-17 11:26:46 +09:00
Matt Arsenault	3ceb92295e	AMDGPU/GlobalISel: Preserve more memory types	2021-07-16 08:57:26 -04:00
Matt Arsenault	21a0ef8d19	AMDGPU/GlobalISel: Redo kernel argument load handling This avoids relying on G_EXTRACT on unusual types, and also properly decomposes structs into multiple registers. This also preserves the LLTs in the memory operands.	2021-07-16 08:56:54 -04:00
Dmitry Preobrazhensky	09c9f4dc7d	[AMDGPU][MC] Added missing isCall/isBranch flags Added isCall for S_CALL_B64; added isBranch for S_SUBVECTOR_LOOP_*. Differential Revision: https://reviews.llvm.org/D106072	2021-07-16 14:59:10 +03:00
Stanislav Mekhanoshin	c46d99e4ba	[AMDGPU] Refine -O0 and -O1 passes. Differential Revision: https://reviews.llvm.org/D105579	2021-07-15 09:51:54 -07:00
Sebastian Neubauer	afd895709d	[AMDGPU] Use isMetaInstruction for instruction size Meta instructions have a size of 0. Use isMetaInstruction instead of listing them explicitly. Differential Revision: https://reviews.llvm.org/D106043	2021-07-15 12:23:11 +02:00
Stanislav Mekhanoshin	76b7d3432e	[AMDGPU] Add TII::isIgnorableUse() to allow VOP rematerialization Any def of EXEC prevents rematerialization of any VOP instruction because of the physreg use. Create a callback to check if the physreg use can be ignored to allow rematerialization. Differential Revision: https://reviews.llvm.org/D105836	2021-07-14 13:03:58 -07:00
Matt Arsenault	47269da5d8	GlobalISel: Handle lowering non-power-of-2 extloads	2021-07-14 11:54:11 -04:00
Sebastian Neubauer	4359b870b1	[AMDGPU] Init scratch only if necessary If no scratch or flat instructions are used, we do not need to initialize the flat scratch hardware register. Differential Revision: https://reviews.llvm.org/D105920	2021-07-14 10:45:22 +02:00
Ruiling Song	d9b9fdd91b	[AMDGPU] Don't handle export done when unify exit nodes This patch aims to revert the changes introduced by D70781, D71192, and D76364. D70781 was introduced to fix a hardware hang where we did not insert exp-null-done for a kill inside an infinite loop. At that time we had not added exp-null-done for kill early termination, but I believe that now we will always add the exp-null-done for the early termination case in SILateBranchLowering. D71192 was introduced to handle the only_kill case, which is also handled by the kill early termination work. D76364 was used to fix a regression caused by D71192, where we cleared the done bit of the export in the existing program and did not let the normal return block branch to the new unified return block. With this change, we just trust that frontends have set up exp-done correctly, which is true for all existing frontends. The backend only inserts exp-null-done for the kill cases, which is handled in SILateBranchLowering.cpp. Reviewed by: critson Differential Revision: https://reviews.llvm.org/D105610	2021-07-14 14:54:37 +08:00
Matt Arsenault	eebe841a47	RegAlloc: Allow targets to split register allocation AMDGPU normally spills SGPRs to VGPRs. Previously, since all register classes are handled at the same time, this was problematic. We don't know ahead of time how many registers will be needed to be reserved to handle the spilling. If no VGPRs were left for spilling, we would have to try to spill to memory. If the spilled SGPRs were required for exec mask manipulation, it is highly problematic because the lanes active at the point of spill are not necessarily the same as at the restore point. Avoid this problem by fully allocating SGPRs in a separate regalloc run from VGPRs. This way we know the exact number of VGPRs needed, and can reserve them for a second run. This fixes the most serious issues, but it is still possible using inline asm to make all VGPRs unavailable. Start erroring in the case where we ever would require memory for an SGPR spill. This is implemented by giving each regalloc pass a callback which reports if a register class should be handled or not. A few passes need some small changes to deal with leftover virtual registers. In the AMDGPU implementation, a new pass is introduced to take the place of PrologEpilogInserter for SGPR spills emitted during the first run. One disadvantage of this is currently StackSlotColoring is no longer used for SGPR spills. It would need to be run again, which will require more work. Error if the standard -regalloc option is used. Introduce new separate -sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be controlled individually. PBQP is not currently supported, so this also prevents using the unhandled allocator.	2021-07-13 18:49:29 -04:00
Matt Arsenault	fb44c3223e	AMDGPU: Promote signext/zeroext i16 shader returns This makes them consistent with all the other return convention handling. If we don't do this, we lose the sext/zext flag if treated as a full assignment, which complicates a future GlobalISel patch.	2021-07-13 11:04:51 -04:00
Hafiz Abid Qadeer	b205f2bb89	[AMDGPU] Handle s_branch to another section. Currently, if the target of an s_branch instruction is in another section, it will fail with an undefined label error, although in this case the label is not undefined but present in another section. This patch tries to handle this issue. While handling the fixup_si_sopp_br fixup in getRelocType, if the target label is undefined we issue an error as before. If it is defined, a new relocation type R_AMDGPU_REL16 is returned. This issue has been reported in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100181 and https://bugs.llvm.org/show_bug.cgi?id=45887. Before https://reviews.llvm.org/D79943, we used to get a crash for this scenario. The crash is fixed now but we still get an undefined label error. Jumps to another section can arise with hot/cold splitting. A patch to handle the relocation in lld will follow shortly. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D105760	2021-07-13 12:17:47 +01:00
Sebastian Neubauer	ad2c66ec5d	[AMDGPU] Optimize VGPR LiveRange in waterfall loops The loops are run exactly once per lane, so VGPRs do not need to be saved. Use the SIOptimizeVGPRLiveRange pass to add phi nodes that take undef when coming from the loop. There is still a shortcoming: Return values from a function call in the loop are copied because their live range conflicts with the live range of arguments, even if arguments are only IMPLICIT_DEF after the phi insertion. Differential Revision: https://reviews.llvm.org/D105192	2021-07-13 12:15:08 +02:00
Sebastian Neubauer	9d72c0ad43	[AMDGPU] Mark waterfall loops as SI_WATERFALL_LOOP This way, they can be detected later, e.g. by the SIOptimizeVGPRLiveRange pass. Differential Revision: https://reviews.llvm.org/D105467	2021-07-13 12:15:08 +02:00
Stanislav Mekhanoshin	d46d534dbb	[AMDGPU] Make some VOP1 instructions rematerializable This is a pilot change to verify the logic. The rest will be done in the same way, at least the rest of VOP1. Differential Revision: https://reviews.llvm.org/D105742	2021-07-12 23:43:45 -07:00
David Truby	c305557acd	[llvm][sve] Lowering for VLS truncating stores This adds custom lowering for truncating stores when operating on fixed length vectors in SVE. It also includes a DAG combine to fold extends followed by truncating stores into non-truncating stores in order to prevent this pattern appearing once truncating stores are supported. Currently truncating stores are not used in certain cases where the size of the vector is larger than the target vector width. Differential Revision: https://reviews.llvm.org/D104471	2021-07-12 11:14:17 +01:00
Stanislav Mekhanoshin	4a3b055653	[AMDGPU] Fix flags of V_MOV_B64_PSEUDO In particular it was not rematerializable. Differential Revision: https://reviews.llvm.org/D105724	2021-07-09 12:49:28 -07:00
David Green	38c9a4068d	[TTI] Remove IsPairwiseForm from getArithmeticReductionCost This patch removes the IsPairwiseForm flag from the Reduction Cost TTI hooks, along with some accompanying code for pattern matching reductions from trees starting at extract elements. IsPairwiseForm is now assumed to be false, which was the predominant way that the value was used from both the Loop and SLP vectorizers. Since adjustments such as D93860, the SLP vectorizer has not relied upon this distinction between pairwise and non-pairwise reductions. This also removes some code that was detecting reduction trees starting from extract elements inside the cost model. This case was double-counting costs though, adding the individual costs on the individual instruction _and_ the total cost of the reduction. Removing it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to not double count. The cost of reduction intrinsics is still tested through the various tests in llvm/test/Analysis/CostModel/X86/reduce-xyz.ll. Differential Revision: https://reviews.llvm.org/D105484	2021-07-09 11:51:16 +01:00
Stanislav Mekhanoshin	e5b0fe1b83	[AMDGPU] Mark more SOP instructions as rematerializable The rest of the SOP instructions implicitly set SCC and are not suitable for rematerialization. Differential Revision: https://reviews.llvm.org/D105670	2021-07-08 16:00:45 -07:00
Nikita Popov	9e225a2a71	[AMDGPU] Simplify GEP construction (NFC) Noticed while making a related change. This code was doing something really peculiar: Creating an APInt by parsing a string. And then creating a SmallVector with one element to create the GEP. Instead create the APInt from integers and directly pass the single index to GetElementPtrInst::Create().	2021-07-08 21:21:43 +02:00
Nikita Popov	cfb94212d4	[AMDGPU] Pass explicit GEP type in printf transform (NFC) This code is working on an i8*. Avoid nullptr element type in preparation for removing support.	2021-07-08 21:21:43 +02:00
Matt Arsenault	9b057f647d	GlobalISel: Track original argument index in ArgInfo SelectionDAG's equivalents in ISD::InputArg/OutputArg track the original argument index. Mips relies on this, and it's currently reinventing its own parallel CallLowering infrastructure which tracks these indexes on the side. Add this to help move towards deleting the custom mips handling.	2021-07-08 13:39:02 -04:00
Stanislav Mekhanoshin	74a5760d35	[AMDGPU] Set LoopInfo as preserved by SIAnnotateControlFlow The pass does not change loops, it just adds calls. Differential Revision: https://reviews.llvm.org/D105583	2021-07-08 09:34:43 -07:00
Michael Liao	cc92833f8a	[amdgpu] Remove the GlobalDCE pass prior to the internalization pass. - In [D98783](https://reviews.llvm.org/D98783), an extra GlobalDCE pass is inserted before the internalization pass to ensure a global variable without users could be internalized even if there are dead users. Instead of inserting a dedicated optimization pass, the dead user checking, i.e. 'use_empty()', should be preceded by constant dead user removal to ensure an accurate result. Differential Revision: https://reviews.llvm.org/D105590	2021-07-08 10:25:58 -04:00

6165 Commits