The GFX11 NGG Streamout Instructions perform atomic operations on
dedicated registers. At the moment, they lack machine memory operands,
which causes the si-memory-legalizer pass to treat them conservatively
and introduce several unnecessary waits and cache invalidations.
This patch introduces a new address space to represent these special
registers and teaches instruction selection to add memory operands with
this new address space to DS_ADD/SUB_GS_REG_RTN.
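A rough sketch of what attaching such a memory operand during selection
could look like (the helper and the surrounding code are illustrative
assumptions reflecting the description here, not the exact in-tree
implementation):

  #include "llvm/CodeGen/MachineFunction.h"
  #include "llvm/CodeGen/MachineInstr.h"
  #include "llvm/CodeGen/MachineMemOperand.h"
  #include "llvm/Support/Alignment.h"
  using namespace llvm;

  enum : unsigned { STREAMOUT_REGISTER = 128 }; // compiler-internal AS

  // Attach a memory operand in the new address space so that
  // si-memory-legalizer no longer has to treat the instruction as an
  // unknown memory access.
  static void addStreamoutMMO(MachineInstr &MI, MachineFunction &MF) {
    MachinePointerInfo PtrInfo(STREAMOUT_REGISTER);
    MachineMemOperand *MMO = MF.getMachineMemOperand(
        PtrInfo, MachineMemOperand::MOLoad | MachineMemOperand::MOStore,
        4, Align(4)); // 32-bit atomic read-modify-write
    MI.addMemOperand(MF, MMO);
  }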
Since this address space is meant to be compiler-internal, we move it
up a bit from the other address spaces and give it the number 128.
According to the LLVM Language Reference, address space numbers can go
all the way up to 2^24, but I'm not sure how well this is supported in
practice [1], so using a smaller number seems safer.
[1] 0107513fe7/llvm/utils/TableGen/IntrinsicEmitter.cpp (L401)
Differential Revision: https://reviews.llvm.org/D146031
The inverse ballot intrinsic takes in a boolean mask for all lanes and
returns the boolean for the current lane. See SPIR-V's
`subgroupInverseBallot()` in the [[ https://github.com/KhronosGroup/GLSL/blob/master/extensions/khr/GL_KHR_shader_subgroup.txt | GL_KHR_shader_subgroup extension ]].
This allows decision making via branch and select instructions with a manually
manipulated mask.
Implemented in GlobalISel and SelectionDAG, since currently both are supported.
The SelectionDAG implementation requires pseudo instructions that use the
custom inserter.
The boolean mask needs to be uniform for all lanes.
Therefore we expect SGPR input. In case the source is in a
VGPR, we insert one or more `v_readfirstlane` instructions.
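A minimal sketch of the per-lane semantics (a plain C++ model, not the
intrinsic's implementation): each lane reads its own bit out of the
uniform mask.

  #include <cstdint>

  // For wave64, lane N of inverse ballot is bit N of the mask.
  static bool inverseBallotLane(uint64_t Mask, unsigned LaneId) {
    return (Mask >> LaneId) & 1;
  }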
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D146287
The premise here is to allow non-kernel functions to locate external LDS variables without using LDS or extra magic SGPRs to do so.
1/ First it crawls the callgraph to work out which external LDS variables are reachable from a given kernel
2/ Then it creates a new `extern char[0]` variable for each kernel, which will alias all the other extern LDS variables because that's the documented behaviour of these variables
3/ The address of that variable is written to a lookup table. The global variable is tagged with metadata to track what address it was allocated at by codegen
4/ The assembler builds the lookup table using the metadata
5/ Any non-kernel functions use the same magic intrinsic used by table lookups of non-dynamic LDS variables to find the address to use
This heavily overlaps with the code paths taken for the other lowering; in particular, the same intrinsic is used to pass the dynamic scope information through the same SGPR as for table lookups of static LDS.
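A hypothetical model of the resulting runtime lookup (all names here are
invented for illustration; the real table layout is produced by codegen
and the assembler as described in steps 3/ and 4/):

  #include <cstdint>

  // Codegen records the address assigned to each kernel's dynamic-LDS
  // marker variable; device functions recover it by indexing the table
  // with the kernel id carried in the same SGPR used for static-LDS
  // table lookups.
  extern const uint32_t DynLDSTable[]; // one base offset per kernel
  static uint32_t dynamicLDSBase(uint32_t KernelId) {
    return DynLDSTable[KernelId];
  }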
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D144233
Similar to the existing SelectionDAG::SplitVector helper, this helper creates the EXTRACT_ELEMENT nodes for the LO/HI halves of the scalar source.
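Usage might look like this (assuming the helper is named SplitScalar to
mirror SplitVector; the wrapper below is hypothetical):

  #include "llvm/CodeGen/SelectionDAG.h"
  using namespace llvm;

  // Split an i64 node into its i32 halves via EXTRACT_ELEMENT.
  static std::pair<SDValue, SDValue> splitI64(SelectionDAG &DAG, SDValue N) {
    return DAG.SplitScalar(N, SDLoc(N), MVT::i32, MVT::i32);
  }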
Differential Revision: https://reviews.llvm.org/D147264
Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis.
No explosions seen during internal testing so this looks like a smooth transition.
Reviewed By: sameerds
Differential Revision: https://reviews.llvm.org/D145918
Currently, codegen support for the llvm.amdgcn.workgroup.id*
intrinsics is enabled only for compute kernels. This patch
additionally enables their selection for compute shaders on
subtargets that have architected SGPRs.
Differential Revision: https://reviews.llvm.org/D145045
Copy through the low bits and only apply an f32
copysign to the high half. This is effectively
what we do for codegen anyway, but this provides
some combine benefits. The cases involving constants
show some small improvements.
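A bit-level illustration of why this works (plain C++, not the DAG
code): only the top bit of the high 32-bit word of an f64 carries the
sign, so the low word passes through untouched and an f32-style
copysign on the high words is sufficient.

  #include <cstdint>

  static uint64_t copysignF64Bits(uint64_t Mag, uint64_t Sgn) {
    uint32_t Lo = uint32_t(Mag);                      // passes through
    uint32_t Hi = (uint32_t(Mag >> 32) & 0x7fffffffu) // magnitude bits
                | (uint32_t(Sgn >> 32) & 0x80000000u); // sign bit only
    return (uint64_t(Hi) << 32) | Lo;
  }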
https://reviews.llvm.org/D142682
The fcopysign DAG operation, unlike the IR one, allows
different types for the sign and magnitude. We can reduce
the bitwidth of the high operand since only the sign bit matters.
The default combine only introduces mixed fcopysign
operand types from fpext/fptrunc. We effectively do this
already during selection, but doing it earlier in the combiner
should expose new combine opportunities (e.g. the existing tests
now eliminate the load of the low half of the double). Unfortunately
this isn't enough to handle the case I'm interested in just yet.
I initially attempted to select the source modifier from xor of
a sign mask. This proved to be more difficult since
foldBinOpIntoSelect does not consider free fneg of integers
and undoes the combine.
Based on experimentation on gfx906, 908, 90a and 1030, wider global loads / stores are more performant than multiple narrower ones, independent of alignment. This is especially true when combining 8-bit loads / stores, where the speedup was usually 2x across all alignments.
Differential Revision: https://reviews.llvm.org/D145170
If more registers are needed for VAddr than the NSA format allows, the
final register can act as a contiguous set of the remaining addresses.
Update the legalizer to pack registers for this new format and allow
instruction selection to use the NSA encoding when the number of
addresses exceeds the maximum size.
Also update SIShrinkInstructions to handle partial NSA.
Differential Revision: https://reviews.llvm.org/D144034
Moving this out of AMDGPUBaseInfo enforces that AMDGPUBaseInfo should
not be calling into GCNSubtarget.
Differential Revision: https://reviews.llvm.org/D144564
This doesn't make sense as an option. fneg and fabs are bit
preserving by definition. If a target has fneg or fabs
instructions that are not bit preserving, it's incorrect to lower
fneg/fabs to use them.
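For reference, both operations are pure bit manipulations of the
IEEE-754 encoding (f32 shown):

  #include <cstdint>

  static uint32_t fnegBits(uint32_t X) { return X ^ 0x80000000u; } // flip sign
  static uint32_t fabsBits(uint32_t X) { return X & 0x7fffffffu; } // clear sign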
Some subtargets use architected SGPRs for workgroup
IDs instead of the regular SGPRs. This patch enables
support for them, guarded by the subtarget feature
FeatureArchitectedSGPRs.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D143707
Summary:
This is part of the leftover work for https://reviews.llvm.org/D143138.
In this work, we pass the code object version as an argument when
initializing the target ID and use it for the targetID dump.
Reviewers: arsenm
Differential Revision: https://reviews.llvm.org/D143293
We were assuming we could rely on the flat scratch init detection
to imply whether there are possible flat-addressed stack objects, which
doesn't work outside of a kernel. We should have a way to prove
that a given flat access can't access the stack.
We could use a not-stack parameter attribute to avoid
these splits.
Make the minimally correct change for GlobalISel; I'll address
this better in my larger patch to rewrite load and store legalization.
Fixes: SWDEV-218237
Summary:
This patch introduces a mechanism to check the code object version from the module flag, which avoids checking it from the command line.
In case the module flag is missing, we use the current default code object version supported by the compiler.
For tools whose inputs are not IR, we may need another approach (a directive, for example) to check the code
object version. That will be in a separate patch later.
For the LIT test updates, we directly add the module flag if there is only a single code object version associated with all checks in one file.
In case of multiple code object versions in one file, we use the "sed" method to "clone" the checks to achieve the goal.
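A minimal sketch of the module-flag check (the flag name and the
version scaling are assumptions based on AMDGPU conventions, not
necessarily the exact in-tree helper):

  #include "llvm/IR/Constants.h"
  #include "llvm/IR/Metadata.h"
  #include "llvm/IR/Module.h"
  using namespace llvm;

  static unsigned getCodeObjectVersion(const Module &M, unsigned Default) {
    if (auto *Ver = mdconst::extract_or_null<ConstantInt>(
            M.getModuleFlag("amdgpu_code_object_version")))
      return Ver->getZExtValue() / 100; // flag stores version * 100
    return Default; // fall back to the compiler's current default
  }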
Reviewer: arsenm
Differential Revision: https://reviews.llvm.org/D14313
Occupancy is expressed as waves per SIMD. This means that we need to
take into account the number of SIMDs per "CU" or, to be more precise,
the number of SIMDs over which a workgroup may be distributed.
getOccupancyWithLocalMemSize was wrong because it didn't take SIMDs
into account at all.
At the same time, we need to take into account that WGP mode offers
access to a larger total amount of LDS, since this can affect how
non-power-of-two LDS allocations are rounded. To make this work
consistently, we distinguish between (available) local memory size and
addressable local memory size (which is always limited by 64kB on
gfx10+, even with WGP mode).
This change results in a massive amount of test churn. A lot of it is
caused by the fact that the default work group size is 1024, which means
that (due to rounding effects) the default occupancy on older hardware
is 8 instead of 10, which affects scheduling via register pressure
estimates. I've adjusted most tests by just running the UTC tools, but
in some cases I manually changed the work group size to 32 or 64 to make
sure that work group size chunkiness has no effect.
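As a worked example of the rounding effect (numbers illustrative): on a
4-SIMD CU with wave64, a 1024-thread workgroup is 16 waves, i.e. 4
waves per SIMD. Two such workgroups give 8 waves per SIMD, but a third
would need 12, exceeding the 10-wave limit, hence occupancy 8 instead
of 10. A sketch of the per-SIMD accounting (hypothetical helper, not
the exact in-tree computation):

  static unsigned wavesPerSIMDPerWorkgroup(unsigned WGSize,
                                           unsigned WaveSize,  // e.g. 64
                                           unsigned NumSIMDs) { // e.g. 4
    unsigned Waves = (WGSize + WaveSize - 1) / WaveSize;
    return (Waves + NumSIMDs - 1) / NumSIMDs; // ceil-distribute over SIMDs
  }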
Differential Revision: https://reviews.llvm.org/D139468
Change MCInstrDesc::operands to return an ArrayRef so we can easily use
it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end.
A future patch will remove opInfo_begin and opInfo_end.
Also use it instead of raw access to the OpInfo pointer. A future patch
will remove this pointer.
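Typical call sites can then use a range-based for loop (a sketch; the
helper is hypothetical):

  #include "llvm/MC/MCInstrDesc.h"
  using namespace llvm;

  static unsigned countPredicateOperands(const MCInstrDesc &Desc) {
    unsigned N = 0;
    // operands() returns ArrayRef<MCOperandInfo>, so it composes
    // directly with range-for and the ADT helpers.
    for (const MCOperandInfo &Info : Desc.operands())
      if (Info.isPredicate())
        ++N;
    return N;
  }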
Differential Revision: https://reviews.llvm.org/D142213
This patch drops the ZeroBehavior parameter from bit counting
functions like countLeadingZeros. ZeroBehavior specifies the behavior
when the input to count{Leading,Trailing}Zeros is zero and when the
input to count{Leading,Trailing}Ones is all ones.
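At call sites the change looks like this (a sketch; ZB_Width was the
previous default):

  #include "llvm/Support/MathExtras.h"

  static unsigned before(uint32_t X) {
    return llvm::countLeadingZeros(X, llvm::ZB_Width); // old: explicit zero behavior
  }
  static unsigned after(uint32_t X) {
    return llvm::countLeadingZeros(X); // new: zero input simply returns 32
  }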
ZeroBehavior was first introduced on May 24, 2013 in commit
eb91eac9fb866ab1243366d2e238b9961895612d. While that patch did not
state the intention, I would guess ZeroBehavior was for performance
reasons. The x86 machines around that time required a conditional
branch to implement countLeadingZero<uint32_t> that returns the 32 on
zero:
  test    edi, edi
  je      .LBB0_2
  bsr     eax, edi
  xor     eax, 31
  ret
.LBB0_2:
  mov     eax, 32
  ret
That is, we can remove the conditional branch if we don't care about
the behavior on zero.
IIUC, Intel's Haswell architecture, launched on June 4, 2013,
introduced several bit manipulation instructions, including lzcnt and
tzcnt, which eliminated the need for the conditional branch.
I think it's time to retire ZeroBehavior as its utility is very
limited. If you care about compilation speed, you should build LLVM
with an appropriate -march= to take advantage of lzcnt and tzcnt.
Even if not, modern host compilers should be able to optimize away
quite a few conditional branches because the input is often known to
be nonzero from dominating conditional branches.
Differential Revision: https://reviews.llvm.org/D141798
In function SITargetLowering::performExtractVectorElt,
the output type was not considered, which could lead to type mismatches
later.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D139943
Use deduction guides instead of helper functions.
The only non-automatic changes have been:
1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).
Per reviewers' comment, some useless makeArrayRef have been removed in the process.
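A sketch of the disambiguation from item 1 (variable names
hypothetical):

  #include "llvm/ADT/ArrayRef.h"
  #include <cstddef>
  #include <cstdint>

  void example(const uint8_t *Data) {
    // ArrayRef(Data, 0) is ambiguous between the (pointer, size) and
    // (begin, end) deduction guides, since the literal 0 also converts
    // to a pointer; spelling out the size type resolves it.
    llvm::ArrayRef Bytes(Data, (size_t)0);
    (void)Bytes;
  }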
This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.
Differential Revision: https://reviews.llvm.org/D140955
Accelerate finding the base class for a physical register by
building a statically mapping table from physical registers
to base classes using TableGen.
Replace uses of SIRegisterInfo::getPhysRegClass with
TargetRegisterInfo::getPhysRegBaseClass in order to use
the computed table.
Reviewed By: arsenm, foad
Differential Revision: https://reviews.llvm.org/D139422
Fixes inconsistent handling of the constant-32-bit case. It turns out we
can lower all the casts just fine; it's just accessing the flat results
that's a problem.
Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes, and the remaining VGPRs
are used by regalloc for vector regclass allocation.
This imposes many restrictions: we ended up with
unsuccessful SGPR spilling when there weren't enough
VGPRs, and we were forced to spill the leftover into
memory during PEI. The custom spill handling during PEI
has many edge cases and breaks the compiler from time
to time.
This patch implements spilling SGPRs into virtual VGPR
lanes. Since we now split the register allocation for
SGPRs and VGPRs, the virtual registers introduced for
the spill lanes would get allocated automatically in
the subsequent regalloc invocation for VGPRs.
Spilling to virtual registers will always be successful,
even in high-pressure situations, and hence it avoids
most of the edge cases during PEI. We are now left with
only the custom SGPR spills during PEI for special registers
like the frame pointer, which is an unproblematic case.
This patch also implements whole-wave spills, which
might occur if RA spills any live range of virtual registers
involved in whole-wave operations. Earlier, we had
been hand-picking registers for such machine operands.
But now, with SGPR spills into virtual VGPR lanes, we are
exposing them to the allocator.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D124196
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
This fixes clang.
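The replacement pattern is just the dereference operators (a sketch):

  #include <optional>

  static int get(const std::optional<int> &O) {
    // *O and O-> assume the optional is engaged, avoiding value()'s
    // exception-checking path and the libc++ availability restriction.
    return *O;
  }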
C++17 allows us to call the constructors of pair and tuple directly
instead of the helper functions make_pair and make_tuple.
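For example, with class template argument deduction:

  #include <tuple>
  #include <utility>

  auto P = std::pair(1, 2.0);        // was: std::make_pair(1, 2.0)
  auto T = std::tuple(1, 'a', 3.0f); // was: std::make_tuple(1, 'a', 3.0f)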
Differential Revision: https://reviews.llvm.org/D139828
- [Clang] Declare AMDGPU target as supporting BF16 for storage-only purposes on amdgcn
- Add Sema & CodeGen tests cases.
- Also add cases that D138651 would have covered as this patch replaces it.
- [AMDGPU] Add BF16 storage-only support
- Support legalization/dealing with bf16 operations in DAGISel.
- bf16 as a type remains illegal and is represented as i16 for storage purposes.
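A small sketch of what "storage-only" means in practice (assuming the
Clang-level type is __bf16): values can be loaded, stored, and passed
around, but no arithmetic is defined on bf16 itself.

  static void copyBF16(__bf16 *Dst, const __bf16 *Src) {
    *Dst = *Src; // load/store are legal; the value lives in i16 storage
  }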
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D139398