llvm-project

Author	SHA1	Message	Date
Mariusz Sikora	a018c8cdbb	GFX12: Add LoopDataPrefetchPass (#75625 ) It is currently disabled by default. It will need experiments on a real HW to tune and decide on the profitability. --------- Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-19 08:32:16 +01:00
Jessica Del	32f9983c06	[AMDGPU] - Add address space for strided buffers (#74471 ) This is an experimental address space for strided buffers. These buffers can have structs as elements and a stride > 1. These pointers allow the indexed access in units of stride, i.e., they point at `buffer[index * stride]`. Thus, we can use the `idxen` modifier for buffer loads. We assign address space 9 to 192-bit buffer pointers which contain a 128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially, they are fat buffer pointers with an additional 32-bit index.	2023-12-15 15:49:25 +01:00
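A minimal sketch of the strided indexing described in the commit above, assuming the stride is given in bytes and the 32-bit offset is added within the selected element. The class and function names are hypothetical, and the stride is passed explicitly here only to keep the example small; the commit message does not say where it is stored or how the three fields are ordered.

```python
from dataclasses import dataclass

# Field widths taken from the commit message: 128 + 32 + 32 = 192 bits.
DESCRIPTOR_BITS = 128
OFFSET_BITS = 32
INDEX_BITS = 32


@dataclass(frozen=True)
class StridedBufferPointer:
    """Hypothetical model of an address space 9 pointer (illustration only)."""
    descriptor: int  # 128-bit buffer descriptor, treated as opaque here
    offset: int      # 32-bit offset
    index: int       # 32-bit index, counted in units of stride

    def __post_init__(self):
        # Reject values that do not fit in their stated field width.
        for name, bits in (("descriptor", DESCRIPTOR_BITS),
                           ("offset", OFFSET_BITS),
                           ("index", INDEX_BITS)):
            value = getattr(self, name)
            if not 0 <= value < (1 << bits):
                raise ValueError(f"{name} does not fit in {bits} bits")


def byte_position(ptr: StridedBufferPointer, stride: int) -> int:
    """Assumed addressing: buffer[index * stride] plus the offset."""
    return ptr.index * stride + ptr.offset
```

For example, with a 16-byte stride, index 3 and offset 4 select byte 52 of the buffer under these assumptions.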
Valery Pykhtin	dd051295bc	[AMDGPU] Enable GCNRewritePartialRegUses pass by default. (#72975 ) Let's try once again after #69957 has landed.	2023-12-14 14:10:27 +01:00
Petar Avramovic	6892c175c5	AMDGPU/GlobalISel: add AMDGPUGlobalISelDivergenceLowering pass (#75340 ) Add empty AMDGPUGlobalISelDivergenceLowering pass. This pass will implement - selection of divergent i1 phis as lane mask phis, requires lane mask merging in some cases - lower uses of divergent i1 values outside of the cycle using lane mask merging - lowering of all cases of temporal divergence: - lower uses of uniform i1 values outside of the cycle using lane mask merging - lower uses of uniform non-i1 values outside of the cycle using a copy to vgpr inside of the cycle Add very detailed set of regression tests for cases mentioned above. patch 1 from: https://github.com/llvm/llvm-project/pull/73337	2023-12-13 16:42:56 +01:00
Piotr Sobczak	fac093dd08	[AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-13 13:52:40 +01:00
Kazu Hirata	586ecdf205	[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956 ) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-11 21:01:36 -08:00
Jay Foad	28233b11ac	[AMDGPU] New AMDGPUInsertSingleUseVDST pass (#72388 ) Add support for emitting GFX11.5 s_singleuse_vdst instructions. This is a power saving feature whereby the compiler can annotate VALU instructions whose results are known to have only a single use, so the hardware can in some cases avoid writing the result back to VGPR RAM. To begin with the pass is disabled by default because of one missing feature: we need an exclusion list of opcodes that never qualify as single-use producers and/or consumers. A future patch will implement this and enable the pass by default. --------- Co-authored-by: Scott Egerton <scott.egerton@amd.com>	2023-11-24 10:23:06 +00:00
Carl Ritson	af6ff98c53	[AMDGPU] Move WWM register pre-allocation to during regalloc (#70618 ) Move SIPreAllocateWWMRegs pass to just before VGPR allocation. This saves recomputation of the virtual matrix and live reg map, with the slight regression in O0 that live intervals and slot indexes must be computed.	2023-11-08 11:54:28 +09:00
Matt Arsenault	d34a10a47d	AMDGPU: Port AMDGPUAttributor to new pass manager (#71349 )	2023-11-07 15:40:40 +09:00
Valery Pykhtin	e808f8a616	[AMDGPU] GCNRegPressurePrinter pass to print GCNRegPressure values for testing. (#70031 ) Using GCNDownwardRPTracker or GCNUpwardRPTracker the pass collects register pressure values for a function and prints these values next to instructions. Output can be used to generate Filecheck rules in mir tests.	2023-11-01 23:01:39 +01:00
Alex Voicu	0ce6255a50	[HIP][LLVM][Opt] Add LLVM support for `hipstdpar` This patch adds the LLVM changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. What we do here is add two passes, one mandatory and one optional: 1. HipStdParAcceleratorCodeSelectionPass is mandatory, depends on CallGraphAnalysis, and implements the following transform: - Traverse the call-graph, and check for functions that are roots for accelerator execution (at the moment, these are GPU kernels exclusively, and would originate in the accelerator specific algorithm library the toolchain uses as an implementation detail); - Starting from a root, do a BFS to find all functions that are reachable (called directly or indirectly via a call-chain) and record them; - After having done the above for all roots in the Module, we have computed the set of reachable functions, which is the union of roots and functions reachable from roots; - All functions that are not in the reachable set are removed; for the special case where the reachable set is empty we completely clear the module; 2. HipStdParAllocationInterpositionPass is optional, is meant as a fallback with restricted functionality for cases where on-demand paging is unavailable on a platform, and implements the following transform: - Iterate all functions in a Module; - If a function's name is in a predefined set of allocation / deallocation that the runtime implementation is allowed and expected to interpose, replace all its uses with the equivalent accelerator aware function, iff the latter is available; - If the accelerator aware equivalent is unavailable we warn, but compilation will go ahead, which means that it is possible to get issues around the accelerator trying to access inaccessible memory at run time; - We rely on direct name matching as opposed to using the new alloc-kind family of attributes and / or the LibCall analysis pass because some of the legacy functions that need replacing would not carry the former or be identified by the latter. Reviewed by: JonChesterfield, yaxunl Differential Revision: https://reviews.llvm.org/D155856	2023-10-12 11:26:48 +01:00
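The reachability transform described for HipStdParAcceleratorCodeSelectionPass can be sketched roughly as follows. This is an illustration over a plain dictionary, not the LLVM implementation: the call graph representation and the function names `reachable_from_roots` and `prune_module` are assumptions made for the example.

```python
from collections import deque


def reachable_from_roots(call_graph, roots):
    """Breadth-first search from every root; the result includes the roots.

    call_graph maps a function name to the list of names it calls
    (a hypothetical stand-in for the real call graph analysis).
    """
    reachable = set(r for r in roots if r in call_graph)
    queue = deque(reachable)
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, ()):
            if callee not in reachable:
                reachable.add(callee)
                queue.append(callee)
    return reachable


def prune_module(call_graph, roots):
    """Keep only reachable functions; an empty reachable set clears the module."""
    keep = reachable_from_roots(call_graph, roots)
    if not keep:
        return {}
    return {fn: list(callees) for fn, callees in call_graph.items() if fn in keep}
```

With a graph in which `kernel` calls `helper` and `helper` calls `leaf`, pruning with `kernel` as the only root keeps those three functions and drops any host-only functions that no root reaches.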
Alex Voicu	25935c384d	Revert "[HIP][LLVM][Opt] Add LLVM support for `hipstdpar`" This reverts commit c5bba7ea5a05f540948f76a189c880eb24a5e8c6.	2023-10-11 12:27:03 +01:00
Alex Voicu	c5bba7ea5a	[HIP][LLVM][Opt] Add LLVM support for `hipstdpar` This patch adds the LLVM changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. What we do here is add two passes, one mandatory and one optional: 1. HipStdParAcceleratorCodeSelectionPass is mandatory, depends on CallGraphAnalysis, and implements the following transform: - Traverse the call-graph, and check for functions that are roots for accelerator execution (at the moment, these are GPU kernels exclusively, and would originate in the accelerator specific algorithm library the toolchain uses as an implementation detail); - Starting from a root, do a BFS to find all functions that are reachable (called directly or indirectly via a call-chain) and record them; - After having done the above for all roots in the Module, we have computed the set of reachable functions, which is the union of roots and functions reachable from roots; - All functions that are not in the reachable set are removed; for the special case where the reachable set is empty we completely clear the module; 2. HipStdParAllocationInterpositionPass is optional, is meant as a fallback with restricted functionality for cases where on-demand paging is unavailable on a platform, and implements the following transform: - Iterate all functions in a Module; - If a function's name is in a predefined set of allocation / deallocation that the runtime implementation is allowed and expected to interpose, replace all its uses with the equivalent accelerator aware function, iff the latter is available; - If the accelerator aware equivalent is unavailable we warn, but compilation will go ahead, which means that it is possible to get issues around the accelerator trying to access inaccessible memory at run time; - We rely on direct name matching as opposed to using the new alloc-kind family of attributes and / or the LibCall analysis pass because some of the legacy functions that need replacing would not carry the former or be identified by the latter. Reviewed by: JonChesterfield, yaxunl Differential Revision: https://reviews.llvm.org/D155856	2023-10-11 12:22:00 +01:00
Alex Voicu	98eda5dda7	Revert "[HIP][LLVM][Opt] Add LLVM support for `hipstdpar`" in order to address build breakage. This reverts commit 9b98ebb0eb43b005921926a622177f10e13b1ac6.	2023-10-10 12:16:10 +01:00
Alex Voicu	9b98ebb0eb	[HIP][LLVM][Opt] Add LLVM support for `hipstdpar` This patch adds the LLVM changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. What we do here is add two passes, one mandatory and one optional: 1. HipStdParAcceleratorCodeSelectionPass is mandatory, depends on CallGraphAnalysis, and implements the following transform: - Traverse the call-graph, and check for functions that are roots for accelerator execution (at the moment, these are GPU kernels exclusively, and would originate in the accelerator specific algorithm library the toolchain uses as an implementation detail); - Starting from a root, do a BFS to find all functions that are reachable (called directly or indirectly via a call-chain) and record them; - After having done the above for all roots in the Module, we have computed the set of reachable functions, which is the union of roots and functions reachable from roots; - All functions that are not in the reachable set are removed; for the special case where the reachable set is empty we completely clear the module; 2. HipStdParAllocationInterpositionPass is optional, is meant as a fallback with restricted functionality for cases where on-demand paging is unavailable on a platform, and implements the following transform: - Iterate all functions in a Module; - If a function's name is in a predefined set of allocation / deallocation that the runtime implementation is allowed and expected to interpose, replace all its uses with the equivalent accelerator aware function, iff the latter is available; - If the accelerator aware equivalent is unavailable we warn, but compilation will go ahead, which means that it is possible to get issues around the accelerator trying to access inaccessible memory at run time; - We rely on direct name matching as opposed to using the new alloc-kind family of attributes and / or the LibCall analysis pass because some of the legacy functions that need replacing would not carry the former or be identified by the latter. Reviewed by: JonChesterfield, yaxunl Differential Revision: https://reviews.llvm.org/D155856	2023-10-10 12:02:05 +01:00
Jeffrey Byrnes	6afceba510	[AMDGPU][IGLP] SingleWaveOpt: Cache DSW Counters from PreRA (#67759 ) Save the DSW counters from PreRA scheduling. While this avoids recalculation in the postRA pass, that isn't the main purpose. This is required because of physical register dependencies in PostRA scheduling -- they alter the DAG s.t. our counters may become incorrect -- which alters the layout of the pipeline. By preserving the values from PreRA, we can be sure that we accurately construct the pipeline. Additionally, remove a bad assert in SharesPredWithPrevNthGroup -- it is possible that we will have an empty cache if OtherGroup has no elements which have a V_PERM pred (possible if the V_PERM SG is empty).	2023-10-06 17:34:14 -07:00
Jay Foad	d85d143ad9	[AMDGPU] New image intrinsic optimizer pass (#67151 ) Implement a new pass to combine multiple image_load_2dmsaa and 2darraymsaa intrinsic calls into a single image_msaa_load if: - they refer to the same vaddr except for sample_id, - they use a constant sample_id and they fall into the same group, - they have the same dmask and the number of instructions and the number of vaddr/vdata dword transfers is reduced by the combine This should be valid on all GFX11 but a hardware bug renders it unworkable on GFX11.0.* so it is only enabled for GFX11.5. Based on a patch by Rodrigo Dominguez!	2023-09-26 09:33:49 +01:00
Arthur Eubanks	0a1aa6cda2	[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295 ) This will make it easy for callers to see issues with and fix up calls to createTargetMachine after a future change to the params of TargetMachine. This matches other nearby enums. For downstream users, this should be a fairly straightforward replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive or s/CGFT_/CodeGenFileType::	2023-09-14 14:10:14 -07:00
Kazu Hirata	a9c7ba964f	[AMDGPU] Fix a warning This patch fixes: llvm/lib/Target/AMDGPU/AMDGPU.h:297:18: error: private field 'TM' is not used [-Werror,-Wunused-private-field]	2023-09-12 14:02:07 -07:00
jwanggit86	b853988e0d	[AMDGPU] Port AMDGPURewriteUndefForPHI to new pass manager (#66008 ) This patch ports the AMDGPURewriteUndefForPHI pass to the new pass manager. With this, the pass is supported under both the legacy and the new pass managers. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2023-09-12 13:32:02 -07:00
Matt Arsenault	f7dcabe502	AMDGPU: Pass in TargetMachine to AMDGPULowerModuleLDSPass https://reviews.llvm.org/D157660	2023-09-02 12:02:36 -04:00
Pravin Jagtap	6ef6c954c6	[AMDGPU] Reorder atomic optimizer to avoid CAS loop. Expand-Atomic pass emits the CAS loop for FP operations which limits the optimizations offered by atomic optimizer. Moving atomic optimizer before expand-atomics allows better codegen. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D157265	2023-08-30 12:05:21 -04:00
pvanhout	89e91e4c0c	[AMDGPU] Remove post-PromoteAlloca SROA run PromoteAlloca now uses SSAUpdater, it doesn't need SROA to clean-up after it anymore. Internal testing shows no noticeable performance impact. Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D156398	2023-08-11 08:29:21 +02:00
Matt Arsenault	58e87c961e	AMDGPU: Port AMDGPULowerKernelArguments to new pass manager https://reviews.llvm.org/D157498	2023-08-09 18:34:30 -04:00
Matt Arsenault	9a806551a0	AMDGPU: Delete old PM support for libcall passes This has no reason to run in the codegen pipeline.	2023-08-01 18:22:02 -04:00
Matt Arsenault	5dfdd3494b	AMDGPU: Don't try to fold wavefrontsize intrinsic in libcall simplify It's not a libcall so doesn't really belong here to begin with. Relying on checking the target name and explicit features isn't particularly sound either. The library doesn't use the intrinsic anymore, so it doesn't matter anyway.	2023-08-01 18:20:50 -04:00
Matt Arsenault	4d42e8b5d1	Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd. The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work around the underlying problem with SUBREG_TO_REG.	2023-07-31 20:15:45 -04:00
Matt Arsenault	5b5bd81b71	AMDGPU: Move placement of RemoveIncompatibleFunctions This should be approximately first and run with other module passes. https://reviews.llvm.org/D155987	2023-07-31 19:22:04 -04:00
Vitaly Buka	a496c8be6e	Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" And dependent commits. Details in D150388. This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c. This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e. This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826. This reverts commit b7836d856206ec39509d42529f958c920368166b. No conflicts in the code, few tests had conflicts in autogenerated CHECKs: llvm/test/CodeGen/Thumb2/mve-float32regloops.ll llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll Reviewed By: alexfh Differential Revision: https://reviews.llvm.org/D156381	2023-07-26 22:13:32 -07:00
Christudasan Devadasan	b4a62b1fa5	[AMDGPU] Enable whole wave register copy So far, we haven't exposed the allocation of whole-wave registers to regalloc. We hand-picked them for various whole wave mode operations. With a future patch, we want the allocator to efficiently allocate them rather than using the custom pre-allocation pass. Any liverange split of virtual registers involved in whole-wave operations requires the resulting COPY introduced with the split to be performed for all lanes. It isn't implemented in the compiler yet. This patch would identify all such copies and manipulate the exec mask around them to enable all lanes without affecting the value of exec mask elsewhere. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D143762	2023-07-07 22:58:55 +05:30
Christudasan Devadasan	b78b36e1a2	[AMDGPU] Implement whole wave register spill To reduce the register pressure during allocation, when the allocator spills a virtual register that corresponds to a whole wave mode operation, the spill loads and restores should be activated for all lanes by temporarily flipping all bits in exec register to one just before the spills. It is not implemented in the compiler as of today and this patch enables the necessary support. This is a pre-patch before the SGPR spill to virtual VGPR lanes that would eventually cause the whole wave register spills during allocation. Reviewed By: arsenm, cdevadas Differential Revision: https://reviews.llvm.org/D143759	2023-07-07 22:51:45 +05:30
Ivan Kosarev	ee165cdb1b	[AMDGPU] Eliminate SIMCCodeEmitter and de-virtualise encoding methods. Simplifies some future changes needed for <https://github.com/llvm/llvm-project/issues/62629>. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D154337	2023-07-05 10:13:33 +01:00
Brendon Cahoon	853b2a84cb	[AMDGPU] Reserve SGPR pair when long branches are present Branch relaxation requires 2 additional SGPRs for AMDGPU to handle the case when an indirect branch target is too far away. The register scavenger may not find available registers, which causes a “did not find scavenging index” assert to occur in assignRegToScavengingIndex. In this patch, we estimate before register allocation whether an indirect branch is likely to be needed, and reserve 2 SGPRs if the branch distance is found to be above a threshold. The distance threshold is an approximation as the exact code size and branch distance are unknown prior to register allocation. Patch by Corbin Robeck. Thanks! Differential Review: https://reviews.llvm.org/D149775	2023-06-29 16:50:46 -05:00
Matt Arsenault	d7d4aa539c	AMDGPU: Move AMDGPUAttributor run earlier Move it up with other module passes. It's a higher level optimization that should probably be done before hacking up the IR for codegen. It should really be done earlier than this. We could possibly move this with other IPO passes, but we'd have to stop inferring the lack of lds.kernel.id calls and have the LDS module pass mark functions which don't need the ID. The one test change is because that pass is relying on the backend run of SROA (which we ideally wouldn't have).	2023-06-28 12:42:40 -04:00
Pravin Jagtap	597fb7fb46	[AMDGPU] Switch to the new cl option amdgpu-atomic-optimizer-strategy. Atomic optimizer is turned on by default through D152649. This patch removes the usage of the old command line option amdgpu-atomic-optimizations and transfers the responsibility to `amdgpu-atomic-optimizer-strategy`. We can safely remove the old option once LLPC removes all its uses. Reviewed By: foad, arsenm, #amdgpu, cdevadas Differential Revision: https://reviews.llvm.org/D153007	2023-06-22 07:06:42 -04:00
Matt Arsenault	e777da468c	AMDGPU: Delete old AMDGPUPropagateAttributes pass The optimizing, non-broken features have all been moved to AMDGPUAttributor. The only remaining piece of functionality was the broken propagation of the wavesize features. This was fundamentally broken and a hack for device library linking. It doesn't matter when the device libraries are correctly linked and internalized. In case of linked-as-normal-bitcode (as comgr still does), we're reliant on the global subtarget anyway. If we can get away without forcing target-cpu, we should just as well be able to get away without propagating target-features.	2023-06-20 13:05:45 -04:00
Jay Foad	eb7491769a	[AMDGPU] Reimplement the GFX11 early release VGPRs optimization Implement this optimization in SIInsertWaitcnts, where we already have information about whether there might be outstanding VMEM store instructions. This has the following advantages: - Correctly handles atomics-with-return. - Correctly handles call instructions. - Should be faster because it does not require running a separate pass. Differential Revision: https://reviews.llvm.org/D153279	2023-06-19 17:12:54 +01:00
Pravin Jagtap	03d92501f3	[AMDGPU] Enable Atomic Optimizer and Default to Iterative Scan Strategy. The D147408 implemented new Iterative approach for scan computations and added new flag `amdgpu-atomic-optimizer-strategy` which is defaulted to DPP. The changeset https://github.com/GPUOpen-Drivers/llpc/pull/2506 adapts to the new changes in LLPC. This patch enables atomic optimizer pass and selects Iterative approach for scan computations by default for compute pipeline. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D152649	2023-06-15 01:18:38 -04:00
Matt Arsenault	5b657f50b8	AMDGPU: Move LICM after AMDGPUCodeGenPrepare The commit that added the run says it's to hoist uniform parts of integer division expansion. That expansion is performed later, so this didn't do anything in that case. Move this later so the original test shows the improvement. This also saves a run of "Canonicalize natural loops". Not sure why this appears to be still getting a separate loop PM run. Also feels a bit heavy to run this just for divide. Is there a way to specifically hoist the divide sequence when it expands?	2023-06-10 07:37:32 -04:00
Matt Arsenault	3c848194f2	CodeGen: Expand memory intrinsics in PreISelIntrinsicLowering Expand large or unknown size memory intrinsics into loops in the default lowering pipeline if the target doesn't have the corresponding libfunc. Previously AMDGPU had a custom pass which existed to call the expansion utilities. With a default no-libcall option, we can remove the libfunc checks in LoopIdiomRecognize for these, which never made any sense. This also provides a path to lifting the immarg restriction on llvm.memcpy.inline. There seems to be a bug where TLI reports functions as available if you use -march and not -mtriple.	2023-06-09 21:04:37 -04:00
Pravin Jagtap	f6c8a8e9cb	[AMDGPU] Iterative scan implementation for atomic optimizer. This patch provides an alternative implementation to DPP for Scan Computations. An alternative implementation iterates over all active lanes of Wavefront using llvm.cttz and performs the following steps: 1. Read the value that needs to be atomically incremented using llvm.amdgcn.readlane intrinsic 2. Accumulate the result. 3. Update the scan result using llvm.amdgcn.writelane intrinsic if intermediate scan results are needed later in the kernel. Reviewed By: arsenm, cdevadas Differential Revision: https://reviews.llvm.org/D147408	2023-06-09 01:08:44 -04:00
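The three steps of the iterative scan can be simulated on the host as a rough sketch. It assumes integer addition as the atomic operation and an exclusive prefix (each lane receives the sum of the active lanes before it); the function names are hypothetical, and the list accesses merely stand in for the `llvm.amdgcn.readlane` and `llvm.amdgcn.writelane` intrinsics named in the commit.

```python
def cttz(x):
    """Count trailing zeros of a nonzero integer (stand-in for llvm.cttz)."""
    return (x & -x).bit_length() - 1


def iterative_scan(exec_mask, lane_values):
    """Walk the active lanes from lowest to highest.

    Returns (total, scan), where scan[lane] is the running sum before that
    lane contributed, or None for inactive lanes.
    """
    acc = 0
    scan = [None] * len(lane_values)
    remaining = exec_mask
    while remaining:
        lane = cttz(remaining)       # lowest active lane still to visit
        scan[lane] = acc             # models writelane of the intermediate result
        acc += lane_values[lane]     # models readlane followed by accumulation
        remaining &= remaining - 1   # clear the lane just processed
    return acc, scan
```

For an exec mask of `0b1011` and lane values `[5, 7, 11, 13]`, lanes 0, 1 and 3 are visited in that order, so the total is 25 and the per-lane prefixes are 0, 5 and 12, with lane 2 left untouched.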
Matt Arsenault	846a360e16	AMDGPU: Don't run AMDGPUAttributor with -O0	2023-06-08 07:52:37 -04:00
Valery Pykhtin	342acfc9bb	[AMDGPU] Turn off pass to rewrite partially used virtual superregisters after RenameIndependentSubregs pass with registers of minimal size. There is a failure with this pass in the case when target register class for a subregister isn't known from instruction description (for ex. COPY). Currently in this situation the RC is obtained using TargetRegisterInfo::getSubRegisterClass but in general it's not working. In order to fix this two things should be done: 1. Stop processing a subregister if the target register class is unknown (conservative approach) 2. Improve deduction of subregister's target register class (i.e. by processing COPY chain) I was going to implement point 1 but my tests use implicit operands for S_NOP and they don't have associated target register class and all tests fail. Therefore I decided to turn off the pass now, implement point 1 and fix my tests. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D152291	2023-06-07 12:05:25 +02:00
Valery Pykhtin	8d0412ce9d	[AMDGPU] Add pass to rewrite partially used virtual superregisters after RenameIndependentSubregs pass with registers of minimal size. The main purpose of this is to simplify register pressure tracking as after the pass there is no need to track subreg liveness anymore. On the other hand this pass creates more possibilities for the subreg unaware code, as many of the subregs become ordinary registers. Interesting side effect: spill-vgpr.ll has lost a lot of spills. Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D139732	2023-05-26 09:05:44 +02:00
Anshil Gandhi	a22ef958cb	[AMDGPUCodegenPrepare] Add NewPM Support Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D151241	2023-05-26 00:20:01 -06:00
Joseph Huber	4a1236e0f6	[AMDGPU] Add an option to disable manual ctor / dtor lowering Currently AMDGPU offers extra ctor / dtor lowering by emitting a kernel that can be called. It's possible to handle ctors and dtors using the standard method as shown in D149340's commit message. In which case we don't need these extra kernels as they won't be called. This patch simply adds a way to conditionally turn off this handling if we do not want to get extra kernels in the output. Unrelated, but we could convert this handling to an ODR function that simply calls the code in D149340 constructed via LLVM-IR. That would handle priority correctly and would then be correct if not run in LTO mode. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D150565	2023-05-23 09:03:10 -05:00
Krzysztof Drewniak	f0415f2a45	Re-land "[AMDGPU] Define data layout entries for buffers" Re-land D145441 with data layout upgrade code fixed to not break OpenMP. This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27. Differential Revision: https://reviews.llvm.org/D149776	2023-05-03 19:43:56 +00:00
Krzysztof Drewniak	3f2fbe92d0	Revert "[AMDGPU] Define data layout entries for buffers" This reverts commit f9c1ede2543b37fabe9f2d8f8fed5073c475d850. Differential Revision: https://reviews.llvm.org/D149758	2023-05-03 16:11:00 +00:00
Krzysztof Drewniak	f9c1ede254	[AMDGPU] Define data layout entries for buffers Per discussion at https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798, we define two new address spaces for AMDGCN targets. The first is address space 7, a non-integral address space (which was already in the data layout) that has 160-bit pointers (which are 256-bit aligned) and uses a 32-bit offset. These pointers combine a 128-bit buffer descriptor and a 32-bit offset, and will be usable with normal LLVM operations (load, store, GEP). However, they will be rewritten out of existence before code generation. The second of these is address space 8, the address space for "buffer resources". These will be used to represent the resource arguments to buffer instructions, and new buffer intrinsics will be defined that take them instead of <4 x i32> as resource arguments. ptr addrspace(8). These pointers are 128-bits long (with the same alignment). They must not be used as the arguments to getelementptr or otherwise used in address computations, since they can have arbitrarily complex inherent addressing semantics that can't be represented in LLVM. Even though, like their address space 7 cousins, these pointers have deterministic ptrtoint/inttoptr semantics, they are defined to be non-integral in order to prevent optimizations that rely on pointers being a [0, [addr_max]] value from applying to them. Future work includes: - Defining new buffer intrinsics that take ptr addrspace(8) resources. - A late rewrite to turn address space 7 operations into buffer intrinsics and offset computations. This commit also updates the "fallback address space" for buffer intrinsics to the buffer resource, and updates the alias analysis table. Depends on D143437 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D145441	2023-05-03 15:25:58 +00:00
Pravin Jagtap	21a69bdb66	[NewPM][AMDGPU] Port amdgpu-atomic-optimizer Reviewed By: arsenm, sameerds, gandhi21299 Differential Revision: https://reviews.llvm.org/D148628	2023-04-20 00:27:47 -04:00


485 Commits