llvm-project

Author	SHA1	Message	Date
Akshat Oke	ca32bd643b	[NewPM][AMDGPU] Port SIPreAllocateWWMRegs to NPM (#109939)	2024-10-22 15:37:08 +05:30
Akshat Oke	6360652e9f	Reland [AMDGPU] Serialize WWM_REG vreg flag (#110229) (#112492) A reland but not an exact copy as `VRegInfo.Flags` from the parser is now an int8 instead of a vector; so only need to copy over the value.	2024-10-21 13:44:09 +05:30
Christudasan Devadasan	72a7b471de	[AMDGPU][NewPM] Fill out addILPOpts. (#108514)	2024-10-16 13:30:46 +05:30
Christudasan Devadasan	488d3924dd	[CodeGen][NewPM] Port EarlyIfConversion pass to NPM. (#108508)	2024-10-16 13:22:57 +05:30
Peter Collingbourne	3cab8827fd	Revert "[AMDGPU] Serialize WWM_REG vreg flag (#110229)" This reverts commit bec839d8eed9dd13fa7eaffd50b28f8f913de2e2. Caused buildbot failures, e.g. https://lab.llvm.org/buildbot/#/builders/52/builds/2928	2024-10-15 13:18:43 -07:00
Akshat Oke	bec839d8ee	[AMDGPU] Serialize WWM_REG vreg flag (#110229)	2024-10-14 14:37:21 +05:30
Akshat Oke	039e6f879c	[AMDGPU][NewPM] Fill out AMDGPU addMachineSSAOptimizations (#111658) Implement the addMachineSSAOptimizations passes for AMDGPU. Porting the other generic passes in this category is WIP.	2024-10-10 15:35:11 +05:30
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
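The llvm-qualified-auto check makes a pointer visible in the source when `auto` deduces a pointer type. The self-contained C++ sketch below illustrates that before/after style. The function names are hypothetical and the code is not taken from the patch. The check's exact rewrites, for example whether it also adds `const`, depend on its options, so treat this as illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustration only (hypothetical helpers, not code from the patch): the
// spelling llvm-qualified-auto asks for when `auto` deduces a pointer type.

// Before the fix-it:  auto First = &Names.front();
// After the fix-it:   the pointer and its const pointee are spelled out.
inline std::size_t firstNameLength(const std::vector<std::string> &Names) {
  assert(!Names.empty() && "front() requires a non-empty vector");
  const auto *First = &Names.front(); // deduced as const std::string *
  return First->size();
}

// Before the fix-it:  auto Slot = &Name;
// After the fix-it:   auto *Slot = &Name;  (mutable pointee, so no const)
inline void appendSuffix(std::vector<std::string> &Names,
                         const std::string &Suffix) {
  for (auto &Name : Names) {
    auto *Slot = &Name; // deduced as std::string *
    Slot->append(Suffix);
  }
}
```

The behavior is unchanged, which is why such commits are tagged NFC. Only the declaration spelling differs.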
vikashgu	870bdc6ea7	Reapply "[AMDGPU]Optimize SGPR spills (#93668)" This reverts commit c2fc7f75f67039bb1ed577bc0edbd699a850cd9d. As the dependent patch about split vgpr regalloc pipeline solved the issue (#96353).	2024-10-03 09:47:15 +00:00
Christudasan Devadasan	ac0f64f06d	[AMDGPU] Split vgpr regalloc pipeline (#93526) Allocating wwm-registers and per-thread VGPR operands together imposes many challenges in the way the registers are reused during allocation. There are times when regalloc reuses the registers of regular VGPRs operations for wwm-operations in a small range leading to unwantedly clobbering their inactive lanes causing correctness issues that are hard to trace. This patch splits the VGPR allocation pipeline further to allocate wwm-registers first and the regular VGPR operands in a separate pipeline. The splitting would ensure that the physical registers used for wwm allocations won't take part in the next allocation pipeline to avoid any such clobbering.	2024-09-30 19:55:42 +05:30
Matt Arsenault	a87640c97e	AMDGPU: Fix assertion on load of vector of pointers (#110436) Fix InferAddressSpaces asserting on a load of a vector of flat pointers. Fixes #110433	2024-09-30 10:16:38 +04:00
Scott Egerton	396f677514	[AMDGPU] Remove unused VGPRSingleUseHintInsts feature (#109769)	2024-09-24 10:58:00 +01:00
Akshat Oke	0b0874755d	[AMDGPU][NewPM] Port SILowerSGPRSpills to NPM (#108934)	2024-09-21 09:59:36 +05:30
Akshat Oke	d2d78e584b	[NewPM][CodeGen] Port MachineLICM to NPM (#107376)	2024-09-20 11:34:18 +05:30
Jay Foad	e03f427196	[LLVM] Use {} instead of std::nullopt to initialize empty ArrayRef (#109133) It is almost always simpler to use {} instead of std::nullopt to initialize an empty ArrayRef. This patch changes all occurrences I could find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor could be deprecated or removed.	2024-09-19 16:16:38 +01:00
Diana Picus	3356208531	Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512) This reverts commit `7792b4ae79`. The problem was a conflict with `e55d6f5ea2` "[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (https://github.com/llvm/llvm-project/pull/107889)" which changed the syntax of V_SET_INACTIVE (and thus made my MIR test crash). ...if only we had a merge queue.	2024-09-13 11:54:30 +02:00
Diana Picus	7792b4ae79	Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)"" (#108341) Reverts llvm/llvm-project#108173 si-init-whole-wave.mir crashes on some buildbots (although it passed both locally with sanitizers enabled and in pre-merge tests). Investigating.	2024-09-12 10:12:09 +02:00
Diana Picus	703ebca869	Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)" (#108173) This reverts commit `c7a7767fca`. The buildbots failed because I removed a MI from its parent before updating LIS. This PR should fix that.	2024-09-12 09:11:41 +02:00
Akshat Oke	e1ee07d0ff	[AMDGPU][NewPM] Port SIPeepholeSDWA pass to NPM (#107049)	2024-09-11 14:30:16 +04:00
Vitaly Buka	c7a7767fca	Revert "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054) Breaks bots, see #105822. Reverts llvm/llvm-project#105822	2024-09-10 09:51:43 -07:00
Diana Picus	44556e64f2	[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822) This intrinsic is meant to be used in functions that have a "tail" that needs to be run with all the lanes enabled. The "tail" may contain complex control flow that makes it unsuitable for the use of the existing WWM intrinsics. Instead, we will pretend that the function starts with all the lanes enabled, then branches into the actual body of the function for the lanes that were meant to run it, and then finally all the lanes will rejoin and run the tail. As such, the intrinsic will return the EXEC mask for the body of the function, and is meant to be used only as part of a very limited pattern (for now only in amdgpu_cs_chain functions):
```
entry:
  %func_exec = call i1 @llvm.amdgcn.init.whole.wave()
  br i1 %func_exec, label %func, label %tail

func:
  ; ... stuff that should run with the actual EXEC mask
  br label %tail

tail:
  ; ... stuff that runs with all the lanes enabled;
  ; can contain more than one basic block
```
It's an error to use the result of this intrinsic for anything other than a branch (but unfortunately checking that in the verifier is non-trivial because SIAnnotateControlFlow will introduce an amdgcn.if between the intrinsic and the branch). The intrinsic is lowered to a SI_INIT_WHOLE_WAVE pseudo, which for now is expanded in si-wqm (which is where SI_INIT_EXEC is handled too); however the information that the function was conceptually started in whole wave mode is stored in the machine function info (hasInitWholeWave). This will be useful in prolog epilog insertion, where we can skip saving the inactive lanes for CSRs (since if the function started with all the lanes active, then there are no inactive lanes to preserve).	2024-09-10 13:24:53 +02:00
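The EXEC-mask reasoning in the commit above can be made concrete with a toy model. The sketch below treats EXEC as one bit per lane in a 32- or 64-lane wave, and all its names are hypothetical. The entry block runs with every lane on, `func_exec` selects the lanes that run the body, and every lane rejoins for the tail. The model also shows why a function that started with all lanes active has no inactive lanes for the prolog to preserve. This is an assumed simplification of the described semantics, not LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the EXEC-mask semantics described above (not LLVM code).
// One bit per lane; a wave has 32 or 64 lanes. All names are hypothetical.

inline uint64_t fullWaveMask(unsigned WaveSize) {
  assert((WaveSize == 32 || WaveSize == 64) && "waves are 32 or 64 lanes");
  return WaveSize == 64 ? ~uint64_t{0} : (uint64_t{1} << WaveSize) - 1;
}

// Lanes whose VGPR contents a prolog would have to preserve when saving a
// callee-saved register: the lanes that were inactive on entry.
inline uint64_t inactiveLanesToPreserve(uint64_t EntryExec, unsigned WaveSize) {
  return fullWaveMask(WaveSize) & ~EntryExec;
}

struct LaneCounts {
  unsigned BodyLanes; // lanes that execute the `func` block
  unsigned TailLanes; // lanes that execute the `tail` block
};

// Entry runs with every lane on, `func_exec` picks the body lanes, and every
// lane rejoins for the tail.
inline LaneCounts runInitWholeWavePattern(uint64_t FuncExec, unsigned WaveSize) {
  const uint64_t Full = fullWaveMask(WaveSize);
  LaneCounts Counts{0, 0};
  for (unsigned Lane = 0; Lane < WaveSize; ++Lane) {
    if (((FuncExec & Full) >> Lane) & 1) // br i1 %func_exec, label %func, ...
      ++Counts.BodyLanes;
    ++Counts.TailLanes; // tail runs with all lanes enabled
  }
  return Counts;
}
```

For example, `inactiveLanesToPreserve(fullWaveMask(32), 32)` is 0. That matches the commit's observation that no inactive lanes need saving when the function conceptually starts in whole wave mode.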
Christudasan Devadasan	6c143a86cd	[CodeGen][NewPM] Port MachineCSE pass to new pass manager. (#106605)	2024-09-04 18:54:07 +05:30
Christudasan Devadasan	042104985c	[AMDGPU][NewPM] Port SIShrinkInstructions to new pass manager. (#106967)	2024-09-03 10:52:50 +05:30
Akshat Oke	da13754103	AMDGPU/NewPM Port SILoadStoreOptimizer to NPM (#106362)	2024-09-02 11:41:56 +05:30
Shilei Tian	84ed3c29e8	Revert "[AMDGPU][LTO] Assume closed world after linking (#105845)" (#106889) We can't assume closed world even in full LTO post-link stage. It is only true if we are building a "GPU executable". However, AMDGPU does support "dynamic library". I'm not aware of any approach to tell if it is relocatable link when we create the pass. For now let's revert the patch as it is currently breaking things. We can re-enable it once we can handle it correctly.	2024-09-01 09:32:08 -04:00
Akshat Oke	fdca2c33a1	AMDGPU/NewPM Port GCNDPPCombine to NPM (#105816) Co-authored-by: Akshat Oke <Akshat.Oke@amd.com>	2024-08-29 14:49:52 +05:30
Akshat Oke	2adc94cd6c	AMDGPU/NewPM: Port SIFoldOperands to new pass manager (#105801)	2024-08-29 11:34:54 +05:30
Chaitanya	1f02be2e17	[AMDGPU] Enable "amdgpu-sw-lower-lds" pass in pipeline. (#89206) This PR enables "amdgpu-sw-lower-lds" pass in the pipeline. Also introduces "amdgpu-enable-sw-lower-lds" cmd line flag to enable/disable the pass.	2024-08-26 14:21:19 +05:30
Chaitanya	7bc9d95b7e	[AMDGPU] Introduce "amdgpu-sw-lower-lds" pass to lower LDS accesses. (#87265) This PR introduces new pass "amdgpu-sw-lower-lds". This pass lowers the local data store, LDS, uses in kernel and non-kernel functions in module to use dynamically allocated global memory. Packed LDS Layout is emulated in the global memory. The lowered memory instructions from LDS to global memory are then instrumented for address sanitizer, to catch addressing errors. This pass only works when address sanitizer has been enabled and has instrumented the IR. It identifies that IR has been instrumented using "nosanitize_address" module flag. For a kernel, LDS access can be static or dynamic which are direct (accessed within kernel) and indirect (accessed through non-kernels).
Replacement of Kernel LDS accesses:
- All the LDS accesses corresponding to kernel will be packed together, where all static LDS accesses will be allocated first and then dynamic LDS follows. The total size with alignment is calculated. A new LDS global will be created for the kernel called "SW LDS" and it will have the attribute "amdgpu-lds-size" attached with value of the size calculated. All the LDS accesses in the module will be replaced by GEP with offset into the "SW LDS".
- A new "llvm.amdgcn.<kernel>.dynlds" is created per kernel accessing the dynamic LDS. This will be marked used by kernel and will have MD_absolute_symbol metadata set to total static LDS size, since dynamic LDS allocation starts after all static LDS allocation.
- A device global memory equal to the total LDS size will be allocated. At the prologue of the kernel, a single work-item from the work-group does a "malloc" and stores the pointer of the allocation in "SW LDS". To store the offsets corresponding to all LDS accesses, another global variable is created which will be called "SW LDS metadata" in this pass.
- SW LDS: It is LDS global of ptr type with name "llvm.amdgcn.sw.lds.<kernel-name>".
- SW LDS Metadata: It is of struct type, with n members. n equals the number of LDS globals accessed by the kernel (direct and indirect). Each member of struct is another struct of type {i32, i32, i32}. First member corresponds to offset, second member corresponds to size of LDS global being replaced and third represents the total aligned size. It will have name "llvm.amdgcn.sw.lds.<kernel-name>.md". This global will have an initializer with static LDS related offsets and sizes initialized. But for dynamic LDS related entries, offsets will be initialized to previous static LDS allocation end offset. Sizes for them will be zero initially. These dynamic LDS offset and size values will be updated within the kernel, since kernel can read the dynamic LDS size allocation done at runtime with query to "hidden_dynamic_lds_size" hidden kernel argument.
- At the epilogue of kernel, allocated memory would be made free by the same single work-item.
Replacement of non-kernel LDS accesses:
- Multiple kernels can access the same non-kernel function. All the kernels accessing LDS through non-kernels are sorted and assigned a kernel-id. All the LDS globals accessed by non-kernels are sorted.
- This information is used to build two tables:
  - Base table: Base table will have single row, with elements of the row placed as per kernel ID. Each element in the row corresponds to ptr of "SW LDS" variable created for that kernel.
  - Offset table: Offset table will have multiple rows and columns. Rows are assumed to be from 0 to (n-1). n is total number of kernels accessing the LDS through non-kernels. Each row will have m elements. m is the total number of unique LDS globals accessed by all non-kernels. Each element in the row corresponds to the ptr of the replacement of LDS global done by that particular kernel.
- An LDS variable in non-kernel will be replaced based on the information from base and offset tables. Based on kernel-id query, ptr of "SW LDS" for that corresponding kernel is obtained from base table. The offset into the base "SW LDS" is obtained from corresponding element in offset table. With this information, replacement value is obtained.	2024-08-26 08:59:26 +05:30
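The two replacement schemes described in the "amdgpu-sw-lower-lds" commit can be modelled in a few lines of C++. This is a toy sketch under stated assumptions, not the pass's implementation, and all type and function names are hypothetical. Static globals are packed first and dynamic ones follow, and each metadata entry mirrors the `{offset, size, aligned size}` triple. Padding each size to the global's own alignment is an assumption. In the lookup part, the message describes the table entries as pointers, but here the base is modelled as an integer address and the offsets as plain byte offsets.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of the "SW LDS" layout and lookup; not the pass's implementation.
// All type and function names are hypothetical.

struct LDSGlobal {
  uint32_t Size;
  uint32_t Align; // assumed to be a power of two
  bool IsDynamic;
};

// Mirrors one {i32, i32, i32} member of "llvm.amdgcn.sw.lds.<kernel-name>.md".
struct MDEntry {
  uint32_t Offset;      // byte offset into the kernel's SW LDS allocation
  uint32_t Size;        // size of the replaced LDS global
  uint32_t AlignedSize; // size padded to the global's alignment (assumption)
};

inline uint32_t alignTo(uint32_t Value, uint32_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power-of-two alignment");
  return (Value + Align - 1) / Align * Align;
}

// Static globals are packed first, then dynamic ones (output order: statics,
// then dynamics). Dynamic entries start at the static end offset with size 0
// (aligned size 0 too, an assumption); the kernel updates them at run time.
inline std::vector<MDEntry> layoutSwLDS(const std::vector<LDSGlobal> &Globals) {
  std::vector<MDEntry> MD;
  uint32_t StaticEnd = 0;
  for (const LDSGlobal &G : Globals) {
    if (G.IsDynamic)
      continue;
    const uint32_t Offset = alignTo(StaticEnd, G.Align);
    const uint32_t Aligned = alignTo(G.Size, G.Align);
    MD.push_back({Offset, G.Size, Aligned});
    StaticEnd = Offset + Aligned;
  }
  for (const LDSGlobal &G : Globals)
    if (G.IsDynamic)
      MD.push_back({StaticEnd, 0, 0});
  return MD;
}

// Non-kernel access: one base per kernel-id, one offset row per kernel-id with
// one column per LDS global. Integers stand in for the pass's pointers.
inline uint64_t resolveNonKernelLDS(
    const std::vector<uint64_t> &BaseTable,
    const std::vector<std::vector<uint32_t>> &OffsetTable, unsigned KernelId,
    unsigned GlobalIdx) {
  return BaseTable.at(KernelId) + OffsetTable.at(KernelId).at(GlobalIdx);
}
```

Take a kernel with a 4-byte static global (align 4), a 6-byte static global (align 8) and one dynamic global. In this model the globals land at offsets 0, 8 and 16, and the dynamic entry's size stays 0 until the kernel fills it in.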
Anshil Gandhi	033e225d90	Revert "Revert "[AMDGPU][LTO] Assume closed world after linking (#105845)" (#106000)" (#106001) This reverts commit 4b6c064dd124c70ff163411dff120c6174e0e022. Add a requirement for an amdgpu target in the test.	2024-08-25 17:23:36 -04:00
Anshil Gandhi	4b6c064dd1	Revert "[AMDGPU][LTO] Assume closed world after linking (#105845)" (#106000) This reverts commit 33f3ebc86e7d3afcb65c551feba5bbc2421b42ed.	2024-08-25 14:56:39 -04:00
Anshil Gandhi	33f3ebc86e	[AMDGPU][LTO] Assume closed world after linking (#105845)	2024-08-25 14:06:29 -04:00
Juan Manuel Martinez Caamaño	5def27c72c	[AMDGPU] Remove "amdgpu-enable-structurizer-workarounds" flag (#105819)	2024-08-23 15:04:03 +02:00
Juan Manuel Martinez Caamaño	2b4b909509	[AMDGPU] Remove unused amdgpu-disable-structurizer flag (#105800)	2024-08-23 14:14:17 +02:00
Juan Manuel Martinez Caamaño	cbf34a5f77	[AMDGPU] Remove dead pass: AMDGPUMachineCFGStructurizer (#105645)	2024-08-23 14:06:17 +02:00
Matt Arsenault	dd90c72b05	AMDGPU: Temporarily stop adding AtomicExpand to new PM passes This breaks using -passes=atomic-expand (but only sometimes?). Somehow an AtomicExpand pass ends up running without a TargetMachine, despite always being constructed with one.	2024-08-21 00:19:37 +04:00
Matt Arsenault	33e18b2b43	AMDGPU/NewPM: Start filling out addIRPasses (#102884) This is not complete, but gets AtomicExpand running. I was able to get further than I expected; we're quite close to having all the IR codegen passes ported.	2024-08-20 23:38:05 +04:00
Matt Arsenault	afeef4dbc3	AMDGPU/NewPM: Fill out passes in addCodeGenPrepare (#102867) AMDGPUAnnotateKernelFeatures hasn't been ported yet, but it should be soon removable.	2024-08-20 23:35:01 +04:00
Matt Arsenault	7022498ac2	AMDGPU/NewPM: Start implementing addCodeGenPrepare (#102816)	2024-08-20 00:10:45 +04:00
Christudasan Devadasan	a449b85724	[AMDGPU][R600] Move R600CodeGenPassBuilder into R600TargetMachine(NFC). (#103721)	2024-08-19 20:40:12 +05:30
Christudasan Devadasan	a566635915	[AMDGPU] Move AMDGPUCodeGenPassBuilder into AMDGPUTargetMachine(NFC) (#103720) This will allow us to reuse the existing flags and the static functions while building the pipeline for new pass manager.	2024-08-19 20:32:55 +05:30
Matt Arsenault	36a0f20ac3	AMDGPU/NewPM: Fill out addPreISelPasses (#102814) This specific callback should now be at parity with the old pass manager version. There are still some missing IR passes before this point. Also I don't understand the need for the RequiresAnalysisPass at the end. SelectionDAG should just be using the uncached getResult?	2024-08-14 20:57:00 +04:00
Shilei Tian	862f5040fb	[AMDGPU] Enable AMDGPUAttributorPass in full LTO (#102673) This is basically same as https://github.com/llvm/llvm-project/pull/102086 but reverts some test case changes that are no longer needed.	2024-08-12 13:39:23 -04:00
Matt Arsenault	05b75e006b	AMDGPU/NewPM: Port AMDGPULateCodeGenPrepare to new pass manager (#102806)	2024-08-12 15:09:12 +04:00
Matt Arsenault	1c764b952a	AMDGPU: Use GCNTargetMachine in AMDGPUCodeGenPassBuilder (#102805) R600 has a separate CodeGenPassBuilder anyway.	2024-08-12 15:02:48 +04:00
Matt Arsenault	dd094b2647	NewPM/AMDGPU: Port AMDGPUPerfHintAnalysis to new pass manager (#102645) This was much more difficult than I anticipated. The pass is not in a good state, with poor test coverage. The legacy PM does seem to be relying on maintaining the map state between different SCCs, which seems bad. The pass is going out of its way to avoid putting the attributes it introduces onto non-callee functions. If it just added them, we could use them directly instead of relying on the map, I would think. The NewPM path uses a ModulePass; I'm not sure if we should be using CGSCC here but there seems to be some missing infrastructure to support backend defined ones.	2024-08-11 15:11:10 +04:00
Matt Arsenault	3696a34e59	AMDGPU/NewPM: Port SILowerI1Copies to new pass manager (#102663)	2024-08-10 07:08:22 +04:00
Matt Arsenault	77e68fbdd3	AMDGPU/NewPM: Port AMDGPUAnnotateUniformValues to new pass manager (#102654)	2024-08-10 07:06:08 +04:00
Matt Arsenault	76f722f10c	AMDGPU/NewPM: Port SIAnnotateControlFlow to new pass manager (#102653) Does not yet add it to the pass pipeline. Somehow it causes 2 tests to assert in SelectionDAG, in functions without any control flow.	2024-08-10 07:02:21 +04:00
Shilei Tian	786c409234	[AMDGPU][Attributor] Add a pass parameter `closed-world` for AMDGPUAttributor pass (#101760)	2024-08-09 22:12:09 -04:00


578 Commits