llvm-project

Author	SHA1	Message	Date
Diana Picus	3356208531	Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512 ) This reverts commit `7792b4ae79`. The problem was a conflict with `e55d6f5ea2` "[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (https://github.com/llvm/llvm-project/pull/107889)" which changed the syntax of V_SET_INACTIVE (and thus made my MIR test crash). ...if only we had a merge queue.	2024-09-13 11:54:30 +02:00
Diana Picus	7792b4ae79	Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 )"" (#108341 ) Reverts llvm/llvm-project#108173 si-init-whole-wave.mir crashes on some buildbots (although it passed both locally with sanitizers enabled and in pre-merge tests). Investigating.	2024-09-12 10:12:09 +02:00
Diana Picus	703ebca869	Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 )" (#108173 ) This reverts commit `c7a7767fca`. The buildbots failed because I removed a MI from its parent before updating LIS. This PR should fix that.	2024-09-12 09:11:41 +02:00
Akshat Oke	e1ee07d0ff	[AMDGPU][NewPM] Port SIPeepholeSDWA pass to NPM (#107049 )	2024-09-11 14:30:16 +04:00
Vitaly Buka	c7a7767fca	Revert "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 ) Breaks bots, see #105822. Reverts llvm/llvm-project#105822	2024-09-10 09:51:43 -07:00
Diana Picus	44556e64f2	[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822) This intrinsic is meant to be used in functions that have a "tail" that needs to be run with all the lanes enabled. The "tail" may contain complex control flow that makes it unsuitable for the use of the existing WWM intrinsics. Instead, we will pretend that the function starts with all the lanes enabled, then branches into the actual body of the function for the lanes that were meant to run it, and then finally all the lanes will rejoin and run the tail. As such, the intrinsic will return the EXEC mask for the body of the function, and is meant to be used only as part of a very limited pattern (for now only in amdgpu_cs_chain functions):
```
entry:
  %func_exec = call i1 @llvm.amdgcn.init.whole.wave()
  br i1 %func_exec, label %func, label %tail

func:
  ; ... stuff that should run with the actual EXEC mask
  br label %tail

tail:
  ; ... stuff that runs with all the lanes enabled;
  ; can contain more than one basic block
```
It's an error to use the result of this intrinsic for anything other than a branch (but unfortunately checking that in the verifier is non-trivial because SIAnnotateControlFlow will introduce an amdgcn.if between the intrinsic and the branch). The intrinsic is lowered to a SI_INIT_WHOLE_WAVE pseudo, which for now is expanded in si-wqm (which is where SI_INIT_EXEC is handled too); however the information that the function was conceptually started in whole wave mode is stored in the machine function info (hasInitWholeWave). This will be useful in prolog epilog insertion, where we can skip saving the inactive lanes for CSRs (since if the function started with all the lanes active, then there are no inactive lanes to preserve).	2024-09-10 13:24:53 +02:00
Christudasan Devadasan	6c143a86cd	[CodeGen][NewPM] Port MachineCSE pass to new pass manager. (#106605 )	2024-09-04 18:54:07 +05:30
Christudasan Devadasan	042104985c	[AMDGPU][NewPM] Port SIShrinkInstructions to new pass manager. (#106967 )	2024-09-03 10:52:50 +05:30
Akshat Oke	da13754103	AMDGPU/NewPM Port SILoadStoreOptimizer to NPM (#106362 )	2024-09-02 11:41:56 +05:30
Shilei Tian	84ed3c29e8	Revert "[AMDGPU][LTO] Assume closed world after linking (#105845)" (#106889) We can't assume a closed world even in the full LTO post-link stage. That assumption only holds if we are building a "GPU executable". However, AMDGPU also supports "dynamic libraries". I'm not aware of any way to tell whether it is a relocatable link when we create the pass. For now let's revert the patch as it is currently breaking things. We can re-enable it once we can handle it correctly.	2024-09-01 09:32:08 -04:00
Akshat Oke	fdca2c33a1	AMDGPU/NewPM Port GCNDPPCombine to NPM (#105816 ) Co-authored-by: Akshat Oke <Akshat.Oke@amd.com>	2024-08-29 14:49:52 +05:30
Akshat Oke	2adc94cd6c	AMDGPU/NewPM: Port SIFoldOperands to new pass manager (#105801 )	2024-08-29 11:34:54 +05:30
Chaitanya	1f02be2e17	[AMDGPU] Enable "amdgpu-sw-lower-lds" pass in pipeline. (#89206) This PR enables the "amdgpu-sw-lower-lds" pass in the pipeline. It also introduces the "amdgpu-enable-sw-lower-lds" command-line flag to enable/disable the pass.	2024-08-26 14:21:19 +05:30
Chaitanya	7bc9d95b7e	[AMDGPU] Introduce "amdgpu-sw-lower-lds" pass to lower LDS accesses. (#87265) This PR introduces a new pass, "amdgpu-sw-lower-lds". This pass lowers the local data store (LDS) uses in kernel and non-kernel functions in the module to use dynamically allocated global memory. The packed LDS layout is emulated in global memory. The memory instructions lowered from LDS to global memory are then instrumented for address sanitizer, to catch addressing errors. This pass only works when address sanitizer has been enabled and has instrumented the IR. It identifies that the IR has been instrumented using the "nosanitize_address" module flag. For a kernel, LDS access can be static or dynamic, each of which may be direct (accessed within the kernel) or indirect (accessed through non-kernels).
Replacement of kernel LDS accesses:
- All the LDS accesses corresponding to the kernel will be packed together, where all static LDS accesses will be allocated first and dynamic LDS follows. The total size with alignment is calculated. A new LDS global called "SW LDS" will be created for the kernel, and it will have the attribute "amdgpu-lds-size" attached with the value of the calculated size. All the LDS accesses in the module will be replaced by a GEP with an offset into the "SW LDS".
- A new "llvm.amdgcn.<kernel>.dynlds" is created per kernel accessing the dynamic LDS. This will be marked as used by the kernel and will have MD_absolute_symbol metadata set to the total static LDS size, since dynamic LDS allocation starts after all static LDS allocation.
- Device global memory equal to the total LDS size will be allocated. At the prologue of the kernel, a single work-item from the work-group does a "malloc" and stores the pointer to the allocation in "SW LDS". To store the offsets corresponding to all LDS accesses, another global variable is created, which is called "SW LDS metadata" in this pass.
  - SW LDS: An LDS global of ptr type with the name "llvm.amdgcn.sw.lds.<kernel-name>".
  - SW LDS Metadata: A global of struct type with n members, where n equals the number of LDS globals accessed by the kernel (direct and indirect). Each member of the struct is another struct of type {i32, i32, i32}. The first member corresponds to the offset, the second to the size of the LDS global being replaced, and the third to the total aligned size. It will have the name "llvm.amdgcn.sw.lds.<kernel-name>.md". This global will have an initializer with the static LDS related offsets and sizes initialized. For dynamic LDS related entries, offsets will be initialized to the end offset of the previous static LDS allocation, and their sizes will be zero initially. These dynamic LDS offset and size values will be updated within the kernel, since the kernel can read the dynamic LDS size allocated at runtime by querying the "hidden_dynamic_lds_size" hidden kernel argument.
- At the epilogue of the kernel, the allocated memory is freed by the same single work-item.
Replacement of non-kernel LDS accesses:
- Multiple kernels can access the same non-kernel function. All the kernels accessing LDS through non-kernels are sorted and assigned a kernel-id. All the LDS globals accessed by non-kernels are sorted.
- This information is used to build two tables:
  - Base table: The base table will have a single row, with the elements of the row placed as per kernel ID. Each element in the row corresponds to the ptr of the "SW LDS" variable created for that kernel.
  - Offset table: The offset table will have multiple rows and columns. Rows are assumed to be from 0 to (n-1), where n is the total number of kernels accessing the LDS through non-kernels. Each row will have m elements, where m is the total number of unique LDS globals accessed by all non-kernels. Each element in the row corresponds to the ptr of the replacement of the LDS global done by that particular kernel.
- An LDS variable in a non-kernel will be replaced based on the information from the base and offset tables. Based on the kernel-id query, the ptr of "SW LDS" for the corresponding kernel is obtained from the base table. The offset into the base "SW LDS" is obtained from the corresponding element in the offset table. With this information, the replacement value is obtained.	2024-08-26 08:59:26 +05:30
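The packing and table lookup described in the "amdgpu-sw-lower-lds" message above can be modelled roughly as follows. This is only an illustrative Python sketch, not the pass's code: the names `MdEntry`, `layout_sw_lds` and `resolve`, the 8-byte alignment, and the use of plain integers in place of pointers are all assumptions made for the example.

```python
from typing import Dict, List, NamedTuple, Tuple

# Assumption: an arbitrary alignment chosen for illustration only.
ASSUMED_ALIGN = 8


class MdEntry(NamedTuple):
    """Models one {i32, i32, i32} metadata member: offset, size, aligned size."""
    offset: int
    size: int
    aligned_size: int


def align_to(value: int, align: int) -> int:
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align


def layout_sw_lds(
    static_sizes: Dict[str, int],
    dynamic_names: List[str],
    align: int = ASSUMED_ALIGN,
) -> Tuple[Dict[str, MdEntry], int]:
    """Pack static LDS globals first, then place dynamic entries after them.

    Returns the metadata entries and the end offset of the static allocation.
    """
    entries: Dict[str, MdEntry] = {}
    offset = 0
    for name, size in static_sizes.items():
        aligned = align_to(size, align)
        entries[name] = MdEntry(offset, size, aligned)
        offset += aligned
    # Dynamic entries start at the static end offset with size zero;
    # in the description above, the kernel updates them at runtime.
    for name in dynamic_names:
        entries[name] = MdEntry(offset, 0, 0)
    return entries, offset


def resolve(
    kernel_id: int,
    lds_id: int,
    base_table: List[int],
    offset_table: List[List[int]],
) -> int:
    """Hypothetical non-kernel lookup: base of "SW LDS" plus the per-global offset."""
    return base_table[kernel_id] + offset_table[kernel_id][lds_id]


md, static_end = layout_sw_lds({"a": 4, "b": 10}, ["dyn"])
address = resolve(1, 1, base_table=[0x1000, 0x2000], offset_table=[[0, 8], [0, 16]])
```

Here `md["b"]` starts at offset 8 because `a` is padded to the assumed alignment, and the dynamic entry begins at `static_end`, mirroring the "static first, dynamic follows" ordering in the message.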
Anshil Gandhi	033e225d90	Revert "Revert "[AMDGPU][LTO] Assume closed world after linking (#105845 )" (#106000 )" (#106001 ) This reverts commit 4b6c064dd124c70ff163411dff120c6174e0e022. Add a requirement for an amdgpu target in the test.	2024-08-25 17:23:36 -04:00
Anshil Gandhi	4b6c064dd1	Revert "[AMDGPU][LTO] Assume closed world after linking (#105845 )" (#106000 ) This reverts commit 33f3ebc86e7d3afcb65c551feba5bbc2421b42ed.	2024-08-25 14:56:39 -04:00
Anshil Gandhi	33f3ebc86e	[AMDGPU][LTO] Assume closed world after linking (#105845 )	2024-08-25 14:06:29 -04:00
Juan Manuel Martinez Caamaño	5def27c72c	[AMDGPU] Remove "amdgpu-enable-structurizer-workarounds" flag (#105819 )	2024-08-23 15:04:03 +02:00
Juan Manuel Martinez Caamaño	2b4b909509	[AMDGPU] Remove unused amdgpu-disable-structurizer flag (#105800 )	2024-08-23 14:14:17 +02:00
Juan Manuel Martinez Caamaño	cbf34a5f77	[AMDGPU] Remove dead pass: AMDGPUMachineCFGStructurizer (#105645 )	2024-08-23 14:06:17 +02:00
Matt Arsenault	dd90c72b05	AMDGPU: Temporarily stop adding AtomicExpand to new PM passes This breaks using -passes=atomic-expand (but only sometimes?). Somehow an AtomicExpand pass ends up running without a TargetMachine, despite always being constructed with one.	2024-08-21 00:19:37 +04:00
Matt Arsenault	33e18b2b43	AMDGPU/NewPM: Start filling out addIRPasses (#102884 ) This is not complete, but gets AtomicExpand running. I was able to get further than I expected; we're quite close to having all the IR codegen passes ported.	2024-08-20 23:38:05 +04:00
Matt Arsenault	afeef4dbc3	AMDGPU/NewPM: Fill out passes in addCodeGenPrepare (#102867 ) AMDGPUAnnotateKernelFeatures hasn't been ported yet, but it should be soon removable.	2024-08-20 23:35:01 +04:00
Matt Arsenault	7022498ac2	AMDGPU/NewPM: Start implementing addCodeGenPrepare (#102816 )	2024-08-20 00:10:45 +04:00
Christudasan Devadasan	a449b85724	[AMDGPU][R600] Move R600CodeGenPassBuilder into R600TargetMachine(NFC). (#103721 )	2024-08-19 20:40:12 +05:30
Christudasan Devadasan	a566635915	[AMDGPU] Move AMDGPUCodeGenPassBuilder into AMDGPUTargetMachine(NFC) (#103720 ) This will allow us to reuse the existing flags and the static functions while building the pipeline for new pass manager.	2024-08-19 20:32:55 +05:30
Matt Arsenault	36a0f20ac3	AMDGPU/NewPM: Fill out addPreISelPasses (#102814 ) This specific callback should now be at parity with the old pass manager version. There are still some missing IR passes before this point. Also I don't understand the need for the RequiresAnalysisPass at the end. SelectionDAG should just be using the uncached getResult?	2024-08-14 20:57:00 +04:00
Shilei Tian	862f5040fb	[AMDGPU] Enable AMDGPUAttributorPass in full LTO (#102673 ) This is basically same as https://github.com/llvm/llvm-project/pull/102086 but reverts some test case changes that are no longer needed.	2024-08-12 13:39:23 -04:00
Matt Arsenault	05b75e006b	AMDGPU/NewPM: Port AMDGPULateCodeGenPrepare to new pass manager (#102806 )	2024-08-12 15:09:12 +04:00
Matt Arsenault	1c764b952a	AMDGPU: Use GCNTargetMachine in AMDGPUCodeGenPassBuilder (#102805 ) R600 has a separate CodeGenPassBuilder anyway.	2024-08-12 15:02:48 +04:00
Matt Arsenault	dd094b2647	NewPM/AMDGPU: Port AMDGPUPerfHintAnalysis to new pass manager (#102645 ) This was much more difficult than I anticipated. The pass is not in a good state, with poor test coverage. The legacy PM does seem to be relying on maintaining the map state between different SCCs, which seems bad. The pass is going out of its way to avoid putting the attributes it introduces onto non-callee functions. If it just added them, we could use them directly instead of relying on the map, I would think. The NewPM path uses a ModulePass; I'm not sure if we should be using CGSCC here but there seems to be some missing infrastructure to support backend defined ones.	2024-08-11 15:11:10 +04:00
Matt Arsenault	3696a34e59	AMDGPU/NewPM: Port SILowerI1Copies to new pass manager (#102663 )	2024-08-10 07:08:22 +04:00
Matt Arsenault	77e68fbdd3	AMDGPU/NewPM: Port AMDGPUAnnotateUniformValues to new pass manager (#102654 )	2024-08-10 07:06:08 +04:00
Matt Arsenault	76f722f10c	AMDGPU/NewPM: Port SIAnnotateControlFlow to new pass manager (#102653 ) Does not yet add it to the pass pipeline. Somehow it causes 2 tests to assert in SelectionDAG, in functions without any control flow.	2024-08-10 07:02:21 +04:00
Shilei Tian	786c409234	[AMDGPU][Attributor] Add a pass parameter `closed-world` for AMDGPUAttributor pass (#101760 )	2024-08-09 22:12:09 -04:00
Shilei Tian	492484e657	Revert "[AMDGPU] Move `AMDGPUAttributorPass` to full LTO post link stage (#102086 )" This reverts commit 2fe61a5acf272d6826352ef72f47196b01003fc5.	2024-08-09 15:12:24 -04:00
Shilei Tian	2fe61a5acf	[AMDGPU] Move `AMDGPUAttributorPass` to full LTO post link stage (#102086 ) Currently `AMDGPUAttributorPass` is registered in default optimizer pipeline. This will allow the pass to run in default pipeline as well as at thinLTO post link stage. However, it will not run in full LTO post link stage. This patch moves it to full LTO.	2024-08-09 13:35:00 -04:00
Matt Arsenault	cf54cae26b	AMDGPU/NewPM: Port SIFixSGPRCopies to new pass manager (#102614 ) This allows moving some tests relying on -stop-after=amdgpu-isel to move to checking -stop-after=finalize-isel instead, which will more reliably pass the verifier.	2024-08-09 17:52:41 +04:00
Christudasan Devadasan	15b41d207e	[CodeGen] change prototype of regalloc filter function (#93525 ) [CodeGen] Change the prototype of regalloc filter function Change the prototype of the filter function so that we can filter not just by RegClass. We need to implement more complicated filter based upon some other info associated with each register. Patch provided by: Gang Chen (gangc@amd.com)	2024-07-22 16:49:39 +05:30
Jay Foad	74b87b02d2	[AMDGPU] Fix and add namespace closing comments. NFC.	2024-07-16 16:56:31 +01:00
Matt Arsenault	b1bcb7ca46	Reapply "AMDGPU: Move attributor into optimization pipeline (#83131 )" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851 ) This reverts commit adaff46d087799072438dd744b038e6fd50a2d78. Drop the -O3 checks from default-attributes.hip. I don't know why they are different on some bots but reverting this is far too disruptive.	2024-07-15 11:51:44 +04:00
dyung	adaff46d08	Revert "AMDGPU: Move attributor into optimization pipeline (#83131 )" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851 ) This reverts commits 677cc15e0ff2e0e6aa30538eb187990a6a8f53c0 and 78bc1b64a6dc3fb6191355a5e1b502be8b3668e7. The test CodeGenHIP/default-attributes.hip is failing on multiple bots even after the attempted fix including the following: - https://lab.llvm.org/buildbot/#/builders/3/builds/1473 - https://lab.llvm.org/buildbot/#/builders/65/builds/1380 - https://lab.llvm.org/buildbot/#/builders/161/builds/595 - https://lab.llvm.org/buildbot/#/builders/154/builds/1372 - https://lab.llvm.org/buildbot/#/builders/133/builds/1547 - https://lab.llvm.org/buildbot/#/builders/81/builds/755 - https://lab.llvm.org/buildbot/#/builders/40/builds/570 - https://lab.llvm.org/buildbot/#/builders/13/builds/748 - https://lab.llvm.org/buildbot/#/builders/12/builds/1845 - https://lab.llvm.org/buildbot/#/builders/11/builds/1695 - https://lab.llvm.org/buildbot/#/builders/190/builds/1829 - https://lab.llvm.org/buildbot/#/builders/193/builds/962 - https://lab.llvm.org/buildbot/#/builders/23/builds/991 - https://lab.llvm.org/buildbot/#/builders/144/builds/2256 - https://lab.llvm.org/buildbot/#/builders/46/builds/1614 These bots have been broken for a day, so reverting to get everything back to green.	2024-07-14 18:48:54 -07:00
Matt Arsenault	78bc1b64a6	AMDGPU: Move attributor into optimization pipeline (#83131 ) Removing it from the codegen pipeline induces a lot of test churn because llc is no longer optimizing out implicit arguments to kernels. Mostly mechanical, but there are some creative test updates. I preferred to take the changes as-is in tests where the ABI isn't relevant. In cases where it's more relevant, or the optimize out logic was too ingrained in the test, I pre-run the optimization. Some cases manually add attributes to disable inputs.	2024-07-14 08:36:33 +04:00
Jeffrey Byrnes	5da7179cb3	[AMDGPU] Reland: Add IR LiveReg type-based optimization	2024-07-03 09:26:19 -07:00
Vitaly Buka	3e53c97d33	Revert "[AMDGPU] Add IR LiveReg type-based optimization" (#97138 ) Part of #66838. https://lab.llvm.org/buildbot/#/builders/52/builds/404 https://lab.llvm.org/buildbot/#/builders/55/builds/358 https://lab.llvm.org/buildbot/#/builders/164/builds/518 This reverts commit ded956440739ae326a99cbaef18ce4362e972679.	2024-06-28 23:18:26 -07:00
Jeffrey Byrnes	ded9564407	[AMDGPU] Add IR LiveReg type-based optimization Change-Id: Ia0d11b79b8302e79247fe193ccabc0dad2d359a0	2024-06-28 15:01:39 -07:00
Nikita Popov	5cd0ba30f5	Reapply [IR] Lazily initialize the class to pass name mapping (NFC) (#96321 ) (#96462 ) On MSVC the `this` uses inside `decltype` require a lambda capture. On clang they result in an unused capture warning instead. Add the capture and suppress the warning with `(void)this`. ----- Initializing this map is somewhat expensive (especially for O0), so we currently only do it if certain flags are used. I would like to make use of it for crash dumps (#96078), where we don't know in advance whether it will be needed or not. This patch changes the initialization to a lazy approach, where a callback is registered that does the actual initialization. The callbacks will be run the first time the pass name is requested. This way there is no compile-time impact if the mapping is not used.	2024-06-24 15:00:11 +02:00
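The lazy initialization described in the message above (register callbacks, run them the first time a pass name is requested) can be sketched in a few lines. This is a Python model of the idea only, not the LLVM implementation; `PassNameMap`, `register_callback` and `get_pass_name` are hypothetical names, and the class and pass names in the example are made up.

```python
from typing import Callable, Dict, List


class PassNameMap:
    """Hypothetical model of a class-name to pass-name map filled lazily."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[Dict[str, str]], None]] = []
        self._names: Dict[str, str] = {}

    def register_callback(self, callback: Callable[[Dict[str, str]], None]) -> None:
        """Defer the expensive population work until a name is first requested."""
        self._callbacks.append(callback)

    def get_pass_name(self, class_name: str) -> str:
        """Run pending callbacks once, then answer from the populated map."""
        if self._callbacks:
            for callback in self._callbacks:
                callback(self._names)
            self._callbacks.clear()
        return self._names.get(class_name, "")


calls = []


def populate(names: Dict[str, str]) -> None:
    # Stands in for the costly registration of every pass name.
    calls.append("populate")
    names["ExamplePass"] = "example-pass"


mapping = PassNameMap()
mapping.register_callback(populate)
calls_before_lookup = len(calls)  # nothing has run yet
first = mapping.get_pass_name("ExamplePass")
second = mapping.get_pass_name("ExamplePass")
```

If no name is ever requested, `populate` never runs, which corresponds to the "no compile-time impact if the mapping is not used" point in the message.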
Nikita Popov	e5a41f0afc	Revert "[IR] Lazily initialize the class to pass name mapping (NFC) (#96321 )" My attempt to fix the Windows build made things worse, revert entirely for now. This reverts commit e7137f2fed5cfee822ae3c4c6d39188adb59a16c. This reverts commit 6eaf204dbb0a6a81cddfd02f625c130f7bb1aae5. This reverts commit 957dc4366dd2ce9d5d2991c3ad76bbf438e9954e.	2024-06-24 10:32:03 +02:00
Nikita Popov	957dc4366d	[IR] Lazily initialize the class to pass name mapping (NFC) (#96321 ) Initializing this map is somewhat expensive (especially for O0), so we currently only do it if certain flags are used. I would like to make use of it for crash dumps (#96078), where we don't know in advance whether it will be needed or not. This patch changes the initialization to a lazy approach, where a callback is registered that does the actual initialization. The callbacks will be run the first time the pass name is requested. This way there is no compile-time impact if the mapping is not used.	2024-06-24 09:40:09 +02:00
vg0204	c2fc7f75f6	Revert "[AMDGPU]Optimize SGPR spills (#93668)" This reverts commit 4b9112e88a998ce620e4683548f2afd17cc5fe95. A separate issue (#96353) has been opened to track the problem.	2024-06-24 12:36:36 +05:30


563 Commits