llvm-project

Author	SHA1	Message	Date
Jon Chesterfield	56202c51d4	Revert "[amdgpu][lds] Use the same isKernel predicate consistently". Looks like this composed poorly with a nominally independent patch, will fix. This reverts commit 0ba0398517778514eb44cb7ba9bf9d4d20a856e0.	2022-11-09 16:54:20 +00:00
Jon Chesterfield	0ba0398517	[amdgpu][lds] Use the same isKernel predicate consistently. isKernelCC != isKernel(F->getCallingConv()). There's a test case (lower-kernel-lds.ll) that explicitly skips amdgpu_ps, so this change picks the isKernel predicate that continues to skip that calling convention. isKernel returns true for AMDGPU_KERNEL and SPIR_KERNEL. isKernelCC also returns true for other calling conventions. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D136599	2022-11-09 16:45:05 +00:00
Kazu Hirata	20d764aff0	[llvm] Don't include SetVector.h (NFC) llvm/lib/ProfileData/RawMemProfReader.cpp uses SetVector without including SetVector.h, so this patch adds an appropriate #include there.	2022-09-17 12:36:43 -07:00
Jon Chesterfield	cdb9738963	[amdgpu] Expand all ConstantExpr users of LDS variables in instructions Bug noted in D112717 can be sidestepped with this change. Expanding all ConstantExpr involved with LDS up front makes the variable specialisation simpler. Excludes ConstantExpr that don't access LDS to avoid disturbing codegen elsewhere. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D133422	2022-09-14 07:55:46 +01:00
Jon Chesterfield	a28bbd00c6	[amdgpu][nfc] Factor predicate out of findLDSVariablesToLower	2022-08-31 15:44:51 +01:00
Kazu Hirata	e20d210eef	[llvm] Qualify auto (NFC) Identified with readability-qualified-auto.	2022-08-07 23:55:27 -07:00
Austin Kerbow	f5b21680d1	[AMDGPU] Add amdgcn_sched_group_barrier builtin This builtin allows the creation of custom scheduling pipelines on a per-region basis. Like the sched_barrier builtin this is intended to be used either for testing, in situations where the default scheduler heuristics cannot be improved, or in critical kernels where users are trying to get performance that is close to handwritten assembly. Obviously using these builtins will require extra work from the kernel writer to maintain the desired behavior. The builtin can be used to create groups of instructions called "scheduling groups" where ordering between the groups is enforced by the scheduler. __builtin_amdgcn_sched_group_barrier takes three parameters. The first parameter is a mask that determines the types of instructions that you would like to synchronize around and add to a scheduling group. These instructions will be selected from the bottom up starting from the sched_group_barrier's location during instruction scheduling. The second parameter is the number of matching instructions that will be associated with this sched_group_barrier. The third parameter is an identifier which is used to describe what other sched_group_barriers should be synchronized with. Note that multiple sched_group_barriers must be added in order for them to be useful since they only synchronize with other sched_group_barriers. Only "scheduling groups" with a matching third parameter will have any enforced ordering between them. As an example, the code below tries to create a pipeline of 1 VMEM_READ instruction followed by 1 VALU instruction followed by 5 MFMA instructions... 
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 1 VALU
__builtin_amdgcn_sched_group_barrier(2, 1, 0)
// 5 MFMA
__builtin_amdgcn_sched_group_barrier(8, 5, 0)
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 3 VALU
__builtin_amdgcn_sched_group_barrier(2, 3, 0)
// 2 VMEM_WRITE
__builtin_amdgcn_sched_group_barrier(64, 2, 0)
Reviewed By: jrbyrnes Differential Revision: https://reviews.llvm.org/D128158	2022-07-28 10:43:14 -07:00
Austin Kerbow	2db700215a	[AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic Adds an intrinsic/builtin that can be used to fine tune scheduler behavior. If there is a need to have highly optimized codegen and kernel developers have knowledge of inter-wave runtime behavior which is unknown to the compiler this builtin can be used to tune scheduling. This intrinsic creates a barrier between scheduling regions. The immediate parameter is a mask to determine the types of instructions that should be prevented from crossing the sched_barrier. In this initial patch, there are only two variations. A mask of 0 means that no instructions may be scheduled across the sched_barrier. A mask of 1 means that non-memory, non-side-effect inducing instructions may cross the sched_barrier. Note that this intrinsic is only meant to work with the scheduling passes. Any other transformations that may move code will not be impacted in the ways described above. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D124700	2022-05-11 13:22:51 -07:00
Stanislav Mekhanoshin	c7eb846345	[AMDGPU] Merge AMDGPULDSUtils into AMDGPUMemoryUtils Differential Revision: https://reviews.llvm.org/D119502	2022-02-11 10:32:24 -08:00
Stanislav Mekhanoshin	290e5722e8	[AMDGPU] Improve clobbering checks in the kernel argument promotion Use same MSSA clobbering checks as in the AMDGPUAnnotateUniformValues. Kernel argument promotion needs exactly the same information so factor out utility function isClobberedInFunction. Differential Revision: https://reviews.llvm.org/D119480	2022-02-10 14:51:47 -08:00
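
The three parameters described in the amdgcn_sched_group_barrier commit (f5b21680d1) are a mask, a group size, and a sync ID. They can be modeled on the host as plain data, which may help when checking a pipeline before writing it into a kernel. The following is a minimal sketch, not code from LLVM: the mask values (2, 8, 32, 64) are the ones shown in that commit message, while the names `SchedGroupMask`, `SchedGroup`, `examplePipeline`, `totalInstructions`, and `sameSyncID` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <vector>

// Mask values copied from the comments in the commit message.
// The enumerator names are illustrative labels, not LLVM identifiers.
enum SchedGroupMask : int {
  kVALU = 2,
  kMFMA = 8,
  kVMEMRead = 32,
  kVMEMWrite = 64,
};

// Hypothetical host-side record of one sched_group_barrier call.
struct SchedGroup {
  int mask;   // first parameter: instruction types placed in the group
  int size;   // second parameter: number of matching instructions
  int syncID; // third parameter: groups with equal IDs are ordered together
};

// The six groups from the commit message example, in source order.
inline std::vector<SchedGroup> examplePipeline() {
  return {
      {kVMEMRead, 1, 0}, {kVALU, 1, 0}, {kMFMA, 5, 0},
      {kVMEMRead, 1, 0}, {kVALU, 3, 0}, {kVMEMWrite, 2, 0},
  };
}

// Sums the instruction counts requested by all groups.
inline int totalInstructions(const std::vector<SchedGroup> &groups) {
  int total = 0;
  for (const SchedGroup &g : groups) {
    assert(g.size > 0 && "each group should request at least one instruction");
    total += g.size;
  }
  return total;
}

// Per the commit message, only groups with a matching third parameter have
// any enforced ordering, so a single pipeline should share one sync ID.
inline bool sameSyncID(const std::vector<SchedGroup> &groups) {
  for (const SchedGroup &g : groups) {
    if (g.syncID != groups.front().syncID)
      return false;
  }
  return true;
}
```

Calling `totalInstructions(examplePipeline())` adds up the six group sizes, and `sameSyncID(examplePipeline())` confirms that all six groups synchronize with one another. In a real kernel each record would correspond to one `__builtin_amdgcn_sched_group_barrier(mask, size, syncID)` call.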

10 Commits
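
The llvm.amdgcn.sched.barrier commit (2db700215a) defines two mask values: 0 blocks every instruction from crossing the barrier, and 1 lets non-memory, non-side-effecting instructions cross. The sketch below restates that rule as a host-side predicate. It is an illustrative model only, not the scheduler's implementation: `InstrInfo`, `Crossing`, and `mayCrossSchedBarrier` are hypothetical names, and any mask other than 0 or 1 is reported as unspecified because the commit message does not describe it.

```cpp
#include <cassert>

// Hypothetical summary of the two instruction properties the commit mentions.
struct InstrInfo {
  bool accessesMemory;
  bool hasSideEffects;
};

// Result of the model; Unspecified covers masks the commit does not define.
enum class Crossing { Allowed, Blocked, Unspecified };

// Models whether an instruction may be scheduled across a sched_barrier
// with the given mask, following the two cases in the commit message.
inline Crossing mayCrossSchedBarrier(unsigned mask, const InstrInfo &instr) {
  switch (mask) {
  case 0:
    // Mask 0: no instructions may be scheduled across the barrier.
    return Crossing::Blocked;
  case 1:
    // Mask 1: only non-memory, non-side-effecting instructions may cross.
    return (!instr.accessesMemory && !instr.hasSideEffects)
               ? Crossing::Allowed
               : Crossing::Blocked;
  default:
    // Other masks are outside what the initial patch describes.
    return Crossing::Unspecified;
  }
}

// Small self-check exercising the two documented masks.
inline void checkSchedBarrierModel() {
  const InstrInfo pureAlu{false, false};
  const InstrInfo load{true, false};
  assert(mayCrossSchedBarrier(0, pureAlu) == Crossing::Blocked);
  assert(mayCrossSchedBarrier(1, pureAlu) == Crossing::Allowed);
  assert(mayCrossSchedBarrier(1, load) == Crossing::Blocked);
}
```

As the commit notes, the barrier only affects the scheduling passes, so this predicate says nothing about other transformations that move code.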