llvm-project

Author	SHA1	Message	Date
Changpeng Fang	350bda4419	AMDGPU: Rename intrinsics and remove f16/bf16 versions for load transpose (#86313) Rename the intrinsics to be close to the instruction mnemonic names: use global_load_tr_b64 and global_load_tr_b128 instead of global_load_tr. This patch also removes the f16/bf16 versions of the builtins/intrinsics. To simplify the design, we should avoid enumerating all possible types when implementing builtins. We can always use bitcast.	2024-03-25 16:55:22 -07:00
Nikita Popov	1aee1e1f4c	[Analysis] Convert tests to opaque pointers (NFC)	2024-02-05 12:04:39 +01:00
Changpeng Fang	3564666fe1	[AMDGPU]: Fix type signatures for wmma intrinsics, NFC (#80087) Make the wmma intrinsic type signatures canonical. We need a type signature as long as the type is not fixed. However, when an argument's type matches a previous argument's type, we do not need the signature for this argument. This patch fixes three general cases: 1. add missing signatures 2. remove signatures for matching arguments 3. reorder the signatures -- return type signature should always appear first	2024-01-30 23:17:35 -08:00
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Changpeng Fang	1a300d6da3	AMDGPU: Add SourceOfDivergence for int_amdgcn_global_load_tr (#79218)	2024-01-23 14:30:11 -08:00
Mariusz Sikora	c99da46fc1	[AMDGPU][GFX12] Add Atomic cond_sub_u32 (#76224) Co-authored-by: Vang Thao <Vang.Thao@amd.com>	2024-01-17 19:23:42 +01:00
Mariusz Sikora	966416b9e8	[AMDGPU][GFX12] Add new v_permlane16 variants (#75475)	2023-12-15 10:14:38 +01:00
Jun Wang	54470176af	[AMDGPU] Add inreg support for SGPR arguments (#67182) Function parameters marked with inreg are supposed to be allocated to SGPRs. However, for compute functions, this is ignored and function parameters are allocated to VGPRs. This fix modifies CC_AMDGPU_Func in AMDGPUCallingConv.td to use SGPRs if input arg is marked inreg. Co-authored-by: Jun Wang <jun.wang7@amd.com>	2023-11-08 11:35:52 -08:00
Ruiling, Song	45e425e355	AMDGPU: Teach isArgPassedInSGPR() about cs_chain* calling convention (#67086) The cs_chain and cs_chain_preserve calling conventions use the InReg attribute to indicate arguments passed through SGPRs.	2023-09-22 22:24:17 +08:00
Mirko Brkusanin	de82fde22d	AMDGPU/Uniformity/GlobalISel: G_AMDGPU atomics are always divergent Patch by: Acim Maravic Differential Revision: https://reviews.llvm.org/D157091	2023-08-18 18:23:40 +02:00
Sameer Sahasrabuddhe	4d081560cd	[Uniformity] fix assert in a cycle made divergent by outside branch When diverged paths reach an irreducible cycle C, every block inside C gets marked as a join block. Such a join block J may be contained in a nest of reducible cycles inside C. When visiting J, we can only expect that the outermost C is irreducible, which we now correctly assert.	2023-08-18 13:25:13 +05:30
Sameer Sahasrabuddhe	d9847cde48	[GlobalISel] convergent intrinsics Introduced the convergent equivalent of the existing G_INTRINSIC opcodes: - G_INTRINSIC_CONVERGENT - G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS Out of the targets that currently have some support for GlobalISel, the patch assumes that the convergent intrinsics are only relevant to SPIRV and AMDGPU. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D154766	2023-07-31 12:15:39 +05:30
Sameer Sahasrabuddhe	da61c865e7	[RFC] Introduce convergence control intrinsics This is a reboot of the original design and implementation by Nicolai Haehnle <nicolai.haehnle@amd.com>: https://reviews.llvm.org/D85603 This change also obsoletes an earlier attempt at restarting the work on convergence tokens: https://reviews.llvm.org/D104504 Changes relative to D85603: 1. Clean up the definition of a "convergent operation", a convergent call and convergent function. 2. Clean up the relationship between dynamic instances, sets of threads and convergence tokens. 3. Redistribute the formal rules into the definitions of the convergence intrinsics. 4. Expand on the semantics of entering a function from outside LLVM, and the environment-defined outcome of the entry intrinsic. 5. Replace the term "cycle" with "closed path". The static rules are defined in terms of closed paths, and then a relation is established with cycles. 6. Specify that if a function contains a controlled convergent operation, then all convergent operations in that function must be controlled. 7. Describe an optional procedure to infer tokens for uncontrolled convergent operations. 8. Introduce controlled maximal convergence-before and controlled m-converged property as an update to the original properties in UniformityAnalysis. 9. Additional constraint that a cycle heart can only occur in the header of a reducible cycle (natural loop). Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D147116	2023-07-12 12:31:42 +05:30
Matt Arsenault	53fb907df4	AMDGPU: Special case uniformity info for single lane workgroups Constructors/destructors and OpenMP make use of single lane groups in some cases.	2023-06-28 07:25:48 -04:00
Matt Arsenault	92ee60b66f	AMDGPU: Drop and upgrade llvm.amdgcn.atomic.inc/dec to atomicrmw	2023-06-21 21:20:26 -04:00
Krzysztof Drewniak	faa2c678aa	[AMDGPU] Add buffer intrinsics that take resources as pointers In order to enable the LLVM frontend to better analyze buffer operations (and to potentially enable more precise analyses on the backend), define versions of the raw and structured buffer intrinsics that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their rsrc arguments. The new intrinsics are named by replacing `buffer.` with `buffer.ptr`. One advantage to these intrinsic definitions is that, instead of specifying that a buffer load/store will read/write some memory, we can indicate that the memory read or written will be based on the pointer argument. This means that, for example, a read from a `noalias` buffer can be pulled out of a loop that is modifying a distinct buffer. In the future, we will define custom PseudoSourceValues that will allow us to package up the (buffer, index, offset) triples that buffer intrinsics contain and allow for more precise backend analysis. This work also enables creating address space 7, which represents manipulation of raw buffers using native LLVM load and store instructions. Where tests simply used a buffer intrinsic while testing some other code path (such as the tests for VGPR spills), they have been updated to use the new intrinsic form. Tests that are "about" buffer intrinsics (for instance, those that ensure that they codegen as expected) have been duplicated, either within existing files or into new ones. Depends on D145441 Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D147547	2023-06-05 16:59:07 +00:00
Sameer Sahasrabuddhe	9615d48540	[AMDGPU][Uniformity] SI_IF and SI_ELSE pseudos are always divergent Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D150861	2023-05-19 11:49:09 +05:30
Carl Ritson	9602c7a081	[AMDGPU][Uniformity] V_MBCNT* is never uniform V_MBCNT instructions add the thread/lane position, so their result will never be uniform. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D150759	2023-05-18 13:50:12 +09:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
Carl Ritson	cd811e2421	[AMDGPU][UniformityAnalysis] Fix typos in test comment (NFC)	2023-05-17 16:23:10 +09:00
Sameer Sahasrabuddhe	0a170eb786	[Uniformity] Propagate divergence only along divergent outputs. When an instruction is determined to be divergent, not all its outputs are divergent. The users of only divergent outputs should now be examined for divergence. Also, replaced a repeating pattern of "if new divergent instruction, then add to worklist" by combining it into a single function. This does not cause any change in functionality. Reviewed By: foad, arsenm Differential Revision: https://reviews.llvm.org/D150636	2023-05-17 07:47:43 +05:30
Sameer Sahasrabuddhe	fbe1c0616f	[LLVM][Uniformity] Improve detection of uniform registers The MachineUA now queries the target to determine if a given register holds a uniform value. This is determined using the corresponding register bank if available, or by a combination of the register class and value type. This assumes that the target is optimizing for performance by choosing registers, and the target is responsible for any mismatch with the inferred uniformity. For example, on AMDGPU, an SGPR is now treated as uniform, except if the register bank is VCC (i.e., the register holds a wave-wide vector of 1-bit values) or equivalently if it has a value type of s1. - This does not always work with inline asm, where the register bank or the value type might not be present. We assume that the SGPR is uniform, because it is not expected to be s1 in the vast majority of cases. - The pseudo branch instruction SI_LOOP is now hard-coded to be always divergent, although its condition is an SGPR. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D150438	2023-05-16 09:37:04 +05:30
Sameer Sahasrabuddhe	b0f0dd2554	[LLVM][Uniformity] Propagate temporal divergence explicitly At a cycle C with divergent exits, UA was using a naive traversal of the exiting edges to locate blocks that may use values defined inside C. But this traversal fails when it encounters a cycle. This is now replaced with a much simpler propagation that iterates over every instruction in C and checks any uses that are outside C. But such an iteration can be expensive when C is very large; the original strategy may need to be reconsidered if there is a regression in compilation times. Also fixed lit tests that should have originally caught the missed propagation of temporal divergence. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D149646	2023-05-15 20:17:43 +05:30
pvanhout	ae77aceba5	[Analysis] Remove DA & LegacyDA UniformityAnalysis offers all of the same features and much more, there is no reason left to use the legacy DAs. See RFC: https://discourse.llvm.org/t/rfc-deprecate-divergenceanalysis-legacydivergenceanalysis/69538 - Remove LegacyDivergenceAnalysis.h/.cpp - Remove DivergenceAnalysis.h/.cpp + Unit tests - Remove SyncDependenceAnalysis - it was not a real registered analysis and was only used by DAs - Remove/adjust references to the passes in the docs where applicable - Remove TTI hook associated with those passes. - Move tests to UniformityAnalysis folder. - Remove RUN lines for the DA, leave only the UA ones. - Some tests had to be adjusted/removed depending on how they used the legacy DAs. Reviewed By: foad, sameerds Differential Revision: https://reviews.llvm.org/D148116	2023-04-17 09:01:22 +02:00

24 Commits