llvm-project

Author	SHA1	Message	Date
Sameer Sahasrabuddhe	421557974a	[AMDGPU] Use glue for convergence tokens at call-like operations (#86766 ) The earlier implementation on AMDGPU used explicit token operands at SI_CALL and SI_CALL_ISEL. This is now replaced with CONVERGENCECTRL_GLUE operands, with the following effects: - The treatment of tokens at call-like operations is now consistent with the treatment at intrinsics. - Support for tail calls using implicit tokens at SI_TCRETURN "just works". - The extra parameter at call-like instructions is eliminated, thus restoring those instructions and their handling to the original state. The new glue node is placed after the existing glue node for the outgoing call parameters, which seems to not interfere with selection of the call-like nodes.	2024-04-01 10:51:13 +05:30
Thomas Symalla	256343a0e9	Revert "Update amdgpu_gfx functions to use s0-s3 for inreg SGPR arguments on targets using scratch instructions for stack #78226 " (#86273 ) Reverts llvm/llvm-project#81394 This reverts commit 3ac243bc0d7922d083af2cf025247b5698556062. It is not handling RSrc registers s0-s3 correctly. This leads to a broken test, where it expects s0-s3 as function argument and uses it as RSrc register as well. We need to re-visit the patch, but apparently we only want to have s0-s3 as argument registers if we don't need them as RSrc registers.	2024-03-26 11:01:08 +01:00
Changpeng Fang	350bda4419	AMDGPU: Rename intrinsics and remove f16/bf16 versions for load transpose (#86313 ) Rename the intrinsics to close to the instruction mnemonic names: Use global_load_tr_b64 and global_load_tr_b128 instead of global_load_tr. This patch also removes f16/bf16 versions of builtins/intrinsics. To simplify the design, we should avoid enumerating all possible types in implementing builtins. We can always use bitcast.	2024-03-25 16:55:22 -07:00
David Stuttard	75e528fdd9	[AMDGPU] Extend zero initialization of return values for TFE (#85759 ) buffer_load instructions that use TFE also need to zero initialize return values similar to how the image instructions currently work. Add support for this with standard zero init of all results + zero init of just TFE flag when enable-prt-strict-null subtarget feature is disabled.	2024-03-25 09:01:46 +00:00
Pierre van Houtryve	babbdad15b	[AMDGPU] Handle non-register operands for S_SUB/ADD_U64_PSEUDO (#86104 ) This pseudo uses SSrc_b64 so it allows both an immediate or a register, but the lowering crashed on immediate operands.	2024-03-25 09:23:40 +01:00
SahilPatidar	3ac243bc0d	Update amdgpu_gfx functions to use s0-s3 for inreg SGPR arguments on targets using scratch instructions for stack #78226 (#81394 ) Resolve #78226	2024-03-21 16:52:08 +05:30
Harald van Dijk	ceb744eb2f	[AMDGPU] Fix canonicalization of truncated values. (#83054 ) We were relying on roundings to implicitly canonicalize, which is generally safe, except with roundings that may be optimized away. Fixes #82937.	2024-03-13 12:08:39 +00:00
Arthur Eubanks	94c988bcfd	[NFC] Remove unused parameter from shouldAssumeDSOLocal()	2024-03-11 19:48:17 +00:00
Shilei Tian	e963d0740e	[AMDGPU] Replace `isInlinableLiteral16` with specific version (#84402 ) The current implementation of `isInlinableLiteral16` assumes that a 16-bit inlinable literal is either an `i16` or a `fp16`. This is not always true because of `bf16`. However, we can't tell `fp16` and `bf16` apart by just looking at the value. This patch splits `isInlinableLiteral16` into three versions, `i16`, `fp16`, `bf16` respectively, and calls the corresponding version.	2024-03-08 14:49:52 -05:00
Krzysztof Drewniak	6540f1635a	[AMDGPU] Add IR-level pass to rewrite away address space 7 (#77952 ) This commit adds the -lower-buffer-fat-pointers pass, which is applicable to all AMDGCN compilations. The purpose of this pass is to remove the type `ptr addrspace(7)` from incoming IR. This must be done at the LLVM IR level because `ptr addrspace(7)`, as a 160-bit primitive type, cannot be correctly handled by SelectionDAG. The detailed operation of the pass is described in comments, but, in summary, the removal proceeds by: 1. Rewriting loads and stores of ptr addrspace(7) to loads and stores of i160 (including vectors and aggregates). This is needed because the in-register representation of these pointers will stop matching their in-memory representation in step 2, and so ptrtoint/inttoptr operations are used to preserve the expected memory layout 2. Mutating the IR to replace all occurrences of `ptr addrspace(7)` with the type `{ptr addrspace(8), ptr addrspace(6) }`, which makes the two parts of a buffer fat pointer (the 128-bit address space 8 resource and the 32-bit address space 6 offset) visible in the IR. This also impacts the argument and return types of functions. 3. Splitting the resource and offset parts. All instructions that produce or consume buffer fat pointers (like GEP or load) are rewritten to produce or consume the resource and offset parts separately. For example, GEP updates the offset part of the result and a load uses the resource and offset parts to populate the relevant llvm.amdgcn.raw.ptr.buffer.load intrinsic call. At the end of this process, the original mutated instructions are replaced by their new split counterparts, ensuring no invalidly-typed IR escapes this pass. (For operations like call, where the struct form is needed, insertelement operations are inserted). Compared to LGC's PatchBufferOp ( `32cda89776/lgc/patch/PatchBufferOp.cpp` ): this pass - Also handles vectors of ptr addrspace(7)s - Also handles function boundaries - Includes the same uniform buffer optimization for loops and conditionals - Does not handle memcpy() and friends (this is future work) - Does not break up large loads and stores into smaller parts. This should be handled by extending the legalization of .buffer.{load,store} to handle larger types by producing multiple instructions (the same way ordinary LOAD and STORE are legalized). That work is planned for a followup commit. - Does not have special logic for handling divergent buffer descriptors. The logic in LGC is, as far as I can tell, incorrect in general, and, per discussions with @nhaehnle, isn't widely used. Therefore, divergent descriptors are handled with waterfall loops later in legalization. As a final matter, this commit updates atomic expansion to treat buffer operations analogously to global ones. (One question for reviewers: is the new pass in the right place? Should it be later in the pipeline?) Differential Revision: https://reviews.llvm.org/D158463	2024-03-06 09:49:58 -06:00
Mirko Brkušanin	1fd1f4c0e1	[AMDGPU] Handle amdgpu.last.use metadata (#83816 ) Convert !amdgpu.last.use metadata into MachineMemOperand for last use and handle it in SIMemoryLegalizer similar to nontemporal and volatile.	2024-03-06 16:33:52 +01:00
Joseph Huber	1fc5e50ceb	[AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (#83906 ) Summary: This patch implements the LLVM floating point environment control intrinsics and also exposes it through clang. We encode the floating point environment as a 64-bit value that simply concatenates the values of the mode registers and the current trap status. We only fetch the bits relevant for floating point instructions. That is, rounding mode, denormalization mode, ieee, dx10 clamp, debug, enabled traps, f16 overflow, and active exceptions.	2024-03-06 08:11:54 -06:00
Shilei Tian	e9c1dbb408	Revert "[AMDGPU] Replace `isInlinableLiteral16` with specific version (#81345 )" This reverts commit 530f0e64ec11327879c44f2fd55c7c28efdbaa2d because it breaks downstream.	2024-03-06 08:42:54 -05:00
Sameer Sahasrabuddhe	60822637bf	Restore "Implement convergence control in MIR using SelectionDAG (#71785 )" This restores commit c7fdd8c11e54585dc9d15d63de9742067e0506b9. Previously reverted in f010b1bef4dda2c7082cbb41dbabf1f149cce306. LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-03-06 12:19:32 +05:30
Martin Wehking	d35f2c439a	Remove constant local variable (#83850 ) Remove isThisReturn, which always has the value false. Replace its uses with false directly.	2024-03-06 00:53:09 +05:30
Mitch Phillips	f010b1bef4	Revert "Restore "Implement convergence control in MIR using SelectionDAG (#71785 )"" This reverts commit c7fdd8c11e54585dc9d15d63de9742067e0506b9. Reason: Broke the sanitizer buildbots. See the comments at https://github.com/llvm/llvm-project/pull/71785 for more information.	2024-03-04 17:05:34 +01:00
Shilei Tian	530f0e64ec	[AMDGPU] Replace `isInlinableLiteral16` with specific version (#81345 )	2024-03-04 08:40:42 -05:00
Sameer Sahasrabuddhe	c7fdd8c11e	Restore "Implement convergence control in MIR using SelectionDAG (#71785 )" Original commit 79889734b940356ab3381423c93ae06f22e772c9. Previously reverted in commit a2afcd5721869d1d03c8146bae3885b3385ba15e. LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-03-04 13:28:04 +05:30
Pierre van Houtryve	756166e342	[AMDGPU] Improve detection of non-null addrspacecast operands (#82311 ) Use IR analysis to infer when an addrspacecast operand is nonnull, then lower it to an intrinsic that the DAG can use to skip the null check. I did this using an intrinsic as it's non-intrusive. An alternative would have been to allow something like `!nonnull` on `addrspacecast` then lower that to a custom opcode (or add an operand to the addrspacecast MIR/DAG opcodes), but it's a lot of boilerplate for just one target's use case IMO. I'm hoping that when we switch to GISel that we can move all this logic to the MIR level without losing info, but currently the DAG doesn't see enough so we need to act in CGP. Fixes: SWDEV-316445	2024-03-01 14:01:10 +01:00
Shilei Tian	bfcf7a0707	[AMDGPU] Remove `hasAtomicFaddRtnForTy` as it is not used anywhere (#82841 )	2024-02-23 21:14:38 -05:00
Ivan Kosarev	dfa1d9b027	[AMDGPU][NFC] Have helpers to deal with encoding fields. (#82772 ) These are hoped to provide more convenient and less error prone facilities to encode and decode fields than manually defined constants and functions.	2024-02-23 17:34:55 +00:00
Nick Anderson	c5bbf979ad	[AMDGPU] fixes mistake in #82018 (#82223 ) fixes #81766 #82018	2024-02-21 13:12:03 -05:00
Sameer Sahasrabuddhe	a2afcd5721	Revert "Implement convergence control in MIR using SelectionDAG (#71785 )" This reverts commit 79889734b940356ab3381423c93ae06f22e772c9. Encountered multiple buildbot failures.	2024-02-21 11:07:02 +05:30
Jie Fu	086280f4d1	[AMDGPU] Fix linking error of SIISelLowering.cpp.o (NFC) ld.lld: error: undefined symbol: llvm::MachineOperand::dump() const >>> referenced by SIISelLowering.cpp	2024-02-21 13:07:34 +08:00
Sameer Sahasrabuddhe	79889734b9	Implement convergence control in MIR using SelectionDAG (#71785 ) LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-02-21 10:06:37 +05:30
Nick Anderson	767433ba88	[AMDGPU] fixes duplicate expressions in if stmnts in SIISelLowering.cpp (#82018 ) fixes #81766	2024-02-18 21:19:15 +05:30
Shilei Tian	9c6a2de24b	[AMDGPU] Clean up functions for checking inline literals (#81282 ) This patch cleans up functions for checking inline literals.	2024-02-15 12:11:51 -05:00
Joseph Huber	11fcae69db	[LLVM] Add `__builtin_readsteadycounter` intrinsic and builtin for realtime clocks (#81331 ) Summary: This patch adds a new intrinsic and builtin function mirroring the existing `__builtin_readcyclecounter`. The difference is that this implementation targets a separate counter that some targets have which returns a fixed frequency clock that can be used to determine elapsed time, this is different compared to the cycle counter which often has variable frequency. This patch only adds support for the NVPTX and AMDGPU targets. This is done as a new and separate builtin rather than an argument to `readcyclecounter` to avoid needing to change existing code and to make the separation more explicit.	2024-02-13 10:06:25 -06:00
Austin Kerbow	4bcbeaed63	[AMDGPU] Enable kernel arg preloading with gfx90a (#81180 ) Add a trap instruction to the beginning of the kernel prologue to handle cases where preloading is attempted on HW loaded with incompatible firmware.	2024-02-12 22:33:29 -08:00
Pranav Kant	c95693c746	[NFC][AMDGPU] Fix unused-variable warning (#81040 ) This is only used in assert statement.	2024-02-07 13:21:01 -08:00
Jeffrey Byrnes	3115ad8980	[AMDGPU] Accept arbitrary sized sources in CalculateByteProvider (#70240 ) Reland the original patch with additional commit containing fix for two issues: 1. Attempting to bitcast using MVTs with no corresponding LLVM type. getDWordFromOffset now works directly with the original vector to get the corresponding elements given the DWordOffset. 2. Improper bit tracking in CalculateByteProvider for vector types using certain ops. Previously, bit tracking for certain ops (e.g. ISD::TRUNCATE) assumed operands were scalar types, which is not correct since these ops have different semantics depending on vector / scalar. CalculateByteProvider / CalculateSrcByte now exit on vector types, handling which is a TODO.	2024-02-07 11:34:50 -08:00
Jay Foad	c2c650f62e	[AMDGPU] Stop combining arbitrary offsets into PAL relocs (#80034 ) PAL uses ELF REL (not RELA) relocations which can only store a 32-bit addend in the instruction, even for reloc types like R_AMDGPU_ABS32_HI which require the upper 32 bits of a 64-bit address calculation to be correct. This means that it is not safe to fold an arbitrary offset into a GlobalAddressSDNode, so stop doing that. In practice this is mostly a problem for small negative offsets which do not work as expected because PAL treats the 32-bit addend as unsigned.	2024-01-31 10:28:23 +00:00
Kazu Hirata	8582d41789	[Target] Use SDValue::getConstantOperandVal (NFC)	2024-01-29 18:46:16 -08:00
Jay Foad	8b429fc3fe	[AMDGPU] Update SITargetLowering::getAddrModeArguments (#78740 ) Handle every intrinsic for which getTgtMemIntrinsic returns with Info.ptrVal set to one of the intrinsic's operands. A bunch of these cases were missing.	2024-01-29 15:03:26 +00:00
Kazu Hirata	ae46855f53	[Target] Use getConstantOperand (NFC)	2024-01-28 18:03:38 -08:00
Jay Foad	66c710ec9d	[AMDGPU] Do not bother adding reserved registers to liveins (#79436 ) Tweak the implementation of llvm.amdgcn.wave.id to not add TTMP8 to the function liveins.	2024-01-25 15:17:06 +00:00
Jay Foad	45d2d7757f	[AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325 ) This is only valid on targets with architected SGPRs.	2024-01-25 07:48:06 +00:00
Jay Foad	fe9f3903f2	[AMDGPU] Update isLegalAddressingMode for GFX12 SMEM loads (#78728 )	2024-01-24 21:04:43 +00:00
Jay Foad	70fc970378	[AMDGPU] Move architected SGPR implementation into isel (#79120 )	2024-01-24 15:06:20 +00:00
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795 ) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Changpeng Fang	32073b8356	AMDGPU: Do not generate non-temporal hint when Load_Tr intrinsic did not specify it (#79104 ) int_amdgcn_global_load_tr did not specify non-temporal load transpose, thus we should not generate the non-temporal hint for the load. We need to implement getTgtMemIntrinsic to create the corresponding MemSDNode. And we don't set the non-temporal flag because the intrinsic did not specify it. NOTE: We need to implement getTgtMemIntrinsic for any memory intrinsics.	2024-01-23 10:05:32 -08:00
Emma Pilkington	bc82cfb38d	[AMDGPU] Add an asm directive to track code_object_version (#76267 ) Named '.amdhsa_code_object_version'. This directive sets the e_ident[ABIVERSION] in the ELF header, and should be used as the assumed COV for the rest of the asm file. This commit also weakens the --amdhsa-code-object-version CL flag. Previously, the CL flag took precedence over the IR flag. Now the IR flag/asm directive take precedence over the CL flag. This is implemented by merging a few COV-checking functions in AMDGPUBaseInfo.h.	2024-01-21 11:54:47 -05:00
Jay Foad	1abf2570b3	[AMDGPU] Make use of CPol::SWZ_* in SelectionDAG. NFC. For GlobalISel this was already done in AMDGPUInstructionSelector::selectBufferLoadLds.	2024-01-19 15:48:45 +00:00
Jay Foad	7017efa1a1	Fix typo "widended"	2024-01-19 13:50:26 +00:00
Mariusz Sikora	3e6589f21c	[AMDGPU][GFX12] Add 16 bit atomic fadd instructions (#75917 ) - image_atomic_pk_add_f16 - image_atomic_pk_add_bf16 - ds_pk_add_bf16 - ds_pk_add_f16 - ds_pk_add_rtn_bf16 - ds_pk_add_rtn_f16 - flat_atomic_pk_add_f16 - flat_atomic_pk_add_bf16 - global_atomic_pk_add_f16 - global_atomic_pk_add_bf16 - buffer_atomic_pk_add_f16 - buffer_atomic_pk_add_bf16	2024-01-18 14:01:09 +01:00
Mariusz Sikora	c99da46fc1	[AMDGPU][GFX12] Add Atomic cond_sub_u32 (#76224 ) Co-authored-by: Vang Thao <Vang.Thao@amd.com>	2024-01-17 19:23:42 +01:00
Matt Arsenault	af4f1766ae	AMDGPU: Allocate special SGPRs before user SGPR arguments (#78234 )	2024-01-17 21:41:50 +07:00
Jay Foad	a3fc0f9d2b	[AMDGPU] Add comments on SITargetLowering::widenLoad	2024-01-17 10:40:24 +00:00
Jay Foad	4a77414660	[AMDGPU] CodeGen for GFX12 8/16-bit SMEM loads (#77633 )	2024-01-17 10:28:03 +00:00
Kazu Hirata	7528cf5ef2	[Target] Use getConstantOperandVal (NFC)	2024-01-14 00:53:29 -08:00
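
Commit c2c650f62e above explains that a REL relocation can only carry a 32-bit addend in the instruction, and that PAL treats that addend as unsigned. The arithmetic below is a minimal sketch of why a small negative offset then breaks the high half of a 64-bit address. The helper names and the symbol address are hypothetical and are not taken from LLVM or PAL; the sketch only models "low 32 bits" and "high 32 bits" of symbol plus addend.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def abs32_lo(symbol, addend):
    """Low 32 bits of the 64-bit sum symbol + addend."""
    return (symbol + addend) & MASK32

def abs32_hi(symbol, addend):
    """High 32 bits of the 64-bit sum symbol + addend."""
    return ((symbol + addend) & MASK64) >> 32

def stored_addend(offset):
    """What fits in a 32-bit instruction field, read back as unsigned."""
    return offset & MASK32

symbol = 0x0000000100001000   # hypothetical 64-bit symbol address
offset = -4                   # small negative offset folded into the address

# What the code generator intended: the full signed offset is applied.
want_lo = abs32_lo(symbol, offset)
want_hi = abs32_hi(symbol, offset)

# What happens when the 32-bit field is treated as an unsigned addend.
addend = stored_addend(offset)        # 0xFFFFFFFC
got_lo = abs32_lo(symbol, addend)
got_hi = abs32_hi(symbol, addend)

print(hex(want_lo), hex(want_hi))
print(hex(got_lo), hex(got_hi))
```

Under these assumptions the low half still matches, because adding 2^32 does not change the low 32 bits, while the high half comes out one too large. That is consistent with the commit's remark that the problem shows up mostly for small negative offsets.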


1333 Commits
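
Commit e963d0740e in the list above states that `fp16` and `bf16` cannot be told apart by looking at the 16-bit value alone. The following is a small illustrative sketch of that point, not code from LLVM: the two decoder functions are hypothetical helpers that interpret the same 16 bits under each format, using Python's `struct` module (`"e"` is IEEE half precision; bf16 is modelled as the upper half of a 32-bit float).

```python
import struct

def decode_fp16(bits):
    """Interpret a 16-bit pattern as IEEE half precision (fp16)."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def decode_bf16(bits):
    """Interpret a 16-bit pattern as bfloat16: the top 16 bits of a float32."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

for bits in (0x3C00, 0x3F80):
    print(hex(bits), decode_fp16(bits), decode_bf16(bits))
```

The pattern 0x3C00 is 1.0 as fp16 but 0.0078125 as bf16, and 0x3F80 is 1.0 as bf16 but 1.875 as fp16. Any check of the form "is this bit pattern a particular constant" therefore needs to know the intended type, which is the motivation the commit gives for separate `i16`, `fp16`, and `bf16` versions.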