Use IR analysis to infer when an addrspacecast operand is nonnull, then
lower it to an intrinsic that the DAG can use to skip the null check.
I did this with an intrinsic because it's non-intrusive. An alternative
would have been to allow something like `!nonnull` on `addrspacecast`
and then lower that to a custom opcode (or add an operand to the
addrspacecast MIR/DAG opcodes), but that's a lot of boilerplate for just
one target's use case IMO.
I'm hoping that when we switch to GISel we can move all this logic to
the MIR level without losing information, but currently the DAG doesn't
see enough, so we need to act in CGP.
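A hedged sketch of the CGP-time rewrite described above; the intrinsic
name, helper placement, and exact LLVM signatures are assumptions that
may differ from the actual patch and across LLVM versions:

```cpp
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"
using namespace llvm;

// Sketch only: if IR analysis proves the source pointer nonnull, replace the
// addrspacecast with a call to an intrinsic (assumed here to be
// llvm.amdgcn.addrspacecast.nonnull) whose DAG lowering skips the null check.
static bool rewriteNonNullAddrSpaceCast(AddrSpaceCastInst *ASC,
                                        const DataLayout &DL) {
  if (!isKnownNonZero(ASC->getPointerOperand(), DL))
    return false;
  IRBuilder<> B(ASC);
  CallInst *Cast = B.CreateIntrinsic(Intrinsic::amdgcn_addrspacecast_nonnull,
                                     {ASC->getType(), ASC->getSrcTy()},
                                     {ASC->getPointerOperand()});
  ASC->replaceAllUsesWith(Cast);
  ASC->eraseFromParent();
  return true;
}
```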
Fixes: SWDEV-316445
Remove the LSH transform and restore the previous lowering.
Fixes a conformance issue from
[77615](https://github.com/llvm/llvm-project/pull/77615) where the
OpenCL integer_ops tests fail for integer_clz.
Co-authored-by: Leon Clark <leoclark@amd.com>
AMDGPUInstructionSelector should no longer attempt to select S1 G_PHIs.
Remove the MIR test that attempts to inst-select a divergent vcc(S1) G_PHI.
The lane mask merging algorithm for GlobalISel is now responsible for
selecting divergent S1 G_PHIs in AMDGPUGlobalISelDivergenceLowering.
Uniform S1 G_PHIs should be lowered to S32 G_PHIs in the register bank
select pass. In summary, S1 G_PHIs should not reach AMDGPUInstructionSelector.
Basic implementation of lane mask merging for GlobalISel.
Lane masks on GlobalISel are registers with the sgpr register class
and S1 LLT, as required by machine uniformity analysis.
Implements the equivalent of lowerPhis from SILowerI1Copies.cpp in:
patch 1: https://github.com/llvm/llvm-project/pull/75340
patch 2: https://github.com/llvm/llvm-project/pull/75349
patch 3: https://github.com/llvm/llvm-project/pull/80003
patch 4: https://github.com/llvm/llvm-project/pull/78431
patch 5: this commit:
AMDGPU/GlobalISelDivergenceLowering: constrain incoming registers
Previously, in PHIs that represent lane masks, incoming registers
taken as-is were not selected as lane masks. Such registers are not
merged with another lane mask and most often only have an S1 LLT.
Implement constrainAsLaneMask by constraining incoming registers
taken as-is with lane mask attributes, essentially transforming them
into lane masks. This is the final step in getting PHI instructions
created in this pass fully instruction-selected.
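For reference, a minimal sketch of the classic lane-mask merge these
patches port from SILowerI1Copies (wave32 opcodes; PrevMask, CurMask,
DstMask, and the surrounding pass plumbing are illustrative):

```cpp
// Merged = (PrevMask & ~EXEC) | (CurMask & EXEC): keep the previous value for
// inactive lanes, take the current value for lanes active in EXEC.
Register Tmp0 = MRI.createVirtualRegister(&AMDGPU::SReg_32RegClass);
Register Tmp1 = MRI.createVirtualRegister(&AMDGPU::SReg_32RegClass);
BuildMI(MBB, I, DL, TII->get(AMDGPU::S_ANDN2_B32), Tmp0)
    .addReg(PrevMask)
    .addReg(AMDGPU::EXEC_LO);
BuildMI(MBB, I, DL, TII->get(AMDGPU::S_AND_B32), Tmp1)
    .addReg(CurMask)
    .addReg(AMDGPU::EXEC_LO);
BuildMI(MBB, I, DL, TII->get(AMDGPU::S_OR_B32), DstMask) // merged lane mask
    .addReg(Tmp0)
    .addReg(Tmp1);
```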
Following on from #83118, this adds aliases for the "rtn" forms of these
instructions. The fact that they were missing from SP3 was an
oversight, which has since been fixed.
Insert waitcnts for loads and atomics before stores with system scope.
Scope is a field in the instruction encoding and corresponds to the
desired coherence level in the cache hierarchy.
Intrinsic stores can set the scope in the cache policy operand.
If the volatile keyword is used on a generic store, the memory legalizer
will set the scope to system. Generic stores, by default, get the lowest
scope level.
Waitcnts are not required if it is guaranteed that the memory is cached.
For example, Vulkan shaders can guarantee this.
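A minimal self-contained sketch of the decision described above (names
are hypothetical, not the actual SIMemoryLegalizer API):

```cpp
// Hypothetical sketch: a wait before a store is needed only at system scope,
// and can be skipped when the frontend guarantees the memory is cached.
enum class Scope { Workgroup, Agent, System };

bool needsWaitcntBeforeStore(Scope StoreScope, bool MemoryKnownCached) {
  // Generic stores default to the lowest scope; volatile generic stores are
  // promoted to system scope by the memory legalizer (see above).
  return StoreScope == Scope::System && !MemoryKnownCached;
}
```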
TODO: implement a flag for frontends to give us a hint not to insert
waits. The Vulkan flag is expected to be implemented as a vulkan:private
MMRA.
IGLP mutations end up in SavedMutations, since they are added during
scheduler creation, so falling back results in reapplying IGLP. In
post-RA scheduling, if we have multiple regions with IGLP instructions,
we may end up in an infinite loop.
Disable the feature for now.
foldOperands() for REG_SEQUENCE recurses in a way that can trigger an
infinite loop, as the method can modify the operand order, which breaks
the range-based for loop. This patch fixes the issue by caching the uses
to process beforehand, and then iterating over that cache rather than
using the instruction iterator.
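A hedged sketch of the shape of the fix (the folding helper is
hypothetical; the LLVM calls are illustrative):

```cpp
// Snapshot the uses first: folding can modify the operand order, which would
// invalidate an in-flight range-based iteration over the use list.
SmallVector<MachineOperand *, 8> UsesToProcess;
for (MachineOperand &Use : MRI.use_nodbg_operands(Reg))
  UsesToProcess.push_back(&Use);
// Iterate over the stable snapshot instead of the live use list.
for (MachineOperand *Use : UsesToProcess)
  tryToFoldIntoUse(*Use); // hypothetical folding helper
```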
Speed up disassembly by only calling tryDecodeInst for DecoderTables
that make sense for the current subtarget.
This gives a 1.3x speed-up on check-llvm-mc-disassembler-amdgpu in my
Release+Asserts build.
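A hedged sketch of the guard (feature checks and table names are
illustrative):

```cpp
// Only consult decoder tables whose encodings can occur on this subtarget;
// tables for other generations can never match and are skipped entirely.
if (isGFX12()) {
  if (tryDecodeInst(DecoderTableGFX1232, MI, DW, Address) ==
      MCDisassembler::Success)
    return MCDisassembler::Success;
} else if (isGFX11()) {
  if (tryDecodeInst(DecoderTableGFX1132, MI, DW, Address) ==
      MCDisassembler::Success)
    return MCDisassembler::Success;
}
```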
Initialize ModOpcode directly before the loop to silence static
analyzer warnings about the use of an uninitialized variable. This leads
to a redundant assignment of ElV2F16 during the first loop iteration,
but also avoids superfluous emptiness checks of EltsV2F16 after the
first iteration.
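A self-contained illustration of the pattern (hypothetical names, not
the actual AMDGPU code):

```cpp
#include <vector>

// Initializing before the loop makes the variable provably set on every path,
// at the cost of one redundant assignment in the first iteration.
unsigned pickOpcode(const std::vector<bool> &UseAlt, unsigned Base,
                    unsigned Alt) {
  unsigned ModOpcode = Base; // was: declared uninitialized, set inside loop
  for (bool Flag : UseAlt)
    if (Flag)
      ModOpcode = Alt; // analyzer no longer sees an uninitialized use
  return ModOpcode;
}
```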
This implements the basic pipelining structure of exp/mfma interleaving
with extensibility in mind. While the structure itself is more
extensible, there are controls that enable it only for DAGs with certain
characteristics (matching the DAGs it has been designed against).
Remove all the code that set and tested Res. Change all convert*
functions to return void, since none of them can fail. getInstruction
has only one main point of failure: after all calls to tryDecodeInst
have failed.
Now that there is no special checking for valid DPP encodings, these
instructions can use the same DecoderNamespace as other 64- or 96-bit
instructions.
Also clean up setting DecoderNamespace: in most cases it should be set
as a pair with AssemblerPredicate.
Split Dpp8FI and Dpp16FI into two different operands sharing an
AsmOperandClass. They are parsed and rendered identically as fi:1 but
the encoding is different: for DPP16 FI is a single bit, but for DPP8 it
uses two different special values in the src0 field. Having a dedicated
decoder for Dpp8FI allows it to reject other (non-special) src0 values
so that AMDGPUDisassembler::getInstruction no longer needs to call
isValidDPP8 to do post hoc validation of decoded DPP8 instructions.
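A hedged sketch of the dedicated decoder (the 0xE9/0xEA special src0
values and the exact decoder signature are assumptions):

```cpp
// Reject any src0 value that is not one of the two DPP8 specials, so a failed
// match falls through instead of producing a bogus DPP8 instruction.
static DecodeStatus decodeDpp8FI(MCInst &Inst, unsigned Val, uint64_t Addr,
                                 const MCDisassembler *Decoder) {
  unsigned FI;
  switch (Val) {
  case 0xE9: FI = 0; break; // assumed DPP8 marker, FI = 0
  case 0xEA: FI = 1; break; // assumed DPP8 marker, FI = 1
  default:   return MCDisassembler::Fail; // not DPP8: reject this decode
  }
  Inst.addOperand(MCOperand::createImm(FI));
  return MCDisassembler::Success;
}
```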
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:
1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
control intrinsics.
2. Model token values as untyped virtual registers in MIR.
The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.
The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
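As a hedged sketch of the glue step described above (variable names and
the surrounding builder code are illustrative, not the verbatim patch):

```cpp
// Glue the convergence control token to the intrinsic call's operands so the
// pairing survives selection; it later becomes an implicit argument in MIR.
SDValue Token = getValue(BundleToken); // token from the operand bundle
SDValue Glue = DAG.getNode(ISD::CONVERGENCECTRL_GLUE, DL, MVT::Glue, Token);
Ops.push_back(Glue); // glued operands must come last
```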
The corresponding fp8 conversion instructions are available on a
subtarget if and only if the subtarget has FP8ConversionInsts. We
should not assume that all future subtargets (gfx12+) have
FP8ConversionInsts.
In this patch, we use OtherPredicates to carry the HasFP8ConversionInsts
feature. This is because SubtargetPredicate is not copied from pseudos
to reals for DPP16 and DPP8. To avoid overriding OtherPredicates in a
few places, we use the newly introduced True16Predicate to hold
UseRealTrue16Insts instead.
This work replaces the inadvertently closed pull request:
https://github.com/llvm/llvm-project/pull/82024
This was previously enabled because v2bf16 was represented as v2f16. As
of now it is NFC, since we only have dot instructions that could use it,
and folding is currently guarded by hasDOTOpSelHazard().
64-bit SDWA encodings had to be checked first because their first 32
bits are a special case of the corresponding 32-bit non-SDWA encoding of
the same instruction. Now that all 64-bit encodings are checked before
any 32-bit ones, no special handling for SDWA is needed.
AMDGPUDisassembler::getInstruction tries decoding instructions using
different DecoderTables in a confusing order: first 96-bit instructions,
then some 64-bit, then 32-bit, then some more 64-bit.
This patch changes it to always try longer encodings first. The
motivation is to make getInstruction easier to understand, and to pave
the way for combining some 64-bit tables that do not need to be
separate.
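A hedged sketch of the restructured flow (table names and byte handling
are illustrative):

```cpp
// Always try wider encodings first: 96-bit, then all 64-bit (including SDWA),
// then 32-bit, returning on the first successful decode.
if (Bytes.size() >= 12 &&
    tryDecodeInst(DecoderTable96, MI, DecW, Address) ==
        MCDisassembler::Success)
  return MCDisassembler::Success;
if (Bytes.size() >= 8 &&
    tryDecodeInst(DecoderTable64, MI, QW, Address) ==
        MCDisassembler::Success)
  return MCDisassembler::Success;
if (Bytes.size() >= 4)
  return tryDecodeInst(DecoderTable32, MI, DW, Address);
return MCDisassembler::Fail;
```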