8857 Commits

Author SHA1 Message Date
Changpeng Fang
84927a6728
AMDGPU: Simplify instruction definitions for global_load_tr_b64(b128) (#83601)
WaveSizePredicate is copied from pseudo to real
2024-03-01 10:03:54 -08:00
Jay Foad
53f89a0bb7
[AMDGPU] Remove AtomicNoRet class and getAtomicNoRetOp table (#83593) 2024-03-01 17:18:55 +00:00
Martin Wehking
92fe6c61f9
Silence illegal address computation warning (#83244)
Add an assertion before an access of ValMappings to ensure that it is
within the array bounds.
Silence a static analyzer warning through this.
2024-03-01 21:41:12 +05:30
Martin Wehking
dfec4ef1a2
Use object directly instead of accessing ArrayRef (#83263)
Use RegOp directly inside debug code to silence a static analyzer that
warns about accessing it through its ArrayRef wrapper.
2024-03-01 19:15:42 +05:30
Pierre van Houtryve
756166e342
[AMDGPU] Improve detection of non-null addrspacecast operands (#82311)
Use IR analysis to infer when an addrspacecast operand is nonnull, then
lower it to an intrinsic that the DAG can use to skip the null check.

I did this using an intrinsic as it's non-intrusive. An alternative
would have been to allow something like `!nonnull` on `addrspacecast`
then lower that to a custom opcode (or add an operand to the
addrspacecast MIR/DAG opcodes), but it's a lot of boilerplate for just
one target's use case IMO.

I'm hoping that when we switch to GISel we can move all this logic
to the MIR level without losing info, but currently the DAG doesn't see
enough, so we need to act in CGP.

Fixes: SWDEV-316445
2024-03-01 14:01:10 +01:00
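What knowing the operand is non-null buys can be sketched as a plain C++ model of a flat-to-segment addrspacecast. It assumes the AMDGPU convention that the flat null pointer is 0 while the 32-bit LDS/private null is all-ones; the function names are hypothetical and this is not the actual DAG lowering.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a flat -> 32-bit segment (LDS or private) addrspacecast.
// Assumption: flat null is 0 and the 32-bit segment null is 0xFFFFFFFF.
constexpr std::uint64_t kFlatNull = 0;
constexpr std::uint32_t kSegmentNull = 0xFFFFFFFFu;

// Generic lowering: the cast must compare against null and select.
std::uint32_t castFlatToSegment(std::uint64_t flatPtr) {
  if (flatPtr == kFlatNull)
    return kSegmentNull;                      // null must stay null
  return static_cast<std::uint32_t>(flatPtr); // otherwise keep the low 32 bits
}

// When analysis proves flatPtr != null, the compare and select disappear.
std::uint32_t castFlatToSegmentKnownNonNull(std::uint64_t flatPtr) {
  assert(flatPtr != kFlatNull && "caller guarantees a non-null operand");
  return static_cast<std::uint32_t>(flatPtr);
}
```

The saving is one compare and one select per cast, which is why carrying the non-null fact down to instruction selection is worthwhile.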
Jay Foad
4c8c335bcd
[AMDGPU] Rename hasGFX12Enc to hasRestrictedSOffset in BUF definitions. NFC. (#83434)
This just renames a tablegen argument to match the corresponding
subtarget feature.
2024-03-01 10:14:37 +00:00
Nick Anderson
ba8e9ace13
[AMDGPU] promote i1 arg type for amdgpu_cs (#82971)
Fixes #68087.
Not sure where to put regression tests for this PR. Also, should i1 args
not in reg also be promoted?
2024-03-01 14:25:46 +05:30
Leon Clark
5b07fd4799
[AMDGPU] Fix OpenCL conformance test failures for ctlz. (#83170)
Remove LSH transform and restore previous lowering.

Fixes conformance issue in
[77615](https://github.com/llvm/llvm-project/pull/77615) where OpenCL
integer_ops tests fail for integer_clz.

Co-authored-by: Leon Clark <leoclark@amd.com>
2024-02-29 22:28:13 +00:00
Petar Avramovic
0d572c41f9
AMDGPU/GlobalISel: remove amdgpu-global-isel-risky-select flag (#83426)
AMDGPUInstructionSelector should no longer attempt to select S1 G_PHIs.
Remove the MIR test that attempts to inst-select a divergent vcc(S1) G_PHI.
The lane mask merging algorithm for GlobalISel is now responsible for
selecting divergent S1 G_PHIs in AMDGPUGlobalISelDivergenceLowering.
Uniform S1 G_PHIs should be lowered to S32 G_PHIs in the reg bank select
pass. In summary, S1 G_PHIs should not reach AMDGPUInstructionSelector.
2024-02-29 15:38:54 +01:00
Petar Avramovic
6c2eec5cea
AMDGPU/GlobalISel: lane masks merging (#73337)
Basic implementation of lane mask merging for GlobalISel.
Lane masks on GlobalISel are registers with sgpr register class
and S1 LLT - required by machine uniformity analysis.
Implements the equivalent of lowerPhis from SILowerI1Copies.cpp in:
patch 1: https://github.com/llvm/llvm-project/pull/75340
patch 2: https://github.com/llvm/llvm-project/pull/75349
patch 3: https://github.com/llvm/llvm-project/pull/80003
patch 4: https://github.com/llvm/llvm-project/pull/78431
patch 5: this commit:

AMDGPU/GlobalISelDivergenceLowering: constrain incoming registers

Previously, in PHIs that represent lane masks, incoming registers
taken as-is were not selected as lane masks. Such registers are not
merged with another lane mask and usually have only an S1 LLT.
Implement constrainAsLaneMask by constraining incoming registers
taken as-is with lane mask attributes, essentially transforming them
into lane masks. This is the final step in making the PHI instructions
created in this pass fully instruction-selected.
2024-02-29 13:57:59 +01:00
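The per-lane merge at the heart of lowerPhis-style lane mask merging can be modelled with 64-bit integers, one bit per lane of a wave64. This is a minimal sketch that assumes the exec-masked and-not/and/or merge used when a lane mask is updated inside a loop. `mergeLaneMasks` is a hypothetical name, not a function from either pass.

```cpp
#include <cassert>
#include <cstdint>

// One bit per lane of a wave64; a lane mask represents a divergent i1 value.
using LaneMask = std::uint64_t;

// Merge the mask defined in the current block into the value inherited from
// earlier iterations: lanes active in exec take the new bit, while inactive
// lanes keep their previous bit (prev & ~exec) | (cur & exec).
LaneMask mergeLaneMasks(LaneMask prev, LaneMask cur, LaneMask exec) {
  return (prev & ~exec) | (cur & exec);
}
```

Without the merge, lanes that already left the loop would have their i1 value overwritten by whatever the still-active lanes computed.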
Jay Foad
20fe83bc85
[AMDGPU] Add new aliases ds_subrev_rtn_u32/u64 for ds_rsub_rtn_u32/u64 (#83408)
Following on from #83118, this adds aliases for the "rtn" forms of these
instructions. The fact that they were missing from SP3 was an oversight
which has been fixed now.
2024-02-29 12:02:06 +00:00
Jay Foad
02bad7a858 [AMDGPU] Simplify !if condition. NFC. 2024-02-29 11:45:20 +00:00
Matt Arsenault
c757ca7417 AMDGPU: Remove dead declaration 2024-02-29 14:40:11 +05:30
Changpeng Fang
0fe4b9dae8
AMDGPU: Copy a few Predicates from Pseudo to Real (#83365)
WaveSizePredicate for DS_Real and FLAT_Real
OtherPredicates for MTBUF_Real
2024-02-28 17:24:23 -08:00
Stanislav Mekhanoshin
d7b73c8d01
[AMDGPU] Copy WaveSizePredicate into VOP3_Real. NFCI. (#83352) 2024-02-28 15:42:31 -08:00
Petar Avramovic
3e35ba53e2
AMDGPU/GFX12: Insert waitcnts before stores with scope_sys (#82996)
Insert waitcnts for loads and atomics before stores with system scope.
Scope is a field in the instruction encoding and corresponds to the
desired coherence level in the cache hierarchy.
Intrinsic stores can set the scope in their cache policy operand.
If the volatile keyword is used on generic stores, the memory legalizer
sets the scope to system. Generic stores, by default, get the lowest
scope level.
Waitcnts are not required if it is guaranteed that memory is cached.
For example, Vulkan shaders can guarantee this.
TODO: implement a flag for frontends to give us a hint not to insert
waits.
The Vulkan flag is expected to be implemented as a vulkan:private MMRA.
2024-02-28 16:18:04 +01:00
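A toy model of the rule above: before a system-scope store, wait for outstanding loads and atomics unless memory is known to be cached. The instruction kinds, the single outstanding counter, and the `memoryKnownCached` switch (standing in for the frontend hint from the TODO) are simplifying assumptions. The real waitcnt insertion tracks separate counters per event type.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified instruction model (not SIInsertWaitcnts).
enum class Kind { Load, Atomic, Store, Wait };
enum class Scope { CU, SE, Device, System };

struct Inst {
  Kind kind;
  Scope scope = Scope::CU; // generic stores get the lowest scope by default
};

std::vector<Inst> insertWaitsBeforeSysStores(const std::vector<Inst>& in,
                                             bool memoryKnownCached) {
  std::vector<Inst> out;
  int outstanding = 0; // loads/atomics issued since the last wait
  for (const Inst& I : in) {
    if (I.kind == Kind::Store && I.scope == Scope::System &&
        !memoryKnownCached && outstanding > 0) {
      out.push_back({Kind::Wait}); // wait for everything still in flight
      outstanding = 0;
    }
    if (I.kind == Kind::Load || I.kind == Kind::Atomic)
      ++outstanding;
    else if (I.kind == Kind::Wait)
      outstanding = 0;
    out.push_back(I);
  }
  return out;
}
```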
Ivan Kosarev
680c780a36
[AMDGPU][AsmParser] Support structured HWREG operands. (#82805)
Symbolic values are to be supported separately.
2024-02-28 14:44:34 +00:00
Valery Pykhtin
a845ea3878
[AMDGPU] Fix SDWA 'preserve' transformation for instructions in different basic blocks. (#82406)
This fixes a crash when the operand sources for a V_OR instruction reside in
different basic blocks.
2024-02-28 14:47:33 +01:00
Jeffrey Byrnes
cf1c97b2d2
[AMDGPU] Do not attempt to fallback to default mutations (#83208)
IGLP itself will be in SavedMutations via mutations added during
scheduler creation, so falling back results in reapplying IGLP.

In PostRA scheduling, if we have multiple regions with IGLP
instructions, we may get an infinite loop.

Disable the feature for now.
2024-02-27 18:04:59 -08:00
choikwa
04db60d150
[AMDGPU] Prevent hang in SIFoldOperands by caching uses (#82099)
foldOperands() for REG_SEQUENCE has recursion that can trigger an infinite loop
because the method can modify the operand order, which breaks the range-based
for loop. This patch fixes the issue by caching the uses for processing beforehand
and then iterating over the cache rather than using the instruction iterator.
2024-02-27 09:13:59 -06:00
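The fix follows a general pattern: snapshot the list before iterating when the loop body may mutate it. A minimal sketch, using a hypothetical `UseList` of integers in place of MachineOperand uses and a hypothetical `foldAllUses` helper:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a register's use list; folding one use may
// reorder or extend the list, which would invalidate a live iterator.
struct UseList {
  std::vector<int> uses;
};

// Visit every use that existed on entry exactly once. Iterating over a copy
// means changes made by foldUse() cannot disturb the traversal.
template <typename FoldFn>
std::size_t foldAllUses(UseList& list, FoldFn foldUse) {
  const std::vector<int> snapshot = list.uses; // cache the uses beforehand
  std::size_t folded = 0;
  for (int use : snapshot) {
    foldUse(list, use); // may insert into or reorder list.uses
    ++folded;
  }
  return folded;
}
```

Ranging directly over `list.uses` while `foldUse` inserts into it would be undefined behavior; the copy costs one allocation but makes termination obvious.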
Jay Foad
ca0560d8c8
[AMDGPU] Add new aliases ds_subrev_u32/u64 for ds_rsub_u32/u64 (#83118)
Note that the instructions have not been renamed and that there are no
corresponding aliases for ds_rsub_rtn_u32/u64. This matches SP3
behavior.
2024-02-27 10:58:20 +00:00
Jeffrey Byrnes
113052b2b0 [AMDGPU] Prefer lower total register usage in regions with spilling
Change-Id: Ia5c434b0945bdcbc357c5e06c3164118fc91df25
2024-02-26 12:19:52 -08:00
Changpeng Fang
9de78c4e24
AMDGPU: Simplify FP8 conversion definitions. NFC. (#83043)
Reals should inherit predicates from the corresponding Pseudo.
2024-02-26 10:13:40 -08:00
Francesco Petrogalli
969d7ecf0b
[llvm][CodeGen] Add ValueType v3i1. [NFCI] (#82338) 2024-02-26 16:01:52 +01:00
Jay Foad
83feb84648
[AMDGPU] Reduce duplication in DS Real instruction definitions. NFC. (#83007)
For renamed instructions, there is no need to mention the new name twice
on every line defining a Real.
2024-02-26 14:29:42 +00:00
Jay Foad
d41615e91a [AMDGPU] Rename a DS class template argument. NFC.
The name hasGDS better reflects what it is used for.
2024-02-26 13:33:28 +00:00
Jay Foad
60e7ae3f30
[AMDGPU] Only try DecoderTables for the current subtarget. NFCI. (#82992)
Speed up disassembly by only calling tryDecodeInst for DecoderTables
that make sense for the current subtarget.

This gives a 1.3x speed-up on check-llvm-mc-disassembler-amdgpu in my
Release+Asserts build.
2024-02-26 13:02:08 +00:00
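Skipping irrelevant tables can be sketched as a feature-mask filter applied before any decode attempt. The `Feature` bits, table names and `tablesForSubtarget` helper are hypothetical illustrations, not the disassembler's actual data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical subtarget feature bits.
enum Feature : std::uint32_t { GFX10 = 1u << 0, GFX11 = 1u << 1, GFX12 = 1u << 2 };

struct TableInfo {
  std::string name;
  std::uint32_t requiredFeatures; // table applies only if all bits are present
};

// Pick the tables worth trying for this subtarget instead of calling the
// decoder on every table and letting the mismatched ones fail.
std::vector<std::string> tablesForSubtarget(const std::vector<TableInfo>& all,
                                            std::uint32_t subtargetFeatures) {
  std::vector<std::string> out;
  for (const TableInfo& t : all)
    if ((t.requiredFeatures & subtargetFeatures) == t.requiredFeatures)
      out.push_back(t.name);
  return out;
}
```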
Rishabh Bali
fe42e72db2
[CodeGen] Port AtomicExpand to new Pass Manager (#71220)
Port the `atomicexpand` pass to the new Pass Manager. 
Fixes #64559
2024-02-25 18:42:22 +05:30
Martin Wehking
4bf06c16fc
Initialize unsigned integer when declared (#81894)
Initialize ModOpcode directly before the loop execution to silence
static analyzer warnings about the usage of an uninitialized variable.

This leads to a redundant assignment of ElV2F16 in the first loop
iteration, but also avoids superfluous emptiness checks of EltsV2F16
after the first iteration of the loop.
2024-02-25 18:26:12 +05:30
Shilei Tian
bfcf7a0707
[AMDGPU] Remove hasAtomicFaddRtnForTy as it is not used anywhere (#82841) 2024-02-23 21:14:38 -05:00
Jeffrey Byrnes
8f2bd8ae68
[AMDGPU] Introduce iglp_opt(2): Generalized exp/mfma interleaving for select kernels (#81342)
This implements the basic pipelining structure of exp/mfma interleaving
for better extensibility. While it does have improved extensibility,
there are controls which only enable it for DAGs with certain
characteristics (matching the DAGs it has been designed against).
2024-02-23 17:13:20 -08:00
Jay Foad
42f6f95e08
[AMDGPU] Simplify AMDGPUDisassembler::getInstruction by removing Res. (#82775)
Remove all the code that set and tested Res. Change all convert*
functions to return void since none of them can fail. getInstruction
only has one main point of failure, after all calls to tryDecodeInst
have failed.
2024-02-23 18:44:02 +00:00
Ivan Kosarev
dfa1d9b027
[AMDGPU][NFC] Have helpers to deal with encoding fields. (#82772)
These are intended to provide more convenient and less error-prone
facilities for encoding and decoding fields than manually defined
constants and functions.
2024-02-23 17:34:55 +00:00
Stanislav Mekhanoshin
3dfca24dda
[AMDGPU] Fix encoding of VOP3P dpp on GFX11 and GFX12 (#82710)
The bug affects dpp forms of v_dot2_f32_f16. The encoding does not match
SP3 and does not set op_sel_hi bits properly.
2024-02-23 03:50:00 -08:00
vangthao95
f37c6d55c6
[AMDGPU][NFC] Refactor SIInsertWaitcnts zero waitcnt generation (#82575)
Move the allZero* waitcnt generation methods into WaitcntGenerator
class.
2024-02-22 15:55:26 -08:00
Jay Foad
3b7d43301e
[AMDGPU] Remove DPP DecoderNamespaces. NFC. (#82491)
Now that there is no special checking for valid DPP encodings, these
instructions can use the same DecoderNamespace as other 64- or 96-bit
instructions.

Also clean up setting DecoderNamespace: in most cases it should be set
as a pair with AssemblerPredicate.
2024-02-22 11:18:18 +00:00
Jay Foad
b9ce237980
[AMDGPU] Clean up conversion of DPP instructions in AMDGPUDisassembler (#82480)
Convert DPP instructions after all calls to tryDecodeInst, just like we
do for all other instruction types. NFCI.
2024-02-22 10:39:43 +00:00
Jay Foad
bcbffd99c4
[AMDGPU] Split Dpp8FI and Dpp16FI operands (#82379)
Split Dpp8FI and Dpp16FI into two different operands sharing an
AsmOperandClass. They are parsed and rendered identically as fi:1 but
the encoding is different: for DPP16 FI is a single bit, but for DPP8 it
uses two different special values in the src0 field. Having a dedicated
decoder for Dpp8FI allows it to reject other (non-special) src0 values
so that AMDGPUDisassembler::getInstruction no longer needs to call
isValidDPP8 to do post hoc validation of decoded DPP8 instructions.
2024-02-22 09:40:46 +00:00
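A sketch of what a dedicated Dpp8FI decoder buys: the two special src0 values map to fi:0 and fi:1, and anything else is rejected immediately instead of by a post hoc check. The constants 0xE9/0xEA are assumed to match the DPP8 special src0 values defined in SIDefines.h and should be checked there; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Assumed special src0 values for DPP8 (verify against SIDefines.h).
constexpr std::uint32_t kDpp8FI0 = 0xE9;
constexpr std::uint32_t kDpp8FI1 = 0xEA;

// Hypothetical DPP8 FI decoder: returns the fi value, or nullopt for any other
// src0 value, so an invalid DPP8 encoding fails during decoding itself.
std::optional<unsigned> decodeDpp8FI(std::uint32_t src0Field) {
  if (src0Field == kDpp8FI0) return 0u;
  if (src0Field == kDpp8FI1) return 1u;
  return std::nullopt;
}

// For DPP16, FI is a single bit of the encoding.
unsigned decodeDpp16FI(std::uint32_t fiBit) { return fiBit & 1u; }
```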
Nick Anderson
8bd327d6fe
[AMDGPU][GlobalISel] Add fdiv / sqrt to rsq combine (#78673)
Fixes #64743
2024-02-22 09:47:36 +01:00
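A combine of this kind rewrites `1.0 / sqrt(x)` into a single reciprocal-square-root operation. The sketch below uses a hypothetical mini expression tree rather than GlobalISel's MIR pattern machinery. It also ignores the precision and fast-math conditions a real combine must check before the rewrite is legal.

```cpp
#include <cassert>
#include <memory>

// Hypothetical mini expression tree; not GlobalISel's MachineInstr/MIR API.
enum class Op { Const, Var, Sqrt, FDiv, Rsq };

struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
  Op op;
  double value;     // constant value when op == Op::Const
  ExprPtr lhs, rhs; // operands; rhs is unused for unary ops
};

ExprPtr mk(Op op, ExprPtr lhs = nullptr, ExprPtr rhs = nullptr,
           double value = 0.0) {
  return std::make_shared<Expr>(Expr{op, value, lhs, rhs});
}

// Rewrite fdiv(1.0, sqrt(x)) into rsq(x); anything else is returned unchanged.
ExprPtr combineFDivSqrtToRsq(const ExprPtr& e) {
  if (e->op == Op::FDiv && e->lhs && e->rhs &&
      e->lhs->op == Op::Const && e->lhs->value == 1.0 &&
      e->rhs->op == Op::Sqrt)
    return mk(Op::Rsq, e->rhs->lhs);
  return e;
}
```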
Nick Anderson
c5bbf979ad
[AMDGPU] fixes mistake in #82018 (#82223)
fixes #81766 #82018
2024-02-21 13:12:03 -05:00
Ivan Kosarev
6d160a49c2
[AMDGPU][TableGen][NFC] Combine predicates without using classes. (#82346)
Saves generating ~1200 instances of the PredConcat TableGen class.

Also removes the default predicates from resulting predicate lists.
2024-02-21 11:45:36 +00:00
Sameer Sahasrabuddhe
a2afcd5721 Revert "Implement convergence control in MIR using SelectionDAG (#71785)"
This reverts commit 79889734b940356ab3381423c93ae06f22e772c9.

Encountered multiple buildbot failures.
2024-02-21 11:07:02 +05:30
Jie Fu
086280f4d1 [AMDGPU] Fix linking error of SIISelLowering.cpp.o (NFC)
ld.lld: error: undefined symbol: llvm::MachineOperand::dump() const
>>> referenced by SIISelLowering.cpp
2024-02-21 13:07:34 +08:00
Sameer Sahasrabuddhe
79889734b9
Implement convergence control in MIR using SelectionDAG (#71785)
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-02-21 10:06:37 +05:30
Stanislav Mekhanoshin
98db8d0cb7
[AMDGPU] Fix v_dot2_f16_f16/v_dot2_bf16_bf16 operands (#82423)
src0 and src1 are packed f16/bf16; we were printing literals like
0x40002000 but could not parse them.
2024-02-20 16:34:40 -08:00
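The literal in question packs two f16 values into one 32-bit word: the high half 0x4000 is 2.0 and the low half 0x2000 is 2^-7 = 0.0078125. A small decoding sketch, assuming IEEE binary16 halves with element 0 in bits [15:0]. The helper names are hypothetical, and bf16 would need a different decoder.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decode an IEEE binary16 bit pattern (normals, subnormals, inf and NaN).
double halfToDouble(std::uint16_t h) {
  const int sign = (h >> 15) & 1;
  const int exp = (h >> 10) & 0x1F;
  const int mant = h & 0x3FF;
  double v;
  if (exp == 0) // subnormal: mant * 2^-24
    v = std::ldexp(static_cast<double>(mant), -24);
  else if (exp == 31) // all-ones exponent: infinity or NaN
    v = mant ? std::numeric_limits<double>::quiet_NaN()
             : std::numeric_limits<double>::infinity();
  else // normal: (1 + mant/1024) * 2^(exp-15)
    v = std::ldexp(static_cast<double>(1024 + mant), exp - 25);
  return sign ? -v : v;
}

struct PackedF16 {
  double lo; // element 0, bits [15:0] (assumed)
  double hi; // element 1, bits [31:16]
};

PackedF16 unpackF16x2(std::uint32_t literal) {
  return {halfToDouble(static_cast<std::uint16_t>(literal & 0xFFFFu)),
          halfToDouble(static_cast<std::uint16_t>(literal >> 16))};
}
```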
Changpeng Fang
d3fcf31031
AMDGPU: Use HasFP8ConversionInsts appropriately, NFC (#82433)
The corresponding fp8 conversion instructions are available for a
subtarget when and only when the subtarget "HasFP8ConversionInsts". We
should not assume all the future subtargets (gfx12+) have
FP8ConversionInsts.
In this patch, we use OtherPredicates to carry the HasFP8ConversionInsts
feature. This is because SubtargetPredicate is not copied from pseudos
to reals for DPP16 and DPP8. To avoid overriding OtherPredicates in a
few places, we use the newly introduced True16Predicate to hold
UseRealTrue16Insts instead.
This work replaces the inadvertently closed pull request:
https://github.com/llvm/llvm-project/pull/82024
2024-02-20 16:03:54 -08:00
Stanislav Mekhanoshin
39cab1a0a0
[AMDGPU] Add v2bf16 for opsel immediate folding (#82435)
This was previously enabled since v2bf16 was represented by v2f16. As of
now it is NFC since only dot instructions could use it, and folding for
them is currently guarded by hasDOTOpSelHazard().
2024-02-20 14:55:44 -08:00
Shilei Tian
2ad43fa467
[AMDGPU] Fix operand types for V_DOT2_F32_BF16 (#82044) 2024-02-20 08:25:01 -05:00
Jay Foad
ddba6b271c
[AMDGPU] Stop using SDWA DecoderNamespaces. NFCI. (#82233)
64-bit SDWA encodings have to be checked first because their first 32
bits are a special case of the corresponding 32-bit non-SDWA encoding of
the same instruction. But all 64-bit encodings are checked first, so we
don't need special handling for SDWA.
2024-02-20 12:58:07 +00:00
Jay Foad
a4d4615771
[AMDGPU] Try decoding instructions longest first. NFCI. (#82014)
AMDGPUDisassembler::getInstruction tries decoding instructions using
different DecoderTables in a confusing order: first 96-bit instructions,
then some 64-bit, then 32-bit, then some more 64-bit.

This patch changes it to always try longer encodings first. The
motivation is to make getInstruction easier to understand, and to pave
the way for combining some 64-bit tables that do not need to be
separate.
2024-02-20 12:09:21 +00:00
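Longest-first decoding can be sketched generically: sort the candidate tables by encoding width and return the first successful decode. The `DecoderTable` struct, the marker byte and the mnemonics used to exercise it are hypothetical. The point is that a 64-bit form whose first 32 bits would also decode as a 32-bit instruction is found first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical decoder table: tries to decode widthBits worth of bytes and
// returns a mnemonic on success.
struct DecoderTable {
  unsigned widthBits;
  std::function<std::optional<std::string>(const Bytes&)> tryDecode;
};

// Try tables from the longest encoding to the shortest and return the first
// match together with its size in bytes.
std::optional<std::pair<std::string, unsigned>>
decodeLongestFirst(std::vector<DecoderTable> tables, const Bytes& bytes) {
  std::stable_sort(tables.begin(), tables.end(),
                   [](const DecoderTable& a, const DecoderTable& b) {
                     return a.widthBits > b.widthBits;
                   });
  for (const DecoderTable& t : tables) {
    if (bytes.size() * 8 < t.widthBits)
      continue; // not enough bytes left for this encoding
    if (std::optional<std::string> mnemonic = t.tryDecode(bytes))
      return std::make_pair(*mnemonic, t.widthBits / 8);
  }
  return std::nullopt;
}
```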