123 Commits

Author SHA1 Message Date
Brox Chen
ce831a231a
[AMDGPU][True16][MC] true16 for v_fma_f16 (#119477)
Support true16 format for v_fma_f16 in MC.

Since we are replacing v_fma_f16 to v_fma_f16_t16/v_fma_f16_fake16 in
Post-GFX11, have to update the CodeGen pattern for v_fma_f16_fake16 to
get CodeGen test passing. There is no pattern modified/created, but just
replacing the v_fma_f16 with fake16 format.
2025-01-06 15:02:04 -05:00
Brox Chen
e8644e3b47
[AMDGPU][True16][MC] VOP2 update instructions with fake16 format (#114436)
Some old "t16" VOP2 instructions are actually in fake16 format. Correct
and update test file
2024-11-05 16:12:49 -05:00
Nikita Popov
255a99c29f
[APInt] Fix APInt constructions where value does not fit bitwidth (NFCI) (#80309)
This fixes all the places that hit the new assertion added in
https://github.com/llvm/llvm-project/pull/106524 in tests. That is,
cases where the value passed to the APInt constructor is not an N-bit
signed/unsigned integer, where N is the bit width and signedness is
determined by the isSigned flag.

The fixes either set the correct value for isSigned, set the
implicitTrunc flag, or perform more calculations inside APInt.

Note that the assertion is currently still disabled by default, so this
patch is mostly NFC.
2024-10-17 08:48:08 +02:00
Jay Foad
8d13e7b8c3
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
2024-10-03 13:07:54 +01:00
Christudasan Devadasan
042104985c
[AMDGPU][NewPM] Port SIShrinkInstructions to new pass manager. (#106967) 2024-09-03 10:52:50 +05:30
Brox Chen
afd42fb303
[AMDGPU][True16][CodeGen] Support AND/OR/XOR and LDEXP True16 format (#102620)
Support AND/OR/XOR true16 and LDEXP true/fake16 format.

These instructions are previously implemented with fake16 profile.
Fixing the implementation.

Added a RA hint so that when using 16bit register in a 32bit
instruction, try to use the register directly without an extra 16bit
move

---------

Co-authored-by: guochen2 <guochen2@amd.com>
2024-08-13 12:23:39 -04:00
Brox Chen
6b7afaa9db
[AMDGPU][True16] fix a bug in codeGen causing e64 with wrong vgpr type to shrink (#102942)
This bug is introduced in
https://github.com/llvm/llvm-project/pull/102198

The previous path change to use realTrue16 flag, however, we have some
t16 instructions that are implemented with fake16, and has Lo128
registers types. Thus we should still using hasTrue16Bit flag for
shrinking check

---------

Co-authored-by: guochen2 <guochen2@amd.com>
2024-08-12 17:03:05 -04:00
Brox Chen
ae059a1f9f
[AMDGPU][True16][CodeGen] support v_mov_b16 and v_swap_b16 in true16 format (#102198)
support v_swap_b16 in true16 format.
update tableGen pattern and folding for v_mov_b16.

---------

Co-authored-by: guochen2 <guochen2@amd.com>
2024-08-08 16:52:59 -04:00
Matt Arsenault
73a2232720
AMDGPU: Materialize bitwise not of inline immediates (#95960)
If we have a bitwise negated inline immediate, we can materialize
it with s_not_b32/v_not_b32. This mirrors the current bitreverse
handling.
    
As a side effect, we also now handle the bitreversed FP immediate
case.
    
One test shows some VOPD regressions on gfx11 which should
probably be fixed. Previously the 2 v_mov_b32 could be packed,
but now the mismatched opcode + mov can't. This problem already
already existed for the bfrev case, it just happens more often now.
2024-06-22 00:40:59 +02:00
Xu Zhang
f6d431f208
[CodeGen] Make the parameter TRI required in some functions. (#85968)
Fixes #82659

There are some functions, such as `findRegisterDefOperandIdx` and  `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI  parameters, as shown in issue #82411.

Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`,  `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.

After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
2024-04-24 14:24:14 +01:00
Konstantin Zhuravlyov
fcef407aa2
AMDGPU/NFC: Remove some bits from TSFlags (#81525)
- AMDGPU/NFC: Purge SOPK_ZEXT from TSFlags
  - Moved to helper function in SIInstInfo
- AMDGPU/NFC: Purge VOPAsmPrefer32Bit from TSFlags
  - This flag did not make sense / remnants of something else I think
2024-02-12 16:43:48 -05:00
Jay Foad
c5a068a196
[AMDGPU] Remove s_cmpk_* for GFX12 (#75497)
No GFX12 encoding was added for these. This patch adds tests that they
are not recognized by the assembler and defends against generating them
in codegen.
2023-12-14 21:10:53 +00:00
Stanislav Mekhanoshin
4602802240
[AMDGPU] Shrink to SOPK with 32-bit signed literals (#70263)
A literal like 0xffff8000 is valid to be used as KIMM in a SOPK
instruction, but at the moment our checks expect it to be fully sign
extended to a 64-bit signed integer. This is not required since all
cases which are being shrunk only accept 32-bit operands.

We need to sign extend the operand to 64-bit though so it passes the
verifier and properly printed.
2023-10-26 00:26:54 -07:00
Stanislav Mekhanoshin
906d3ff054
[AMDGPU] Remove legality checks from imm folding in shrink. NFCI. (#69539)
The immediate legality checks are now embedded into the
isOperandLegal(). It is not needed to check it again.
2023-10-19 03:52:28 -07:00
Pranav Taneja
c41a62e924 [AMDGPU] [NFC] Fixed a typo in SIShrinkInstructions.cpp
Reviewed By: pravinjagtap

Differential Revision: https://reviews.llvm.org/D155785
2023-07-21 15:35:19 +05:30
Mirko Brkusanin
926746d22a [AMDGPU][GFX11] Legalize and select partial NSA MIMG instructions
If more registers are needed for VAddr then the NSA format allows then the
final register can act as a contigous set of remaining addresses. Update
legalizer to pack register for this new format and allow instruction
selection to use NSA encoding when number of addresses exceeds max size.
Also update SIShrinkInstructions to handle partial NSA.

Differential Revision: https://reviews.llvm.org/D144034
2023-02-23 13:33:34 +01:00
Jay Foad
a07584d57d [CodeGen] Make more use of MachineOperand::getOperandNo. NFC.
Differential Revision: https://reviews.llvm.org/D143252
2023-02-07 11:50:57 +00:00
Kazu Hirata
e078201835 [Target] Use llvm::count{l,r}_{zero,one} (NFC) 2023-01-28 09:23:07 -08:00
Jay Foad
073401e59c [MC] Define and use MCInstrDesc implicit_uses and implicit_defs. NFC.
The new methods return a range for easier iteration. Use them everywhere
instead of getImplicitUses, getNumImplicitUses, getImplicitDefs and
getNumImplicitDefs. A future patch will remove the old methods.

In some use cases the new methods are less efficient because they always
have to scan the whole uses/defs array to count its length, but that
will be fixed in a future patch by storing the number of implicit
uses/defs explicitly in MCInstrDesc. At that point there will be no need
to 0-terminate the arrays.

Differential Revision: https://reviews.llvm.org/D142215
2023-01-23 14:44:58 +00:00
Mateja Marjanovic
595a08847a [AMDGPU] Add support for new LLVM vector types
Add VReg, AReg and SReg on AMDGPU for bit widths: 288, 320, 352 and 384.

Differential Revision: https://reviews.llvm.org/D138205
2022-11-29 17:02:04 +01:00
Jay Foad
38302c60ef [AMDGPU] Stop looking for implicit M0 uses on MOV instructions
Before D114230, indirect moves used regular MOV opcodes and were
identified by having an implicit use of M0. Since D114230 they use
dedicated opcodes instead, so remove some old code that checks for
implicit uses of M0. NFCI.

Differential Revision: https://reviews.llvm.org/D138308
2022-11-18 16:57:55 +00:00
Joe Nash
b982ba2a6e [AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C
Due to the encoding changes in GFX11, we had a hack in place that
    disables the use of VGPRs above 128. This patch removes the need for
    that hack.

    We introduce a new register class VGPR_32_Lo128 which is used for 16-bit
    operands of VOP1, VOP2, and VOPC instructions. This register class only has the
    low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1,
    VOP2, and VOPC instructions are correctly limited to use the first 128
    VGPRs, while the other instructions can freely use all 256.

    We introduce new pseduo-instructions used on GFX11 which have the suffix
    t16 (True 16) to use the VGPR_32_Lo128 register class.

Reviewed By: foad, rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D133723
2022-09-20 09:56:28 -04:00
Jay Foad
2e8863b6a1 [AMDGPU] Don't shrink VOP3 instructions pre-RA on GFX10+
In GFX10, there is no advantage to shrinking these instructions pre-RA,
so this just saves a bit of work.

In GFX11 there is an advantage to *not* shrinking them pre-RA, because
the register classes for 16-bit operands are less restrictive in the
VOP3 form than in the shrunk form. This patch is a prerequisite for
actually setting up those register classes correctly for 16-bit vs
non-16-bit operands.

Differential Revision: https://reviews.llvm.org/D133769
2022-09-13 20:26:08 +01:00
Jay Foad
afa0ed33df [AMDGPU] Fix shrinking of F16 FMA on newer subtargets
D125803 introduced shrinking of F16 FMA to FMAAK/FMAMK in
SIShrinkInstructions (useful on GFX10+ where VOP3 instructions may have
a literal operand) but failed to handle the V_FMA_F16_gfx9_e64 form of
the opcode which is used on GFX9+.

Differential Revision: https://reviews.llvm.org/D133489
2022-09-08 16:41:04 +01:00
Jay Foad
c155a944fb [AMDGPU] GFX11 CodeGen support for MIMG instructions
This includes:
- New llvm.amdgcn.image.msaa.load.* intrinsics
- NSA changes, because MIMG-NSA is now limited to 3 dwords
- Split CD forms of IMAGE_SAMPLE instructions out into separate
  test files since they are no longer supported in GFX11

Differential Revision: https://reviews.llvm.org/D127837
2022-06-16 18:23:14 +01:00
David Stuttard
77851cc1cf [AMDGPU] Change use null for dead sdst to be gfx1030+
Pre gfx1030 null for sdst is different.
c97436f8b6e2 [AMDGPU] Use null for dead sdst operand - requires a change to make
it not apply to pre gfx1030

Differential Revision: https://reviews.llvm.org/D127869
2022-06-16 10:39:06 +01:00
Stanislav Mekhanoshin
c97436f8b6 [AMDGPU] Use null for dead sdst operand
Differential Revision: https://reviews.llvm.org/D127542
2022-06-13 14:41:40 -07:00
Jay Foad
e2926501d8 [AMDGPU] Aggressively fold immediates in SIShrinkInstructions
Fold immediates regardless of how many uses they have. This is expected
to increase overall code size, but decrease register usage.

Differential Revision: https://reviews.llvm.org/D114644
2022-05-18 11:04:33 +01:00
Jay Foad
dd12c3433e [AMDGPU] Shrink F16 MAD/FMA to MADAK/MADMK/FMAAK/FMAMK on GFX10
Differential Revision: https://reviews.llvm.org/D125803
2022-05-18 10:00:06 +01:00
Jay Foad
27fa41583f [AMDGPU] Shrink MAD/FMA to MADAK/MADMK/FMAAK/FMAMK on GFX10
On GFX10 VOP3 instructions can have a literal operand, so the conversion
from VOP3 MAD/FMA to VOP2 MADAK/MADMK/FMAAK/FMAMK will not happen in
SIFoldOperands. The only benefit of the VOP2 form is code size, so do it
in SIShrinkInstructions instead.

Differential Revision: https://reviews.llvm.org/D125567
2022-05-16 15:15:23 +01:00
Jay Foad
c1af2d329f [AMDGPU] SIShrinkInstructions: change static functions to methods
This is a mechanical change to avoid passing MRI and TII around
explicitly. NFC.

Differential Revision: https://reviews.llvm.org/D125566
2022-05-16 09:43:41 +01:00
Thomas Symalla
718aec209c [AMDGPU] Improve v_cmpx usage on GFX10.3.
On GFX10.3 targets, the following instruction sequence

v_cmp_* SGPR, ...
s_and_saveexec ..., SGPR

leads to a fairly long stall caused by a VALU write to a SGPR and having the
following SALU wait for the SGPR.

An equivalent sequence is to save the exec mask manually instead of letting
s_and_saveexec do the work and use a v_cmpx instruction instead to do the
comparison.

This patch modifies the SIOptimizeExecMasking pass as this is the last position
where s_and_saveexec instructions are inserted. It does the transformation by
trying to find the pattern, extracting the operands and generating the new
instruction sequence.

It also changes some existing lit tests and introduces a few new tests to show
the changed behavior on GFX10.3 targets.

Same as D119696 including a buildbot and MIR test fix.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D122332
2022-03-25 11:40:18 +01:00
Thomas Symalla
7de6107dce Revert "[AMDGPU] Improve v_cmpx usage on GFX10.3."
This reverts commit 011c64191ef9ccc6538d52f4b57f98f37d4ea36e and
e725e2afe02e18398525652c9bceda1eb055ea64.

Differential Revision: https://reviews.llvm.org/D122117
2022-03-21 09:50:44 +01:00
Thomas Symalla
011c64191e [AMDGPU] Improve v_cmpx usage on GFX10.3.
On GFX10.3 targets, the following instruction sequence

v_cmp_* SGPR, ...
s_and_saveexec ..., SGPR

leads to a fairly long stall caused by a VALU write to a SGPR and having the
following SALU wait for the SGPR.

An equivalent sequence is to save the exec mask manually instead of letting
s_and_saveexec do the work and use a v_cmpx instruction instead to do the
comparison.

This patch modifies the SIOptimizeExecMasking pass as this is the last position
where s_and_saveexec instructions are inserted. It does the transformation by
trying to find the pattern, extracting the operands and generating the new
instruction sequence.

It also changes some existing lit tests and introduces a few new tests to show
the changed behavior on GFX10.3 targets.

Reviewed By: sebastian-ne, critson

Differential Revision: https://reviews.llvm.org/D119696
2022-03-21 09:31:59 +01:00
Shengchen Kan
37b378386e [NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments 2022-03-16 20:25:42 +08:00
Sebastian Neubauer
6527b2a4d5 [AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.

Differential Revision: https://reviews.llvm.org/D119235
2022-02-18 15:05:21 +01:00
Jay Foad
476bb2d94e [AMDGPU] Remove dead code from shrinkScalarLogicOp
It looks like this code has been dead since shrinkScalarLogicOp
was introduced in svn r348601.
2022-02-09 17:07:12 +00:00
Jay Foad
16de2c09dd [AMDGPU] SIShrinkInstructions: sink code to where it's used. NFC. 2021-12-13 14:46:40 +00:00
Jay Foad
63681527ee [AMDGPU] SIShrinkInstructions: remove redundant check
canShrink already calls hasVALU32BitEncoding, so there is no need
to call it again here.
2021-12-13 14:46:40 +00:00
Neubauer, Sebastian
d1f45ed58f [AMDGPU][NFC] Fix typos
Differential Revision: https://reviews.llvm.org/D113672
2021-11-12 11:37:21 +01:00
Jay Foad
74cd4dee20 [AMDGPU] Preserve deadness of vcc when shrinking instructions
This doesn't have any effect on codegen now, but it might do in the
future if we shrink instructions before post-RA scheduling, which is
sensitive to live vs dead defs.

Differential Revision: https://reviews.llvm.org/D112305
2021-10-22 14:22:24 +01:00
Carl Ritson
6efb3220b4 [AMDGPU] Add VReg_192/VReg_224 support for MIMG instructions
Allow MIMG instructions to be selected with 6/7 VGPRs for vaddr.
Previously these were rounded up to VReg_256 this saves VGPRs.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D103800
2021-07-22 10:42:15 +09:00
Carl Ritson
f8816c7400 [AMDGPU] Add v5f32/VReg_160 support for MIMG instructions
Avoid having to round up to v8f32/VReg_256 when only 5 VGPRs are
required for a MIMG address operand.

Maintain _V8 instruction variants of pseudo instructions allowing
assembly prior to GFX10 to work as-is.  Currently the validator
can tell for GFX10 what the correct size is, so will disallow
oversize address registers.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D103672
2021-06-08 11:11:40 +09:00
Jay Foad
16d707e656 [AMDGPU] Fix v_swap_b32 formation on physical registers
As explained in the comments, matchSwap matches:

// mov t, x
// mov x, y
// mov y, t

and turns it into:

// mov t, x (t is potentially dead and move eliminated)
// v_swap_b32 x, y

On physical registers we don't have full use-def chains so the check
for T being live-out was not working properly with subregs/superregs.

Differential Revision: https://reviews.llvm.org/D101546
2021-04-29 20:53:40 +01:00
Piotr Sobczak
fc8e741121 [AMDGPU] Avoid an illegal operand in si-shrink-instructions
Before the patch it was possible to trigger a constant bus
violation when folding immediates into a shrunk instruction.

The patch adds a check to enforce the legality of the new operand.

Differential Revision: https://reviews.llvm.org/D95527
2021-01-28 08:49:21 +01:00
dfukalov
560d7e0411 [NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036
2021-01-20 22:22:45 +03:00
dfukalov
6a87e9b08b [NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813
2021-01-07 22:22:05 +03:00
Stanislav Mekhanoshin
ad8131bb03 [AMDGPU] Fix VC warning about singed/unsigned comparison. NFC.
This is the warning reported in https://reviews.llvm.org/D89599
2020-10-26 11:55:57 -07:00
Stanislav Mekhanoshin
611959f004 [AMDGPU] Fixed v_swap_b32 match
1. Fixed liveness issue with implicit kills.
2. Fixed potential problem with an indirect mov.

Fixes: SWDEV-256848

Differential Revision: https://reviews.llvm.org/D89599
2020-10-21 10:14:24 -07:00
Austin Kerbow
ebdcef20ce [AMDGPU] Avoid inserting noops during scheduling
Passes that are run after the post-RA scheduler may insert instructions like
waitcnt which eliminate the need for certain noops. After this patch the
scheduler is still aware of possible latency from hazards but noops will
not be inserted until the dedicated hazard recognizer pass is run.

Depends on D89753.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D89754
2020-10-20 17:11:36 -07:00