652 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
a0b854d576
[AMDGPU] MC support for gfx1250 scale_offset modifier (#149881) 2025-07-21 15:04:59 -07:00
Stanislav Mekhanoshin
b66084acd9
[AMDGPU] Verify asm VGPR alignment on gfx1250 (#149880)
Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>
2025-07-21 14:23:27 -07:00
Changpeng Fang
d6094370cb
AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (#149684)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-21 10:09:42 -07:00
Stanislav Mekhanoshin
6d8e53d4af
[AMDGPU] Support nv memory instructions modifier on gfx1250 (#149582) 2025-07-18 14:38:46 -07:00
Kazu Hirata
ff5f355d9b
[AMDGPU] Use a range-based for loop (NFC) (#148767) 2025-07-15 08:02:46 -07:00
Changpeng Fang
b80b02536b
AMDGPU: Implement MC layer support for gfx1250 wmma instructions. (#148570)
Regular wmma/swmmac plus matrix reuse only.

---------

Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>
2025-07-15 00:48:57 -07:00
Stanislav Mekhanoshin
f090554359
[AMDGPU] MC support for v_fmaak_f64/v_fmamk_f64 gfx1250 instructions (#148282) 2025-07-11 14:17:03 -07:00
Stanislav Mekhanoshin
7920dff394
[AMDGPU] VOPD/VOPD3 changes for gfx1250 (#147602) 2025-07-10 14:15:01 -07:00
Stanislav Mekhanoshin
00a85e5704
[AMDGPU] gfx1250: MC support for 64-bit literals (#147861) 2025-07-09 22:25:47 -07:00
Shilei Tian
d258457d42
[AMDGPU] Add support for v_cvt_f32_fp8 on gfx1250 (#147579)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-08 16:21:24 -04:00
Changpeng Fang
eda3161c35
AMDGPU: Implement tensor load and store instructions for gfx1250 (#146636)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-03 13:49:34 -07:00
Fangrui Song
e878b7e349 MCParsedAsmOperand::print: Add MCAsmInfo parameter
so that subclasses can provide the appropriate MCAsmInfo to print
MCExpr objects.

At present, llvm/utils/TableGen/AsmMatcherEmitter.cpp constructs a
generic MCAsmInfo.
2025-06-28 12:05:33 -07:00
Fangrui Song
d93aff42c2 MC: Migrate away from operator<< MCExpr
MCExpr::print has an optional MCAsmInfo argument, which is error-prone
when omitted. Using MCExpr::print and the convenience helper operator<<
is discouraged. Switch to MCAsmInfo::printExpr instead. Use the
target-specific MCAsmInfo if available.
2025-06-28 10:58:09 -07:00
Kazu Hirata
eff28bdd46
[AMDGPU] Use StringRef::consume_back (NFC) (#146194)
Note that StringRef::consume_back returns true while consuming the
given suffix if present.
2025-06-27 22:07:27 -07:00
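
The consume_back behavior noted in the commit above can be illustrated with a minimal sketch. It uses `std::string_view` as a stand-in so that it builds without LLVM headers; `consumeBack` and `stripE64` are hypothetical names chosen for illustration, not functions taken from the patch.

```cpp
#include <cassert>
#include <string_view>

// Stand-in for llvm::StringRef::consume_back, written against
// std::string_view (an assumption made so the sketch is self-contained).
// If S ends with Suffix, strip the suffix from S and return true;
// otherwise leave S unchanged and return false.
inline bool consumeBack(std::string_view &S, std::string_view Suffix) {
  if (S.size() < Suffix.size() ||
      S.substr(S.size() - Suffix.size()) != Suffix)
    return false;
  S.remove_suffix(Suffix.size());
  return true;
}

// Hypothetical helper: drop a trailing "_e64" from a mnemonic if present.
inline std::string_view stripE64(std::string_view Mnemonic) {
  consumeBack(Mnemonic, "_e64");
  return Mnemonic;
}
```

Because the test and the strip happen in one call, the usual "check ends_with, then drop_back" pair collapses into a single condition.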
Andrew Rogers
19658d1474
[llvm] annotate interfaces in llvm/Target for DLL export (#143615)
## Purpose

This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/Target` library.
These annotations currently have no meaningful impact on the LLVM build;
however, they are a prerequisite to support an LLVM Windows DLL (shared
library) build.

## Background

This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).

A subset of these changes was generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed by formatting with `git clang-format`.

The bulk of this change is manual additions of `LLVM_ABI` to
`LLVMInitializeX` functions defined in .cpp files under llvm/lib/Target.
Adding `LLVM_ABI` to the function implementation is required here
because they do not `#include "llvm/Support/TargetSelect.h"`, which
contains the declarations for these functions and was already updated
with `LLVM_ABI` in a previous patch. I considered patching these files
with `#include "llvm/Support/TargetSelect.h"` instead, but since
TargetSelect.h is a large file with a bunch of preprocessor x-macro
stuff in it I was concerned it would unnecessarily impact compile times.

In addition, a number of unit tests under llvm/unittests/Target required
additional dependencies to make them build correctly against the LLVM
DLL on Windows using MSVC.

## Validation

Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:

- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
2025-06-17 13:28:45 -07:00
Shilei Tian
9a237f35ef
[AMDGPU][AsmParser] Support true16 register suffix for valid register range (#143997) 2025-06-13 08:39:00 -04:00
Brox Chen
d2f06b2729
[AMDGPU][True16][MC][CodeGen] true16 mode for v_cvt_pk_bf8/fp8_f32 (#141881)
Update true16/fake16 profile with v_cvt_pk_bf8/fp8_f32, keeping the
vdst_in profile, and update codegen pattern.

Update the MC and codegen tests.
2025-06-04 11:29:26 -04:00
Vigneshwar Jayakumar
b3a8c1ef3a
[AMDGPU] Bugfix for scaled MFMA parsing FP literals (#142493)
Fix parsing of FP literals for scale values in scaled MFMA instructions.

Because the operand order differs between the MCInst and the parsed
operands, the FP literal immediates for scale values were not parsed
correctly.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-06-03 19:27:57 -05:00
Fangrui Song
b3873e8aa4 MCSymbol: Remove the default argument of getVariableValue
It has been made ineffective by e015626f189dc76f8df9fdc25a47638c6a2f3feb.
This change migrates the users.
2025-05-27 20:34:18 -07:00
Fangrui Song
76ee2d34f7 MCParser: Error when .set reassigns a non-redefinable variable
The conditions in parseAssignmentExpression are conservative. We should
also report an error when a non-redefinable variable is reassigned
(e.g. .equiv followed by .set; .weakref followed by .set).

Make MCAsmStreamer::emitLabel call setOffset to make the behavior
similar to MCObjectStreamer. `isUndefined()` can now be replaced with
`isUnset()`.

Additionally, fix an AMDGPU API user (tested by a few tests including
MC/AMDGPU/hsa-v4.s)
2025-05-26 20:19:52 -07:00
Kazu Hirata
36d918014a
[AMDGPU] Use StringRef::consume_front (NFC) (#141442) 2025-05-26 09:13:14 -07:00
Fangrui Song
a0901a2f87 Replace #include MCAsmLexer.h with AsmLexer.h
MCAsmLexer.h has been made a forwarder header since #134207
2025-05-25 11:57:29 -07:00
Vigneshwar Jayakumar
e12cbd8339
[AMDGPU] Fix scale opsel flags for scaled MFMA operations (#140183)
Fix for src scale opsel flags encoding and ASM parsing for gfx950 scaled MFMA.
2025-05-21 12:30:22 -05:00
Ivan Kosarev
66d3980b53
[AMDGPU][NFC] Remove _DEFERRED operands. (#139123)
All immediates are deferred now.
2025-05-09 10:10:53 +01:00
Ivan Kosarev
c290f48a45
[AMDGPU][NFC] Remove unused operand types. (#139062) 2025-05-08 12:48:25 +01:00
Stanislav Mekhanoshin
2b05c7cc4d
[AMDGPU] Fix regclass check for PackedF32InputMods in AsmParser. (#138767)
Downstream patch by Pravin Jagtap.
2025-05-07 00:19:25 -07:00
Mirko Brkušanin
b0428870da
[AMDGPU] Rename TH_STORE_RT_WB to TH_STORE_WB (#135171)
So it matches the documentation

Fixes: SWDEV-526726
2025-04-10 16:01:55 +02:00
Stanislav Mekhanoshin
7d869045e0
[AMDGPU] Hoist some constant stuff out of the loop in AMDGPUAsmParser.cpp. NFC. (#133398) 2025-03-28 03:27:16 -07:00
Shilei Tian
dccc0a836c
[NFC][AMDGPU] Replace more direct arch comparison with isAMDGCN() (#131379)
This is an extension of #131357. Hopefully this will be the last one.
2025-03-14 17:02:15 -04:00
Fangrui Song
98a640a2fa [MC] Move VariantKind info to MCAsmInfo
Follow-up to 14951a5a3120e50084b3c5fb217e2d47992a24d1

* Unify getVariantKindName and getVariantKindForName
* Allow each target to specify the preferred case (albeit ignored in MCParser)

Note: targets that use variant kinds should call MCExpr::print with a
non-null MAI to print variant kinds. operator<< passes a nullptr to
`MCExpr::print`, which should be avoided (e.g. Hexagon; fixed in
commit cf00ac81ac049cddb80aec1d6d88b8fab4f209e8).
2025-03-02 20:36:20 -08:00
Fangrui Song
14951a5a31 [MCParser] Extract some VariantKind from getVariantKindForName
All VariantKinds except VK_None/VK_Invalid are target-specific (e.g. a
target may not support "@plt" even if it is widely available).
Move the parsers to lib/Target to ensure that VariantKind from unrelated
targets will not be parsed.
2025-03-02 17:08:17 -08:00
Rahul Joshi
bee9664970
[TableGen] Emit OpName as an enum class instead of a namespace (#125313)
- Change InstrInfoEmitter to emit OpName as an enum class
  instead of an anonymous enum in the OpName namespace.
- This will help clearly distinguish between values that are 
  OpNames vs just operand indices and should help avoid
  bugs due to confusion between the two.
- Rename OpName::OPERAND_LAST to NUM_OPERAND_NAMES.
- Emit declaration of getOperandIdx() along with the OpName
  enum so it doesn't have to be repeated in various headers.
- Also updated AMDGPU, RISCV, and WebAssembly backends
  to conform to the new definition of OpName (mostly
  mechanical changes).
2025-02-12 08:19:30 -08:00
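
To show why the enum class form described in the commit above helps, here is a minimal hand-written sketch. The enumerators, the `getOperandIdxSketch` function, and the index mapping are all hypothetical; the real enum and `getOperandIdx()` are generated by TableGen and may differ in signature and contents.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the TableGen-generated enum.
enum class OpName : uint8_t { src0, src1, vdst, NUM_OPERAND_NAMES };

// A scoped enum does not convert implicitly to an integer, so an OpName
// cannot be passed by accident where an operand index is expected.
static_assert(!std::is_convertible<OpName, int>::value,
              "OpName must not decay to a plain operand index");

// Hypothetical lookup for one made-up instruction layout: maps an operand
// name to its operand index, or -1 if the instruction lacks that operand.
constexpr int getOperandIdxSketch(OpName Name) {
  switch (Name) {
  case OpName::vdst:
    return 0;
  case OpName::src0:
    return 1;
  case OpName::src1:
    return 2;
  default:
    return -1;
  }
}
```

With the older unscoped enum inside a namespace, both the name and the index were plain integers, so swapping them compiled silently; here the swap is a compile-time error.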
Stanislav Mekhanoshin
7639242155
[AMDGPU] Create new directive .amdhsa_inst_pref_size (#126622)
The field INST_PREF_SIZE is available since gfx11.
2025-02-11 08:35:45 -08:00
Brox Chen
1eeca67c57
[AMDGPU][True16][MC] validate op_sel and .l/.h syntax (#125872)
Check that op_sel is consistent with the .l/.h syntax if both are present.

Reopens https://github.com/llvm/llvm-project/pull/123250 now that the
problem is resolved in https://github.com/llvm/llvm-project/pull/125561
2025-02-05 13:30:03 -05:00
Pravin Jagtap
e6d16f93b3
[AMDGPU] Allow unaligned VGPR for ds_read_b96_tr_b6 (#125169)
All load transpose instructions follow the gfx950 convention of
even-aligned VGPRs except ds_read_b96_tr_b6, which allows unaligned VGPRs.

Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2025-01-31 12:23:48 +05:30
Kazu Hirata
7eb193bd0e Revert "[AMDGPU][True16][MC] validate op_sel and .l/.h syntax (#123250)"
This reverts commit fabe747bf051697cde72a963f1012d6ba9c3f5f5.

Multiple buildbots are failing.  See:
https://github.com/llvm/llvm-project/pull/123250
2025-01-30 14:52:12 -08:00
Brox Chen
fabe747bf0
[AMDGPU][True16][MC] validate op_sel and .l/.h syntax (#123250)
Check that op_sel is consistent with the .l/.h syntax if both are present.
2025-01-30 16:38:23 -05:00
Jun Wang
b2adeae865
[AMDGPU][MC] Allow null where 128b or larger dst reg is expected (#115200)
For GFX10+, currently null cannot be used as dst reg in instructions
that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4).
This patch fixes this problem while ensuring null cannot be used as S#,
T#, or V#.
2025-01-03 11:49:51 -08:00
Matt Arsenault
431581b22a
AMDGPU: Simplify definition of bitop3 operand. NFC. (#118648)
Co-authored-by: Jay Foad <jay.foad@amd.com>
2024-12-04 15:47:20 -05:00
Matt Arsenault
d9c4e9ffe7
AMDGPU: Verify f8f6f4 formats in assembler (#117826)
Verify the register widths of the corresponding operands match
the floating point format expected size.
2024-11-26 23:45:03 -05:00
Matt Arsenault
d3c103b80e
AMDGPU: MC support for V_CVT_SCALE_SR_FP4 instructions (#117795)
Co-authored-by: Shilei Tian <shilei.tian@amd.com>
2024-11-26 19:41:52 -05:00
Matt Arsenault
4527894143
Builtins & Codegen support for v_cvt_scalef32_pk_{fp|bf}8_{f|bf}16 for gfx950 (#117742)
OPSEL[3] determines low/high 16 bits of word to write.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:16:08 -05:00
Matt Arsenault
d727b6f777
AMDGPU: MC support for v_cvt_scalef32_pk_fp4_{f|bf}16 on gfx950. (#117594)
These instructions have non-standard use of OPSEL bits to select
dest write byte. The src2_modifiers operand is used without having
its corresponding src2 operand by introducing dummy src2.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-25 19:37:04 -08:00
Matt Arsenault
6f8e7c11cf
AMDGPU: Add MC support for gfx950 V_BITOP3_B32/B16 (#117379)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-25 09:42:07 -08:00
Kazu Hirata
be187369a0
[AMDGPU] Remove unused includes (NFC) (#116154)
Identified with misc-include-cleaner.
2024-11-13 21:10:03 -08:00
Jay Foad
8d13e7b8c3
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)
2024-10-03 13:07:54 +01:00
Craig Topper
fd50cdfb94 [AMDGPU] Use MCRegister. NFC 2024-09-28 11:40:25 -07:00
Jun Wang
cd5f5b7690
[AMDGPU][MC] Implement fft and rotate modes for ds_swizzle_b32 (#108064)
In addition to the basic mode, the ds_swizzle_b32 is supposed to support
two specific modes: fft and rotate. This patch implements those two
modes.
2024-09-27 10:18:34 -07:00
Austin Kerbow
954ab83e6a
[AMDGPU] Include unused preload kernarg in KD total SGPR count (#104743)
Unlike with implicitly preloaded data in UserSGPRs, firmware is unable to
handle cases where SGPRs for kernel arguments contain preloaded data but
are not explicitly referenced in the kernel. We need to include
these preloaded SGPRs in the GRANULATED_WAVEFRONT_SGPR_COUNT calculation
so that SGPRs in adjacent waves are not clobbered.
2024-09-23 13:48:22 -07:00
Kazu Hirata
e4e3ff5adc
[llvm] Use std::optional::value_or (NFC) (#109568) 2024-09-22 01:00:24 -07:00
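
As a minimal sketch of the kind of rewrite the title above describes: the helper names and the default of 32 are hypothetical and not taken from the patch. Both functions return the same result, which is what makes such a change NFC.

```cpp
#include <cassert>
#include <optional>

// Hypothetical "before" form: explicit test and dereference.
inline unsigned widthBefore(std::optional<unsigned> W) {
  return W ? *W : 32u;
}

// Hypothetical "after" form: the same logic expressed with value_or.
inline unsigned widthAfter(std::optional<unsigned> W) {
  return W.value_or(32u);
}
```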