868 Commits

Author SHA1 Message Date
Juan Manuel Martinez Caamaño
041e84261a
[Clang][AMDGPU] Expose buffer load lds as a clang builtin (#132048)
CK is using either inline assembly or inline LLVM-IR builtins to
generate buffer_load_dword lds instructions.

This patch exposes this instruction as a Clang builtin available on gfx9 and gfx10.

Related to SWDEV-519702 and SWDEV-518861
2025-04-03 09:22:38 +02:00
Juan Manuel Martinez Caamaño
beae0e9f1a
[AMDGPU] Use a target feature to enable __builtin_amdgcn_global_load_lds on gfx9/10 (#133055)
This patch introduces the `vmem-to-lds-load-insts` target feature, which
can be used to enable builtins `__builtin_amdgcn_global_load_lds` and
`__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this
feature.

This feature is only available on gfx9/10.

A limitation of using a common target feature for both builtins is that
we could have made `__builtin_amdgcn_raw_ptr_buffer_load_lds` available
on gfx6,7,8.
2025-04-02 20:00:09 +02:00
Juan Manuel Martinez Caamaño
0375ef07c3
[Clang][AMDGPU] Add __builtin_amdgcn_cvt_off_f32_i4 (#133741)
This built-in maps to `V_CVT_OFF_F32_I4` which treats its input as
a 4-bit signed integer and returns `0.0625f * src`.

SWDEV-518861
2025-04-02 19:51:40 +02:00
Joseph Huber
772173f548
[Clang][AMDGPU] Remove special handling for COV4 libraries (#132870)
Summary:
When we were first porting to COV5, this lead to some ABI issues due to
a change in how we looked up the work group size. Bitcode libraries
relied on the builtins to emit code, but this was changed between
versions. This prevented the bitcode libraries, like OpenMP or libc,
from being used for both COV4 and COV5. The solution was to have this
'none' functionality which effectively emitted code that branched off of
a global to resolve to either version.

This isn't a great solution because it forced every TU to have this
variable in it. The patch in
https://github.com/llvm/llvm-project/pull/131033 removed support for
COV4 from OpenMP, which was the only consumer of this functionality.
Other users like HIP and OpenCL did not use this because they linked the
ROCm Device Library directly which has its own handling (The name was
borrowed from it after all).

So, now that we don't need to worry about backward compatibility with
COV4, we can remove this special handling. Users can still emit COV4
code, this simply removes the special handling used to make the OpenMP
device runtime bitcode version agnostic.
2025-03-28 07:35:16 -05:00
Shilei Tian
f1ac2afe21
Reapply "[AMDGPU] Use COV6 by default (#118515)" (#130963)
This reverts commit 68bcba6d7a1cc18996c0bcb7c62267c62d2040d0.
2025-03-21 15:26:45 -04:00
Pedro Lobo
ccf2109471
[Metadata] Change placeholder from undef to poison (#131469)
Replace `undef` constant metadata uses with `poison`.
2025-03-17 22:16:18 +00:00
Matt Arsenault
0d2c55cb96
AMDGPU: Move enqueued block handling into clang (#128519)
The previous implementation wasn't maintaining a faithful IR
representation of how this really works. The value returned by
createEnqueuedBlockKernel wasn't actually used as a function, and
hacked up later to be a pointer to the runtime handle global
variable. In reality, the enqueued block is a struct where the first
field is a pointer to the kernel descriptor, not the kernel itself. We
were also relying on passing around a reference to a global using a
string attribute containing its name. It's better to base this on a
proper IR symbol reference during final emission.

This now avoids using a function attribute on kernels and avoids using
the additional "runtime-handle" attribute to populate the final
metadata. Instead, associate the runtime handle reference to the
kernel with the !associated global metadata. We can then get a final,
correctly mangled name at the end.

I couldn't figure out how to get rename-with-external-symbol behavior
using a combination of comdats and aliases, so leaves an IR pass to
externalize the runtime handles for codegen. If anything breaks, it's
most likely this, so leave avoiding this for a later step. Use a
special section name to enable this behavior. This also means it's
possible to declare enqueuable kernels in source without going through
the dedicated block syntax or other dedicated compiler support.

We could move towards initializing the runtime handle in the
compiler/linker. I have a working patch where the linker sets up the
first field of the handle, avoiding the need to export the block
kernel symbol for the runtime. We would need new relocations to get
the private and group sizes, but that would avoid the runtime's
special case handling that requires the device_enqueue_symbol metadata
field.

https://reviews.llvm.org/D141700
2025-03-10 19:54:04 +07:00
Matt Arsenault
bfea84946d
clang: Hack around opencl enqueue_block using wrong ABI for aggregrate (#130011)
EmitAggExprToLValue started wrapping the temporary alloca in an
addrspacecast
at some point. We take the direct type from this as the pointer argument
for the
runtime function type, but this isn't correct. Technically, we should be
querying
the target's ABI for what IR to produce for this sequence. The
assumption seems to
always have been that this will be indirectly passed with byval (or
byref).

I started working on a patch to go through the ABI handling, but it
seems to
require more time and/or clang expertise than I have at the moment.
2025-03-06 23:13:28 +07:00
Mariusz Sikora
0aa92d23b2
[AMDGPU] Run DL builtin tests for new GFX (#130054) 2025-03-06 14:24:49 +01:00
Mariusz Sikora
cd3acd1bff
[AMDGPU] Remove unused s_barrier_{init,join,leave} instructions (#129548) 2025-03-04 17:52:43 +01:00
Shilei Tian
746d8b0740
[Clang][AMDGPU] Use 32-bit index for SWMMAC builtins (#129101)
Currently, the index of SWMMAC builtins is of type `short`, likely based
on the
assumption that K can only be up to 32, meaning there are only 16
non-zero
elements. However, this is not future-proof. This patch updates all of
them to
`int`.

The intrinsics themselves don't need to be updated since they accept any
integer
type, and in the backend, they are already extended to 32-bit.
Additionally, the
tests already use various kinds of integers.

Partially fixes SWDEV-518183.
2025-02-27 23:28:48 -05:00
Yaxun (Sam) Liu
240f2269ff
Add clang atomic control options and attribute (#114841)
Add option and statement attribute for controlling emitting of
target-specific
metadata to atomicrmw instructions in IR.

The RFC for this attribute and option is

https://discourse.llvm.org/t/rfc-add-clang-atomic-control-options-and-pragmas/80641,
Originally a pragma was proposed, then it was changed to clang
attribute.

This attribute allows users to specify one, two, or all three options
and must be applied
to a compound statement. The attribute can also be nested, with inner
attributes
overriding the options specified by outer attributes or the target's
default
options. These options will then determine the target-specific metadata
added to atomic
instructions in the IR.

In addition to the attribute, three new compiler options are introduced:
`-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`,
 `-f[no-]atomic-ignore-denormal-mode`.
These compiler options allow users to override the default options
through the
Clang driver and front end. `-m[no-]unsafe-fp-atomics` is aliased to
`-f[no-]ignore-denormal-mode`.

In terms of implementation, the atomic attribute is represented in the
AST by the
existing AttributedStmt, with minimal changes to AST and Sema.

During code generation in Clang, the CodeGenModule maintains the current
atomic options,
which are used to emit the relevant metadata for atomic instructions.
RAII is used
to manage the saving and restoring of atomic options when entering
and exiting nested AttributedStmt.
2025-02-27 10:41:04 -05:00
Nikita Popov
e56a6a2683
Reapply [CaptureTracking][FunctionAttrs] Add support for CaptureInfo (#125880) (#128020)
Relative to the previous attempt this includes two fixes:
 * Adjust callCapturesBefore() to not skip captures(ret: address,
    provenance) arguments, as these will not count as a capture
    at the call-site.
 * When visiting uses during stack slot optimization, don't skip
    the ModRef check for passthru captures. Calls can both modref
    and be passthru for captures.

------

This extends CaptureTracking to support inferring non-trivial
CaptureInfos. The focus of this patch is to only support FunctionAttrs,
other users of CaptureTracking will be updated in followups.

The key API changes here are:

* DetermineUseCaptureKind() now returns a UseCaptureInfo where the UseCC
component specifies what is captured at that Use and the ResultCC
component specifies what may be captured via the return value of the
User. Usually only one or the other will be used (corresponding to
previous MAY_CAPTURE or PASSTHROUGH results), but both may be set for
call captures.
* The CaptureTracking::captures() extension point is passed this
UseCaptureInfo as well and then can decide what to do with it by
returning an Action, which is one of: Stop: stop traversal.
ContinueIgnoringReturn: continue traversal but don't follow the
instruction return value. Continue: continue traversal and follow the
instruction return value if it has additional CaptureComponents.

For now, this patch retains the (unsound) special logic for comparison
of null with a dereferenceable pointer. I'd like to switch key code to
take advantage of address/address_is_null before dropping it.

This PR mainly intends to introduce necessary API changes and basic
inference support, there are various possible improvements marked with
TODOs.
2025-02-27 09:38:29 +01:00
Sirraide
b0210fee94
[Clang] [NFC] Fix more -Wreturn-type warnings in tests everywhere (#123470)
With the goal of eventually being able to make `-Wreturn-type` default to an 
error in all language modes, this is a follow-up to #123464 and updates even
more tests, mainly clang-tidy and clangd tests.
2025-02-20 19:49:37 +01:00
Nico Weber
e2ba1b6ffd Revert "Reapply [CaptureTracking][FunctionAttrs] Add support for CaptureInfo (#125880)"
This reverts commit 0fab404ee874bc5b0c442d1841c7d2005c3f8729.
Seems to break LTO builds of clang on Windows, see comments on
https://github.com/llvm/llvm-project/pull/125880
2025-02-19 11:32:57 -05:00
Fabian Ritter
029c8e783d
[AMDGPU][clang] Replace gfx940 and gfx941 with gfx942 in clang (#126762)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.

For SWDEV-512631
2025-02-19 10:11:48 +01:00
Krzysztof Drewniak
f7d03707d1
[AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat pointers (#126828)
Attempting to pass a `ptr addrspace(7)` to functions that take `ptr`
arguments produces undesirable `addrspacecast(addrspacecast(p8 x to p7)
to p0) => addrspacecast(p8 x to p0)` folds. This results in illegal GEP
operations on buffer resources, which can't be GEP'd. (However, note
that, while unimplemneted, addressspacecast from ptr addrspace(7) to ptr
is legal - it's just an effective address computation)

To resolve this problem, and thus prevent illegal
`getelementptr T, ptr addrspace(8) %x, ...` s from being produces, this
commit extends amdgcn.make.buffer.rsrc to also be variadic in its result
type, auto-upgrading old manglings.

The logic for handling a make.buffer.rsrc in instruction selection
remains untouched and expects the output type to be a ptr addrspace(8),
as does the Clang lowering for its builtin (the pointer-to-pointer
version might want a different name in clang). LowerBufferFatPointers
has been updated to lower
amdgcn.make.buffer.rsrc.p7.p* to amdgcn.make.buffer.rsrc.p8.p* .

This'll also make exposing buffer fat pointers in Clang easier, since
you don't have to cast between a `__amdgcn_rsrc_t` and a pointer.
2025-02-18 14:15:28 -06:00
Nikita Popov
7e3735d1a1 Reapply [CaptureTracking][FunctionAttrs] Add support for CaptureInfo (#125880)
Relative to the previous attempt, this adjusts isEscapeSource()
to not treat calls with captures(ret: address, provenance) or similar
arguments as escape sources. This addresses the miscompile reported at:
https://github.com/llvm/llvm-project/pull/125880#issuecomment-2656632577

The implementation uses a helper function on CallBase to make this
check a bit more efficient (e.g. by skipping the byval checks) as
checking attributes on all arguments if fairly expensive.

------

This extends CaptureTracking to support inferring non-trivial
CaptureInfos. The focus of this patch is to only support FunctionAttrs,
other users of CaptureTracking will be updated in followups.

The key API changes here are:

* DetermineUseCaptureKind() now returns a UseCaptureInfo where the UseCC
component specifies what is captured at that Use and the ResultCC
component specifies what may be captured via the return value of the
User. Usually only one or the other will be used (corresponding to
previous MAY_CAPTURE or PASSTHROUGH results), but both may be set for
call captures.
* The CaptureTracking::captures() extension point is passed this
UseCaptureInfo as well and then can decide what to do with it by
returning an Action, which is one of: Stop: stop traversal.
ContinueIgnoringReturn: continue traversal but don't follow the
instruction return value. Continue: continue traversal and follow the
instruction return value if it has additional CaptureComponents.

For now, this patch retains the (unsound) special logic for comparison
of null with a dereferenceable pointer. I'd like to switch key code to
take advantage of address/address_is_null before dropping it.

This PR mainly intends to introduce necessary API changes and basic
inference support, there are various possible improvements marked with
TODOs.
2025-02-14 12:38:04 +01:00
Alex Voicu
39ec9de7c2
[clang][CodeGen] sret args should always point to the alloca AS, so use that (#114062)
`sret` arguments are always going to reside in the stack/`alloca`
address space, which makes the current formulation where their AS is
derived from the pointee somewhat quaint. This patch ensures that `sret`
ends up pointing to the `alloca` AS in IR function signatures, and also
guards agains trying to pass a casted `alloca`d pointer to a `sret` arg,
which can happen for most languages, when compiled for targets that have
a non-zero `alloca` AS (e.g. AMDGCN) / map `LangAS::default` to a
non-zero value (SPIR-V). A target could still choose to do something
different here, by e.g. overriding `classifyReturnType` behaviour.

In a broader sense, this patch extends non-aliased indirect args to also
carry an AS, which leads to changing the `getIndirect()` interface. At
the moment we're only using this for (indirect) returns, but it allows
for future handling of indirect args themselves. We default to using the
AllocaAS as that matches what Clang is currently doing, however if, in
the future, a target would opt for e.g. placing indirect returns in some
other storage, with another AS, this will require revisiting.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
2025-02-14 11:20:45 +00:00
Nikita Popov
1e64ea9914 Revert "[CaptureTracking][FunctionAttrs] Add support for CaptureInfo (#125880)"
This reverts commit ee655ca27aad466bcc54f6eba03f7e564940ad5a.

A miscompilation has been reported at:
https://github.com/llvm/llvm-project/pull/125880#issuecomment-2656632577
2025-02-13 14:56:12 +01:00
Nikita Popov
ee655ca27a
[CaptureTracking][FunctionAttrs] Add support for CaptureInfo (#125880)
This extends CaptureTracking to support inferring non-trivial
CaptureInfos. The focus of this patch is to only support FunctionAttrs,
other users of CaptureTracking will be updated in followups.

The key API changes here are:

* DetermineUseCaptureKind() now returns a UseCaptureInfo where the UseCC
component specifies what is captured at that Use and the ResultCC
component specifies what may be captured via the return value of the
User. Usually only one or the other will be used (corresponding to
previous MAY_CAPTURE or PASSTHROUGH results), but both may be set for
call captures.
* The CaptureTracking::captures() extension point is passed this
UseCaptureInfo as well and then can decide what to do with it by
returning an Action, which is one of: Stop: stop traversal.
ContinueIgnoringReturn: continue traversal but don't follow the
instruction return value. Continue: continue traversal and follow the
instruction return value if it has additional CaptureComponents.

For now, this patch retains the (unsound) special logic for comparison
of null with a dereferenceable pointer. I'd like to switch key code to
take advantage of address/address_is_null before dropping it.

This PR mainly intends to introduce necessary API changes and basic
inference support, there are various possible improvements marked with
TODOs.
2025-02-13 09:36:35 +01:00
Florian Hahn
77d3f8a925
[TBAA] Don't emit pointer-tbaa for void pointers. (#122116)
While there are no special rules in the standards regarding void
pointers and strict aliasing, emitting distinct tags for void pointers
break some common idioms and there is no good alternative to re-write
the code without strict-aliasing violations. An example is to count the
entries in an array of pointers:

    int count_elements(void * values) {
      void **seq = values;
      int count;
      for (count = 0; seq && seq[count]; count++);
      return count;
    }

https://clang.godbolt.org/z/8dTv51v8W

An example in the wild is from
https://github.com/llvm/llvm-project/issues/119099

This patch avoids emitting distinct tags for void pointers, to avoid
those idioms causing mis-compiles for now.

Fixes https://github.com/llvm/llvm-project/issues/119099.
Fixes https://github.com/llvm/llvm-project/issues/122537.

PR: https://github.com/llvm/llvm-project/pull/122116
2025-01-31 11:38:14 +00:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
Acim Maravic
3a29dfe37c
[LLVM][AMDGPU] Add Intrinsic and Builtin for ds_bpermute_fi_b32 (#124616) 2025-01-29 14:04:10 +01:00
Shilei Tian
03744d2aaf
[Clang] Remove 3-element vector load and store special handling (#104661)
Clang uses a long-time special handling of the case where 3 element
vector loads and stores are performed as 4 element, and then a
shufflevector is used to extract the used elements. Odd sized vector
codegen should now work reasonably well.

This patch removes the compiler argument `-fpreserve-vec3-type` and adds
a target hook to determine if the special handling of vector type is
needed.

---------

Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
2025-01-21 09:18:16 -05:00
Shilei Tian
ac2d529be3 [NFC][Clang] Auto generate check lines for preserve_vec3.cl 2025-01-10 12:22:25 -05:00
Mirko Brkušanin
3def49cb64
[AMDGPU] Remove s_wakeup_barrier instruction (#122277) 2025-01-10 11:30:22 +01:00
Alex MacLean
4583f6d344
[NVPTX] Switch front-ends and tests to ptx_kernel cc (#120806)
the `ptx_kernel` calling convention is a more idiomatic and standard way
of specifying a NVPTX kernel than using the metadata which is not
supposed to change the meaning of the program. Further, checking the
calling convention is significantly faster than traversing the metadata,
improving compile time.

This change updates the clang and mlir frontends as well as the
NVPTXCtorDtorLowering pass to emit kernels using the calling convention.
In addition, this updates all NVPTX unit tests to use the calling
convention as well.
2025-01-07 18:24:50 -08:00
Matt Arsenault
1100d6a995
AMDGPU: Fix libcall recognition of image array types (#119832)
Add tests with get_image_width as a sample for all of the non-extension
image types. The transform doesn't do anything, but this runs through
all the mangled libfunc parsing and shows it does not crash. It would
probably be smarter to check for exact match of the types, rather than
checking the prefix.
2024-12-16 15:04:53 +09:00
Matt Arsenault
37d0e2f46e
clang: Fix broken check prefix in test (#119821) 2024-12-13 15:57:53 +09:00
Nikita Popov
462cb3cd6c
[InstCombine] Infer nusw + nneg -> nuw for getelementptr (#111144)
If the gep is nusw (usually via inbounds) and the offset is
non-negative, we can infer nuw.

Proof: https://alive2.llvm.org/ce/z/ihztLy
2024-12-05 14:36:40 +01:00
Florian Hahn
7954a0514b
[Clang] Enable -fpointer-tbaa by default. (#117244)
Support for more precise TBAA metadata has been added a while ago
(behind the -fpointer-tbaa flag). The more precise TBAA metadata allows
treating accesses of different pointer types as no-alias.

This helps to remove more redundant loads and stores in a number of
workloads.

Some highlights on the impact across llvm-test-suite's MultiSource,
SPEC2006 & SPEC2017 include:
 * +2% more NoAlias results for memory accesses
 * +3% more stores removed by DSE,
 * +4% more loops vectorized.

This closes a relatively big gap to GCC, which has been supporting
disambiguating based on pointer types for a long time.
(https://clang.godbolt.org/z/K7Wbhrz4q)

Pointer-TBAA support for pointers to builtin types has been added in
https://github.com/llvm/llvm-project/pull/76612.

Support for user-defined types has been added in
https://github.com/llvm/llvm-project/pull/110569.

There are 2 recent PRs with bug fixes for special cases uncovered during
testing:
 * https://github.com/llvm/llvm-project/pull/116991
 * https://github.com/llvm/llvm-project/pull/116596

PR: https://github.com/llvm/llvm-project/pull/117244
2024-12-04 20:55:18 +00:00
Matt Arsenault
e0f52538c9
AMDGPU: Change bitop3 intrinsic operand to i32 (#118647) 2024-12-04 15:44:04 -05:00
Shilei Tian
68bcba6d7a Revert "[AMDGPU] Use COV6 by default (#118515)"
This reverts commit 410cbe3cf28913cca2fc61b3437306b841d08172 because some
buildbots are not ready yet.
2024-12-03 20:17:06 -05:00
Shilei Tian
410cbe3cf2
[AMDGPU] Use COV6 by default (#118515) 2024-12-03 19:38:35 -05:00
Matt Arsenault
a796f597cd
AMDGPU: Allow f16/bf16 for DS_READ_TR16_B64 gfx950 builtins (#118297)
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-12-02 14:40:36 -05:00
Matt Arsenault
a2c3e0c4cb
AMDGPU/clang: Add global_load_lds size check support for gfx950 (#117825)
Co-authored-by: Shilei Tian <shilei.tian@amd.com>
2024-11-26 23:41:09 -05:00
Matt Arsenault
5615657209
AMDGPU: Builtin & CodeGen support for v_cvt_sr_{bf16|f16}_f32 instructions (#117824)
Co-authored-by: Shilei Tian <shilei.tian@amd.com>
2024-11-26 23:37:05 -05:00
Matt Arsenault
62dc8f3069
AMDGPU: Add builtins & codegen support for bitop3_b{16|32} of gfx950. (#117823)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 23:33:07 -05:00
Matt Arsenault
265e209ceb
AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_sr_{bf8|fp8}_{f16|bf16|f32} (#117821)
Co-authored-by: Shilei Tian <shilei.tian@amd.com>
2024-11-26 23:24:01 -05:00
Matt Arsenault
301c8e6047
AMDGPU: Add support for v_cvt_scalef32_sr instructions (#117820)
Co-authored-by: Shilei Tian <shilei.tian@amd.com>
2024-11-26 23:20:16 -05:00
Matt Arsenault
76715787f4
AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_sr_pk_fp4 instructions (#117798)
Co-authored-by: Shilei Tian <shilei.tian@amd.com>
2024-11-26 19:59:14 -05:00
Matt Arsenault
c8ee1ee057
AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_pk_fp4_{f|bf}16 for gfx950 (#117794)
These instructions have non-standard use of OPSEL bits to select
dest write byte. The src2_modifiers operand is used without having
its corresponding src2 operand by introducing dummy src2.

OPSEL ASM OPSEL Syntax: opsel:[a,b,c,d]
a & b are meaningless, c & d together decides byte to write in dst reg.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:38:23 -05:00
Matt Arsenault
065dc93d96
AMDGPU: Builtins & CodeGen support for v_cvt_scalef32_pk_{bf|f}16_{bf|fp}8 for gfx950 (#117793)
OPSEL[0] selects src_word to read.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:35:18 -05:00
Matt Arsenault
991dcbc468
AMDGPU: Builtin & codegen support for v_cvt_scalef32_pk32_{bf|f}16_{bf|fp}6 for gfx950 (#117747)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:30:04 -05:00
Matt Arsenault
0f4fcca546
AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_pk32_f32_[fp|bf]6 for gfx950 (#117745)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:26:07 -05:00
Matt Arsenault
eeb76880f3
AMDGPU: Builtins & CodeGen support for v_cvt_scalef32_pk_{f|bf}16_fp4 for gfx950 (#117744)
OPSEL ASM Syntax for v_cvt_scalef32_pk_{f|bf}16_fp4 : opsel:[x,y,z]
where, x & y i.e. OPSEL[1 : 0] selects which src_byte to read.

Note: Conventional Inst{13} i.e. OPSEL[2] is ignored in asm syntax.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:23:15 -05:00
Matt Arsenault
2b9e947d43
AMDGPU: Builtins & Codegen support for v_cvt_scale_fp4<->f32 for gfx950 (#117743)
OPSEL ASM Syntax for v_cvt_scalef32_pk_f32_fp4 : opsel:[x,y,z]
where, x & y i.e. OPSEL[1 : 0] selects which src_byte to read.

OPSEL ASM Syntax for v_cvt_scalef32_pk_fp4_f32 : opsel:[a,b,c,d]
where, c & d i.e. OPSEL[3 : 2] selects which dst_byte  to write.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:20:09 -05:00
Matt Arsenault
4527894143
Builtins & Codegen support for v_cvt_scalef32_pk_{fp|bf}8_{f|bf}16 for gfx950 (#117742)
OPSEL[3] determines low/high 16 bits of word to write.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:16:08 -05:00
Matt Arsenault
62584f32eb
AMDGPU: Builtins & Codegen support for v_cvt_scalef32_pk_f32_{fp8|bf8} for gfx950 (#117741)
OPSEL[0] determines low/high 16 bits of src0 to read.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:12:18 -05:00