24 Commits

Author SHA1 Message Date
Changpeng Fang
350bda4419
AMDGPU: Rename intrinsics and remove f16/bf16 versions for load transpose (#86313)
Rename the intrinsics to close to the instruction mnemonic names:
Use global_load_tr_b64 and global_load_tr_b128 instead of
global_load_tr.

This patch also removes f16/bf16 versions of builtins/intrinsics. To
simplify the design, we should avoid enumerating all possible types in
implementing builtins. We can always use bitcast.
2024-03-25 16:55:22 -07:00
Nikita Popov
1aee1e1f4c [Analysis] Convert tests to opaque pointers (NFC) 2024-02-05 12:04:39 +01:00
Changpeng Fang
3564666fe1
[AMDGPU]: Fix type signatures for wmma intrinsics, NFC (#80087)
Make the wmma intrinsic type signatures to be canonical. We need
a type signature as long as the type is not fixed. However, when an
argument's type matches a previous argument's type, we do not need the
signature for this argument.

 This patch fixes three general cases:
  1. add missing signatures
  2. remove signatures for matching arguments
3. reorer the signatures -- return type signature should always appear
first
2024-01-30 23:17:35 -08:00
Mirko Brkušanin
7fdf608cef
[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
2024-01-24 13:43:07 +01:00
Changpeng Fang
1a300d6da3
AMDGPU: Add SourceOfDivergence for int_amdgcn_global_load_tr (#79218) 2024-01-23 14:30:11 -08:00
Mariusz Sikora
c99da46fc1
[AMDGPU][GFX12] Add Atomic cond_sub_u32 (#76224)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2024-01-17 19:23:42 +01:00
Mariusz Sikora
966416b9e8
[AMDGPU][GFX12] Add new v_permlane16 variants (#75475) 2023-12-15 10:14:38 +01:00
Jun Wang
54470176af
[AMDGPU] Add inreg support for SGPR arguments (#67182)
Function parameters marked with inreg are supposed to be allocated to
SGPRs. However, for compute functions, this is ignored and function
parameters are allocated to VGPRs. This fix modifies CC_AMDGPU_Func in
AMDGPUCallingConv.td to use SGPRs if input arg is marked inreg.
---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2023-11-08 11:35:52 -08:00
Ruiling, Song
45e425e355
AMDGPU: Teach isArgPassedInSGPR() about cs_chain* calling convention (#67086)
This cs_chain and cs_chain_preserve use InReg attribute to indicate
argument passed through SGPR.
2023-09-22 22:24:17 +08:00
Mirko Brkusanin
de82fde22d AMDGPU/Uniformity/GlobalISel: G_AMDGPU atomics are always divergent
Patch by: Acim Maravic

Differential Revision: https://reviews.llvm.org/D157091
2023-08-18 18:23:40 +02:00
Sameer Sahasrabuddhe
4d081560cd [Uniformity] fix assert in a cycle made divergent by outside branch
When diverged paths reach an irreducible cycle C, every block inside C gets
marked as a join block. Such a join block J may be contained in a nest of
reducible cycles inside C. When visiting J, we can only expect that the
outermost C is irreducible, which we now correctly assert.
2023-08-18 13:25:13 +05:30
Sameer Sahasrabuddhe
d9847cde48 [GlobalISel] convergent intrinsics
Introduced the convergent equivalent of the existing G_INTRINSIC opcodes:

- G_INTRINSIC_CONVERGENT
- G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS

Out of the targets that currently have some support for GlobalISel, the patch
assumes that the convergent intrinsics only relevant to SPIRV and AMDGPU.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D154766
2023-07-31 12:15:39 +05:30
Sameer Sahasrabuddhe
da61c865e7 [RFC] Introduce convergence control intrinsics
This is a reboot of the original design and implementation by
Nicolai Haehnle <nicolai.haehnle@amd.com>:
https://reviews.llvm.org/D85603

This change also obsoletes an earlier attempt at restarting the work on
convergence tokens:
https://reviews.llvm.org/D104504

Changes relative to D85603:

 1. Clean up the definition of a "convergent operation", a convergent
    call and convergent function.
 2. Clean up the relationship between dynamic instances, sets of threads and
    convergence tokens.
 3. Redistribute the formal rules into the definitions of the convergence
    intrinsics.
 4. Expand on the semantics of entering a function from outside LLVM,
    and the environment-defined outcome of the entry intrinsic.
 5. Replace the term "cycle" with "closed path". The static rules are defined
    in terms of closed paths, and then a relation is established with cycles.
 6. Specify that if a function contains a controlled convergent operation, then
    all convergent operations in that function must be controlled.
 7. Describe an optional procedure to infer tokens for uncontrolled convergent
    operations.
 8. Introduce controlled maximal convergence-before and controlled m-converged
    property as an update to the original properties in UniformityAnalysis.
 9. Additional constraint that a cycle heart can only occur in the header of a
    reducible cycle (natural loop).

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D147116
2023-07-12 12:31:42 +05:30
Matt Arsenault
53fb907df4 AMDGPU: Special case uniformity info for single lane workgroups
Constructors/destructors and OpenMP make use of single lane groups
in some cases.
2023-06-28 07:25:48 -04:00
Matt Arsenault
92ee60b66f AMDGPU: Drop and upgrade llvm.amdgcn.atomic.inc/dec to atomicrmw 2023-06-21 21:20:26 -04:00
Krzysztof Drewniak
faa2c678aa [AMDGPU] Add buffer intrinsics that take resources as pointers
In order to enable the LLVM frontend to better analyze buffer
operations (and to potentially enable more precise analyses on the
backend), define versions of the raw and structured buffer intrinsics
that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their
rsrc arguments.

The new intrinsics are named by replacing `buffer.` with `buffer.ptr`.

One advantage to these intrinsic definitions is that, instead of
specifying that a buffer load/store will read/write some memory, we
can indicate that the memory read or written will be based on the
pointer argument. This means that, for example, a read from a
`noalias` buffer can be pulled out of a loop that is modifying a
distinct buffer.

In the future, we will define custom PseudoSourceValues that will
allow us to package up the (buffer, index, offset) triples that buffer
intrinsics contain and allow for more precise backend analysis.

This work also enables creating address space 7, which represents
manipulation of raw buffers using native LLVM load and store
instructions.

Where tests simply used a buffer intrinsic while testing some other
code path (such as the tests for VGPR spills), they have been updated
to use the new intrinsic form. Tests that are "about" buffer
intrinsics (for instance, those that ensure that they codegen as
expected) have been duplicated, either within existing files or into
new ones.

Depends on D145441

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D147547
2023-06-05 16:59:07 +00:00
Sameer Sahasrabuddhe
9615d48540 [AMDGPU][Uniformity] SI_IF and SI_ELSE pseudos are always divergent
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D150861
2023-05-19 11:49:09 +05:30
Carl Ritson
9602c7a081 [AMDGPU][Uniformity] V_MBCNT* is never uniform
Mark V_MBCNT instructions add thread/lane position so will never
be uniform.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D150759
2023-05-18 13:50:12 +09:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
Carl Ritson
cd811e2421 [AMDGPU][UniformityAnalysis] Fix typos in test comment (NFC) 2023-05-17 16:23:10 +09:00
Sameer Sahasrabuddhe
0a170eb786 [Uniformity] Propagate divergence only along divergent outputs.
When an instruction is determined to be divergent, not all its outputs are
divergent. The users of only divergent outputs should now be examined for
divergence.

Also, replaced a repeating pattern of "if new divergent instruction, then add to
worklist" by combining it into a single function. This does not cause any change
in functionality.

Reviewed By: foad, arsenm

Differential Revision: https://reviews.llvm.org/D150636
2023-05-17 07:47:43 +05:30
Sameer Sahasrabuddhe
fbe1c0616f [LLVM][Uniformity] Improve detection of uniform registers
The MachineUA now queries the target to determine if a given register holds a
uniform value. This is determined using the corresponding register bank if
available, or by a combination of the register class and value type. This
assumes that the target is optimizing for performance by choosing registers, and
the target is responsible for any mismatch with the inferred uniformity.

For example, on AMDGPU, an SGPR is now treated as uniform, except if the
register bank is VCC (i.e., the register holds a wave-wide vector of 1-bit
values) or equivalently if it has a value type of s1.

 - This does not always work with inline asm, where the register bank or the
   value type might not be present. We assume that the SGPR is uniform, because
   it is not expected to be s1 in the vast majority of cases.
 - The pseudo branch instruction SI_LOOP is now hard-coded to be always
   divergent, although its condition is an SGPR.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D150438
2023-05-16 09:37:04 +05:30
Sameer Sahasrabuddhe
b0f0dd2554 [LLVM][Uniformity] Propagate temporal divergence explicitly
At a cycle C with divergent exits, UA was using a naive traversal of the exiting
edges to locate blocks that may use values defined inside C. But this traversal
fails when it encounters a cycle. This is now replaced with a much simpler
propagation that iterates over every instruction in C and checks any uses that
are outside C. But such an iteration can be expensive when C is very large; the
original strategy may need to be reconsidered if there is a regression in
compilation times.

Also fixed lit tests that should have originally caught the missed propagation
of temporal divergence.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D149646
2023-05-15 20:17:43 +05:30
pvanhout
ae77aceba5 [Analysis] Remove DA & LegacyDA
UniformityAnalysis offers all of the same features and much more, there is no reason left to use the legacy DAs.
See RFC: https://discourse.llvm.org/t/rfc-deprecate-divergenceanalysis-legacydivergenceanalysis/69538

- Remove LegacyDivergenceAnalysis.h/.cpp
- Remove DivergenceAnalysis.h/.cpp + Unit tests
- Remove SyncDependenceAnalysis - it was not a real registered analysis and was only used by DAs
- Remove/adjust references to the passes in the docs where applicable
- Remove TTI hook associated with those passes.
- Move tests to UniformityAnalysis folder.
  - Remove RUN lines for the DA, leave only the UA ones.
- Some tests had to be adjusted/removed depending on how they used the legacy DAs.

Reviewed By: foad, sameerds

Differential Revision: https://reviews.llvm.org/D148116
2023-04-17 09:01:22 +02:00