This is a takeover of PR #110527.
This commit adds an optional list of memory fences to gpu.barrier,
allowing users to specify which memory scopes they wish to fence
explicitly, while leaving the default semantics (which are equivalent to
calling for a global and local fence by analogy to CUDA's __syncthreads)
unchanged. The new expanded semantics are implemented for SPIR-V and for
the AMDGPU backend.
See also
https://discourse.llvm.org/t/rfc-add-memory-scope-to-gpu-barrier/81021/2?u=fmarno,
where the default behavior of a gpu.barrier was hashed out (though note
that the examples based on VMCNT are outdated for AMDGPU in that memory
fences can now be annotated with the correct set of address spaces).
This commit also deprecates amdgpu.lds_barrier for use cases that do not
involve targeting gfx908.
Assisted-by: Cursor/Claude code (tests and extending amdgpu.lds_barrier
pattern while copying it over)
---------
Co-authored-by: Finlay Marno <finlay.marno@codeplay.com>
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
Co-authored-by: Alan Li <alan.li@me.com>
This lets us properly annotate ranges for gpu.cluster_block_id and
gpu.cluster_dim_blocks. It also allows us to fill in the
nvvm.cluster_dim attribute for use in the NVVM backend.
The MLIR [GPU dialect
docs](https://mlir.llvm.org/docs/Dialects/GPU/#gpubarrier-gpubarrierop)
specify that gpu::BarrierOp should make *all memory accesses* visible to
all work items in the workgroup.
The current implementation uses only CLK_LOCAL_MEM_FENCE, which per the
[OpenCL
specification](https://registry.khronos.org/OpenCL/sdk/3.0/docs/man/html/barrier.html)
guarantees visibility of only *local memory accesses*.
This PR changes the barrier conversion to use CLK_LOCAL_MEM_FENCE |
CLK_GLOBAL_MEM_FENCE,
ensuring both local and global memory operations are properly
synchronized per the MLIR spec.
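As a rough illustration of the flag change, the sketch below models the argument passed to the OpenCL barrier builtin before and after this PR. The constant values are assumed to match the definitions in Clang's OpenCL C header, and `barrierFlags` is a hypothetical helper, not a function from the conversion pass.

```cpp
#include <cstdint>

// Assumed to match Clang's OpenCL C header definitions.
constexpr std::uint32_t kClkLocalMemFence = 0x1;  // CLK_LOCAL_MEM_FENCE
constexpr std::uint32_t kClkGlobalMemFence = 0x2; // CLK_GLOBAL_MEM_FENCE

// Hypothetical helper: computes the flags argument for the barrier call.
// fenceGlobal == false models the old lowering, true models the new one.
constexpr std::uint32_t barrierFlags(bool fenceGlobal) {
  return fenceGlobal ? (kClkLocalMemFence | kClkGlobalMemFence)
                     : kClkLocalMemFence;
}

static_assert(barrierFlags(false) == 0x1, "old lowering: local fence only");
static_assert(barrierFlags(true) == 0x3, "new lowering: local and global");
```

With only the local flag set, writes to global memory made before the barrier are not guaranteed to be visible to other work items after it, which is consistent with the race conditions described here.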
This issue was discovered while investigating numerical instabilities on
Intel Battlemage,
where race conditions occurred due to incomplete memory synchronization.
The existing pattern for lowering the gpu.printf op to an LLVM call uses a
fixed function name and calling convention.
Both should be exposed as pass options to allow supporting the Intel
Compute Runtime for GPU.
This PR also adds the gpu.printf op pattern to the GPU to LLVMSPV pass.
It may appear out of place, but the integration test is added to the XeVM
integration tests, as that is currently the best folder for testing with
the Intel Compute Runtime.
The test should be moved if a better test folder is added in the future.
This PR adds support for the `bf16` and `i1` data types when converting
`gpu::shuffle` to the `LLVMSPV` dialect, by inserting `bitcast` to/from
`i16` (for `bf16`) and extending/truncating to `i8` (for `i1`).
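The widening and narrowing steps can be sketched in plain C++ as below. This is only a model: `BF16`, `shuffleI16`, and `shuffleI8` are hypothetical stand-ins (the shuffle stubs simply return their input), and the use of zero-extension for `i1` is an assumption of this sketch rather than something stated above.

```cpp
#include <cstdint>
#include <cstring>

// Stand-in for a bf16 value: 16 bits of storage, no arithmetic.
struct BF16 {
  std::uint16_t bits;
};

// Stubs standing in for the integer sub-group shuffle; they return the input.
std::uint16_t shuffleI16(std::uint16_t v) { return v; }
std::uint8_t shuffleI8(std::uint8_t v) { return v; }

// bf16 path: bitcast to i16, shuffle, bitcast back.
BF16 shuffleBF16(BF16 v) {
  std::uint16_t raw;
  std::memcpy(&raw, &v, sizeof raw); // bitcast bf16 -> i16
  raw = shuffleI16(raw);
  BF16 out;
  std::memcpy(&out, &raw, sizeof out); // bitcast i16 -> bf16
  return out;
}

// i1 path: extend to i8 (zero-extension assumed), shuffle, truncate back.
bool shuffleI1(bool v) {
  std::uint8_t wide = v ? 1 : 0; // extend i1 -> i8
  wide = shuffleI8(wide);
  return (wide & 1) != 0; // truncate i8 -> i1
}
```

The bit pattern of the `bf16` value is preserved end to end, so no rounding or conversion takes place on that path.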
## Description
This PR updates the `ConvertGpuOpsToLLVMSPVOps`'s option by replacing
the `index-bitwidth` with a boolean option `use-64bit-index` (similar to
the `ConvertGPUToSPIRV` option).
The reason for this change is that `ConvertGpuOpsToLLVMSPVOps` is
documented as follows:
> Generate LLVM operations to be ingested by a SPIR-V backend for gpu
operations
In the context of SPIR-V specifications only two physical addressing
models are allowed: `Physical32` and `Physical64`.
This change guarantees output sanity by preventing invalid or
unsupported index bitwidths from being specified.
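A minimal sketch of why a boolean is sufficient: with only two addressing models, the option can select only a valid index width. The enum and function names here are hypothetical and are not taken from the pass implementation.

```cpp
// Hypothetical model of the two physical addressing models named above.
enum class AddressingModel { Physical32, Physical64 };

// The boolean option can only ever select one of the two valid widths.
constexpr unsigned indexBitwidth(bool use64bitIndex) {
  return use64bitIndex ? 64u : 32u;
}

constexpr AddressingModel addressingModel(bool use64bitIndex) {
  return use64bitIndex ? AddressingModel::Physical64
                       : AddressingModel::Physical32;
}

static_assert(indexBitwidth(false) == 32u, "Physical32 uses 32-bit indices");
static_assert(indexBitwidth(true) == 64u, "Physical64 uses 64-bit indices");
```

By contrast, an integer `index-bitwidth` option would accept values such as 16 or 48 that correspond to neither model.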
Use `llvm.func`'s `intel_reqd_sub_group_size` attribute instead of
SPIR-V environment attributes in the `gpu.shuffle` conversion pattern.
This metadata is needed to check that the semantics of the operation are
supported, i.e., it has a constant width and its value is equal to the
sub-group size.
As the pass also converts `gpu.func` to `llvm.func`, adding a
discardable attribute named `intel_reqd_sub_group_size` to the latter is
enough for this pattern to work.
We no longer have a notion of "default" sub-group size, so this
attribute needs to be set in the parent function for `gpu.shuffle`
operations to be converted.
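The legality condition described above can be modelled roughly as follows. `canConvertShuffle` is a hypothetical helper for illustration; an empty optional stands for "width is not a constant" or "the parent function has no `intel_reqd_sub_group_size` attribute", respectively.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of the check: the shuffle is convertible only if its
// width is a known constant and equals the required sub-group size of the
// enclosing function.
bool canConvertShuffle(std::optional<std::uint32_t> constantWidth,
                       std::optional<std::uint32_t> reqdSubGroupSize) {
  if (!constantWidth || !reqdSubGroupSize)
    return false; // no constant width, or attribute missing on the function
  return *constantWidth == *reqdSubGroupSize;
}
```

Because there is no default sub-group size to fall back on, a missing attribute makes the check fail rather than assuming a value.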
Drop dependency on the SPIR-V dialect as we no longer require creating
attributes from this dialect to lower `gpu.shuffle` instances.
---------
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Default to Global address space for memrefs that do not have an explicit address space set in the IR.
---------
Co-authored-by: Victor Perez <victor.perez@intel.com>
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
Co-authored-by: Victor Perez <victor.perez@codeplay.com>
This commit marks the type converter in `populate...` functions as
`const`. This is useful for debugging.
Patterns already take a `const` type converter. However, some
`populate...` functions do not only add new patterns, but also add
additional type conversion rules. That makes it difficult to find the
place where a type conversion was added in the code base. With this
change, all `populate...` functions that only populate patterns now have
a `const` type converter. Programmers can then conclude from the
function signature that these functions do not register any new type
conversion rules.
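The guarantee given by the signature can be illustrated with mock types. `MockTypeConverter`, `MockPatternSet`, and `populateExamplePatterns` below are hypothetical stand-ins written for this sketch, not the MLIR classes.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Mock converter: registering a rule is a non-const operation.
class MockTypeConverter {
public:
  using Rule = std::function<std::string(const std::string &)>;
  void addConversion(Rule rule) { rules.push_back(std::move(rule)); }
  std::size_t numRules() const { return rules.size(); }

private:
  std::vector<Rule> rules;
};

// Mock pattern set: records the names of the patterns that were added.
struct MockPatternSet {
  std::vector<std::string> names;
};

// Taking the converter by const reference means this function cannot
// register new conversion rules; it can only add patterns.
void populateExamplePatterns(const MockTypeConverter &converter,
                             MockPatternSet &patterns) {
  // converter.addConversion(...); // would not compile: non-const member
  (void)converter;
  patterns.names.push_back("ExamplePattern");
}
```

A reader looking for where a conversion rule was registered can therefore skip any `populate...` function with this kind of signature.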
Also some minor cleanups around the 1:N dialect conversion
infrastructure, which did not always pass the type converter as a
`const` object internally.
Expand the accepted types for gpu.shuffle to any integer, float, or 1-D vector of integers or floats.
Also update the gpu-to-llvm-spv pass to support those types.
Implement mapping:
- `global`: 1
- `workgroup`: 3
- `private`: 0
Add `addressSpaceToStorageClass`, mapping GPU address spaces to SPIR-V
storage classes to be able to use SPIR-V's
`storageClassToAddressSpace`, mapping SPIR-V storage classes to LLVM
address spaces according to our mapping above *by definition*.
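The net effect of the mapping can be sketched as a single function. This collapses the two steps (GPU address space to SPIR-V storage class, then storage class to LLVM address space) into one, and the enum and function names are hypothetical.

```cpp
// Hypothetical enum mirroring the three GPU address spaces listed above.
enum class GpuAddressSpace { Global, Workgroup, Private };

// Net result of the mapping: GPU address space -> LLVM address space number.
constexpr unsigned llvmAddressSpace(GpuAddressSpace space) {
  switch (space) {
  case GpuAddressSpace::Global:
    return 1;
  case GpuAddressSpace::Workgroup:
    return 3;
  case GpuAddressSpace::Private:
    return 0;
  }
  return 0; // not reached for valid enum values
}

static_assert(llvmAddressSpace(GpuAddressSpace::Global) == 1, "global");
static_assert(llvmAddressSpace(GpuAddressSpace::Workgroup) == 3, "workgroup");
static_assert(llvmAddressSpace(GpuAddressSpace::Private) == 0, "private");
```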
---------
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Add support in `-convert-gpu-to-llvm-spv` to convert `gpu.func` to
`llvm.func` operations.
- `spir_kernel`/`spir_func` calling conventions used for
kernels/functions.
- `workgroup` attributions encoded as additional `llvm.ptr<3>`
arguments.
- No attribute used to annotate kernels
- `reqd_work_group_size` attribute used to encode
`gpu.known_block_size`.
- `llvm.mlir.workgroup_attrib_size` used to encode workgroup attribution
sizes. This is attached to the pointer arguments that workgroup
attributions are lowered to.
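The signature changes listed above can be modelled loosely as follows. Types are represented as plain strings and both helpers are hypothetical; the sketch covers only the calling convention choice and the extra `llvm.ptr<3>` arguments, not the attributes.

```cpp
#include <string>
#include <vector>

// Kernels use spir_kernel; all other functions use spir_func.
std::string callingConvention(bool isKernel) {
  return isKernel ? "spir_kernel" : "spir_func";
}

// Each workgroup attribution becomes one additional pointer argument in
// address space 3, appended after the original arguments.
std::vector<std::string>
loweredArgumentTypes(std::vector<std::string> argTypes,
                     unsigned numWorkgroupAttributions) {
  for (unsigned i = 0; i < numWorkgroupAttributions; ++i)
    argTypes.push_back("!llvm.ptr<3>");
  return argTypes;
}
```

For example, a kernel with one `i32` argument and two workgroup attributions would end up with three arguments under this model.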
**Note**: A notable missing feature that will be addressed in a
follow-up PR is a `-use-bare-ptr-memref-call-conv` option to replace
MemRef arguments with bare pointers to the MemRef element types instead
of the current MemRef descriptor approach.
---------
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Adds the attributes nounwind and willreturn to all function
declarations. Adds `memory(none)` equivalent to the id/dimension
function declarations. The function declaration attributes are copied to
the function calls.
`nounwind` is legal because there are no exceptions in SPIR-V. I also do
not see any reason why any of these functions would not return when used
correctly.
I'm confident that the get id/dim functions will have no externally
observable memory effects, but think the convergent functions will have
effects.