These are identified by misc-include-cleaner. I've filtered out the
suggestions that break builds. I'm also staying away from llvm-config.h,
config.h, and Compiler.h, since touching those would likely cause
platform- or compiler-specific build failures.
Bug description: Hardware intrinsic functions created during GPU
conversion to NVVM may contain debug info metadata from the original
function, which cannot be used outside of that function.
Bug description: Global variables and functions created during
gpu.printf conversion to NVVM may contain debug info metadata from the
function containing the gpu.printf, which cannot be used outside of that
function.
When the upper_bound of a GPU dim op (like `gpu.block_dim`) is the
maximum i32 value, the op conversion overflows when it adds 1 to convert
the bound from closed to open. This fixes the bug by clamping the open
bound to the maximum i32 value.
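A minimal sketch of the clamping logic, in standalone C++ with illustrative
names (this is not the actual pattern code):

```cpp
#include <cstdint>
#include <limits>

// Convert a closed (inclusive) upper bound into the open (exclusive) bound
// used by the range annotation, clamping instead of wrapping to INT32_MIN
// when the closed bound is already the maximum i32 value.
int32_t toOpenUpperBound(int32_t closedBound) {
  if (closedBound == std::numeric_limits<int32_t>::max())
    return closedBound; // clamp: adding 1 here would overflow
  return closedBound + 1;
}
```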
---------
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
`FunctionOpInterface` assumed that the function type (an attribute of
the operation) can be cloned with arbitrary lists of function arguments
and results to support argument and result list mutation. This is not
always correct; in particular, LLVM dialect functions require exactly
one result, making it impossible to erase the result.
Allow function type cloning to fail and propagate this failure through
various APIs that use it. The common assumption is that existing IR has
not been modified.
Fixes #131142.
Reland a8c7ecdcbc3e89b493b495c6831cc93671c3b844 / #136300.
`FunctionOpInterface` assumed that the function type (an attribute of
the operation) can be cloned with arbitrary lists of function arguments
and results to support argument and result list mutation. This is not
always correct; in particular, LLVM dialect functions require exactly
one result, making it impossible to erase the result.
Allow function type cloning to fail and propagate this failure through
various APIs that use it. The common assumption is that existing IR has
not been modified.
Fixes #131142.
This commit adds 1:N support to `SignatureConversion::remapInputs`. This
API allows users to replace a block argument with multiple replacement
values (the original block argument is dropped). The API already
supported "bbarg --> multiple bbargs" mappings, but "bbarg --> multiple
SSA values" was missing.
---------
Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
Certain GPU->NVVM patterns (the ones that lower to libdevice) compete
with Arith->LLVM patterns. Add an optional `benefit` parameter to all
`populate` functions so that users can give preference to the GPU->NVVM
patterns.
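For instance, a sketch of how a user might express that preference (the
populate-function names and the chosen benefit value are assumptions, not
taken from this change):

```cpp
#include "mlir/Conversion/ArithToLLVM/ArithToLLVM.h"
#include "mlir/Conversion/GPUToNVVM/GPUToNVVMPass.h"

// Sketch: give GPU->NVVM patterns a higher benefit than the competing
// Arith->LLVM patterns so the libdevice lowerings win when both match.
static void buildLoweringPatterns(mlir::LLVMTypeConverter &converter,
                                  mlir::RewritePatternSet &patterns) {
  mlir::arith::populateArithToLLVMConversionPatterns(converter, patterns);
  mlir::populateGpuToNVVMConversionPatterns(converter, patterns,
                                            /*benefit=*/10);
}
```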
This is a follow-up to #127844. That PR got vectors of arbitrary rank
working, but I hadn't thought about the rank-0 case.
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
There was a discrepancy between the type-converter and rewrite-pattern
parts of conversion to LLVM used in various GPU targets, at least ROCDL
and NVVM:
- The TypeConverter part was handling vectors of arbitrary rank,
converting them to nests of `!llvm.array< ... >` with a vector at the
inner-most dimension:
8337d01e30/mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp (L629-L655)
- The rewrite pattern part was not handling `llvm.array`:
8337d01e30/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp (L594-L596)
That led to conversion failures when lowering `math` dialect ops on
rank-2 vectors, as in the testcase being added in this PR.
This PR fixes that discrepancy by reusing a shared utility already used in other
conversions to LLVM:
8337d01e30/mlir/lib/Conversion/LLVMCommon/VectorPattern.cpp (L80-L104)
---------
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
Introduce a subset of floating-point classification ops in the Math
dialect. These ops mirror functions provided by the C math library and,
like the existing `math.copysign`, belong in the math dialect.
Add a lowering of those ops to NVIDIA libdevice calls, where possible,
as the first mechanism to exercise them.
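As a sketch of what building one of these ops could look like (the op class
name and result type are assumptions based on the description, not copied
from the patch):

```cpp
#include "mlir/Dialect/Math/IR/Math.h"
#include "mlir/IR/Builders.h"

// Sketch: create a classification op mirroring C's isnan(). On NVIDIA
// targets it can then be lowered to the corresponding libdevice call.
static mlir::Value buildIsNaN(mlir::OpBuilder &builder, mlir::Location loc,
                              mlir::Value operand) {
  return builder.create<mlir::math::IsNaNOp>(loc, builder.getI1Type(),
                                             operand);
}
```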
LLVM itself is generally moving away from using `undef` and towards
using `poison`, to the point of having a lint that catches new uses of
`undef` in tests.
In order not to trip the lint on new patterns and to conform to the
evolution of LLVM:
- Rename various ::undef() methods on StructBuilder subclasses to
::poison()
- Audit the uses of UndefOp in the MLIR libraries and replace almost all
of them with PoisonOp
The remaining uses of `undef` are initializing `uninitialized` memrefs,
explicit conversions to undef from SPIR-V, and a few cases in
AMDGPUToROCDL where usage like
%v = insertelement <M x iN> undef, iN %v, i32 0
%arg = bitcast <M x iN> %v to i(M * N)
is used to handle "i32" arguments that are really packed vectors of
smaller types that won't always be fully initialized.
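For example, a descriptor-building helper that previously started from
`undef` now starts from `poison` (a minimal sketch; the surrounding pattern
code is omitted):

```cpp
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/IR/Builders.h"

// Sketch: build the initial value of a descriptor struct with
// llvm.mlir.poison instead of llvm.mlir.undef before inserting fields.
static mlir::Value buildEmptyDescriptor(mlir::OpBuilder &builder,
                                        mlir::Location loc,
                                        mlir::Type descriptorType) {
  // Previously: builder.create<mlir::LLVM::UndefOp>(loc, descriptorType);
  return builder.create<mlir::LLVM::PoisonOp>(loc, descriptorType);
}
```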
This commit is a follow-up to 99a562b3cb17e89273ba0fe77129f2fb17a19381,
which migrated some of the mlir-vulkan-runner tests to mlir-cpu-runner
using a new pipeline and set of wrappers. That commit could not migrate
all the tests, because the existing calling conventions/ABIs for kernel
arguments generated by GPUToLLVMConversionPass were not a good fit for
the Vulkan runtime. This commit fixes this and migrates the remaining
tests. With this commit, mlir-vulkan-runner and many related components
are now unused, and they will be removed in a later commit (see #73457).
The old calling conventions require both the caller (host LLVM code) and
callee (device code) to have compile-time knowledge of the precise
argument types. This works for CUDA, ROCm and SYCL, where there is a
C-like calling convention agreed between the host and device code, and
the runtime passes through arguments as raw data without comprehension.
For Vulkan, however, the interface declared by the shader/kernel is in a
more abstract form, so the device code has indirect access to the
argument data, and the runtime must process the arguments to set up and
bind appropriately-sized buffer descriptors.
This commit introduces a new calling convention option to meet the
Vulkan runtime's needs. It lowers memref arguments to {void*, size_t}
pairs, which can be trivially interpreted by the runtime without it
needing to know the original argument types. Unlike the stopgap measure
in the previous commit, this system can support memrefs of various ranks
and element types, which unblocked migrating the remaining tests.
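A sketch of the shape each lowered memref argument takes under the new
convention (the struct and field names are illustrative, not the runtime's
actual definitions):

```cpp
#include <cstddef>

// Each memref argument is lowered to an opaque {pointer, size} pair. The
// Vulkan runtime only needs the size in bytes to create and bind an
// appropriately sized buffer descriptor; it never inspects the element
// type or rank.
struct VulkanKernelArg {
  void *data;       // start of the buffer backing the memref
  std::size_t size; // total size of the buffer in bytes
};
```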
This commit adds new wrappers around the MLIR Vulkan runtime which
implement the mgpu* APIs (as generated by GPUToLLVMConversionPass), adds
an optional LLVM lowering to the Vulkan runner mlir-opt pipeline based
on GPUToLLVMConversionPass, and migrates several of the
mlir-vulkan-runner tests to use mlir-cpu-runner instead, together with
the new pipeline and wrappers.
This is a further incremental step towards eliminating
mlir-vulkan-runner and its associated pipeline, passes and wrappers
(#73457). This commit does not migrate all of the tests to the new
system, because changes to the mgpuLaunchKernel ABI will be necessary to
support the tests that use multi-dimensional memref arguments.
This commit adds an NVIDIA-specific lowering of `cf.assert` to
`__assertfail`.
Note: `getUniqueFormatGlobalName`, `getOrCreateFormatStringConstant` and
`getOrDefineFunction` are moved to `GPUOpsLowering.h`, so that they can
be reused.
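For reference, the device-side entry point targeted by the lowering is
commonly declared along these lines (a sketch; the exact parameter list
follows the CUDA device runtime and is an assumption here):

```cpp
#include <cstddef>

// Called by the lowered code when the cf.assert condition is false; it
// reports the message, source location, and function, then terminates the
// kernel.
extern "C" void __assertfail(const char *message, const char *file,
                             unsigned line, const char *function,
                             std::size_t charSize);
```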
This patch fixes:
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp:535:13: error:
'applyPatternsAndFoldGreedily' is deprecated: Use
applyPatternsGreedily() instead [-Werror,-Wdeprecated-declarations]
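The fix is a mechanical rename at the call site, roughly:

```cpp
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

// Sketch of the rename: the deprecated entry point is replaced by its new
// name; behavior is unchanged.
static void runGreedily(mlir::Operation *op,
                        const mlir::FrozenRewritePatternSet &patterns) {
  // Before: (void)applyPatternsAndFoldGreedily(op, patterns);
  (void)mlir::applyPatternsGreedily(op, patterns);
}
```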
Clean up `populateVectorToLLVMConversionPatterns` so that it populates
only conversion patterns. All rewrite patterns that do not lower to LLVM
should be populated into a separate greedy pattern rewrite.
The current combination of rewrite patterns and conversion patterns
triggered an edge case when merging the 1:1 and 1:N dialect conversions.
Depends on #119973.
This patch adds the `ConvertToLLVMAttrInterface` and
`ConvertToLLVMOpInterface` interfaces. It also modifies the
`convert-to-llvm` pass to use these interfaces when available.
The `ConvertToLLVMAttrInterface` interface allows attributes to
configure conversion to LLVM, including the conversion target, LLVM type
converter, and populating conversion patterns. See the `NVVMTargetAttr`
implementation of this interface for an example of how this interface
can be used to configure conversion to LLVM.
The `ConvertToLLVMOpInterface` interface collects all convert to LLVM
attributes stored in an operation.
Finally, the `convert-to-llvm` pass was modified to use these interfaces
when available. This allows applying `convert-to-llvm` to GPU modules
and letting the `NVVMTargetAttr` decide which patterns to populate.
This commit marks the type converter in `populate...` functions as
`const`. This is useful for debugging.
Patterns already take a `const` type converter. However, some
`populate...` functions do not only add new patterns, but also add
additional type conversion rules. That makes it difficult to find the
place where a type conversion was added in the code base. With this
change, all `populate...` functions that only populate patterns now have
a `const` type converter. Programmers can then conclude from the
function signature that these functions do not register any new type
conversion rules.
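For example, a pattern-only populate function now reads roughly as follows
(the function name is hypothetical):

```cpp
#include "mlir/Conversion/LLVMCommon/TypeConverter.h"
#include "mlir/IR/PatternMatch.h"

// Sketch: the `const` on the type converter signals that this function only
// adds patterns and does not register any new type conversion rules.
void populateFooToLLVMConversionPatterns(
    const mlir::LLVMTypeConverter &typeConverter,
    mlir::RewritePatternSet &patterns);
```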
Also some minor cleanups around the 1:N dialect conversion
infrastructure, which did not always pass the type converter as a
`const` object internally.
This commit introduces a ConstantRange attribute to match the
ConstantRange attribute type present in LLVM IR.
It then refactors the LLVM_IntrOpBase so that the basic part of the
intrinsic builder code can be re-used without needing to copy it or
get rid of important context. This, along with adding code for
handling an optional `range` attribute to that same base, allows us to
make the support for range() annotations generic without adding
another bit to IntrOpBase.
This commit then updates the lowering of index intrinsic operations to
use the new ConstantRange attribute and, along the way, fixes a bug
where we were subtracting 1 from upper bounds instead of adding it on
operations like gpu.block_dim.
The point of these changes is to enable these range annotations to be
used for the corresponding NVVM operations in a future commit.
Implement mapping:
- `global`: 1
- `workgroup`: 3
- `private`: 0
Add `addressSpaceToStorageClass`, which maps GPU address spaces to
SPIR-V storage classes, so that we can reuse SPIR-V's
`storageClassToAddressSpace`, which maps SPIR-V storage classes to LLVM
address spaces; composing the two yields the mapping above by definition.
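A sketch of the composed end-to-end mapping (the helper name is
illustrative; the enum spellings are from the GPU dialect):

```cpp
#include "mlir/Dialect/GPU/IR/GPUDialect.h"
#include "llvm/Support/ErrorHandling.h"

// Sketch: GPU address space -> LLVM address space, as obtained by composing
// addressSpaceToStorageClass with SPIR-V's storageClassToAddressSpace.
static unsigned mapGpuToLLVMAddressSpace(mlir::gpu::AddressSpace space) {
  switch (space) {
  case mlir::gpu::AddressSpace::Global:
    return 1;
  case mlir::gpu::AddressSpace::Workgroup:
    return 3;
  case mlir::gpu::AddressSpace::Private:
    return 0;
  }
  llvm_unreachable("unknown GPU address space");
}
```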
---------
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Add support in `-convert-gpu-to-llvm-spv` to convert `gpu.func` to
`llvm.func` operations.
- `spir_kernel`/`spir_func` calling conventions used for
kernels/functions.
- `workgroup` attributions encoded as additional `llvm.ptr<3>`
arguments.
- No attribute is used to annotate kernels.
- `reqd_work_group_size` attribute used to encode
`gpu.known_block_size`.
- `llvm.mlir.workgroup_attrib_size` used to encode workgroup attribution
sizes. This will be attached to the pointer arguments that workgroup
attributions lower to.
**Note**: A notable missing feature that will be addressed in a
follow-up PR is a `-use-bare-ptr-memref-call-conv` option to replace
MemRef arguments with bare pointers to the MemRef element types instead
of the current MemRef descriptor approach.
---------
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Support the fastMath attribute and other previously unsupported math ops
that only take float operands, lowering them directly to libdevice calls
in NVVM:
1. Lower math ops carrying the fastMath attribute to the correct
libdevice intrinsic.
2. Some math dialect ops were already lowered to libdevice, but the
coverage was incomplete; this MR lowers all remaining math ops that only
take float operands.
Before this commit, there was a workaround in the `func.func`/`gpu.func`
op lowering for when the bare-pointer calling convention is enabled.
This workaround "patched up" the argument
materializations for memref arguments. This can be done directly in the
argument materialization functions (as the TODOs in the code base
indicate).
This commit effectively reverts to the old implementation
(a664c14001fa2359604527084c91d0864aa131a4) and adds additional checks to
make sure that bare pointers are used only for function entry block
arguments.
The `gpu.func` op lowering accounts for memref arguments/results (both
the "normal" and the bare-pointer calling conventions), but the
`gpu.return` op lowering did not; it produced invalid IR that failed to
verify.
This commit uses the same lowering strategy as for `func.return` in the
`gpu.return` lowering. (The C++ implementation is copied. We may want to
share some code between `func` and `gpu` lowerings in the future.)
This change reworks how range information for GPU dispatch IDs (block
IDs, thread IDs, and so on) is handled.
1. `known_block_size` and `known_grid_size` become inherent attributes
of GPU functions. This makes them less clunky to work with. As a
consequence, the `gpu.func` lowering patterns now only look at the
inherent attributes when setting target-specific attributes on the
`llvm.func` that they lower to.
2. At the same time, `gpu.known_block_size` and `gpu.known_grid_size`
are made official dialect-level discardable attributes which can be
placed on arbitrary functions. This allows for progressive lowerings
(without this, a lowering for `gpu.thread_id` couldn't know about the
bounds if it had already been moved from a `gpu.func` to an `llvm.func`)
and allows for range information to be provided even when
`gpu.*_{id,dim}` are being used outside of a `gpu.func` context.
3. All of these index operations have gained an optional `upper_bound`
attribute, allowing for an alternate mode of operation where the bounds
are specified locally and not inherited from the operation's context.
These also allow handling of cases where the precise launch sizes aren't
known, but can be bounded more precisely than the maximum of what any
platform's API allows. (I'd like to thank @benvanik for pointing out
that this could be useful.)
When inferring bounds (either for range inference or for setting `range`
during lowering) these sources of information are consulted in order of
specificity (`upper_bound` > inherent attribute > discardable attribute,
except that dimension sizes check for `known_*_bounds` to see if they
can be constant-folded before checking their `upper_bound`).
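In pseudocode, the lookup order for one dimension of an ID-like op is
roughly the following (a sketch of the precedence described above, not the
actual helper):

```cpp
#include <cstdint>
#include <optional>

// Sketch: a local upper_bound wins, then the inherent attribute on the
// enclosing gpu.func, then the discardable gpu.known_* attribute on other
// function-like ops.
std::optional<uint64_t>
lookupUpperBound(std::optional<uint64_t> localUpperBound,
                 std::optional<uint64_t> inherentKnownSize,
                 std::optional<uint64_t> discardableKnownSize) {
  if (localUpperBound)
    return localUpperBound;
  if (inherentKnownSize)
    return inherentKnownSize;
  return discardableKnownSize;
}
```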
This patch also updates the documentation about the bounds and inference
behavior to clarify what these attributes do when set and the
consequences of setting them up incorrectly.
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
This patch updates the lowering of `LaunchFuncOp` in GPU to LLVM to only
legalize the operation with the converted operands, effectively removing
the lowering used by the old serialization pipeline.
It also removes all remaining uses of the old gpu serialization
infrastructure in `gpu-to-llvm`.
See [Compilation overview | 'gpu' Dialect - MLIR
docs](https://mlir.llvm.org/docs/Dialects/GPU/#compilation-overview) for
additional information on the target attributes compilation pipeline
that replaced the old serialization pipeline.
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
10 under mlir/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
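The replacement itself is mechanical, e.g.:

```cpp
#include "llvm/ADT/StringRef.h"

// Sketch of the mechanical replacement.
static bool isFoo(llvm::StringRef s) {
  // Before: return s.equals("foo");
  return s == "foo";
}
```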
GEPArg can only be constructed from int32_t and mlir::Value. Explicitly
cast other types (e.g. unsigned, size_t) to int32_t to avoid narrowing
conversion warnings on MSVC. Some recent examples:
```
mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398:
Element '1': conversion from 'size_t' to 'T' requires a narrowing
conversion
with
[
T=mlir::LLVM::GEPArg
]
mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398:
Element '1': conversion from 'unsigned int' to 'T' requires a narrowing
conversion
with
[
T=mlir::LLVM::GEPArg
]
```
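The fix is to make the conversion explicit at the construction site, e.g.
(a sketch; the surrounding GEP-creation code is omitted):

```cpp
#include <cstddef>
#include <cstdint>

#include "mlir/Dialect/LLVMIR/LLVMDialect.h"

// Sketch: indices held as size_t/unsigned are cast explicitly to int32_t,
// since GEPArg is only constructible from int32_t or mlir::Value, avoiding
// MSVC's narrowing-conversion errors in braced initialization.
static mlir::LLVM::GEPArg makeFieldIndex(std::size_t fieldIndex) {
  return mlir::LLVM::GEPArg(static_cast<int32_t>(fieldIndex));
}
```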
Co-authored-by: Nikita Kudriavtsev <nikita.kudriavtsev@intel.com>
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`
The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.
Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
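Usage after the rename looks roughly like this (sketch):

```cpp
#include "mlir/IR/PatternMatch.h"

// Sketch: any in-place modification of any op (root or not) goes through
// modifyOpInPlace so the rewriter is notified of the change.
static void setUnitFlag(mlir::PatternRewriter &rewriter, mlir::Operation *op) {
  // Before: rewriter.updateRootInPlace(op, [&] { ... });
  rewriter.modifyOpInPlace(op, [&] {
    op->setAttr("my.flag", rewriter.getUnitAttr());
  });
}
```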
Setting the thread block size with `maxntid` on the kernel has great
performance benefits; with it, the downstream PTX compiler can do better
register allocation.
MLIR's `gpu.launch` and `gpu.launch_func` already have an attribute
(`known_block_size`) that records the thread block size when it is
known. This PR simply uses that attribute to set `maxntid`.
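Conceptually, the lowering forwards the known block size to the NVVM-level
function attribute, roughly as in this sketch (the attribute name and type
are assumptions here, not verified against the NVVM dialect):

```cpp
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/IR/Builders.h"

// Sketch (attribute name assumed): attach the known block size as the
// maxntid hint on the lowered llvm.func so the downstream PTX compiler can
// use it for register allocation.
static void setMaxntid(mlir::OpBuilder &builder, mlir::LLVM::LLVMFuncOp func,
                       llvm::ArrayRef<int32_t> knownBlockSize) {
  func->setAttr("nvvm.maxntid",
                builder.getDenseI32ArrayAttr(knownBlockSize));
}
```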
Expand the copying of attributes on GPU kernel arguments during LLVM
lowering.
Support copying attributes from values that are already LLVM pointers.
Support copying attributes, like `noundef`, that aren't specific to (the
pointer parts of) arguments.