The code pattern that clang will generate for HLSL has changed from the
original plan. This allows the SPIR-V backend to generate code for the
current code generation.
It looks for patterns of the form:
```
%1 = @llvm.spv.resource.handlefrombinding
%2 = @llvm.spv.resource.getpointer(%1, index)
load/store %2
```
These three llvm-ir instruction are treated as a single unit that will
1. Generate or find the global variable identified by the call to
`resource.handlefrombinding`.
2. Generate an OpLoad of the variable to get the handle to the image.
3. Generate an OpImageRead or OpImageWrite using that handle with the
given index.
This will generate the OpLoad in the same BB as the read/write.
Note: Now that `resource.handlefrombinding` is not processed on its own,
many existing tests had to be removed. We do not have intrinsics that
are able to use handles to sampled images, input attachments, etc., so
we cannot generate the load of the handle. These tests are removed for
now, and will be added when those resource types are fully implemented.
``` - add clang builtin to Builtins.td
- link builtin in hlsl_intrinsics
- add codegen for spirv intrinsic and two directx intrinsics to retain
signedness information of the operands in CGBuiltin.cpp
- add semantic analysis in SemaHLSL.cpp
- add lowering of spirv intrinsic to spirv backend in
SPIRVInstructionSelector.cpp
- add lowering of directx intrinsics to WaveActiveOp dxil op in
DXIL.td
- add test cases to illustrate passespendent pr merges.
```
Resolves#70106
---------
Co-authored-by: Finn Plummer <canadienfinn@gmail.com>
This PR fixes https://github.com/llvm/llvm-project/issues/122075. We
prefer SPV_INTEL_optnone over SPV_EXT_optnone when both extensions are
available, otherwise, when a target specifies a required extension
explicitly rather than allowing any of those (e.g., by providing
--spirv-ext=all command line argument), the Backend's behavior remains
unchanged. An existing test case is updated to check the case of 2
alternative extensions available at the same time.
This PR is to address legacy issues with module analysis that currently
uses a complicated and not so efficient approach to trace dependencies
between SPIR-V id's via a duplicate tracker data structures and an
explicitly built dependency graph. Even a quick performance check
without any specialized benchmarks points to this part of the
implementation as a biggest bottleneck.
This PR specifically:
* eliminates a need to build a dependency graph as a data structure,
* updates the test suite (mainly, by fixing incorrect CHECK's referring
to a hardcoded order of definitions, contradicting the spec requirement
to allow certain definitions to go "in any order", see
https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_logical_layout_of_a_module),
* improves function pointers implementation so that it now passes
EXPENSIVE_CHECKS (thus removing 3 XFAIL's in the test suite).
As a quick sanity check of whether goals of the PR are achieved, we can
measure time of translation for any big LLVM IR. While testing the PR in
the local development environment, improvements of the x5 order have
been observed.
For example, the SYCL test case "group barrier" that is a ~1Mb binary IR
input shows the following values of the naive performance metric that we
can nevertheless apply here to roughly estimate effects of the PR.
before the PR:
```
$ time llc -O0 -mtriple=spirv64v1.6-unknown-unknown _group_barrier_phi.bc -o 1 --filetype=obj
real 3m33.241s
user 3m14.688s
sys 0m18.530s
```
after the PR
```
$ time llc -O0 -mtriple=spirv64v1.6-unknown-unknown _group_barrier_phi.bc -o 1 --filetype=obj
real 0m42.031s
user 0m38.834s
sys 0m3.193s
```
Next work should probably address Duplicate Tracker further, as it needs
analysis now from the perspective of what parts of it are not necessary
now, after changing the approach to implementation of the module
analysis step.
This PR adds the following features:
* saturation and float rounding mode decorations,
* arithmetic constrained floating-point intrinsics (strict_fadd,
strict_fsub, strict_fmul, strict_fdiv, strict_frem, strict_fma and
strict_fldexp),
* and SPV_INTEL_float_controls2 extension,
* using recent improvements of emit-intrinsics step, this PR also
simplifies pre- and post-legalizer steps and improves instruction
selection.
This PR improves general validity of emitted code between passes due to
generation of `TargetOpcode::PHI` instead of `SPIRV::OpPhi` after
Instruction Selection, fixing generation of OpTypePointer instructions
and using of proper virtual register classes.
Using `TargetOpcode::PHI` instead of `SPIRV::OpPhi` after Instruction
Selection has a benefit to support existing optimization passes
immediately, as an alternative path to disable those passes that use
`MI.isPHI()`. This PR makes it possible thus to revert
https://github.com/llvm/llvm-project/pull/116060 actions and get back to
use the `MachineSink` pass.
This PR is a solution of the problem discussed in details in
https://github.com/llvm/llvm-project/pull/110507. It accepts an advice
from code reviewers of the PR #110507 to postpone generation of OpPhi
rather than to patch CodeGen. This solution allows to unblock
improvements wrt. expensive checks and makes it unrelated to the general
points of the discussion about OpPhi vs. G_PHI/PHI.
This PR contains numerous small patches of emitted code validity that
allows to substantially pass rate with expensive checks. Namely, the
test suite with expensive checks set ON now has only 12 fails out of 569
total test cases.
FYI @bogner
The spec is available here:
https://github.com/intel/llvm/pull/12497
The PR doesn't add OpCooperativeMatrixApplyFunctionINTEL instruction as
it's still experimental and not properly tested E2E.
The PR also fixes few bugs in the related code:
1. CooperativeMatrixMulAddKHR optional operand must be literal, not a
constant;
2. Fixed available capabilities table creation for a case, when a single
extension adds few capabilities, that occupy not contiguous op codes.
---------
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
This PR fixes:
* emission of OpNames (added newly inserted internal intrinsics and
basic blocks)
* emission of function attributes (SRet is added)
* implementation of SPV_INTEL_optnone so that it emits OptNoneINTEL
Function Control flag, and add implementation of the SPV_EXT_optnone
SPIR-V extension.
```
- use the new OpSDot/OpUDot instructions when capabilites allow in SPIRVInstructionSelector.cpp
- correct functionality of capability check onto input operand and not return operand type in SPIRVModuleAnalysis.cpp
- add test cases to demonstrate use case in idot.ll
```
Resolves#114632
Resolves https://github.com/llvm/llvm-project/issues/99160
- [x] Implement `WaveActiveAnyTrue` clang builtin,
- [x] Link `WaveActiveAnyTrue` clang builtin with `hlsl_intrinsics.h`
- [x] Add sema checks for `WaveActiveAnyTrue` to
`CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- [x] Add codegen for `WaveActiveAnyTrue` to `EmitHLSLBuiltinExpr` in
`CGBuiltin.cpp`
- [x] Add codegen tests to
`clang/test/CodeGenHLSL/builtins/WaveActiveAnyTrue.hlsl`
- [x] Add sema tests to
`clang/test/SemaHLSL/BuiltIns/WaveActiveAnyTrue-errors.hlsl`
- [x] Create the `int_dx_WaveActiveAnyTrue` intrinsic in
`IntrinsicsDirectX.td`
- [x] Create the `DXILOpMapping` of `int_dx_WaveActiveAnyTrue` to `113`
in `DXIL.td`
- [x] Create the `WaveActiveAnyTrue.ll` and
`WaveActiveAnyTrue_errors.ll` tests in `llvm/test/CodeGen/DirectX/`
- [x] Create the `int_spv_WaveActiveAnyTrue` intrinsic in
`IntrinsicsSPIRV.td`
- [x] In SPIRVInstructionSelector.cpp create the `WaveActiveAnyTrue`
lowering and map it to `int_spv_WaveActiveAnyTrue` in
`SPIRVInstructionSelector::selectIntrinsic`.
- [x] Create SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/hlsl-intrinsics/WaveActiveAnyTrue.ll`
---------
Co-authored-by: Finn Plummer <50529406+inbelic@users.noreply.github.com>
Co-authored-by: Greg Roth <grroth@microsoft.com>
This commit adds an intrinsic that will write to an image buffer. We
chose to match the name of the DXIL intrinsic for simplicity in clang.
We cannot reuse the existing openCL write_image function because that is
not a reserved name in HLSL. There is not much common code to factor
out.
This commit adds an intrinsic that will read from an image buffer. We
chose to match the name of the DXIL intrinsic for simplicity in clang.
We cannot reuse the existing openCL readimage function because that is
not a reserved name in HLSL.
I considered trying to refactor generateReadImageInst, so that we could
share code between the two implementations. However, most of the code in
generateReadImageInst is concerned with trying to figure out which type
of image read is being done. Once we factor out the code that will be
common, then we end up with just a single call to the MIRBuilder being
common.
- create a clang built-in in Builtins.td
- link dot4add_i8packed in hlsl_intrinsics.h
- add lowering to spirv backend through expansion of operation as OPSDot
is missing up to SPIRV 1.6 in SPIRVInstructionSelector.cpp
- add lowering to spirv backend using OpSDot in applicable SPIRV version
or if SPV_KHR_integer_dot_product is enabled
- add dot4add_i8packed intrinsic to IntrinsicsDirectX.td and mapping to
DXIL.td op Dot4AddI8Packed
- add tests for HLSL intrinsic lowering to dx/spv intrinsic in
dot4add_i8packed.hlsl
- add tests for sema checks in dot4add_i8packed-errors.hlsl
- add test of spir-v lowering in SPIRV/dot4add_i8packed.ll
- add test to dxil lowering in DirectX/dot4add_i8packed.ll
Resolves#99220
This PR introduces changes into processing of internal/service data in
SPIRV Backend so that Duplicates Tracker accounts for possible changes
in Constant usage after optimization, namely this PR fixes the case when
a Constant register stored in Duplicates Tracker after all passes is
represented by a non-constant expression. In this case we may be sure
that it neither is able to create a duplicate nor is in need of a
special placement as a Constant instruction.
This PR doesn't introduce a new feature, and in this case we rely on
existing set of test cases in the SPIRV Backend test suite to ensure
that this PR doesn't break existing assumptions without introducing new
test cases. There is a reproducer of the issue available as part of SYCL
CTS test suite, however it's a binary of several MB's size. Given the
subtlety of the issue, reduction of the reproducer to a reasonable site
for inclusion into the SPIRV Backend test suite doesn't seem realistic.
The SPIRV backend has a special type named `spirv.Image`. This type is
meant to correspond to the OpTypeImage instruction in SPIR-V, but there
is one difference. The access qualifier operand in OpTypeImage is
optional. On top of that, the access qualifiers are only valid for
kernels, and not for shaders.
We want to reuse this type when generating shader from HLSL, but we
can't use the access qualifier. This commit make the access qualifer
optional in the target extension type.
The same is done for `spirv.SampledImage`.
Contributes to #81036
The commit introduces support for fundamental DI instruction. Metadata
handlers required for this instruction is stored inside debug records
(https://llvm.org/docs/SourceLevelDebugging.html) parts of the module
which rises the necessity of it's traversal.
This PR is to ensure that OpExtInst instructions generated by
NonSemantic_Shader_DebugInfo_100 are not mixed up with other OpExtInst
instructions.
Original implementation
(https://github.com/llvm/llvm-project/pull/97558) has introduced an
issue by moving OpExtInst instruction with the 3rd operand equal to
DebugSource (value 35) or DebugCompilationUnit (value 1) even if
OpExtInst is not generated by NonSemantic_Shader_DebugInfo_100
implementation code.
The reproducer is attached as a new test case. The code of the test case
reproduces the issue, because "lgamma" has the same code (35) inside
OpenCL_std as DebugSource inside NonSemantic_Shader_DebugInfo_100.
This commit introduces emission of DebugSource, DebugCompileUnit from
NonSemantic.Shader.DebugInfo.100 and required OpString with filename.
NonSemantic.Shader.DebugInfo.100 is divided, following DWARF into two
main concepts – emitting DIE and Line.
In DWARF .debug_abbriev and .debug_info sections are responsible for
emitting tree with information (DEIs) about e.g. types, compilation
unit. Corresponding to that in NonSemantic.Shader.DebugInfo.100 have
instructions like DebugSource, DebugCompileUnit etc. which preforms same
role in SPIR-V file. The difference is in fact that in SPIR-V there are
no sections but logical layout which forces order of the instruction
emission.
The NonSemantic.Shader.DebugInfo.100 requires for this type of global
information to be emitted after OpTypeXXX and OpConstantXXX
instructions.
One of the goals was to minimize changes and interaction with
SPIRVModuleAnalysis as possible which current commit achieves by
emitting it’s instructions directly into MachineFunction.
The possibility of duplicates are mitigated by guard inside pass which
emits the global information only once in one function.
By that method duplicates don’t have chance to be emitted.
From that point, adding new debug global instructions should be
straightforward.
This PR continues https://github.com/llvm/llvm-project/pull/94467 and
contains fixes in emission of type intrinsics, constant recording and
corresponding test cases:
* type-deduce-global-dup.ll -- fix of integer constant emission on
32-bit platforms and correct type deduction for globals
* type-deduce-simple-for.ll -- fix of GEP translation (there was an
issue previously that led to incorrect translation/broken logic of
for-range implementation)
This PR also:
* fixes a cast between identical storage classes and updates the test
case to include validation run by spirv-val,
* ensures that Bitcast for pointers satisfies the requirement that the
address spaces must match and adds the corresponding test case,
* improve encode in Tablegen and decode in code of dependencies between
SPIR-V entities and required capability/extensions,
* prevent emission of identical OpTypePointer instructions.
This PR introduces support of llvm.ptr.annotation to SPIR-V Backend, and
implement several extensions which make use of spirv.Decorations and
llvm.ptr.annotation to annotate global variables and pointers:
- SPV_INTEL_cache_controls
- SPV_INTEL_global_variable_host_access
- SPV_INTEL_global_variable_fpga_decorations
This PR introduces support for inline assembly calls for SPIR-V Backend
in general, and support for SPV_INTEL_inline_assembly [1] extension in
particular. The former part of the PR is agnostic towards
vendor-specific requirements and resolves the task of supporting
successful transformation of inline assembly as long as it's possible
without specific SPIR-V instruction codes.
As a part of the PR there appears an opportunity to bring coherent
inline assembly information up to latest passes of the transformation
process (emitting final SPIR-V instructions), so that PR makes it easy
to add any another required flavor of inline assembly, other then
supported by the vendor specific SPV_INTEL_inline_assembly extension,
if/when needed.
At the moment, however, SPV_INTEL_inline_assembly is the only
implemented way to bring LLVM IR inline assembly calls up to valid
SPIR-V instructions and also the default one. This means that inline
assembly calls will generate an error message of such extension is not
used to prevent LLVM-generated error messages at the final stages of
translation. When the SPV_INTEL_inline_assembly extension is mentioned
among supported, translation of inline assembly is intercepted by this
extension implementation on a pre-legalizer step, and this is a place
where support for a new inline assembly extension may be added if
needed.
This PR also extends support for register classes, improves type
inference during pre-legalizer pass, and fixes a minor bug with
asm-printing of string literals.
[1]
https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc
Recognize `cl_khr_kernel_clock` builtins and translate them to
`OpReadClockKHR` instructions. The `Scope` operand is deduced from the
builtin function name.
spirv-val does not pass yet due to OpReadClockKHR only supporting the
valid scopes for Vulkan (Device and Subgroup, but not Workgroup), so
leave validation disabled with a TODO.
This patch:
- Adds SPIR-V backend's registered generator magic number to the emitted
binary. The magic number consists of the generator ID (43) and LLVM
major version.
- Adds SPIR-V version to the binary.
- Allows reading the expected (maximum supported) SPIR-V version from
the target triple.
- Uses VersionTuple for representing versions throughout the backend's
codebase.
- Registers v1.6 for spirv32 and spirv64 triple.
See more: https://github.com/KhronosGroup/SPIRV-Headers/commit/7d500c
If there is no information about OpenCL version we are forced to
generate OpenCL 1.0 by default for the OpenCL environment to avoid
puzzling run-times with Unknown/0.0 version output. For a reference,
LLVM-SPIRV Translator avoids potential issues with run-times in a
similar manner.
Add support to generate valid SPIR-V for the WaveGetLaneIndex() HLSL
builtin.
To implement this, I had to fix a few small issues in the backend, like
the i8* pointer type being emitted, even if we have the type information
elsewhere.
Signed-off-by: Nathan Gauër <brioche@google.com>
This PR:
* adds Lifetime intrinsics/instructions
* fixes how the binary header is emitted (correct version and better
approximation of Bound)
* add validation into more test cases
This PR is to add explicit support for SPV_KHR_float_controls
(https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/KHR/SPV_KHR_float_controls.asciidoc).
This extension is included into SPIR-V after version 1.4, but in case of
lower versions it is to be included explicitly and OpExtension must be
present in the module with `OpExtension "SPV_KHR_float_controls"`.
This PR fixes this issue and fixes the test case
test/CodeGen/SPIRV/exec_mode_float_control_khr.ll to account for a
version lower than 1.4.
This PR adds SPIR-V extension SPV_INTEL_variable_length_array that
allows to allocate local arrays whose number of elements is unknown at
compile time:
* add a new SPIR-V internal intrinsic:int_spv_alloca_array
* legalize G_STACKSAVE and G_STACKRESTORE
* implement allocation of arrays (previously getArraySize() of
AllocaInst was not used)
* add tests
This PR is to add support for the SPIR-V extension
SPV_KHR_uniform_group_instructions that adds new instructions to SPIR-V
to support additional group operations within uniform control flow.
This PR adds support for atomic instruction on floating-point numbers:
* SPV_EXT_shader_atomic_float_add
* SPV_EXT_shader_atomic_float_min_max
* SPV_EXT_shader_atomic_float16_add
and fixes asm printer output for half floating-type.
This PR adds support for the SPV_KHR_linkonce_odr extension and modifies
existing negative test with a positive check for the extension and
proper linkage type in case when the extension is enabled.
SPV_KHR_linkonce_odr adds a "LinkOnceODR" linkage type, allowing proper
translation of, for example, C++ templates classes merging during
linking from different modules and supporting any other cases when a
global variable/function must be merged with equivalent global
variable(s)/function(s) from other modules during the linking process.
By SPIR-V specification: "If an instruction, enumerant, or other feature
specifies multiple enabling capabilities, only one such
capability needs to be declared to use the feature."
However, one capability may be preferred over another. One important
case is Shader capability that may not be supported by a backend, but
always is inserted if "OpDecorate SpecId" is found, because Enabling
Capabilities for the latter is the list of Shader and Kernel, where
Shader is coming first and thus always selected as the first available
option.
In this PR we address the problem by keeping current behaviour of
selecting the first option among enabling capabilities as is, but giving
a user a way to filter capabilities during the selection process via a
newly introduced "--avoid-spirv-capabilities" command line option. This
option is to avoid selection of certain capabilities if there are other
available enabling capabilities.
This PR is changing also existing pruneCapabilities() function. It
doesn't remove capability from module requirement anymore, but only adds
implicitly required capabilities recursively, so its name is changed
accordingly. This change fixes the present bug in collecting required by
a module capabilities. Before the change, introduced by this PR,
pruneCapabilities() function has been removing, for example, Kernel
capability from required by a module, because Kernel is initially
required and the second time it was needed pruneCapabilities() removed
it by mistake.
The goal of this PR is to fix an issue when Module Analysis stage is not
able to complete processing of a really big LLVM source:
https://github.com/llvm/llvm-project/issues/76048.
There is an example of a bulky LLVM source:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/blob/main/test/SpecConstants/long-spec-const-composite.ll
Processing of this file with
`llc -mtriple=spirv64-unknown-unknown -O0 long-spec-const-composite.ll
-o long-spec-const-composite.spvt`
to produce SPIR-V output using LLVM SPIR-V backend takes too long, and
I've never been able to see it actually completes. After the patch from
this PR applied elapsed time for me is ~30 sec.
The fix changes underlying data structure to be `std::set` to trace
instructions with identical operands instead of the existing approach of
the `findSameInstrInMS()` function.
Add Float16 to Vulkan's available capabilities, and guard Float16Buffer
(Kernel-only capability) against being added outside OpenCL
environments.
Add tests to verify half and half vector types, and validate with
spirv-val.
Fixes#66398
We were removing bit_instructions cap from All caps but this was a
mistake.
Test SPV_KHR_bit_instructions was failing. Remove function
removeCapabilityIf. It was not being done correctly and is now
unnecessary.
Adds new extension SPV_KHR_expect_assume, new capability
ExpectAssumeKHR as well as the new instructions:
* OpExpectKHR
* OpAssumeTrueKHR
These are lowered from respectively llvm.expect.<ty> and llvm.assume
intrinsics.
Previously https://reviews.llvm.org/D157696