Raw (as in ByteAddress) buffer accesses in DXIL must specify
ElementIndex as undef, and Structured buffer accesses must specify a
value. Ensure that we do this correctly in DXILResourceAccess, and
enforce that the operations are valid in DXILOpLowering.
Fixes#173316
fixes#165051
This change reverts the experiment we did for #165311
While some backends seem to support llvm.assume without validation The
validator itself does not so it makes more sense to just remove it.
fixes#165051
Change is just to let the assume intrinsic pass on unmodified. Test is
to confirm the DXIL disassembler doesn't blow up whe we generate DXIL
with this intrinsic.
Introduces LLVM intrinsic `llvm.dx.resource.getdimensions.x` and its lowering to DXIL op `op.dx.getDimensions`.
The intrinsic will be used to implement `GetDimension` for buffers. The lowering is using `undef` value since it is required by the DXIL format which is based on LLVM 3.7.
Proposal update: https://github.com/llvm/wg-hlsl/pull/350Closes#112982
Introduces `llvm.{dx|svp}.resource.nonuniformindex` intrinsic that will be used when a resource index is not guaranteed to be uniform across threads (HLSL function NonUniformResourceIndex).
The DXIL lowering layer looks for this intrinsic call in the resource index calculation, makes sure it is reflected in the NonUniform flag on DXIL create handle ops (`dx.op.createHandle` and `dx.op.createHandleFromBinding`), and then removes it from the module.
Closes#155701
Removes uniformity bit from resource initialization intrinsics `llvm.{dx|spv}.resource.handlefrombinding` and `llvm.{dx|spv}.resource.handlefromimplicitbinding`. The flag currently always set to `false`. It should be derived from resource analysis and not provided by codegen.
Closes#135452
As part of the Root Signature Spec, we need to validate if Root
Signatures are not defining overlapping ranges.
Closes: https://github.com/llvm/llvm-project/issues/126645
---------
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
Co-authored-by: Joao Saffran <{ID}+{username}@users.noreply.github.com>
Co-authored-by: Joao Saffran <jderezende@microsoft.com>
Fixes#152754
- Fixes the ArgOperand index in `DXILOpLowering.cpp` used to obtain the
pointer operand of a lifetime intrinsic.
- Updates the tests
`llvm/test/CodeGen/DirectX/legalize-lifetimes-valver-1.5.ll`,
`llvm/test/CodeGen/DirectX/legalize-lifetimes-valver-1.6.ll`,
`llvm/test/CodeGen/DirectX/ShaderFlags/lifetimes-noint64op.ll`, and
`llvm/test/tools/dxil-dis/lifetimes.ll` to use the new size-less
lifetime intrinsic
- Removes lifetime intrinsics from the test
`llvm/test/CodeGen/DirectX/legalize-memset.ll` to be consistent with the
corresponding memcpy test which does not have lifetime intrinsics.
(Removal of lifetime intrinsics from tests like this was suggested here
in the past:
https://github.com/llvm/llvm-project/pull/139173#discussion_r2091778868)
- Rewrites the lifetime legalization functions in the EmbedDXILPass to
re-add the explicit size argument for DXIL
Fixes#147394
References DXC for the implementation logic:
d751c827ed/lib/HLSL/DxilPreparePasses.cpp (L693-L699)
If DXIL Version < 1.6 then replace lifetime intrinsics with stores
- For validator version >= 1.6, store an undef
- For validator version < 1.6, store zeros
else keep the lifetime intrinsics in the DXIL.
After this PR, the number of DML shaders failing validation due to
#146974 is reduced from 157 to 50.
This annotates the `Twine` passed to the constructors of the various
DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes
us to warn when we would try to print the twine after it had already
been destructed.
We also update `DiagnosticInfoUnsupported` to hold a `const Twine &`
like all of the other DiagnosticInfo classes, since this warning allows
us to clean up all of the places where it was being used incorrectly.
Adds resource name argument to `llvm.dx.handlefrombinding` and `llvm.dx.handlefromimplicitbinding` intrinsics.
SPIR-V currently does not seem to need the resource names so this change only affects DirectX binding intrinsics.
Part 2/4 of https://github.com/llvm/llvm-project/issues/105059
Fixes#137209
This PR:
- Adds a case to `expandIntrinsic()` in `DXILIntrinsicExpansion.cpp` to
expand the `Intrinsic::is_fpclass` in the case of
`FPClassTest::fcNegZero`
- Defines the `IsNaN`, `IsFinite`, `IsNormal` DXIL ops in `DXIL.td`
- Adds a case to `lowerIntrinsics()` in `DXILOpLowering.cpp` to handle
the lowering of `Intrinsic::is_fpclass` to the DXIL ops `IsNaN`,
`IsInf`, `IsFinite`, `IsNormal` when the FPClassTest is `fcNan`,
`fcInf`, `fcFinite`, and `fcNormal` respectively
- Creates a test `llvm/test/CodeGen/DirectX/is_fpclass.ll` to exercise
the intrinsic expansion and DXIL op lowering of `Intrinsic::is_fpclass`
~~A separate PR will be made to remove the now-redundant `dx_isinf`
intrinsic to address #87777.~~
A proper implementation for the lowering of the `llvm.is.fpclass`
intrinsic to handle all possible combinations of FPClassTest can be
implemented in a separate PR. This PR's implementation focuses primarily
on addressing the current use-cases for DirectML and HLSL intrinsics.
This moves the responsibility for cleaning up dead intrinsics from
DXILFinalizeLinkage to DXILOpLowering, and moves DXILFinalizeLinkage
back to it's pre-#136244 place in the pipeline. Doing this avoids issues
with DXIL passes running on obviously dead code, and makes it more clear
what DXILFinalizeLinkage is really doing.
This also helps with the story for #134260, as cleaning up dead
intrinsics doesn't make sense if this becomes a more generic pass.
Note that test/CodeGen/DirectX/remove-dead-intriniscs.ll already covers
most of the testing here. It'd be nice to have something that catches
the regression from changing the pass ordering but I couldn't come up
with anything that wouldn't be incredibly fragile.
Fixes#138180.
fixes#136620
It was determined that the lifetime intrinsics generated by clang are
likely more correct than the ones in DXC hence explaining the missing
lifetimes between the IR diffs.
As such we are legalizing lllvm lifetime intrinsics by letting them all
pass on through.
fixes#135654
In #128613 we added safe guards to prevent the lowering of just any
intrinsic in the backend. We used `DiagnosticInfoUnsupported` to do
this.
What we found was when using `opt` the diagnostic print function was
called but when using clang the diagnostic message was used.
Printing message in the clang version means we miss valuable debugging
information like function name and function type when LLVMContext was
only needed to call `getBestLocationFromDebugLoc`.
There are a few potential fixes
1. Write a custom DiagnosticInfoUnsupported so we can change the Message
just for DirectX. Too heavy handed so rejected.
2. Add the function name to the Message in DirectX code. Very simple one
line change. Downside is when using opt you see the function name twice.
But makes the clang-dxc bugs more actionable.
3. change CodeGenAction.cpp to always use the print function and not the
message directly. Downside is a bunch of innacurate information shows up
in the message if you don't specify `-debug-info-kind=standalone`.
4. add some book keeping to know which function called the intrinsic
keep a map of these so we can pass the calling function to
`DiagnosticInfoUnsupported` instead of the intrinsic. This would only be
useful if we had debug info so we could distinguish different uses of
the intrinsic by line\col number. We would also need to change from
iterating on every function to doing something like a LazyCallGraph
which is a nonstarter.
5. pick a different means of doing a Diagnostic error, because other
uses of `DiagnosticInfoUnsupported` error when we are in the body of a
function not when we see one being used like in the intrinsic case.
This PR went with a combo of option 2 & 5. Its low code change that also
only impacts the DirectX backend.
The `dx.dot2`, `dot3`, and `dot4` intrinsics exist purely to lower
`dx.fdot`, and they map exactly to the DXIL ops of the same name. Using
vectors for their arguments adds unnecessary complexity and causes us to
have vector operations that are not trivial to lower post-scalarizer.
Similarly, the `dx.dot2add` intrinsic is overly generic for something
that only needs to lower to a single `dot2AddHalf` DXIL op. Update its
signature to match the operation it lowers to.
Fixes#134569.
Resolves#99221
Key points: For SPIRV backend, it decompose into a `dot` followed a
`add`.
- [x] Implement dot2add clang builtin,
- [x] Link dot2add clang builtin with hlsl_intrinsics.h
- [x] Add sema checks for dot2add to CheckHLSLBuiltinFunctionCall in
SemaHLSL.cpp
- [x] Add codegen for dot2add to EmitHLSLBuiltinExpr in CGBuiltin.cpp
- [x] Add codegen tests to clang/test/CodeGenHLSL/builtins/dot2add.hlsl
- [x] Add sema tests to clang/test/SemaHLSL/BuiltIns/dot2add-errors.hlsl
- [x] Create the int_dx_dot2add intrinsic in IntrinsicsDirectX.td
- [x] Create the DXILOpMapping of int_dx_dot2add to 162 in DXIL.td
- [x] Create the dot2add.ll and dot2add_errors.ll tests in
llvm/test/CodeGen/DirectX/
Removing `DXILResourceMDAnalysis` that gathers information about
resources for the `DXILTranslateMetadata` pass. It collects the info
based on obsolete resource metadata annotations that are going to be
removed soon.
Part 1/2 of #114126
Update the lowering of `llvm.dx.resource.store.typedbuffer` to match DXC
and repeat the first element in cases where we are storing fewer than 4
elements.
Fixes#128110
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.
For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibilty purses, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.
The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
Fixes#128071
The current behavior lets intrinsics that don't map to a DXILOP slip
through. Nothing catches this until we hit the DXIL validator. This
change fails earlier so we don't encode invalid llvm intrinsics that can
slip through because of clang builtins like `__builtin_reduce_and`
example:
https://hlsl.godbolt.org/z/13rPj18vn
This removes "replaceFunctionWithNamedStructOp" and folds its
functionality into "replaceFunctionWithOp". It turns out we were
overcomplicating things and this is trivial to handle generically.
Fixes#113192
This introduces `@llvm.dx.resource.load.rawbuffer` and generalizes the
buffer load docs under DirectX/DXILResources.
This resolves the "load" parts of #106188
We'd introduced separate versions of `llvm.dx.resource.load` with a
struct return to handle the CheckAccessFullyMapped case without making
the IR for the common case unnecessarily complicated. However, at this
point the common case is really `resource.getpointer`, so the ergonomics
of a simplified version of `load` don't actually gain us as much as the
cost of having multiple opcodes.
Drop the `dx.resource.loadchecked` functions and have `dx.resource.load`
consistently return `{element_type, i1}`.
Move the DXILOpLoweringPass after DXILTranslateMetadata, and add asserts
in DXILShaderFlags to ensure it isn't scheduled after op lowering. This
will allow us to rely on DirectX intrinsics in the shader flags analysis
rather than having to recover information from lowered operations.
Fixes#120119.
This splits the DXILResourceAnalysis pass into TypeAnalysis and
BindingAnalysis passes. The type analysis pass is made immutable and
populated lazily so that it can be used earlier in the pipeline without
needing to carefully maintain the invariants of the binding analysis.
Fixes#118400
Instead of storing an auxilliary structure with the information from the
DXIL resource target extension types duplicated, access the information
that we can via the type itself.
This also means we need to handle some of the target extension types we
haven't fully defined yet, like Texture and CBuffer. For now we make an
educated guess to what those should look like based on llvm/wg-hlsl#76,
and we can update them fairly easily when we've defined them more
thoroughly.
First part of #118400
DXILOpLowering runs after scalarization but `@llvm.dx.typedbuffer.store`
takes a vector, so the argument is usually an artifact. Avoid creating a
vector just to extract elements from it immediately.
fixes#112974
partially fixes#70103
An earlier version of this change was reverted so some issues could be fixed.
### Changes
- Added new tablegen based way of lowering dx intrinsics to DXIL ops.
- Added int_dx_group_memory_barrier_with_group_sync intrinsic in
IntrinsicsDirectX.td
- Added expansion for int_dx_group_memory_barrier_with_group_sync in
DXILIntrinsicExpansion.cpp`
- Added DXIL backend test case
### Related PRs
* [[clang][HLSL] Add GroupMemoryBarrierWithGroupSync intrinsic
#111883](https://github.com/llvm/llvm-project/pull/111883)
* [[SPIRV] Add GroupMemoryBarrierWithGroupSync intrinsic
#111888](https://github.com/llvm/llvm-project/pull/111888)
Introduces `__builtin_hlsl_buffer_update_counter` clang buildin that is
used to implement the `IncrementCounter` and `DecrementCounter` methods
on `RWStructuredBuffer` and `RasterizerOrderedStructuredBuffer` (see
Note).
The builtin is translated to LLVM intrisic `llvm.dx.bufferUpdateCounter`
or `llvm.spv.bufferUpdateCounter`.
Introduces `BuiltinTypeMethodBuilder` helper in `HLSLExternalSemaSource`
that enables adding methods to builtin types using builder pattern like
this:
```
BuiltinTypeMethodBuilder(Sema, RecordBuilder, "MethodName", ReturnType)
.addParam("param_name", Type, InOutModifier)
.callBuiltin("buildin_name", { BuiltinParams })
.finalizeMethod();
```
Fixes#113513
[First version](llvm/llvm-project#114148) of this PR was reverted
because of build break.
In the DXIL CreateHandle and CreateHandleFromBinding ops, resource
bindings are
indexed from the beginning of the binding space, not from the binding
itself.
Translate from an index into the binding to one from the beginning of
the space
when lowering to these operations.
Introduces `__builtin_hlsl_buffer_update_counter` clang buildin that is
used to implement the `IncrementCounter` and `DecrementCounter` methods
on `RWStructuredBuffer` and `RasterizerOrderedStructuredBuffer` (see
Note).
The builtin is translated to LLVM intrisic `llvm.dx.bufferUpdateCounter`
or `llvm.spv.bufferUpdateCounter`.
Introduces `BuiltinTypeMethodBuilder` helper in `HLSLExternalSemaSource`
that enables adding methods to builtin types using builder pattern like
this:
```
BuiltinTypeMethodBuilder(Sema, RecordBuilder, "MethodName", ReturnType)
.addParam("param_name", Type, InOutModifier)
.callBuiltin("buildin_name", { BuiltinParams })
.finalizeMethod();
```
Fixes#113513
By giving these intrinsics their appropriate attributes, loads of
globals that are stored on the other side of these calls can be
eliminated by the EarlyCSE pass. Stores to the same globals and the
globals themselves require more direct intervention as part of the
create/annotated handle lowering.
Adds a test that verifies that the unneeded globals and their uses can
be eliminated and also that the attributes are set properly.
Fixes#104271
fixes#112974
partially fixes#70103
### Changes
- Added new tablegen based way of lowering dx intrinsics to DXIL ops.
- Added int_dx_group_memory_barrier_with_group_sync intrinsic in
IntrinsicsDirectX.td
- Added expansion for int_dx_group_memory_barrier_with_group_sync in
DXILIntrinsicExpansion.cpp`
- Added DXIL backend test case
### Related PRs
* [[clang][HLSL] Add GroupMemoryBarrierWithGroupSync intrinsic
#111883](https://github.com/llvm/llvm-project/pull/111883)
* [[SPIRV] Add GroupMemoryBarrierWithGroupSync intrinsic
#111888](https://github.com/llvm/llvm-project/pull/111888)
Restricts hlsl countbits to always return a uint32.
Implements a lowering from llvm.ctpop which has an overloaded return
type to dxil cbits op which always returns uint32.
Closes#112779