This PR addresses a specific edge case when deciding whether or not to
produce a bitcast instruction.
Specifically, when the given instruction is a global array, the element
type of the array wasn't correctly compared to the return type. In this
specific case, if the types are equal, a bitcast shouldn't be created,
but it was.
This PR checks to see if the element type of the array is the same as
the return type, and if it is, it doesn't create a bitcast instruction.
Fixes https://github.com/llvm/llvm-project/issues/139013
partially fixes#142836
- Update DXILFlattenArrays.cpp GEPs to use two indicies since they are
array GEPs
- Update flatten test cases
- This change reduces dxv bitcast validation errors by 364 (Total now is
1070x)
- This change reduces dxv out of bounds validation errors by 124 (Total
is now 24)
- We are also able to successfully compile 4 more shaders
fixes#140415
The i8 legalization code in DXILLegalizePass's `fixI8UseChain` needs to
be updated to check for i8 geps.
It seems like there are i8 GEPs being left around after we remove all
the other i8 instructions and this is causing problem on validation.
Since this is cleaning up a missed GEP The approach is to assume the
getPointerOperand is to an alloca we further will check if this is an
array alloca then do some byte offset arithmetic to figure out the
memory index to use. Finally we will emit the new gep and cleanup the
old one.
Finally needed to update upcastI8AllocasAndUses to account for loads off
of GEPs instead of just loads from the alloca.
- adds parsing from metadata into dxcontainer binary
- adds validations as described in the spec
- adds testing scenarios
closes: [#126638](https://github.com/llvm/llvm-project/issues/126638)
---------
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
Adds reporting of overlapping binding errors to `DXILPostOptimizationValidation` pass. Only runs when `DXILResourceBindingAnalysis` detects that there is a resource binding that overlaps while it is building up a map of available register spaces.
Fixes#110723
Since the `DXILResourceImplicitBinding` pass has been moved after `DXILForwardHandleAccesses` pass the forward-handle pass needs to handle `llvm.dx.resource.handlefromimplicitbinding` calls as well.
typedBufferLoad of double/double2 is expanded to a typedBufferLoad of a
<2 x i32>/<4 x i32> and asdouble
typedBufferStore of a double/double2 is expanded to a splitdouble and a
typedBufferStore of a <2 x i32>/<4 x i32>
Add tests showing result of intrinsic expansion for typedBufferLoad and
typedBufferStore
Add tests showing dxil op lowering can handle typedBufferLoad and
typedBufferStore where the target type doesn't match the typedBufferLoad
and typedBufferStore type
Closes#104423
Update resource type names for globals variables that we generate in `DXILTranslateMetadata` pass to include element type. This change prevents duplicate types for identical resources and brings the DXIL metadata names it closer to what DXC generates.
Moving `DXILResourceImplicitBinding` pass and the associated `DXILResourceBindingAnalysis` lower in the llc pipeline to just before the DXIL Resource Analysis, which is where its results are first needed, and adjusting the set of analyses it preserves.
The reason for this change is that I will soon be adding `DXILResourceBindingAnalysis` dependency to `DXILPostOptimizationValidation` pass and bringing this closer to where it is needed avoid unnecessary churn to preserved analysis setting in preceding passes.
Adds resource name argument to `llvm.dx.handlefrombinding` and `llvm.dx.handlefromimplicitbinding` intrinsics.
SPIR-V currently does not seem to need the resource names so this change only affects DirectX binding intrinsics.
Part 2/4 of https://github.com/llvm/llvm-project/issues/105059
- DXILDataScalarization should not just be limited to global data
- Add a scalarization for alloca
- Add ReversePostOrderTraversal of functions and iterate over basic
blocks and run DataScalarizerVisitor.
- fixes#140143
Adds checks for resource globals that were created for DXIL metadata.
The names of the globals and the names of the types will be changing
soon. Adding these to the baseline will make it easier to see what is
changing.
Fixes#137188
This PR legalizes memcpy for DXIL in cases where:
- the src and dst arguments are from Alloca or a GlobalVariable,
- the src and dst are pointers to an ArrayType,
- the array element types of src and dst must be equivalent, and
- the len param is a ConstantInt
These assumptions simplify the legalization and, with the addition of
#138991, covers the currently-known cases of memcpy that appear when
compiling DML shaders.
This PR may be unnecessary if #138788 determines that memset and memcpy
can be eliminated entirely.
---------
Co-authored-by: Finn Plummer <canadienfinn@gmail.com>
Co-authored-by: Greg Roth <grroth@microsoft.com>
There were some issues with these ops:
- The overload wasn't being specified (`dx.op.dot4AddPacked` vs
`dx.op.dot4AddPacked.i32`)
- The versioning wasn't correct (These ops were added in SM 6.4)
- The argument order was off - while the HLSL function has the
accumulator as the last argument, the DXIL op lists it first.
This fixes the DXIL.td definition and adjusts the LLVM DX and SPIRV
intrinsics to match the argument order in DXIL rather than the argument
order in HLSL.
Fixes#139018
Fixes#139024 and #139954
- Refactor DXILShaderFlags to compute the flags that apply to a whole
module before computing flags that apply individually to each function
- Make DXILResourceMap const, since it is not modified in
DXILShaderFlags
- Per-function shader flag analysis now initially starts with the set of
flags that apply to the whole module instead of starting from no flags.
This change fixes the above linked issues
- Fix shader flag tests affected by the above change
Fixes#138997 and does refactoring for low-precision-related flags.
The shader feature flags MinimumPrecision and NativeLowPrecision were
not being set, leading to validation errors.
This PR sets these shader feature flags [as in
DXC](377c4ca6d8/lib/DXIL/DxilShaderFlags.cpp (L58-L63))
and adds tests for them.
This PR also performs some refactoring of low-precision-related flags to
make it less confusing.
- The `UseNativeLowPrecision` DXIL module flag has been renamed to
`NativeLowPrecisionMode` to imply that it is setting some execution
state which the module should be interpreted with
- The LLVM module flag `dx.nativelowprec` is now read only once and sets
a bool to be used by `updateFunctionFlags()` and for setting the DXIL
module flag `NativeLowPrecisionMode`
- The `MinimumPrecision`, `NativeLowPrecision`, and
`LowPrecisionPresent` shader feature flags are all set together under
`updateFunctionFlags()`
- Moved the logic for setting DXIL module flags `NativeLowPrecisionMode`
and `ResMayNotAlias` out of the per-function loop and placed it
alongside the logic for setting other DXIL module flags
(`DisableOptimizations`, `Max64UAVs`, and `UAVsAtEveryStage` flags)
---------
Co-authored-by: Justin Bogner <mail@justinbogner.com>
Move dxil resource access legacy pass before intrinsic expansion legacy
pass so TypedBuffer Loads and Stores will be created before intrinsic
expansion.
This is to facilitate #104423
The `DXILResourceImplicitBinding` pass uses the results of
`DXILResourceBindingAnalysis` to assigns register slots to resources
that do not have explicit binding. It replaces all
`llvm.dx.resource.handlefromimplicitbinding` calls with
`llvm.dx.resource.handlefrombinding` using the newly assigned binding.
If a binding cannot be found for a resource, the pass will raise a
diagnostic error. Currently this diagnostic message does not include the
resource name, which will be addressed in a separate task (#137868).
Part 2/2 of #136786Closes#136786
Fixes#137209
This PR:
- Adds a case to `expandIntrinsic()` in `DXILIntrinsicExpansion.cpp` to
expand the `Intrinsic::is_fpclass` in the case of
`FPClassTest::fcNegZero`
- Defines the `IsNaN`, `IsFinite`, `IsNormal` DXIL ops in `DXIL.td`
- Adds a case to `lowerIntrinsics()` in `DXILOpLowering.cpp` to handle
the lowering of `Intrinsic::is_fpclass` to the DXIL ops `IsNaN`,
`IsInf`, `IsFinite`, `IsNormal` when the FPClassTest is `fcNan`,
`fcInf`, `fcFinite`, and `fcNormal` respectively
- Creates a test `llvm/test/CodeGen/DirectX/is_fpclass.ll` to exercise
the intrinsic expansion and DXIL op lowering of `Intrinsic::is_fpclass`
~~A separate PR will be made to remove the now-redundant `dx_isinf`
intrinsic to address #87777.~~
A proper implementation for the lowering of the `llvm.is.fpclass`
intrinsic to handle all possible combinations of FPClassTest can be
implemented in a separate PR. This PR's implementation focuses primarily
on addressing the current use-cases for DirectML and HLSL intrinsics.
This moves the responsibility for cleaning up dead intrinsics from
DXILFinalizeLinkage to DXILOpLowering, and moves DXILFinalizeLinkage
back to it's pre-#136244 place in the pipeline. Doing this avoids issues
with DXIL passes running on obviously dead code, and makes it more clear
what DXILFinalizeLinkage is really doing.
This also helps with the story for #134260, as cleaning up dead
intrinsics doesn't make sense if this becomes a more generic pass.
Note that test/CodeGen/DirectX/remove-dead-intriniscs.ll already covers
most of the testing here. It'd be nice to have something that catches
the regression from changing the pass ordering but I couldn't come up
with anything that wouldn't be incredibly fragile.
Fixes#138180.
fixes#136243
This change converts memset into a series of geps and stores It is
intentionally limited to memsets of fixed size It also converts the byte
stores to type stores.
DXIL does not support i8 plus this reduces the total number of gep and
store instructions.
This change also moves DXILFinalizeLinkage to run after Legalization to
clean up any dead intrinsic definitions.
fixes#137202
investingating i8 allocas I came to find some missing instructions from
out i8 legalization around load, store, and select.
Added those three.
To do i8 allocas right though we needed to walk the uses and find the
casts.
After finding the casts I chose to pick the smallest cast as the cast to
transform to. That would then let me preserve the larger casts that come
later
Fixes#112272
In addition to the implementation of the UAVsAtEveryStage shader flag
analysis, several unrelated tests have had the `dx.valver` module
metadata defined to avoid setting the UAVsAtEveryStage shader flag in
them.
Example:
```
!dx.valver = !{!0}
!0 = !{i32 1, i32 8}
```
---------
Co-authored-by: Justin Bogner <mail@justinbogner.com>
This PR revises the descriptions of DXIL module flags.
Descriptions such as `D3D11_1_SB_GLOBAL_FLAG_SKIP_OPTIMIZATION` are
referring to Global Flags in DXBC.
DXBC is not a supported backend target, so references to DXBC should not
be present.
There is also confusion with regards to the description of the
`LowPrecisionPresent` DXIL module flag, which currently reads
`D3D11_1_SB_GLOBAL_FLAG_ENABLE_MINIMUM_PRECISION` and implies the use of
minimum-precision to handle 16-bit types.
However this is not true, because both the flags `LowPrecisionPresent`
and `UseNativeLowPrecision` can simultaneously be set in the same DXIL
module, and minimum precision mode is mutually exclusive with native low
precision.
This PR revises the description of the `LowPrecisionPresent` flag to
accurately describe what it represents.
This PR introduces a Metadata Node Kind allowlist. The purpose is to
prevent newer Metadata Node Kinds to be used and inserted into the
outputted DXIL module. Only the metadata kinds that are accepted in the
DXIL Validator are on the allowlist. The Github DXC validator doesn't
support these newer Metadata Node Kinds, so we need to filter them out.
We introduce this restrictive allowlist into LLVM and strip all metadata
that isn't found in the list.
The accompanying test would add the `llvm.loop.mustprogress` metadata
node kind, but thanks to the allowlist, filters it out, and so the
whitelist is proven to work.
The test also has two separate metadata kinds that are on the allowlist,
and remain after the DXIL Prepare pass.
This PR primarily fixes the version-checking logic of the shader flags
`ResMayNotAlias` and `Max64UAVs` to correctly match DXC's behavior.
Primary changes:
- The logic for determining the presence of UAVs for the
`ResMayNotAlias` shader flag checked against the DXIL Version when it
should have been checking against the DXIL Validator Version. (See DXC:
[DxilShaderFlags.cpp#L484](f19b5da541/lib/DXIL/DxilShaderFlags.cpp (L484)))
- The logic for counting UAVs for the `Max64UAVs` shader flag checked
against the DXIL Version when it should have been checking against the
DXIL Validator Version. (See DXC:
[DxilModule.cpp#L327](f19b5da541/lib/DXIL/DxilModule.cpp (L327)))
- Tests have been modified to test the corrected behaviors for these two
flags
Additional changes included for consistency:
- The logic for setting `UseNativeLowPrecision` now checks against
Shader Model version instead of DXIL version to be consistent with the
code comments from DXC
([DxilShaderFlags.h#L280](f19b5da541/include/dxc/DXIL/DxilShaderFlags.h (L280))).
- An additional test has been added to ensure that the module flag
"dx.nativelowprec" set to 0 does not apply the `UseNativeLowPrecision`
shader flag
- Related shader flag tests were renamed to be more consistent, and some
comments were edited for clarification
- Add obj2yaml tests for the `Max64UAVs` shader flag
We can end up with loads of single element vectors when we have scalar
values, because the vectorizer may introduce these to use ops like
shufflevector in some cases. Make sure we're maintaining the correct
type when translating these into resource load operations.
Fixes#136409.
This pass attempts to forward resource handle creation to accesses of
the handle global. This avoids dependence on optimizations like CSE and
GlobalOpt for correctness of DXIL.
Fixes#134574.
fixes#136620
It was determined that the lifetime intrinsics generated by clang are
likely more correct than the ones in DXC hence explaining the missing
lifetimes between the IR diffs.
As such we are legalizing lllvm lifetime intrinsics by letting them all
pass on through.
Fixes [#114553](https://github.com/llvm/llvm-project/issues/114553)
This implementation replicates the behavior of DXC in setting the
`m_b64UAVs` flag: the `Max64UAVs` DXIL module flag is set in the
presence of more than 8 UAVs in a DXIL module.
The behavior of how UAV (resource) arrays are counted differs based on
Shader Model version:
- If Shader Model < 6.6, then a UAV array counts as a single UAV
regardless of its range size
- if Shader Model >= 6.6, then a UAV array contributes its range size to
the total number of UAVs
I initially thought the complete implementation of this analysis may be
blocked by the resource arrays implementation, but it seems that it is
not the case, as the `@llvm.dx.resource.handle*` already includes a
range size argument.
fixes#135654
In #128613 we added safe guards to prevent the lowering of just any
intrinsic in the backend. We used `DiagnosticInfoUnsupported` to do
this.
What we found was when using `opt` the diagnostic print function was
called but when using clang the diagnostic message was used.
Printing message in the clang version means we miss valuable debugging
information like function name and function type when LLVMContext was
only needed to call `getBestLocationFromDebugLoc`.
There are a few potential fixes
1. Write a custom DiagnosticInfoUnsupported so we can change the Message
just for DirectX. Too heavy handed so rejected.
2. Add the function name to the Message in DirectX code. Very simple one
line change. Downside is when using opt you see the function name twice.
But makes the clang-dxc bugs more actionable.
3. change CodeGenAction.cpp to always use the print function and not the
message directly. Downside is a bunch of innacurate information shows up
in the message if you don't specify `-debug-info-kind=standalone`.
4. add some book keeping to know which function called the intrinsic
keep a map of these so we can pass the calling function to
`DiagnosticInfoUnsupported` instead of the intrinsic. This would only be
useful if we had debug info so we could distinguish different uses of
the intrinsic by line\col number. We would also need to change from
iterating on every function to doing something like a LazyCallGraph
which is a nonstarter.
5. pick a different means of doing a Diagnostic error, because other
uses of `DiagnosticInfoUnsupported` error when we are in the body of a
function not when we see one being used like in the intrinsic case.
This PR went with a combo of option 2 & 5. Its low code change that also
only impacts the DirectX backend.
fixes#135719
LLVM 3.7 did not have a freeze instruction
Further this instruction is really only used as syntactic sugar
in LLVM's optimizer passes to not aggressively optimize things that
could be undef or poison ie x*2 to x+x.
Most backends treat it as a no-op so we will do the same
by removing it and replacing its uses with its input.
Adding support for Root Constant in MC, Object and obj2yaml and
yaml2obj, this PR adds:
- new structures to dxbc definition.
- serialize and desirialize logic from dxcontainer to yaml
- tests validating against dxc
- adding support to multiple parts.
Closes: https://github.com/llvm/llvm-project/issues/126633
---------
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
This introduces a pass that walks accesses to globals in cbuffers and
replaces them with accesses via the cbuffer handle itself. The logic to
interpret the cbuffer metadata is kept in `lib/Frontend/HLSL` so that it
can be reused by other consumers of that metadata.
Fixes#124630.
The `dx.dot2`, `dot3`, and `dot4` intrinsics exist purely to lower
`dx.fdot`, and they map exactly to the DXIL ops of the same name. Using
vectors for their arguments adds unnecessary complexity and causes us to
have vector operations that are not trivial to lower post-scalarizer.
Similarly, the `dx.dot2add` intrinsic is overly generic for something
that only needs to lower to a single `dot2AddHalf` DXIL op. Update its
signature to match the operation it lowers to.
Fixes#134569.
fixes#135285
This change implements the `usub.sat` intrinsic to perform an unsigned
saturating subtraction on the 2 arguments.
The minimum value this operation is clamp to is 0.
Fixes#112270
Completed ACs:
- `-res-may-alias` clang-dxc command-line option added
- It inserts and sets a module metadata flag `dx.resmayalias` to 1
- Shader flag set appropriately:
- The flag IS NOT set if DXIL Version <= 1.6 OR the command-line option
`-res-may-alias` is specified
- Otherwise the flag IS set when:
- DXIL Version > 1.7 AND function uses UAVs, OR
- DXIL Version <= 1.7 AND UAVs present globally
- Add tests
- Tests for Shader Models 6.6, 6.7, and 6.8 corresponding to DXIL
Versions 1.6, 1.7, and 1.8
- Tests (`res-may-alias-0.ll`/`res-may-alias-1.ll`) for when the module
metadata flag `dx.resmayalias` is set to 0 or 1 respectively
- A frontend test (`res-may-alias.hlsl`) for testing that that the
command-line option `-res-may-alias` inserts `dx.resmayalias` module
metadata correctly
Fixes#112267
Implement the shader flag analysis to set the UseNativeLowPrecision DXIL
module flag.
The flag is only able to be set when the command-line flag
`-enable-16bit-types` is passed to clang-dxc, or equivalently
`-fnative-half-type` is passed to clang.
When the command-line flag is passed, a module metadata flag called
"dx.nativelowprec" is set to 1.
The DXILShaderFlags shader flags analysis checks that the module
metadata flag "dx.nativelowprec" is set to 1 and the DXIL Version is 1.2
or greater before setting the UseNativeLowPrecision DXIL module flag.
Resolves#99221
Key points: For SPIRV backend, it decompose into a `dot` followed a
`add`.
- [x] Implement dot2add clang builtin,
- [x] Link dot2add clang builtin with hlsl_intrinsics.h
- [x] Add sema checks for dot2add to CheckHLSLBuiltinFunctionCall in
SemaHLSL.cpp
- [x] Add codegen for dot2add to EmitHLSLBuiltinExpr in CGBuiltin.cpp
- [x] Add codegen tests to clang/test/CodeGenHLSL/builtins/dot2add.hlsl
- [x] Add sema tests to clang/test/SemaHLSL/BuiltIns/dot2add-errors.hlsl
- [x] Create the int_dx_dot2add intrinsic in IntrinsicsDirectX.td
- [x] Create the DXILOpMapping of int_dx_dot2add to 162 in DXIL.td
- [x] Create the dot2add.ll and dot2add_errors.ll tests in
llvm/test/CodeGen/DirectX/
Do cleanup in DXILFinalizeLinkage.cpp where intrinsic declares are getting orphaned.
This change reduces "Unsupported intrinsic for DXIL lowering" errors
when compiling DML shaders from 12218 to 415. and improves our
compilation success rate from less than 1% to 44%.