fixes#142836
We added a function called `collectIndicesAndDimsFromGEP` which builds
the Indicies and Dims up for the recursive case and the base case.
really to solve #142836 we didn't need to add it to the recursive case.
The recursive cases exists for gep chains which are ussually two
indicies per gep ie ptr index and array index. adding
collectIndicesAndDimsFromGEP to the recursive cases means we can now do
some mixed mode indexing say we get a case where its not the ussual 2
indicies but instead 3 we can now treat those last two indicies as part
of the computation for the flat array index.
In the AArch64 version this helps reduce the number of blr instruction
(indirect jumps) in from 325 to 87, and reduces the size of the object
file by 4%. It seems to help make the code more efficient even if it
doesn't greatly affect compile time.
The AMDGPU variants are already marked as final.
This change relands https://github.com/llvm/llvm-project/pull/142853
It fixes the circular reference issue we were seeing in GEPs
ex `%.flat = getelementptr inbounds [16 x i32], ptr %.flat, i32 0, i32
15`
This PR addresses a specific edge case when deciding whether or not to
produce a bitcast instruction.
Specifically, when the given instruction is a global array, the element
type of the array wasn't correctly compared to the return type. In this
specific case, if the types are equal, a bitcast shouldn't be created,
but it was.
This PR checks to see if the element type of the array is the same as
the return type, and if it is, it doesn't create a bitcast instruction.
Fixes https://github.com/llvm/llvm-project/issues/139013
partially fixes#142836
- Update DXILFlattenArrays.cpp GEPs to use two indicies since they are
array GEPs
- Update flatten test cases
- This change reduces dxv bitcast validation errors by 364 (Total now is
1070x)
- This change reduces dxv out of bounds validation errors by 124 (Total
is now 24)
- We are also able to successfully compile 4 more shaders
fixes#140415
The i8 legalization code in DXILLegalizePass's `fixI8UseChain` needs to
be updated to check for i8 geps.
It seems like there are i8 GEPs being left around after we remove all
the other i8 instructions and this is causing problem on validation.
Since this is cleaning up a missed GEP The approach is to assume the
getPointerOperand is to an alloca we further will check if this is an
array alloca then do some byte offset arithmetic to figure out the
memory index to use. Finally we will emit the new gep and cleanup the
old one.
Finally needed to update upcastI8AllocasAndUses to account for loads off
of GEPs instead of just loads from the alloca.
- adds parsing from metadata into dxcontainer binary
- adds validations as described in the spec
- adds testing scenarios
closes: [#126638](https://github.com/llvm/llvm-project/issues/126638)
---------
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
Adds reporting of overlapping binding errors to `DXILPostOptimizationValidation` pass. Only runs when `DXILResourceBindingAnalysis` detects that there is a resource binding that overlaps while it is building up a map of available register spaces.
Fixes#110723
Since the `DXILResourceImplicitBinding` pass has been moved after `DXILForwardHandleAccesses` pass the forward-handle pass needs to handle `llvm.dx.resource.handlefromimplicitbinding` calls as well.
typedBufferLoad of double/double2 is expanded to a typedBufferLoad of a
<2 x i32>/<4 x i32> and asdouble
typedBufferStore of a double/double2 is expanded to a splitdouble and a
typedBufferStore of a <2 x i32>/<4 x i32>
Add tests showing result of intrinsic expansion for typedBufferLoad and
typedBufferStore
Add tests showing dxil op lowering can handle typedBufferLoad and
typedBufferStore where the target type doesn't match the typedBufferLoad
and typedBufferStore type
Closes#104423
Update resource type names for globals variables that we generate in `DXILTranslateMetadata` pass to include element type. This change prevents duplicate types for identical resources and brings the DXIL metadata names it closer to what DXC generates.
Moving `DXILResourceImplicitBinding` pass and the associated `DXILResourceBindingAnalysis` lower in the llc pipeline to just before the DXIL Resource Analysis, which is where its results are first needed, and adjusting the set of analyses it preserves.
The reason for this change is that I will soon be adding `DXILResourceBindingAnalysis` dependency to `DXILPostOptimizationValidation` pass and bringing this closer to where it is needed avoid unnecessary churn to preserved analysis setting in preceding passes.
This annotates the `Twine` passed to the constructors of the various
DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes
us to warn when we would try to print the twine after it had already
been destructed.
We also update `DiagnosticInfoUnsupported` to hold a `const Twine &`
like all of the other DiagnosticInfo classes, since this warning allows
us to clean up all of the places where it was being used incorrectly.
Adds resource name argument to `llvm.dx.handlefrombinding` and `llvm.dx.handlefromimplicitbinding` intrinsics.
SPIR-V currently does not seem to need the resource names so this change only affects DirectX binding intrinsics.
Part 2/4 of https://github.com/llvm/llvm-project/issues/105059
As requested in a previous PR, this change moves all structs related to
RTS0 to RTS0 namespace.
---------
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
- DXILDataScalarization should not just be limited to global data
- Add a scalarization for alloca
- Add ReversePostOrderTraversal of functions and iterate over basic
blocks and run DataScalarizerVisitor.
- fixes#140143
Remove the MCSubtargetInfo argument from applyFixup, introduced in
https://reviews.llvm.org/D45962 , as it's only required by ARM. Instead,
add const MCFragment & so that ARMAsmBackend can retrieve
MCSubtargetInfo via a static member function.
Additionally, remove the MCAssembler argument, which is also only
required by ARM.
Additionally, make applyReloc non-const. Its arguments now fully cover
addReloc's functionality.
Fixes#137188
This PR legalizes memcpy for DXIL in cases where:
- the src and dst arguments are from Alloca or a GlobalVariable,
- the src and dst are pointers to an ArrayType,
- the array element types of src and dst must be equivalent, and
- the len param is a ConstantInt
These assumptions simplify the legalization and, with the addition of
#138991, covers the currently-known cases of memcpy that appear when
compiling DML shaders.
This PR may be unnecessary if #138788 determines that memset and memcpy
can be eliminated entirely.
---------
Co-authored-by: Finn Plummer <canadienfinn@gmail.com>
Co-authored-by: Greg Roth <grroth@microsoft.com>
There were some issues with these ops:
- The overload wasn't being specified (`dx.op.dot4AddPacked` vs
`dx.op.dot4AddPacked.i32`)
- The versioning wasn't correct (These ops were added in SM 6.4)
- The argument order was off - while the HLSL function has the
accumulator as the last argument, the DXIL op lists it first.
This fixes the DXIL.td definition and adjusts the LLVM DX and SPIRV
intrinsics to match the argument order in DXIL rather than the argument
order in HLSL.
Fixes#139018
This PR refactors mcdxbc data structure for root signatures to support
out of order storage of in memory root signature data.
closes: #139585
---------
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
Fixes#139024 and #139954
- Refactor DXILShaderFlags to compute the flags that apply to a whole
module before computing flags that apply individually to each function
- Make DXILResourceMap const, since it is not modified in
DXILShaderFlags
- Per-function shader flag analysis now initially starts with the set of
flags that apply to the whole module instead of starting from no flags.
This change fixes the above linked issues
- Fix shader flag tests affected by the above change
Fixes#138997 and does refactoring for low-precision-related flags.
The shader feature flags MinimumPrecision and NativeLowPrecision were
not being set, leading to validation errors.
This PR sets these shader feature flags [as in
DXC](377c4ca6d8/lib/DXIL/DxilShaderFlags.cpp (L58-L63))
and adds tests for them.
This PR also performs some refactoring of low-precision-related flags to
make it less confusing.
- The `UseNativeLowPrecision` DXIL module flag has been renamed to
`NativeLowPrecisionMode` to imply that it is setting some execution
state which the module should be interpreted with
- The LLVM module flag `dx.nativelowprec` is now read only once and sets
a bool to be used by `updateFunctionFlags()` and for setting the DXIL
module flag `NativeLowPrecisionMode`
- The `MinimumPrecision`, `NativeLowPrecision`, and
`LowPrecisionPresent` shader feature flags are all set together under
`updateFunctionFlags()`
- Moved the logic for setting DXIL module flags `NativeLowPrecisionMode`
and `ResMayNotAlias` out of the per-function loop and placed it
alongside the logic for setting other DXIL module flags
(`DisableOptimizations`, `Max64UAVs`, and `UAVsAtEveryStage` flags)
---------
Co-authored-by: Justin Bogner <mail@justinbogner.com>
Move dxil resource access legacy pass before intrinsic expansion legacy
pass so TypedBuffer Loads and Stores will be created before intrinsic
expansion.
This is to facilitate #104423
The `DXILResourceImplicitBinding` pass uses the results of
`DXILResourceBindingAnalysis` to assigns register slots to resources
that do not have explicit binding. It replaces all
`llvm.dx.resource.handlefromimplicitbinding` calls with
`llvm.dx.resource.handlefrombinding` using the newly assigned binding.
If a binding cannot be found for a resource, the pass will raise a
diagnostic error. Currently this diagnostic message does not include the
resource name, which will be addressed in a separate task (#137868).
Part 2/2 of #136786Closes#136786
With this change, some callers get to use StringRef::starts_with.
I'm planning to teach getAsmString to return StringRef also, but
I'ld like to keep that separate from this patch.
Fixes#137209
This PR:
- Adds a case to `expandIntrinsic()` in `DXILIntrinsicExpansion.cpp` to
expand the `Intrinsic::is_fpclass` in the case of
`FPClassTest::fcNegZero`
- Defines the `IsNaN`, `IsFinite`, `IsNormal` DXIL ops in `DXIL.td`
- Adds a case to `lowerIntrinsics()` in `DXILOpLowering.cpp` to handle
the lowering of `Intrinsic::is_fpclass` to the DXIL ops `IsNaN`,
`IsInf`, `IsFinite`, `IsNormal` when the FPClassTest is `fcNan`,
`fcInf`, `fcFinite`, and `fcNormal` respectively
- Creates a test `llvm/test/CodeGen/DirectX/is_fpclass.ll` to exercise
the intrinsic expansion and DXIL op lowering of `Intrinsic::is_fpclass`
~~A separate PR will be made to remove the now-redundant `dx_isinf`
intrinsic to address #87777.~~
A proper implementation for the lowering of the `llvm.is.fpclass`
intrinsic to handle all possible combinations of FPClassTest can be
implemented in a separate PR. This PR's implementation focuses primarily
on addressing the current use-cases for DirectML and HLSL intrinsics.
This implements the result of the discussion at:
https://discourse.llvm.org/t/rfc-report-fatal-error-and-the-default-value-of-gencrashdialog/73587
There are two different use cases for report_fatal_error, so replace it
with two functions reportFatalInternalError() and
reportFatalUsageError(). The former indicates a bug in LLVM and
generates a crash dialog. The latter does not. The names have been
suggested by rnk and people seemed to like them.
This replaces a lot of the usages that passed an explicit value for
GenCrashDiag. I did not bulk replace remaining report_fatal_error usage
-- they probably require case by case review for which function to use.
This moves the responsibility for cleaning up dead intrinsics from
DXILFinalizeLinkage to DXILOpLowering, and moves DXILFinalizeLinkage
back to it's pre-#136244 place in the pipeline. Doing this avoids issues
with DXIL passes running on obviously dead code, and makes it more clear
what DXILFinalizeLinkage is really doing.
This also helps with the story for #134260, as cleaning up dead
intrinsics doesn't make sense if this becomes a more generic pass.
Note that test/CodeGen/DirectX/remove-dead-intriniscs.ll already covers
most of the testing here. It'd be nice to have something that catches
the regression from changing the pass ordering but I couldn't come up
with anything that wouldn't be incredibly fragile.
Fixes#138180.
fixes#136243
This change converts memset into a series of geps and stores It is
intentionally limited to memsets of fixed size It also converts the byte
stores to type stores.
DXIL does not support i8 plus this reduces the total number of gep and
store instructions.
This change also moves DXILFinalizeLinkage to run after Legalization to
clean up any dead intrinsic definitions.
fixes#137202
investingating i8 allocas I came to find some missing instructions from
out i8 legalization around load, store, and select.
Added those three.
To do i8 allocas right though we needed to walk the uses and find the
casts.
After finding the casts I chose to pick the smallest cast as the cast to
transform to. That would then let me preserve the larger casts that come
later
Fixes#112272
In addition to the implementation of the UAVsAtEveryStage shader flag
analysis, several unrelated tests have had the `dx.valver` module
metadata defined to avoid setting the UAVsAtEveryStage shader flag in
them.
Example:
```
!dx.valver = !{!0}
!0 = !{i32 1, i32 8}
```
---------
Co-authored-by: Justin Bogner <mail@justinbogner.com>
This PR introduces a Metadata Node Kind allowlist. The purpose is to
prevent newer Metadata Node Kinds to be used and inserted into the
outputted DXIL module. Only the metadata kinds that are accepted in the
DXIL Validator are on the allowlist. The Github DXC validator doesn't
support these newer Metadata Node Kinds, so we need to filter them out.
We introduce this restrictive allowlist into LLVM and strip all metadata
that isn't found in the list.
The accompanying test would add the `llvm.loop.mustprogress` metadata
node kind, but thanks to the allowlist, filters it out, and so the
whitelist is proven to work.
The test also has two separate metadata kinds that are on the allowlist,
and remain after the DXIL Prepare pass.
Replace "concept based polymorphism" with simpler PImpl idiom.
This pursues two goals:
* Enforce static type checking. Previously, target implementations hid
base class methods and type checking was impossible. Now that they
override the methods, the compiler will complain on mismatched
signatures.
* Make the code easier to navigate. Previously, if you asked your
favorite LSP server to show a method (e.g. `getInstructionCost()`), it
would show you methods from `TTI`, `TTI::Concept`, `TTI::Model`,
`TTIImplBase`, and target overrides. Now it is two less :)
There are three commits to hopefully simplify the review.
The first commit removes `TTI::Model`. This is done by deriving
`TargetTransformInfoImplBase` from `TTI::Concept`. This is possible
because they implement the same set of interfaces with identical
signatures.
The first commit makes `TargetTransformImplBase` polymorphic, which
means all derived classes should `override` its methods. This is done in
second commit to make the first one smaller. It appeared infeasible to
extract this into a separate PR because the first commit landed
separately would result in tons of `-Woverloaded-virtual` warnings (and
break `-Werror` builds).
The third commit eliminates `TTI::Concept` by merging it with the only
derived class `TargetTransformImplBase`. This commit could be extracted
into a separate PR, but it touches the same lines in
`TargetTransformInfoImpl.h` (removes `override` added by the second
commit and adds `virtual`), so I thought it may make sense to land these
two commits together.
Pull Request: https://github.com/llvm/llvm-project/pull/136674
This PR primarily fixes the version-checking logic of the shader flags
`ResMayNotAlias` and `Max64UAVs` to correctly match DXC's behavior.
Primary changes:
- The logic for determining the presence of UAVs for the
`ResMayNotAlias` shader flag checked against the DXIL Version when it
should have been checking against the DXIL Validator Version. (See DXC:
[DxilShaderFlags.cpp#L484](f19b5da541/lib/DXIL/DxilShaderFlags.cpp (L484)))
- The logic for counting UAVs for the `Max64UAVs` shader flag checked
against the DXIL Version when it should have been checking against the
DXIL Validator Version. (See DXC:
[DxilModule.cpp#L327](f19b5da541/lib/DXIL/DxilModule.cpp (L327)))
- Tests have been modified to test the corrected behaviors for these two
flags
Additional changes included for consistency:
- The logic for setting `UseNativeLowPrecision` now checks against
Shader Model version instead of DXIL version to be consistent with the
code comments from DXC
([DxilShaderFlags.h#L280](f19b5da541/include/dxc/DXIL/DxilShaderFlags.h (L280))).
- An additional test has been added to ensure that the module flag
"dx.nativelowprec" set to 0 does not apply the `UseNativeLowPrecision`
shader flag
- Related shader flag tests were renamed to be more consistent, and some
comments were edited for clarification
- Add obj2yaml tests for the `Max64UAVs` shader flag
We can end up with loads of single element vectors when we have scalar
values, because the vectorizer may introduce these to use ops like
shufflevector in some cases. Make sure we're maintaining the correct
type when translating these into resource load operations.
Fixes#136409.
This pass attempts to forward resource handle creation to accesses of
the handle global. This avoids dependence on optimizations like CSE and
GlobalOpt for correctness of DXIL.
Fixes#134574.
fixes#136620
It was determined that the lifetime intrinsics generated by clang are
likely more correct than the ones in DXC hence explaining the missing
lifetimes between the IR diffs.
As such we are legalizing lllvm lifetime intrinsics by letting them all
pass on through.
Fixes [#114553](https://github.com/llvm/llvm-project/issues/114553)
This implementation replicates the behavior of DXC in setting the
`m_b64UAVs` flag: the `Max64UAVs` DXIL module flag is set in the
presence of more than 8 UAVs in a DXIL module.
The behavior of how UAV (resource) arrays are counted differs based on
Shader Model version:
- If Shader Model < 6.6, then a UAV array counts as a single UAV
regardless of its range size
- if Shader Model >= 6.6, then a UAV array contributes its range size to
the total number of UAVs
I initially thought the complete implementation of this analysis may be
blocked by the resource arrays implementation, but it seems that it is
not the case, as the `@llvm.dx.resource.handle*` already includes a
range size argument.
Making `TargetTransformInfo::Model::Impl` `const` makes sure all
interface methods are `const`, in `BasicTTIImpl`, its bases, and in all
derived classes.
Pull Request: https://github.com/llvm/llvm-project/pull/136598