This updates everywhere we emit/check an SME routines to use
RuntimeLibcalls to get the function name and calling convention.
Note: RuntimeLibcallEmitter had some issues with emitting non-unique
variable names for sets of libcalls, so I tweaked the output to avoid
the need for variables.
We extract the binding logic out of the DXILResource analysis passes into the
FrontendHLSL library. This will allow us to use this logic for resource and
root signature bindings in both the DirectX backend and the HLSL frontend.
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/Target` library.
These annotations currently have no meaningful impact on the LLVM build;
however, they are a prerequisite to support an LLVM Windows DLL (shared
library) build.
## Background
This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
A sub-set of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed formatting with `git clang-format`.
The bulk of this change is manual additions of `LLVM_ABI` to
`LLVMInitializeX` functions defined in .cpp files under llvm/lib/Target.
Adding `LLVM_ABI` to the function implementation is required here
because they do not `#include "llvm/Support/TargetSelect.h"`, which
contains the declarations for this functions and was already updated
with `LLVM_ABI` in a previous patch. I considered patching these files
with `#include "llvm/Support/TargetSelect.h"` instead, but since
TargetSelect.h is a large file with a bunch of preprocessor x-macro
stuff in it I was concerned it would unnecessarily impact compile times.
In addition, a number of unit tests under llvm/unittests/Target required
additional dependencies to make them build correctly against the LLVM
DLL on Windows using MSVC.
## Validation
Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
This change uses resource name during DXIL resource binding analysis to detect when two (or more) resources have identical overlapping binding.
The DXIL resource analysis just detects that there is a problem with the binding and sets the `hasOverlappingBinding` flag. Full error reporting will happen later in DXILPostOptimizationValidation pass (llvm/llvm-project#110723).
This continues s-barannikov's work TableGen-erating SDNode descriptions.
This takes the initial patch from #119709 and moves documentation and the
rest of the AArch64ISD nodes to TableGen. Some issues were found by the
generated SDNode verification added in this patch. These issues have been
described and fixed in the following PRs:
- #140706
- #140711
- #140713
- #140715
---------
Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
Adds resource name argument to `llvm.dx.handlefrombinding` and `llvm.dx.handlefromimplicitbinding` intrinsics.
SPIR-V currently does not seem to need the resource names so this change only affects DirectX binding intrinsics.
Part 2/4 of https://github.com/llvm/llvm-project/issues/105059
SMECallAttrs is a new helper class that holds all the SMEAttrs for a
call. The interfaces to query actions needed for the call (e.g. change
streaming mode) have been moved to the SMECallAttrs class.
The main motivation for this change is to make the split between the
caller, callee, and callsite attributes more apparent.
Before this change, we would always merge callsite and callee
attributes. The main reason to do this was to handle indirect calls,
however, we also occasionally used callsite attributes on direct calls
in tests (mainly to avoid creating multiple function declarations). With
this patch, we now explicitly handle indirect calls and disallow
incompatible attributes on direct calls (so this patch is not entirely
an NFC).
Same as #137239, but with a change to avoid inferring SME attributes for
function definitions. This allows stubbing the SME ABI routines in C/C++
(and matches the old behaviour).
`DXILResourceBindingAnalysis` analyses explicit resource bindings in the
module and puts together lists of used virtual register spaces and
available virtual register slot ranges for each binding type. It also
stores additional information found during the analysis such as whether
the module uses implicit bindings or if any of the bindings overlap.
This information will be used in `DXILResourceImplicitBindings` pass
(coming soon) to assign register slots to resources with implicit
bindings, and in a post-optimization validation pass that will raise
diagnostic about overlapping bindings.
Part 1/2 of #136786
SMECallAttrs is a new helper class that holds all the SMEAttrs for a
call. The interfaces to query actions needed for the call (e.g. change
streaming mode) have been moved to the SMECallAttrs class.
The main motivation for this change is to make the split between the
caller, callee, and callsite attributes more apparent.
Before this change, we would always merge callsite and callee
attributes. The main reason to do this was to handle indirect calls,
however, we also occasionally used callsite attributes on direct calls
in tests (mainly to avoid creating multiple function declarations). With
this patch, we now explicitly handle indirect calls and disallow
incompatible attributes on direct calls (so this patch is not entirely
an NFC).
In #132722 spills of ZT0 were disabled around all SME ABI routines to
avoid a case where ZT0 is spilled before ZA is enabled (resulting in a
crash).
It turns out that the ABI does not promise that routines will preserve
ZT0 (however in practice they do), so generally disabling ZT0 spills for
ABI routines is not correct.
The case where a crash was possible was "aarch64_new_zt0" functions with
ZA disabled on entry and a ZT0 spill around __arm_tpidr2_save. In this
case, ZT0 will be undefined at the call to __arm_tpidr2_save, so this
patch avoids the ZT0 spill by marking the callsite with
"aarch64_zt0_undef". This attribute only applies to callsites and marks
that at the point the call is made ZT0 is not defined, so does not need
preserving.
Add support for using the existing SCRATCH_STORE_BLOCK and
SCRATCH_LOAD_BLOCK instructions for saving and restoring callee-saved
VGPRs. This is controlled by a new subtarget feature, block-vgpr-csr. It
does not include WWM registers - those will be saved and restored
individually, just like before. This patch does not change the ABI.
Use of this feature may lead to slightly increased stack usage, because
the memory is not compacted if certain registers don't have to be
transferred (this will happen in practice for calling conventions where
the callee and caller saved registers are interleaved in groups of 8).
However, if the registers at the end of the block of 32 don't have to be
transferred, we don't need to use a whole 128-byte stack slot - we can
trim some space off the end of the range.
In order to implement this feature, we need to rely less on the
target-independent code in the PrologEpilogInserter, so we override
several new methods in SIFrameLowering. We also add new pseudos,
SI_BLOCK_SPILL_V1024_SAVE/RESTORE.
One peculiarity is that both the SI_BLOCK_V1024_RESTORE pseudo and the
SCRATCH_LOAD_BLOCK instructions will have all the registers that are not
transferred added as implicit uses. This is done in order to inform
LiveRegUnits that those registers are not available before the restore
(since we're not really restoring them - so we can't afford to scavenge
them). Unfortunately, this trick doesn't work with the save, so before
the save all the registers in the block will be unavailable (see the
unit test).
This was reverted due to failures in the builds with expensive checks
on, now fixed by always updating LiveIntervals and SlotIndexes in
SILowerSGPRSpills.
Fixes https://github.com/llvm/llvm-project/issues/135667
Analyze and annotate `ResourceInfo` with the derived direction of calls
to updateCounter (if any).
This change only sets the value. Any diagnostics that should be raised
must be done somewhere else.
Add support for using the existing `SCRATCH_STORE_BLOCK` and
`SCRATCH_LOAD_BLOCK` instructions for saving and restoring callee-saved
VGPRs. This is controlled by a new subtarget feature, `block-vgpr-csr`.
It does not include WWM registers - those will be saved and restored
individually, just like before. This patch does not change the ABI.
Use of this feature may lead to slightly increased stack usage, because
the memory is not compacted if certain registers don't have to be
transferred (this will happen in practice for calling conventions where
the callee and caller saved registers are interleaved in groups of 8).
However, if the registers at the end of the block of 32 don't have to be
transferred, we don't need to use a whole 128-byte stack slot - we can
trim some space off the end of the range.
In order to implement this feature, we need to rely less on the
target-independent code in the PrologEpilogInserter, so we override
several new methods in `SIFrameLowering`. We also add new pseudos,
`SI_BLOCK_SPILL_V1024_SAVE/RESTORE`.
One peculiarity is that both the SI_BLOCK_V1024_RESTORE pseudo and the
SCRATCH_LOAD_BLOCK instructions will have all the registers that are not
transferred added as implicit uses. This is done in order to inform
LiveRegUnits that those registers are not available before the restore
(since we're not really restoring them - so we can't afford to scavenge
them). Unfortunately, this trick doesn't work with the save, so before
the save all the registers in the block will be unavailable (see the
unit test).
This change removes the uint64_t constructor on LocationSize
preventing implicit conversion, and fixes up the using APIs to adapt to
the change. Note that I'm adding a couple of explicit conversion points
on routines where passing in a fixed offset as an integer seems likely
to have well understood semantics.
We had an unfortunate case which arose if you tried to pass a TypeSize
value to a parameter of LocationSize type. We'd find the implicit
conversion path through TypeSize -> uint64_t -> LocationSize which works
just fine for fixed values, but looses information and fails assertions
if the TypeSize was scalable. This change breaks the first link in that
implicit conversion chain since that seemed to be the easier one.
- Use static instead of anonymous namespace for file local functions.
- Enclose file-local classes in anonymous namespace.
- Eliminate `llvm::` qualifier when file has `using namespace llvm`.
- Eliminate namespace surrounding entire code in
SPIRVConvergenceRegionAnalysis.cpp file.
- Eliminate call to `initializeSPIRVStructurizerPass` from the pass
constructor (https://github.com/llvm/llvm-project/issues/111767)
As with #132002, these do show up in a compilation of llvm-test-suite
(including SPEC 2017). We remove 30-40 static instances so this isn't
anything earth shattering.
rs2 is always added to the other shifted (and potentially extended)
operand unmodified, so rs1==zero is equivalent to a copy.
This adds coverage for additional instructions in isCopyInstrImpl, for
now picking just those where I can observe that there is a codegen
difference for SPEC.
This allows MachineCopyPropagation to successfully eliminate no-op moves in this form.
In dynamic VGPR mode, we can allocate up to 8 blocks of either 16 or 32
VGPRs (based on a chip-wide setting which we can model with a Subtarget
feature). Update some of the subtarget helpers to reflect this.
In particular:
- getVGPRAllocGranule is set to the block size
- getAddresableNumVGPR will limit itself to 8 * size of a block
We also try to be more careful about how many VGPR blocks we allocate.
Therefore, when deciding if we should revert scheduling after a given
stage, we check that we haven't increased the number of VGPR blocks that
need to be allocated.
---------
Co-authored-by: Jannik Silvanus <jannik.silvanus@amd.com>
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.
For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibilty purses, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.
The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
Split out from #77610 and features a test, as a buggy version of this
caused a regression when landing that patch (the previous version had a
typo picking the wrong register as the source).
This is also motivated by future changes to MachineCopyPropagation which will use this information to determine if we have been left with a nop mv.
This fixes#126784 for the DirectX backend.
This bug was marked critical for DX so it is going to go in first. At
least one register class needs to be added via `addRegisterClass` for
`RegClassForVT` to be valid.
Further for costing information used by loop unroll and other
optimizations to be valid we need to call `computeRegisterProperties`.
This change does both of these.
The test cases confirm that we can fetch costing information off of
`getRegisterInfo` and that `DirectXTargetLowering` maps `i32` typed
registers to `DXILClassRegClass`.
Adds `findByUse` which takes a `llvm::Value` from a use and resolves it
(as best as possible) back to the creation of that resource.
It may return multiple ResourceBindingInfo if the use comes from
branched control flow.
Fixes#125746
This is necessary to enable composing subregisters in peephole-opt.
For now use a brute force table to find the return value. The worst
case target is AMDGPU with a 399 x 399 entry table.
This PR fixes https://github.com/llvm/llvm-project/issues/124703:
* added a new API call `SPIRVTranslate` that is to replace entirely old
`SPIRVTranslateModule` after existing clients switch into the new
function;
* the new `SPIRVTranslate` doesn't require option parsing, replacing the
`Opts` argument with explicit `CodeGenOptLevel` and `Triple` arguments;
* the old `SPIRVTranslateModule` call is a wrapper for `SPIRVTranslate`,
it doesn't require option parsing either and doesn't hold any logic
inside except for converting string options into `CodeGenOptLevel` and
`Triple` arguments;
* usage of the extensions list is reworked to avoid writes to the global
cl::opt variable `lib/Target/SPIRV/SPIRVSubtarget.cpp::Extensions` --
instead a new class member in SPIRVSubtarget.cpp is implemented that
allows to replace supported extensions after SPIRVSubtarget.cpp is
created;
* both API calls don't require option parsing and don't write to global
cl::opt variables.
Other related/required changes:
* SPIRV::Capability::Shader is marked as an capability of lesser
priority for OpenCL environment (to remediate absence of the
"avoid-spirv-capabilities" command line option in API calls);
* unit tests are updated and extended to cover testing of a newer API
call;
* old API call is marked with TODO to remove it after existing clients
switch into the new function.
Adding SPIRV to LLVM_ALL_TARGETS
(https://github.com/llvm/llvm-project/pull/119653) revealed a series of
minor compilation problems and sanitizer complaints. This PR is to move
unit tests resources (a Module ptr) from the class-scope to a local
scope of the class member function to be sure that before the test env
is teared down the ptr is released.
Previously, registers and subregisters mapped to the same Dwarf
encoding. We don't really have any way to refer to subregisters directly
from Dwarf, the expression emitter should instead use DW_OPs to stencil
out the subregister from the whole register. This was also confusing
tools that need to map back to the llvm reg (e.g. dwarfdump), since
getLLVMRegNum() would arbitrarily return the _LO16 register.
This relands commit #115111.
Use traditional way to update post dominator tree, i.e. break critical
edge splitting into insert, insert, delete sequence.
When splitting critical edges, the post dominator tree may change its
root node, and `setNewRoot` only works in normal dominator tree...
See
6c7e5827ed/llvm/include/llvm/Support/GenericDomTree.h (L684-L687)
Add testing for the visitor and added a note explaining irreducible CFG
are not supported.
Related to #116692
---------
Signed-off-by: Nathan Gauër <brioche@google.com>
Block sorting was assuming reducible CFG. Meaning we always had a best
node to continue with. Irreducible CFG makes breaks this assumption, so
the algorithm looped indefinitely because no node was a valid candidate.
Fixes#116692
---------
Signed-off-by: Nathan Gauër <brioche@google.com>
The issue with slow compile-time was caused by an assert in
AArch64RegisterInfo.cpp. The assert invokes 'checkAllSuperRegsMarked'
after adding all the reserved registers. This call gets very expensive
after adding the _HI registers due to the way the function searches
in the 'Exception' list, which is expected to be a small list but isn't
(the patch added 190 _HI regs).
It was possible to rewrite the code in such a way that the _HI registers
are marked as reserved after the check. This makes the problem go away
entirely and restores compile-time to what it was before (tested for
`check-runtimes`, which previously showed a ~5x slowdown).
This reverts commits:
1434d2ab215e3ea9c5f34689d056edd3d4423a78
2704647fb7986673b89cef1def729e3b022e2607
Following discussions in #110443, and the following earlier discussions
in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html,
https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this
PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine`
interface classes. More specifically:
1. Makes `TargetMachine` the only class implemented under
`TargetMachine.h` in the `Target` library.
2. `TargetMachine` contains target-specific interface functions that
relate to IR/CodeGen/MC constructs, whereas before (at least on paper)
it was supposed to have only IR/MC constructs. Any Target that doesn't
want to use the independent code generator simply does not implement
them, and returns either `false` or `nullptr`.
3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming
aims to make the purpose of `LLVMTargetMachine` clearer. Its interface
was moved under the CodeGen library, to further emphasis its usage in
Targets that use CodeGen directly.
4. Makes `TargetMachine` the only interface used across LLVM and its
projects. With these changes, `CodeGenCommonTMImpl` is simply a set of
shared function implementations of `TargetMachine`, and CodeGen users
don't need to static cast to `LLVMTargetMachine` every time they need a
CodeGen-specific feature of the `TargetMachine`.
5. More importantly, does not change any requirements regarding library
linking.
cc @arsenm @aeubanks