We can simplify the code with *Map::try_emplace where we need
default-constructed values while avoding calling constructors when
keys are already present.
There were some issues with these ops:
- The overload wasn't being specified (`dx.op.dot4AddPacked` vs
`dx.op.dot4AddPacked.i32`)
- The versioning wasn't correct (These ops were added in SM 6.4)
- The argument order was off - while the HLSL function has the
accumulator as the last argument, the DXIL op lists it first.
This fixes the DXIL.td definition and adjusts the LLVM DX and SPIRV
intrinsics to match the argument order in DXIL rather than the argument
order in HLSL.
Fixes#139018
The "target-features" function attribute is not currently considered
when adding vscale_range to a function. When +sve/+sme are pushed onto
functions with "#pragma attribute push(+sve/+sme)", the function
potentially misses out on optimizations that rely on vscale_range being
present.
Constant buffers defined with the `cbuffer` keyword do not have a
constructor. Instead, the call to initialize the resource handle based
on its binding is generated in codegen. This change adds initialization
of `cbuffer` handles that have implicit binding.
Closes #139617
This refactors existing code into a 'SanitizerInfoFromCFICheckKind'
helper function. This will be useful in future work to annotate CFI
checks with debug info
(https://github.com/llvm/llvm-project/pull/139809).
- prereq: Modify `RootSignatureAttr` to hold a reference to the owned
declaration
- Define and implement `MetadataBuilder` in `HLSLRootSignature`
- Integrate and invoke the builder in `CGHLSLRuntime.cpp` to generate
the Root Signature for any associated entry functions
- Add tests to demonstrate functionality in `RootSignature.hlsl`
Resolves https://github.com/llvm/llvm-project/issues/126584
Note: this is essentially just
https://github.com/llvm/llvm-project/pull/125131 rebased onto the new
approach of constructing a root signature decl, instead of holding the
elements in `AdditionalMembers`.
This generalizes the debug info annotation code from https://github.com/llvm/llvm-project/pull/139149 and moves it into a helper function, SanitizerAnnotateDebugInfo().
Future work can use 'ApplyDebugLocation ApplyTrapDI(*this, SanitizerAnnotateDebugInfo(Ordinal));' to add annotations to additional checks.
This patch adds fp8 variants to existing intrinsics, whose operation
doesn't depend on arguments being a specific type.
It also changes mfloat8 type representation in memory from `i8` to
`<1xi8>`
Adds constructor for resources with implicit binding and applies it to
all resources without binding at the global scope.
Adds Clang builtin function
`__builtin_hlsl_resource_handlefromimplicitbinding` that gets translated
to `llvm.dx|spv.resource.handlefromimplicitbinding` intrinsic calls.
Specific bindings are assigned in DXILResourceImplicitBinding pass.
Design proposals:
https://github.com/llvm/wg-hlsl/blob/main/proposals/0024-implicit-resource-binding.mdhttps://github.com/llvm/wg-hlsl/blob/main/proposals/0025-resource-constructors.md
One change from the proposals is that the `orderId` parameter is added
onto the constructor. Originally it was supposed to be generated in
codegen when the `llvm.dx|spv.resource.handlefromimplicitbinding` call
is emitted, but that is not possible because the call is inside a
constructor, and the constructor body is generated once per resource
type and not resource instance. So the only way to inject instance-based
data like `orderId` into the
`llvm.dx|spv.resource.handlefromimplicitbinding` call is that it must
come in via the constructor argument.
Closes#136784
Thread-local globals live, by default, in the default globals address
space, which may not be 0, so we need to overload @llvm.thread.pointer
to support other address spaces, and use the default globals address
space in Clang.
For i1 vectors, we used an i8 fixed vector as the storage type.
If the known minimum number of elements of the scalable vector type is
less than 8, we were doing the cast through memory. This used a load or
store from a fixed vector alloca. If is less than 8, DataLayout
indicates that the load/store reads/writes vscale bytes even if vscale
is known and vscale*X is less than or equal to 8. This means the load or
store is outside the bounds of the fixed size alloca as far as
DataLayout is concerned leading to undefined behavior.
This patch avoids this by widening the i1 scalable vector type with zero
elements until it is divisible by 8. This allows it be bitcasted to/from
an i8 scalable vector. We then insert or extract the i8 fixed vector
into this type.
Hopefully this enables #130973 to be accepted.
The passed indices have to be constant integers anyway, which we verify
before creating the ShuffleVectorExpr. Use the value we create there and
save the indices using a ConstantExpr instead. This way, we don't have
to evaluate the args every time we call getShuffleMaskIdx().
I also fixed __builtin_wasm_ref_null_extern() to generate a diagnostic
when it gets an argument. It seems like `SemaRef.checkArgCount()` has a
bug that makes it unable to check for 0 args.
The 'counted_by' attribute is now available for pointers in structs.
It generates code for sanity checks as well as
__builtin_dynamic_object_size()
calculations. For example:
struct annotated_ptr {
int count;
char *buf __attribute__((counted_by(count)));
};
If the pointer's type is 'void *', use the 'sized_by' attribute, which
works similarly to 'counted_by', but can handle the 'void' base type:
struct annotated_ptr {
int count;
void *buf __attribute__((sized_by(count)));
};
If the 'count' field member occurs after the pointer, use the
'-fexperimental-late-parse-attributes' flag during compilation.
Note that 'counted_by' cannot be applied to a pointer to an incomplete
type, because the size isn't known.
struct foo;
struct annotated_ptr {
int count;
struct foo *buf __attribute__((counted_by(count))); /* invalid */
};
Signed-off-by: Bill Wendling <morbo@google.com>
Unused types are retained in the debug info when
-fno-eliminate-unused-debug-types is specified.
However, unused nested enums were not being emitted even with this
option.
This patch fixes the missing emission of unused nested enums with
-fno-eliminate-unused-debug-types
- Defines a new declaration node `HLSLRootSignature` in `DeclNodes.td`
that will consist of a `TrailingObjects` of the in-memory construction
of the root signature, namely an array of `hlsl::rootsig::RootElement`s
- Defines a new clang attr `RootSignature` which simply holds an
identifier to a corresponding root signature declaration as above
- Integrate the `HLSLRootSignatureParser` to construct the decl node in
`ParseMicrosoftAttributes` and then attach the parsed attr with an
identifier to the entry point function declaration.
- Defines the various required declaration methods
- Add testing that the declaration and reference attr are created
correctly, and some syntactical error tests.
It was previously proposed that we could have the root elements
reference be stored directly as an additional member of the attribute
and to not have a separate root signature decl. In contrast, by defining
them separately as this change proposes, we will allow a unique root
signature to have its own declaration in the AST tree. This allows us to
only construct a single root signature for all duplicate root signature
attributes. Having it located directly as a declaration might also prove
advantageous when we consider root signature libraries.
Resolves https://github.com/llvm/llvm-project/issues/119011
Adds support for MSVC's undocumented `/funcoverride` flag, which marks
functions as being replaceable by the Windows kernel loader. This is
used to allow functions to be upgraded depending on the capabilities of
the current processor (e.g., the kernel can be built with the naive
implementation of a function, but that function can be replaced at boot
with one that uses SIMD instructions if the processor supports them).
For each marked function we need to generate:
* An undefined symbol named `<name>_$fo$`.
* A defined symbol `<name>_$fo_default$` that points to the `.data`
section (anywhere in the data section, it is assumed to be zero sized).
* An `/ALTERNATENAME` linker directive that points from `<name>_$fo$` to
`<name>_$fo_default$`.
This is used by the MSVC linker to generate the appropriate metadata in
the Dynamic Value Relocation Table.
Marked function must never be inlined (otherwise those inline sites
can't be replaced).
Note that I've chosen to implement this in AsmPrinter as there was no
way to create a `GlobalVariable` for `<name>_$fo$` that would result in
a symbol being emitted (as nothing consumes it and it has no
initializer). I tried to have `llvm.used` and `llvm.compiler.used` point
to it, but this didn't help.
Within LLVM I referred to this feature as "loader replaceable" as
"function override" already has a different meaning to C++ developers...
I also took the opportunity to extract the feature symbol generation
code used by both AArch64 and X86 into a common function in AsmPrinter.
Allows the __ptrauth qualifier to be applied to pointer sized integer types,
updates Sema to ensure trivially copyable, etc correctly handle address
discriminated integers, and updates codegen to perform authentication
around arithmetic on the types.
Adds support for emitting Windows x64 Unwind V2 information, includes
support `/d2epilogunwind` in clang-cl.
Unwind v2 adds information about the epilogs in functions such that the
unwinder can unwind even in the middle of an epilog, without having to
disassembly the function to see what has or has not been cleaned up.
Unwind v2 requires that all epilogs are in "canonical" form:
* If there was a stack allocation (fixed or dynamic) in the prolog, then
the first instruction in the epilog must be a stack deallocation.
* Next, for each `PUSH` in the prolog there must be a corresponding
`POP` instruction in exact reverse order.
* Finally, the epilog must end with the terminator.
This change adds a pass to validate epilogs in modules that have Unwind
v2 enabled and, if they pass, emits new pseudo instructions to MC that
1) note that the function is using unwind v2 and 2) mark the start of
the epilog (this is either the first `POP` if there is one, otherwise
the terminator instruction). If a function does not meet these
requirements, it is downgraded to Unwind v1 (i.e., these new pseudo
instructions are not emitted).
Note that the unwind v2 table only marks the size of the epilog in the
"header" unwind code, but it's possible for epilogs to use different
terminator instructions thus they are not all the same size. As a work
around for this, MC will assume that all terminator instructions are
1-byte long - this still works correctly with the Windows unwinder as it
is only using the size to do a range check to see if a thread is in an
epilog or not, and since the instruction pointer will never be in the
middle of an instruction and the terminator is always at the end of an
epilog the range check will function correctly. This does mean, however,
that the "at end" optimization (where an epilog unwind code can be
elided if the last epilog is at the end of the function) can only be
used if the terminator is 1-byte long.
One other complication with the implementation is that the unwind table
for a function is emitted during streaming, however we can't calculate
the distance between an epilog and the end of the function at that time
as layout hasn't been completed yet (thus some instructions may be
relaxed). To work around this, epilog unwind codes are emitted via a
fixup. This also means that we can't pre-emptively downgrade a function
to Unwind v1 if one of these offsets is too large, so instead we raise
an error (but I've passed through the location information, so the user
will know which of their functions is problematic).
It isn't used and is redundant with the result pointer type argument.
A more reasonable API would only have LangAS parameters, or IR parameters,
not both. Not all values have a meaningful value for this. I'm also
not sure why we have this at all, it's not overridden by any targets and
further simplification is possible.
Do not assume it's the alloca address space, we have an explicit
address space to use for the argument already. Also use the original
value's type instead of assuming DefaultAS.
This essentially reverts 1bf1a156d673.
OpenCL's handling of address spaces has always been a mess, but it's
better than it used to be so this hack appears to be unnecessary now.
None of the code here should really depend on the language or language
address space. The ABI address space to use is already explicit in the
ABIArgInfo, so use that instead of guessing it has anything to do with
LangAS::Default or getASTAllocaAddressSpace.
The below usage of LangAS::Default and getASTAllocaAddressSpace are also
suspect, but appears to be a more involved and separate fix.
In DWARF info RISC-V Vector types are presented as DW_TAG_array_type
with tags DW_AT_type (what elements does this array consist of) and
DW_TAG_subrange_type. DW_TAG_subrange_type have DW_AT_upper_bound tag
which contain upper bound value for this array.
For now, it's generate same DWARF info about length of segmented types
and their corresponding non-tuple types.
For example, vint32m4x2_t and vint32m4_t have DW_TAG_array_type with
same DW_AT_type and DW_TAG_subrange_type, it means that this types have
same length, which is not correct
(vint32m4x2_t length is twice as big as vint32m4_t)
This fixes emitting undefined behavior where a 64-bit generic
pointer is written to a 32-bit slot allocated for a private pointer.
This can be seen in test/CodeGenOpenCL/amdgcn-automatic-variable.cl's
wrong_pointer_alloca.
@fmayer introduced '-mllvm -array-bounds-pseudofn'
(https://github.com/llvm/llvm-project/pull/128977/) to make it easier to
see why crashes occurred, and to estimate with a profiler the cycles
spent on these array-bounds checks. This functionality could be usefully
generalized to other checks in future work.
This patch adds the plumbing for -fsanitize-annotate-debug-info, and
connects it to the existing array-bounds-pseudo-fn functionality i.e.,
-fsanitize-annotate-debug-info=array-bounds can be used as a replacement
for '-mllvm -array-bounds-pseudofn', though we do not yet delete the
latter.
Note: we replaced '-mllvm -array-bounds-pseudofn' in
clang/test/CodeGen/bounds-checking-debuginfo.c, because adding test
cases would modify the line numbers in the test assertions, and
therefore obscure that the test output is the same between '-mllvm
-array-bounds-pseudofn' and -fsanitize-annotate-debug-info=array-bounds.
Normally -fsanitize-coverage=stack-depth inserts inline arithmetic to
update thread_local __sancov_lowest_stack. To support stack depth
tracking in the Linux kernel, which does not implement traditional
thread_local storage, provide the option to call a function instead.
This matches the existing "stackleak" implementation that is supported
in Linux via a GCC plugin. To make this coverage more performant, a
minimum estimated stack depth can be chosen to enable the callback mode,
skipping instrumentation of functions with smaller stacks.
With -fsanitize-coverage-stack-depth-callback-min set greater than 0,
the __sanitize_cov_stack_depth() callback will be injected when the
estimated stack depth is greater than or equal to the given minimum.
OpenCL Kernels body is emitted as stubs and the kernel is emitted as
call to respective stub.
(https://github.com/llvm/llvm-project/pull/115821).
The stub function should be alwaysinlined, since call to stub can cause
performance drop.
Co-authored-by: anikelal <anikelal@amd.com>
This adds
- The parsing of `trivially_relocatable_if_eligible`,
`replaceable_if_eligible` keywords
- `__builtin_trivially_relocate`, implemented in terms of memmove. In
the future this should
- Add the appropriate start/end lifetime markers that llvm does not have
(`start_lifetime_as`)
- Add support for ptrauth when that's upstreamed
- the `__builtin_is_cpp_trivially_relocatable` and
`__builtin_is_replaceable` traits
Fixes#127609
When an HLSL Init list is producing a Scalar, handle OpaqueValueExprs in
the Init List with 'emitInitListOpaqueValues'
Copied from 'AggExprEmitter::VisitCXXParenListOrInitListExpr'
Closes#136408
---------
Co-authored-by: Chris B <beanz@abolishcrlf.org>
Most callers want a constant index. Instead of making every caller
create a ConstantInt, we can do it in IRBuilder. This is similar to
createInsertElement/createExtractElement.
Currently BlockAddresses store both the Function and the BasicBlock they
reference, and the BlockAddress is part of the use list of both the
Function and BasicBlock.
This is quite awkward, because this is not really a use of the function
itself (and walks of function uses generally skip block addresses for
that reason). This also has weird implications on function RAUW (as that
will replace the function in block addresses in a way that generally
doesn't make sense), and causes other peculiar issues, like the ability
to have multiple block addresses for one block (with different
functions).
Instead, I believe it makes more sense to specify only the basic block
and let the function be implied by the BB parent. This does mean that we
may have block addresses without a function (if the BB is not inserted),
but this should only happen during IR construction.