This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
- `Builtins.td` - Add f16 support for libm atan2 builtin
- `CGBuiltin.cpp` - Emit constraint atan2 intrinsic for clang builtin
- `clang/test/CodeGenCXX/builtin-calling-conv.cpp` - Use erff instead of
atan2 for clang builtin to lib call calling convention check, now that
atan2 maps to an intrinsic.
- add atan2 cases to llvm.experimental.constrained tests for more
backends: ARM, PowerPC, RISCV, SystemZ.
- LangRef.rst: add llvm.experimental.constrained.atan2, revise
llvm.atan2 description.
Last part of Implement the atan2 HLSL Function. Fixes#70096.
Combined constructs (OpenACC 3.3 section 2.11) are a short-cut for
writing a `loop` construct immediately inside of a `compute` construct.
However, this interaction requires we do additional work to ensure that
we get the semantics between the two correct, as well as diagnostics.
This patch adds the semantic analysis for the constructs (but no
clauses), as well as the AST nodes.
```
- add codegen for llvm builtin to spirv/directx intrinsic in CGBuiltin.cpp
- add lowering of spirv intrinsic to spirv backend in SPIRVInstructionSelector.cpp
- add lowering of directx intrinsic to dxil op in DXIL.td
- add test cases to illustrate passes
- add test case for semantic analysis
```
Resolves#80176
Fixes#88052
- Added the following intrinsics:
- `int_spv_uclamp`
- `int_spv_sclamp`
- `int_spv_fclamp`
- Updated DirectX counterparts to have the same three clamp intrinsics.
- Update the clamp.hlsl unit tests to include SPIRV
- Added the SPIRV specific tests
The __builtin_counted_by_ref builtin is used on a flexible array
pointer and returns a pointer to the "counted_by" attribute's COUNT
argument, which is a field in the same non-anonymous struct as the
flexible array member. This is useful for automatically setting the
count field without needing the programmer's intervention. Otherwise
it's possible to get this anti-pattern:
ptr = alloc(<ty>, ..., COUNT);
ptr->FAM[9] = 42; /* <<< Sanitizer will complain */
ptr->count = COUNT;
To prevent this anti-pattern, the user can create an allocator that
automatically performs the assignment:
#define alloc(TY, FAM, COUNT) ({ \
TY __p = alloc(get_size(TY, COUNT)); \
if (__builtin_counted_by_ref(__p->FAM)) \
*__builtin_counted_by_ref(__p->FAM) = COUNT; \
__p; \
})
The builtin's behavior is heavily dependent upon the "counted_by"
attribute existing. It's main utility is during allocation to avoid
the above anti-pattern. If the flexible array member doesn't have that
attribute, the builtin becomes a no-op. Therefore, if the flexible
array member has a "count" field not referenced by "counted_by", it
must be set explicitly after the allocation as this builtin will
return a "nullptr" and the assignment will most likely be elided.
---------
Co-authored-by: Bill Wendling <isanbard@gmail.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
```- create a clang built-in in Builtins.td
- link dot4add_u8packed in hlsl_intrinsics.h
- add lowering to spirv backend through expansion of operation as OpUDot is missing up to SPIRV 1.6 in SPIRVInstructionSelector.cpp
- add lowering to spirv backend using OpUDot if applicable SPIRV version or SPV_KHR_integer_dot_product is enabled
- add dot4add_u8packed intrinsic to IntrinsicsDirectX.td and mapping to DXIL.td op Dot4AddU8Packed
- add tests for HLSL intrinsic lowering to dx/spv intrinsic in dot4add_u8packed.hlsl
- add tests for sema checks in dot4add_u8packed-errors.hlsl
- add test of spir-v lowering in SPIRV/dot4add_u8packed.ll
- add test to dxil lowering in DirectX/dot4add_u8packed.ll
```
Resolves#99219
Implements elementwise firstbithigh hlsl builtin.
Implements firstbituhigh intrinsic for spirv and directx, which handles
unsigned integers
Implements firstbitshigh intrinsic for spirv and directx, which handles
signed integers.
Fixes#113486Closes#99115
LLVM support for the attribute has been implemented already, so it just
plumbs it through to the CUDA front-end.
One notable difference from NVCC is that the attribute can be used
regardless of the targeted GPU. On the older GPUs it will just be
ignored. The attribute is a performance hint, and does not warrant a
hard error if compiler can't benefit from it on a particular GPU
variant.
- create a clang built-in in Builtins.td
- link dot4add_i8packed in hlsl_intrinsics.h
- add lowering to spirv backend through expansion of operation as OPSDot
is missing up to SPIRV 1.6 in SPIRVInstructionSelector.cpp
- add lowering to spirv backend using OpSDot in applicable SPIRV version
or if SPV_KHR_integer_dot_product is enabled
- add dot4add_i8packed intrinsic to IntrinsicsDirectX.td and mapping to
DXIL.td op Dot4AddI8Packed
- add tests for HLSL intrinsic lowering to dx/spv intrinsic in
dot4add_i8packed.hlsl
- add tests for sema checks in dot4add_i8packed-errors.hlsl
- add test of spir-v lowering in SPIRV/dot4add_i8packed.ll
- add test to dxil lowering in DirectX/dot4add_i8packed.ll
Resolves#99220
This commit partially implements SPIRTargetCodeGenInfo::getHLSLType. It
can now generate the spirv type for the following HLSL types:
1. RWBuffer
2. Buffer
3. Sampler
---------
Co-authored-by: Nathan Gauër <github@keenuts.net>
[Related
RFC](https://discourse.llvm.org/t/rfc-support-globpattern-add-operator-to-invert-matches/80683/5?u=justinstitt)
### Summary
Implement type-based filtering via [Sanitizer Special Case
Lists](https://clang.llvm.org/docs/SanitizerSpecialCaseList.html) for
the arithmetic overflow and truncation sanitizers.
Currently, using the `type:` prefix with these sanitizers does nothing.
I've hooked up the SSCL parsing with Clang codegen so that we don't emit
the overflow/truncation checks if the arithmetic contains an ignored
type.
### Usefulness
You can craft ignorelists that ignore specific types that are expected
to overflow or wrap-around. For example, to ignore `my_type` from
`unsigned-integer-overflow` instrumentation:
```bash
$ cat ignorelist.txt
[unsigned-integer-overflow]
type:my_type=no_sanitize
$ cat foo.c
typedef unsigned long my_type;
void foo() {
my_type a = ULONG_MAX;
++a;
}
$ clang foo.c -fsanitize=unsigned-integer-overflow -fsanitize-ignorelist=ignorelist.txt ; ./a.out
// --> no sanitizer error
```
If a type is functionally intended to overflow, like
[refcount_t](https://kernsec.org/wiki/index.php/Kernel_Protections/refcount_t)
and its associated APIs in the Linux kernel, then this type filtering
would prove useful for reducing sanitizer noise. Currently, the Linux
kernel dealt with this by
[littering](https://elixir.bootlin.com/linux/v6.10.8/source/include/linux/refcount.h#L139
) `__attribute__((no_sanitize("signed-integer-overflow")))` annotations
on all the `refcount_t` APIs. I think this serves as an example of how a
codebase could be made cleaner. We could make custom types that are
filtered out in an ignorelist, allowing for types to be more expressive
-- without the need for annotations. This accomplishes a similar goal to
https://github.com/llvm/llvm-project/pull/86618.
Yet another use case for this type filtering is whitelisting. We could
ignore _all_ types, save a few.
```bash
$ cat ignorelist.txt
[implicit-signed-integer-truncation]
type:*=no_sanitize # ignore literally all types
type:short=sanitize # except `short`
$ cat bar.c
// compile with -fsanitize=implicit-signed-integer-truncation
void bar(int toobig) {
char a = toobig; // not instrumented
short b = toobig; // instrumented
}
```
### Other ways to accomplish the goal of sanitizer
allowlisting/whitelisting
* ignore list SSCL type support (this PR that you're reading)
* [my sanitize-allowlist
branch](https://github.com/llvm/llvm-project/compare/main...JustinStitt:llvm-project:sanitize-allowlist)
- this just implements a sibling flag `-fsanitize-allowlist=`, removing
some of the double negative logic present with `skip`/`ignore` when
trying to whitelist something.
* [Glob
Negation](https://discourse.llvm.org/t/rfc-support-globpattern-add-operator-to-invert-matches/80683)
- Implement a negation operator to the GlobPattern class so the
ignorelist query can use them to simulate allowlisting
Please let me know which of the three options we like best. They are not
necessarily mutually exclusive.
Here's [another related
PR](https://github.com/llvm/llvm-project/pull/86618) which implements a
`wraps` attribute. This can accomplish a similar goal to this PR but
requires in-source changes to codebases and also covers a wider variety
of integer definedness problems.
### CCs
@kees @vitalybuka @bwendling
---------
Signed-off-by: Justin Stitt <justinstitt@google.com>
The early simplication pipeline is used in non-LTO and (Thin/Full)LTO
pre-link
stage. There are some passes that we want them in non-LTO mode, but not
at LTO
pre-link stage. The control is missing currently. This PR adds the
support. To
demonstrate the use, we only enable the internalization pass in non-LTO
mode for
AMDGPU because having it run in pre-link stage causes some issues.
The nested switch exists to share setting IntrinsicsTypes to {ResultType}.
clz/ctz return before we reach that so they can just be in the top
level switch.
Currently, the `DropTypeTests` parameter only fully works with phi nodes
and llvm.assume instructions. However, we'd like CFI to work in
conjunction with FatLTO, in so far as the bitcode section should be able
to contain the CFI instrumentation, while any incompatible bits are
dropped when compiling the object code.
To do that, we need to drop the llvm.type.test instructions everywhere,
and not just their uses in phi nodes. This patch updates the
LowerTypeTest pass so that uses are removed, and replaced with `true` in
all cases, and not just in phi nodes.
Addressing this will allow us to fix#112053 by modifying the FatLTO
pipeline.
Reviewers: pcc, nikic
Reviewed By: pcc
Pull Request: https://github.com/llvm/llvm-project/pull/112787
UAVs and SRVs have already been converted to use LLVM target types and
we can disable generating of the !hlsl.uavs and !hlsl.srvs! annotations.
This will enable adding tests for structured buffers with user defined
types that this old resource annotations code does not handle (it
crashes).
Part 1 of #114126
Reproducer:
```
//--- a.cppm
export module a;
int func();
static int a = func();
//--- a.cpp
import a;
```
The `func()` should only execute once. However, before this patch we
will somehow import `static int a` from a.cppm incorrectly and
initialize that again.
This is super bad and can introduce serious runtime behaviors.
And also surprisingly, it looks like the root cause of the problem is
simply some oversight choosing APIs.
Enables the support of `-fcf-protection=return` on RISC-V, which
requires Zicfiss. It also adds a string attribute "hw-shadow-stack"
to every function if the option is set on RISC-V
Pure Scalable Types are defined in AAPCS64 here:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#pure-scalable-types-psts
And should be passed according to Rule C.7 here:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#682parameter-passing-rules
This part of the ABI is completely unimplemented in Clang, instead it
treats PSTs sometimes as HFAs/HVAs, sometime as general composite types.
This patch implements the rules for passing PSTs by employing the
`CoerceAndExpand` method and extending it to:
* allow array types in the `coerceToType`; Now only `[N x i8]` are
considered padding.
* allow mismatch between the elements of the `coerceToType` and the
elements of the `unpaddedCoerceToType`; AArch64 uses this to map
fixed-length vector types to SVE vector types.
Corectly passing a PST argument needs a decision in Clang about whether
to pass it in memory or registers or, equivalently, whether to use the
`Indirect` or `Expand/CoerceAndExpand` method. It was considered
relatively harder (or not practically possible) to make that decision in
the AArch64 backend.
Hence this patch implements the register counting from AAPCS64 (cf.
`NSRN`, `NPRN`) to guide the Clang's decision.
Remove these intrinsics which can be better represented by load
instructions with `!invariant.load` metadata:
- llvm.nvvm.ldg.global.i
- llvm.nvvm.ldg.global.f
- llvm.nvvm.ldg.global.p
# What
This PR renames the newly-introduced llvm attribute
`sanitize_realtime_unsafe` to `sanitize_realtime_blocking`. Likewise,
sibling variables such as `SanitizeRealtimeUnsafe` are renamed to
`SanitizeRealtimeBlocking` respectively. There are no other functional
changes.
# Why?
- There are a number of problems that can cause a function to be
real-time "unsafe",
- we wish to communicate what problems rtsan detects and *why* they're
unsafe, and
- a generic "unsafe" attribute is, in our opinion, too broad a net -
which may lead to future implementations that need extra contextual
information passed through them in order to communicate meaningful
reasons to users.
- We want to avoid this situation and make the runtime library boundary
API/ABI as simple as possible, and
- we believe that restricting the scope of attributes to names like
`sanitize_realtime_blocking` is an effective means of doing so.
We also feel that the symmetry between `[[clang::blocking]]` and
`sanitize_realtime_blocking` is easier to follow as a developer.
# Concerns
- I'm aware that the LLVM attribute `sanitize_realtime_unsafe` has been
part of the tree for a few weeks now (introduced here:
https://github.com/llvm/llvm-project/pull/106754). Given that it hasn't
been released in version 20 yet, am I correct in considering this to not
be a breaking change?
ARM ACLE PR#323[1] adds new modal types for 8-bit floating point
intrinsic.
From the PR#323:
```
ACLE defines the `__mfp8` type, which can be used for the E5M2 and E4M3
8-bit floating-point formats. It is a storage and interchange only type
with no arithmetic operations other than intrinsic calls.
````
The type should be an opaque type and its format in undefined in Clang.
Only defined in the backend by a status/format register, for AArch64 the
FPMR.
This patch is an attempt to the add the mfloat8_t scalar type. It has a
parser and codegen for the new scalar type.
The patch it is lowering to and 8bit unsigned as it has no format. But
maybe we should add another opaque type.
[1] https://github.com/ARM-software/acle/pull/323
This patch implements an approach to communicate errors between the
OMPIRBuilder and its users. It introduces `llvm::Error` and
`llvm::Expected` objects to replace the values returned by callbacks
passed to `OMPIRBuilder` codegen functions. These functions then check
the result for errors when callbacks are called and forward them back to
the caller, which has the flexibility to recover, exit cleanly or dump a
stack trace.
This prevents a failed callback to leave the IR in an invalid state and
still continue the codegen process, triggering unrelated assertions or
segmentation faults. In the case of MLIR to LLVM IR translation of the
'omp' dialect, this change results in the compiler emitting errors and
exiting early instead of triggering a crash for not-yet-implemented
errors. The behavior in Clang and openmp-opt stays unchanged, since
callbacks will continue always returning 'success'.
If a call using the musttail attribute returns it's value through an
sret argument pointer, we must forward an incoming sret pointer to it,
instead of creating a new alloca. This is always possible because the
musttail attribute requires the caller and callee to have the same
argument and return types.
Extends `nowait` support for other device directives. This PR refactors
the task generation utils used for the `target` directive so that they
are general enough to be reused for other device directives as well.
Fixes: #113187
Avoid to create init function since clang does not support global
variable with flexible array init.
It will cause assertion failure later.
When compiling HIP source for AMDGCN flavoured SPIR-V that is expected
to be consumed by the ROCm HIP RT, it's not desirable to set the OpenCL
Kernel CC on `__global__` functions. On one hand, this is not an OpenCL
RT, so it doesn't compose with e.g. OCL specific attributes. On the
other it is a "noisy" CC that carries semantics, and breaks overload
resolution when using [generic dispatchers such as those used by
RAJA](186d4194a5/src/common/HipDataUtils.hpp (L39)).
Currently, for AMDGPU, when compiling for OpenCL, we unconditionally use
`private` as the default address space. This is wrong for cases where
the `generic` address space is available, and is corrected via this
patch. In general, this AS map abuse is a bad hack and we should re-work
it altogether, but at least after this patch we will stop being
incorrect for e.g. OpenCL 2.0.
Fixed: #113044
the type of `ArrayTypeTraitExpr` can be changed, use i32 directly is
incorrect.
---------
Co-authored-by: Eli Friedman <efriedma@quicinc.com>