This will allow vectorizing these calls (after a few more patches). This
should not change the codegen for targets that enable the use of AA
during the codegen (in `TargetSubtargetInfo::useAA()`). This includes
targets such as AArch64. This notably does not include x86, but that can
be worked around by passing `-mllvm -combiner-global-alias-analysis=true`
to clang.
Follow-up to #114086.
Attempting to pass a `ptr addrspace(7)` to functions that take `ptr`
arguments produces undesirable `addrspacecast(addrspacecast(p8 x to p7)
to p0) => addrspacecast(p8 x to p0)` folds. This results in illegal GEP
operations on buffer resources, which can't be GEP'd. (However, note
that, while unimplemented, an addrspacecast from `ptr addrspace(7)` to
`ptr` is legal - it's just an effective address computation.)
To resolve this problem, and thus prevent illegal
`getelementptr T, ptr addrspace(8) %x, ...` instructions from being
produced, this commit extends amdgcn.make.buffer.rsrc to also be variadic
in its result type, auto-upgrading old manglings.
The logic for handling a make.buffer.rsrc in instruction selection
remains untouched and expects the output type to be a ptr addrspace(8),
as does the Clang lowering for its builtin (the pointer-to-pointer
version might want a different name in clang). LowerBufferFatPointers
has been updated to lower
amdgcn.make.buffer.rsrc.p7.p* to amdgcn.make.buffer.rsrc.p8.p* .
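For illustration, a sketch of the two result manglings (the argument list
here is assumed from the existing shape of the intrinsic):
```llvm
%rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p8.p1(ptr addrspace(1) %buf, i16 0, i32 %extent, i32 %flags)
%fat  = call ptr addrspace(7) @llvm.amdgcn.make.buffer.rsrc.p7.p1(ptr addrspace(1) %buf, i16 0, i32 %extent, i32 %flags)
```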
This'll also make exposing buffer fat pointers in Clang easier, since
you don't have to cast between a `__amdgcn_rsrc_t` and a pointer.
This patch adds a new __builtin_assume_dereferenceable to encode
dereferenceability of a pointer using llvm.assume with an operand
bundle.
For now the builtin only accepts constant sizes, I am planning to drop
this restriction in a follow-up change.
This can be used to better optimize cases where a pointer is known to be
dereferenceable, e.g. unconditionally loading from `p2` when vectorizing
the loop below:
```c
int *get_ptr();

void foo(int *src, int x) {
  int *p2 = get_ptr();
  __builtin_assume_aligned(p2, 4);
  __builtin_assume_dereferenceable(p2, 4000);
  for (unsigned I = 0; I != 1000; ++I) {
    int x = src[I];
    if (x == 0)
      x = p2[I];
    src[I] = x;
  }
}
```
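For reference, a sketch of the IR the two assumptions above correspond
to, using the operand-bundle forms documented for `llvm.assume` (shown
combined on one call for brevity):
```llvm
call void @llvm.assume(i1 true) [ "align"(ptr %p2, i64 4), "dereferenceable"(ptr %p2, i64 4000) ]
```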
PR: https://github.com/llvm/llvm-project/pull/121789
This is straightforward as we already had all the necessary
instructions, they simply were not wired up.
Also allows implementing the vec_round intrinsic via the standard
`llvm.roundeven` IR intrinsic instead of a platform-specific one.
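For illustration, a sketch of the generic form (the vector type here is
assumed for illustration):
```llvm
; vec_round(x) can now select through the generic intrinsic, e.g.
%res = call <2 x double> @llvm.roundeven.v2f64(<2 x double> %x)
```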
This both reapplies #118734, the initial attempt at this, and updates it
significantly.
First, it uses the newly added `StringTable` abstraction for string
tables, and simplifies the construction to build the string table and
info arrays separately. This should reduce any `constexpr` compile time
memory or CPU cost of the original PR while significantly improving the
APIs throughout.
It also restructures the builtins to support sharding across several
independent tables. This accomplishes three improvements over the
original PR:
1) It improves the APIs used significantly.
2) When builtins are defined from different sources (like SVE vs MVE in
AArch64), this allows each of them to build their own string table
independently rather than having to merge the string tables and info
structures.
3) It allows each shard to factor out a common prefix, often cutting the
size of the strings needed for the builtins by a factor of two.
The second point is important both because it allows different mechanisms
of construction (for example, a `.def` file and a tablegen'ed `.inc`
file, or different tablegen'ed `.inc` files) and because it simply
reduces the sizes of these tables, which is valuable given how large they
are in some cases. The third builds on that size reduction.
Initially, we use this new sharding rather than merging tables in
AArch64, LoongArch, RISCV, and X86. Mostly this helps ensure the system
works, as without further changes these still push scaling limits.
Subsequent commits will more deeply leverage the new structure,
including using the prefix capabilities which cannot be easily factored
out here and requires deep changes to the targets.
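For illustration, a generic sketch of the offset-plus-shared-prefix
layout (illustrative only, not the actual `StringTable` API; the
`__builtin_sve_` prefix is a made-up example):
```c++
#include <cstdio>

// Each shard owns one concatenation of NUL-terminated suffixes plus a
// factored-out common prefix; each info record stores a small offset
// instead of a relocated pointer.
struct Shard {
  const char *Prefix;   // shared prefix, stored once per shard
  const char *Strings;  // "abs\0add\0..." (suffixes only)
};
struct BuiltinInfo {
  unsigned NameOffset;  // offset into Shard::Strings
};

static const Shard SVE = {"__builtin_sve_", "abs\0add\0"};
static const BuiltinInfo Infos[] = {{0}, {4}};  // "abs" at 0, "add" at 4

int main() {
  for (const auto &I : Infos)
    std::printf("%s%s\n", SVE.Prefix, SVE.Strings + I.NameOffset);
}
```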
This broke the build, see buildbot comments on the PR.
This reverts commit ee92122b53c7af26bb766e89e1d30ceb2fd5bb93 and
follow-up 5dccfd9283cd784758aa3d16fcb6e31f135c080f.
This is similar in spirit to previous changes that made `_mm_mfence` and
related functions builtins, to avoid conflicts with winnt.h and other
MSVC ecosystem headers that pre-declare compiler intrinsics as extern "C"
symbols.
Also update the feature flag for `_mm_prefetch` to `sse`, which is more
accurate than `mmx`.
This should fix issue #87515.
Refactoring of how __builtin_dynamic_object_size() is calculated for
flexible array members (in preparation for adding support for the
'counted_by' attribute on pointers in structs).
The only functionality change is that we use the already emitted Expr
code to build our calculations off of rather than re-emitting the Expr.
That allows the 'StructFieldAccess' visitor to sift through all casts
and ArraySubscriptExprs to find the first MemberExpr. We build our GEPs
and calculate offsets based off of relative distances from that
MemberExpr.
The testcase passes execution tests.
Calculate the flexible array member's object size using these formulae
(note: if the calculation is negative, we return 0):

```c
struct p;
struct s {
  /* ... */
  int count;
  struct p *array[] __attribute__((counted_by(count)));
};
```

1) `ptr->array`:

```c
count = ptr->count;
flexible_array_member_base_size = sizeof (*ptr->array);
flexible_array_member_size = count * flexible_array_member_base_size;

if (flexible_array_member_size < 0)
  return 0;
return flexible_array_member_size;
```

2) `&ptr->array[idx]`:

```c
count = ptr->count;
index = idx;
flexible_array_member_base_size = sizeof (*ptr->array);
flexible_array_member_size = count * flexible_array_member_base_size;
index_size = index * flexible_array_member_base_size;

if (flexible_array_member_size < 0 || index < 0)
  return 0;
return flexible_array_member_size - index_size;
```

3) `&ptr->field`:

```c
count = ptr->count;
sizeof_struct = sizeof (struct s);
flexible_array_member_base_size = sizeof (*ptr->array);
flexible_array_member_size = count * flexible_array_member_base_size;
field_offset = offsetof (struct s, field);
offset_diff = sizeof_struct - field_offset;

if (flexible_array_member_size < 0)
  return 0;
return offset_diff + flexible_array_member_size;
```

4) `&ptr->field_array[idx]`:

```c
count = ptr->count;
index = idx;
sizeof_struct = sizeof (struct s);
flexible_array_member_base_size = sizeof (*ptr->array);
flexible_array_member_size = count * flexible_array_member_base_size;
field_base_size = sizeof (*ptr->field_array);
field_offset = offsetof (struct s, field_array);
field_offset += index * field_base_size;
offset_diff = sizeof_struct - field_offset;

if (flexible_array_member_size < 0 || index < 0)
  return 0;
return offset_diff + flexible_array_member_size;
```
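For instance, a hedged usage sketch against `struct s` above, exercising
formula (1) (`array_bytes` is a made-up helper):
```c
// returns ptr->count * sizeof(*ptr->array), or 0 if that product is negative
unsigned long array_bytes(struct s *ptr) {
  return __builtin_dynamic_object_size(ptr->array, 1);
}
```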
---------
Signed-off-by: Bill Wendling <morbo@google.com>
GCC supports three flags related to overflow behavior:
* `-fwrapv`: Makes signed integer overflow well-defined.
* `-fwrapv-pointer`: Makes pointer overflow well-defined.
* `-fno-strict-overflow`: Implies `-fwrapv -fwrapv-pointer`, making both
signed integer overflow and pointer overflow well-defined.
Clang currently only supports `-fno-strict-overflow` and `-fwrapv`, but
not `-fwrapv-pointer`.
This PR proposes to introduce `-fwrapv-pointer` and adjust the semantics
of `-fwrapv` to match GCC.
This allows signed integer overflow and pointer overflow to be
controlled independently, while `-fno-strict-overflow` still exists to
control both at the same time (and that option is consistent across GCC
and Clang).
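As an illustration of the difference (a sketch, not from the patch
itself):
```c
// Without -fwrapv-pointer, pointer overflow is undefined, so 'p + a > p'
// may be folded to 'a != 0'; with -fwrapv-pointer the possible wraparound
// must be honored and that fold is blocked.
int may_wrap(char *p, unsigned long a) { return p + a > p; }
```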
- add clang builtin to Builtins.td
- link builtin in hlsl_intrinsics
- add codegen for spirv intrinsic and two directx intrinsics to retain
signedness information of the operands in CGBuiltin.cpp
- add semantic analysis in SemaHLSL.cpp
- add lowering of spirv intrinsic to spirv backend in
SPIRVInstructionSelector.cpp
- add lowering of directx intrinsics to WaveActiveOp dxil op in
DXIL.td
- add test cases to illustrate passes
Resolves #99170
Reimplement Neon FP8 vector types using attribute `neon_vector_type`
instead of having them as builtin types.
This allows implementing FP8 Neon intrinsics without the need to add
special cases for these types when using `__builtin_shufflevector`
or bitcast (using the C-style cast operator) between vectors, both
extensively used in the generated code in `arm_neon.h`.
Reverts llvm/llvm-project#123853
The introduction of `reflect-error.ll` surfaced a bug with the use of
`report_fatal_error` in `SPIRVInstructionSelector` that was propagated
into the pr. This has caused a build-bot breakage, and the work to solve
the underlying issue is tracked here:
https://github.com/llvm/llvm-project/issues/124045. We can re-apply this
commit when the underlying issue is resolved.
This PR relands
[#122992](https://github.com/llvm/llvm-project/pull/122992).
Some machines were failing to run the `reflect-error.ll` test due to the
RUN lines
```llvm
; RUN: not %if spirv-tools %{ llc -O0 -mtriple=spirv64-unknown-unknown %s -o /dev/null 2>&1 -filetype=obj %}
; RUN: not %if spirv-tools %{ llc -O0 -mtriple=spirv32-unknown-unknown %s -o /dev/null 2>&1 -filetype=obj %}
```
which failed when `spirv-tools` was not present on the machine due to
running the command `not` without any arguments.
These RUN lines have been removed since they don't actually test
anything new compared to the other two RUN lines due to the expected
error during instruction selection.
```llvm
; RUN: not llc -verify-machineinstrs -O0 -mtriple=spirv64-unknown-unknown %s -o /dev/null 2>&1 | FileCheck %s
; RUN: not llc -verify-machineinstrs -O0 -mtriple=spirv32-unknown-unknown %s -o /dev/null 2>&1 | FileCheck %s
```
Re-write the sema and codegen for the atomic_test_and_set and
atomic_clear builtin functions to go via AtomicExpr, like the other
atomic builtins do. This simplifies the code, because AtomicExpr already
handles things like generating code to dynamically select the memory
ordering, which was duplicated for these builtins. This also fixes a few
crash bugs, one when passing an integer to the pointer argument, and one
when using an array.
This also adds diagnostics for the memory orderings which are not valid
for atomic_clear according to
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html, which
were missing before.
Fixes https://github.com/llvm/llvm-project/issues/111293.
This is a re-land of #120449, modified to allow any non-const pointer
type for the first argument.
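A usage sketch of the two builtins (illustrative only):
```c
_Bool test_then_clear(volatile char *flag) {
  // any non-const pointer type is now accepted for the first argument
  _Bool was_set = __atomic_test_and_set(flag, __ATOMIC_ACQUIRE);
  // passing __ATOMIC_ACQUIRE here would now be diagnosed as invalid
  __atomic_clear(flag, __ATOMIC_RELEASE);
  return was_set;
}
```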
Fixes #99152
Tasks completed:
- Implement `reflect` in `clang/lib/Headers/hlsl/hlsl_intrinsics.h`
- Implement the `reflect` SPIR-V target built-in in
`clang/include/clang/Basic/BuiltinsSPIRV.td`
- Add a SPIR-V fast path in `clang/lib/Headers/hlsl/hlsl_detail.h` in
the form
```c++
#if (__has_builtin(__builtin_spirv_reflect))
return __builtin_spirv_reflect(...);
#else
return ...; // regular behavior
#endif
```
- Add codegen for the SPIR-V `reflect` built-in to
`EmitSPIRVBuiltinExpr` in `clang/lib/CodeGen/CGBuiltin.cpp`
- Add HLSL codegen tests to
`clang/test/CodeGenHLSL/builtins/reflect.hlsl`
- Add SPIR-V built-in codegen tests to
`clang/test/CodeGenSPIRV/Builtins/reflect.c`
- Add sema tests to `clang/test/SemaHLSL/BuiltIns/reflect-errors.hlsl`
- Add SPIR-V sema tests to
`clang/test/CodeGenSPIRV/Builtins/reflect-errors.c`
- Create the `int_spv_reflect` intrinsic in
`llvm/include/llvm/IR/IntrinsicsSPIRV.td`
- In `llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp` create the
`reflect` lowering and map it to `int_spv_reflect` in
`SPIRVInstructionSelector::selectIntrinsic`
- Create a SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/hlsl-intrinsics/reflect.ll`
Additional tasks completed:
- Implement sema check for the `reflect` SPIR-V built-in in
`clang/lib/Sema/SemaSPIRV.cpp`
- Required for HLSL codegen to work via the SPIR-V fast path, because
the types defined in `clang/include/clang/Basic/BuiltinsSPIRV.td` are
being overridden
- Create SPIR-V backend error test case in
`llvm/test/CodeGen/SPIRV/opencl/reflect-error.ll`
- Since `reflect` is only available in the GLSL extended instruction
set, using it in OpenCL should result in an error
Incomplete tasks:
- Create SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/opencl/reflect.ll`
- An OpenCL test is not applicable in this case because the [OpenCL
SPIR-V extended instruction
set](https://registry.khronos.org/SPIR-V/specs/unified1/OpenCL.ExtendedInstructionSet.100.html)
does not include a `reflect` function
This started out as trying to combine bf16 fpround to BFCVT2
instructions, but ended up removing the aarch64.neon.bfcvt intrinsics in
favour of generating fpround instructions directly. This simplifies the
patterns and can lead to other optimizations. The BFCVT2 instruction is
adjusted to make sure the types are valid, and a bfcvt2 is now
generated in more places. The old intrinsics are auto-upgraded to fptrunc
instructions too.
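For illustration, a sketch of the upgrade for the scalar form (intrinsic
shape assumed):
```llvm
;   %r = call bfloat @llvm.aarch64.neon.bfcvt(float %x)
; is rewritten by the auto-upgrader to a plain truncation:
%r = fptrunc float %x to bfloat
```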
This patch adds support for the next-generation arch15
CPU architecture to the SystemZ backend.
This includes:
- Basic support for the new processor and its features.
- Detection of arch15 as host processor.
- Assembler/disassembler support for new instructions.
- Exploitation of new instructions for code generation.
- New vector (signed|unsigned|bool) __int128 data types.
- New LLVM intrinsics for certain new instructions.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining __VEC__ == 10305.
Note: No currently available Z system supports the arch15
architecture. Once new systems become available, the
official system name will be added as a supported -march name.
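As a brief sketch of the new types (assuming -march=arch15 and
-mzvector):
```c
vector signed __int128 vs;    /* one 128-bit signed element   */
vector unsigned __int128 vu;  /* one 128-bit unsigned element */
vector bool __int128 vb;      /* one 128-bit boolean element  */
```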
- add clang builtin to Builtins.td
- link builtin in hlsl_intrinsics
- add codegen for spirv intrinsic and two directx intrinsics to retain
signedness information of the operands in CGBuiltin.cpp
- add semantic analysis in SemaHLSL.cpp
- add lowering of spirv intrinsic to spirv backend in
SPIRVInstructionSelector.cpp
- add lowering of directx intrinsics to WaveActiveOp dxil op in
DXIL.td
- add test cases to illustrate passes
Resolves #70106
---------
Co-authored-by: Finn Plummer <canadienfinn@gmail.com>
Closes https://github.com/llvm/llvm-project/issues/99116
Implements `firstbitlow` by extracting common functionality from
`firstbithigh` into a shared function, while also fixing a bug for an
edge case where `u64x3` and larger vectors would attempt to create
vectors larger than the SPIR-V maximum of 4 elements.
---------
Co-authored-by: Steven Perron <stevenperron@google.com>
The `Checked` parameter of `CodeGenFunction::EmitCheck` is of type
`ArrayRef<std::pair<llvm::Value *, SanitizerMask>>`, which is overly
generalized: SanitizerMask can denote that zero or more sanitizers are
enabled, but `EmitCheck` requires that exactly one sanitizer is
specified in the SanitizerMask (e.g.,
`SanitizeTrap.has(Checked[i].second)` enforces that).
This patch replaces SanitizerMask with SanitizerOrdinal in the `Checked`
parameter of `EmitCheck` and code that transitively relies on it. This
should not affect the behavior of UBSan, but it has the advantages that:
- the code is clearer: it avoids ambiguity in EmitCheck about what to do
if multiple bits are set
- specifying the wrong number of sanitizers in `Checked[i].second` will
be detected as a compile-time error, rather than a runtime assertion
failure
Suggested by Vitaly in https://github.com/llvm/llvm-project/pull/122392
as an alternative to adding an explicit runtime assertion that the
SanitizerMask contains exactly one sanitizer.
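As a generic illustration of the mask-versus-ordinal distinction
(made-up types, not clang's actual definitions):
```c++
enum class Ordinal { Null, Alignment, ObjectSize };  // exactly one sanitizer
using Mask = unsigned;                               // zero or more sanitizers
constexpr Mask maskFor(Ordinal O) { return 1u << static_cast<unsigned>(O); }
// An API taking Ordinal cannot be handed two sanitizers at once; an API
// taking Mask can, which is the ambiguity this patch removes from EmitCheck.
```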
## Changes
- Delete DirectX length intrinsic
- Delete HLSL length lang builtin
- Implement length algorithm entirely in the header.
## History
- In the past, if an HLSL intrinsic lowered to either a SPIR-V opcode or
a DXIL opcode, we represented it with intrinsics
## Why are we moving away?
- To make HLSL APIs more portable, the team decided that it makes sense
for some intrinsics to be defined only in the header.
- Since there tend to be more SPIR-V opcodes than DXIL opcodes, the plan
is to support SPIR-V opcodes either with target-specific builtins or via
pattern matching.
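In essence, the header-only definition reduces to something like this
sketch (not the verbatim hlsl_detail.h body; `float3`, `dot`, and `sqrt`
are the usual HLSL builtins):
```hlsl
float length3(float3 v) { return sqrt(dot(v, v)); }
```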
Replacing the extant streaming mode function call with an intrinsic
allows us to make further optimisations around it. For example, if it's
called within a function that has a known streaming mode, we can remove
the dead code, and avoid the redundant conditional branch.
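A sketch of the kind of folding this enables, using the ACLE names
(assuming arm_sme.h):
```c
#include <arm_sme.h>

void kernel(void) __arm_streaming {
  // within a streaming function the intrinsic is known to be true,
  // so the else-branch is provably dead and can be removed
  if (__arm_in_streaming_mode()) {
    /* streaming path: kept */
  } else {
    /* non-streaming path: folded away */
  }
}
```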
- Update pr labeler so new SPIRV files get properly labeled.
- Add distance target builtin to BuiltinsSPIRV.td.
- Update TargetBuiltins.h to account for spirv builtins.
- Update clang basic CMakeLists.txt to build spirv builtin tablegen.
- Hook up sema for SPIRV in Sema.h|cpp, SemaSPIRV.h|cpp, and
SemaChecking.cpp.
- Hook up spirv target builtins to the SPIR.h|SPIR.cpp target.
- Update CGBuiltin.cpp to emit spirv intrinsics when we get the expected
spirv target builtin.
Consensus was reached in this RFC to add both target builtins and pattern
matching:
https://discourse.llvm.org/t/rfc-add-targetbuiltins-for-spirv-to-support-hlsl/83329.
Pattern matching will come in a separate PR; this one just sets up the
groundwork to do target builtins for spirv.
Partially resolves
[#99107](https://github.com/llvm/llvm-project/issues/99107)
This registers `sincos[f|l]` as a clang builtin and updates CGBuiltin to
emit the `llvm.sincos.*` intrinsic when `-fno-math-errno` is set. Note:
`llvm.sincos.*` is only emitted by `__builtin_sincos[f|l]` functions in
this initial patch.
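A usage sketch (with -fno-math-errno this lowers to one `llvm.sincos.f64`
call):
```c
void polar(double theta, double *s, double *c) {
  __builtin_sincos(theta, s, c);
}
```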
Re-write the sema and codegen for the atomic_test_and_set and
atomic_clear builtin functions to go via AtomicExpr, like the other
atomic builtins do. This simplifies the code, because AtomicExpr already
handles things like generating code to dynamically select the memory
ordering, which was duplicated for these builtins. This also fixes a few
crash bugs, one when passing an integer to the pointer argument, and one
when using an array.
This also adds diagnostics for the memory orderings which are not valid
for atomic_clear according to
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html, which
were missing before.
Fixes #111293.
Resolves https://github.com/llvm/llvm-project/issues/99161
- [x] Implement `WaveActiveAllTrue` clang builtin,
- [x] Link `WaveActiveAllTrue` clang builtin with `hlsl_intrinsics.h`
- [x] Add sema checks for `WaveActiveAllTrue` to
`CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- [x] Add codegen for `WaveActiveAllTrue` to `EmitHLSLBuiltinExpr` in
`CGBuiltin.cpp`
- [x] Add codegen tests to
`clang/test/CodeGenHLSL/builtins/WaveActiveAllTrue.hlsl`
- [x] Add sema tests to
`clang/test/SemaHLSL/BuiltIns/WaveActiveAllTrue-errors.hlsl`
- [x] Create the `int_dx_WaveActiveAllTrue` intrinsic in
`IntrinsicsDirectX.td`
- [x] Create the `DXILOpMapping` of `int_dx_WaveActiveAllTrue` to `114`
in `DXIL.td`
- [x] Create the `WaveActiveAllTrue.ll` and
`WaveActiveAllTrue_errors.ll` tests in `llvm/test/CodeGen/DirectX/`
- [x] Create the `int_spv_WaveActiveAllTrue` intrinsic in
`IntrinsicsSPIRV.td`
- [x] In SPIRVInstructionSelector.cpp create the `WaveActiveAllTrue`
lowering and map it to `int_spv_WaveActiveAllTrue` in
`SPIRVInstructionSelector::selectIntrinsic`.
- [x] Create SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/hlsl-intrinsics/WaveActiveAllTrue.ll`
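A usage sketch in HLSL (`value` is an assumed per-lane input):
```hlsl
// true only if the predicate holds on every active lane of the wave
bool allPositive = WaveActiveAllTrue(value > 0);
```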
Patch [1] updated the intrinsic interface for ld1/st1. Per Arm's
documentation, "If the intrinsic also has a vnum argument, the ZA slice
number is calculated by adding vnum to slice." However, the vnum argument
was not honored in our implementation; this patch fixes that.
[1] ee31ba0dd9
Adds support for the following MSVC intrinsics:
* `_InterlockedCompareExchangePointer_acq`
* `_InterlockedCompareExchangePointer_rel`
* `_InterlockedExchangePointer_acq`
* `_InterlockedExchangePointer_nf`
* `_InterlockedExchangePointer_rel`
These are documented at:
<https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170#interlocked-intrinsics>
NOTE: `_InterlockedCompareExchangePointer_nf` is not being added since
it already exists, although it was incorrectly added for all
architectures instead of being Arm & AArch64 specific.
This change also unifies how the pointer and non-pointer interlocked
compare-exchange intrinsics are being handled.
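A usage sketch (MSVC prototypes, Arm/AArch64 targets):
```c
#include <intrin.h>

void *publish(void *volatile *slot, void *newval) {
  // exchange with acquire ordering; returns the previous value of *slot
  return _InterlockedExchangePointer_acq(slot, newval);
}
```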