All variable declarations in the global scope that are not resources,
static, or empty are implicitly added to the implicit constant buffer
`$Globals`. They are created in the `hlsl_constant` address space and
collected in an implicit `HLSLBufferDecl` node that is added to the AST
at the end of the translation unit. Codegen is the same as for explicit
constant buffers.
Fixes #123801
Translates `cbuffer` declaration blocks to a `target("dx.CBuffer")` type, creates global variables in the `hlsl_constant` address space for all `cbuffer` constants, and adds metadata describing which global constant belongs to which constant buffer. For explicit constant buffer layout information, an explicit layout type `target("dx.Layout")` is used. This might change in the future.
The constant globals are temporary and will be removed in an upcoming pass that will translate `load` instructions in the `hlsl_constant` address space into constant buffer load intrinsic calls off a CBV handle (#124630, #112992).
See [Constant buffer design
doc](https://github.com/llvm/wg-hlsl/pull/94) for more details.
Fixes #113514, #106596
Commit 77d3f8a avoids distinct tags for any pointers where the ultimate
pointee type is `void`, to solve breakage in real-world code that uses
(indirections to) `void*` for polymorphism over different pointer types.
While this matches the TBAA implementation in GCC, this patch implements
a refinement that distinguishes void pointers by pointer depth, as
described in the "strict aliasing" documentation included in the
aforementioned commit:
> `void*` is permitted to alias any pointer type, `void**` is permitted
> to alias any pointer to pointer type, and so on.
For example, `void**` is no longer considered to alias `int*` in this
refinement, but it remains possible to use `void**` for polymorphism
over pointers to pointers.
Currently, the Clang frontend incorrectly emits the callee instead of
the thunk for the callee in the vtable. This happens because the
thunk index is not incremented when a thunk's callee cannot be emitted.
This patch fixes the bug.
This patch allows using fpfeatures pragmas with `__builtin_convertvector` (a usage sketch follows the list):
- added TrailingObjects with FPOptionsOverride, and methods for handling
it, to ConvertVectorExpr
- added support for codegen, node dumping, and serialization of the
fpfeatures contained in ConvertVectorExpr
This will allow vectorizing these calls (after a few more patches). This
should not change the codegen for targets that enable the use of AA
during the codegen (in `TargetSubtargetInfo::useAA()`). This includes
targets such as AArch64. This notably does not include x86, but that can
be worked around by passing `-mllvm -combiner-global-alias-analysis=true`
to clang.
Follow-up to #114086.
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.
This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.
For SWDEV-512631
Attempting to pass a `ptr addrspace(7)` to functions that take `ptr`
arguments produces undesirable `addrspacecast(addrspacecast(p8 x to p7)
to p0) => addrspacecast(p8 x to p0)` folds. This results in illegal GEP
operations on buffer resources, which can't be GEP'd. (However, note
that, while unimplemented, an addrspacecast from ptr addrspace(7) to ptr
is legal - it's just an effective address computation.)
To resolve this problem, and thus prevent illegal
`getelementptr T, ptr addrspace(8) %x, ...` operations from being
produced, this commit extends amdgcn.make.buffer.rsrc to also be variadic
in its result type, auto-upgrading old manglings.
The logic for handling a make.buffer.rsrc in instruction selection
remains untouched and expects the output type to be a ptr addrspace(8),
as does the Clang lowering for its builtin (the pointer-to-pointer
version might want a different name in clang). LowerBufferFatPointers
has been updated to lower
amdgcn.make.buffer.rsrc.p7.p* to amdgcn.make.buffer.rsrc.p8.p*.
This'll also make exposing buffer fat pointers in Clang easier, since
you don't have to cast between a `__amdgcn_rsrc_t` and a pointer.
This patch adds OpenMPToLLVMIRTranslation support for the OpenMP Declare
Mapper directive.
Since both MLIR and Clang now support custom mappers, I've changed the
respective function params to no longer be optional as well.
Depends on #121005
The layout and the size of an ObjC interface can change after its
corresponding implementation is parsed when synthesized ivars or ivars
declared in categories are added to the interface's list of ivars. This
can cause clang to miscompile if the optimization that emits fixed
offsets for ivars (see 923ddf65f4e21ec67018cf56e823895de18d83bc) uses an
ObjC class layout that is outdated and no longer reflects the current
state of the class.
For example, when compiling `constant-non-fragile-ivar-offset.m`, clang
emits 20 instead of 24 as the offset for `IntermediateClass2Property` as
the class layout for `SuperClass2`, which is created when the
implementation of IntermediateClass3 is parsed, is outdated when the
implementation of `IntermediateClass2` is parsed.
This commit invalidates the stale layout information of the class and
its subclasses if new ivars are added to the interface.
With this change, we can also stop using ObjC implementation decls as
the key to retrieve ObjC class layout information, as the layout
retrieved using the ObjC interface as the key will always be up to date.
rdar://139531391
This PR implements HLSL's initialization list behavior as specified in
the draft language specification under
[*Decl.Init.Agg*](https://microsoft.github.io/hlsl-specs/specs/hlsl.html#Decl.Init.Agg).
This behavior is a bit unusual for C/C++ because intermediate braces in
initializer lists are ignored, and a whole array of additional
conversions occur that are unintuitive compared to initialization in C.
The implementation in this PR generates a valid C/C++ initialization
list AST for the HLSL initializer so that there are no changes required
to Clang's CodeGen to support this. This design will also allow us to
use Clang's rewrite to convert HLSL initializers to valid C/C++
initializers that are equivalent. It does have the downside that it will
often generate redundant accesses during codegen. The IR optimizer is
extremely good at eliminating those, so this will have no impact on the
final executable's performance.
There is some opportunity for optimizing the initializer list generation
that we could consider in subsequent commits. One notable opportunity
would be to identify aggregate objects that occur in the same place in
both initializers and do not require conversion; those aggregates could
be initialized as aggregates rather than fully scalarized.
Closes #56067
---------
Co-authored-by: Finn Plummer <50529406+inbelic@users.noreply.github.com>
Co-authored-by: Helena Kotas <hekotas@microsoft.com>
Co-authored-by: Justin Bogner <mail@justinbogner.com>
Implement HLSL aggregate splat casting, which handles splatting for
arrays and structs, and for vectors when splatting from a vec1.
Closes #100609 and Closes #100619
Depends on #118842
This patch adds a new __builtin_assume_dereferenceable to encode
dereferenceability of a pointer using llvm.assume with an operand
bundle.
For now the builtin only accepts constant sizes; I am planning to drop
this restriction in a follow-up change.
This can be used to better optimize cases where a pointer is known to be
dereferenceable, e.g. by unconditionally loading from `p2` when
vectorizing the loop below.
```c
int *get_ptr();

void foo(int *src, int x) {
  int *p2 = get_ptr();
  p2 = __builtin_assume_aligned(p2, 4);
  __builtin_assume_dereferenceable(p2, 4000);
  for (unsigned I = 0; I != 1000; ++I) {
    int x = src[I];
    if (x == 0)
      x = p2[I];
    src[I] = x;
  }
}
```
PR: https://github.com/llvm/llvm-project/pull/121789
`sret` arguments are always going to reside in the stack/`alloca`
address space, which makes the current formulation where their AS is
derived from the pointee somewhat quaint. This patch ensures that `sret`
ends up pointing to the `alloca` AS in IR function signatures, and also
guards against trying to pass a casted `alloca`d pointer to a `sret` arg,
which can happen for most languages, when compiled for targets that have
a non-zero `alloca` AS (e.g. AMDGCN) / map `LangAS::default` to a
non-zero value (SPIR-V). A target could still choose to do something
different here, by e.g. overriding `classifyReturnType` behaviour.
In a broader sense, this patch extends non-aliased indirect args to also
carry an AS, which leads to changing the `getIndirect()` interface. At
the moment we're only using this for (indirect) returns, but it allows
for future handling of indirect args themselves. We default to using the
AllocaAS as that matches what Clang is currently doing, however if, in
the future, a target would opt for e.g. placing indirect returns in some
other storage, with another AS, this will require revisiting.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
This is straightforward, as we already had all the necessary
instructions; they simply were not wired up.
This also allows implementing the vec_round intrinsic via the standard
llvm.roundeven intrinsic instead of a platform-specific one.
After #124287 updated several functions to return iterators rather than
Instruction *, it was no longer straightforward to pass their result to
DIBuilder. This commit updates DIBuilder methods to accept an
InsertPosition instead, so that they can be called with an iterator
(preferred), or (with a deprecation warning) with an Instruction * or a
BasicBlock *. This commit also updates the existing calls to the
DIBuilder methods to pass in iterators.
As a special exception, DIBuilder::insertDeclare() keeps a separate
overload accepting a BasicBlock *InsertAtEnd. This is because despite
the name, this method does not insert at the end of the block, therefore
this cannot be handled implicitly by using InsertPosition.
- Set the shader flag `DisableOptimizations` based on `optnone`
attribute of shader entry functions.
- Add DXIL Metadata Analysis pass as a prerequisite for the Shader Flags pass
to obtain entry function information collected therein.
- Named module metadata `dx.disable_optimizations` was intended to
indicate that optimizations are disabled (`-O0`) via a command-line flag.
However, its purpose is now served by the `optnone` attribute on shader
entry functions, as implemented in a recent change, so it is no longer
needed. Delete generation of the named metadata and the corresponding
test file `disable_opt.ll`.
- Add tests to verify correctness of setting shader flag.
Closes #112263
These files are relatively old and don't conform to our formatting rules.
It's hard to change them without massive clang-format changes.
---------
Signed-off-by: Peter Rong <PeterRong@meta.com>
We should be able to use `spirv64` as a device variant match and it
should be considered a GPU.
Also add the triple to an RTTI check.
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Replace some more nvvm.annotations with function attributes,
auto-upgrading the annotations as needed. These new attributes will be
more idiomatic and compile-time efficient than the annotations (a sketch
of one upgrade follows the list).
- !"maxclusterrank" / !"cluster_max_blocks" -> "nvvm.maxclusterrank"
- !"minctasm" -> "nvvm.minctasm"
- !"maxnreg" -> "nvvm.maxnreg"
A proposed fix for issue #95611, "[OpenMP][SIMD] ordered has no effect
in a loop SIMD region as of LLVM 18.1.0".
Changes:
- Implement new lowering behavior: Conservatively serialize "omp simd"
loops that have `omp simd ordered` directive to prevent incorrect
vectorization (which results in incorrect execution behavior of the
miscompiled program).
Implementation outline:
- We start with the optimistic default initial value of
`LoopStack.setParallel(/*Enable=*/true);` in
`CodeGenFunction::EmitOMPSimdInit(const OMPLoopDirective &D)`.
- We only disable the loop parallel memory access assumption with `if
(HasOrderedDirective) LoopStack.setParallel(/*Enable=*/false);` using the
`HasOrderedDirective` (which tests for the presence of an
`OMPOrderedDirective`).
- This results in no longer incorrectly vectorizing the loop when the
`omp simd ordered` directive is present.
Motivation: We'd like to prevent incorrect vectorization of the loops
marked with the `#pragma omp ordered simd` directive which has
previously resulted in miscompiled code.
At the same time, we'd like the usage outside of the `#pragma omp
ordered simd` context to remain unaffected: Note that in the test
"clang/test/OpenMP/ordered_codegen.cpp" we only "lose" the
`!llvm.access.group` metadata in `foo_simd` alone.
This is conservative: some of the loops might turn out to be
vectorizable, but we prefer to avoid miscompilation of the loops that
are currently illegal to vectorize.
A concrete example follows:
```cpp
// "test.c"
#include <float.h>
#include <math.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int compare_float(float x1, float x2, float scalar) {
  const float diff = fabsf(x1 - x2);
  x1 = fabsf(x1);
  x2 = fabsf(x2);
  const float l = (x2 > x1) ? x2 : x1;
  if (diff <= l * scalar * FLT_EPSILON)
    return 1;
  else
    return 0;
}

#define ARRAY_SIZE 256

__attribute__((noinline)) void initialization_loop(
    float X[ARRAY_SIZE][ARRAY_SIZE], float Y[ARRAY_SIZE][ARRAY_SIZE]) {
  const float max = 1000.0;
  srand(time(NULL));
  for (int r = 0; r < ARRAY_SIZE; r++) {
    for (int c = 0; c < ARRAY_SIZE; c++) {
      X[r][c] = ((float)rand() / (float)(RAND_MAX)) * max;
      Y[r][c] = X[r][c];
    }
  }
}

__attribute__((noinline)) void omp_simd_loop(float X[ARRAY_SIZE][ARRAY_SIZE]) {
  for (int r = 1; r < ARRAY_SIZE; ++r) {
    for (int c = 1; c < ARRAY_SIZE; ++c) {
#pragma omp simd
      for (int k = 2; k < ARRAY_SIZE; ++k) {
#pragma omp ordered simd
        X[r][k] = X[r][k - 2] + sinf((float)(r / c));
      }
    }
  }
}

__attribute__((noinline)) int comparison_loop(float X[ARRAY_SIZE][ARRAY_SIZE],
                                              float Y[ARRAY_SIZE][ARRAY_SIZE]) {
  int totalErrors_simd = 0;
  const float scalar = 1.0;
  for (int r = 1; r < ARRAY_SIZE; ++r) {
    for (int c = 1; c < ARRAY_SIZE; ++c) {
      for (int k = 2; k < ARRAY_SIZE; ++k) {
        Y[r][k] = Y[r][k - 2] + sinf((float)(r / c));
      }
    }
    // check row for simd update
    for (int k = 0; k < ARRAY_SIZE; ++k) {
      if (!compare_float(X[r][k], Y[r][k], scalar)) {
        ++totalErrors_simd;
      }
    }
  }
  return totalErrors_simd;
}

int main(void) {
  float X[ARRAY_SIZE][ARRAY_SIZE];
  float Y[ARRAY_SIZE][ARRAY_SIZE];
  initialization_loop(X, Y);
  omp_simd_loop(X);
  const int totalErrors_simd = comparison_loop(X, Y);
  if (totalErrors_simd) {
    fprintf(stdout, "totalErrors_simd: %d \n", totalErrors_simd);
    fprintf(stdout, "%s : %d - FAIL: error in ordered simd computation.\n",
            __FILE__, __LINE__);
  } else {
    fprintf(stdout, "Success!\n");
  }
  return totalErrors_simd;
}
```
Before:
```
$ clang -fopenmp-simd -O3 -ffast-math -lm test.c -o test && ./test
totalErrors_simd: 15408
test.c : 76 - FAIL: error in ordered simd computation.
```
clang 19.1.0: https://godbolt.org/z/6EvhxqEhe
After:
```
$ clang -fopenmp-simd -O3 -ffast-math test.c -o test && ./test
Success!
```
Co-authored-by: Matt P. Dziubinski <matt-p.dziubinski@hpe.com>
As Intel is working to add support for SPIR-V OpenMP device offloading
in upstream clang/liboffload, we need to modify the OpenMP frontend to
allow SPIR-V as well as generate valid IR for SPIR-V. For example, we
need the frontend to generate code to define and interact with device
globals used in the DeviceRTL.
This is the beginning of what I expect will be (many) other changes, but
let's get started with something simple.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
In Continuous instrumentation profiling mode, profile or coverage data
collected via compiler instrumentation is continuously synced to the
profile file. This feature has existed for a while, and is documented
here:
https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program
This PR creates a user-facing option to enable the feature.
---------
Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
This introduces options `-floop-interchange` and `-fno-loop-interchange`
to enable/disable the loop-interchange pass. This is part of the work
that tries to get that pass enabled by default (#124911), where it was
remarked that a user-facing option to control this would be convenient
to have. The option name is the same as GCC's.
This is the second part of
https://github.com/llvm/llvm-project/pull/124752. Here we make sure to
set the `DICompositeType` `EnumKind` if the enum was declared with
`__attribute__((enum_extensibility(...)))`. In DWARF this will be
rendered as `DW_AT_APPLE_enum_kind` and will be used by LLDB when
creating `clang::EnumDecl`s from debug-info.
Depends on https://github.com/llvm/llvm-project/pull/126044
Implement HLSLElementwiseCast, excluding support for splat cases.
Casting types that contain bitfields is not supported.
Partly closes #100609 and partly closes #100619
Summary:
This patch unifies the existing offloading entries into a single section
called `llvm_offload_entries`. This lets us use a more unified
offloading infrastructure so that all targets share the same handling.
The effect is that runtime code now needs to check whether an entry's
kind is what it expects, but in exchange multiple potential providers
can be combined into a single compile job. This doesn't fully work yet
because of other runtime issues, but some day it will. Mostly this helps the
future of liboffload where we want to handle different languages than
OpenMP.
Kernel Control Flow Integrity (kCFI) is a feature that hardens indirect
calls by comparing a 32-bit hash of the function pointer's type against
a hash of the target function's type. If the hashes do not match, the
kernel may panic (or log the hash check failure, depending on the
kernel's configuration). These hashes are computed at compile time by
applying the xxHash64 algorithm to each mangled canonical function (or
function pointer) type, then truncating the result to 32 bits. This hash
is written into each indirect-callable function header by encoding it as
the 32-bit immediate operand to a `MOVri` instruction, e.g.:
```
__cfi_foo:
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
movl $199571451, %eax # hash of foo's type = 0xBE537FB
foo:
...
```
This PR extends x86-based kCFI with a 3-bit arity indicator encoded in
the `MOVri` instruction's register (reg) field as follows:
| Arity Indicator | Description | Encoding in reg field |
| --------------- | --------------- | --------------- |
| 0 | 0 parameters | EAX |
| 1 | 1 parameter in RDI | ECX |
| 2 | 2 parameters in RDI and RSI | EDX |
| 3 | 3 parameters in RDI, RSI, and RDX | EBX |
| 4 | 4 parameters in RDI, RSI, RDX, and RCX | ESP |
| 5 | 5 parameters in RDI, RSI, RDX, RCX, and R8 | EBP |
| 6 | 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 | ESI |
| 7 | At least one parameter may be passed on the stack | EDI |
For example, if `foo` takes 3 register arguments and no stack arguments
then the `MOVri` instruction in its kCFI header would instead be written
as:
```
movl $199571451, %ebx # hash of foo's type = 0xBE537FB
```
This PR will benefit other CFI approaches that build on kCFI, such as
FineIBT. For example, this proposed enhancement to FineIBT must be able
to infer (at kernel init time) which registers are live at an indirect
call target: https://lkml.org/lkml/2024/9/27/982. If the arity bits are
available in the kCFI function header, then this information is trivial
to infer.
Note that there is another existing PR proposal that includes the 3-bit
arity within the existing 32-bit immediate field, which introduces
different security properties:
https://github.com/llvm/llvm-project/pull/117121.