18194 Commits

Author SHA1 Message Date
Helena Kotas
776cddacb1
[HLSL] Implement default constant buffer $Globals (#125807)
All variable declarations in the global scope that are not resources,
static, or empty are implicitly added to the implicit constant buffer
`$Globals`. They are created in the `hlsl_constant` address space and
collected in an implicit `HLSLBufferDecl` node that is added to the AST
at the end of the translation unit. Codegen is the same as for explicit
constant buffers.

Fixes #123801
2025-02-20 17:27:53 -08:00
Helena Kotas
19af8581d5
[HLSL] Constant Buffers CodeGen (#124886)
Translates `cbuffer` declaration blocks to the `target("dx.CBuffer")` type. Creates global variables in the `hlsl_constant` address space for all `cbuffer` constants and adds metadata describing which global constant belongs to which constant buffer. For explicit constant buffer layout information, an explicit layout type `target("dx.Layout")` is used. This might change in the future.

The constant globals are temporary and will be removed in an upcoming pass that will translate `load` instructions in the `hlsl_constant` address space to constant buffer load intrinsic calls off a CBV handle (#124630, #112992).

See [Constant buffer design
doc](https://github.com/llvm/wg-hlsl/pull/94) for more details.

Fixes #113514, #106596
2025-02-20 10:32:14 -08:00
Bruno De Fraine
458b1e9d2c
[TBAA] Refine pointer-tbaa for void pointers by pointer depth (#126047)
Commit 77d3f8a avoids distinct tags for any pointers where the ultimate
pointee type is `void`, to solve breakage in real-world code that uses
(indirections to) `void*` for polymorphism over different pointer types.

While this matches the TBAA implementation in GCC, this patch implements
a refinement that distinguishes void pointers by pointer depth, as
described in the "strict aliasing" documentation included in the
aforementioned commit:
> `void*` is permitted to alias any pointer type, `void**` is permitted
> to alias any pointer to pointer type, and so on.

For example, `void**` is no longer considered to alias `int*` in this
refinement, but it remains possible to use `void**` for polymorphism
over pointers to pointers.
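
As a rough illustration of the refined rule (a hedged sketch, not taken from the commit; the function and variable names are invented):

```cpp
#include <cstdio>

// Under -O2 with strict aliasing, the refined TBAA gives 'void **' and 'int *'
// distinct tags, so the optimizer may assume the two stores below do not
// interfere; 'void **' still shares a tag with any pointer-to-pointer type.
int store_then_read(int *i, void **v, void *p) {
  *i = 42;
  *v = p;     // may not clobber *i under the refined TBAA
  return *i;  // can be folded to 42
}

int main() {
  int x = 0;
  float *fp = nullptr;
  float f = 1.5f;
  // Using void** for "polymorphism over pointers to pointers" remains supported:
  std::printf("%d\n", store_then_read(&x, reinterpret_cast<void **>(&fp), &f));
  return 0;
}
```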
2025-02-20 16:06:56 +04:00
Benjamin Maxwell
d595fc91ae
Revert "[clang] Lower modf builtin using llvm.modf intrinsic" (#127987)
Reverts llvm/llvm-project#126750

Reverting while I investigate:
https://lab.llvm.org/buildbot/#/builders/72/builds/8406
2025-02-20 10:25:18 +00:00
Anshil Gandhi
95000fdb9e
[CUDA] Increment VTable index for device thunks (#124989)
Currently, the clang frontend incorrectly emits the callee instead of
the thunk for the callee in the VTable. This happens because the
thunk index is not incremented when a thunk's callee cannot be emitted.
This patch fixes the bug.
2025-02-19 23:47:22 -05:00
Sebastian Jodłowski
0127f169dc
[CUDA] Add support for sm101 and sm120 target architectures (#127187)
Add support for sm101 and sm120 target architectures. It requires CUDA
12.8.

---------

Co-authored-by: Sebastian Jodlowski <sjodlowski@nuro.ai>
2025-02-19 14:41:07 -08:00
Deric Cheung
1c762c288f
[HLSL] Implement the 'and' HLSL function (#127098)
Addresses #125604 

- Implements `and` as an HLSL builtin function
- The `and` HLSL builtin function gets lowered to the LLVM `and`
instruction
2025-02-19 11:22:46 -08:00
Jakub Ficek
fda0e63e73
[clang] handle fp options in __builtin_convertvector (#125522)
This patch allows using fpfeatures pragmas with __builtin_convertvector:
- added TrailingObjects with FPOptionsOverride and methods for handling
it to ConvertVectorExpr
- added support for codegen, node dumping, and serialization of
fpfeatures contained in ConvertVectorExpr
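
For context, a minimal sketch of the kind of code this enables (hedged; not taken from the patch's tests, and the specific pragma is only illustrative):

```cpp
#include <cstdio>

typedef float float4 __attribute__((ext_vector_type(4)));
typedef int int4 __attribute__((ext_vector_type(4)));

int4 convert_strict(float4 v) {
#pragma clang fp exceptions(strict)
  // The FP options set by the pragma above are now carried on the
  // ConvertVectorExpr (via TrailingObjects) and honored during codegen.
  return __builtin_convertvector(v, int4);
}

int main() {
  float4 v = {1.25f, 2.5f, -3.75f, 4.0f};
  int4 r = convert_strict(v);
  std::printf("%d %d %d %d\n", r[0], r[1], r[2], r[3]);
  return 0;
}
```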
2025-02-19 09:03:18 -08:00
Benjamin Maxwell
d804c83893
[clang] Lower modf builtin using llvm.modf intrinsic (#126750)
This updates the existing `modf[f|l]` builtin to be lowered via the
`llvm.modf.*` intrinsic (rather than directly to a library call).
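
A small usage sketch (hypothetical example, not from the patch): the source stays the same; only the IR the builtin lowers to changes.

```cpp
#include <cmath>
#include <cstdio>

int main() {
  double ipart = 0.0;
  // When clang recognizes this call as the modf builtin, it is now emitted as
  // the llvm.modf.f64 intrinsic instead of a direct libm call.
  double frac = std::modf(3.75, &ipart);
  std::printf("integral=%g fractional=%g\n", ipart, frac); // integral=3 fractional=0.75
  return 0;
}
```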
2025-02-19 14:49:22 +00:00
Benjamin Maxwell
7781e1040d
[clang] Lower non-builtin sincos[f|l] calls to llvm.sincos.* when -fno-math-errno is set (#121763)
This will allow vectorizing these calls (after a few more patches). This
should not change the codegen for targets that enable the use of AA
during codegen (in `TargetSubtargetInfo::useAA()`). This includes
targets such as AArch64. This notably does not include x86, but that can be
worked around by passing `-mllvm -combiner-global-alias-analysis=true`
to clang.

Follow up to #114086.
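
A hedged sketch of the kind of call this affects (`sincosf` here is the GNU libm extension, declared manually so the example is self-contained):

```cpp
#include <cstdio>

// GNU libm extension; declared explicitly for this sketch.
extern "C" void sincosf(float x, float *s, float *c);

int main() {
  float s = 0.0f, c = 0.0f;
  // Compiled with -fno-math-errno, this non-builtin call can now be lowered to
  // the llvm.sincos.f32 intrinsic, which later enables vectorization.
  sincosf(0.5f, &s, &c);
  std::printf("sin=%f cos=%f\n", s, c);
  return 0;
}
```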
2025-02-19 10:13:41 +00:00
Fabian Ritter
029c8e783d
[AMDGPU][clang] Replace gfx940 and gfx941 with gfx942 in clang (#126762)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.

For SWDEV-512631
2025-02-19 10:11:48 +01:00
Krzysztof Drewniak
f7d03707d1
[AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat pointers (#126828)
Attempting to pass a `ptr addrspace(7)` to functions that take `ptr`
arguments produces undesirable `addrspacecast(addrspacecast(p8 x to p7)
to p0) => addrspacecast(p8 x to p0)` folds. This results in illegal GEP
operations on buffer resources, which can't be GEP'd. (However, note
that, while unimplemented, addrspacecast from ptr addrspace(7) to ptr
is legal - it's just an effective address computation.)

To resolve this problem, and thus prevent illegal
`getelementptr T, ptr addrspace(8) %x, ...` operations from being produced, this
commit extends amdgcn.make.buffer.rsrc to also be variadic in its result
type, auto-upgrading old manglings.

The logic for handling a make.buffer.rsrc in instruction selection
remains untouched and expects the output type to be a ptr addrspace(8),
as does the Clang lowering for its builtin (the pointer-to-pointer
version might want a different name in clang). LowerBufferFatPointers
has been updated to lower
amdgcn.make.buffer.rsrc.p7.p* to amdgcn.make.buffer.rsrc.p8.p*.

This'll also make exposing buffer fat pointers in Clang easier, since
you don't have to cast between a `__amdgcn_rsrc_t` and a pointer.
2025-02-18 14:15:28 -06:00
Akash Banerjee
785a5b4676
[MLIR][OpenMP] Add LLVM translation support for OpenMP UserDefinedMappers (#124746)
This patch adds OpenMPToLLVMIRTranslation support for the OpenMP Declare
Mapper directive.

Since both MLIR and Clang now support custom mappers, I've changed the
respective function params to no longer be optional as well.

Depends on #121005
2025-02-18 17:55:48 +00:00
Alex Voicu
a7a356833d
[NFC][Clang][CodeGen] Remove vestigial assertion (#127528)
This removes a vestigial assertion, which would erroneously trigger even
though we now correctly handle valid arg mismatches
(see clang/lib/CodeGen/CGCall.cpp line 5397 at commit 2dda529838),
after #114062 went in.
2025-02-17 22:05:22 +02:00
Akira Hatanaka
f5c5bc5ed5
[CodeGen][ObjC] Invalidate cached ObjC class layout information after parsing ObjC class implementations if new ivars are added to the interface (#126591)
The layout and the size of an ObjC interface can change after its
corresponding implementation is parsed when synthesized ivars or ivars
declared in categories are added to the interface's list of ivars. This
can cause clang to mis-compile if the optimization that emits fixed
offsets for ivars (see 923ddf65f4e21ec67018cf56e823895de18d83bc) uses an
ObjC class layout that is outdated and no longer reflects the current
state of the class.

For example, when compiling `constant-non-fragile-ivar-offset.m`, clang
emits 20 instead of 24 as the offset for `IntermediateClass2Property`, because
the class layout for `SuperClass2`, which is created when the
implementation of `IntermediateClass3` is parsed, is outdated by the time the
implementation of `IntermediateClass2` is parsed.

This commit invalidates the stale layout information of the class and
its subclasses if new ivars are added to the interface.

With this change, we can also stop using ObjC implementation decls as
the key to retrieve ObjC class layout information, since the layout
retrieved using the ObjC interface as the key will always be up to date.

rdar://139531391
2025-02-17 11:50:44 -08:00
Chris B
761d422441
[HLSL] Implement HLSL initialization list support (#123141)
This PR implements HLSL's initialization list behavior as specified in
the draft language specification under

[*Decl.Init.Agg*](https://microsoft.github.io/hlsl-specs/specs/hlsl.html#Decl.Init.Agg).

This behavior is a bit unusual for C/C++ because intermediate braces in
initializer lists are ignored and a whole array of additional
conversions occur that are unintuitive compared to how initialization
works in C.

The implementation in this PR generates a valid C/C++ initialization
list AST for the HLSL initializer, so no changes are required
to Clang's CodeGen to support this. This design will also allow us to
use Clang's rewriter to convert HLSL initializers to equivalent valid
C/C++ initializers. It does have the downside that it will often
generate redundant accesses during codegen. The IR optimizer is
extremely good at eliminating those, so this will have no impact on the
final executable's performance.

There is some opportunity for optimizing the initializer list generation
that we could consider in subsequent commits. One notable opportunity
would be to identify aggregate objects that occur in the same place in
both initializers and do not require conversion; those aggregates could
be initialized as aggregates rather than fully scalarized.

Closes #56067

---------

Co-authored-by: Finn Plummer <50529406+inbelic@users.noreply.github.com>
Co-authored-by: Helena Kotas <hekotas@microsoft.com>
Co-authored-by: Justin Bogner <mail@justinbogner.com>
2025-02-15 13:21:36 -06:00
Sarah Spall
4d2d0afcee
[HLSL] Implement HLSL Aggregate splatting (#118992)
Implement HLSL Aggregate Splat casting that handles splatting for arrays
and structs, and vectors if splatting from a vec1.
Closes #100609 and Closes #100619 
Depends on #118842
2025-02-14 09:25:24 -08:00
Florian Hahn
50d10b5c1c
[Clang] Add __builtin_assume_dereferenceable to encode deref assumption. (#121789)
This patch adds a new __builtin_assume_dereferenceable to encode
dereferenceability of a pointer using llvm.assume with an operand
bundle.

For now the builtin only accepts constant sizes; I am planning to drop
this restriction in a follow-up change.

This can be used to better optimize cases where a pointer is known to be
dereferenceable, e.g. unconditionally loading from p2 when vectorizing
the loop.

    int *get_ptr();

    void foo(int* src, int x) {
      int *p2 = get_ptr();
      __builtin_assume_aligned(p2, 4);
      __builtin_assume_dereferenceable(p2, 4000);
      for (unsigned I = 0; I != 1000; ++I) {
        int x = src[I];
        if (x == 0)
          x = p2[I];
        src[I] = x;
      }
    }


PR: https://github.com/llvm/llvm-project/pull/121789
2025-02-14 12:44:20 +01:00
Alex Voicu
39ec9de7c2
[clang][CodeGen] sret args should always point to the alloca AS, so use that (#114062)
`sret` arguments are always going to reside in the stack/`alloca`
address space, which makes the current formulation where their AS is
derived from the pointee somewhat quaint. This patch ensures that `sret`
ends up pointing to the `alloca` AS in IR function signatures, and also
guards against trying to pass a casted `alloca`d pointer to a `sret` arg,
which can happen for most languages, when compiled for targets that have
a non-zero `alloca` AS (e.g. AMDGCN) / map `LangAS::default` to a
non-zero value (SPIR-V). A target could still choose to do something
different here, by e.g. overriding `classifyReturnType` behaviour.

In a broader sense, this patch extends non-aliased indirect args to also
carry an AS, which leads to changing the `getIndirect()` interface. At
the moment we're only using this for (indirect) returns, but it allows
for future handling of indirect args themselves. We default to using the
AllocaAS as that matches what Clang is currently doing, however if, in
the future, a target would opt for e.g. placing indirect returns in some
other storage, with another AS, this will require revisiting.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
2025-02-14 11:20:45 +00:00
schittir
8a0914c245
[clang][NFC] Avoid potential null dereferences (#127017)
Add null checking.
2025-02-13 21:14:36 -08:00
Ulrich Weigand
adacbf68eb [SystemZ] Add codegen support for llvm.roundeven
This is straightforward, as we already had all the necessary
instructions; they simply were not wired up.

This also allows implementing the vec_round intrinsic via the
standard llvm.roundeven IR intrinsic instead of a platform-specific intrinsic.
2025-02-14 00:10:37 +01:00
Zahira Ammarguellat
cf69b4c668
[Clang] [OpenMP] Add support for '#pragma omp stripe'. (#126927)
This patch was reviewed and approved here:
https://github.com/llvm/llvm-project/pull/119891
However, it was reverted here:
083df25dc2
due to a build issue:
https://lab.llvm.org/buildbot/#/builders/51/builds/10694

This patch reintroduces the support.
2025-02-13 07:14:36 -05:00
Harald van Dijk
1083ec647f
[reland][DebugInfo] Update DIBuilder insertion to take InsertPosition (#126967)
After #124287 updated several functions to return iterators rather than
Instruction *, it was no longer straightforward to pass their result to
DIBuilder. This commit updates DIBuilder methods to accept an
InsertPosition instead, so that they can be called with an iterator
(preferred) or, with a deprecation warning, an Instruction * or a
BasicBlock *. This commit also updates the existing calls to the
DIBuilder methods to pass in iterators.

As a special exception, DIBuilder::insertDeclare() keeps a separate
overload accepting a BasicBlock *InsertAtEnd. This is because despite
the name, this method does not insert at the end of the block, therefore
this cannot be handled implicitly by using InsertPosition.
2025-02-13 10:46:42 +00:00
S. Bharadwaj Yadavalli
f2650c54c9
[DirectX] Set Shader Flag DisableOptimizations (#126813)
- Set the shader flag `DisableOptimizations` based on `optnone`
attribute of shader entry functions.

- Add DXIL Metadata Analysis pass as pre-requisite for Shader Flags pass
to obtain entry function information collected therein.

- The named module metadata `dx.disable_optimizations` was intended to
indicate that optimizations were disabled (`-O0`) via a command-line flag. However,
its intent is fulfilled by the `optnone` attribute of shader entry functions as
implemented in a recent change, and it is thus not needed. Delete
generation of the named metadata and the corresponding test file
`disable_opt.ll`.

- Add tests to verify correctness of setting shader flag.

Closes #112263
2025-02-12 16:45:01 -05:00
Peter Rong
53c618c071
[clang] run clang-format on some CGObjC files (#126644)
These files are relatively old and don't conform to our formatting rules.
It's hard to change them without making massive clang-format changes.

---------

Signed-off-by: Peter Rong <PeterRong@meta.com>
2025-02-12 11:52:49 -08:00
Harald van Dijk
23209eb1d9 Revert "[DebugInfo] Update DIBuilder insertion to take InsertPosition (#126059)"
This reverts commit 3ec9f7494b31f2fe51d5ed0e07adcf4b7199def6.
2025-02-12 17:50:39 +00:00
Harald van Dijk
3ec9f7494b
[DebugInfo] Update DIBuilder insertion to take InsertPosition (#126059)
After #124287 updated several functions to return iterators rather than
Instruction *, it was no longer straightforward to pass their result to
DIBuilder. This commit updates DIBuilder methods to accept an
InsertPosition instead, so that they can be called with an iterator
(preferred) or, with a deprecation warning, an Instruction * or a
BasicBlock *. This commit also updates the existing calls to the
DIBuilder methods to pass in iterators.
2025-02-12 17:38:59 +00:00
Nick Sarnie
cb3498c670
[OpenMP][OpenMPIRBuilder] Support SPIR-V device variant matches (#126801)
We should be able to use `spirv64` as a device variant match and it
should be considered a GPU.

Also add the triple to an RTTI check.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2025-02-12 16:40:05 +00:00
Alex MacLean
a282b6c486
[NVPTX] Convert scalar function nvvm.annotations to attributes (#125908)
Replace some more nvvm.annotations with function attributes,
auto-upgrading the annotations as needed. These new attributes will be
more idiomatic and compile-time efficient than the annotations.

- !"maxclusterrank" / !"cluster_max_blocks" -> "nvvm.maxclusterrank"
- !"minctasm" -> "nvvm.minctasm"
- !"maxnreg" -> "nvvm.maxnreg"
2025-02-12 07:33:22 -08:00
Matt
a1826b4d26
[OpenMP][SIMD][FIX] Use conservative "omp simd ordered" lowering (#126172)
A proposed fix for issue #95611, "[OpenMP][SIMD] ordered has no
effect in a loop SIMD region as of LLVM 18.1.0".

Changes:

- Implement new lowering behavior: Conservatively serialize "omp simd"
loops that have `omp simd ordered` directive to prevent incorrect
vectorization (which results in incorrect execution behavior of the
miscompiled program).

Implementation outline:

- We start with the optimistic default initial value of
`LoopStack.setParallel(/*Enable=*/true);` in
`CodeGenFunction::EmitOMPSimdInit(const OMPLoopDirective &D)`.
- We only disable the loop parallel memory access assumption with `if
(HasOrderedDirective) LoopStack.setParallel(/*Enable=*/false);` using the
`HasOrderedDirective` (which tests for the presence of an
`OMPOrderedDirective`).
- This results in no longer incorrectly vectorizing the loop when the
`omp simd ordered` directive is present.

Motivation: We'd like to prevent incorrect vectorization of the loops
marked with the `#pragma omp ordered simd` directive which has
previously resulted in miscompiled code.

At the same time, we'd like the usage outside of the `#pragma omp
ordered simd` context to remain unaffected: Note that in the test
"clang/test/OpenMP/ordered_codegen.cpp" we only "lose" the
`!llvm.access.group` metadata in `foo_simd` alone.

This is conservative, in that some of these loops might still be
possible to vectorize, but we prefer to avoid miscompilation of the
loops that are currently illegal to vectorize.

A concrete example follows:

```cpp
// "test.c"
#include <float.h>
#include <math.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int compare_float(float x1, float x2, float scalar) {
    const float diff = fabsf(x1 - x2);
    x1 = fabsf(x1);
    x2 = fabsf(x2);
    const float l = (x2 > x1) ? x2 : x1;
    if (diff <= l * scalar * FLT_EPSILON)
        return 1;
    else
        return 0;
}

#define ARRAY_SIZE 256

__attribute__((noinline)) void initialization_loop(
    float X[ARRAY_SIZE][ARRAY_SIZE], float Y[ARRAY_SIZE][ARRAY_SIZE]) {
    const float max = 1000.0;
    srand(time(NULL));
    for (int r = 0; r < ARRAY_SIZE; r++) {
        for (int c = 0; c < ARRAY_SIZE; c++) {
            X[r][c] = ((float)rand() / (float)(RAND_MAX)) * max;
            Y[r][c] = X[r][c];
        }
    }
}

__attribute__((noinline)) void omp_simd_loop(float X[ARRAY_SIZE][ARRAY_SIZE]) {
    for (int r = 1; r < ARRAY_SIZE; ++r) {
        for (int c = 1; c < ARRAY_SIZE; ++c) {
#pragma omp simd
            for (int k = 2; k < ARRAY_SIZE; ++k) {
#pragma omp ordered simd
                X[r][k] = X[r][k - 2] + sinf((float)(r / c));
            }
        }
    }
}

__attribute__((noinline)) int comparison_loop(float X[ARRAY_SIZE][ARRAY_SIZE],
                                              float Y[ARRAY_SIZE][ARRAY_SIZE]) {
    int totalErrors_simd = 0;
    const float scalar = 1.0;
    for (int r = 1; r < ARRAY_SIZE; ++r) {
        for (int c = 1; c < ARRAY_SIZE; ++c) {
            for (int k = 2; k < ARRAY_SIZE; ++k) {
                Y[r][k] = Y[r][k - 2] + sinf((float)(r / c));
            }
        }
        // check row for simd update
        for (int k = 0; k < ARRAY_SIZE; ++k) {
            if (!compare_float(X[r][k], Y[r][k], scalar)) {
                ++totalErrors_simd;
            }
        }
    }
    return totalErrors_simd;
}

int main(void) {
    float X[ARRAY_SIZE][ARRAY_SIZE];
    float Y[ARRAY_SIZE][ARRAY_SIZE];

    initialization_loop(X, Y);
    omp_simd_loop(X);
    const int totalErrors_simd = comparison_loop(X, Y);

    if (totalErrors_simd) {
        fprintf(stdout, "totalErrors_simd: %d \n", totalErrors_simd);
        fprintf(stdout, "%s : %d - FAIL: error in ordered simd computation.\n",
                __FILE__, __LINE__);
    } else {
        fprintf(stdout, "Success!\n");
    }

    return totalErrors_simd;
}
```

Before:

```
$ clang -fopenmp-simd -O3 -ffast-math -lm test.c -o test && ./test
totalErrors_simd: 15408
test.c : 76 - FAIL: error in ordered simd computation.
```

clang 19.1.0: https://godbolt.org/z/6EvhxqEhe

After:

```
$ clang -fopenmp-simd -O3 -ffast-math test.c -o test && ./test
Success!
```

Co-authored-by: Matt P. Dziubinski <matt-p.dziubinski@hpe.com>
2025-02-12 08:53:47 -05:00
Kazu Hirata
67e1e98811 Revert "[Clang] [OpenMP] Add support for '#pragma omp stripe'. (#119891)"
This reverts commit 070f84ebc89b11df616a83a56df9ac56efbab783.

Buildbot failure:
https://lab.llvm.org/buildbot/#/builders/51/builds/10694
2025-02-11 12:39:01 -08:00
Zahira Ammarguellat
070f84ebc8
[Clang] [OpenMP] Add support for '#pragma omp stripe'. (#119891)
Implement basic parsing and semantic support for the `#pragma omp stripe`
construct introduced in the OpenMP API Specification 6.0
(https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-6-0.pdf),
section 11.7.
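
A minimal usage sketch (hedged: the `sizes` clause syntax here mirrors `omp tile` and is assumed from the spec section cited above; with an older compiler or without `-fopenmp` the pragma is simply ignored and the loop runs unchanged):

```cpp
#include <cstdio>

int main() {
  int sum = 0;
  // Strip-mines the loop into chunks of 4 iterations.
#pragma omp stripe sizes(4)
  for (int i = 0; i < 16; ++i)
    sum += i;
  std::printf("sum=%d\n", sum); // 120 either way
  return 0;
}
```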
2025-02-11 13:58:21 -05:00
S. Bharadwaj Yadavalli
b92bab3c01
[HLSL] Appropriately set function attribute optnone (#125937)
When optimization is disabled, set the `optnone` attribute on all module
entry functions.

Updated test in accordance with the change.

Closes #124796
2025-02-11 12:29:05 -05:00
Kazu Hirata
8e4e144931
[CodeGen] Avoid repeated hash lookups (NFC) (#126672) 2025-02-11 09:06:40 -08:00
Nick Sarnie
f3cd223838
[OpenMP][OpenMPIRBuilder] Add initial changes for SPIR-V target frontend support (#125920)
As Intel is working to add support for SPIR-V OpenMP device offloading
in upstream clang/liboffload, we need to modify the OpenMP frontend to
allow SPIR-V as well as generate valid IR for SPIR-V. For example, we
need the frontend to generate code to define and interact with device
globals used in the DeviceRTL.

This is the beginning of what I expect will be (many) other changes, but
let's get started with something simple.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2025-02-10 16:16:40 +00:00
Wael Yehia
8e61aae4a8
[profile] Add a clang option -fprofile-continuous that enables continuous instrumentation profiling mode (#124353)
In Continuous instrumentation profiling mode, profile or coverage data
collected via compiler instrumentation is continuously synced to the
profile file. This feature has existed for a while, and is documented
here:

https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program
This PR creates a user-facing option to enable the feature.

---------

Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
2025-02-08 17:25:07 -05:00
Mats Jun Larsen
a07928c3ce
[CodeGen][Hexagon] Replace PointerType::getUnqual(Type) with opaque version (NFC) (#126274)
Follow-up to https://github.com/llvm/llvm-project/issues/123569

The obsolete bitcasts on the LoadInsts are also removed.
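
The mechanical change looks roughly like this (a hedged sketch against current LLVM headers, not code from the patch):

```cpp
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"

using namespace llvm;

int main() {
  LLVMContext Ctx;
  // Old style: derive the (now opaque) pointer type from a pointee type.
  PointerType *Old = PointerType::getUnqual(Type::getInt8Ty(Ctx));
  // New style: ask for the opaque pointer in address space 0 directly.
  PointerType *New = PointerType::getUnqual(Ctx);
  return Old == New ? 0 : 1; // both are plain 'ptr' under opaque pointers
}
```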
2025-02-08 15:13:23 +00:00
Mats Jun Larsen
e0fee55a55
[CodeGen] Replace PointerType::get(Type) with opaque version (NFC) (#124771)
Follow-up to https://github.com/llvm/llvm-project/issues/123569
2025-02-08 15:13:02 +00:00
Mats Jun Larsen
df2e8ee7ae
[CodeGen][AArch64] Replace PointerType::getUnqual(Type) with opaque version (NFC) (#126278)
Follow-up to #123569
2025-02-08 13:23:08 +00:00
Mats Jun Larsen
54e0c2bbe2
[CodeGen][SystemZ] Replace PointerType::getUnqual(Type) with opaque version (NFC) (#126280)
Follow-up to #126278
2025-02-08 13:22:53 +00:00
Mats Jun Larsen
4e29148cca
[CodeGen][XCore] Replace PointerType::getUnqual(Type) with opaque version (NFC) (#126279)
Follow-up to #123569
2025-02-08 13:22:42 +00:00
Sarah Spall
3f8e280206
[HLSL] Implement HLSL Elementwise casting (excluding splat cases); Re-land #118842 (#126258)
Implement HLSLElementwiseCast, excluding support for splat cases.
Do not support casting types that contain bitfields.
Partly closes https://github.com/llvm/llvm-project/issues/100609 and
partly closes https://github.com/llvm/llvm-project/issues/100619
Re-land #118842 after fixing warning as an error, found by a buildbot.
2025-02-07 09:12:55 -08:00
Sjoerd Meijer
612df14c00
[Clang][Driver] Add an option to control loop-interchange (#125830)
This introduces options `-floop-interchange` and `-fno-loop-interchange`
to enable/disable the loop-interchange pass. This is part of the work
that tries to get that pass enabled by default (#124911), where it was
remarked that a user-facing option to control this would be convenient
to have. The option name is the same as GCC's.
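
For illustration, a loop nest of the kind the pass targets (a hedged sketch; enabling the pass in the pipeline is the subject of #124911):

```cpp
#include <cstdio>

enum { N = 512 };
static float A[N][N];

int main() {
  // Column-major traversal of a row-major array: with loop-interchange enabled
  // (-floop-interchange), the i/j loops may be swapped so the innermost loop
  // becomes unit-stride.
  for (int j = 0; j < N; ++j)
    for (int i = 0; i < N; ++i)
      A[i][j] = static_cast<float>(i + j);
  std::printf("%f\n", A[N - 1][N - 1]);
  return 0;
}
```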
2025-02-07 10:31:24 +00:00
Michael Buch
e00fc80c19
[clang][DebugInfo] Set EnumKind based on enum_extensibility attribute (#126045)
This is the 2nd part to
https://github.com/llvm/llvm-project/pull/124752. Here we make sure to
set the `DICompositeType` `EnumKind` if the enum was declared with
`__attribute__((enum_extensibility(...)))`. In DWARF this will be
rendered as `DW_AT_APPLE_enum_kind` and will be used by LLDB when
creating `clang::EnumDecl`s from debug-info.
 
Depends on https://github.com/llvm/llvm-project/pull/126044
2025-02-07 09:28:10 +00:00
Sarah Spall
14716f2e4b
Revert "[HLSL] Implement HLSL Flat casting (excluding splat cases)" (#126149)
Reverts llvm/llvm-project#118842
2025-02-06 15:25:20 -08:00
Sarah Spall
01072e546f
[HLSL] Implement HLSL Flat casting (excluding splat cases) (#118842)
Implement HLSLElementwiseCast, excluding support for splat cases.
Do not support casting types that contain bitfields.
Partly closes #100609 and partly closes #100619
2025-02-06 14:38:01 -08:00
Alexey Bataev
3041dd5c20
Revert "[OpenMP][SIMD][FIX] Use conservative "omp simd ordered" lowering" (#126079)
Reverts llvm/llvm-project#123867 to fix the test failures
https://lab.llvm.org/buildbot/#/builders/144/builds/17521
2025-02-06 10:04:11 -05:00
Matt
60d8e6f528
[OpenMP][SIMD][FIX] Use conservative "omp simd ordered" lowering (#123867)
A proposed fix for #95611, "[OpenMP][SIMD] ordered has no effect in a loop
SIMD region as of LLVM 18.1.0".

Changes:

- Implement new lowering behavior: Conservatively serialize "omp simd"
loops that have `omp simd ordered` directive to prevent incorrect
vectorization (which results in incorrect execution behavior of the
miscompiled program).

Implementation outline:

- We start with the optimistic default initial value of
`LoopStack.setParallel(/*Enable=*/true);` in
`CodeGenFunction::EmitOMPSimdInit(const OMPLoopDirective &D)`.
- We only disable the loop parallel memory access assumption with `if
(HasOrderedDirective) LoopStack.setParallel(/*Enable=*/false);` using the
`HasOrderedDirective` (which tests for the presence of an
`OMPOrderedDirective`).
- This results in no longer incorrectly vectorizing the loop when the
`omp simd ordered` directive is present.

Motivation: We'd like to prevent incorrect vectorization of the loops
marked with the `#pragma omp ordered simd` directive which has
previously resulted in miscompiled code.

At the same time, we'd like the usage outside of the `#pragma omp
ordered simd` context to remain unaffected: Note that in the test
"clang/test/OpenMP/ordered_codegen.cpp" we only "lose" the
`!llvm.access.group` metadata in `foo_simd` alone.

This is conservative, in that some of these loops might still be
possible to vectorize, but we prefer to avoid miscompilation of the
loops that are currently illegal to vectorize.

A concrete example follows:

```cpp
// "test.c"
#include <float.h>
#include <math.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int compare_float(float x1, float x2, float scalar) {
    const float diff = fabsf(x1 - x2);
    x1 = fabsf(x1);
    x2 = fabsf(x2);
    const float l = (x2 > x1) ? x2 : x1;
    if (diff <= l * scalar * FLT_EPSILON)
        return 1;
    else
        return 0;
}

#define ARRAY_SIZE 256

__attribute__((noinline)) void initialization_loop(
    float X[ARRAY_SIZE][ARRAY_SIZE], float Y[ARRAY_SIZE][ARRAY_SIZE]) {
    const float max = 1000.0;
    srand(time(NULL));
    for (int r = 0; r < ARRAY_SIZE; r++) {
        for (int c = 0; c < ARRAY_SIZE; c++) {
            X[r][c] = ((float)rand() / (float)(RAND_MAX)) * max;
            Y[r][c] = X[r][c];
        }
    }
}

__attribute__((noinline)) void omp_simd_loop(float X[ARRAY_SIZE][ARRAY_SIZE]) {
    for (int r = 1; r < ARRAY_SIZE; ++r) {
        for (int c = 1; c < ARRAY_SIZE; ++c) {
#pragma omp simd
            for (int k = 2; k < ARRAY_SIZE; ++k) {
#pragma omp ordered simd
                X[r][k] = X[r][k - 2] + sinf((float)(r / c));
            }
        }
    }
}

__attribute__((noinline)) int comparison_loop(float X[ARRAY_SIZE][ARRAY_SIZE],
                                              float Y[ARRAY_SIZE][ARRAY_SIZE]) {
    int totalErrors_simd = 0;
    const float scalar = 1.0;
    for (int r = 1; r < ARRAY_SIZE; ++r) {
        for (int c = 1; c < ARRAY_SIZE; ++c) {
            for (int k = 2; k < ARRAY_SIZE; ++k) {
                Y[r][k] = Y[r][k - 2] + sinf((float)(r / c));
            }
        }
        // check row for simd update
        for (int k = 0; k < ARRAY_SIZE; ++k) {
            if (!compare_float(X[r][k], Y[r][k], scalar)) {
                ++totalErrors_simd;
            }
        }
    }
    return totalErrors_simd;
}

int main(void) {
    float X[ARRAY_SIZE][ARRAY_SIZE];
    float Y[ARRAY_SIZE][ARRAY_SIZE];

    initialization_loop(X, Y);
    omp_simd_loop(X);
    const int totalErrors_simd = comparison_loop(X, Y);

    if (totalErrors_simd) {
        fprintf(stdout, "totalErrors_simd: %d \n", totalErrors_simd);
        fprintf(stdout, "%s : %d - FAIL: error in ordered simd computation.\n",
                __FILE__, __LINE__);
    } else {
        fprintf(stdout, "Success!\n");
    }

    return totalErrors_simd;
}
```

Before:

```
$ clang -fopenmp-simd -O3 -ffast-math -lm test.c -o test && ./test
totalErrors_simd: 15408
test.c : 76 - FAIL: error in ordered simd computation.
```

clang 19.1.0: https://godbolt.org/z/6EvhxqEhe

After:

```
$ clang -fopenmp-simd -O3 -ffast-math test.c -o test && ./test
Success!
```

Co-authored-by: Matt P. Dziubinski <matt-p.dziubinski@hpe.com>
2025-02-06 09:44:11 -05:00
Joseph Huber
f1e917d07b
[Offload] Unify offloading entries into a single section (#125731)
Summary:
This patch unifies the existing offloading entries into a single section
called `llvm_offload_entries`. This lets us use a more unified
offloading infrastructure so that all targets share the same handling.
The effect is that people in the runtimes now need to check if the kind
is what they expect, but the expectation is that you can combine
multiple potential providers into a compile job. Doesn't fully work
yet because of other runtime issues, but some day. Mostly this helps the
future of liboffload where we want to handle different languages than
OpenMP.
2025-02-06 08:24:01 -06:00
Scott Constable
e223485c9b
[X86] Extend kCFI with a 3-bit arity indicator (#121070)
Kernel Control Flow Integrity (kCFI) is a feature that hardens indirect
calls by comparing a 32-bit hash of the function pointer's type against
a hash of the target function's type. If the hashes do not match, the
kernel may panic (or log the hash check failure, depending on the
kernel's configuration). These hashes are computed at compile time by
applying the xxHash64 algorithm to each mangled canonical function (or
function pointer) type, then truncating the result to 32 bits. This hash
is written into each indirect-callable function header by encoding it as
the 32-bit immediate operand to a `MOVri` instruction, e.g.:
```
__cfi_foo:
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	movl	$199571451, %eax                # hash of foo's type = 0xBE537FB
foo:
        ...
```

This PR extends x86-based kCFI with a 3-bit arity indicator encoded in
the `MOVri` instruction's register (reg) field as follows:

| Arity Indicator | Description | Encoding in reg field |
| --------------- | --------------- | --------------- |
| 0 | 0 parameters | EAX |
| 1 | 1 parameter in RDI | ECX |
| 2 | 2 parameters in RDI and RSI | EDX |
| 3 | 3 parameters in RDI, RSI, and RDX | EBX |
| 4 | 4 parameters in RDI, RSI, RDX, and RCX | ESP |
| 5 | 5 parameters in RDI, RSI, RDX, RCX, and R8 | EBP |
| 6 | 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 | ESI |
| 7 | At least one parameter may be passed on the stack | EDI |

For example, if `foo` takes 3 register arguments and no stack arguments
then the `MOVri` instruction in its kCFI header would instead be written
as:
```
	movl	$199571451, %ebx                # hash of foo's type = 0xBE537FB
```

This PR will benefit other CFI approaches that build on kCFI, such as
FineIBT. For example, this proposed enhancement to FineIBT must be able
to infer (at kernel init time) which registers are live at an indirect
call target: https://lkml.org/lkml/2024/9/27/982. If the arity bits are
available in the kCFI function header, then this information is trivial
to infer.

Note that there is another existing PR proposal that includes the 3-bit
arity within the existing 32-bit immediate field, which introduces
different security properties:
https://github.com/llvm/llvm-project/pull/117121.
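
For reference, a sketch of the kind of indirect call site kCFI instruments (hedged example; the arity encoding above is the extension this PR adds on top of `-fsanitize=kcfi`, not default behavior):

```cpp
#include <cstdio>

using op_t = int (*)(int, int);

int add(int a, int b) { return a + b; }

// Built with -fsanitize=kcfi, clang emits a check of the callee's type hash
// (stored in its __cfi_* function header) before this indirect call.
int apply(op_t f, int x, int y) { return f(x, y); }

int main() { std::printf("%d\n", apply(add, 2, 3)); return 0; }
```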
2025-02-06 10:54:22 +08:00