18194 Commits

Author SHA1 Message Date
Saleem Abdulrasool
1901f4ac8e
CodeGen: support static linking for libclosure (#125384)
When building on Windows, dealing with the BlocksRuntime is slightly
more complicated. As we are not guaranteed a forward declaration for
the blocks runtime ABI symbols, we may generate the declarations for
them. In order to link properly against the well-known types, we always
annotated them as `__declspec(dllimport)`. This required dynamic
linking of the blocks runtime under all conditions. However, dynamic
linking is not the only possible way to use the library: we may be
building a fully sealed (static) executable. In such a case, the
well-known symbols should not be marked as `dllimport`, as they are
assumed to be statically available when linking statically against the
BlocksRuntime.

Introduce a new driver/cc1 option `-static-libclosure` which mirrors the
myriad of similar options (`-static-libgcc`, `-static-libstdc++`,
`-static-libsan`, etc).
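A minimal usage sketch (the flag and `-fblocks` are real; the file name
and exact link invocation are illustrative):

```
/* main.c: clang -fblocks -static-libclosure main.c */
#include <stdio.h>

int main(void) {
  int captured = 42;
  void (^blk)(void) = ^{ printf("captured: %d\n", captured); };
  blk(); /* references BlocksRuntime ABI symbols like _NSConcreteStackBlock */
  return 0;
}
```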
2025-02-05 15:15:36 -08:00
Daniil Kovalev
84b0c128a7
[PAC] Do not support some values of branch-protection with ptrauth-returns (#125280)
This patch does two things.

1. Previously, when checking driver arguments, we emitted an error for
unsupported values of `-mbranch-protection` when using the pauthtest
ABI. The reason for that was ptrauth-returns being enabled as part of
pauthtest. This patch changes the check against pauthtest to a check
against ptrauth-returns.

2. Similarly, we check values of the function attribute
`__attribute__((target("branch-protection=XXX")))` that are unsupported
with ptrauth-returns. Note that the existing `validateBranchProtection`
function is used, and the current behavior is to
ignore the unsupported attribute value, so no error is emitted.
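A sketch of the second case (the specific branch-protection value shown
is illustrative; which values conflict depends on the ptrauth-returns
configuration):

```
/* With ptrauth-returns enabled, an unsupported value here is now checked
   via validateBranchProtection and silently ignored, with no error. */
__attribute__((target("branch-protection=bti+pac-ret")))
void sensitive(void) {}
```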
2025-02-05 11:39:27 +03:00
Sameer Sahasrabuddhe
b85e71b9f2
[llvm] Create() functions for ConvergenceControlInst (#125627) 2025-02-05 11:41:26 +05:30
Bill Wendling
2eb44aa0a9
[Clang][counted-by] Bail out of visitor for LValueToRValue cast (#125571)
An LValueToRValue cast shouldn't be ignored, so bail out of the visitor
if we encounter one.
2025-02-04 11:00:44 -08:00
Chandler Carruth
cd269fee05 [StrTable] Switch Clang builtins to use string tables
This both reapplies #118734, the initial attempt at this, and updates it
significantly.

First, it uses the newly added `StringTable` abstraction for string
tables, and simplifies the construction to build the string table and
info arrays separately. This should reduce any `constexpr` compile time
memory or CPU cost of the original PR while significantly improving the
APIs throughout.

It also restructures the builtins to support sharding across several
independent tables. This accomplishes three improvements from the
original PR:

1) It improves the APIs used significantly.

2) When builtins are defined from different sources (like SVE vs MVE in
   AArch64), this allows each of them to build their own string table
   independently rather than having to merge the string tables and info
   structures.

3) It allows each shard to factor out a common prefix, often cutting the
   size of the strings needed for the builtins by a factor of two.

The second point is important both because it allows different mechanisms
of construction (for example, a `.def` file and a tablegen'ed `.inc`
file, or different tablegen'ed `.inc` files) and because it simply
reduces the sizes of these tables, which is valuable given how large they
are in some cases. The third builds on that size reduction.

Initially, we use this new sharding rather than merging tables in
AArch64, LoongArch, RISCV, and X86. Mostly this helps ensure the system
works, as without further changes these still push scaling limits.
Subsequent commits will more deeply leverage the new structure,
including using the prefix capabilities which cannot be easily factored
out here and requires deep changes to the targets.
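For illustration, the underlying idea looks roughly like this (a sketch
only, not the actual `StringTable` API):

```
// One shared character blob plus per-builtin offsets, instead of a
// pointer per name; a shard can factor a common prefix out of the blob.
static constexpr char Strings[] = "__builtin_abs\0__builtin_fabs\0";
static constexpr unsigned Offsets[] = {0, 14}; // start of each name
inline const char *builtinName(unsigned ID) { return Strings + Offsets[ID]; }
```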
2025-02-04 18:04:57 +00:00
Pranav Kant
e8a486ea97
[clang] Return larger CXX records in memory (#120670)
We incorrectly return CXX records in AVX registers when they should be
returned in memory. This is a violation of the x86-64 psABI.

Detailed discussion is here:
https://groups.google.com/g/x86-64-abi/c/BjOOyihHuqg/m/KurXdUcWAgAJ
2025-02-04 09:42:12 -08:00
Hans Wennborg
83ff9d4a34 Revert "[Win/X86] Make _m_prefetch[w] builtins to avoid winnt.h conflicts (#115099)"
This broke the build; see the buildbot comments on the PR.

This reverts commit ee92122b53c7af26bb766e89e1d30ceb2fd5bb93 and
follow-up 5dccfd9283cd784758aa3d16fcb6e31f135c080f.
2025-02-04 11:19:20 +01:00
Reid Kleckner
ee92122b53
[Win/X86] Make _m_prefetch[w] builtins to avoid winnt.h conflicts (#115099)
This is similar in spirit to previous changes that made _mm_mfence a
builtin to avoid conflicts with winnt.h and other MSVC ecosystem
headers that pre-declare compiler intrinsics as extern "C" symbols.

Also update the feature flag for _mm_prefetch to sse, which is more accurate than mmx.

This should fix issue #87515.
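A usage sketch (assuming clang's x86 intrinsic headers and a target with
prefetch support):

```
#include <x86intrin.h>

// Now that these are builtins, an extern "C" declaration in winnt.h no
// longer conflicts with the intrinsic headers.
void warm(void *p) {
  _m_prefetch(p);   // prefetch for read
  _m_prefetchw(p);  // prefetch with intent to write
}
```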
2025-02-03 14:05:58 -08:00
Kazu Hirata
cd4e36027f
[CodeGen] Migrate away from PointerUnion::dyn_cast (NFC) (#125456)
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect E to be nonnull.
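The migration pattern looks roughly like this (a sketch with hypothetical
element types):

```
#include "llvm/ADT/PointerUnion.h"

// The free-function dyn_cast replaces the soft-deprecated member dyn_cast.
void example(llvm::PointerUnion<int *, float *> E) {
  // Before: if (auto *P = E.dyn_cast<int *>()) ...
  if (auto *P = llvm::dyn_cast<int *>(E)) // requires E to be nonnull
    (void)P;
}
```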
2025-02-03 12:27:21 -08:00
erichkeane
99a9133a68 [OpenACC] Implement Sema/AST for 'atomic' construct
The atomic construct is a particularly complicated one. The directive
itself is pretty simple: it has 5 options for the 'atomic-clause'.
However, the associated statement is fairly complicated.

'read' accepts:
  v = x;
'write' accepts:
  x = expr;
'update' (or no clause) accepts:
  x++;
  x--;
  ++x;
  --x;
  x binop= expr;
  x = x binop expr;
  x = expr binop x;

'capture' accepts either a compound statement, or:
  v = x++;
  v = x--;
  v = ++x;
  v = --x;
  v = x binop= expr;
  v = x = x binop expr;
  v = x = expr binop x;

If 'capture' has a compound statement, it accepts:
  {v = x; x binop= expr; }
  {x binop= expr; v = x; }
  {v = x; x = x binop expr; }
  {v = x; x = expr binop x; }
  {x = x binop expr ;v = x; }
  {x = expr binop x; v = x; }
  {v = x; x = expr; }
  {v = x; x++; }
  {v = x; ++x; }
  {x++; v = x; }
  {++x; v = x; }
  {v = x; x--; }
  {v = x; --x; }
  {x--; v = x; }
  {--x; v = x; }

While these are all quite complicated, there is a significant amount
of similarity between the 'capture' and 'update' lists, so this patch
reuses a lot of the same functions.

This patch implements the entirety of 'atomic', creating a new Sema file
for it, as the implementation is fairly sizable.
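For example, one of the 'capture' compound forms listed above (a sketch):

```
void capture(int &x, int &v, int expr) {
  #pragma acc atomic capture
  {
    v = x;     // read the old value
    x += expr; // x binop= expr
  }
}
```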
2025-02-03 07:22:22 -08:00
Owen Anderson
8f025f2a93
[clang] Do not emit template parameter objects as COMDATs when they have internal linkage. (#125448)
Per the ELF spec, section groups may only contain local symbols if those
symbols are only referenced from within the section group. [1] In the
case of template parameter objects, they can be referenced from outside
the group when the type of the object was declared in an anonymous
namespace. In that case, we can't place the object in a COMDAT. This
matches GCC's linkage behavior on the test input.

[1]:
https://www.sco.com/developers/gabi/latest/ch4.sheader.html#section_groups
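A sketch of the problematic case (hypothetical names):

```
namespace {
struct S { int v; }; // type with internal linkage
}
template <S s> int get() { return s.v; }

// The template parameter object for S{42} has internal linkage, yet can
// be referenced from code outside its own section group, so it must not
// be placed in a COMDAT.
int use() { return get<S{42}>(); }
```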
2025-02-03 23:26:22 +13:00
Kazu Hirata
e11e65f08b
[CodeGen] Migrate away from PointerUnion::dyn_cast (NFC) (#125336)
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect E to be nonnull.
2025-02-01 08:13:41 -08:00
Balazs Benics
65708bad57
[clang][CodeGenOpenCL][NFC] Remove redundant map lookups (#125285) 2025-02-01 08:21:15 +01:00
Florian Hahn
77d3f8a925
[TBAA] Don't emit pointer-tbaa for void pointers. (#122116)
While there are no special rules in the standards regarding void
pointers and strict aliasing, emitting distinct tags for void pointers
breaks some common idioms, and there is no good way to rewrite the
code without strict-aliasing violations. An example is counting the
entries in an array of pointers:

    int count_elements(void * values) {
      void **seq = values;
      int count;
      for (count = 0; seq && seq[count]; count++);
      return count;
    }

https://clang.godbolt.org/z/8dTv51v8W

An example in the wild is from
https://github.com/llvm/llvm-project/issues/119099

This patch avoids emitting distinct tags for void pointers, to keep
those idioms from causing miscompiles for now.

Fixes https://github.com/llvm/llvm-project/issues/119099.
Fixes https://github.com/llvm/llvm-project/issues/122537.

PR: https://github.com/llvm/llvm-project/pull/122116
2025-01-31 11:38:14 +00:00
Oliver Stannard
97b066f4e9
[ARM] Empty structs are 1-byte for C++ ABI (#124762)
For C++ (but not C), empty structs should be passed to functions as if
they were a 1-byte object with 1-byte alignment.

This is defined in Arm's CPPABI32:
  https://github.com/ARM-software/abi-aa/blob/main/cppabi32/cppabi32.rst
  For the purposes of parameter passing in AAPCS32, a parameter whose
  type is an empty class shall be treated as if its type were an
  aggregate with a single member of type unsigned byte.

The AArch64 equivalent of this has an exception for structs containing
an array of size zero, I've kept that logic for ARM. I've not found a
reason for this exception, but I've checked that GCC does have the same
behaviour for ARM as it does for AArch64.

The AArch64 version has an Apple ABI with different rules, which ignores
empty structs in both C and C++. This is documented at
https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms.
The ARM equivalent of that appears to be AAPCS16_VFP, used for WatchOS,
but I can't find any documentation for that ABI, so I'm not sure what
rules it should follow. For now I've left it following the AArch64 Apple
rules.
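A sketch of the rule quoted above (the register comment reflects the
expected AAPCS32 assignment, not test output):

```
struct Empty {};

// In C++, 'e' is now treated as a one-byte aggregate, so it consumes the
// first argument register and 'x' moves to the next one.
int f(Empty e, int x) { return x; }
```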
2025-01-31 09:03:01 +00:00
David Green
9f1c825fb6
[AArch64] Enable vscale_range with +sme (#124466)
If we have +sme but not +sve, we would not set vscale_range on
functions. It should be valid to apply it with the same range with just
+sme, which can help mitigate some performance regressions in cases such
as scalable vector bitcasts (https://godbolt.org/z/exhe4jd8d).
2025-01-31 07:57:43 +00:00
Bill Wendling
cff0a460ae
[Clang][counted_by] Refactor __builtin_dynamic_object_size on FAMs (#122198)
Refactor how __builtin_dynamic_object_size() is calculated for
flexible array members (in preparation for adding support for the
'counted_by' attribute on pointers in structs).

The only functional change is that we build our calculations on the
already-emitted Expr code rather than re-emitting the Expr. That allows
the 'StructFieldAccess' visitor to sift through all casts and
ArraySubscriptExprs to find the first MemberExpr. We build our GEPs and
calculate offsets as relative distances from that MemberExpr.

The testcase passes execution tests.

Calculate the flexible array member's object size using these formulae
(note: if the calculation is negative, we return 0):

     struct p;
     struct s {
         /* ... */
         int count;
         struct p *array[] __attribute__((counted_by(count)));
     };   

1) 'ptr->array':

   count = ptr->count;

   flexible_array_member_base_size = sizeof (*ptr->array);
   flexible_array_member_size =
           count * flexible_array_member_base_size;

   if (flexible_array_member_size < 0) 
       return 0;
   return flexible_array_member_size;

2) '&ptr->array[idx]':

   count = ptr->count;
   index = idx; 

   flexible_array_member_base_size = sizeof (*ptr->array);
   flexible_array_member_size =
           count * flexible_array_member_base_size;

   index_size = index * flexible_array_member_base_size;

   if (flexible_array_member_size < 0 || index < 0) 
       return 0;
   return flexible_array_member_size - index_size;

3) '&ptr->field':

   count = ptr->count;
   sizeof_struct = sizeof (struct s);

   flexible_array_member_base_size = sizeof (*ptr->array);
   flexible_array_member_size =
           count * flexible_array_member_base_size;

   field_offset = offsetof (struct s, field);
   offset_diff = sizeof_struct - field_offset;

   if (flexible_array_member_size < 0) 
       return 0;
   return offset_diff + flexible_array_member_size;

4) '&ptr->field_array[idx]':

   count = ptr->count;
   index = idx; 
   sizeof_struct = sizeof (struct s);

   flexible_array_member_base_size = sizeof (*ptr->array);
   flexible_array_member_size =
           count * flexible_array_member_base_size;

   field_base_size = sizeof (*ptr->field_array);
   field_offset = offsetof (struct s, field);
   field_offset += index * field_base_size;

   offset_diff = sizeof_struct - field_offset;

   if (flexible_array_member_size < 0 || index < 0) 
       return 0;
   return offset_diff + flexible_array_member_size;
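A usage sketch for case 1, building on the declarations above (`make_s`
is a hypothetical allocator that stores 8 into `ptr->count`):

```
struct s *ptr = make_s(8);
/* Evaluates to 8 * sizeof(struct p *) per formula 1. */
size_t size = __builtin_dynamic_object_size(ptr->array, 1);
```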

---------

Signed-off-by: Bill Wendling <morbo@google.com>
2025-01-30 15:36:13 -08:00
Daniel Paoliello
845cc968e9
[clang][llvm][aarch64][win] Add a clang flag and module attribute for import call optimization, and remove LLVM flag (#122831)
Switches import call optimization from being enabled by an LLVM flag to
being enabled by a module attribute, and creates a new Clang flag that
will set that attribute. This addresses the concern raised in the original
PR:
<https://github.com/llvm/llvm-project/pull/121516#discussion_r1911763991>

This change also only creates the Called Global info if the module
attribute is present, addressing this concern:
<https://github.com/llvm/llvm-project/pull/122762#pullrequestreview-2547595934>
2025-01-30 09:51:43 -08:00
Thurston Dang
9c0606a08b
Reapply "[ubsan] Connect -fsanitize-skip-hot-cutoff to LowerAllowCheckPass<cutoffs>" (#125032) (#125037)
This reverts commit 928cad49beec0120686478f502899222e836b545 i.e.,
relands dccd27112722109d2e2f03e8da9ce8690f06e11b, with a fix to avoid
use-after-scope by changing the lambda to capture by value.
2025-01-30 09:37:16 -08:00
Thurston Dang
928cad49be
Revert "[ubsan] Connect -fsanitize-skip-hot-cutoff to LowerAllowCheckPass<cutoffs>" (#125032)
Reverts llvm/llvm-project#124857 due to buildbot breakage
(https://lab.llvm.org/buildbot/#/builders/46/builds/11310)
2025-01-29 22:03:05 -08:00
Thurston Dang
dccd271127
[ubsan] Connect -fsanitize-skip-hot-cutoff to LowerAllowCheckPass<cutoffs> (#124857)
This adds the plumbing between -fsanitize-skip-hot-cutoff (introduced in
https://github.com/llvm/llvm-project/pull/121619) and
LowerAllowCheckPass<cutoffs> (introduced in
https://github.com/llvm/llvm-project/pull/124211).
    
The net effect is that -fsanitize-skip-hot-cutoff now combines the
functionality of -ubsan-guard-checks and
-lower-allow-check-percentile-cutoff (though this patch does not remove
those yet), and generalizes the latter to allow per-sanitizer cutoffs.
    
Note: this patch replaces Intrinsic::allow_ubsan_check's
SanitizerHandler parameter with SanitizerOrdinal; this is necessary
because the hot cutoffs are specified in terms of SanitizerOrdinal
(e.g., null, alignment), not SanitizerHandler (e.g., TypeMismatch).
Likewise, CodeGenFunction::EmitCheck is changed to emit
allow_ubsan_check() for each individual check.

---------

Co-authored-by: Vitaly Buka <vitalybuka@gmail.com>
Co-authored-by: Vitaly Buka <vitalybuka@google.com>
2025-01-29 21:03:26 -08:00
Jason Rice
abc8812df0
[Clang][P1061] Add structured binding packs (#121417)
This is an implementation of P1061 (Structured Bindings can introduce a
Pack), without the ability to use packs outside of templates. There are a
couple of ways the AST could have been sliced, so let me know what you
think. The only part of this change that I am unsure of is the
The only part of this change that I am unsure of is the
serialization/deserialization stuff. I followed the implementation of
other Exprs, but I do not really know how it is tested. Thank you for
your time considering this.
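A sketch of the syntax this enables (packs only inside templates for
now):

```
template <typename T>
auto sum(T t) {
  auto [...elems] = t;  // structured binding introduces a pack
  return (... + elems); // fold over the introduced pack
}
```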

---------

Co-authored-by: Yanzuo Liu <zwuis@outlook.com>
2025-01-29 21:43:52 +01:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
Oliver Stannard
e9c2e0acd7
[AArch64] Match GCC behaviour for zero-size structs (#124760)
We had a test claiming that this empty struct type consumes a register
slot when passing it to a function with GCC, but that does not appear to
be the case, at least with GCC versions going back to 4.8.

This also caused a miscompilation when passing one of these structs to a
variadic function, but it turned out that our implementation of `va_arg`
matches GCC's ABI, so the one change fixes both bugs.
2025-01-29 15:02:37 +00:00
nerix
381218950e
[clang-cl]: generate debug info when novtable is specified (#124643)
When no vtable is emitted in the debug info because a record was marked
`__declspec(novtable)`, only a forward declaration of that type will be
emitted. This PR fixes that by not omitting the definition for the
`RecordDecl` in this case.

Fixes #124638.
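A sketch of the affected pattern:

```
// This record previously produced only a forward declaration in CodeView
// debug info because its vtable is never emitted.
struct __declspec(novtable) Base {
  virtual void f() = 0;
};
```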
2025-01-28 15:02:33 -08:00
Stephen Tozer
822f74a911
[Clang] Cleanup docs and comments relating to -fextend-variable-liveness (#124767)
This patch contains a number of changes relating to the above flag;
primarily it updates comment references to the old flag names,
"-fextend-lifetimes" and "-fextend-this-ptr" to refer to the new names,
"-fextend-variable-liveness[={all,this}]". These changes are all NFC.

This patch also removes the explicit -fextend-this-ptr-liveness flag
alias, and shortens the help-text for the main flag; these are both
changes that were meant to be applied in the initial PR (#110000), but
due to some user-error on my part they were not included in the merged
commit.
2025-01-28 18:25:32 +00:00
Joseph Huber
13dcc95dcd
[Offload] Rework offloading entry type to be more generic (#124018)
Summary:
The previous offloading entry type did not fit the current use-cases
very well. This widens it and adds a version to prevent further
annoyances. It also includes the kind to better sort who's using it.

The first 64 bytes are reserved as zero so the OpenMP runtime can detect
the old format for binary compatibility.
2025-01-28 07:26:13 -06:00
Stephen Tozer
8ad9e1ecb7 [Clang] Fix use of deprecated method and missing triple
Fixes two buildbot errors caused by 4424c44c (#110102):

The first error, seen on some sanitizer bots:
https://lab.llvm.org/buildbot/#/builders/51/builds/9901

The initial commit used the deprecated getDeclaration intrinsic instead
of the non-deprecated getOrInsert- equivalent. This patch trivially
updates the code in question to use the new intrinsic.

The second error, seen on the clang-armv8-quick bot:
https://lab.llvm.org/buildbot/#/builders/154/builds/10983

One of the tests depends on a particular triple to get the exact output
expected by the test, but did not specify this triple; this patch adds
the triple in question.
2025-01-28 13:21:41 +00:00
Wolfgang Pieb
4424c44c8c [Clang] Add fake use emission to Clang with -fextend-lifetimes (#110102)
Following the previous patch, which adds the "extend lifetimes" flag
without (almost) any functionality, this patch adds the real feature by
allowing Clang to emit fake uses. These are emitted via a new form of
cleanup, set for variable addresses, which emits a fake use intrinsic
when the variable falls out of scope. The code for achieving this is
simple, with most of the logic centered on determining whether to emit a
fake use for a given address, and on ensuring that fake uses are ignored
in a few cases.

Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
2025-01-28 12:30:31 +00:00
Nikita Popov
1295aa2e81
[Clang] Add -fwrapv-pointer flag (#122486)
GCC supports three flags related to overflow behavior:
 * `-fwrapv`: Makes signed integer overflow well-defined.
 * `-fwrapv-pointer`: Makes pointer overflow well-defined.
 * `-fno-strict-overflow`: Implies `-fwrapv -fwrapv-pointer`, making both
   signed integer overflow and pointer overflow well-defined.

Clang currently only supports `-fno-strict-overflow` and `-fwrapv`, but
not `-fwrapv-pointer`.

This PR proposes to introduce `-fwrapv-pointer` and adjust the semantics
of `-fwrapv` to match GCC.

This allows signed integer overflow and pointer overflow to be
controlled independently, while `-fno-strict-overflow` still exists to
control both at the same time (and that option is consistent across GCC
and Clang).
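A sketch of the difference (hypothetical function):

```
// Without -fwrapv-pointer this can fold to 'false' because pointer
// overflow is undefined; with the flag the wraparound check is preserved.
bool wraps(const char *p, unsigned long n) {
  return p + n < p;
}
```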
2025-01-28 09:57:00 +01:00
Adam Yang
aab25f20f6
[HLSL][SPIRV][DXIL] Implement WaveActiveMax intrinsic (#123428)
```
- add clang builtin to Builtins.td
- link builtin in hlsl_intrinsics
- add codegen for spirv intrinsic and two directx intrinsics to retain
  signedness information of the operands in CGBuiltin.cpp
- add semantic analysis in SemaHLSL.cpp
- add lowering of spirv intrinsic to spirv backend in
  SPIRVInstructionSelector.cpp
- add lowering of directx intrinsics to WaveActiveOp dxil op in DXIL.td
- add test cases to illustrate passes
```
Resolves #99170
2025-01-27 23:26:56 -08:00
Momchil Velikov
f75860f895
[AArch64] Implement NEON FP8 intrinsics for fused multiply-add (#123615)
This patch adds the following intrinsics:

* Fused multiply-add non-indexed

float16x8_t vmlalbq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t, fpm_t)
float16x8_t vmlaltq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t, fpm_t)

float32x4_t vmlallbbq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t)
float32x4_t vmlallbtq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t)
float32x4_t vmlalltbq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t)
float32x4_t vmlallttq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t)

* Floating-point multiply-add long to half-precision (vector, by element)

float16x8_t vmlalbq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vmlalbq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vmlaltq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vmlaltq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)

* Floating-point multiply-add long-long to single-precision (vector, by element)

float32x4_t vmlallbbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallbbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallbtq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallbtq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlalltbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlalltbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallttq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallttq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
2025-01-28 00:38:44 +00:00
Momchil Velikov
804b81d39f
[AArch64] Add FP8 Neon intrinsics for dot-product (#123613)
This patch adds the following intrinsics:

float16x4_t vdot_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x8_t vm, fpm_t fpm)
float16x8_t vdotq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm)

float16x4_t vdot_lane_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x4_t vdot_laneq_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vdotq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vdotq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)

float32x2_t vdot_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x8_t vm, fpm_t fpm)
float32x4_t vdotq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm)

float32x2_t vdot_lane_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x2_t vdot_laneq_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vdotq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vdotq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
2025-01-27 21:14:16 +00:00
Jeremy Morse
285009f202 [NFC][DebugInfo] Rewrite more call-sites to insert with iterators (#124288)
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators. The
call-sites updated in this patch are those where the dyn_cast_or_null cast
utility doesn't compose well with iterator insertion. It can distinguish
between nullptr and a "present" (non-null) Instruction pointer, but not
between a legal and illegal instruction iterator. This can lead to
end-iterator dereferences and thus crashes.

We can improve this in the future (as parent-pointers can now be accessed
from ilist nodes), but for the moment, add explicit tests for end()
iterators at the five call sites affected by this.
2025-01-27 20:30:45 +00:00
cor3ntin
a85b2dc45a
[Clang] only inherit the parent eval context inside of lambdas (#124426)
As we create default constructors lazily, we should not inherit from the
parent evaluation context.
However, we need to make an exception for lambdas (in particular their
conversion operators, which are also implicitly defined).

As a drive-by, we introduce a generic way to query whether a function is
a member of a lambda.

This fixes a regression introduced by baf6bd3.

Fixes #118000
2025-01-27 21:30:29 +01:00
Momchil Velikov
99bd2e3f12
[AArch64] Add Neon FP8 conversion intrinsics (#123612)
The patch adds the following intrinsics:

    bfloat16x8_t vcvt1_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
    bfloat16x8_t vcvt1_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    bfloat16x8_t vcvt2_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
    bfloat16x8_t vcvt2_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    
    bfloat16x8_t vcvt1_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    bfloat16x8_t vcvt2_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    
    float16x8_t vcvt1_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
    float16x8_t vcvt1_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    float16x8_t vcvt2_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
    float16x8_t vcvt2_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    
    float16x8_t vcvt1_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    float16x8_t vcvt2_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    
    mfloat8x8_t vcvt_mf8_f32_fpm(float32x4_t vn, float32x4_t vm, fpm_t fpm)
    mfloat8x16_t vcvt_high_mf8_f32_fpm(mfloat8x8_t vd, float32x4_t vn, float32x4_t vm, fpm_t fpm)

    mfloat8x8_t vcvt_mf8_f16_fpm(float16x4_t vn, float16x4_t vm, fpm_t fpm)
    mfloat8x16_t vcvtq_mf8_f16_fpm(float16x8_t vn, float16x8_t vm, fpm_t fpm)

Co-Authored-By: Caroline Concatto <caroline.concatto@arm.com>
2025-01-27 17:32:47 +00:00
Jeremy Morse
e14962a39c
[NFC][DebugInfo] Use iterators for instruction insertion in more places (#124291)
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators.
This patch changes some more complex call-sites, those crossing file
boundaries and where I've had to perform some minor rewrites.
2025-01-27 15:25:17 +00:00
Momchil Velikov
f95a8bde34
[AArch64] Refactor implementation of FP8 types (NFC) (#123604)
- The FP8 scalar type (`__mfp8`) was described as a vector type
- The FP8 vector types were described/assumed to have integer element
type (the element type ought to be `__mfp8`)
- Add support for `m` type specifier (denoting `__mfp8`) in
`DecodeTypeFromStr` and create builtin function prototypes using that
specifier, instead of `int8_t`
2025-01-27 14:31:41 +00:00
Momchil Velikov
87103a016f
[AArch64] Implement NEON FP8 vectors as VectorType (#123603)
Reimplement Neon FP8 vector types using the attribute `neon_vector_type`
instead of having them as builtin types.
This makes it possible to implement FP8 Neon intrinsics without adding
special cases for these types when using `__builtin_shufflevector`
or bitcast (using the C-style cast operator) between vectors, both of
which are used extensively in the generated code in `arm_neon.h`.
2025-01-27 10:41:53 +00:00
Helena Kotas
d92bac8a3e
[HLSL] Introduce address space hlsl_constant(2) for constant buffer declarations (#123411)
Introduces a new address space `hlsl_constant(2)` for constant buffer
declarations.

This address space is applied to declarations inside `cbuffer` block.
Later on, it will also be applied to `ConstantBuffer<T>` syntax and the
default `$Globals` constant buffer.

Clang codegen translates constant buffer declarations to global
variables and loads from the `hlsl_constant(2)` address space. Follow-up
work will add metadata that maps these globals to individual constant
buffers and enables their transformation to the appropriate constant
buffer load intrinsics in a later LLVM pass.

Fixes #123406
2025-01-24 16:48:35 -08:00
Jeremy Morse
6292a808b3
[NFC][DebugInfo] Use iterator-flavour getFirstNonPHI at many call-sites (#123737)
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; to ensure some type safety, however, we'd like to
have all calls to getFirstNonPHI use the iterator-returning version.

This patch changes a bunch of call-sites calling getFirstNonPHI to use
getFirstNonPHIIt, which returns an iterator. All these call sites are
where it's obviously safe to fetch the iterator then dereference it. A
follow-up patch will contain less-obviously-safe changes.

We'll eventually deprecate and remove the instruction-pointer
getFirstNonPHI, but not before adding concise documentation of what
considerations are needed (very few).
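The preferred pattern, roughly (a sketch assuming LLVM headers):

```
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/IRBuilder.h"

// Position a builder with the iterator-returning getFirstNonPHIIt(); the
// iterator carries the debug-info bit that a raw Instruction* loses.
void insertAtTop(llvm::BasicBlock &BB) {
  llvm::IRBuilder<> Builder(&BB, BB.getFirstNonPHIIt());
  Builder.CreateUnreachable(); // example insertion
}
```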

---------

Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
2025-01-24 13:27:56 +00:00
Jeremy Morse
8e70273509
[NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583)
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; to ensure some type safety, however, we'd like to
have all calls to moveBefore use iterators.

This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.

We'll eventually deprecate and remove the instruction-pointer
insertBefore, but not before adding concise documentation of what
considerations are needed (very few).
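The call-site change is mechanical (a sketch assuming LLVM headers):

```
#include "llvm/IR/Instruction.h"

// Move I before Pos via the guaranteed-dereferenceable iterator overload,
// obtained with getIterator(), instead of passing a raw Instruction*.
void hoistBefore(llvm::Instruction &I, llvm::Instruction &Pos) {
  I.moveBefore(Pos.getIterator());
}
```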
2025-01-24 10:53:11 +00:00
Phoebe Wang
ee2722fc88
[X86][AVX10.2-BF16] Remove [NE]P from intrinsic and instruction name (#123335)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2025-01-24 15:49:28 +08:00
Kazu Hirata
113e1fdc8c
[CodeGen] Migrate away from PointerUnion::dyn_cast (NFC) (#124076)
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect Pos to be nonnull.
2025-01-23 08:45:26 -08:00
Finn Plummer
0fe8e70c66
Revert "Reland "[HLSL] Implement the reflect HLSL function"" (#124046)
Reverts llvm/llvm-project#123853

The introduction of `reflect-error.ll` surfaced a bug with the use of
`report_fatal_error` in `SPIRVInstructionSelector` that was propagated
into the PR. This has caused a buildbot breakage, and the work to solve
the underlying issue is tracked here:
https://github.com/llvm/llvm-project/issues/124045. We can re-apply this
commit when the underlying issue is resolved.
2025-01-22 18:22:03 -08:00
Ben Langmuir
9fbf5cfebc
[clang][modules] Partially revert 48d0eb518 to fix -gmodules output (#124003)
With the changes in 48d0eb518, the CodeGenOptions used to emit .pcm
files with -fmodule-format=obj (-gmodules) were the ones from the
original invocation, rather than the ones specifically crafted for
outputting the pcm. This was causing the pcm to be written with only the
debug info and without the __clangast section in some cases (e.g. -O2).
This unfortunately was not covered by existing tests, because compiling
and loading a module within a single compilation loads the AST content
from the in-memory module cache rather than reading it from the pcm file
that was written. This broke bootstrapping a build of clang with modules
enabled on Darwin.

rdar://143418834
2025-01-22 16:24:56 -08:00
Tom Honermann
8fb42300a0
[SYCL] AST support for SYCL kernel entry point functions. (#122379)
A SYCL kernel entry point function is a non-member function or a static
member function declared with the `sycl_kernel_entry_point` attribute.
Such functions define a pattern for an offload kernel entry point
function to be generated to enable execution of a SYCL kernel on a
device. A SYCL library implementation orchestrates the invocation of
these functions with corresponding SYCL kernel arguments in response to
calls to SYCL kernel invocation functions specified by the SYCL 2020
specification.

The offload kernel entry point function (sometimes referred to as the
SYCL kernel caller function) is generated from the SYCL kernel entry
point function by a transformation of the function parameters followed
by a transformation of the function body to replace references to the
original parameters with references to the transformed ones. Exactly how
parameters are transformed will be explained in a future change that
implements non-trivial transformations. For now, it suffices to state
that a given parameter of the SYCL kernel entry point function may be
transformed to multiple parameters of the offload kernel entry point as
needed to satisfy offload kernel argument passing requirements.
Parameters that are decomposed in this way are reconstituted as local
variables in the body of the generated offload kernel entry point
function.

For example, given the following SYCL kernel entry point function
definition:
```
template<typename KernelNameType, typename KernelType>
[[clang::sycl_kernel_entry_point(KernelNameType)]]
void sycl_kernel_entry_point(KernelType kernel) {
  kernel();
}
```

and the following call:
```
struct Kernel {
  int dm1;
  int dm2;
  void operator()() const;
};
Kernel k;
sycl_kernel_entry_point<class kernel_name>(k);
```

the corresponding offload kernel entry point function that is generated
might look as follows (assuming `Kernel` is a type that requires
decomposition):
```
void offload_kernel_entry_point_for_kernel_name(int dm1, int dm2) {
  Kernel kernel{dm1, dm2};
  kernel();
}
```

Other details of the generated offload kernel entry point function, such
as its name and calling convention, are implementation details that need
not be reflected in the AST and may differ across target devices. For
that reason, only the transformation described above is represented in
the AST; other details will be filled in during code generation.

These transformations are represented using new AST nodes introduced
with this change. `OutlinedFunctionDecl` holds a sequence of
`ImplicitParamDecl` nodes and a sequence of statement nodes that
correspond to the transformed parameters and function body.
`SYCLKernelCallStmt` wraps the original function body and associates it
with an `OutlinedFunctionDecl` instance. For the example above, the AST
generated for the `sycl_kernel_entry_point<kernel_name>` specialization
would look as follows:
```
FunctionDecl 'sycl_kernel_entry_point<kernel_name>(Kernel)'
  TemplateArgument type 'kernel_name'
  TemplateArgument type 'Kernel'
  ParmVarDecl kernel 'Kernel'
  SYCLKernelCallStmt
    CompoundStmt
      <original statements>
    OutlinedFunctionDecl
      ImplicitParamDecl 'dm1' 'int'
      ImplicitParamDecl 'dm2' 'int'
      CompoundStmt
        VarDecl 'kernel' 'Kernel'
          <initialization of 'kernel' with 'dm1' and 'dm2'>
        <transformed statements with redirected references of 'kernel'>
```

Any ODR-use of the SYCL kernel entry point function will (with future
changes) suffice for the offload kernel entry point to be emitted. An
actual call to the SYCL kernel entry point function will result in a
call to the function. However, evaluation of a `SYCLKernelCallStmt`
statement is a no-op, so such calls will have no effect other than to
trigger emission of the offload kernel entry point.

Additionally, as a related change inspired by code review feedback,
these changes disallow use of the `sycl_kernel_entry_point` attribute
with functions defined with a _function-try-block_. The SYCL 2020
specification prohibits the use of C++ exceptions in device functions.
Even if exceptions were not prohibited, it is unclear what the semantics
would be for an exception that escapes the SYCL kernel entry point
function; the boundary between host and device code could be an implicit
noexcept boundary that results in program termination if violated, or
the exception could perhaps be propagated to host code via the SYCL
library. Pending support for C++ exceptions in device code and clear
semantics for handling them at the host-device boundary, this change
makes use of the `sycl_kernel_entry_point` attribute with a function
defined with a _function-try-block_ an error.
2025-01-22 16:39:08 -05:00
Deric Cheung
2656928d0c
Reland "[HLSL] Implement the reflect HLSL function" (#123853)
This PR relands
[#122992](https://github.com/llvm/llvm-project/pull/122992).

Some machines were failing to run the `reflect-error.ll` test due to the
RUN lines
```llvm
; RUN: not %if spirv-tools %{ llc -O0 -mtriple=spirv64-unknown-unknown %s -o /dev/null 2>&1 -filetype=obj %}
; RUN: not %if spirv-tools %{ llc -O0 -mtriple=spirv32-unknown-unknown %s -o /dev/null 2>&1 -filetype=obj %}
```
which failed when `spirv-tools` was not present on the machine due to
running the command `not` without any arguments.

These RUN lines have been removed since they don't actually test
anything new compared to the other two RUN lines due to the expected
error during instruction selection.
```llvm
; RUN: not llc -verify-machineinstrs -O0 -mtriple=spirv64-unknown-unknown %s -o /dev/null 2>&1 | FileCheck %s
; RUN: not llc -verify-machineinstrs -O0 -mtriple=spirv32-unknown-unknown %s -o /dev/null 2>&1 | FileCheck %s
```
2025-01-22 13:29:19 -08:00
Helena Kotas
719f0d9253
[HLSL] Fix global resource initialization (#123394)
Create a separate resource initialization function for each resource and
add them to CodeGenModule's `CXXGlobalInits` list.
Fixes #120636 and addresses this
[comment](https://github.com/llvm/llvm-project/pull/119755/files#r1894093603).
2025-01-22 12:39:35 -08:00
Andy Kaylor
7bf188fa99
[NFC] Minor fix to tryEmitAbstract type in EmitCXXNewAllocSize (#123433)
In EmitCXXNewAllocSize, when handling a constant array size, we were
calling tryEmitAbstract with the type of the object being allocated rather
than the expected type of the array size. This worked out because the
allocated type was always a pointer and tryEmitAbstract only ends up
using the size of the type to extend or truncate the constant, and in this
case the destination type should be size_t, which is usually the same
width as the pointer. This change fixes the type, but it makes no
functional difference with the current constant emitter implementation.
2025-01-22 10:11:53 -08:00