548631 Commits

Author SHA1 Message Date
Valentin Clement (バレンタイン クレメン)
ffe4870472
[flang][cuda] Add interfaces for __float2int_rX and __float2uint_rX (#153691) 2025-08-14 23:11:45 +00:00
LLVM GN Syncbot
47bc6acf86 [gn build] Port d56fa965243b 2025-08-14 22:56:30 +00:00
Valentin Clement (バレンタイン クレメン)
602f308d4f
[flang][cuda] Add interface for __saturatef (#153705) 2025-08-14 15:55:17 -07:00
Stanislav Mekhanoshin
a629119c75
[AMDGPU] Remove wave64 functions (#153690)
gfx1250 only supports wave32.
2025-08-14 15:54:33 -07:00
Valentin Clement (バレンタイン クレメン)
2775c79c4f
[flang][cuda] Add interfaces for __float2ll_rX (#153702) 2025-08-14 15:44:52 -07:00
joaosaffran
d56fa96524
[DirectX] Add Range Overlap validation (#152229)
As part of the Root Signature Spec, we need to validate that Root
Signatures do not define overlapping ranges.
Closes: https://github.com/llvm/llvm-project/issues/126645

---------

Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
Co-authored-by: Joao Saffran <{ID}+{username}@users.noreply.github.com>
Co-authored-by: Joao Saffran <jderezende@microsoft.com>
2025-08-14 18:40:11 -04:00
Valentin Clement (バレンタイン クレメン)
ca9ddd54b7
[flang][cuda] Add interfaces for __ll2float_rX (#153694) 2025-08-14 15:35:02 -07:00
Daniel Paoliello
fc4df2c917
[win][arm64ec] XFAIL x64 intrinsic tests on Arm64EC (#153474)
Clang defines the x64 preprocessor macro (`__x86_64__`) when building
Arm64EC, however the tests for x64 built-ins and intrinsics are
currently failing since the relevant functions don't exist, resulting in
errors like:

```
Line 165: invalid conversion between vector type '__v2di' (vector of 2 'long long' values) and integer type 'int' of different size
```

(Clang doesn't know the intrinsics being called, and so treats it like
an undefined function, which makes it assume the return type is `int`)

For now, expect these tests to fail until someone decides to implement
these intrinsics.
2025-08-14 15:29:20 -07:00
Stanislav Mekhanoshin
57c1e01e48
[AMDGPU] Don't allow wgp mode on gfx1250 (#153680)
- gfx1250 only supports cu mode
2025-08-14 15:16:56 -07:00
Andy Kaylor
a1529cd85a
[CIR] Add index support for global_view (#153254)
The #cir.global_view attribute was initially added without support for
the optional index list. This change adds index list support. This is
used when the address of an array or structure member is used as an
initializer.

This patch does not include support for taking the address of a
structure or class member. That will be added later.
2025-08-14 15:14:12 -07:00
Valentin Clement (バレンタイン クレメン)
df15c0d716
[flang][cuda] Add interfaces for __dsqrt_rn and __dsqrt_rz (#153624) 2025-08-14 22:08:33 +00:00
DeanSturtevant1
cb2f0d0a5f
[bazel] Fix mlir/BUILD.bazel for VectorToXeGPU. (#153696) 2025-08-14 15:03:41 -07:00
Craig Topper
defbbf0129
[RISCV][MoveMerge] Don't copy kill flag when moving past an instruction that reads the register. (#153644)
If we're moving the second copy before another instruction that reads
the copied register, we need to clear the kill flag on the combined
move.

Fixes #153598.
2025-08-14 14:52:54 -07:00
Valentin Clement (バレンタイン クレメン)
b989c7c2e0
[flang][cuda] Add interfaces for __drcp_rX (#153681) 2025-08-14 21:44:47 +00:00
DeanSturtevant1
4e63d704e8
Fix mlir/BUILD.bazel for XeGPUUtils. (#153689) 2025-08-14 14:32:18 -07:00
Alex Bradbury
db5f7dc374 Revert "[SLP]Support LShr as base for copyable elements"
This reverts commit ca4ebf95172d24f8c47655709b2c9eb85bda5cb2.

Causes compile-time crashes for some inputs with RVV zvl512b/zvl1024b
configurations. See here for a minimal reproducer:
https://github.com/llvm/llvm-project/pull/153393#issuecomment-3189898813
2025-08-14 22:18:24 +01:00
Valentin Clement (バレンタイン クレメン)
06590444f5
[flang][cuda] Add bind names for __double2ull_rX interfaces (#153678) 2025-08-14 21:10:20 +00:00
David Green
5836bae463
[AArch64] Change the cost of fma and fmuladd to match fmul. (#152963)
As fmul and fmadd are so similar, their performance characteristics tend
to be the same on most platforms, at least in terms of reciprocal
throughputs. Processors capable of performing a given number of fmul per
cycle can usually perform the same number of fma, with the extra add
being relatively simple on top. This patch makes the scores of the two
operations the same, which brings the throughput cost of a fma/fmuladd
to 2, and the latency to 3, which are the defaults for fmul.

Note that we might also want to change the throughput cost of a fmul to
1, as most processors have ample bandwidth for them, but they should
still stay in-line with one another.
2025-08-14 21:53:45 +01:00
Morris Hafner
e56ae9651b
[CIR][NFC] Add Symbol Table to CIRGenFunction (#153625)
This patch adds a symbol table to CIRGenFunction, plus scopes and
insertions to the table where we were missing them previously.
2025-08-14 22:53:09 +02:00
Bill Wendling
1e9fc8edd0 [Clang][attr] Add '-std=c11' to allow for typedef redefinition 2025-08-14 13:51:58 -07:00
Zhaoxuan Jiang
76dd742f7b
[CGData] Lazy loading support for stable function map (#151660)
The stable function map could be huge for a large application. Fully
loading it is slow and consumes a significant amount of memory, which is
unnecessary and drastically slows down compilation especially for
non-LTO and distributed-ThinLTO setups. This patch introduces opt-in
lazy loading support for the stable function map. The detailed changes
are:

- `StableFunctionMap`
- The map now stores entries in an `EntryStorage` struct, which includes
offsets for serialized entries and a `std::once_flag` for thread-safe
lazy loading.
- The underlying map type is changed from `DenseMap` to
`std::unordered_map` for compatibility with `std::once_flag`.
- `contains()`, `size()` and `at()` are implemented to only load
requested entries on demand.

- Lazy Loading Mechanism
- When reading indexed codegen data, if the newly-introduced
`-indexed-codegen-data-lazy-loading` flag is set, the stable function
map is not fully deserialized up front. The binary format for the stable
function map now includes offsets and sizes to support lazy loading.
- The safety of lazy loading is guarded by the once flag per function
hash. This guarantees that even in a multi-threaded environment, the
deserialization for a given function hash will happen exactly once. The
first thread to request it performs the load, and subsequent threads
will wait for it to complete before using the data. For single-threaded
builds, the overhead is negligible (a single check on the once flag).
For multi-threaded scenarios, users can omit the flag to retain the
previous eager-loading behavior.
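
A minimal sketch of the once-flag-per-entry idea described above, not the
actual `StableFunctionMap` code. `Entry`, `LazyMap`, `addOffset` and the
`Loader` callback are hypothetical names used for illustration only. It
also shows why a node-based `std::unordered_map` fits: `std::once_flag`
can be neither copied nor moved, so the value has to be constructed in
place and stay put.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a deserialized entry; not the real LLVM type.
struct Entry {
  std::string Payload;
};

// One slot per function hash: where the serialized bytes live, plus a
// once_flag guarding the first (and only) deserialization.
struct EntryStorage {
  std::uint64_t Offset = 0;
  std::once_flag Loaded;
  Entry Value;
};

class LazyMap {
public:
  using Loader = std::function<Entry(std::uint64_t Offset)>;

  explicit LazyMap(Loader L) : Load(std::move(L)) {}

  // Records where an entry lives without deserializing it.
  void addOffset(std::uint64_t Hash, std::uint64_t Offset) {
    Slots[Hash].Offset = Offset;
  }

  bool contains(std::uint64_t Hash) const { return Slots.count(Hash) != 0; }

  // Deserializes the entry on first access; later calls reuse the result.
  const Entry &at(std::uint64_t Hash) {
    auto It = Slots.find(Hash);
    assert(It != Slots.end() && "unknown function hash");
    EntryStorage &S = It->second;
    std::call_once(S.Loaded, [&] { S.Value = Load(S.Offset); });
    return S.Value;
  }

private:
  Loader Load;
  std::unordered_map<std::uint64_t, EntryStorage> Slots;
};
```

In this sketch, concurrent `at()` calls for the same hash run the loader
once and the other callers block until it finishes, provided all
`addOffset()` calls have completed beforehand.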
2025-08-14 13:49:09 -07:00
Valentin Clement (バレンタイン クレメン)
bad3df4764
[flang][cuda] Add bind names for __double2ll_rX interfaces (#153660) 2025-08-14 13:34:25 -07:00
Jonas Devlieghere
52c9489d1d
[lldb] Use the Python limited API with SWIG 4.2 or later (#153119) (#153472)
Use the Python limited API when building with SWIG 4.2 or later.
2025-08-14 15:28:02 -05:00
Florian Hahn
8a0c7e9b32
[LV] Regenerate some more tests. 2025-08-14 21:21:03 +01:00
Stanislav Mekhanoshin
6b316ecb5f
[AMDGPU] Encode NV bit in VIMAGE/VSAMPLE. NFC (#153654)
This is NFC as this target does not have it.
2025-08-14 13:19:38 -07:00
Erich Keane
e5e3e4bdb5
[OpenACC] Add firstprivate recipe helper methods to ACC dialect (#153604)
Like we did for the 'private' clause, this adds an easier to use helper
function to add the 'firstprivate' clause + recipe to the Parallel and
Serial ops.
2025-08-14 13:07:59 -07:00
Bill Wendling
aa4805a090
[Clang][attr] Add 'cfi_salt' attribute (#141846)
The 'cfi_salt' attribute specifies a string literal that is used as a
"salt" for Control-Flow Integrity (CFI) checks to distinguish between
functions with the same type signature. This attribute can be applied
to function declarations, function definitions, and function pointer
typedefs.

This attribute prevents function pointers from being replaced with
pointers to functions that have a compatible type, which can be a CFI
bypass vector.

The attribute affects type compatibility during compilation and CFI
hash generation during code generation.

  Attribute syntax: [[clang::cfi_salt("<salt_string>")]]
  GNU-style syntax: __attribute__((cfi_salt("<salt_string>")))

- The attribute takes a single string of non-NULL ASCII characters.
- It only applies to function types; using it on a non-function type
  will generate an error.
- All function declarations and the function definition must include
  the attribute and use identical salt values.

Example usage:

  // Header file:
  #define __cfi_salt(S) __attribute__((cfi_salt(S)))

  // Convenient typedefs to avoid nested declarator syntax.
  typedef int (*fp_unsalted_t)(void);
  typedef int (*fp_salted_t)(void) __cfi_salt("pepper");

  struct widget_generator {
    fp_unsalted_t init;     // Regular CFI.
    fp_salted_t exec;       // Salted CFI.
    fp_unsalted_t teardown; // Regular CFI.
  };

  // bar.c file:
  static int bar_init(void) { ... }
  static int bar_salted_exec(void) __cfi_salt("pepper") { ... }
  static int bar_teardown(void) { ... }

  static struct widget_generator _generator = {
    .init = bar_init,
    .exec = bar_salted_exec,
    .teardown = bar_teardown,
  };

  struct widget_generator *widget_gen = &_generator;

  // 2nd .c file:
  int generate_a_widget(void) {
    int ret;

    // Called with non-salted CFI.
    ret = widget_gen->init();
    if (ret)
      return ret;

    // Called with salted CFI.
    ret = widget_gen->exec();
    if (ret)
      return ret;

    // Called with non-salted CFI.
    return widget_gen->teardown();
  }

Link: https://github.com/ClangBuiltLinux/linux/issues/1736
Link: https://github.com/KSPP/linux/issues/365

---------

Signed-off-by: Bill Wendling <morbo@google.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
2025-08-14 13:07:38 -07:00
CatherineMoore
5479b7ed42
[OpenMP] Update printf stmt in kmp_settings.cpp (#152800)
Remove extraneous argument from printf statement

---------

Co-authored-by: Joachim <protze@rz.rwth-aachen.de>
2025-08-14 20:04:03 +00:00
Stanislav Mekhanoshin
49f2093477
[AMDGPU] Increase LDS to 320K on gfx1250 (#153645) 2025-08-14 12:52:00 -07:00
Michael Berg
334a046a3c
[LoopDist] Consider reads and writes together for runtime checks (#145623)
Emit safety guards for ptr accesses when cross partition loads exist
which have a corresponding store to the same address in a different
partition. This will emit the necessary ptr checks for these accesses.

The test case was obtained from SuperTest, which SiFive runs regularly.
We enabled LoopDistribution by default in our downstream compiler; this
change was part of that enablement.
2025-08-14 12:50:17 -07:00
Matheus Izvekov
eeada0d30f
[clang] fix source range computation for DeducedTemplateSpecializationType (#153646)
This was a regression introduced in
https://github.com/llvm/llvm-project/pull/147835

Since this regression was never released, there are no release notes.

Fixes https://github.com/llvm/llvm-project/issues/153540
2025-08-14 16:42:34 -03:00
Mircea Trofin
a508ea2ad7
Add dependency on ProfileData from ScalarOpts (#153651)
Fixing buildbot failures after PR #153305, e.g.
https://lab.llvm.org/buildbot/#/builders/203/builds/19861

Analysis already depends on `ProfileData`, so the transitive closure of
the dependencies of `ScalarOpts` doesn't change.

Also avoided an extra (and very unnecessary) dependency on
`Instrumentation`. The API previously used doesn't need to live in
Instrumentation to begin with, but that's something to address in a
follow-up.
2025-08-14 12:37:17 -07:00
Abhinav Gaba
2912c9c249
[NFC][Offload] Add missing maps to OpenMP offloading tests. (#153103)
A few tests were only mapping a pointee, like: `map(pp[0][0])`, on an
`int** pp`, but expecting the pointers, like `pp`, `pp[0]` to also be
mapped, which is incorrect.

This change fixes six such tests.
2025-08-14 12:22:28 -07:00
Erick Velez
4f007041a8
[clang-doc] place HTML/JSON output inside their own directories (#150655)
Instead of just outputting everything into the designated root folder,
HTML and JSON output will be placed in html/ and json/ directories.
2025-08-14 12:21:40 -07:00
Kaitlin Peng
cbfc22c06b
Fix typo in step intrinsic comment (#153642)
`y` should be the first argument and `x` should be the second, otherwise
the formula is wrong. This also matches the documentation
[here](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-step).
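
For reference, a scalar C++ model of the argument order in question. This
is an illustrative sketch, not the HLSL header code; `stepScalar` is a
hypothetical name. Per the linked documentation, `step(y, x)` yields 1
when `x >= y` and 0 otherwise.

```cpp
// Scalar model of HLSL's step(y, x): the threshold y comes first,
// the value x being tested comes second.
constexpr float stepScalar(float y, float x) { return x >= y ? 1.0f : 0.0f; }
```

Swapping the two arguments flips the result whenever `x != y`, which is
why the order in the comment matters.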
2025-08-14 12:02:34 -07:00
Mircea Trofin
016c301d30
[NFC] Use [[maybe_unused]] for variable used in assertion (#153639) 2025-08-14 18:52:56 +00:00
Jonas Devlieghere
b62b65a95f
[lldb] Use (only) PyImport_AppendInittab to patch readline (#153329)
The current implementation tries to (1) patch the existing readline
module definition if it's already present in the inittab and (2) append
our patched readline module to the inittab. The former (1) uses the
non-stable Python API and I can't find a situation where this is
necessary. 

We do this work before initialization, so for the readline
module to exist, it either needs to be added by Python itself (which
doesn't seem to be the case), or someone would have had to have added it
without initializing.
2025-08-14 13:47:48 -05:00
Valentin Clement (バレンタイン クレメン)
20a829937c
[flang][cuda] Add interfaces for __expf and __exp10f (#153633) 2025-08-14 11:36:55 -07:00
Florian Hahn
db98ac43ec
[LV] Use shl for ((VF * Step) * vscale) in createStepForVF. (#153495)
Directly emit shl instead of a multiply if VF * Step is a power-of-2. The
main motivation here is to prepare the code and test for directly
generating and expanding a SCEV expression of the minimum iteration
count. SCEVExpander will directly emit shl for multiplies with
powers-of-2.

InstCombine also performs this combine, so end-to-end this should
effectively be NFC.
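
The arithmetic behind the change, as a small sketch on plain integers
rather than IR values. The helper names are hypothetical and this is not
the `createStepForVF` implementation: when `VF * Step` is a power of 2,
multiplying `vscale` by it equals shifting `vscale` left by its log2.

```cpp
#include <cstdint>

constexpr bool isPowerOf2(std::uint64_t V) {
  return V != 0 && (V & (V - 1)) == 0;
}

// log2 of a power of 2, i.e. the shift amount that replaces the multiply.
constexpr unsigned log2Exact(std::uint64_t V) {
  unsigned Shift = 0;
  while (V > 1) {
    V >>= 1;
    ++Shift;
  }
  return Shift;
}

// Computes (VF * Step) * VScale, using a shift when the factor allows it.
constexpr std::uint64_t stepForVF(std::uint64_t VScale, std::uint64_t VF,
                                  std::uint64_t Step) {
  const std::uint64_t Factor = VF * Step;
  return isPowerOf2(Factor) ? VScale << log2Exact(Factor) : VScale * Factor;
}
```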

PR: https://github.com/llvm/llvm-project/pull/153495
2025-08-14 19:27:51 +01:00
Jianhui Li
98728d9dc8
[MLIR][XeGPU] Add lowering from transfer_read/transfer_write to load_gather/store_scatter (#152429)
Lower transfer_read/transfer_write to load_gather/store_scatter when
the target uArch doesn't support load_nd/store_nd. The high-level
steps:
  1. compute the strides;
  2. compute the offsets;
  3. collapse the memref to 1D (collapseMemrefTo1D);
  4. create the load_gather or store_scatter op.
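
A rough sketch of steps 1 and 2 on plain integers, assuming a contiguous
row-major 2-D buffer and static shapes. The function names are
hypothetical and the real lowering operates on MLIR values, so treat this
only as an illustration of the index arithmetic: once the strides are
known, each element of the tile gets a linear offset into the collapsed
1-D buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Strides (in elements) of a contiguous row-major buffer of the given shape.
std::vector<std::int64_t>
rowMajorStrides(const std::vector<std::int64_t> &Shape) {
  std::vector<std::int64_t> Strides(Shape.size(), 1);
  for (std::size_t I = Shape.size(); I-- > 1;)
    Strides[I - 1] = Strides[I] * Shape[I];
  return Strides;
}

// Linear offsets into the collapsed 1-D buffer for a Rows x Cols tile whose
// first element sits at (Row0, Col0) of a 2-D buffer.
std::vector<std::int64_t> tileOffsets(const std::vector<std::int64_t> &Strides,
                                      std::int64_t Row0, std::int64_t Col0,
                                      std::int64_t Rows, std::int64_t Cols) {
  assert(Strides.size() == 2 && "sketch only handles 2-D buffers");
  std::vector<std::int64_t> Offsets;
  for (std::int64_t R = 0; R < Rows; ++R)
    for (std::int64_t C = 0; C < Cols; ++C)
      Offsets.push_back((Row0 + R) * Strides[0] + (Col0 + C) * Strides[1]);
  return Offsets;
}
```

The resulting offset vector is the kind of per-element index a gather or
scatter consumes.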
2025-08-14 11:27:07 -07:00
Thurston Dang
37cc010b91
[asan] Fix-forward undefined type in test from #153142 (#153636)
Fix Mac build breakage (reported by aeubanks in
https://github.com/llvm/llvm-project/pull/153142#issuecomment-3189202274)
by including stdint.h and using uintptr_t
2025-08-14 11:20:11 -07:00
Mircea Trofin
f5d284309f
[JTS] Propagate profile info (#153305)
If the indirect call target being recognized as a jump table has profile info, we can accurately synthesize the branch weights of the switch that replaces the indirect call.

Otherwise we insert the "unknown" `MD_prof` to indicate this is the best we can do here.

Part of Issue #147390
2025-08-14 11:17:57 -07:00
Min-Yih Hsu
c202d2f515
[IA][RISCV] Recognizing gap masks assembled from bitwise AND (#153324)
For a deinterleaved masked.load / vp.load, if its mask, `%c`, is
synthesized by the following snippet:
```
%m = shufflevector %s, poison, <0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3>
%g = <1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0>
%c = and %m, %g
```
Then we know that `%g` is the gap mask and `%s` is the mask for each
field / component. This patch teaches the InterleavedAccess pass to
recognize such patterns.
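
To make the shape of the pattern concrete, here is a sketch that builds
the same three vectors on plain integers. The function name and the 0/1
`int` representation are illustrative assumptions, not the pass's code:
`S` holds one mask bit per segment, `FieldEnabled` holds one bit per
field, and the result corresponds to `%c`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds the combined mask for an interleaved access with
// Factor == FieldEnabled.size(): each bit of S is replicated once per
// field (the shufflevector, %m) and ANDed with the repeating per-field
// gap mask (%g).
std::vector<int> combinedMask(const std::vector<int> &S,
                              const std::vector<int> &FieldEnabled) {
  assert(!FieldEnabled.empty() && "need at least one field");
  const std::size_t Factor = FieldEnabled.size();
  std::vector<int> C;
  C.reserve(S.size() * Factor);
  for (std::size_t Seg = 0; Seg < S.size(); ++Seg)
    for (std::size_t Field = 0; Field < Factor; ++Field)
      C.push_back(S[Seg] & FieldEnabled[Field]);
  return C;
}
```

With four segments and a factor of 3 where the last field is a gap,
`FieldEnabled` is `{1, 1, 0}`, matching the `%g` constant in the snippet.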
2025-08-14 11:17:50 -07:00
Florian Hahn
ff0ce74be8
[VPlan] Replace scalar preheader with VPIRBB at single place (NFC).
Replace the scalar preheader VPBB with a VPIRBB wrapping the IR basic
block created by createVectorizedLoopSkeleton.
2025-08-14 19:11:34 +01:00
Valentin Clement (バレンタイン クレメン)
e27e4f3a99
[flang][cuda] Add interfaces for __uint2float_rX functions (#153620)
Also add bind name for __uint2double_rn
2025-08-14 18:05:37 +00:00
Iris Shi
dc0becc4d0
[CIR] Add InlineAsmOp lowering to LLVM (#153387)
- Part of #153267

Added support for lowering `InlineAsmOp` directly to LLVM IR

---------
Co-authored-by: Morris Hafner <mhafner@nvidia.com>
2025-08-14 17:48:14 +00:00
Jonas Devlieghere
ac0ad5093a
[lldb] Use PyThread_get_thread_ident instead of accessing PyThreadState (#153460)
Use `PyThread_get_thread_ident`, which is part of the Stable API,
instead of accessing a member of the PyThreadState, which is opaque when
using the Stable API.
2025-08-14 12:41:49 -05:00
Leandro Lupori
91418ecbde
Revert "[lldb] refactor PlatformAndroid and make threadsafe" (#153626)
Reverts llvm/llvm-project#145382

This broke a couple of buildbots.
2025-08-14 14:36:50 -03:00
Iris Shi
9a28783f5d
[CIR] Add InlineAsmOp (#153362)
- Part of #153267

---------

Co-authored-by: Andy Kaylor <akaylor@nvidia.com>
Co-authored-by: Morris Hafner <mmha@users.noreply.github.com>
2025-08-14 17:34:38 +00:00
Valentin Clement (バレンタイン クレメン)
efce767a88
[flang][cuda] Add interfaces for __ull2float_rX functions (#153613) 2025-08-14 10:28:17 -07:00