549144 Commits

Author SHA1 Message Date
erichkeane
d0dc3799b7 [OpenACC][NFCI] Add AST Infrastructure for reduction recipes
This patch does the bare minimum to start setting up the reduction
recipe support, including adding a type to the AST to store it. No real
additional work is done, and a bunch of static_asserts are left around
to allow us to do this properly.
2025-08-19 07:58:11 -07:00
Frederik Harwath
50a3368f22
[Clang] Take libstdc++ into account during GCC detection (#145056)
The Generic_GCC::GCCInstallationDetector class picks the GCC
installation directory with the largest version number. Since the
location of the libstdc++ include directories is tied to the GCC
version, this can break C++ compilation if the libstdc++ headers for
this particular GCC version are not available. Linux distributions tend
to package the libstdc++ headers separately from GCC. This frequently
leads to situations in which a newer version of GCC gets installed as a
dependency of another package without installing the corresponding
libstdc++ package. Clang then fails to compile C++ code because it
cannot find the libstdc++ headers. Since libstdc++ headers are in fact
installed on the system, the GCC installation continues to work, the
user may not be aware of the details of the GCC detection, and the
compiler does not recognize the situation and emit a warning, this
behavior can be hard to understand - as witnessed by many related bug
reports over the years.

The goal of this work is to change the GCC detection to prefer GCC
installations that contain libstdc++ include directories over those
which do not. This should happen regardless of the input language since
picking different GCC installations for a build that mixes C and C++
might lead to incompatibilities.
Any change to the GCC installation detection will probably have a
negative impact on some users. For instance, for a C user who relies on
using the GCC installation with the largest version number, it might
become necessary to use the --gcc-install-dir option to ensure that this
GCC version is selected.
This seems like an acceptable trade-off given that the situation for
users who do not have any special demands on the particular GCC
installation directory would be improved significantly.
 
This patch does not yet change the automatic GCC installation directory
choice. Instead, it does introduce a warning that informs the user about
the future change if the chosen GCC installation directory differs from
the one that would be chosen if the libstdc++ headers are taken into
account.

See also this related Discourse discussion:
https://discourse.llvm.org/t/rfc-take-libstdc-into-account-during-gcc-detection/86992.
2025-08-19 16:55:45 +02:00
Simon Pilgrim
355b747acd
[Headers][X86] Enable constexpr handling for pmulhw/pmulhuw avx512 mask/maskz intrinsics (#154341)
Followup to #152524 / #152540 - allow the predicated variants to be used in constexpr as well
2025-08-19 15:55:16 +01:00
Ross Brunton
2c11a83691
[Offload] Add olCalculateOptimalOccupancy (#142950)
This is equivalent to `cuOccupancyMaxPotentialBlockSize`. It is
currently
only implemented on Cuda; AMDGPU and Host return unsupported.

---------

Co-authored-by: Callum Fare <callum@codeplay.com>
2025-08-19 15:16:47 +01:00
Kazu Hirata
2c4f0e7ac6
[mlir] Replace SmallSet with SmallPtrSet (NFC) (#154265)
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>.  Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:

  template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};

We only have 30 instances that rely on this "redirection".  Since the
redirection doesn't improve readability, this patch replaces SmallSet
with SmallPtrSet for pointer element types.

I'm planning to remove the redirection eventually.
2025-08-19 07:11:47 -07:00
Kazu Hirata
136b541304
[clang] Replace SmallSet with SmallPtrSet (NFC) (#154262)
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>.  Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:

  template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};

We only have 30 instances that rely on this "redirection", with about
half of them under clang/.  Since the redirection doesn't improve
readability, this patch replaces SmallSet with SmallPtrSet for pointer
element types.

I'm planning to remove the redirection eventually.
2025-08-19 07:11:39 -07:00
Timm Baeder
965b7c2bfc
[clang][bytecode] Implement ia32_pmul* builtins (#154315) 2025-08-19 16:05:20 +02:00
Matt Arsenault
ed0e531044
AMDGPU: Use Register type for isStackAccess (#154320) 2025-08-19 23:00:45 +09:00
Younan Zhang
2b32ad1316
[Clang] Only remove lambda scope after computing evaluation context (#154106)
The immediate evaluation context needs the lambda scope info to
propagate some flags, however that LSI was removed in
ActOnFinishFunctionBody which happened before rebuilding a lambda
expression.

This also converts the wrapper function to default arguments as a
drive-by fix.

Fixes https://github.com/llvm/llvm-project/issues/145776
2025-08-19 21:59:23 +08:00
Charles Zablit
f55dc0824e
[lldb][windows] use Windows APIs to print to the console (#149493)
This patch uses the Windows APIs to print to the Windows Console,
through `llvm::raw_fd_ostream`.

This fixes a rendering issue where the characters defined in
`DiagnosticsRendering.cpp` (`"╰"` for instance) are not rendered
properly on Windows out of the box, because the default codepage is not
`utf-8`.

This solution is based on [this patch
downstream](https://github.com/swiftlang/swift/pull/40632/files#diff-e948e4bd7a601e3ca82d596058ccb39326459a4751470eec4d393adeaf516977R37-R38).

rdar://156064500
2025-08-19 14:52:29 +01:00
Simon Pilgrim
1359f72a03
[X86] canCreateUndefOrPoisonForTargetNode - add X86ISD::MOVMSK (#154321)
MOVMSK nodes don't create undef/poison when extracting the signbits from the source operand
2025-08-19 14:40:42 +01:00
Krzysztof Parzyszek
42350f428d
[flang][OpenMP] Parse GROUPPRIVATE directive (#153807)
No semantic checks or lowering yet.
2025-08-19 08:32:43 -05:00
Krzysztof Parzyszek
292faf6133
[Frontend][OpenMP] Add definition of groupprivate directive (#153799)
This is the common point for clang and flang implementations.
2025-08-19 08:27:29 -05:00
Alexandre Ganea
13391ce183
On Windows, in the release build script, fix detecting if clang-cl is in PATH (#149597)
The checks for detecting if `clang-cl` and `lld-link` are in `%PATH`
were wrong.

This fixes the comment in
https://github.com/llvm/llvm-project/pull/135446#discussion_r2215511129
2025-08-19 13:13:51 +00:00
Erich Keane
dab8c88f15
[OpenACC] Implement 'firstprivate' clause copy lowering (#154150)
This patch is the last of the 'firstprivate' clause lowering patches. It
takes the already generated 'copy' init from Sema and uses it to
generate the IR for the copy section of the recipe.

However, one thing that this patch had to do, was come up with a way to
hijack the decl registration in CIRGenFunction. Because these decls are
being created in a 'different' place, we need to remove the things we've
added. We could alternatively generate these 'differently', but it seems
worth a little extra effort here to avoid having to re-implement
variable initialization.
2025-08-19 06:02:10 -07:00
Aiden Grossman
01f2d70222
[X86] Remove unused variable from Atom Scheduling Model (#154191)
Related to #154180.
2025-08-19 06:00:18 -07:00
Orlando Cazalet-Hyams
da45b6c71d
[RemoveDIs][NFC] Remove dbg intrinsic version of calculateFragmentIntersect (#153378) 2025-08-19 13:44:25 +01:00
Yang Bai
b4c31dc98d
[mlir][Vector] add vector.insert canonicalization pattern to convert a chain of insertions to vector.from_elements (#142944)
## Description

This change introduces a new canonicalization pattern for the MLIR
Vector dialect that optimizes chains of insertions. The optimization
identifies when a vector is **completely** initialized through a series
of vector.insert operations and replaces the entire chain with a
single `vector.from_elements` operation.

Please be aware that the new pattern **doesn't** work for poison vectors
where only **some** elements are set, as MLIR doesn't support partial
poison vectors for now.

**New Pattern: InsertChainFullyInitialized**

* Detects chains of vector.insert operations.
* Validates that all insertions are at static positions, and all
intermediate insertions have only one use.
* Ensures the entire vector is **completely** initialized.
* Replaces the entire chain with a
single vector.from_elementts operation.

**Refactored Helper Function**

* Extracted `calculateInsertPosition` from
`foldDenseElementsAttrDestInsertOp` to avoid code duplication.

## Example

```
// Before:
%v1 = vector.insert %c10, %v0[0] : i64 into vector<2xi64>
%v2 = vector.insert %c20, %v1[1] : i64 into vector<2xi64>

// After:
%v2 = vector.from_elements %c10, %c20 : vector<2xi64>
```

It also works for multidimensional vectors.

```
// Before:
%v1 = vector.insert %cv0, %v0[0] : vector<3xi64> into vector<2x3xi64>
%v2 = vector.insert %cv1, %v1[1] : vector<3xi64> into vector<2x3xi64>

// After:
%0:3 = vector.to_elements %arg1 : vector<3xi64>
%1:3 = vector.to_elements %arg2 : vector<3xi64>
%v2 = vector.from_elements %0#0, %0#1, %0#2, %1#0, %1#1, %1#2 : vector<2x3xi64>
```

---------

Co-authored-by: Yang Bai <yangb@nvidia.com>
Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
2025-08-19 13:43:31 +01:00
Krzysztof Parzyszek
c65c0e87fc
[flang][OpenMP] Break up CheckThreadprivateOrDeclareTargetVar, NFC (#153809)
Extract the visitors into separate functions to make the code more
readable. Join each message string into a single line.
2025-08-19 07:36:25 -05:00
Jie Fu
81f1b46cc6 [AArch64] Silent an unused-variable warning (NFC)
/llvm-project/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp:1042:11:
 error: unused variable 'TRI' [-Werror,-Wunused-variable]
    auto *TRI = MBB.getParent()->getSubtarget().getRegisterInfo();
          ^
1 error generated.
2025-08-19 20:32:33 +08:00
Simon Pilgrim
62d6c10b24 [X86] Add test showing the failure to fold FREEZE(MOVMSK(X)) -> MOVMSK(FREEZE(X)) 2025-08-19 13:25:36 +01:00
Leandro Lupori
ddb36a8102
[flang] Preserve dynamic length of characters in ALLOCATE (#152564)
Fixes #151895
2025-08-19 09:25:08 -03:00
Florian Hahn
1217c8226b
[LoopIdiom] Add test for simplifying SCEV during expansion with flags. 2025-08-19 13:22:45 +01:00
Utkarsh Saxena
92a91f71ee
[LifetimeSafety] Improve Origin information in debug output (#153951)
The previous debug output only showed numeric IDs for origins, making it
difficult to understand what each origin represented. This change makes
the debug output more informative by showing what kind of entity each
origin refers to (declaration or expression) and additional details like
declaration names or expression class names. This improved output makes
it easier to debug and understand the lifetime safety analysis.
2025-08-19 12:06:10 +00:00
Mehdi Amini
dc82b2cc70
Revert "[MLIR][WASM] Extending the Wasm binary to WasmSSA dialect importer" (#154314)
Reverts llvm/llvm-project#154053

Seems like an endianness sensitivity failing a big-endian bot.
2025-08-19 14:05:09 +02:00
Rafal Bielski
9c9d9e4cb6
[Offload] Define additional device info properties (#152533)
Add the following properties in Offload device info:
* VENDOR_ID
* NUM_COMPUTE_UNITS
* [SINGLE|DOUBLE|HALF]_FP_CONFIG
* NATIVE_VECTOR_WIDTH_[CHAR|SHORT|INT|LONG|FLOAT|DOUBLE|HALF]
* MAX_CLOCK_FREQUENCY
* MEMORY_CLOCK_RATE
* ADDRESS_BITS
* MAX_MEM_ALLOC_SIZE
* GLOBAL_MEM_SIZE

Add a bitfield option to enumerators, allowing the values to be
bit-shifted instead of incremented. Generate the per-type enums using
`foreach` to reduce code duplication.

Use macros in unit test definitions to reduce code duplication.
2025-08-19 13:02:01 +01:00
Simon Pilgrim
fcb36ca8cc
[DAG] visitTRUNCATE - merge the trunc(abd) and trunc(avg) handling which are almost identical (#154301)
CC @houngkoungting
2025-08-19 12:59:39 +01:00
Luc Forget
df57bb8c49
[MLIR][WASM] Extending the Wasm binary to WasmSSA dialect importer (#154053)
This is the continuation of  #152131 

This PR adds support for parsing the global initializer and function
body, and support for decoding scalar numerical instructions and
variable related instructions.

---------

Co-authored-by: Ferdinand Lemaire <ferdinand.lemaire@woven-planet.global>
Co-authored-by: Jessica Paquette <jessica.paquette@woven-planet.global>
Co-authored-by: Luc Forget <luc.forget@woven.toyota>
2025-08-19 13:42:47 +02:00
Timm Baeder
2f011ea37a
[clang][bytecode][NFC] Remove unused Program::relocs (#154308) 2025-08-19 13:29:56 +02:00
Dan Blackwell
2e9494ff96
[ASan] Re-enable duplicate_os_log_reports test and include cstdlib for malloc (#153195)
rdar://62141527
2025-08-19 12:12:48 +01:00
David Green
22b4021f01 [AArch64][GlobalISel] Add additional vecreduce.fadd and fadd 0.0 tests. NFC 2025-08-19 11:52:50 +01:00
Stephen Tozer
5cedb01487 [Debugify] Fix compile error in tracking coverage build
Forward-fixes a compile error in bc216b057d (#150212) in specific build
configurations, due to a missing const_cast.
2025-08-19 11:18:42 +01:00
Nuno Lopes
d0029b87d8
remove UB from test [NFC] 2025-08-19 11:18:27 +01:00
yronglin
57bf5dd7a0
[libc++][tuple.apply] Implement P2255R2 make_from_tuple part. (#152867)
Implement  P2255R2 tuple.apply part wording for `std::make_from_tuple`.
```
Mandates: If tuple_size_v<remove_reference_t<Tuple>> is 1, then reference_constructs_from_temporary_v<T, decltype(get<0>(declval<Tuple>()))> is false.
```

Fixes #154274

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
2025-08-19 18:14:13 +08:00
Kareem Ergawy
0e93dbc6b1
[flang] do concurrent: Enable delayed localization by default (#154303)
Enables delayed localization by default for `do concurrent`. Tested both
gfortran and Fujitsu test suites.

All tests pass for gfortran tests. For Fujitsu, enabled delayed
localization passes more tests:

Delayed localization disabled:
Testing Time: 7251.76s
  Passed            : 88520
  Failed            :   162
  Executable Missing:   408

Delayed localization enabled:
Testing Time: 7216.73s
  Passed            : 88522
  Failed            :   160
  Executable Missing:   408
2025-08-19 12:07:17 +02:00
Benjamin Maxwell
7170a81241
[AArch64][SME] Rename EdgeBundles to Bundles (NFC) (#154295)
It seems some buildbots do not like the shadowing. See:
https://lab.llvm.org/buildbot/#/builders/137/builds/23838
2025-08-19 10:58:19 +01:00
Lang Hames
615f8393c9
[orc-rt] Remove unused LLVM_RT_TOOLS_BINARY_DIR cmake variable. (#154254)
This was accidentally left in ee7a6a45bdb.
2025-08-19 19:56:27 +10:00
Timm Baeder
ab8b4f6629
[clang][bytecode][NFC] Replace std::optional<unsigned> with UnsignedO… (#154286)
…rNone
2025-08-19 11:37:32 +02:00
David Green
a7df02f83c
[InstCombine] Make strlen optimization more resilient to different gep types. (#153623)
This makes the optimization in optimizeStringLength for strlen(gep
@glob, %x) -> sub endof@glob, %x a little more resilient, and maybe a
bit more correct for geps with non-array types.
2025-08-19 10:37:17 +01:00
Vinay Deshmukh
d286f2ef55
[libc++] Make std::__tree_node member private to prepare for UB removal (#154225)
Prepare for:
https://github.com/llvm/llvm-project/pull/153908#discussion_r2281756219
2025-08-19 11:27:56 +02:00
tangaac
ccbcebcfd3
[LoongArch] Fix implicit PesudoXVINSGR2VR error (#152432)
According to the instructions manual, when `vr0` is changed, high 128
bit of `xr0` is undefined.
Use `vinsgr2vr.b/h` to insert an `i8/i16` to low 128bit of a 256 vector
may cause undefined behavior when high 128bit is used in later
instructions.
2025-08-19 17:22:00 +08:00
Aditi Medhane
948abf1bf5
[PowerPC] Add BCDCOPYSIGN and BCDSETSIGN Instruction Support (#144874)
Support the following BCD format conversion builtins for PowerPC.

- `__builtin_bcdcopysign` – Conversion that returns the decimal value of
the first parameter combined with the sign code of the second parameter.
`
- `__builtin_bcdsetsign` – Conversion that sets the sign code of the
input parameter in packed decimal format.

> Note: This built-in function is valid only when all following
conditions are met:
> -qarch is set to utilize POWER9 technology.
> The bcd.h file is included.

## Prototypes

```c
vector unsigned char __builtin_bcdcopysign(vector unsigned char, vector unsigned char);
vector unsigned char __builtin_bcdsetsign(vector unsigned char, unsigned char);
```

## Usage Details

`__builtin_bcdsetsign`: Returns the packed decimal value of the first
parameter combined with the sign code.
The sign code is set according to the following rules:
- If the packed decimal value of the first parameter is positive, the
following rules apply:
     - If the second parameter is 0, the sign code is set to 0xC.
     - If the second parameter is 1, the sign code is set to 0xF.
- If the packed decimal value of the first parameter is negative, the
sign code is set to 0xD.
> notes:
>     The second parameter can only be 0 or 1.
> You can determine whether a packed decimal value is positive or
negative as follows:
> - Packed decimal values with sign codes **0xA, 0xC, 0xE, or 0xF** are
interpreted as positive.
> - Packed decimal values with sign codes **0xB or 0xD** are interpreted
as negative.

---------

Co-authored-by: Aditi-Medhane <aditi.medhane@ibm.com>
2025-08-19 14:47:27 +05:30
Temperz87
9cadc4e153
[DAG] SelectionDAG::canCreateUndefOrPoison - add ISD::SCMP/UCMP handling + tests (#154127)
This pr aims to resolve #152144 
In SelectionDAG::canCreateUndefOrPoison the ISD::SCMP/UCMP cases are
added to always return false as they cannot generate poison or undef
The `freeze-binary.ll` file is now testing the SCMP/UCMP cases

---------

Co-authored-by: Temperz87 <= temperz871@gmail.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-19 10:06:14 +01:00
Timm Baeder
a1039c1b84
[clang][bytecode] Fix initializing float elements from #embed (#154285)
Fixes #152885
2025-08-19 11:04:21 +02:00
Sergei Barannikov
56ce40bc73
[TableGen][DecoderEmitter] Stop duplicating encodings (NFC) (#154288)
When HwModes are involved, we can duplicate an instruction encoding that
does not belong to any HwMode multiple times. We can do better by
mapping HwMode to a list of encoding IDs it contains. (That is,
duplicate IDs instead of encodings.)

The encodings that were duplicated are still processed multiple times
(e.g., we call an expensive populateInstruction() on each instance).
This is going to be fixed in subsequent patches.
2025-08-19 09:02:22 +00:00
Benjamin Maxwell
eb764040bc
[AArch64][SME] Implement the SME ABI (ZA state management) in Machine IR (#149062)
## Short Summary

This patch adds a new pass `aarch64-machine-sme-abi` to handle the ABI
for ZA state (e.g., lazy saves and agnostic ZA functions). This is
currently not enabled by default (but aims to be by LLVM 22). The goal
is for this new pass to more optimally place ZA saves/restores and to
work with exception handling.

## Long Description

This patch reimplements management of ZA state for functions with
private and shared ZA state. Agnostic ZA functions will be handled in a
later patch. For now, this is under the flag `-aarch64-new-sme-abi`,
however, we intend for this to replace the current SelectionDAG
implementation once complete.

The approach taken here is to mark instructions as needing ZA to be in a
specific ("ACTIVE" or "LOCAL_SAVED"). Machine instructions implicitly
defining or using ZA registers (such as $zt0 or $zab0) require the
"ACTIVE" state. Function calls may need the "LOCAL_SAVED" or "ACTIVE"
state depending on the callee (having shared or private ZA).

We already add ZA register uses/definitions to machine instructions, so
no extra work is needed to mark these.

Calls need to be marked by glueing Arch64ISD::INOUT_ZA_USE or
Arch64ISD::REQUIRES_ZA_SAVE to the CALLSEQ_START.

These markers are then used by the MachineSMEABIPass to find
instructions where there is a transition between required ZA states.
These are the points we need to insert code to set up or restore a ZA
save (or initialize ZA).

To handle control flow between blocks (which may have different ZA state
requirements), we bundle the incoming and outgoing edges of blocks.
Bundles are formed by assigning each block an incoming and outgoing
bundle (initially, all blocks have their own two bundles). Bundles are
then combined by joining the outgoing bundle of a block with the
incoming bundle of all successors.

These bundles are then assigned a ZA state based on the blocks that
participate in the bundle. Blocks whose incoming edges are in a bundle
"vote" for a ZA state that matches the state required at the first
instruction in the block, and likewise, blocks whose outgoing edges are
in a bundle vote for the ZA state that matches the last instruction in
the block. The ZA state with the most votes is used, which aims to
minimize the number of state transitions.
2025-08-19 10:00:28 +01:00
Nikita Popov
4ab87ffd1e
[SCCP] Enable PredicateInfo for non-interprocedural SCCP (#153003)
SCCP can use PredicateInfo to constrain ranges based on assume and
branch conditions. Currently, this is only enabled during IPSCCP.

This enables it for SCCP as well, which runs after functions have
already been simplified, while IPSCCP runs pre-inline. To a large
degree, CVP already handles range-based optimizations, but SCCP is more
reliable for the cases it can handle. In particular, SCCP works reliably
inside loops, which is something that CVP struggles with due to LVI
cycles.

I have made various optimizations to make PredicateInfo more efficient,
but unfortunately this still has significant compile-time cost (around
0.1-0.2%).
2025-08-19 10:59:38 +02:00
Timm Baeder
fb8ee3adb6
[clang][bytecode] Move pointers from extern globals to new decls (#154273) 2025-08-19 10:54:33 +02:00
Michael Buch
d9d5090b03
[CI] Run LLDB tests on Clang changes in pre-merge CI (#154154)
This attempts https://github.com/llvm/llvm-project/issues/132795 again.
Last time we tried this we didn't have enough infra capacity, so had to
revert. According to recent communication from the Infrastructure Area
Team, we should now have enough capacity to re-enable the LLDB tests.
2025-08-19 09:44:08 +01:00
Baranov Victor
ef3ce0dcb2
[Github] Remove redundant 'START_REV', 'END_REV' env variables (NFC) (#154218)
After https://github.com/llvm/llvm-project/pull/133023, `START_REV` and
`END_REV` env variables became redundant.
2025-08-19 11:41:37 +03:00