548981 Commits

Author SHA1 Message Date
Aiden Grossman
5b2c3aac90
[MCA][X86] Pretend To Have a Stack Engine (#153348)
This patch removes RSP dependencies from push and pop instructions to
pretend that we have a stack engine. This does not model details like
sync uops that are relevant implementation details due to complexity.
This is just enabled on all X86 CPUs given LLVM does not have a
scheduling model for any X86 CPU that does not have a stack engine.

This fixes #152008.
2025-08-18 13:44:43 +00:00
Shilei Tian
e37eff5dcd
[AMDGPU] Add an option to completely disable kernel argument preload (#153975)
The existing `amdgpu-kernarg-preload-count` can't be used as a switch to
turn it off if it is set to 0. This PR adds an extra option to turn it
off.

Fixes SWDEV-550147.
2025-08-18 09:44:20 -04:00
Jonathan Thackray
f38c83c582
[AArch64][llvm] Disassemble instructions in SYS alias encoding space more correctly (#153905)
For instructions in the `SYS` alias encoding space which take no
register operands, if the unused 5 register bits are not all set
(0x1f, 0b11111), disassemble to a `SYS` alias and not the
instruction, since the encoding is not considered valid.

This is because it is specified in the Arm ARM in text similar to this
(e.g. page C5-1037 of DDI0487L.b for `TLBI ALLE1`, or page C5-1585 for
`GCSPOPX`):
```
  Rt should be encoded as 0b11111. If the Rt field is not set to 0b11111,
  it is CONSTRAINED UNPREDICTABLE whether:
    * The instruction is UNDEFINED.
    * The instruction behaves as if the Rt field is set to 0b11111.
```

Since we want to follow "should" directives, and not encourage undefined
behaviour, only assemble or disassemble instructions considered valid.
Add an extra test-case for this; all existing test-cases continue to
pass.
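
The rule above comes down to a check on the low five bits of the encoding. A minimal sketch, assuming the Rt field occupies bits [4:0] of the 32-bit instruction word; the helper names are hypothetical and this is not the actual LLVM disassembler code:

```cpp
#include <cstdint>

// Extract the 5-bit Rt field, assumed to sit in bits [4:0] of the encoding.
constexpr std::uint32_t rtField(std::uint32_t insn) { return insn & 0x1Fu; }

// Hypothetical helper: an operand-less alias (e.g. TLBI ALLE1) is printed as
// the alias only when Rt is 0b11111; any other value falls back to the
// generic SYS form.
constexpr bool printAsAlias(std::uint32_t insn) {
  return rtField(insn) == 0b11111u;
}
```

Both helpers are `constexpr`, so the check could also be exercised at compile time with `static_assert`.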
2025-08-18 14:41:41 +01:00
Timm Baeder
31d2db2a68
[clang][bytecode][NFC] Use UnsignedOrNone for Block::DeclID (#154104) 2025-08-18 15:40:44 +02:00
Erich Keane
340fa3e1bb
[OpenACC] Implement firstprivate lowering except init. (#153847)
This patch implements the basic lowering infrastructure, but does not
quite implement the copy initialization, which requires #153622.

It does however pass verification for the 'copy' section, which just
contains a yield.
2025-08-18 06:33:40 -07:00
Aiden Grossman
1650e4a73c
[X86] Remove TuningPOPCNTFalseDeps from AlderLake (#154004)
According to the data from uops.info, this false dependency issue was
fixed in CannonLake. This is confirmed not to be an issue based on
benchmarking data in #153983. Setting this can potentially lead to extra
xor instructions which could consume extra frontend/renaming resources.

None of the other CPUs that have had this fixed have the tuning flag.

Fixes #153983.
2025-08-18 06:31:16 -07:00
Matthias Springer
f84aaa6eaa
[mlir][Transforms] Dialect conversion: Add flag to dump materialization kind (#119532)
Add a debugging flag to the dialect conversion to dump the
materialization kind. This flag is useful to find out whether a missing
materialization rule is for source or target materializations.

Also add missing test coverage for the `buildMaterializations` flag.
2025-08-18 13:25:18 +00:00
Nikita Popov
ba45ac61b6 [CAS] Temporarily disable broken test
This test hangs forever if executed with less than three cores
available, see:
https://github.com/llvm/llvm-project/pull/114096#issuecomment-3196698403
2025-08-18 15:09:08 +02:00
Chaitanya
4a3bf27c69
[OpenMP] Introduce omp.target_allocmem and omp.target_freemem omp dialect ops. (#145464)
This PR introduces two new ops in omp dialect, omp.target_allocmem and
omp.target_freemem.
omp.target_allocmem: Allocates heap memory on device. Will be lowered to
omp_target_alloc call in llvm.
omp.target_freemem: Deallocates heap memory on device. Will be lowered
to omp_target_free call in llvm.


Example:
  %1 = omp.target_allocmem %device : i32, i64
  omp.target_freemem %device, %1 : i32, i64

The work in this PR is copied from and inspired by @ivanradanov's
commits from the coexecute implementation:
[Add fir omp target alloc and free
ops](be860ac8ba)
[Lower omp_target_{alloc,free} to
llvm](6e2d584dc9)
2025-08-18 18:15:11 +05:30
jofrn
e8e3e6e893
[LiveVariables] Mark use without implicit if defined at instr (#119446)
LiveVariables will mark instructions with their implicit subregister
uses. However, it will also mark the subregister as an implicit if its
own definition is a subregister of it, i.e. `$r3 = OP val, implicit-def
$r0_r1_r2_r3, ..., implicit $r2_r3`, even if it is otherwise unused,
which defines $r3 on the same line it is used.

This change ensures such uses are marked without implicit, i.e. `$r3 =
OP val, implicit-def $r0_r1_r2_r3, ..., $r2_r3`.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-08-18 08:34:59 -04:00
Akash Banerjee
6aafe6582d Fix test added in 1fd1d634630754cc9b9c4b5526961d5856f64ff9 2025-08-18 13:29:23 +01:00
ZhaoQi
8f671a675f
[LoongArch] Always emit symbol-based relocations regardless of relaxation (#153943)
This commit changes all relocations to be relocated with symbols.

Without this commit, errors may occur in some cases, such as when using
`llc/lto+relax`, or combining relaxed and norelaxed object files using
`ld -r`.

Some tests updated.
2025-08-18 20:15:49 +08:00
Jonathan Cohen
c6fe567064
[AArch64][MachineCombiner] Combine sequences of gather patterns (#152979)
Reland of #142941

Squashed with fixes for #150004, #149585

This pattern matches gather-like patterns where
values are loaded per lane into neon registers, and 
replaces it with loads into 2 separate registers, which
will be combined with a zip instruction. This decreases
the critical path length and improves Memory Level
Parallelism.

rdar://151851094
2025-08-18 15:10:59 +03:00
Utkarsh Saxena
673750feea
[LifetimeSafety] Implement a basic use-after-free diagnostic (#149731)
Implement use-after-free detection in the lifetime safety analysis with two warning levels.

- Added a `LifetimeSafetyReporter` interface for reporting lifetime safety issues
- Created two warning levels:
    - Definite errors (reported with `-Wexperimental-lifetime-safety-permissive`)
    - Potential errors (reported with `-Wexperimental-lifetime-safety-strict`)
- Implemented a `LifetimeChecker` class that analyzes loan propagation and expired loans to detect use-after-free issues.
- Added tracking of use sites through a new `UseFact` class.
- Enhanced the `ExpireFact` to track the expressions where objects are destroyed.
- Added test cases for both definite and potential use-after-free scenarios.

The implementation now tracks pointer uses and can determine when a pointer is dereferenced after its loan has been expired, with appropriate diagnostics.

The two warning levels provide flexibility - definite errors for high-confidence issues and potential errors for cases that depend on control flow.
2025-08-18 13:46:43 +02:00
Kareem Ergawy
c1e2a9c66d
[flang][OpenMP] Only privatize pre-determined symbols when defined by the evaluation. (#154070)
Fixes a regression uncovered by Fujitsu test 0686_0024.f90. In
particular, verifies that a pre-determined symbol is only privatized by
its defining evaluation (e.g. the loop for which the symbol was marked
as pre-determined).
2025-08-18 13:36:08 +02:00
Mehdi Amini
cfe5975eaf
[MLIR] Fix SCF verifier crash (#153974)
An operand of the nested yield op can be null and hasn't been verified
yet when processing the enclosing operation. Using `getResultTypes()`
will dereference this null Value and crash in the verifier.
2025-08-18 12:48:55 +02:00
Simon Pilgrim
681ecae913
[DAG] visitTRUNCATE - test abd legality early to avoid unnecessary computeKnownBits/ComputeNumSignBits calls. NFC. (#154085)
isOperationLegal is much cheaper than value tracking
2025-08-18 11:06:29 +01:00
林克
6842cc5562
[RISCV] Add SpacemiT XSMTVDot (SpacemiT Vector Dot Product) extension. (#151706)
The full spec can be found at spacemit-x60 processor support scope:
Section 2.1.2.2 (Features):

https://developer.spacemit.com/documentation?token=BWbGwbx7liGW21kq9lucSA6Vnpb#2.1

This patch only supports assembler.
2025-08-18 18:03:17 +08:00
Arne Stenkrona
ea2f5395b1
[SimplifyCFG] Avoid threading for loop headers (#151142)
Updates SimplifyCFG to avoid jump threading through loop headers if
-keep-loops is requested. Canonical loop form requires a loop header
that dominates all blocks in the loop. If we thread through a header, we
risk breaking its domination of the loop. This change avoids this issue
by conservatively avoiding threading through headers entirely.

Fixes: https://github.com/llvm/llvm-project/issues/151144
2025-08-18 09:46:55 +00:00
Simon Pilgrim
169b43d4b8 Remove unused variable introduced in #152705 2025-08-18 10:45:09 +01:00
ZhaoQi
76fb1619f0
[LoongArch] Reduce number of reserved relocations when relax enabled (#153769) 2025-08-18 17:42:43 +08:00
Andrzej Warzyński
51b5a3e1a6
[MLIR] Add Egress dialects maintainers (#151721)
As per https://discourse.llvm.org/t/mlir-project-maintainers/87189, this
PR adds maintainers for the "egress" dialects.

Compared to the original proposal, two changes are included:
* The "mesh" dialect has been renamed to "shard"
(https://discourse.llvm.org/t/mlir-mesh-cleanup-mesh/).
* The "XeVM" dialect has been added
(https://discourse.llvm.org/t/rfc-proposal-for-new-xevm-dialect/).
2025-08-18 10:34:44 +01:00
Simon Pilgrim
36f911173a [X86] avx512vlbw-builtins.c - add C/C++ test coverage 2025-08-18 10:30:15 +01:00
Simon Pilgrim
6036e5d0d7 [X86] avx512vlbw-reduceIntrin.c - add C/C++ and -fno-signed-char test coverage 2025-08-18 10:30:14 +01:00
Timm Baeder
0d05c42b6a
[clang][bytecode] Improve __builtin_{,dynamic_}object_size implementation (#153601) 2025-08-18 11:12:33 +02:00
Oliver Hunt
bcab8ac126
[clang] return type not correctly deduced for discarded lambdas (#153921)
The early return for lambda expressions with deduced return types in
Sema::ActOnCapScopeReturnStmt meant that we were not actually performing
the required return type deduction for such lambdas when in a discarded
context.

This PR removes that early return allowing the existing return type
deduction steps to be performed.

Fixes #153884

Fix developed by, and

Co-authored-by: Corentin Jabot <corentinjabot@gmail.com>
2025-08-18 02:07:27 -07:00
Mehdi Amini
16aa283344
[MLIR] Refactor the walkAndApplyPatterns driver to remove the recursion (#154037)
This is in preparation of a follow-up change to stop traversing
unreachable blocks.

This is not NFC because of a subtlety of the early_inc. On a test case
like:

```
  scf.if %cond {
    "test.move_after_parent_op"() ({
      "test.any_attr_of_i32_str"() {attr = 0 : i32} : () -> ()
    }) : () -> ()
  }
```

We recursively traverse the nested regions, and process an op when the
region is done (post-order).
We need to pre-increment the iterator before processing an operation in
case it gets deleted. However
we can do this before or after processing the nested region. This
implementation does the latter.
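
The ordering described above can be sketched on a toy tree. This is an illustration only: `Node`, `NodeList`, and `walkPostOrder` are hypothetical names, and the walk is recursive for brevity, whereas the patch itself removes the recursion from the driver.

```cpp
#include <functional>
#include <list>
#include <memory>
#include <string>

struct Node;
using NodeList = std::list<std::unique_ptr<Node>>;

// Toy stand-in for an operation with nested regions.
struct Node {
  std::string name;
  NodeList children;
};

// The callback may erase the node it is given from its sibling list.
using Visitor = std::function<void(NodeList &, NodeList::iterator)>;

// Post-order walk: the nested children are handled first, then the iterator
// is advanced, and only then is the node itself processed. Advancing before
// the callback keeps the loop iterator valid if the node gets erased.
void walkPostOrder(NodeList &siblings, const Visitor &visit) {
  for (auto it = siblings.begin(); it != siblings.end();) {
    walkPostOrder((*it)->children, visit); // nested nodes first
    auto current = it++;                   // advance after the nested walk
    visit(siblings, current);              // may erase `current`
  }
}
```

`std::list` is used here because erasing an element invalidates only iterators to that element, which is what makes the early increment sufficient.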
2025-08-18 09:07:19 +00:00
Balázs Kéri
a0f325bd41
[clang-tidy] Added check 'misc-override-with-different-visibility' (#140086) 2025-08-18 11:00:42 +02:00
Mehdi Amini
87e6fd161a
[MLIR] Erase unreachable blocks before applying patterns in the greedy rewriter (#153957)
Operations like:

    %add = arith.addi %add, %add : i64

are legal in unreachable code. Unfortunately many patterns would be
unsafe to apply on such IR and can lead to crashes or infinite loops. To
avoid this we can remove unreachable blocks before attempting to apply
patterns.
We may also have to do this whenever the CFG is changed by a pattern;
that is left for future work.
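
Finding the blocks to erase is a plain reachability computation from the entry block. A minimal sketch, assuming a CFG given as successor lists indexed by block number with block 0 as the entry; `unreachableBlocks` is a hypothetical name and this is not the MLIR implementation:

```cpp
#include <cstddef>
#include <vector>

// successors[b] lists the blocks that block b can branch to. Every index is
// assumed to be smaller than successors.size(), and block 0 is the entry.
std::vector<std::size_t>
unreachableBlocks(const std::vector<std::vector<std::size_t>> &successors) {
  std::vector<bool> seen(successors.size(), false);
  std::vector<std::size_t> worklist;
  if (!successors.empty()) {
    seen[0] = true;
    worklist.push_back(0);
  }
  // Depth-first traversal from the entry block.
  while (!worklist.empty()) {
    std::size_t block = worklist.back();
    worklist.pop_back();
    for (std::size_t succ : successors[block]) {
      if (!seen[succ]) {
        seen[succ] = true;
        worklist.push_back(succ);
      }
    }
  }
  // Anything never visited cannot execute and is a candidate for erasure.
  std::vector<std::size_t> dead;
  for (std::size_t block = 0; block < successors.size(); ++block)
    if (!seen[block])
      dead.push_back(block);
  return dead;
}
```

Note that a block with predecessors can still be unreachable when all of those predecessors are unreachable themselves, so checking for "no predecessors" alone would not be enough.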

Fixes #153732
2025-08-18 10:59:43 +02:00
David Sherwood
7ee6cf06c8
[LV] Fix incorrect cost kind in VPReplicateRecipe::computeCost (#153216)
We were incorrectly using the TTI::TCK_RecipThroughput cost kind and
ignoring the kind set in the context.
2025-08-18 09:52:31 +01:00
hstk30-hw
c99cbc880f
[llvm] Fix typo for CGProfile (NFC) (#153370) 2025-08-18 16:46:27 +08:00
Joachim
98dd1888bf
[OpenMP][Test][NFC] output tool data as hex to improve readability (#152757)
Using hex format makes IDs easier to interpret:
the first digits represent the thread number, the last digits represent
the ID within a thread

The main change is in callback.h: PRIu64 -> PRIx64
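
To see why hex helps, here is a minimal sketch of the two format macros side by side. The ID layout used in the usage note (thread number in the bits above bit 48) is an assumption for illustration only, and `formatId` is a hypothetical helper, not code from callback.h:

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Format a 64-bit ID either as hex (PRIx64) or as decimal (PRIu64).
std::string formatId(std::uint64_t id, bool hex) {
  char buf[32];
  if (hex)
    std::snprintf(buf, sizeof buf, "%" PRIx64, id);
  else
    std::snprintf(buf, sizeof buf, "%" PRIu64, id);
  return buf;
}
```

Under the assumed layout, thread 3 with per-thread ID 42 is `(3 << 48) | 42`. In hex this prints as `300000000002a`, where the leading `3` and trailing `2a` are visible at a glance; in decimal it prints as `844424930132010`, which hides both parts.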

The patch also guards RUN/CHECK lines in openmp/runtime/tests/ompt with clang-format on/off comments and clang-formats the directory.

---------

Co-authored-by: Kaloyan Ignatov <kaloyan.ignatov@rwth-aachen.de>
2025-08-18 10:42:33 +02:00
Simon Pilgrim
ce5276f61c
[Clang][X86] Add avx512 __builtin_ia32_select* constexpr handling (#152705)
This should allow us to constexpr many avx512 predicated intrinsics where they wrap basic intrinsics that are already constexpr

Fixes #152321
2025-08-18 09:37:20 +01:00
Matthias Springer
ff68f7115c
[mlir][builtin] Make unrealized_conversion_cast inlineable (#139722)
Until now, `builtin.unrealized_conversion_cast` ops could not be inlined
by the Inliner pass.
2025-08-18 10:23:26 +02:00
Matt Arsenault
53e9d3247e
DAG: Remove unnecessary getPointerTy call (#154055)
getValueType already did this
2025-08-18 17:12:16 +09:00
David Green
8f98529209 [AArch64] Remove SIMDLongThreeVectorTiedBHSabal tablegen class.
Similar to #152987 this removes SIMDLongThreeVectorTiedBHSabal as it is
equivalent to SIMDLongThreeVectorTiedBHS with a better TriOpFrag pattern.
2025-08-18 09:11:13 +01:00
ZhaoQi
8181c76bca
[LoongArch][NFC] More tests to ensure branch relocs reserved when relax enabled (#153768) 2025-08-18 16:07:36 +08:00
Ahmad Yasin
1b0bce972b
Reorder checks to speed up getAppleRuntimeUnrollPreferences() (#154010)
- Delay load/store values calculation unless a best unroll-count is
found
- Remove extra getLoopLatch() invocation
2025-08-18 11:06:37 +03:00
Matthias Springer
f7b09ad700
[mlir][LLVM] ArithToLLVM: Add 1:N support for arith.select lowering (#153944)
Add 1:N support for the `arith.select` lowering. Only cases where the
entire true/false value is selected are supported.
2025-08-18 09:42:37 +02:00
Jim Lin
127ba533bd
[RISCV] Remove ST->hasVInstructions() from getIntrinsicInstrCost for cttz/ctlz/ctpop. NFC. (#154064)
That isn't necessary if we've checked ST->hasStdExtZvbb().
2025-08-18 15:24:25 +08:00
Nikita Popov
246a64a12e
[Clang] Rename HasLegalHalfType -> HasFastHalfType (NFC) (#153163)
This option is confusingly named. What it actually controls is whether,
under the default of `-ffloat16-excess-precision=standard`, it is
beneficial for performance to perform calculations on float (without
intermediate rounding) or not. For `-ffloat16-excess-precision=none` the
LLVM `half` type will always be used, and all backends are expected to
legalize it correctly.
2025-08-18 09:23:48 +02:00
Nikita Popov
238c3dcd0d
[CodeGen][Mips] Remove fp128 libcall list (#153798)
Mips requires fp128 args/returns to be passed differently than i128. It
handles this by inspecting the pre-legalization type. However, for soft
float libcalls, the original type is currently not provided (it will
look like a i128 call). To work around that, MIPS maintains a list of
libcalls working on fp128.

This patch removes that list by providing the original, pre-softening
type to calling convention lowering. This is done by carrying additional
information in CallLoweringInfo, as we unfortunately do need both types
(we want the un-softened type for OrigTy, but we need the softened type
for the actual register assignment etc.)

This is in preparation for completely removing all the custom
pre-analysis code in the Mips backend and replacing it with use of
OrigTy.
2025-08-18 09:22:41 +02:00
David Green
790bee99de
[VectorCombine] Remove dead node immediately in VectorCombine (#149047)
The vector combiner will process all instructions as it first loops
through the function, adding any newly added and deleted instructions to
a worklist which is then processed when all nodes are done. This leaves
extra uses in the graph as the initial processing is performed, leading
to sub-optimal decisions being made for other combines. This changes it
so that trivially dead instructions are removed immediately. The main
change that this requires is to make sure iterator invalidation does not
occur.
2025-08-18 07:55:21 +01:00
ZhaoQi
6957e44d8e
[LoongArch][MC] Refine conditions for emitting ALIGN relocations (#153365)
According to the suggestions in
https://github.com/llvm/llvm-project/pull/150816, this commit refines the
conditions for emitting R_LARCH_ALIGN relocations.

Some existing tests are updated to avoid being affected by this
optimization. New tests are added to verify: removal of redundant ALIGN
relocations, ALIGN emitted after the first linker-relaxable instruction,
and conservatively emitted ALIGN in lower-numbered subsections.
2025-08-18 14:54:27 +08:00
Kazu Hirata
b6a62a496f
[ADT] Use range-based for loops in SetVector (NFC) (#154058) 2025-08-17 23:46:43 -07:00
Kazu Hirata
cbf5af9668
[llvm] Remove unused includes (NFC) (#154051)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-08-17 23:46:35 -07:00
Kazu Hirata
400dde6ca8
[RISCV] Remove an unnecessary cast (NFC) (#154049)
&UncompressedMI is already of MCInst *.
2025-08-17 23:46:27 -07:00
Kazu Hirata
1f3c38f125
[Support] Remove an unnecessary cast (NFC) (#154048)
qp is already of uint64_t.
2025-08-17 23:46:20 -07:00
Guray Ozen
5d300afa80
[MLIR][NVVM] Add support for multiple return values in inline_ptx (#153774)
This PR adds the ability for `nvvm.inline_ptx` to return multiple
values, matching the expected semantics in PTX while respecting LLVM’s
constraints.

LLVM’s `inline_asm` op does not natively support multiple returns —
instead, it requires packing results into an LLVM `struct` and then
extracting them. This PR implements automatic packing/unpacking so that
multiple return values can be expressed naturally in MLIR without extra
user boilerplate.

**Example**
MLIR:

```
%r1, %r2 = nvvm.inline_ptx  "{
   .reg .pred p;
   setp.ge.s32 p, $2, $3;
   selp.s32 $0, $2, $3, p;
   selp.s32 $1, $2, $3, !p;
}" (%a, %b) : i32, i32 -> i32, i32

%r3 = llvm.add %r1, %r2 : i32
```

Lowered LLVM IR:

```
%1 = llvm.inline_asm has_side_effects asm_dialect = att "{\0A\09 .reg .pred p;\0A\09 setp.ge.s32 p, $2, $3;\0A\09 selp.s32 $0, $2, $3, p;\0A\09 selp.s32 $1, $2, $3, !p;\0A\09}\0A", "=r,=r,r,r" %a, %b : (i32, i32) -> !llvm.struct<(i32, i32)>
%2 = llvm.extractvalue %1[0] : !llvm.struct<(i32, i32)>
%3 = llvm.extractvalue %1[1] : !llvm.struct<(i32, i32)>
%4 = llvm.add %2, %3 : i32
```
2025-08-18 08:37:55 +02:00
yronglin
e6e874ce8f
[clang] Allow trivial pp-directives before C++ module directive (#153641)
Consider the following code:

```cpp
# 1 __FILE__ 1 3
export module a;
```

According to the wording in
[P1857R3](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1857r3.html):
```
A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.)
```

and the wording in
[[cpp.pre]](https://eel.is/c++draft/cpp.pre#nt:module-file)
```
module-file:
    pp-global-module-fragment[opt] pp-module group[opt] pp-private-module-fragment[opt]
```

`#` is the first pp-token in the translation unit, so clang rejected
this code, but directives like this really should be exempted from the
rule. The goal is to not allow any preprocessor conditionals or most
state changes, and these directives are neither.

State change would mean most semantically observable preprocessor state,
particularly anything that is order dependent. Global flags like being a
system header/module shouldn't matter.

We should exempt a bunch of directives, even though it violates the
current standard wording.

In this patch, we introduce a `TrivialDirectiveTracer` to trace the
**State change** described above and propose to exempt the
following kinds of directives: `#line`, GNU line marker, `#ident`,
`#pragma comment`, `#pragma mark`, `#pragma detect_mismatch`, `#pragma
clang __debug`, `#pragma message`, `#pragma GCC warning`, `#pragma GCC
error`, `#pragma gcc diagnostic`, `#pragma OPENCL EXTENSION`, `#pragma
warning`, `#pragma execution_character_set`, `#pragma clang
assume_nonnull` and builtin macro expansion.

Fixes https://github.com/llvm/llvm-project/issues/145274

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
2025-08-18 14:17:35 +08:00