542054 Commits

Author SHA1 Message Date
Paschalis Mpeis
08513281bd
[BOLT][test] Drop toolname from X86/perf2bolt-spe.test (#145515) 2025-06-24 15:12:16 +01:00
Kazu Hirata
63f30d7d82
[mlir] Migrate away from {TypeRange,ValueRange}(std::nullopt) (NFC) (#145445)
ArrayRef has a constructor that accepts std::nullopt.  This
constructor dates back to the days when we still had llvm::Optional.

Since the use of std::nullopt outside the context of std::optional is
kind of abuse and not intuitive to newcomers, I would like to move
away from the constructor and eventually remove it.

This patch migrates away from TypeRange(std::nullopt) and
ValueRange(std::nullopt).
2025-06-24 07:03:59 -07:00
Nicolas Vasilache
6dad1e87fb [mlir][transform][Linalg] NFC - DCE unused options in PadTilingInterfaceOptions 2025-06-24 15:46:15 +02:00
David
8fec6d1177
llvm-c: Introduce 'LLVMDISubprogramReplaceType' (#143461)
The C API does not provide a way to replace the subroutine type after
creating a subprogram. This functionality is useful for creating a
subroutine type composed of types which have the subprogram as scope
2025-06-24 14:42:06 +01:00
Nikita Popov
68f09370f9 [Module] Use getDeclarationIfExists() (NFC)
Don't insert declarations in order to immediately remove them
again.
2025-06-24 15:37:27 +02:00
Orlando Cazalet-Hyams
75cf826849 [KeyInstr][Clang] Fix atomic ops atoms test
Fixup test added in #141624 (ddecfa696c4929ac364053f3eef66fefe4873448).
2025-06-24 14:37:21 +01:00
Ellis Hoag
b77c7138a8
[lld][BP] Fix duplicate section size measurement (#145384) 2025-06-24 06:31:23 -07:00
Pavel Labath
3e98d2b031 [lldb] Fix windows build for #145293 2025-06-24 15:25:10 +02:00
Tobias Stadler
9186df9b08
[InlineCost] Simplify extractvalue across callsite (#145054)
Motivation: When using libc++, `std::bitset<64>::count()` doesn't
optimize to a single popcount instruction on AArch64, because we fail to
inline the library code completely. Inlining fails, because the internal
bit_iterator struct is passed as a [2 x i64] %arg value on AArch64. The
value is built using insertvalue instructions and only one of the array
entries is constant. If we know that this entry is constant, we can
prove that half the function becomes dead. However, InlineCost only
considers operands for simplification if they are Constants, which %arg
is not. Without this simplification the function is too expensive to
inline.

Therefore, we had to teach InlineCost to support non-Constant simplified values
(PR #145083). Now, we enable this for extractvalue, because we want to simplify
the extractvalue with the insertvalues from the caller function. This is enough to
get bitset::count fully optimized.

There are similar opportunities we can explore for BinOps in the future
(e.g. cmp eq %arg1, %arg2 when the caller passes the same value into
both arguments), but we need to be careful here, because InstSimplify
isn't completely safe to use with operands owned by different functions.
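
As a rough illustration of the extractvalue simplification described above, here is a toy Python model. It is a sketch only, not InlineCost's actual data structures: `UNKNOWN`, `insertvalue`, and `extractvalue` are hypothetical helpers invented for this example. It shows that one entry of an aggregate can be a known constant even though the aggregate as a whole is not a Constant.

```python
# Toy model (assumption): an aggregate is a tuple whose entries are either
# ints (known constants) or UNKNOWN (values only known at run time).
UNKNOWN = object()


def insertvalue(agg, value, index):
    """Return a copy of `agg` with entry `index` replaced by `value`."""
    return agg[:index] + (value,) + agg[index + 1:]


def extractvalue(agg, index):
    """Return the entry at `index`; it may be a constant or UNKNOWN."""
    return agg[index]


# Caller side: build a [2 x i64]-like value where only entry 1 is constant.
arg = (UNKNOWN, UNKNOWN)
arg = insertvalue(arg, UNKNOWN, 0)  # run-time value
arg = insertvalue(arg, 0, 1)        # compile-time constant

# Callee side: the extract at index 1 folds to a constant ...
assert extractvalue(arg, 1) == 0
# ... while the aggregate itself is still not fully constant.
assert extractvalue(arg, 0) is UNKNOWN
```

Once the extract folds to a constant, code guarded by a comparison against it can be treated as dead when estimating the inline cost.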
2025-06-24 14:15:27 +01:00
Balázs Benics
e04c938cc0
[analyzer][NFC] Add xrefs to a test case that has poor git blame (#145501) 2025-06-24 14:50:14 +02:00
Balázs Benics
6fe8543a2a
[analyzer][docs] Mention perfetto for visualizing trace JSONs (#145500) 2025-06-24 14:49:43 +02:00
Simon Pilgrim
db4dc88d06
[X86] combineEXTRACT_SUBVECTOR - remove unnecessary bitcast handling. (#145496)
We already aggressively fold extract_subvector(bitcast()) -> bitcast(extract_subvector())
2025-06-24 13:47:03 +01:00
Darren Wihandi
9f3931b659
[AMDGPU] Fold fmed3 when inputs include infinity (#144824) 2025-06-24 21:44:17 +09:00
Ross Brunton
4785832144
[Offload] Fix cmake warning (#145488)
CMake was unhappy that there was no space between arguments; now there
is one.
2025-06-24 13:42:03 +01:00
Kareem Ergawy
9aebfde1e7
[flang] Allow cycle in target teams distribute [simd] (#145462)
flang incorrectly issues a semantic error when a `cycle` statement is
used inside a `target teams distribute [simd]` associated loop. This is
not prevented by the spec, therefore this PR allows such a construct.
2025-06-24 14:21:06 +02:00
Orlando Cazalet-Hyams
352baa386c
[RemoveDIs] Resolve RemoveRedundantDbgInstrs fwd scan FIXME (#144718)
These FIXMEs were added to keep the dbg_record implementation identical to the
dbg intrinsic versions, which have since been removed. I don't think there's any
reason for the old behaviour; my understanding is it was a minor bug no one got
round to fixing.

I've upgraded the test to be written with dbg_records while I'm here.
2025-06-24 13:09:49 +01:00
David Green
825ad86aea
[DAG] Fold nested add(add(reduce(a), b), add(reduce(c), d)) (#115150)
This patch reassociates `add(add(vecreduce(a), b), add(vecreduce(c),
d))` into `add(vecreduce(add(a, c)), add(b, d))`, to combine the
reductions into a single node. This comes up after unrolling vectorized
loops.

There is another small change to move reassociateReduction inside fadd
outside of an AllowNewConst block, as new constants will not be created
and it should be OK to perform the combine later after legalization.
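
The reassociation above can be checked numerically with a small Python sketch. The helper names (`vecreduce_add`, `before`, `after`) are made up for this example, and integers are used so that reassociation is exact; for floating point the rewrite is only valid when reassociation is permitted.

```python
def vecreduce_add(v):
    """Horizontal add of all lanes of a vector."""
    return sum(v)


def before(a, b, c, d):
    # add(add(vecreduce(a), b), add(vecreduce(c), d)): two reductions
    return (vecreduce_add(a) + b) + (vecreduce_add(c) + d)


def after(a, b, c, d):
    # add(vecreduce(add(a, c)), add(b, d)): one lane-wise add, one reduction
    lanewise = [x + y for x, y in zip(a, c)]
    return vecreduce_add(lanewise) + (b + d)


a = [1, 2, 3, 4]
c = [10, 20, 30, 40]
assert before(a, 5, c, 6) == after(a, 5, c, 6) == 121
```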
2025-06-24 13:08:59 +01:00
Orlando Cazalet-Hyams
db72f6cbe6
[RemoveDIs][NFC] Remove dbg intrinsic handling code from AssignmentTrackingAnalysis (#144674)
See PR for breakdown into individual commits.
2025-06-24 13:07:31 +01:00
Fabian Mora
8f4da2cbf0
[mlir][affine] Fix min simplification in makeComposedAffineApply (#145376)
This patch fixes a bug discovered in the
`affine::makeComposedFoldedAffineApply` function when `composeAffineMin
== true`. The bug happened because the simplification assumed the
symbols appearing in the `affine.apply` op corresponded to symbols in
the `affine.min` op, and that's not always the case. For example:

```mlir
#map = affine_map<()[s0, s1] -> (s1)>
#map1 = affine_map<()[s0, s1] -> (s0 ceildiv s1)>
module {
  func.func @min_max_full_simplify() -> index {
    %0 = test.value_with_bounds {max = 64 : index, min = 32 : index}
    %1 = test.value_with_bounds {max = 64 : index, min = 32 : index}
    %2 = affine.min #map()[%0, %1]
    %3 = affine.apply #map1()[%2, %0]
    return %3 : index
  }
}
```

This patch also introduces the test `make_composed_folded_affine_apply`
transform operation to test this simplification. It also adds tests
ensuring we get correct behavior.
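
To make the symbol mismatch concrete, the MLIR example above can be evaluated by hand with a short Python sketch (the function names are hypothetical and only mirror the two maps). Note that `s0` of `#map1` is bound to `%2`, while `s0` of `#map` is bound to `%0`, so symbol positions in the two ops do not refer to the same values.

```python
def ceildiv(a, b):
    """Ceiling division for positive b."""
    return -(-a // b)


def affine_min_map(s0, s1):
    # #map = ()[s0, s1] -> (s1): a single result, so the min is just s1
    return min([s1])


def affine_apply_map1(s0, s1):
    # #map1 = ()[s0, s1] -> (s0 ceildiv s1)
    return ceildiv(s0, s1)


def evaluate(v0, v1):
    v2 = affine_min_map(v0, v1)       # %2 = affine.min #map()[%0, %1]
    return affine_apply_map1(v2, v0)  # %3 = affine.apply #map1()[%2, %0]


# With both values bounded in [32, 64], the result depends on %0 and %1.
assert evaluate(32, 64) == 2
assert evaluate(64, 32) == 1
```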

---------

Co-authored-by: Nicolas Vasilache <nico.vasilache@amd.com>
2025-06-24 07:55:12 -04:00
Orlando Cazalet-Hyams
1dc46d45fc
[RemoveDIs] Fix rotten --implicit-check-not lines (#144711) 2025-06-24 12:32:50 +01:00
Orlando Cazalet-Hyams
ddecfa696c
[KeyInstr][Clang] Atomic ops atoms (#141624)
This patch is part of a stack that teaches Clang to generate Key
Instructions metadata for C and C++.

The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be
removed.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668
2025-06-24 12:20:44 +01:00
David Spickett
fa5d7c926f [lldb][lldb-dap] Fix runInTerminal test program on Windows 2025-06-24 11:07:45 +00:00
Florian Hahn
b8769104f1
[LAA] Address follow-up suggestions for #128061.
Adjust naming and add argument comments as suggested.
2025-06-24 12:00:17 +01:00
yronglin
e8976e92f6
[clang][Preprocessor] Add peekNextPPToken to look ahead at the next token without side effects (#143898)
This PR introduces a new function `peekNextPPToken`. It's an extension of
`isNextPPTokenLParen` and can look ahead one token in the preprocessor
without side effects.

It's also the 1st part of
https://github.com/llvm/llvm-project/pull/107168, where it is used to look
ahead at the next token and determine whether the pp directive currently
being lexed is a pp-import or pp-module directive.

At the start of phase 4, an import or module token is treated as starting
a directive and is converted to its respective keyword iff:

- After skipping horizontal whitespace, it is
    - at the start of a logical line, or
    - preceded by an export at the start of the logical line.
- It is followed by an identifier pp token (before macro expansion), or
    - <, ", or : (but not ::) pp tokens for import, or
    - ; for module.

Otherwise the token is treated as an identifier.
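
The idea of looking ahead without side effects can be sketched with a toy token stream in Python. This is an illustration only: `TokenStream` and its methods are hypothetical and far simpler than Clang's preprocessor, which has to deal with several lexer kinds and an include stack. It also shows one way a narrower query such as "is the next token a left paren" could be expressed on top of a general peek.

```python
class TokenStream:
    """Minimal token stream with a non-consuming one-token look-ahead."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._pos = 0

    def next(self):
        """Consume and return the next token, or None at end of input."""
        if self._pos >= len(self._tokens):
            return None
        tok = self._tokens[self._pos]
        self._pos += 1
        return tok

    def peek_next(self):
        """Return the next token without consuming it (no state change)."""
        if self._pos >= len(self._tokens):
            return None
        return self._tokens[self._pos]

    def is_next_lparen(self):
        """Narrower query built on top of the general peek."""
        return self.peek_next() == "("


ts = TokenStream(["import", "foo", ";"])
assert ts.next() == "import"
assert ts.peek_next() == "foo"  # look ahead ...
assert ts.peek_next() == "foo"  # ... repeatedly, with no side effects
assert ts.next() == "foo"       # the token is still there to be consumed
```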

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
2025-06-24 18:55:21 +08:00
Pavel Labath
4d2b79b04a [lldb] Fix build for #145017
Mid-flight collision with #145293.
2025-06-24 12:45:44 +02:00
Chris Jackson
bfde147761
[NFC][AMDGPU] Update and.ll test and automate check line generation (#145371)
- Convert the test to use update_llc_test_checks.py.
- Use different check prefixes for the different -mcpu options.
- Remove unused xnack 'off' flag.
2025-06-24 11:42:49 +01:00
Pavel Labath
24438aa488
[lldb] Use Socket::CreatePair for launching debugserver (#145017)
This lets us get rid of platform-specific code in ProcessGDBRemote and use
the same code path (modulo differences in socket types) everywhere. It also
unlocks further cleanups in the debugserver launching code.

The main effect of this change is that lldb on windows will now use the
`--fd` lldb-server argument for "local remote" debug sessions instead of
having lldb-server connect back to lldb. This is the same method used by
lldb on non-windows platforms (for many years) and "lldb-server
platform" on windows for truly remote debug sessions (for ~one year).

Depends on #145015.
2025-06-24 12:39:24 +02:00
Michael Buch
371f12f96d
Revert "[lldb] Add count for number of DWO files loaded in statistics" (#145494)
Reverts llvm/llvm-project#144424

Caused CI failures.

macOS CI failure was:
```
10:20:36  FAIL: test_dwp_dwo_file_count (TestStats.TestCase)
10:20:36      Test "statistics dump" and the loaded dwo file count.
10:20:36  ----------------------------------------------------------------------
10:20:36  Traceback (most recent call last):
10:20:36    File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/test/API/commands/statistics/basic/TestStats.py", line 639, in test_dwp_dwo_file_count
10:20:36      self.assertEqual(debug_stats["totalDwoFileCount"], 2)
10:20:36  AssertionError: 0 != 2
10:20:36  Config=arm64-/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/lldb-build/bin/clang
10:20:36  ======================================================================
10:20:36  FAIL: test_no_debug_names_eager_loads_dwo_files (TestStats.TestCase)
10:20:36      Test the eager loading behavior of DWO files when debug_names is absent by
10:20:36  ----------------------------------------------------------------------
10:20:36  Traceback (most recent call last):
10:20:36    File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/test/API/commands/statistics/basic/TestStats.py", line 566, in test_no_debug_names_eager_loads_dwo_files
10:20:36      self.assertEqual(debug_stats["totalDwoFileCount"], 2)
10:20:36  AssertionError: 0 != 2
10:20:36  Config=arm64-/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/lldb-build/bin/clang
10:20:36  ======================================================================
10:20:36  FAIL: test_split_dwarf_dwo_file_count (TestStats.TestCase)
10:20:36      Test "statistics dump" and the dwo file count.
10:20:36  ----------------------------------------------------------------------
10:20:36  Traceback (most recent call last):
10:20:36    File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/test/API/commands/statistics/basic/TestStats.py", line 588, in test_split_dwarf_dwo_file_count
10:20:36      self.assertEqual(len(debug_stats["modules"]), 1)
10:20:36  AssertionError: 42 != 1
10:20:36  Config=arm64-/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/lldb-build/bin/clang
```
2025-06-24 11:33:00 +01:00
Kerry McLaughlin
61b99ca512
[AArch64] Consider StreamingSVE in shouldExpandGetActiveLaneMask (#144722)
If StreamingSVE is available, we may be able to lower the intrinsic
to the GET_ACTIVE_LANE_MASK node instead of expanding it.
Also adds the node to addTypeForFixedLengthSVE to ensure we lower
to the SVE instruction when useSVEForFixedLengthVectors is true.
2025-06-24 11:08:48 +01:00
David Truby
049d61ad65
[flang][AArch64] Always link compiler-rt to flang after libgcc (#144710)
This patch fixes an issue where the __trampoline_setup symbol is missing
with some programs compiled with flang. This symbol is present only in
compiler-rt and not in libgcc. This patch adds compiler-rt to the link
line after libgcc if libgcc is being used, so that only this symbol will
be picked from compiler-rt.

Fixes #141147
2025-06-24 11:08:13 +01:00
Simon Pilgrim
594ebe6340
[X86] combineSelect - move vselect(cond, pshufb(x), pshufb(y)) -> or(pshufb(x), pshufb(y)) fold (#145475)
Move the OR(PSHUFB(),PSHUFB()) fold to reuse an existing
createShuffleMaskFromVSELECT result and ensure it is performed before
the combineX86ShufflesRecursively combine to prevent some hasOneUse
failures noticed in #133947 (combineX86ShufflesRecursively still
unnecessarily widens vectors in several locations).
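
For readers unfamiliar with the fold named in the title, here is a small Python model of why it is sound, assuming a constant per-byte condition. PSHUFB writes zero to a result byte whose mask byte has bit 7 set, so the unselected lanes of each shuffle can be zeroed in the mask and the two results combined with OR. The helpers below are hypothetical and model a single 128-bit lane only.

```python
ZERO = 0x80  # a mask byte with bit 7 set makes PSHUFB produce 0


def pshufb(src, mask):
    """Byte shuffle of a 16-byte vector with PSHUFB-style zeroing."""
    return [0 if m & 0x80 else src[m & 0x0F] for m in mask]


def vselect(cond, a, b):
    """Per-byte select: a[i] where cond[i] is true, else b[i]."""
    return [x if c else y for c, x, y in zip(cond, a, b)]


def fold_to_or(cond, x, mx, y, my):
    """or(pshufb(x, mx'), pshufb(y, my')) with unselected lanes zeroed."""
    mx2 = [m if c else ZERO for c, m in zip(cond, mx)]
    my2 = [ZERO if c else m for c, m in zip(cond, my)]
    return [p | q for p, q in zip(pshufb(x, mx2), pshufb(y, my2))]


x = list(range(16))
y = list(range(100, 116))
mx = list(reversed(range(16)))
my = [0] * 16
cond = [i % 2 == 0 for i in range(16)]
assert vselect(cond, pshufb(x, mx), pshufb(y, my)) == fold_to_or(cond, x, mx, y, my)
```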
2025-06-24 10:50:29 +01:00
Diana Picus
54b522f6fd
[AMDGPU] Fixup a201f8872a63 (#145486)
Fix test lines based on old revision for main.
2025-06-24 11:43:28 +02:00
Benjamin Kramer
e4b9aa6192 [bazel] Port d31ba5256327d30f264c2f671bf197877b242cde 2025-06-24 11:37:50 +02:00
Benjamin Kramer
45c5eb168f [bazel] mlir_copts doesn't exist 2025-06-24 11:31:55 +02:00
Pavel Labath
cf9546b826
[lldb] Remove GDBRemoteCommunication::ConnectLocally (#145293)
Originally added for reproducers, it is now only used for test code.

While we could make it a test helper, I think that after #145015 it is
simple enough to not be needed.

Also squeeze in a change to make ConnectionFileDescriptor accept a
unique_ptr<Socket>.
2025-06-24 11:11:35 +02:00
Pavel Labath
46e1e9f104
Reapply "[lldb/cmake] Plugin layering enforcement mechanism (#144543)" (#145305)
The only differences from the original PR are the added BRIEF and
FULL_DOCS arguments to define_property, which are required for
cmake<3.23.
2025-06-24 11:10:35 +02:00
Diana Picus
a201f8872a
[AMDGPU] Replace dynamic VGPR feature with attribute (#133444)
Use a function attribute (amdgpu-dynamic-vgpr) instead of a subtarget
feature, as requested in #130030.
2025-06-24 11:09:36 +02:00
Lang Hames
6cfa03f1f1
[ORC] Drop unused LinkGraphLinkingLayer::Plugin::notifyLoaded method. (#145457)
This method was included in the original Plugin API as a counterpart to
JITEventListener::notifyLoaded but was never used.
2025-06-24 19:00:24 +10:00
antoine moynault
5fa55b2dfc
Revert "[flang][OpenMP] Skip runtime mapping with no offload targets (#144534)" (#145478)
And also revert 6ba1955 "[flang][OpenMP] Fix ignore-target-data.f90 test"

As it causes several bot failures
https://github.com/llvm/llvm-project/pull/144534#issuecomment-2995303224
2025-06-24 10:51:26 +02:00
Matt Arsenault
73e4f8a71f
ARM: Use member initializer list (#145459) 2025-06-24 17:47:34 +09:00
Kazu Hirata
8d9911e4a0
[Option] Use a range-based for loop (NFC) (#145446) 2025-06-24 00:46:17 -07:00
Aviad Cohen
d5c8024dae
[mlir][bazel]: Add FuncUtil rule in bazel files (#145463) 2025-06-24 10:40:57 +03:00
Nikita Popov
0112f12eb6
[EarlyCSE] Remove void return restriction for call CSE (#145320)
For readonly/readnone calls returning void we can't CSE the return
value. However, making these participate in CSE is still useful,
because it allows DCE of calls that are not willreturn/nounwind
(something no other part of LLVM is capable of removing).

The more interesting use-case is CSE for writeonly calls (not
yet supported), but I figured this change makes sense independently.

There is no impact on compile-time.
2025-06-24 09:20:03 +02:00
Juan Manuel Martinez Caamaño
8ec0552a7f
Reapply "[CUDA][HIP] Add a __device__ version of std::__glibcxx_assert_fail()" (#144886)
Modifications to reapply the commit:
* Add noexcept only after C++11 on __glibcxx_assert_fail
* Remove vararg version of __glibcxx_assert_fail
2025-06-24 09:13:13 +02:00
Kazu Hirata
f704738781
[verify-uselistorder] Use llvm::is_sorted (NFC) (#145444)
We can pass a range to llvm::is_sorted.
2025-06-24 00:10:22 -07:00
Antonio Frighetto
1247fddf36 [SimplifyCFG] Relax cttz cost check in simplifySwitchOfPowersOfTwo
We should be able to allow the `simplifySwitchOfPowersOfTwo` transform
to take place, as, on recent X86 targets, the weighted latency-size
cost of cttz appears to be 2. This favours computing trailing zeroes and
indexing into a smaller value table, over generating a jump table with an
indirect branch, which overall should be more efficient.
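
The trade-off can be illustrated with a small Python sketch. The `cttz` helper, the case values, and the table contents are invented for this example, and the input is assumed to always be one of the case values. A switch over powers of two becomes a dense index once the trailing zeros are counted.

```python
def cttz(x):
    """Count trailing zero bits of a positive integer."""
    assert x > 0
    return (x & -x).bit_length() - 1


def via_switch(x):
    # Sparse case values 1, 2, 4, 8: a direct lowering needs compares or a
    # jump table indexed by the raw value.
    if x == 1:
        return "one"
    if x == 2:
        return "two"
    if x == 4:
        return "four"
    if x == 8:
        return "eight"
    raise ValueError("unreachable for the assumed inputs")


VALUES = ["one", "two", "four", "eight"]


def via_cttz(x):
    # After the transform the cases are 0, 1, 2, 3, so a small value table
    # indexed by cttz(x) is enough.
    return VALUES[cttz(x)]


for x in (1, 2, 4, 8):
    assert via_switch(x) == via_cttz(x)
```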
2025-06-24 09:06:18 +02:00
Matthias Springer
c5972da34a
[mlir][Transforms] Dialect Conversion: Simplify block-inline handling (#145308)
When a block is getting inlined, the destination block does not have to
be legalized. That's because the signature of the destination block does
not change by inlining.

This commit makes the implementation consistent with this comment:
```
  // If the pattern moved or created any blocks, make sure the types of block
  // arguments get legalized.
```
2025-06-24 08:52:13 +02:00
Fabian Ritter
3e1e368824
[AMDGPU][SDAG] Add tests for ISD::PTRADD DAG combines (#142738)
Pre-committing tests to show improvements in a follow-up PR with the
combines.
2025-06-24 08:43:54 +02:00
Feng Zou
b1dcf78378
[X86][APX] Fix issue of push2/pop2 instr with stack clash protection (#145303)
When the -stack-clash-protection option is specified and APX push2pop2 is
enabled, there are two calls to the emitSPUpdate function, which emit
two STACKALLOC_W_PROBING pseudo instructions. The pseudo instruction for
push2 padding is silently ignored, which leaves the stack not aligned to
16 bytes and causes a GP exception at run time. Fixed by directly emitting
a "push %rax" instruction for push2 padding, instead of calling
emitSPUpdate. There was a similar issue in https://reviews.llvm.org/D150033.
2025-06-24 14:22:14 +08:00
Durgadoss R
ef048471f7
[NVPTX][NFC] Rearrange the TMA-S2G intrinsics (#144903)
This patch moves the TMA S2G intrinsics into their own set of loops.
This is in preparation for adding im2colw/w128 modes support to
the G2S intrinsics (but the S2G ones do not support those modes).

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-06-24 11:47:21 +05:30