549622 Commits

Author SHA1 Message Date
Simon Pilgrim
2a79ef66eb
[AMDGPU] canCreateUndefOrPoisonForTargetNode - BFE_I32/U32 can't create poison/undef (#154932)
Add AMDGPUTargetLowering::canCreateUndefOrPoisonForTargetNode handler
and tag BFE_I32/U32 nodes as they can only propagate poison, not create
poison/undef.

Fighting some of the remaining regressions in #152107
2025-08-22 12:14:45 +00:00
Jungwook Park
b149fc7755
[mlir][scf] Quick fix to scf.execute_region no_inline (#154931)
Asm printer should exclude `no_inline` attr during printing optional
attrs at the bottom.
2025-08-22 13:11:27 +01:00
Michael Halkenhäuser
7c1d2467f1
Reland: [OpenMP] Add ompTest library to OpenMP (#154786)
Reland of https://github.com/llvm/llvm-project/pull/147381

Added changes to fix observed BuildBot failures:
 * CMake version (reduced minimum to `3.20`, was: `3.22`)
 * GoogleTest linking (missing `./build/lib/libllvm_gtest.a`)
* Related header issue (missing `#include
"llvm/Support/raw_os_ostream.h"`)

Original message

Description
===========
OpenMP Tooling Interface Testing Library (ompTest) ompTest is a unit testing framework for testing OpenMP implementations. It offers a simple-to-use framework that allows a tester to check for OMPT events in addition to regular unit testing code, supported by linking against GoogleTest by default. It also facilitates writing concise tests while bridging the semantic gap between the unit under test and the OMPT-event testing.

Background
==========
This library has been developed to provide the means of testing OMPT implementations with reasonable effort. Especially, asynchronous or unordered events are supported and can be verified with ease, which may prove to be challenging with LIT-based tests. Additionally, since the assertions are part of the code being tested, ompTest can reference all corresponding variables during assertion.

Basic Usage
===========
OMPT event assertions are placed before the code, which shall be tested. These assertion can either be provided as one block or interleaved with the test code. There are two types of asserters: (1) sequenced "order-sensitive" and (2) set "unordered" assserters. Once the test is being run, the corresponding events are triggered by the OpenMP runtime and can be observed. Each of these observed events notifies asserters, which then determine if the test should pass or fail.

Example (partial, interleaved)
==============================
```c++
  int N = 100000;
  int a[N];
  int b[N];

  OMPT_ASSERT_SEQUENCE(Target, TARGET, BEGIN, 0);
  OMPT_ASSERT_SEQUENCE(TargetDataOp, ALLOC, N * sizeof(int)); // a ?
  OMPT_ASSERT_SEQUENCE(TargetDataOp, H2D, N * sizeof(int), &a);
  OMPT_ASSERT_SEQUENCE(TargetDataOp, ALLOC, N * sizeof(int)); // b ?
  OMPT_ASSERT_SEQUENCE(TargetDataOp, H2D, N * sizeof(int), &b);
  OMPT_ASSERT_SEQUENCE(TargetSubmit, 1);
  OMPT_ASSERT_SEQUENCE(TargetDataOp, D2H, N * sizeof(int), nullptr, &b);
  OMPT_ASSERT_SEQUENCE(TargetDataOp, D2H, N * sizeof(int), nullptr, &a);
  OMPT_ASSERT_SEQUENCE(TargetDataOp, DELETE);
  OMPT_ASSERT_SEQUENCE(TargetDataOp, DELETE);
  OMPT_ASSERT_SEQUENCE(Target, TARGET, END, 0);

#pragma omp target parallel for
  {
    for (int j = 0; j < N; j++)
      a[j] = b[j];
  }
```

References
==========
This work has been presented at SC'24 workshops, see: https://ieeexplore.ieee.org/document/10820689

Current State and Future Work
=============================
ompTest's development was mostly device-centric and aimed at OMPT device callbacks and device-side tracing. Consequentially, a substantial part of host-related events or features may not be supported in its current state. However, we are confident that the related functionality can be added and ompTest provides a general foundation for future OpenMP and especially OMPT testing. This PR will allow us to upstream the corresponding features, like OMPT device-side tracing in the future with significantly reduced risk of introducing regressions in the process.

Build
=====
ompTest is linked against LLVM's GoogleTest by default, but can also be built 'standalone'. Additionally, it comes with a set of unit tests, which in turn require GoogleTest (overriding a standalone build). The unit tests are added to the `check-openmp` target.

Use the following parameters to perform the corresponding build: 
`LIBOMPTEST_BUILD_STANDALONE` (Default: ${OPENMP_STANDALONE_BUILD})
`LIBOMPTEST_BUILD_UNITTESTS` (Default: OFF)

---------

Co-authored-by: Jan-Patrick Lehr <JanPatrick.Lehr@amd.com>
Co-authored-by: Joachim <protze@rz.rwth-aachen.de>
Co-authored-by: Joachim Jenke <jenke@itc.rwth-aachen.de>
2025-08-22 13:56:12 +02:00
Leandro Lacerda
15a192cde5
[libc] Enable double math functions on the GPU (#154857)
This patch adds the `acos` math function to the NVPTX build. It also
adds the `sincos` math function to the `math.h` header.
2025-08-22 06:52:13 -05:00
paperchalice
2014890c09
[SelectionDAG] Remove UnsafeFPMath in visitFP_ROUND (#154768)
Remove `UnsafeFPMath` in `visitFP_ROUND` part, it blocks some bugfixes
related to clang and the ultimate goal is to remove `resetTargetOptions`
method in `TargetMachine`, see FIXME in `resetTargetOptions`.
See also
https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast

https://discourse.llvm.org/t/allowfpopfusion-vs-sdnodeflags-hasallowcontract

Now all UnsafeFPMath uses are eliminated in LLVMCodeGen
2025-08-22 19:46:33 +08:00
Simon Pilgrim
d8769bb5b7 [AMDGPU] bf16-conversions.ll - regenerate checks
Reduce diffs in #152107
2025-08-22 12:20:50 +01:00
Lang Hames
3292edb7b4
[orc-rt] Add C and C++ APIs for WrapperFunctionResult. (#154927)
orc_rt_WrapperFunctionResult is a byte-buffer with inline storage and a
builtin error state. It is intended as a general purpose return type for
functions that return a serialized result (e.g. for communication across
ABIs or via IPC/RPC).

orc_rt_WrapperFunctionResult contains a small amount of inline storage,
allowing it to avoid heap-allocation for small return types (e.g. bools,
chars, pointers).
2025-08-22 21:18:30 +10:00
Mehdi Amini
d2b810e24f [MLIR] Apply clang-tidy fixes for readability-identifier-naming in DataFlowFramework.cpp (NFC) 2025-08-22 04:12:50 -07:00
Mehdi Amini
a8aacb1b66 [MLIR] Apply clang-tidy fixes for misc-use-internal-linkage in toy Tutorial (NFC) 2025-08-22 04:12:50 -07:00
Mehdi Amini
d2dee948a4 [MLIR] Improve clang-tidy script
This just helping to better keep track of the failures.
2025-08-22 04:12:50 -07:00
Jacek Caban
a6fcd1a663
[LLD][COFF] Set isUsedInRegularObj for target symbols in resolveAlternateNames (#154837)
Fixes: #154595

Prior to commit bbc8346e6bb543b0a87f52114fed7d766446bee1, this flag was
set by `insert()` from `addUndefined()`. Set it explicitly now.
2025-08-22 13:05:19 +02:00
Ramkumar Ramachandra
2975e674ec
[VPlan] Improve style in match_combine_or (NFC) (#154793) 2025-08-22 12:01:42 +01:00
Hans Wennborg
ee5367bedb Revert "[compiler-rt]: fix CodeQL format-string warnings via explicit casts (#153843)"
It broke the build:

compiler-rt/lib/hwasan/hwasan_thread.cpp:177:11: error: unknown type name 'ssize_t'; did you mean 'size_t'?
   177 |          (ssize_t)unique_id_, (void *)this, (void *)stack_bottom(),
       |           ^~~~~~~
       |           size_t

> This change addresses CodeQL format-string warnings across multiple
> sanitizer libraries by adding explicit casts to ensure that printf-style
> format specifiers match the actual argument types.
>
> Key updates:
> - Cast pointer arguments to (void*) when used with %p.
> - Use appropriate integer types and specifiers (e.g., size_t -> %zu,
> ssize_t -> %zd) to avoid mismatches.
> - Fix format specifier mismatches across xray, memprof, lsan, hwasan,
> dfsan.
>
> These changes are no-ops at runtime but improve type safety, silence
> static analysis warnings, and reduce the risk of UB in variadic calls.

This reverts commit d3d5751a39452327690b4e011a23de8327f02e86.
2025-08-22 12:50:53 +02:00
Lang Hames
d5af08a221
[orc-rt] Add inline specifier to orc_rt::make_error. (#154922)
Prevents linker errors for duplicate definitions when make_error is used
from more than one file.
2025-08-22 20:37:10 +10:00
nerix
d6fcaef281
[LLDB][Value] Require type size when reading a scalar (#153386)
When reading a value as a scalar, the type size is required. It's
returned as a `std::optional`. This optional isn't checked for scalar
values, where it is unconditionally accessed.

This came up in the
[Shell/Process/Windows/msstl_smoke.cpp](4e10b62442/lldb/test/Shell/Process/Windows/msstl_smoke.cpp)
test. There, LLDB breaks at the function entry, so all locals aren't
initialized yet. Most values will contain garbage. The [`std::list`
synthetic
provider](4e10b62442/lldb/source/Plugins/Language/CPlusPlus/GenericList.cpp (L517))
tries to read the value using `GetData`. However, in
[`ValueObject::GetData`](4e10b62442/lldb/source/ValueObject/ValueObject.cpp (L766)),
[`ValueObjectChild::UpdateValue`](88c993fbc5/lldb/source/ValueObject/ValueObjectChild.cpp (L102))
fails because the parent already failed to read its data, so `m_value`
won't have a compiler type, thus the size can't be read.
2025-08-22 12:26:03 +02:00
Ross Brunton
17dbb92612
[Offload][NFC] Use tablegen names rather than name parameter for API (#154736) 2025-08-22 11:13:57 +01:00
tangaac
8439777131
[LoongArch] Pre-commit tests for vecreduce_and/or/... (#154879) 2025-08-22 17:52:43 +08:00
YafetBeyene
fda24dbc16
[BOLT] Add dump-dot-func option for selective function CFG dumping (#153007)
## Change:
* Added `--dump-dot-func` command-line option that allows users to dump
CFGs only for specific functions instead of dumping all functions (the
current only available option being `--dump-dot-all`)

## Usage:
* Users can now specify function names or regex patterns (e.g.,
`--dump-dot-func=main,helper` or `--dump-dot-func="init.*`") to generate
.dot files only for functions of interest
* Aims to save time when analysing specific functions in large binaries
(e.g., only dumping graphs for performance-critical functions identified
through profiling) and we can now avoid reduce output clutter from
generating thousands of unnecessary .dot files when analysing large
binaries

## Testing
The introduced test `dump-dot-func.test` confirms the new option does
the following:

- [x] 1. `dump-dot-func` can correctly filter a specified functions
- [x] 2. Can achieve the above with regexes
- [x] 3. Can do 1. with a list of functions
- [x] No option specified creates no dot files
- [x] Passing in a non-existent function generates no dumping messages
- [x] `dump-dot-all` continues to work as expected
2025-08-22 10:51:09 +01:00
Ivan Kosarev
7594b4b8d1 [AMDGPU] Fix compilation errors. 2025-08-22 10:30:43 +01:00
Abhinav Garg
bfc16510c7
[AMDGPU] Regenerate test case to cover gfx10 check lines. (#154909)
Check lines for GFX10 is missing in this test case. Regenerate to fix
test case.
2025-08-22 15:00:28 +05:30
Nikolas Klauser
fd52f4d232
[libc++][NFC] Simplify the special member functions of the node containers (#154707)
This patch does two things:
- Remove exception specifications of `= default`ed special member
functions
- `= default` special member functions

The first part is NFC because the explicit specification does exactly
the same as the implicit specification. The second is NFC because it
does exactly what the `= default`ed special member does.
2025-08-22 11:24:28 +02:00
Florian Hahn
8bc038daf2
[InstComb] Allow more user for (add (ptrtoint %B), %O) to GEP transform. (#153566)
Generalize the logic from
https://github.com/llvm/llvm-project/pull/153421 to support additional
cases where the pointer is only used as integer.

Alive2 Proof: https://alive2.llvm.org/ce/z/po58pP

This enables vectorizing std::find for some cases, if additional
assumptions are provided: https://godbolt.org/z/94oq3576E

Depends on https://github.com/llvm/llvm-project/pull/15342.

PR: https://github.com/llvm/llvm-project/pull/153566
2025-08-22 10:17:12 +01:00
Ivan Kosarev
faca8c9ed4
[AMDGPU][NFC] Only include CodeGenPassBuilder.h where needed. (#154769)
Saves around 125-210 MB of compilation memory usage per source for
roughly one third of our backend sources, ~60 MB on average.
2025-08-22 10:05:06 +01:00
Simon Pilgrim
1b4fe26343
[clang][x86] Add release note entries describing recent work to making SSE intrinsics generic and usable with constexpr (#154737)
I haven't created an exhaustive list of intrinsic changes, but I suppose I could if people see a strong need for it.
2025-08-22 09:59:10 +01:00
Baranov Victor
00a405f666
[clang-tidy][NFC] Fix "llvm-prefer-static-over-anonymous-namespace" warnings 1/N (#153885) 2025-08-22 11:54:17 +03:00
Hans Wennborg
8bf105cb01
[asan] Build the Windows runtime with /hotpatch (#154694)
Win/ASan relies on the runtime's functions being 16-byte aligned so it
can intercept them with hotpatching. This used to be true (but not
guaranteed) until #149444.

Passing /hotpatch will give us enough alignment and generally ensure
that the functions are hotpatchable.
2025-08-22 10:40:04 +02:00
Bjorn Pettersson
2d3167f8d8
[SeparateConstOffsetFromGEP] Avoid miscompiles related to trunc nuw/nsw (#154582)
Drop poison generating flags on trunc when distributing trunc over
add/sub/or. We need to do this since for example
(add (trunc nuw A), (trunc nuw B)) is more poisonous than
(trunc nuw (add A, B))).

In some situations it is pessimistic to drop the flags. Such as
if the add in the example above also has the nuw flag. For now we
keep it simple and always drop the flags.

Worth mentioning is that we drop the flags when cloning
instructions and rebuilding the chain. This is done after the
"allowsPreservingNUW" checks in ConstantOffsetExtractor::Extract.
So we still take the "trunc nuw" into consideration when determining
if nuw can be preserved in the gep (which should be ok since that
check also require that all the involved binary operations has nuw).

Fixes #154116
2025-08-22 10:27:57 +02:00
Bjorn Pettersson
4ff7ac2330
[SeparateConstOffsetFromGEP] Add test case with trunc nuw/nsw showing miscompile
Pre commit a test case for issue #154116. When redistributing
trunc over add/sub/or we may need to drop poison generating flags
from the trunc.
2025-08-22 10:26:09 +02:00
Simon Pilgrim
8d7df8bba1
[X86] Allow AVX2 per-element shift intrinsics to be used in constexpr (#154780)
This handles constant folding for the AVX2 per-element shift intrinsics, which handle out of bounds shift amounts (logical result = 0, arithmetic result = signbit splat)

AVX512 intrinsics will follow in follow up patches

First stage of #154287
2025-08-22 09:24:24 +01:00
Pierre van Houtryve
4ab5efd48d
[AMDGPU][gfx1250] Add memory legalizer tests (NFC) (#154725) 2025-08-22 10:14:09 +02:00
Fangrui Song
f1aee598e7 ARM: Remove unneeded ARM::fixup_arm_thumb_bl special case
This is a weird special case added in 2015, simplifying an even older
condition. It is a no-op for ELF (isExternal is always false) and seems
unneeded for non-ELF.
2025-08-22 01:08:33 -07:00
LLVM GN Syncbot
2a59400003 [gn build] Port 2b8e80694263 2025-08-22 08:03:17 +00:00
Muhammad Omair Javaid
2b8e806942 Revert "[lldb-dap] Add module symbol table viewer to VS Code extension #140626 (#153836)"
This reverts commit 8b64cd8be29da9ea74db5a1a21f7cd6e75f9e9d8.

This breaks lldb-aarch64-* bots causing a crash in lldb-dap while
running test TestDAP_moduleSymbols.py

https://lab.llvm.org/buildbot/#/builders/59/builds/22959
https://lab.llvm.org/buildbot/#/builders/141/builds/10975
2025-08-22 13:02:52 +05:00
Zhaoxin Yang
149d9a38e1
[ELF][LoongArch] -r: Synthesize R_LARCH_ALIGN at input section start (#153935)
Similay to

94655dc8ae

The difference is that in LoongArch, the ALIGN is synthesized when the
alignment is >4, (instead of >=4), and the number of bytes inserted is
`sec->addralign - 4`.
2025-08-22 16:02:41 +08:00
Connector Switch
6560adb584
[flang] optimize atand/atan2d precision (#154544)
Part of https://github.com/llvm/llvm-project/issues/150452.
2025-08-22 15:55:46 +08:00
Matt Arsenault
2b46f31ee3
AMDGPU: Sign extend immediates for 32-bit subregister extracts (#154870)
extractSubregFromImm previously would sign extend the 16-bit subregister
extracts, but not the 32-bit. We try to consistently store immediates
as sign extended, since not doing it can result in misreported
isInlineImmediate checks.
2025-08-22 16:50:36 +09:00
Stanislav Mekhanoshin
e0945dfa30
[AMDGPU] Add test to show failure with SRC_*_HI registers. NFC. (#154828)
Since src_{private|shared}_{base|limit} registers are added and
are not artifical compiler happily uses it when it can. In HW
these registers do not exist and the encoding belongs to their
64-bit super-register or 32-bit low register. Same instructions
will produce relocation if run through asm.
2025-08-22 00:50:25 -07:00
Jay Foad
cf5243619a
[AMDGPU] Common up two local memory size calculations. NFCI. (#154784) 2025-08-22 08:44:11 +01:00
serge-sans-paille
50f7c6a5b9
Default to GLIBCXX_USE_CXX11_ABI=ON
Because many of our bots actually don't run a listdc++ compatible with
_GLIBCXX_USE_CXX11_ABI=0. See
https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html for
details.

This is a follow-up to be179d069664ce03c485e49fa1f6e2ca3d6286fa related
to #154447.
2025-08-22 09:35:40 +02:00
paperchalice
945a186089
[DAGCombiner] Remove most UnsafeFPMath references (#146295)
This pull request removes all references to `UnsafeFPMath` in dag
combiner except FP_ROUND.
- Set fast math flags in some tests.
2025-08-22 15:27:25 +08:00
Fangrui Song
06ab660911 MCSymbol: Avoid isExported/setExported
The next change will move these methods from the base class.
2025-08-22 00:25:55 -07:00
Durgadoss R
36dc6146b8
[MLIR][NVVM] Update TMA tensor prefetch Op (#153464)
This patch updates the TMA Tensor prefetch Op
to add support for im2col_w/w128 and tile_gather4 modes.
This completes support for all modes available in Blackwell.
* lit tests are added for all possible combinations.
* The invalid tests are moved to a separate file with more coverage.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-08-22 12:51:29 +05:30
Djordje Todorovic
5050da7ba1
[RISCV] Add initial assembler/MC layer support for big-endian (#146534)
This patch adds basic assembler and MC layer infrastructure for
RISC-V big-endian targets (riscv32be/riscv64be):
      - Register big-endian targets in RISCVTargetMachine
      - Add big-endian data layout strings
      - Implement endianness-aware fixup application in assembler
        backend
      - Add byte swapping for data fixups on BE cores
      - Update MC layer components (AsmInfo, MCTargetDesc, Disassembler,
        AsmParser)
    
This provides the foundation for BE support but does not yet include:
      - Codegen patterns for BE
      - Load/store instruction handling
      - BE-specific subtarget features
2025-08-22 09:21:10 +02:00
Jason Molenda
a2f542b7a5
[lldb][debugserver] update --help to list all the options (#154853)
These are almost all for internal-developer-users only so "look at
debugserver.cpp" wasn't unreasonable, but we rarely add any new options
so a simple list of all recognized options isn't a burden to throw in
the help method.
2025-08-22 00:05:13 -07:00
Fangrui Song
04a3dd5a19 MCSymbol: Avoid isExported/setExported
The next change will move it to MCSymbol{COFF,MachO,Wasm} to make it
clear that other object file formats (e.g. ELF) do not use this field.
2025-08-22 00:00:29 -07:00
Fangrui Song
1def457228 MC: Avoid MCSymbol::isExported
This bit is only used by COFF/MachO. The upcoming change will move
isExported/setExported to MCSymbolCOFF/MCSymbolMachO.
2025-08-21 23:26:53 -07:00
Amit Kumar Pandey
d3d5751a39
[compiler-rt]: fix CodeQL format-string warnings via explicit casts (#153843)
This change addresses CodeQL format-string warnings across multiple
sanitizer libraries by adding explicit casts to ensure that printf-style
format specifiers match the actual argument types.

Key updates:
- Cast pointer arguments to (void*) when used with %p.
- Use appropriate integer types and specifiers (e.g., size_t -> %zu,
ssize_t -> %zd) to avoid mismatches.
- Fix format specifier mismatches across xray, memprof, lsan, hwasan,
dfsan.

These changes are no-ops at runtime but improve type safety, silence
static analysis warnings, and reduce the risk of UB in variadic calls.
2025-08-22 11:51:13 +05:30
Med Ismail Bennani
595148ab76
[lldb/crashlog] Avoid StopAtEntry when launch crashlog in interactive mode (#154651)
In 88f409194, we changed the way the crashlog scripted process was
launched since the previous approach required to parse the file twice,
by stopping at entry, setting the crashlog object in the middle of the
scripted process launch and resuming it.

Since then, we've introduced SBScriptObject which allows to pass any
arbitrary python object accross the SBAPI boundary to another scripted
affordance.

This patch make sure of that to include the parse crashlog object into
the scripted process launch info dictionary, which eliviates the need to
stop at entry.

Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>

Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
2025-08-21 23:16:45 -07:00
Brad Smith
0fff460592
[Driver] DragonFly does not support C11 threads (#154886) 2025-08-22 02:02:52 -04:00
Rajat Bajpai
b08b219650
[MLIR][NVVM] Add "blocksareclusters" kernel attribute support (#154519)
This change adds "nvvm.blocksareclusters" kernel attribute support in NVVM Dialect/MLIR.
2025-08-22 11:32:21 +05:30