570302 Commits

Author SHA1 Message Date
Wenju He
e4245f2fc5
[cmake] forward LLVM_EXTERNAL_*_SOURCE_DIR to runtimes (#180399)
Allow runtime source directories to live outside the top-level tree by
honoring LLVM_EXTERNAL_*_SOURCE_DIR and propagating the values via
RUNTIMES_CMAKE_ARGS.
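
The forwarding idea can be sketched outside CMake as follows. This is only an illustration in Python, not the CMake logic from the patch: the helper name `forward_external_source_dirs` and the example paths are hypothetical, and only the `LLVM_EXTERNAL_*_SOURCE_DIR` naming pattern comes from the message above.

```python
import re

# Matches cache variable names of the form LLVM_EXTERNAL_<NAME>_SOURCE_DIR.
_EXTERNAL_DIR = re.compile(r"^LLVM_EXTERNAL_[A-Z0-9_]+_SOURCE_DIR$")


def forward_external_source_dirs(cache_vars):
    """Hypothetical helper: build -D arguments for every non-empty
    LLVM_EXTERNAL_*_SOURCE_DIR entry so a nested build sees the same paths."""
    args = []
    for name in sorted(cache_vars):
        value = cache_vars[name]
        if _EXTERNAL_DIR.match(name) and value:
            args.append(f"-D{name}={value}")
    return args


# Example values are made up for illustration.
example = {
    "LLVM_EXTERNAL_LIBCXX_SOURCE_DIR": "/src/out-of-tree/libcxx",
    "LLVM_ENABLE_RUNTIMES": "libcxx",
    "LLVM_EXTERNAL_COMPILER_RT_SOURCE_DIR": "",
}
print(forward_external_source_dirs(example))
```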
2026-02-24 18:29:55 +08:00
Simon Pilgrim
cfaa67bb62
[Thumb2] mve-vmovimm.ll - regenerate with missing check prefixes (#183019)
Add prefixes to discriminate between -mattr=+mve and -mattr=+mve.fp to
add missing check coverage

Fixes update_llc_test_checks warnings and simplifies regeneration for an
upcoming patch
2026-02-24 10:20:58 +00:00
Gergely Bálint
9d762ad279
[BOLT][BTI] Patch ignored functions in place when targeting them with indirect branches (#177165)
When applying BTI fixups to indirect branch targets, ignored functions
are considered as a special case:
- these hold no instructions,
- have no CFG,
- and are not emitted in the new text section.

The solution is to patch the entry points in the original location.

If such a situation occurs in a binary, recompilation using the
-fpatchable-function-entry flag is required. This will place a nop at
all function starts, which BOLT can use to patch the original section.

Without the extra nop, BOLT cannot safely patch the original .text
section.

An alternative solution could be to also ignore the function from which
the stub starts. This has not been tried, as the LongJmp pass (where most
stubs are inserted) is currently not equipped to ignore functions.

Testing: both the success and failure cases are covered with lit tests.
2026-02-24 11:09:42 +01:00
Hussam Alhassan
5c002d0ddd
[NFC][AArch64] AArch64ConditionOptimizer extract shared instruction finding logic (#182244)
Extract cmp/cond instruction finding logic from cross- and intra-block
paths into shared functions
2026-02-24 09:56:57 +00:00
Simon Pilgrim
7c5c58cdc6
[X86] getFauxShuffleMask - OR(SHUF(),SHUF()) - treat undemanded elements as undef (#182678)
We have to be careful when attempting to decode OR() patterns as
shuffles - we can't forward demanded undef elements in both sources as
an undef result as it can lead to infinite loops during widening
(#49393).

But if we don't demand the element in the first place (based off
demanded elts masks during recursive shuffle combines), then it doesn't
matter what the elements contain and we can treat it as a
SM_SentinelUndef shuffle element.

Noticed while working on #137422
2026-02-24 09:37:58 +00:00
Nikolas Klauser
12d8360727
[libc++] Add segmented iterator optimization to std::equal (#179242)
```
Benchmark                                                 97fa3e593693    a820f8f10736    Difference    % Difference
------------------------------------------------------  --------------  --------------  ------------  --------------
std::equal(deque<int>)_(it,_it,_it)/1024                        510.92           82.64       -428.27         -83.82%
std::equal(deque<int>)_(it,_it,_it)/1048576                  518795.61        87141.29    -431654.32         -83.20%
std::equal(deque<int>)_(it,_it,_it)/50                           29.24            6.77        -22.46         -76.84%
std::equal(deque<int>)_(it,_it,_it)/8                             4.20            3.71         -0.49         -11.61%
std::equal(deque<int>)_(it,_it,_it)/8192                       3972.84          643.83      -3329.01         -83.79%
std::equal(deque<int>)_(it,_it,_it,_it)/1024                    417.45           81.52       -335.93         -80.47%
std::equal(deque<int>)_(it,_it,_it,_it)/1048576              539228.26        87480.92    -451747.34         -83.78%
std::equal(deque<int>)_(it,_it,_it,_it)/50                       22.25            7.25        -15.00         -67.41%
std::equal(deque<int>)_(it,_it,_it,_it)/8                         4.75            4.44         -0.31          -6.45%
std::equal(deque<int>)_(it,_it,_it,_it)/8192                   3259.01          641.31      -2617.70         -80.32%
std::equal(deque<int>)_(it,_it,_it,_it,_pred)/1024              532.68          327.58       -205.10         -38.50%
std::equal(deque<int>)_(it,_it,_it,_it,_pred)/1048576        600755.28       402988.04    -197767.24         -32.92%
std::equal(deque<int>)_(it,_it,_it,_it,_pred)/50                 27.26           25.29         -1.97          -7.22%
std::equal(deque<int>)_(it,_it,_it,_it,_pred)/8                   5.20            5.58          0.38           7.31%
std::equal(deque<int>)_(it,_it,_it,_it,_pred)/8192             4204.16         2847.30      -1356.86         -32.27%
std::equal(deque<int>)_(it,_it,_it,_pred)/1024                  531.32          329.03       -202.30         -38.07%
std::equal(deque<int>)_(it,_it,_it,_pred)/1048576            598948.55       403822.65    -195125.90         -32.58%
std::equal(deque<int>)_(it,_it,_it,_pred)/50                     26.28           16.18        -10.10         -38.43%
std::equal(deque<int>)_(it,_it,_it,_pred)/8                       4.44            3.70         -0.74         -16.67%
std::equal(deque<int>)_(it,_it,_it,_pred)/8192                 4184.03         2902.98      -1281.05         -30.62%
std::equal(list<int>)_(it,_it,_it)/1024                        1168.78         1168.51         -0.27          -0.02%
std::equal(list<int>)_(it,_it,_it)/1048576                  1283003.12      1281885.44      -1117.69          -0.09%
std::equal(list<int>)_(it,_it,_it)/50                            60.19           44.38        -15.81         -26.27%
std::equal(list<int>)_(it,_it,_it)/8                              3.07            3.07          0.00           0.15%
std::equal(list<int>)_(it,_it,_it)/8192                       10367.41        11075.24        707.83           6.83%
std::equal(list<int>)_(it,_it,_it,_it)/1024                     728.32          734.18          5.86           0.80%
std::equal(list<int>)_(it,_it,_it,_it)/1048576               951276.58       953928.39       2651.81           0.28%
std::equal(list<int>)_(it,_it,_it,_it)/50                        31.86           32.32          0.46           1.44%
std::equal(list<int>)_(it,_it,_it,_it)/8                          3.11            3.10         -0.01          -0.34%
std::equal(list<int>)_(it,_it,_it,_it)/8192                   14940.68        16058.91       1118.22           7.48%
std::equal(list<int>)_(it,_it,_it,_it,_pred)/1024               803.49          813.53         10.05           1.25%
std::equal(list<int>)_(it,_it,_it,_it,_pred)/1048576        1012708.15      1026207.55      13499.40           1.33%
std::equal(list<int>)_(it,_it,_it,_it,_pred)/50                  38.68           39.24          0.56           1.46%
std::equal(list<int>)_(it,_it,_it,_it,_pred)/8                    4.07            4.07         -0.00          -0.08%
std::equal(list<int>)_(it,_it,_it,_it,_pred)/8192             16632.08        18073.63       1441.55           8.67%
std::equal(list<int>)_(it,_it,_it,_pred)/1024                  1162.99         1162.48         -0.51          -0.04%
std::equal(list<int>)_(it,_it,_it,_pred)/1048576            1291522.30      1303819.01      12296.72           0.95%
std::equal(list<int>)_(it,_it,_it,_pred)/50                      45.73           46.32          0.59           1.29%
std::equal(list<int>)_(it,_it,_it,_pred)/8                        4.35            4.40          0.04           1.03%
std::equal(list<int>)_(it,_it,_it,_pred)/8192                 15035.93        14598.06       -437.87          -2.91%
std::equal(vector<bool>)_(aligned)/1024                           0.22            0.22          0.00           0.04%
std::equal(vector<bool>)_(aligned)/1048576                        0.22            0.22          0.00           0.12%
std::equal(vector<bool>)_(aligned)/50                             0.22            0.22          0.00           0.02%
std::equal(vector<bool>)_(aligned)/8                              0.22            0.22          0.00           0.03%
std::equal(vector<bool>)_(aligned)/8192                           0.22            0.22          0.00           0.05%
std::equal(vector<bool>)_(unaligned)/1024                         6.34            6.39          0.04           0.70%
std::equal(vector<bool>)_(unaligned)/1048576                   6809.31         6833.52         24.21           0.36%
std::equal(vector<bool>)_(unaligned)/50                           1.11            0.92         -0.19         -17.55%
std::equal(vector<bool>)_(unaligned)/8                            1.11            1.05         -0.06          -5.29%
std::equal(vector<bool>)_(unaligned)/8192                        59.27           59.92          0.65           1.10%
std::equal(vector<int>)_(it,_it,_it)/1024                        80.39           80.59          0.20           0.25%
std::equal(vector<int>)_(it,_it,_it)/1048576                  72546.36        73803.43       1257.07           1.73%
std::equal(vector<int>)_(it,_it,_it)/50                           3.92            4.43          0.51          12.92%
std::equal(vector<int>)_(it,_it,_it)/8                            1.46            1.47          0.01           0.75%
std::equal(vector<int>)_(it,_it,_it)/8192                       553.63          559.59          5.95           1.07%
std::equal(vector<int>)_(it,_it,_it,_it)/1024                    78.69           78.37         -0.32          -0.40%
std::equal(vector<int>)_(it,_it,_it,_it)/1048576              72238.51        73582.13       1343.62           1.86%
std::equal(vector<int>)_(it,_it,_it,_it)/50                       4.18            4.62          0.44          10.52%
std::equal(vector<int>)_(it,_it,_it,_it)/8                        1.68            1.66         -0.01          -0.87%
std::equal(vector<int>)_(it,_it,_it,_it)/8192                   549.35          555.24          5.89           1.07%
std::equal(vector<int>)_(it,_it,_it,_it,_pred)/1024             361.08          363.32          2.24           0.62%
std::equal(vector<int>)_(it,_it,_it,_it,_pred)/1048576       391367.63       394209.88       2842.25           0.73%
std::equal(vector<int>)_(it,_it,_it,_it,_pred)/50                15.24           15.83          0.59           3.87%
std::equal(vector<int>)_(it,_it,_it,_it,_pred)/8                  3.18            3.19          0.01           0.40%
std::equal(vector<int>)_(it,_it,_it,_it,_pred)/8192            2992.57         3026.90         34.32           1.15%
std::equal(vector<int>)_(it,_it,_it,_pred)/1024                 362.45          365.46          3.01           0.83%
std::equal(vector<int>)_(it,_it,_it,_pred)/1048576           399898.16       402718.88       2820.72           0.71%
std::equal(vector<int>)_(it,_it,_it,_pred)/50                    14.79           14.79         -0.01          -0.04%
std::equal(vector<int>)_(it,_it,_it,_pred)/8                      2.45            2.52          0.06           2.64%
std::equal(vector<int>)_(it,_it,_it,_pred)/8192                3062.16         3088.11         25.95           0.85%
Geomean                                                         253.49          200.79        -52.70         -20.79%
```
2026-02-24 10:23:15 +01:00
Michael Buch
da1e0d9fcf
[lldb][TypeSystemClang] Unconditionally set access control to AS_public (#182956)
This patch removes all our manual adjustments to the access control
specifiers of Clang decls we create from DWARF.

This has led to occasional subtle bugs in the past (the latest being
https://github.com/llvm/llvm-project/issues/171913) and it's ultimately
redundant because Clang already has provisions for LLDB to bypass access
control for C++ and Objective-C. Access control doesn't affect name
lookup so really we're doing a lot of bookkeeping for not much benefit.
The only "feature" that relied on this was that `type lookup <foo>`
would print the access specifier in the output structure layout. I'm not
convinced that's worth keeping the infrastructure in place for (but
happy to be convinced otherwise).

I'd rather lean fully into the Clang access control bypass instead.

Note, I still kept the `AccessType` parameters to the various
`TypeSystemClang` APIs to reduce the size of the diff. A follow-up NFC
change will remove those parameters and adjust all the call-sites.
2026-02-24 09:19:41 +00:00
Simon Tatham
a84ee1416b
[compiler-rt][ARM] Enable strict mode in divsf3/mulsf3 tests (#179918)
Commit 5efce7392f3f6cc added optimized AArch32 assembly versions of
mulsf3 and divsf3, with more thorough tests. The new tests included test
cases specific to Arm's particular NaN handling rules, which are
disabled on most platforms, but were intended to be enabled for Arm.

Unfortunately, they were not enabled under any circumstances, because I
made a mistake in `test/builtins/CMakeLists.txt`: the command-line `-D`
option that should have enabled them was added to the cflags list too
early, before the list was reinitialized from scratch. So it never ended
up on the command line.
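
The ordering mistake described above has the following general shape. This is a Python analogy rather than the CMake code in question, and the flag names are placeholders, not the real options.

```python
def build_cflags_buggy():
    cflags = []
    cflags.append("-DENABLE_STRICT_TESTS")  # placeholder flag, added too early
    cflags = ["-O2"]                        # list reinitialized from scratch: the flag is lost
    return cflags


def build_cflags_fixed():
    cflags = ["-O2"]                        # initialize first
    cflags.append("-DENABLE_STRICT_TESTS")  # then add the flag so it survives
    return cflags


print(build_cflags_buggy())
print(build_cflags_fixed())
```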

Also, the test file mulsf3.S only even _tried_ to enable strict mode in
Thumb1, even though the Arm/Thumb2 implementation would also have met
its requirements.

Because the strict-mode tests weren't enabled, I didn't notice that they
would also have failed absolutely everything, because they checked the
results using the wrong sense of comparison! I used `==`, but that
comparison was supposed to be a drop-in replacement for
`compareResultF`, which returns zero for equality. Changed the tests to
use `!=`.
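
The sense-of-comparison problem can be shown with a small model. This is a Python sketch, assuming only what the paragraph above states (a `compareResultF`-style helper returns zero for equality); the function names and the harness are hypothetical.

```python
def compare_result(actual_bits, expected_bits):
    """Stand-in for a compareResultF-style helper: 0 means equal."""
    return 0 if actual_bits == expected_bits else 1


def strict_compare(actual_bits, expected_bits):
    """Drop-in replacement doing an exact bit comparison.
    It must keep the 0-for-equal contract, hence `!=`; using `==`
    here would report every correct result as a failure."""
    return int(actual_bits != expected_bits)


def run_case(compare, actual_bits, expected_bits):
    # Harness convention assumed here: a non-zero result marks a failure.
    return "FAIL" if compare(actual_bits, expected_bits) else "PASS"


print(run_case(strict_compare, 0x7FC00000, 0x7FC00000))
print(run_case(strict_compare, 0x7FC00000, 0x7FC00001))
```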

Finally, I've also added a macro to each test so that it records the
source line number of each failing test case. That way, when a test
fails, you can find it in the test source more easily, without having to
search for the hex numbers mentioned in the failure message.
2026-02-24 09:08:06 +00:00
Heejin Ahn
1bc2446c78
[WebAssembly] Use generic CPU by default in llvm-mc (#181460)
Other tools, such as `llc`, use the `generic` CPU by default if no
`-mcpu` is given:

75f738b0b2/llvm/lib/Target/WebAssembly/WebAssemblySubtarget.cpp (L38-L39)

But `llvm-mc` didn't do that. This makes `generic` also the default CPU
for `llvm-mc`.
2026-02-24 00:39:14 -08:00
Han-Chung Wang
f80205becd
[mlir][SPIRV] Add sub-element-byte lowering support for atomic_rmw ori/andi ops (#179831)
When the memref element type (e.g., i8) is narrower than the SPIR-V
storage type (e.g., i32 on Vulkan), ori and andi can be lowered with a
single wide atomic instruction because OR with 0 bits and AND with 1 bits
are identity operations on the neighboring elements.
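
The identity trick can be checked with plain integer arithmetic. The following Python sketch is not the MLIR lowering itself; the helper names are hypothetical, and it assumes the narrow element at byte offset `n` occupies bits `8*n .. 8*n+7` of the wide word.

```python
def widen_or_operand(value, byte_offset, elem_bits=8):
    """Place the narrow operand in its lane; all other bits are 0,
    so OR leaves the neighboring elements unchanged."""
    elem_mask = (1 << elem_bits) - 1
    return (value & elem_mask) << (byte_offset * elem_bits)


def widen_and_operand(value, byte_offset, elem_bits=8, word_bits=32):
    """Place the narrow operand in its lane; all other bits are 1,
    so AND leaves the neighboring elements unchanged."""
    shift = byte_offset * elem_bits
    elem_mask = (1 << elem_bits) - 1
    word_mask = (1 << word_bits) - 1
    keep_others = word_mask & ~(elem_mask << shift)
    return keep_others | ((value & elem_mask) << shift)


def extract_old_element(old_word, byte_offset, elem_bits=8):
    """Recover the narrow old value from the wide value the atomic returned."""
    return (old_word >> (byte_offset * elem_bits)) & ((1 << elem_bits) - 1)


word = 0x11223344  # four packed i8 values; the element at byte offset 1 is 0x33
print(hex(word | widen_or_operand(0x0C, 1)))
print(hex(word & widen_and_operand(0x0F, 1)))
print(hex(extract_old_element(word, 1)))
```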

The revision follows `IntStoreOpPattern` to compute offsets/sizes via
the `adjustAccessChainForBitwidth` and `getOffsetForBitwidth` methods.
Unlike `IntStoreOpPattern`, it also handles the returned value (which is
the old value by definition). Other parts, e.g. the
`spirv::Capability::Kernel` check, are the same.


07ebb18e07/mlir/lib/Conversion/MemRefToSPIRV/MemRefToSPIRV.cpp (L847-L867)

There are refactoring opportunities, but they are not pursued in this
revision because the current implementation is already complicated. The
refactoring can happen in a follow-up patch, so that this revision is
easier to review.

Signed-off-by: hanhanW <hanhan0912@gmail.com>

2026-02-24 00:20:10 -08:00
William Tran-Viet
33025a267d
[libc++] Make __wrap_iter comparison operators hidden friends (#179590)
Prelude to #179389
2026-02-24 09:18:15 +01:00
Ravil Dorozhinskii
14ba1ece89
[WIP][ROCDL] Added SWMMAC ops for gfx12 and gfx1250 (#181943)
This PR adds SWMMAC ops for gfx12 and gfx1250 arch.
2026-02-24 09:17:17 +01:00
Matt Arsenault
ea6fee062d
SafeStack: Add missing lit.local.cfg to SPARC subdirectory
Forgot to commit file in 8604b52e380fb37a3599539b1d87a68666ab6ed5
2026-02-24 09:06:00 +01:00
Benjamin Maxwell
d59d00a700
[AArch64] Match CTPOP combine without zero extend (#182859)
Helps improve: https://github.com/llvm/llvm-project/issues/182625.

This does not fully solve the issues with using `ctpop`, as the vector
type chosen for the reduction is not ideal in all cases. This results in
extra extends, which can be seen in a few test cases.
2026-02-24 08:01:06 +00:00
Mariusz Sikora
610b40706f
[AMDGPU] Add VOP2 to gfx13 (#182812)
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
2026-02-24 08:42:49 +01:00
Mariusz Sikora
1911488ca3
[AMDGPU] Add VOPD to gfx13 (#182815)
Co-authored-by: Jay Foad <jay.foad@amd.com>
2026-02-24 08:42:03 +01:00
Matt Arsenault
9c08759268
Reapply "RuntimeLibcalls: Fix adding __safestack_pointer_address by default" (#182949) (#183005)
This reverts commit 6d37110e091569509f54e2b1f3ef35e8a50e5b70.

Now with aarch64 test.
2026-02-24 07:41:39 +00:00
Fangrui Song
0b8bb80e27
[MC] Fix crash in x=0; .section x (#183001)
When an equated symbol (e.g. `x=0`) is followed by `.section x`,
getOrCreateSectionSymbol reports an "invalid symbol redefinition"
error but continues to reuse the equated symbol as a section symbol.
This causes an assertion failure in MCObjectStreamer::changeSection
when `setFragment` is called on the equated symbol.

Fix this by clearing `Sym`.
2026-02-24 06:49:53 +00:00
Jason Molenda
3f024d0835
[lldb] A few small code modernizations and cleanups [NFC] (#182656)
I was reading through ObjectContainerBSDArchive and came across some
dead method decls and a `shared_ptr` typedef in
`ObjectContainerBSDArchive::Archive` for shared_ptr<Archive>, which was
unclear when reading a local variable decl like `shared_ptr archive_sp;`.
2026-02-23 22:03:40 -08:00
Karthika Devi C
6654737d9a
[AArch64] Optimize 64-bit constant vector builds (#177076)
This patch optimizes the creation of constant 64-bit vectors (e.g.,
v2i32, v4i16) by avoiding expensive loads from the constant pool. The
optimization works by packing the constant vector elements into a single
i64 immediate and bitcasting the result to the target vector type. This
replaces a memory access with more efficient immediate materialization.
To ensure this transformation is efficient, a check is performed to
verify that the immediate can be generated in two or fewer mov
instructions. If it requires more, the compiler falls back to using the
constant pool.
The optimization is disabled for big-endian targets for now.
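
The packing step can be illustrated in Python. This is a simplified sketch, not the AArch64 backend code: the helper names are hypothetical, lane 0 is assumed to occupy the low bits (little-endian layout), and counting non-zero 16-bit chunks is only a rough stand-in for the real check on how many mov instructions the immediate needs.

```python
def pack_to_i64(elements, elem_bits):
    """Pack the constant vector elements into one 64-bit immediate, lane 0 lowest."""
    if len(elements) * elem_bits != 64:
        raise ValueError("expected a 64-bit vector")
    elem_mask = (1 << elem_bits) - 1
    packed = 0
    for lane, value in enumerate(elements):
        packed |= (value & elem_mask) << (lane * elem_bits)
    return packed


def nonzero_halfwords(imm64):
    """Rough cost proxy (an assumption): one move per non-zero 16-bit chunk."""
    return sum(1 for i in range(4) if (imm64 >> (16 * i)) & 0xFFFF)


def use_immediate(elements, elem_bits, max_moves=2):
    """Decide between immediate materialization and the constant pool."""
    return nonzero_halfwords(pack_to_i64(elements, elem_bits)) <= max_moves


print(hex(pack_to_i64([1, 2], 32)), use_immediate([1, 2], 32))
print(hex(pack_to_i64([1, 2, 3, 4], 16)), use_immediate([1, 2, 3, 4], 16))
```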
2026-02-24 11:12:04 +05:30
Jonas Devlieghere
863813cace
[lldb] Merge interfaces into lldbPluginScriptInterpreterPython (NFC) (#182962)
Make the interfaces part of lldbPluginScriptInterpreterPython instead of
putting them into their own static library. This avoids the need for an
extra static archive and more importantly a bunch of code duplication
between the two CMakeLists.txt.
2026-02-23 21:33:38 -08:00
Rana Pratap Reddy
bf3ac05ed2
[Clang][AMDGPU] Change __fp16 to _Float16 in GFX1250 CVT builtin definitions (#182893)
Change the type signature of the `gfx1250 cvt` builtins from `__fp16` to
`_Float16` in the tablegen builtin definitions.
2026-02-24 10:46:12 +05:30
Shivam Kunwar
a96daba840
[DWARFLinker] Fix buildbot crash: NewUnit can be null during garbage (#182993)
The assert added in 0ab1d23fbfa2ae0ba14315cb11678d2289510f66 is
incorrect: NewUnit is legitimately null for compile units that are
skipped during garbage collection (e.g. dwarf5-macro.test). Revert to
the original null check.
2026-02-24 10:26:36 +05:30
Akimasa Watanuki
762ad00007
[mlir][gpu] Validate gpu-module-to-binary format (#182842)
`GpuModuleToBinaryPass::runOnOperation` now treats an unsupported
`format` value as a pass failure after emitting `"Invalid format
specified."`.

Add a regression test in
`mlir/test/Dialect/GPU/module-to-binary-invalid-format.mlir`.

Fix: https://github.com/llvm/llvm-project/issues/77052
Fix: https://github.com/llvm/llvm-project/issues/116344
Fix: https://github.com/llvm/llvm-project/issues/116346
Fix: https://github.com/llvm/llvm-project/issues/116352
2026-02-24 13:54:59 +09:00
Brandon Wu
03bb370602
[RISCV][llvm] Rename zvqdotq to zvdot4a8i (#179393)
The renaming PR is here:
https://github.com/riscv/riscv-isa-manual/pull/2576
Note that this also updates the version number.
2026-02-24 04:44:07 +00:00
Timm Baeder
d44e957d14
[clang][AST] Make ASTContext::InterpContext mutable (#182884)
Do the `const_cast` only once, in `ASTContext::getInterpContext()`.
2026-02-24 05:37:41 +01:00
Shivam Kunwar
0ab1d23fbf
[DWARFLinker] Use DIEEntry for backward ref_addr references (#181881)
The classic DWARF linker avoids `DIEEntry` for `DW_FORM_ref_addr`
references, using raw `DIEInteger` values with manual offset computation
instead. A stale FIXME explains this was because "the implementation
calls back to DwarfDebug to find the unit offset", but this is no longer
true. `DIEEntry` resolves offsets via
`DIEUnit::getDebugSectionOffset()`, which has no `DwarfDebug`
dependency.


The real constraint is that forward references may point to
placeholder `DIEs` that never get adopted into a unit tree (due to ODR
pruning), so `DIEEntry` cannot resolve them (a test failed while
refactoring this). Backward references, however, are safe: the target DIE
is already cloned and parented in a unit tree.
2026-02-24 10:00:57 +05:30
Alexis Engelke
3de98281d4
[MLIR][CMake] Disable PCH reuse for C API libraries (#182862)
C API libraries override the symbol visibility default, which is incompatible with PCH.
2026-02-23 20:16:08 -08:00
Congcong Cai
09ab9a16e2
[MemRefToEmitC] fix typo (#182991)
2026-02-24 04:08:41 +00:00
modiking
d5bf514200
[NVPTX] Scalarize v2f32 instructions if input operand guarantees need for register coalescing (#180113)
The support of f32 packed instructions in #126337 revealed performance
regressions on certain kernels. In one case, the cause comes from
loading a v4f32 from shared memory but then accessing them as {r0, r2}
and {r1, r3} from the full load of {r0, r1, r2, r3}.

This access pattern guarantees that the registers require a coalescing
operation, which increases register pressure and degrades performance.
The fix here is to identify whether we can prove that a v2f32 operand comes
from non-contiguous vector extracts and, if so, scalarize the operation
so the coalescing operation is no longer needed.

I've found that ptxas can see through the extra unpacks/repacks of
contiguous registers this causes in MIR. However, in the full test case
the packing of the final scalar->vector results does generate additional
costs, especially since the only users unpack them. An additional MIR
pass could catch this case.

Assisted-by: Cursor / claude-4.6-opus-high

---------

Co-authored-by: Princeton Ferro <princetonferro@gmail.com>
2026-02-23 19:44:05 -08:00
Aiden Grossman
9b4c99a1e4
[AsmPrinter] Use default capture for assertion only lambda (#182986)
Otherwise we get an unused variable warning/error in non-assertion
builds.
2026-02-24 03:25:37 +00:00
Yaxun (Sam) Liu
1b47242dcd
[CHR] Skip regions containing convergent calls (#180882)
CHR (Control Height Reduction) merges multiple biased branches into a
single speculative check, cloning the region into hot/cold paths. On
GPU targets, the merged branch may be divergent (evaluated per-thread),
splitting the wavefront: some threads take the hot path, others the
cold path.

A convergent call like ds_bpermute (a cross-lane operation on AMDGPU)
requires a specific set of threads to be active — when thread X reads
from thread Y, thread Y must be active and participating in the same
call. After CHR cloning, thread Y may have gone to the cold path while
thread X is on the hot path, so the hot-path ds_bpermute reads a stale
register value from thread Y instead of the intended value.

This caused a miscompilation in rocPRIM's lookback scan: CHR duplicated
a region containing ds_bpermute, and the hot-path copy executed with a
different set of active threads, reading incorrect cross-lane data and
causing a memory access fault.

The fix skips any region containing convergent or noduplicate calls,
following the same pattern as SimplifyCFG's block-duplication guard.
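
The failure mode can be mimicked with a toy model. This Python sketch is an illustration only and does not reproduce the hardware semantics of ds_bpermute: the function name is hypothetical, and the string "stale" merely marks a read from a lane that is not participating in the call.

```python
def cross_lane_read(values, src_lane, active):
    """Toy cross-lane read: each active lane i reads values[src_lane[i]].
    A read from an inactive lane is marked "stale"; lanes that do not
    execute the call produce None."""
    result = []
    for lane in range(len(values)):
        if not active[lane]:
            result.append(None)
            continue
        src = src_lane[lane]
        result.append(values[src] if active[src] else "stale")
    return result


values = [10, 11, 12, 13]
src_lane = [1, 0, 3, 2]  # lanes swap values pairwise

# Before cloning: all lanes execute the call together.
print(cross_lane_read(values, src_lane, [True, True, True, True]))

# After cloning: lane 1 went to the cold path, so lane 0 reads a stale value.
print(cross_lane_read(values, src_lane, [True, False, True, True]))
```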
2026-02-23 21:47:43 -05:00
Twice
3b2c1db870
[MLIR][Python] Support type definitions in Python-defined dialects (#182805)
In this PR, we added basic support of type definitions in Python-defined
dialects, including:
- IRDL codegen for type definitions
- Type builders like `MyType.get(..)` and type parameter accessors (e.g.
`my_type.param1`)
- Use Python-defined types in Python-defined operations

```python
class TestType(Dialect, name="ext_type"):
    pass

class Array(TestType.Type, name="array"):
    elem_type: IntegerType[32] | IntegerType[64]
    length: IntegerAttr

class MakeArrayOp(TestType.Operation, name="make_array"):
    arr: Result[Array]

class MakeArray3Op(TestType.Operation, name="make_array3"):
    arr: Result[Array[IntegerType[32], IntegerAttr[IntegerType[32], 3]]]
```
2026-02-24 10:34:58 +08:00
hanbeom
ccfd59a03e
[NFC][WebAssembly] Expanding load-ext testcases for the MVP CPU target (#182864)
Some features tested in load-ext require sign-ext. 
To test this, add tests targeting the MVP CPU.
2026-02-24 11:30:51 +09:00
Fangrui Song
13838efa3c
[ELF] Adjust allowed dynamic relocation types for x86-64 (#182905)
First, disallow R_X86_64_PC64 - generally only absolute relocations are
allowed in getDynRel. glibc and musl don't support R_X86_64_PC64 as
dynamic relocations.

Second, support R_X86_64_32 as dynamic relocation for the ILP32 ABI
(x32). GNU ld's behavior looks like:

- R_X86_64_32 => R_X86_64_RELATIVE
- R_X86_64_64 with addend 0 => R_X86_64_RELATIVE
- R_X86_64_64 with non-zero addend => R_X86_64_RELATIVE64 (unsupported
  by musl; compilers do not generate such constructs to the best of my
  knowledge)

For now we require R_X86_64_64 to be resolved at link-time for x32.
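
The GNU ld behavior listed above can be written down as a small lookup. This Python sketch only restates that list; the function name is hypothetical and it is not lld's `getDynRel`.

```python
def gnu_ld_x32_dynamic_reloc(reloc_type, addend):
    """Dynamic relocation GNU ld is described as emitting for x32,
    following the list in the message above."""
    if reloc_type == "R_X86_64_32":
        return "R_X86_64_RELATIVE"
    if reloc_type == "R_X86_64_64":
        return "R_X86_64_RELATIVE" if addend == 0 else "R_X86_64_RELATIVE64"
    raise ValueError(f"not covered by the list above: {reloc_type}")


print(gnu_ld_x32_dynamic_reloc("R_X86_64_32", 8))
print(gnu_ld_x32_dynamic_reloc("R_X86_64_64", 0))
print(gnu_ld_x32_dynamic_reloc("R_X86_64_64", 8))
```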

Fix #140465
2026-02-23 18:26:56 -08:00
Mohamed Emad
05a039489d
[libc][math] Refactor bf16mul family to header-only (#182018)
Refactors the bf16mul math family to be header-only.

Closes https://github.com/llvm/llvm-project/issues/182017

Target Functions:
  - bf16mul
  - bf16mulf
  - bf16mulf128
  - bf16mull
2026-02-24 02:24:01 +00:00
Uday Bondhugula
b311c02c2c
[MLIR][Affine] Fix assert in slice compute cost (#182712)
Fixes https://github.com/llvm/llvm-project/issues/180029.
2026-02-24 07:48:37 +05:30
Iñaki V Arrechea
a17a305f05
[LLVM] Metric added - largest number of basic blocks in a single func… (#182970)
This metric reports the largest number of basic blocks in a single
function.
2026-02-23 18:06:47 -08:00
Aiden Grossman
6b63c59a58
[NewPM][X86] Port AsmPrinter to NewPM
This patch makes AsmPrinter work with the NewPM. We essentially create
three new passes that wrap different parts of AsmPrinter so that we can
separate out doInitialization/doFinalization without needing to
materialize all MachineFunctions at the same time. This has three main
drawbacks for now:

1. We do not transfer any state between the three new AsmPrinter passes.
   This means that debuginfo/CFI currently does not work. This will be
   fixed in future passes by moving this state to MachineModuleInfo.
2. We probably incur some overhead by needing to set up analysis
   callbacks for every MF rather than just per module. This should not
   be large, and can be optimized in the future on top of this if
   needed.
3. This solution is not really clean. However, a lot of cleanup is going
   to be difficult to do while supporting two pass managers. Once we
   remove LegacyPM support, we can make the code much cleaner and better
   enforce invariants like a lack of state between
   doInitialization/runOnMachineFunction/doFinalization.

Reviewers: arsenm, aeubanks, paperchalice

Pull Request: https://github.com/llvm/llvm-project/pull/182797
2026-02-23 17:28:15 -08:00
Aiden Grossman
25f69d7a3f
[NFCi][NewPM][x86] Use callbacks to get analyses in AsmPrinter
This allows for overriding these callbacks when using the NewPM, which
has different methods for obtaining analysis results.

Reviewers: RKSimon, arsenm, phoebewang, mingmingl-llvm, aeubanks

Pull Request: https://github.com/llvm/llvm-project/pull/182796
2026-02-23 17:24:39 -08:00
Aiden Grossman
abc443ba0a
[CodeGen][NewPM] Adjust pipeline for AsmPrinter
AsmPrinter needs to be split into three passes (begin, per MF, end) to
avoid the need to materialize all machine functions at the same time.
Update the CodeGenPassBuilder hooks for this.
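
The begin / per-function / end split can be pictured with a toy pipeline. This Python sketch is an analogy, not the CodeGenPassBuilder API: every name in it is hypothetical, and the shared list stands in for the output stream.

```python
def begin_pass(stream, module_name):
    stream.append(f"; begin module {module_name}")


def function_pass(stream, function_name):
    stream.append(f"{function_name}:")


def end_pass(stream, module_name):
    stream.append(f"; end module {module_name}")


def lazily_lowered_functions():
    # Generator: each function exists only while it is being printed,
    # so the whole module never has to be materialized at once.
    for name in ("foo", "bar"):
        yield name


def run_pipeline(module_name, functions):
    stream = []
    begin_pass(stream, module_name)
    for function_name in functions:
        function_pass(stream, function_name)
    end_pass(stream, module_name)
    return stream


print(run_pipeline("m", lazily_lowered_functions()))
```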

Reviewers: aeubanks, paperchalice, arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/182795
2026-02-23 17:23:25 -08:00
Aiden Grossman
683d15f810
[CodeGen][NewPM] Plumb MCContext through buildCodeGenPipeline
Otherwise we cannot create an MCStreamer without getting MMI, which we
cannot do until we have started running AsmPrinter without also plumbing
MMI through CodeGenPassBuilder.

Reviewers: arsenm, paperchalice, aeubanks

Pull Request: https://github.com/llvm/llvm-project/pull/182794
2026-02-23 17:21:32 -08:00
Aiden Grossman
757066c95e
[NFCi][AsmPrinter] Refactor getting analyses to callbacks
As part of making AsmPrinter work with the new pass manager, we need to
be able to override how we get analyses. This patch does that by
refactoring getting all analyses/other related functionality to
callbacks that are set by default but can be overridden later (like by a
NewPM wrapper pass).
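
The callback pattern looks roughly like this. It is a Python sketch with hypothetical names, not the AsmPrinter interface: a default callback is installed at construction and a wrapper can replace it later.

```python
class ToyPrinter:
    def __init__(self, legacy_results):
        self._legacy_results = legacy_results
        # Default callback: look the analysis result up the "legacy" way.
        self.get_analysis = self._get_from_legacy

    def _get_from_legacy(self, function_name):
        return self._legacy_results[function_name]

    def describe(self, function_name):
        # The printer only ever goes through the callback.
        return f"{function_name}: {self.get_analysis(function_name)}"


printer = ToyPrinter({"foo": "legacy-result"})
print(printer.describe("foo"))

# A wrapper (standing in for a NewPM wrapper pass) overrides the callback.
printer.get_analysis = lambda function_name: "newpm-result"
print(printer.describe("foo"))
```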

Reviewers: aeubanks

Pull Request: https://github.com/llvm/llvm-project/pull/182793
2026-02-23 17:17:56 -08:00
Paul Kirth
19daed352f
Revert "[ASan][Fuchsia] Have Fuchsia use a dynamic shadow start" (#182972)
Reverts llvm/llvm-project#180880

This is breaking Fuchsia's CI. Something in the CMake needs to be
adjusted. Reverting at the author's request.
2026-02-24 01:00:03 +00:00
Joseph Huber
23302678e8
[HIP] Move HIP to the new driver by default (#123359)
Summary:
This patch matches CUDA, moving the HIP compilation jobs to the new
driver by default. The old behavior will return with
`--no-offload-new-driver`. The main difference is that objects compiled
with the old driver are no longer compatible and will need to be
recompiled or the old driver used.
2026-02-23 18:23:56 -06:00
Iñaki V Arrechea
a8f3c97d3a
Implemented metric that gets biggest function's size (#182632)
This metric gets the size of the biggest function.
2026-02-23 16:06:53 -08:00
PiJoules
9146da3a7b
[ASan][Fuchsia] Have Fuchsia use a dynamic shadow start (#180880)
The dynamic shadow global is still set to zero, but this will change in the future.
2026-02-23 15:49:05 -08:00
Deric C.
4f59da55d7
[HLSL][Matrix] EmitFromMemory when emitting load of vector and matrix element LValues (#178315)
Fixes #177712

The MatrixElt and VectorElt cases of `EmitLoadOfLValue` did not convert
the scalar value from its load/store type into its primary IR type like
the other cases do, which caused issues with HLSL in particular which
requires bools to be converted to and from i32 and i1 forms for its
load/store and primary IR types respectively.

This PR fixes the issue by applying `EmitFromMemory` to the loaded
scalar.
2026-02-23 15:45:35 -08:00
Hansang Bae
a347e1298c
[Offload] Enable memory usage printing with alloc debug type (#182938)
2026-02-23 17:19:41 -06:00
Florian Hahn
6b352aa8ea
Revert "[VPlan] Add simple driver option to run some individual transforms. (#178522)"
This reverts commit 3df1c6f88bfbbd76d9256c55358bb75e02e33779.

Causes build-failures without assertions
https://lab.llvm.org/buildbot/#/builders/159/builds/41683
2026-02-23 22:55:42 +00:00