560143 Commits

Author SHA1 Message Date
mitchell
150d9b7a77
[clang-tidy][NFC] Add clang-tidy formatting commit to .git-blame-ignore-revs (#167126)
Co-authored-by: Baranov Victor <bar.victor.2002@gmail.com>
2025-11-20 21:42:10 +08:00
Andrzej Warzyński
cfda27d0fb
[mlir][Vector] Add support for scalable vectors to ScanToArithOps (#123117)
Note: scalable reduction dims are left as a TODO.
2025-11-20 13:39:52 +00:00
Alexander Johnston
76f1949cfa
[HLSL] Implement the fwidth intrinsic for DXIL and SPIR-V target (#161378)
Adds the fwidth intrinsic for HLSL.
The DXIL path only requires modification to the HLSL headers.
The SPIRV path implements the OpFwidth builtin in Clang and instruction
selection for the OpFwidth instruction in LLVM.
Also adds shader stage tests to the ddx_coarse and ddy_coarse
instructions used by fwidth.
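For reference, HLSL defines `fwidth(p)` as `abs(ddx(p)) + abs(ddy(p))`. The Python sketch below models that on a single 2x2 pixel quad using coarse derivatives, as this change does. Which pixels a coarse derivative is taken from is an assumption here (top row for x, left column for y); hardware may choose differently within the quad, and the helper names are illustrative only.

```python
from typing import List

# A 2x2 pixel quad indexed quad[row][col]; each entry is the value of some
# shader input p at that pixel.
Quad = List[List[float]]

def ddx_coarse(quad: Quad) -> float:
    # Assumption: a single horizontal difference per quad, from the top row.
    return quad[0][1] - quad[0][0]

def ddy_coarse(quad: Quad) -> float:
    # Assumption: a single vertical difference per quad, from the left column.
    return quad[1][0] - quad[0][0]

def fwidth(quad: Quad) -> float:
    # fwidth(p) = abs(ddx(p)) + abs(ddy(p))
    return abs(ddx_coarse(quad)) + abs(ddy_coarse(quad))

quad = [[1.0, 3.0],
        [0.5, 2.5]]
print(fwidth(quad))  # 2.5
```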

Closes #99120

---------

Co-authored-by: Alexander Johnston <alexander.johnston@amd.com>
2025-11-20 07:38:32 -05:00
Paul Walker
21c4c1502e
[LLVM][CodeGen][SVE] Only use unpredicated bfloat instructions when all lanes are in use. (#168387)
While SVE support for exception-safe floating-point code generation is
bare bones, we try to ensure inactive lanes remain inert. I mistakenly
broke this rule when adding support for SVE-B16B16 by lowering some
bfloat operations of unpacked vectors to unpredicated instructions.
2025-11-20 12:01:04 +00:00
Mehdi Amini
3da82af83f [MLIR] Apply clang-tidy fixes for bugprone-argument-comment in SparseBufferRewriting.cpp (NFC) 2025-11-20 03:35:18 -08:00
Mehdi Amini
9e86c0d5da [MLIR] Apply clang-tidy fixes for readability-container-size-empty in LinalgOps.cpp (NFC) 2025-11-20 03:35:18 -08:00
Mehdi Amini
c6a79a55ff [MLIR] Apply clang-tidy fixes for readability-identifier-naming in LLVMToLLVMIRTranslation.cpp (NFC) 2025-11-20 03:35:18 -08:00
sskzakaria
a2b4c0fbe0
[X86][Clang] VectorExprEvaluator::VisitCallExpr / InterpretBuiltin - allow AVX512 mask predicate intrinsics to be used in constexpr (#165054)
Enables constexpr evaluation for the following AVX512 intrinsics:
```
_mm_movepi8_mask _mm256_movepi8_mask _mm512_movepi8_mask
_mm_movepi16_mask _mm256_movepi16_mask _mm512_movepi16_mask
_mm_movepi32_mask _mm256_movepi32_mask _mm512_movepi32_mask
_mm_movepi64_mask _mm256_movepi64_mask _mm512_movepi64_mask
```
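A Python model of what constant evaluation has to compute for these intrinsics may help. It follows Intel's description (result bit i is the most significant bit of element i); `movepi_mask` is a hypothetical helper, not clang's interpreter code.

```python
from typing import List

def movepi_mask(elements: List[int], element_bits: int) -> int:
    """Bit i of the result is the most significant (sign) bit of element i."""
    mask = 0
    for i, value in enumerate(elements):
        bits = value & ((1 << element_bits) - 1)   # two's-complement pattern
        if bits >> (element_bits - 1):
            mask |= 1 << i
    return mask

# Model of _mm_movepi8_mask on 16 signed bytes; lanes 1 and 3 are negative.
lanes = [0, -1, 5, -128] + [0] * 12
print(bin(movepi_mask(lanes, 8)))  # 0b1010
```

The 16/32/64-bit variants differ only in `element_bits` and the number of lanes.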
Part of #162072
2025-11-20 11:25:23 +00:00
Benjamin Maxwell
02db2de905
[AArch64][SVE] Implement demanded bits for @llvm.aarch64.sve.cntp (#168714)
This allows DemandedBits to see that the SVE CNTP intrinsic will only
ever produce small positive integers. The maximum value you could get
here is 256, which is CNTP on an nxv16i1 on a machine with a 2048-bit
vector size (the maximum for SVE).

Using this, various redundant operations (zexts, sexts, ands, ors, etc.)
can be eliminated.
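As a rough illustration of the bound above, this Python sketch (hypothetical helpers, not LLVM's DemandedBits code) computes the largest value CNTP can return for a given predicate element width and the 64-bit mask of bits that are therefore always zero.

```python
SVE_MAX_VECTOR_BITS = 2048  # architectural maximum SVE vector length

def max_cntp(element_bits: int, vector_bits: int = SVE_MAX_VECTOR_BITS) -> int:
    """Upper bound on CNTP: every lane of the predicate is active."""
    return vector_bits // element_bits

def active_bits(value: int) -> int:
    """Number of low bits needed to represent a non-negative value."""
    return value.bit_length()

# nxv16i1 predicates one lane per byte, so the element width is 8 bits.
limit = max_cntp(element_bits=8)
width = active_bits(limit)
known_zero_mask_64 = ((1 << 64) - 1) & ~((1 << width) - 1)
print(limit, width, hex(known_zero_mask_64))
```

With at most 9 significant bits and a non-negative result, a later `zext`, `sext`, or an `and` whose mask covers those bits adds no information, which is why such operations can be dropped.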
2025-11-20 11:23:05 +00:00
Zichen Lu
0a88e96228
[MLIR][LLVM] Extend DIScopeForLLVMFuncOp to handle cross-file operatio… (#167844)
The current `DIScopeForLLVMFuncOp` pass handles debug information for
inlined code by processing `CallSiteLoc` attributes. However, some
compilation scenarios compose code from multiple source files directly
into a single function without generating `CallSiteLoc`.

**Scenario:**
```python
# a.py
def kernel_a(tensor):
    print("a: {}", tensor)  # a.py:3
    jit_func_b(tensor)           # Calls b.py code

# b.py
def func_b(tensor):
    print("b: {}", tensor)  # b.py:7
```

The scenario executes Python at compile-time and directly inserts
operations from `b.py` into the kernel function, resulting in MLIR like:

```mlir
@kernel_a(...) {
  print("a: {}", %arg0) loc(#loc_a)  // a.py:3
  print("b: {}", %arg0) loc(#loc_b)  // b.py:7 <- FileLineColLoc, not CallSiteLoc
} loc(#loc_kernel)  // a.py:1

#loc1 = loc("a.py":3:.)
#loc2 = loc("b.py":7:.)
#loc_a = loc("print"(#loc1))
#loc_b = loc("print"(#loc2))
```
```llvm
!6 = !DIFile(filename: "a.py", directory: "...")
!9 = distinct !DISubprogram(name: "...", linkageName: "...", scope: !6, file: !6, line: 13, ...)
!10 = !DILocation(line: 7, column: ., scope: !9)  ; Points to kernel's DISubprogram, not correct
```
2025-11-20 12:14:14 +01:00
Simon Pilgrim
53dfdf7ffd
[X86] BuiltinsX86.td - attempt to pack the builtins for each SSE level close together. NFC. (#168844)
Avoid some repeated feature blocks - we should have a single place in
each file that we can find most builtins for a particular ISA level.

Also, avoid some of the 80col wrapping that just makes it harder to find
anything at all.

There's a lot more we can do - but I don't want to completely refactor
this while we still have so much work to do for #30794
2025-11-20 10:34:51 +00:00
Matthias Springer
95d788c761
Revert "[mlir][Pass] Fix crash when applying a pass to an optional interface" (#168847)
Reverts llvm/llvm-project#168499
2025-11-20 18:31:51 +08:00
Sam Tebbs
3396b4654b
[LV] Allow partial reductions with an extended bin op (#165536)
A pattern of the form reduce.add(ext(mul)) is valid for a partial
reduction as long as the mul and its operands fulfill the requirements
of a normal partial reduction. The mul's extend operands will be
optimised to the wider extend, and we already have oneUse checks in
place to make sure the mul and operands can be modified safely.
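A small Python model of why this is sound, assuming the inner multiply is done on i8 values sign-extended to i16: every i8 * i8 product fits in i16, so multiplying operands extended straight to the wider type and accumulating through a partial reduction gives the same total. The helper names are illustrative, not LLVM's, and the lane mapping of the partial reduction is an arbitrary choice since only the total is defined.

```python
from typing import List

def sext(value: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def narrow_form(a: List[int], b: List[int]) -> int:
    # reduce.add(ext(mul(ext a, ext b))) with an i16 multiply
    total = 0
    for x, y in zip(a, b):
        product_i16 = sext(sext(x, 8) * sext(y, 8), 16)  # i16 mul wraps at 16 bits
        total += product_i16                             # then extended and summed
    return total

def partial_reduce_add(acc: List[int], values: List[int]) -> List[int]:
    # Fold len(values) lanes into len(acc) accumulator lanes. Only the total
    # sum is meaningful; the i % len(acc) lane mapping is arbitrary.
    acc = list(acc)
    for i, v in enumerate(values):
        acc[i % len(acc)] += v
    return acc

def wide_form(a: List[int], b: List[int]) -> int:
    # Extends widened to i32, multiply in i32, then a 4-lane partial reduction.
    products = [sext(x, 8) * sext(y, 8) for x, y in zip(a, b)]
    return sum(partial_reduce_add([0] * 4, products))

a = [-128, 127, -1, 100] * 4   # i8 inputs
b = [-128, -128, 127, 50] * 4
print(narrow_form(a, b) == wide_form(a, b))  # True: |i8 * i8| <= 2**14 fits in i16
```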

1. -> https://github.com/llvm/llvm-project/pull/165536
2. https://github.com/llvm/llvm-project/pull/165543
2025-11-20 10:22:11 +00:00
Jeremy Morse
2cf550a040
[DebugInfo] Force early line-zero calls to have meaningful locations (#156850)
In functions that have been seriously deformed during optimisation,
there can be call instructions with line-zero immediately after frame
setup (see C reproducer in the test added). Our previous algorithms for
prologue_end ignored these, meaning someone entering a function at
prologue_end would break-in after a function call had completed. Prefer
instead to place prologue_end and the function scope-line on the line
zero call: this isn't false (it's the first meaningful instruction of the
function) and is approximately true. Given a less than ideal function,
this is an OK solution.
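The selection rule described above can be sketched in Python on a toy instruction list. The `Inst` record and `pick_prologue_end` are hypothetical; the real change works on MachineInstrs and also sets the function's scope line.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Inst:
    opcode: str
    line: int                 # 0 means "no source line"
    frame_setup: bool = False

def pick_prologue_end(body: List[Inst]) -> Optional[int]:
    """Return the index of the instruction that should get prologue_end."""
    for i, inst in enumerate(body):
        if inst.frame_setup:
            continue          # still part of the prologue
        if inst.line != 0:
            return i          # first instruction with a real source line
        if inst.opcode == "call":
            return i          # don't let a debugger break in after this call
    return None

body = [Inst("push", 0, frame_setup=True),
        Inst("mov", 0),
        Inst("call", 0),
        Inst("add", 12)]
print(pick_prologue_end(body))  # 2: the line-zero call
```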
2025-11-20 10:20:47 +00:00
Aaditya
74cebce264
Revert "[AMDGPU] Add wave reduce intrinsics for float types - 2 (#161815)" (#168845)

This reverts commit dcab4cb49bfb0aa17df3d3fabe582696100e0d35.
2025-11-20 15:44:57 +05:30
Matthias Springer
54f69caf1f
[mlir][Pass] Fix crash when applying a pass to an optional interface (#168499)
Interfaces can be optional: whether an op implements an interface or not
can depend on the state of the operation.

```
  // An optional code block for adding additional "classof" logic. This can
  // be used to better enable "optional" interfaces, where an entity only
  // implements the interface if some dynamic characteristic holds.
  // `$_attr`/`$_op`/`$_type` may be used to refer to an instance of the
  // interface instance being checked.
  code extraClassOf = "";
```

The current `Pass::canScheduleOn(RegisteredOperationName)` is
insufficient. This commit adds an additional overload to inspect
`Operation *`.

This commit fixes a crash when scheduling an `InterfacePass` for an
optional interface on an operation that does not actually implement the
interface.
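The gap between a name-based and an instance-based check can be modelled in a few lines of Python. Everything here (`Operation`, the `"enabled"` attribute, the helper names) is hypothetical and only mirrors the idea of an `extraClassOf` hook, not MLIR's API.

```python
from typing import Dict

class Operation:
    def __init__(self, name: str, attrs: Dict[str, bool]):
        self.name = name
        self.attrs = attrs

# Hypothetical optional interface: any "test.op" *may* implement it, but an
# instance only does so when it carries the "enabled" attribute.
MAY_IMPLEMENT = {"test.op"}

def extra_class_of(op: Operation) -> bool:
    return op.attrs.get("enabled", False)

def can_schedule_on_name(name: str) -> bool:
    # Name-only check: can just say the op *could* implement the interface.
    return name in MAY_IMPLEMENT

def can_schedule_on_op(op: Operation) -> bool:
    # Instance check: additionally runs the dynamic classof logic.
    return can_schedule_on_name(op.name) and extra_class_of(op)

ops = [Operation("test.op", {"enabled": True}), Operation("test.op", {})]
print([can_schedule_on_name(o.name) for o in ops])  # [True, True]
print([can_schedule_on_op(o) for o in ops])         # [True, False]
```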
2025-11-20 17:51:44 +08:00
Aleksandr Nogikh
131cf7d5b2
[AllocToken] Enable alloc token instrumentation for size-returning functions (#168840)
Consider a newly added "malloc_span" attribute in the allocation token
instrumentation to ensure that allocation functions with the
"malloc_span" attribute are processed similarly to other memory
allocation functions.

Update the tests to demonstrate applicability to __size_returning_new.
2025-11-20 10:33:24 +01:00
Kiran Kumar T P
dc343d2f05
[NFC][flang] Replace use of flang -fc1 with %flang_fc1 in a few test cases (#168830)
Replace use of flang -fc1 with %flang_fc1 in a few test cases.
2025-11-20 15:00:15 +05:30
Jim Lin
bdf598f8dd
CodeGen: Add missing subtarget to TargetLoweringBase constructor for ARC, CSKY and M68K (#168811)
Those were missing in https://github.com/llvm/llvm-project/pull/168620.
2025-11-20 17:11:28 +08:00
Simon Pilgrim
07a31adf28
[X86] EltsFromConsecutiveLoads - recognise reverse load patterns. (#168706)
See if we can create a vector load from the src elements in reverse and
then shuffle these back into place.

SLP will (usually) catch this in the middle-end, but there are a few
BUILD_VECTOR scalarizations etc. that appear during DAG legalization.

I did start looking at a more general permute fold, but I haven't found
any good test examples for this yet - happy to take another look if
somebody has examples.
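The reverse-load match can be sketched in Python on plain byte addresses. `match_reverse_consecutive` is a hypothetical helper; the X86 code works on SelectionDAG load nodes rather than integers.

```python
from typing import List, Optional, Tuple

def match_reverse_consecutive(addrs: List[int], elt_size: int) -> Optional[Tuple[int, List[int]]]:
    """If element i is loaded from base + (n-1-i)*elt_size, return the base
    address of one wide load plus the shuffle mask that restores lane order."""
    n = len(addrs)
    if n == 0:
        return None
    base = addrs[-1]  # the last element holds the lowest address
    for i, addr in enumerate(addrs):
        if addr != base + (n - 1 - i) * elt_size:
            return None
    # Lane i of the result takes lane n-1-i of the wide load.
    return base, list(range(n - 1, -1, -1))

# Four 4-byte elements loaded from 0x10C, 0x108, 0x104, 0x100.
print(match_reverse_consecutive([0x10C, 0x108, 0x104, 0x100], 4))  # (256, [3, 2, 1, 0])
```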
2025-11-20 09:08:39 +00:00
Sam Parker
e44646b795
[WebAssembly] Lower ANY_EXTEND_VECTOR_INREG (#167529)
Treat it in the same manner as zero_extend_vector_inreg and generate an
extend_low_u if possible. This is to try to prevent expensive shuffles
from being generated instead. computeKnownBitsForTargetNode has also
been updated to specify known zeros on extend_low_u.
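An any-extend leaves the new high bits unspecified, so filling them with zeros (as extend_low_u does) is a valid lowering and also yields known-zero bits. A Python model with hypothetical helpers and unsigned lane values:

```python
from typing import List

def extend_low_u(lanes: List[int], src_bits: int) -> List[int]:
    """Zero-extend the low half of the source lanes to twice their width,
    e.g. i8x16 -> i16x8."""
    mask = (1 << src_bits) - 1
    return [v & mask for v in lanes[: len(lanes) // 2]]

def known_zero_bits(src_bits: int) -> int:
    """Bits of each result lane that are always zero: its upper half."""
    dst_mask = (1 << (2 * src_bits)) - 1
    return dst_mask & ~((1 << src_bits) - 1)

src = [0xFF, 0x01, 0x80, 0x7F, 0, 0, 0, 0] + [0xAA] * 8
print(extend_low_u(src, 8))     # [255, 1, 128, 127, 0, 0, 0, 0]
print(hex(known_zero_bits(8)))  # 0xff00
```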
2025-11-20 08:57:08 +00:00
Aaditya
dcab4cb49b
[AMDGPU] Add wave reduce intrinsics for float types - 2 (#161815)
Supported Ops: `fadd`, `fsub`
2025-11-20 14:21:54 +05:30
Aaditya
dbf4525351
[AMDGPU] Add wave reduce intrinsics for float types - 1 (#161814)
Supported Ops: `fmin`, `fmax`
2025-11-20 13:23:02 +05:30
Brandon Wu
3e5fafdc22
[RISCV][llvm] Select splat_vector(constant) with PLI (#168204)
The default DAG combiner combines a BUILD_VECTOR with identical elements into a
SPLAT_VECTOR, so we can just map a constant splat to PLI if possible.
2025-11-20 15:02:40 +08:00
zhangtianhao6
fde2aadb80
[CodeGen] Update code generation optimization level (NFC) (#168190) 2025-11-20 14:54:42 +08:00
Craig Topper
8608344778
[CFIInserter] Turn a reachable llvm_unreachable into a report_fatal_error. (#168777)
This prevents it from being optimized out in non-asserts builds.

Update X86 test to remove REQUIRES: asserts and check for LLVM ERROR.
Add FileCheck to RISC-V test and remove UNSUPPORTED.

This is the more complete fix for #168772 and #168525.
2025-11-19 22:32:26 -08:00
Matt Arsenault
db20a7f2bc
DAG: Fix constructing a temporary TargetTransformInfo instance (#168480) 2025-11-20 01:19:23 -05:00
Jinjie Huang
7f0dbf049a
[NFC] Reduce the size of test input in incompatible_dwarf_version.test (#168825)
Use smaller test inputs in incompatible_dwarf_version.test to reduce
disk usage and execution time.
2025-11-20 13:56:10 +08:00
lonely eagle
765208b313
[mlir] Make remove-dead-values remove block and successorOperands before delete ops (#166766)
Reland https://github.com/llvm/llvm-project/pull/165725, fixing the failed
test by removing successor operands before deleting operations. Following
the deletion of cond.branch, its successor operands will subsequently be
removed.
2025-11-20 13:55:09 +08:00
Volodymyr Sapsai
b39a9db3ab
[clang][deps] Add module map describing compiled module to file dependencies. (#160226)
When we add the module map describing the compiled module to the command
line, add it to the file dependencies as well.

Discovered while working on reproducers where a command line input was
missing in the captured files as it wasn't considered a dependency.
2025-11-19 20:17:43 -08:00
Nicolai Hähnle
13ed14f47e
AMDGPU: Autogenerate checks in a test (#168815) 2025-11-20 03:51:32 +00:00
marius doerner
7198279707
[clang][bytecode] Implement case ranges (#168418)
Fixes #165969

Implement GNU case ranges for constexpr bytecode interpreter.
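GNU case ranges (`case 1 ... 9:`) are inclusive at both ends. The Python sketch below models only the matching semantics; it does not reflect how the bytecode interpreter actually lowers a switch, and the names are hypothetical.

```python
from typing import List, Optional, Tuple

# Each case is (low, high, label); a plain `case N:` is (N, N, label).
Case = Tuple[int, int, str]

def select_case(value: int, cases: List[Case], default: Optional[str] = None) -> Optional[str]:
    for low, high, label in cases:
        # Ranges are inclusive; a range with low > high matches nothing.
        if low <= value <= high:
            return label
    return default

cases = [(0, 0, "zero"), (1, 9, "digit"), (10, 99, "two digits")]
print(select_case(7, cases, "other"), select_case(100, cases, "other"))  # digit other
```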
2025-11-20 04:50:32 +01:00
Luke Lau
47b756a5a6
[RISCV] Only reduce VLs of instructions with demanded VLs (#168693)
In RISCVVLOptimizer we first compute all the demanded VLs, then we walk
backwards through the function and try to reduce any VLs.

We don't actually need to walk backwards anymore since after #124530 the
order in which we modify the instructions doesn't matter.

This patch changes it to just iterate over the instructions with a
demanded VL computed, which means we don't iterate over scalar
instructions etc.

This also fixes #168665, where we triggered an assert on instructions
with a dead $vxsat implicit-def:

dead %x:vr = PseudoVSADDU_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */, implicit-def dead $vxsat

Because $vxsat is a reserved register, DeadMachineInstructionElim won't
remove it and the instruction makes it to RISCVVLOptimizer.

And because the def of %x is dead, we don't reach this instruction in
the dataflow analysis. This instruction returns true for isCandidate, so
we would try to look up its demanded VL, which doesn't exist, and assert.
But with this patch we don't try to reduce instructions that aren't in
DemandedVLs, which fixes the crash.
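To make the difference concrete, here is a Python model of the two loop shapes. Instruction names, plain-integer VLs and the helper names are hypothetical simplifications; the real pass works on MachineInstrs and VL operands.

```python
from typing import Callable, Dict, List

def reduce_vls_old(instrs: List[str], is_candidate: Callable[[str], bool],
                   demanded: Dict[str, int], vl: Dict[str, int]) -> Dict[str, int]:
    # Old shape: walk every instruction backwards and look up its demanded VL.
    new_vl = dict(vl)
    for mi in reversed(instrs):
        if is_candidate(mi):
            new_vl[mi] = min(new_vl[mi], demanded[mi])  # KeyError if never visited
    return new_vl

def reduce_vls_new(demanded: Dict[str, int], vl: Dict[str, int]) -> Dict[str, int]:
    # New shape: only visit instructions that have a demanded VL at all.
    new_vl = dict(vl)
    for mi, d in demanded.items():
        new_vl[mi] = min(new_vl[mi], d)
    return new_vl

instrs = ["vadd", "vsaddu_dead", "vmul"]
vl = {"vadd": 16, "vsaddu_dead": 16, "vmul": 16}
demanded = {"vadd": 4, "vmul": 16}   # dataflow never reached vsaddu_dead
print(reduce_vls_new(demanded, vl))
```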
2025-11-20 03:49:59 +00:00
Hristo Hristov
3f151a3fa6
[libc++][memory] Applied [[nodiscard]] to smart pointers (#168483)
Applied `[[nodiscard]]` where relevant to smart pointers and related
functions.

- [x] - `std::unique_ptr`
- [x] - `std::shared_ptr`
- [x] - `std::weak_ptr`

See guidelines:
- https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant
- `[[nodiscard]]` should be applied to functions where discarding the
return value is most likely a correctness issue. For example a locking
constructor in unique_lock.

---------

Co-authored-by: Hristo Hristov <zingam@outlook.com>
2025-11-20 04:19:15 +02:00
Pranav Kant
fda20d99ae
[bazel] Fix #165009 (#168804) 2025-11-19 18:17:35 -08:00
Jinjie Huang
79fffed60a
[llvm-dwp] Give more information when incompatible version found (#168511)
Provide more information when detecting a DWARF version mismatch in .dwo
files to help locate the issue and align with other similar errors.
2025-11-20 10:17:13 +08:00
Paddy McDonald
beac880da5
Better fix for the stack_container_dynamic_lib test (#168798)
Add the missing %libdl to the link command
2025-11-19 18:06:11 -08:00
Gang Chen
9e9fe08b16
Re-land [Transform][LoadStoreVectorizer] allow redundant in Chain (#168135)
This is the fixed version of
https://github.com/llvm/llvm-project/pull/163019
2025-11-19 17:39:10 -08:00
Yaxun (Sam) Liu
f9696949c3
[ClangLinkerWrapper] Refactor target ID sanitization for Windows file names (#168744)

Fix non-RDC mode HIP compilation for the new driver on Windows due to
invalid temporary file names when offload arch is a target ID containing
':', which is invalid in file names on Windows.

Refactor the existing handling of ':' in file names on Windows from
clang driver into a shared function sanitizeTargetIDInFileName in
clang/Basic/TargetID.h. This function replaces ':' with '@' on Windows
only, preserving the original behavior.

Update both clang/lib/Driver/Driver.cpp and
clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp to use this
shared function, ensuring consistent handling across both tools.
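The described behaviour fits in one line. This Python model takes an explicit `on_windows` flag for testability, which is an assumption; the real `sanitizeTargetIDInFileName` is C++ in clang/Basic/TargetID.h and its signature may differ.

```python
def sanitize_target_id_in_file_name(target_id: str, on_windows: bool) -> str:
    # ':' is not allowed in Windows file names, so it becomes '@' there;
    # other hosts keep the target ID unchanged.
    return target_id.replace(":", "@") if on_windows else target_id

print(sanitize_target_id_in_file_name("gfx90a:xnack+", on_windows=True))  # gfx90a@xnack+
```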
2025-11-19 20:22:22 -05:00
Sayan Saha
def8ecbda9
[tosa] : Relax dynamic dimension checks for batch for conv decompositions (#168764)
This PR relaxes the validation checks to allow input/output data to have
dynamic batch dimensions.
2025-11-19 20:17:01 -05:00
Alexey Bataev
2c3aa92089 [SLP]Fix insertion point for setting for the nodes
Many of the def-use chain problems in the SLP vectorizer are
related to the fact that some nodes reuse the same instruction as their
insertion point. An insertion point is not an instruction, but the place
between instructions. To set it correctly, it is better to generate a pseudo
instruction immediately after the last instruction and use it as the
insertion point. This resolves the issues in most cases.
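A list-based Python model shows why reusing one instruction as an "insert after" anchor can invert def-use order, while inserting before a placeholder keeps it. The helpers are hypothetical, and removing the placeholder afterwards is an assumption of this sketch.

```python
from typing import List

def insert_after(block: List[str], anchor: str, new: str) -> None:
    block.insert(block.index(anchor) + 1, new)

def insert_before(block: List[str], anchor: str, new: str) -> None:
    block.insert(block.index(anchor), new)

# Two vector nodes are emitted in order; "vec_b" uses "vec_a".
# Reusing the same instruction as the anchor: the second insert lands first.
naive = ["last_scalar", "next"]
insert_after(naive, "last_scalar", "vec_a")
insert_after(naive, "last_scalar", "vec_b")

# A pseudo instruction right after the last scalar, inserting before it,
# keeps emission order, so definitions precede their uses.
fixed = ["last_scalar", "next"]
insert_after(fixed, "last_scalar", "pseudo")
insert_before(fixed, "pseudo", "vec_a")
insert_before(fixed, "pseudo", "vec_b")
fixed.remove("pseudo")   # placeholder deleted afterwards

print(naive)  # ['last_scalar', 'vec_b', 'vec_a', 'next']
print(fixed)  # ['last_scalar', 'vec_a', 'vec_b', 'next']
```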

Fixes #168512 #168576
2025-11-19 17:15:24 -08:00
Eli Friedman
4e275f7274
[Arm64EC][clang] Implement varargs support in clang. (#152411)
The clang side of the calling convention code for arm64 vs. arm64ec is
close enough that this isn't really noticeable in most cases, but the
rule for choosing whether to pass a struct directly or indirectly is
significantly different.

(Adapted from my old patch https://reviews.llvm.org/D125419 .)

Fixes #89615.
2025-11-19 16:45:08 -08:00
Carl Ritson
b1c4b55118
RenameIndependentSubregs: try to only implicit def used subregs (#167486)
Attempt to only define used subregisters when creating IMPLICIT_DEF fix-ups
for live interval subranges. This avoids entire (wide) registers appearing
to become live at the MIR level, instead of relying only on transient
LiveIntervals dead definitions for the unused subregisters.
2025-11-20 09:28:34 +09:00
Dhruva Chakrabarti
94e4ee38aa
[AMDGPU] Fixed crash in getLastMIForRegion when the region is empty. (#168653)
PreRARematStage builds region live-outs if GCN trackers are enabled. If
rematerialization leads to empty regions, this can cause a crash because
of a dereference of an invalid iterator in getLastMIForRegion. The fix is
to skip calling getLastMIForRegion for empty regions.

This patch fixes another bug in the same code region. getLastMIForRegion
calls skipDebugInstructionsBackward, which may immediately return
RegionEnd if it is not the begin instruction and is a non-debug
instruction. That would mean considering an instruction that is outside
the relevant region. The fix is to always pass the instruction before
RegionEnd to skipDebugInstructionsBackward.
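Both fixes can be seen in a Python model where list indices stand in for iterators and `region_end` is exclusive. The helper names are hypothetical; `skip_debug_backward` only mirrors the idea of skipDebugInstructionsBackward (step back until a non-debug instruction or the begin).

```python
from typing import List, Optional

def skip_debug_backward(block: List[str], it: int, begin: int) -> int:
    """Walk backwards from `it` to the nearest non-debug instruction,
    stopping at `begin`."""
    while it > begin and block[it].startswith("DBG_"):
        it -= 1
    return it

def last_mi_for_region(block: List[str], region_begin: int, region_end: int) -> Optional[str]:
    if region_begin == region_end:
        return None                    # empty region: nothing to dereference
    # Start from the instruction *before* region_end, never region_end itself.
    it = skip_debug_backward(block, region_end - 1, region_begin)
    return block[it]

block = ["S_MOV", "V_ADD", "DBG_VALUE", "S_ENDPGM"]
print(last_mi_for_region(block, 0, 3), last_mi_for_region(block, 2, 2))  # V_ADD None
```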

This bug was found while using GCN trackers on the existing LIT test
machine-scheduler-sink-trivial-remats.mir. Here's the assertion failure.

llvm-project/llvm/include/llvm/ADT/ilist_iterator.h:168:
llvm::ilist_iterator<OptionsT, IsReverse, IsConst>::reference
llvm::ilist_iterator<OptionsT, IsReverse, IsConst>::operator*() const
[with OptionsT = llvm::ilist_detail::node_options<llvm::MachineInstr,
true, true, void, false, void>; bool IsReverse = false; bool IsConst =
false; llvm::ilist_iterator<OptionsT, IsReverse, IsConst>::reference =
llvm::MachineInstr&]: Assertion `!NodePtr->isKnownSentinel()' failed.
2025-11-19 16:19:20 -08:00
Nishant Patel
af73aeaa19
[MLIR][Vector] Add unroll pattern for vector.shape_cast (#167738)
This PR adds a pattern for unrolling shape_cast given a targetShape. This
PR is a follow-up to #164010, which was very general and used
inserts and extracts on each element (which is also what
LowerVectorShapeCast.cpp does).
After doing some more research on use cases, we (@Jianhui-Li and I)
realized that the previous version in #164010 is unnecessarily generic
and doesn't fit our performance needs.

Our use case requires that targetShape is contiguous in both source and
result vector.

This pattern only applies when contiguous slices can be extracted from
the source vector and inserted into the result vector such that each
slice remains in vector form with targetShape (and does not decompose to
scalars). In these cases, the unrolling proceeds as:

vector.extract_strided_slice -> vector.shape_cast (on the unrolled
slice) -> vector.insert_strided_slice
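The contiguity requirement can be stated as a predicate on shapes. The Python sketch below is one way to express it for a row-major vector (hypothetical helper, not the MLIR code). Requiring tiles that evenly divide the shape is an extra assumption. Per the description, the pattern needs contiguity in both the source and the result vector.

```python
from typing import Sequence

def is_contiguous_tile(shape: Sequence[int], tile: Sequence[int]) -> bool:
    """True if a tile of shape `tile` taken from a row-major vector of shape
    `shape` covers one contiguous run of elements: leading unit dims, then at
    most one partial dim, then only full dims."""
    if len(shape) != len(tile):
        return False
    if any(t <= 0 or s % t != 0 for s, t in zip(shape, tile)):
        return False   # assumption: tiles must evenly divide the shape
    i = 0
    while i < len(tile) and tile[i] == 1:
        i += 1
    return all(tile[j] == shape[j] for j in range(i + 1, len(tile)))

print(is_contiguous_tile([4, 8], [2, 8]), is_contiguous_tile([4, 8], [2, 4]))  # True False
```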
2025-11-19 16:16:44 -08:00
Sang Ik Lee
7de59f0b24
[MLIR][Conversion] XeGPU to XeVM: Use adaptor for getting base address from memref. (#168610)
The adaptor already lowers the memref to a base address.
Conversion patterns should use it instead of generating code to get the base
address from the memref.
2025-11-19 16:15:27 -08:00
Andy Kaylor
ef0cd1dae3
[CIR][NFC] Fix warnings in release builds (#168791)
This fixes several warnings that occur in CIR release builds.
2025-11-19 16:12:17 -08:00
Paddy McDonald
ff39d59000
Disable test under GCC (#168792)
New test stack_container_dynamic_lib.cpp has errors under gcc.

Require clang while a better fix is investigated.
2025-11-19 16:08:34 -08:00
Jan Svoboda
835951325e
[clang][deps] Enable calling DepScanFile::getBuffer() repeatedly (#168789)
This PR makes it possible to call `getBuffer()` on `DepScanFile` (a
`llvm::vfs::File`) repeatedly. Previously, this function would return a
moved-from `unique_ptr`. This doesn't fix any existing bugs, I
discovered this while experimenting with the VFSs in the scanner. Note
that the returned instances of `llvm::MemoryBuffer` are non-owning and
share the underlying buffer storage.
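A Python model of the new behaviour, with `memoryview` standing in for a non-owning `llvm::MemoryBuffer`: each call hands out a fresh view over the same storage instead of moving a unique buffer out. `ScanFile` and `get_buffer` are hypothetical names.

```python
class ScanFile:
    """Model of a file whose contents can be fetched more than once."""

    def __init__(self, contents: bytes):
        self._storage = contents          # owned by the file object

    def get_buffer(self) -> memoryview:
        # A new, non-owning view of the same storage on every call.
        return memoryview(self._storage)

f = ScanFile(b"#include <vector>\n")
first, second = f.get_buffer(), f.get_buffer()
print(first is second, first.obj is second.obj)  # False True
```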
2025-11-19 16:03:30 -08:00
Haocong Lu
80f862b692
[CIR] Upstream CIR codegen for lzcnt and tzcnt x86 builtins (#168479)
Support CIR codegen for x86 builtins `__builtin_ia32_lzcnt` and
`__builtin_ia32_tzcnt`.
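For reference, a Python model of what these builtins compute. Unlike `__builtin_clz`/`__builtin_ctz`, which are undefined for 0, LZCNT and TZCNT return the operand width for a zero input. The helper names are illustrative only.

```python
def lzcnt(value: int, bits: int = 32) -> int:
    """Leading zero count; returns `bits` when value is 0."""
    value &= (1 << bits) - 1
    return bits - value.bit_length()

def tzcnt(value: int, bits: int = 32) -> int:
    """Trailing zero count; returns `bits` when value is 0."""
    value &= (1 << bits) - 1
    if value == 0:
        return bits
    return (value & -value).bit_length() - 1   # index of the lowest set bit

print(lzcnt(1), tzcnt(8), lzcnt(0), tzcnt(0, 16))  # 31 3 32 16
```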
2025-11-19 16:01:56 -08:00