547939 Commits

Author SHA1 Message Date
Aiden Grossman
869bce23fd
[CI] Setup generate_report to describe ninja failures
This patch makes it so that generate_report will add information about
failed build actions to the summary report. This makes it significantly
easier to find compilation failures, especially given we run ninja with
-k 0.

This patch only does the integration into generate_report (along with
testing). Actual utilization in the script is split into a separate
patch to try and keep things clean.
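
As a rough illustration of the kind of summary this enables, the sketch below renders failed build actions as an indented plain-text list. The function name `format_build_failures` and the `(action, log)` input layout are hypothetical; they are not taken from generate_report.

```python
def format_build_failures(failures):
    """Render (action, log) pairs as a short summary.

    Hypothetical helper: the name and the input layout are assumptions.
    """
    if not failures:
        return "No build failures found."
    lines = [f"{len(failures)} build action(s) failed:"]
    for action, log in failures:
        lines.append(f"- {action}")
        # Indent the captured log under the action it belongs to.
        for log_line in log.splitlines():
            lines.append(f"    {log_line}")
    return "\n".join(lines)
```

Because ninja runs with -k 0 and keeps going after an error, one run can produce several entries, so the sketch takes a list rather than a single failure.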

Reviewers: dschuff, cmtice, DavidSpickett, Keenuts, lnihlen, gburgessiv

Reviewed By: cmtice, DavidSpickett

Pull Request: https://github.com/llvm/llvm-project/pull/152621
2025-08-08 09:44:04 -07:00
Ellis Hoag
3b32893cd9
[InstrProf][NFC] Refactor profdata trace tests (#152550)
Refactor some llvm-profdata tests to read text profiles which are easier
to match with FileCheck
2025-08-08 09:39:58 -07:00
Slava Gurevich
0f59b8d4e3
Fix improper alignment of static buffer for placement-new of BufferQueue (#152408)
No behavioral change, but eliminates potential UB in strict-alignment
systems.

The previous commit (llvm#94171) bulk-updated alignment usage to C++23
spec, but missed this occurrence.
2025-08-08 09:36:22 -07:00
Chao Chen
c96223434c
[mlir][xegpu] Add definition of SliceAttr (#150146)
---------

Co-authored-by: Charitha Saumya <136391709+charithaintc@users.noreply.github.com>
2025-08-08 11:27:17 -05:00
Min-Yih Hsu
b4e8b8ee91
[mlir][vector] Canonicalize broadcast of shape_cast (#150523)
Fold `broadcast(shape_cast(x))` into `broadcast(x)` if the type of x is
compatible with broadcast's result type and the shape_cast only adds or removes ones in the leading dimensions.
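
The fold condition can be modeled on plain shape tuples. This is a simplified sketch that assumes right-aligned broadcast semantics and ignores element types and scalable dimensions; the helper names are hypothetical and this is not the MLIR implementation.

```python
def strip_leading_ones(shape):
    """Drop unit dimensions from the front of a shape."""
    i = 0
    while i < len(shape) and shape[i] == 1:
        i += 1
    return tuple(shape[i:])


def is_broadcastable(src, dst):
    """Right-aligned rule: each source dim must be 1 or equal the result dim."""
    if len(src) > len(dst):
        return False
    return all(s == 1 or s == d for s, d in zip(reversed(src), reversed(dst)))


def can_fold(x_shape, cast_shape, result_shape):
    """True if broadcast(shape_cast(x)) could be rewritten as broadcast(x)."""
    # The shape_cast may only add or remove ones in the leading dimensions.
    only_leading_ones = strip_leading_ones(x_shape) == strip_leading_ones(cast_shape)
    return only_leading_ones and is_broadcastable(x_shape, result_shape)
```

For example, `can_fold((1, 4), (4,), (2, 4))` holds, while `can_fold((2, 2), (4,), (3, 4))` does not, because that shape_cast changes more than leading unit dimensions.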

---------

Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
Co-authored-by: James Newling <james.newling@gmail.com>
2025-08-08 09:25:32 -07:00
Alexey Bataev
0419b459be
Revert "[SLP]Initial FMAD support (#149102)"
This reverts commit 0bcf45ea3458ba79eb4257afcfd6af954292c9ce to fix the
regressions reported in https://github.com/llvm/llvm-project/issues/152683
2025-08-08 09:17:59 -07:00
James Newling
b574bcf036
[mlir][TD] Support padding with poison (#152003)
Signed-off-by: James Newling <james.newling@gmail.com>
2025-08-08 09:09:03 -07:00
Simon Pilgrim
45b4f1b438
[Headers][X86] Allow _mm512_set1_epi8/16/pd/ps intrinsics to be used in constexpr (#152746)
Pulled out of #152288 as I need this to proceed with several other patches
2025-08-08 17:04:08 +01:00
Orlando Cazalet-Hyams
1778669739
[KeyInstr] Remove LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS CMake flag (#152735)
The CMake flag has been on by default for a month without any issues.

This makes the feature support in LLVM unconditional (but does not
enable the feature by default).
2025-08-08 17:03:28 +01:00
Simon Pilgrim
c8312bdd16
[Headers][X86] Enable constexpr handling for pmulhw/pmulhuw intrinsics (#152540)
This patch updates the pmulhw/pmulhuw builtins to support constant
expression handling - extending the VectorExprEvaluator::VisitCallExpr
handling code that handles elementwise integer binop builtins.

Hopefully this can be used as a reference patch to show how to add future
target-specific constexpr handling with minimal code impact.

I've also enabled pmullw constexpr handling (which are tagged on
#152490) as they all use very similar tests.

I've also had to tweak the MMX -> SSE2 wrapper as undefs are not
permitted in constexpr shuffle masks

Fixes #152524
2025-08-08 17:02:50 +01:00
Aiden Grossman
9ea1d39ead
[CI][Github] Remove Outdated Comments
5fc3e76ec4f323c22cddf7b9458137510507847a made the pipelines fail on
errors and also removed the TODO comments, but did not remove the
explanatory comments on why things were set up that way. Given things no
longer succeed on error, these comments are outdated and should be
removed.
2025-08-08 15:59:15 +00:00
Aiden Grossman
83dd7d97bd
[CI] Add Support for Parsing Ninja Logs to generate_test_report_lib
This patch adds in support for taking the CLI output from ninja and
parsing it for failures. This is intended to be used in the cases where
all tests pass (or none have run), but the build fails to easily surface
where exactly the build failed.

The actual integration will happen in a future patch.
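
A minimal sketch of this kind of parsing is shown below. It assumes the usual ninja output shape: `[N/M]` progress lines, a `FAILED: <output>` line followed by the command and its output, and a final `ninja: ...` status line. The function name is hypothetical and the code is not taken from generate_test_report_lib.

```python
import re

# Progress lines look like "[12/345] Building CXX object ...".
PROGRESS_RE = re.compile(r"^\[\d+/\d+\] ")


def find_ninja_failures(log_text):
    """Return a list of (failed_output, captured_lines) pairs."""
    failures = []
    current = None  # (name, lines) for the failure being collected
    for line in log_text.splitlines():
        if line.startswith("FAILED: "):
            if current is not None:
                failures.append(current)
            current = (line[len("FAILED: "):], [])
        elif PROGRESS_RE.match(line) or line.startswith("ninja: "):
            # The next action (or ninja's final status line) ends the block.
            if current is not None:
                failures.append(current)
                current = None
        elif current is not None:
            current[1].append(line)
    if current is not None:
        failures.append(current)
    return failures
```

Everything between a `FAILED:` line and the next progress line is attributed to that failure, so the compiler command and its diagnostics stay together.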

Reviewers: gburgessiv, dschuff, lnihlen, DavidSpickett, Keenuts, cmtice

Reviewed By: DavidSpickett, cmtice

Pull Request: https://github.com/llvm/llvm-project/pull/152620
2025-08-08 08:42:25 -07:00
Muhammad Bassiouni
45b15946b1
[libc][hdrgen] Fix hdrgen when using macros as guards in stdlib.yaml. (#152732)
2025-08-08 18:39:47 +03:00
Ivan R. Ivanov
7c141e2118
[ValueTracking] Add missing check for two-value PN recurrence matching (#152700)
When InstTy is a type like IntrinsicInst which can have a variable
number of arguments, we can encounter a case where Operation will have
fewer than two arguments and error at the getOperand() calls.

Fixes: https://github.com/llvm/llvm-project/issues/152725.
2025-08-08 17:39:24 +02:00
Muhammad Bassiouni
66734f4c3c
[libc][math] Refactor cbrtf implementation to header-only in src/__support/math folder. (#151846)
Part of #147386

in preparation for: https://discourse.llvm.org/t/rfc-make-clang-builtin-math-functions-constexpr-with-llvm-libc-to-support-c-23-constexpr-math-functions/86450
2025-08-08 18:28:50 +03:00
nicebert
09bf2c5c91
[OpenMP] Claims omp_target_is_accessible as worked on (#151507)
Includes link to current PR.
Spec requires minor clarification.
2025-08-08 10:21:16 -05:00
Jordan Rupprecht
6a8e376d82
[bazel] Extra layering_check dep for #151228: BFloat16 (#152741)
2025-08-08 10:11:52 -05:00
Simon Pilgrim
f169893cbf
[Headers][X86] Allow BITALG vpopcntw/vpopcntb intrinsics to be used in constexpr (#152701)
Matches VPOPCNTDQ handling
2025-08-08 16:09:26 +01:00
Amina Chabane
478b415181
[AArch64] Enable svcompact intrinsic in streaming mode with SME2.2 (#151703)
When the target enables +sme2p2, the svcompact intrinsic is now
available in streaming SVE mode, through updating the guards in
arm_sve.td. Included Sema test acle_sve_compact.cpp.
2025-08-08 16:04:54 +01:00
Mikhail R. Gadelha
e91f68487c
[RISCV] Update SpacemiT-X60 vector fixed-point arithmetic latencies (#150517)
This PR adds hardware-measured latencies for all instructions defined in
Section 12 of the RVV specification: "Vector Fixed-Point Arithmetic
Instructions" to the SpacemiT-X60 scheduling model.
2025-08-08 11:57:35 -03:00
Kazu Hirata
1bc49c0c97
[AST] Remove an unused local variable (NFC) (#152647)
2025-08-08 07:45:22 -07:00
Kazu Hirata
8afa70f1c8
[llvm] Proofread SourceLevelDebugging.rst (#152646)
This patch takes care of the highly mechanical part of proofreading
SourceLevelDebugging.rst, namely:

- hyphenating "32 bit value" and similar and
- hyphenating "Objective C"
2025-08-08 07:45:14 -07:00
Kazu Hirata
c11868f66c
[IR] Remove Intrinsic::getDeclaration (#152645)
Intrinsic::getDeclaration has been deprecated for more than 9 months
since:

  commit b9f08676abcfbb226c67b5ac2a7bc5b33254b915
  Author: Rahul Joshi <rjoshi@nvidia.com>
  Date:   Mon Oct 14 19:21:28 2024 -0700

This patch removes it.  I'm not aware of any downstream use.
2025-08-08 07:45:06 -07:00
Kazu Hirata
4e44e7c164
[Sema] Remove an unnecessary cast (NFC) (#152644)
numTypeParams is already unsigned.

Co-authored-by: Corentin Jabot <corentinjabot@gmail.com>
2025-08-08 07:44:59 -07:00
Kazu Hirata
9beb18a6f0
[CodeGen] Remove an unnecessary cast (NFC) (#152643)
getUnitInc() already returns int.
2025-08-08 07:44:51 -07:00
Kazu Hirata
30b0a9ec19
[ADT] Use range-based for loops in StringMap.h (NFC) (#152641)
2025-08-08 07:44:44 -07:00
Simon Pilgrim
e64224a224
[Headers][X86] Allow AVX cast intrinsics to be used in constexpr (#152730)
Still missing the "extend to 256-bit" casts - _mm256_castpd128_pd256 / _mm256_castps128_ps256 / _mm256_castsi128_si256 - due to constexpr not liking undefined/poison etc.
2025-08-08 15:39:39 +01:00
Guray Ozen
76a533c8ec
[MLIR][NVVM] Add pmevent (#152509)
Add the nvvm.pmevent Op, which triggers one or more of a fixed number of
performance monitor events, with the event index or mask specified by an
immediate operand.

[For more information, see PTX
ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-pmevent)
2025-08-08 16:34:18 +02:00
tcottin
2c4b876fa8
[clangd] introduce doxygen parser (#150790)
Followup work of #140498 to continue the work on clangd/clangd#529

Introduce the use of the Clang doxygen parser to parse the documentation
of hovered code.

- ASTContext independent doxygen parsing
- Parsing doxygen commands to markdown for hover information

Note: after this PR I have planned another patch to rearrange the
information shown in the hover info.
This PR is just for the basic introduction of doxygen parsing for hover
information.

---------

Co-authored-by: Maksim Ivanov <emaxx@google.com>
2025-08-08 16:07:36 +02:00
Timm Baeder
1b1f352cb9
[clang][bytecode] Handle reads on zero-size arrays (#152706)
2025-08-08 16:03:02 +02:00
Timm Baeder
3ea76af3a1
[clang][bytecode][NFC] Remove a useless local variable (#152711)
We can just check NonNullArgs.empty().
2025-08-08 15:52:23 +02:00
Timm Baeder
d54516b9ad
[clang][bytecode][NFC] Use an existing local variable (#152710)
Instead of calling getSize() again.
2025-08-08 15:41:58 +02:00
Yingwei Zheng
ac8295550b
[Clang][CodeGen] Move EmitPointerArithmetic into CodeGenFunction. NFC. (#152634)
`CodeGenFunction::EmitPointerArithmetic` is needed by
https://github.com/llvm/llvm-project/pull/152575. Separate the NFC
changes into a new PR for smooth review.
2025-08-08 21:41:03 +08:00
David Green
26b302fd8b
[AArch64] Rename Cost -> PromotedCost to avoid shadowing error
2025-08-08 14:37:24 +01:00
Erick Ochoa Lopez
a1672d7c6a
[mlir][vector] Add alignment attribute to maskedload and maskedstore (#151690)
These commits continue the work done in
https://github.com/llvm/llvm-project/pull/144344, of adding alignment
attributes to operations in the vector and memref dialects. These commits focus
on adding the alignment attribute to the `maskedload` and `maskedstore`
operations. The `VectorLoadConversion` pattern in VectorToLLVM is a
template for `load`, `store`, `maskedload` and `maskedstore` operations.
Having the alignment attribute in all these operations would allow for
an easy way to propagate the alignment attribute from the vector dialect
to the LLVM dialect.

This patchset also includes changes to the conversion from VectorToLLVM
to propagate the alignment attribute for the
vector.{,masked}{load,store} operations.
2025-08-08 09:23:44 -04:00
Szymon Piotr Milczek
fd41700962
[InstCombine] visitShuffleVectorInst assert with vector of pointers fix. (#152341)
In visitShuffleVectorInst there's an if block that's meant to turn
shufflevector followed by bitcast into extractelement where possible.

It assumes that there will never be bitcasts performed on vectors of ptr
as such operations are almost always illegal, and ptrtoint instructions
should be used instead.

There is however an edge case where a bitcast instruction can be
performed on a vector of type `<1 x ptr>` to turn it into type `ptr`

In this edge case, the code initializes the variable `VecBitWidth` to 0.
Then, when iterating over users that are bitcasts, an attempt is made to
create a vector of size 0, which triggers an assert.

This commit changes the initialization of `VecBitWidth` to use the datalayout
to find the size of the vector instead of the getPrimitiveSizeInBits method,
which returns 0 for ptr and vectors of ptr.
2025-08-08 15:23:02 +02:00
Rahul Joshi
7f0e4079c8
[NFCI][TableGen] Make Intrinsic::getAttributes table driven (#152349)
This is a follow-on to https://github.com/llvm/llvm-project/pull/152219 to
reduce both code and frame size of `Intrinsic::getAttributes` further.

Currently, this function consists of several switch cases (one per
unique set of argument attributes) that populate the local `AS` array with all
non-empty argument attributes for that intrinsic by calling
`getIntrinsicArgAttributeSet`. This change makes this code table driven
and implements `Intrinsic::getAttributes` without any switch cases,
which reduces the code size of this function on all platforms and in
addition reduces the frame size by a factor of 10 on Windows.

This is achieved by:
1. Emitting table `ArgAttrIdTable` containing a concatenated list of
`<ArgNo, AttrID>` entries across all unique arguments.
2. Emitting table `ArgAttributesInfoTable` (indexed by unique
arguments-ID) to store the starting index and number of non-empty arg
attributes.
3. Reserving unique function-ID 255 to indicate that the intrinsic has
no function attributes (to replace `HasFnAttr` setup in each switch
case).
4. Using a simple table lookup and for loop to build the list of
argument and function attributes for a given intrinsic.
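
The lookup scheme described in steps 1 to 4 can be sketched as follows. The table contents, attribute IDs, and the tuple-based result are made up for illustration; only the table names and the reserved value 255 come from the description above, and the real code is generated C++.

```python
# Flattened (arg_no, attr_id) entries for all unique argument-attribute sets.
ARG_ATTR_ID_TABLE = [
    (0, 3), (1, 5),  # entries for unique arguments-ID 0
    (2, 7),          # entries for unique arguments-ID 1
]

# Indexed by unique arguments-ID: (start index, number of entries).
ARG_ATTRIBUTES_INFO_TABLE = [(0, 2), (2, 1), (3, 0)]

# Reserved function-ID meaning "no function attributes".
NO_FN_ATTRS = 255


def get_attributes(args_id, fn_id):
    """Build the attribute list with a table lookup and a loop, no switch."""
    start, count = ARG_ATTRIBUTES_INFO_TABLE[args_id]
    attrs = [("arg", arg_no, attr_id)
             for arg_no, attr_id in ARG_ATTR_ID_TABLE[start:start + count]]
    if fn_id != NO_FN_ATTRS:
        attrs.append(("fn", fn_id))
    return attrs
```

Replacing per-case code with two small data tables is what moves the cost from code and stack space into read-only data.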

Experimental data shows that with release builds and assertions
disabled, this change reduces the code size for GCC and Clang builds on
Linux by ~9KB for a modest (80/152 byte) increase in frame size. For
Windows, it reduces the code size by 20KB and frame size from 4736 bytes
to 461 bytes, which is a 10x reduction. Actual data is as follows:

```
 Current trunk:
  Compiler                              gcc-13.3.0      clang-18.1.3      MSVC 19.43.34810.0
  code size                             0x35a9          0x370c            0x5581
  frame size                            0x120           0x118             0x1280

 table driven Intrinsic::getAttributes:
  code size                             0xcfb            0xcd0            0x1cf
  frame size                            0x1b8            0x188            0x1A0
  Total savings (code + data)           9212 bytes       9790 bytes       20119 bytes
```

Total savings above accounts for the additional data size for the 2 new
tables, which in this experiment was: `ArgAttributesInfoTable` = 314
bytes and `ArgAttrIdTable` = 888 bytes. Coupled with the earlier
https://github.com/llvm/llvm-project/pull/152219, this achieves a 46x
reduction in frame size for this function in Windows release builds.
2025-08-08 06:02:43 -07:00
Timm Baeder
8d26252eec
[clang][bytecode][NFC] Dead blocks are always uninitialized (#152699)
We always call the descriptor dtor before, so they are never
initialized.
2025-08-08 14:57:38 +02:00
Yaxun (Sam) Liu
479556c720
[HIP] compressed bundle format defaults to v3 (#152600)
HIP runtime support for compressed bundle format v3 is in place,
therefore switch the default compressed bundle format to v3 in compiler.

This allows both compressed and decompressed fat binary size to exceed
4GB by default.

Environment variable COMPRESSED_BUNDLE_FORMAT_VERSION=2 can be used for
backward compatibility for older HIP runtimes not supporting v3.

Fixes: SWDEV-548879
2025-08-08 08:53:01 -04:00
sebvince
8949dc7f9c
[mlir][amdgpu] fold memref.subview/expand_shape/collapse_shape into amdgpu.gather_to_lds for DST operand (#152277)
2025-08-08 05:47:33 -07:00
David Green
7f1638efc1
[AArch64] Generalize costing for FP16 instructions (#150033)
This extracts the code for modelling a fp16 operation as
`fptrunc(fpop(fpext,fpext))` into a new function named
getFP16BF16PromoteCost so that it can be reused by the
arithmetic instructions. The function takes a lambda to
calculate the cost of the operation with the promoted type.
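
The shape of that computation can be sketched as below: extend each operand, run the operation in the promoted type, then truncate the result. The function name, parameters, and all numeric costs are hypothetical inputs for illustration, not values or signatures from the AArch64 cost model.

```python
def promoted_fp16_cost(num_operands, ext_cost, trunc_cost, op_cost_in_promoted_type):
    """Cost of fptrunc(fpop(fpext, ..., fpext)).

    op_cost_in_promoted_type is a callable returning the cost of the
    operation on the promoted type, mirroring the lambda mentioned above.
    """
    return num_operands * ext_cost + op_cost_in_promoted_type() + trunc_cost
```

For a binary operation with unit extend and truncate costs and a promoted operation cost of 2, `promoted_fp16_cost(2, 1, 1, lambda: 2)` gives 5.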
2025-08-08 13:40:07 +01:00
Lucas Ramirez
83c308f014
[AMDGPU][Scheduler] Consistent occupancy calculation during rematerialization (#149224)
The `RPTarget`'s way of determining whether VGPRs are beneficial to save
and whether the target has been reached w.r.t. VGPR usage currently
assumes, if `CombinedVGPRSavings` is true, that free slots in one VGPR
RC can always be used for the other. Implicitly, this makes the
rematerialization stage (only current user of `RPTarget`) follow a
different occupancy calculation than the "regular one" that the
scheduler uses, one that assumes that ArchVGPR/AGPR usage can be
balanced perfectly and at no cost, which is untrue in general. This
ultimately yields suboptimal rematerialization decisions that require
cross-VGPR-RC copies unnecessarily.

This fixes that, making the `RPTarget`'s internal model of occupancy
consistent with the regular one. The `CombinedVGPRSavings` flag is
removed, and a form of cross-VGPR-RC saving implemented only for unified
RFs, which is where it makes the most sense. Only when the amount of
free VGPRs in a given VGPR RC (ArchVGPR or AGPR) is lower than the
excess VGPR usage in the other VGPR RC does the `RPTarget` consider that
a pressure reduction in the former will be beneficial to the latter.
2025-08-08 14:26:04 +02:00
Mel Chen
ab7281d896
[VPlan] Update naming in VPInterleaveRecipe constructor. nfc (#152472)
2025-08-08 20:17:10 +08:00
Simon Pilgrim
1e9ed918dd
[X86][AVX512BITALG] add C/C++ and 32/64-bit builtins test coverage (#152693)
2025-08-08 13:12:06 +01:00
Michael Buch
672f82a2ef
[lldb][test] TestExprDefinitionInDylib.py: add cases for calling ctors
2025-08-08 12:12:25 +01:00
Timm Baeder
fde9ee1ac2
[clang][bytecode] Don't deallocate dynamic blocks with pointers (#152672)
This fixes the edge case we had with variables pointing to dynamic
blocks, which forced us to convert basically *all* dynamic blocks to
DeadBlock when deallocating them.

We now don't run dynamic blocks through InterpState::deallocate() but
instead add them to a DeadAllocations list when they are deallocated but
still have pointers.

As a consequence, not all blocks with Block::IsDead = true are
DeadBlocks.
2025-08-08 13:02:01 +02:00
Florian Hahn
82d633e9ff
[VPlan] Materialize vector trip count using VPInstructions. (#151925)
Materialize the vector trip count computation using VPInstruction
instead of directly creating IR. This is one of the last few steps
needed to model the full vector skeleton in VPlan. It also simplifies
vector-trip count computations for scalable vectors, as we can re-use
the UF x VF computation.
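
For reference, the arithmetic being modeled is commonly the trip count rounded down to a multiple of UF x VF. The sketch below assumes no tail folding and no required scalar epilogue; it illustrates the formula only and is not the VPlan code.

```python
def vector_trip_count(trip_count, vf, uf):
    """Round the scalar trip count down to a multiple of VF * UF."""
    step = vf * uf  # the shared "UF x VF" value
    return trip_count - trip_count % step
```

With a trip count of 103, VF = 4 and UF = 2, the vector loop covers 96 iterations and the remaining 7 run in the scalar loop.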

PR: https://github.com/llvm/llvm-project/pull/151925
2025-08-08 11:44:32 +01:00
Sasa Vuckovic
9349484e8f
[MLIR] Make PassPipelineOptions virtually inheriting from PassOptions to allow diamond inheritance (#146370)
## Problem

Given 3 pipelines, A, B, and a superset pipeline AB that runs both the A
& B pipelines, it is not easy to manage their options - one needs to
manually recreate all options from A and B into AB, and maintain them.
This is tedious.

## Proposed solution
Ideally, AB options class inherits from both A and B options, making the
maintenance effortless. Today though, this causes problems as their base
classes `PassPipelineOptions<A>` and `PassPipelineOptions<B>` both
inherit from `mlir::detail::PassOptions`, leading to the so called
"diamond inheritance problem", i.e. multiple definitions of the same
symbol, in this case parseFromString that is defined in
mlir::detail::PassOptions.

Visually, the inheritance looks like this:

```
                         mlir::detail::PassOptions
                            ↑                  ↑
                            |                  |
           PassPipelineOptions<A>      PassPipelineOptions<B>
                            ↑                  ↑
                            |                  |
                         AOptions           BOptions
                            ↑                  ↑
                            +---------+--------+
                                      |
                                  ABOptions
```

A proposed fix is to use the common solution to the diamond inheritance
problem - virtual inheritance.
2025-08-08 12:33:56 +02:00
Ryotaro Kasuga
bd39ae6125
[Delinearization] Add function for fixed size array without relying on GEP (#145050)
The existing functions `getIndexExpressionsFromGEP` and
`tryDelinearizeFixedSizeImpl` provide functionality to delinearize
memory accesses for fixed size array. They use the GEP source element
type in their optimization heuristics. However, driving optimization
heuristics based on GEP type information is not allowed.

This patch introduces new functions `findFixedSizeArrayDimensions` and
`delinearizeFixedSizeArray` to delinearize a fixed size array without
using the type information in GEP. The new function
`findFixedSizeArrayDimensions` infers the size of each dimension of the
array based on the value to be added to the address as induction
variables are incremented. `delinearizeFixedSizeArray` attempts to
restore the subscripts of each dimension based on the estimated array
size.
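
The idea can be sketched on plain integers. The example assumes the per-subscript steps are already known as element strides, listed outermost first; the real analysis works on SCEV expressions, and both helper names here are hypothetical.

```python
def infer_inner_dims(strides):
    """Infer inner dimension sizes from per-subscript strides, outermost first.

    Returns None if the strides do not describe a fixed size array.
    """
    dims = []
    for outer, inner in zip(strides, strides[1:]):
        if inner == 0 or outer % inner != 0:
            return None
        dims.append(outer // inner)
    return dims


def delinearize(offset, inner_dims):
    """Recover one subscript per dimension from a linear element offset."""
    subscripts = []
    for size in reversed(inner_dims):
        offset, idx = divmod(offset, size)
        subscripts.append(idx)
    subscripts.append(offset)  # whatever remains is the outermost subscript
    return subscripts[::-1]
```

Strides of (32, 4, 1) imply inner dimensions of 8 and 4, and the offset 2*32 + 3*4 + 1 = 77 then maps back to subscripts [2, 3, 1].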

This is an initial implementation that may not cover all cases, but is
intended to replace the existing function in the future.

Related:
- https://discourse.llvm.org/t/enabling-loop-interchange/82589/4
- https://github.com/llvm/llvm-project/pull/124911#issuecomment-2962499501
2025-08-08 19:08:14 +09:00
Bart Chrzaszcz
92f6b15445
[clang] Fix bazel after eccc6e2. (#152681)
2025-08-08 11:02:14 +01:00