11516 Commits

Author SHA1 Message Date
Kazu Hirata
035dd1d854 [ADT] Fix a warning
This patch fixes:

  third-party/unittest/googletest/include/gtest/gtest.h:1379:11:
  error: comparison of integers of different signs: 'const unsigned
  long' and 'const int' [-Werror,-Wsign-compare]
2025-08-21 08:49:56 -07:00
Chaitanya Koparkar
ad63a70d6d
[ADT] Add fshl/fshr operations to APInt (#153790)
These operations are required for #153151.
2025-08-21 12:52:18 +01:00
Steven Wu
deab049b5c
[CAS] Add ActionCache to LLVMCAS Library (#114097)
ActionCache is used to store a mapping from CASID to CASID. The current
implementation of the ActionCache can only be used to associate the
key/value from the same hash context.

ActionCache has two operations: `put` to store the key/value and `get`
to
lookup the key/value mapping. ActionCache uses the same TrieRawHashMap
data structure to store the mapping, where is CASID of the key is the
hash to index the map.

While CASIDs for key/value are often associcate with actual CAS
ObjectStore, it doesn't provide the guarantee of the existence of such
object in any ObjectStore.
2025-08-20 14:42:44 -07:00
David Majnemer
0a7eabcc56 Reapply "[APFloat] Fix getExactInverse for DoubleAPFloat"
The previous implementation of getExactInverse used the following check
to identify powers of two:

  // Check that the number is a power of two by making sure that only the
  // integer bit is set in the significand.
  if (significandLSB() != semantics->precision - 1)
    return false;

This condition verifies that the only set bit in the significand is the
integer bit, which is correct for normal numbers. However, this logic is
not correct for subnormal values.

APFloat represents subnormal numbers by shifting the significand right
while holding the exponent at its minimum value. For a power of two in
the subnormal range, its single set bit will therefore be at a position
lower than precision - 1. The original check would consequently fail,
causing the function to determine that these numbers do not have an
exact multiplicative inverse.

The new logic calculated this correctly but it seems that
test/CodeGen/Thumb2/mve-vcvt-fixed-to-float.ll expected the old
behavior.

Seeing as how getExactInverse does not have tests or documentation, we
conservatively maintain (and document) this behavior.

This reverts commit 47e62e846beb267aad50eb9195dfd855e160483e.
2025-08-20 14:02:36 -07:00
David Green
4875553f4c [AArch64][GlobalISel] Port unmerge KnownBits tests to print<gisel-value-tracking>. NFC
This takes the known-bits tests added in #112172 and ports them over to be a
new print<gisel-value-tracking> test.
2025-08-20 20:57:14 +01:00
David Tenty
63195d3d7a
[NFC][CMake] quote ${CMAKE_SYSTEM_NAME} consistently (#154537)
A CMake change included in CMake 4.0 makes `AIX` into a variable
(similar to `APPLE`, etc.)
ff03db6657

However, `${CMAKE_SYSTEM_NAME}` unfortunately also expands exactly to
`AIX` and `if` auto-expands variable names in CMake. That means you get
a double expansion if you write:

`if (${CMAKE_SYSTEM_NAME}  MATCHES "AIX")`
which becomes:
`if (AIX  MATCHES "AIX")`
which is as if you wrote:
`if (ON MATCHES "AIX")`

You can prevent this by quoting the expansion of "${CMAKE_SYSTEM_NAME}",
due to policy
[CMP0054](https://cmake.org/cmake/help/latest/policy/CMP0054.html#policy:CMP0054)
which is on by default in 4.0+. Most of the LLVM CMake already does
this, but this PR fixes the remaining cases where we do not.
2025-08-20 12:45:41 -04:00
Steven Wu
2cfba9678d
[FileSystem] Allow exclusive file lock (#114098)
Add parameter to file lock API to allow exclusive file lock. Both Unix
and Windows support lock the file exclusively for write for one process
and LLVM OnDiskCAS uses exclusive file lock to coordinate CAS creation.
2025-08-20 08:32:18 -07:00
Mehdi Amini
8b2028ced6
Update log_level for LLVM_DEBUG and associated macros (#154525)
During the review of #150855 we switched from 0 to 1 for the default log
level used, but this macro wasn't updated.
2025-08-20 13:31:13 +00:00
Zhaoxuan Jiang
2738828c0e
[Reland] [CGData] Lazy loading support for stable function map (#154491)
This is an attempt to reland #151660 by including a missing STL header
found by a buildbot failure.

The stable function map could be huge for a large application. Fully
loading it is slow and consumes a significant amount of memory, which is
unnecessary and drastically slows down compilation especially for
non-LTO and distributed-ThinLTO setups. This patch introduces an opt-in
lazy loading support for the stable function map. The detailed changes
are:

- `StableFunctionMap`
- The map now stores entries in an `EntryStorage` struct, which includes
offsets for serialized entries and a `std::once_flag` for thread-safe
lazy loading.
- The underlying map type is changed from `DenseMap` to
`std::unordered_map` for compatibility with `std::once_flag`.
- `contains()`, `size()` and `at()` are implemented to only load
requested entries on demand.

- Lazy Loading Mechanism
- When reading indexed codegen data, if the newly-introduced
`-indexed-codegen-data-lazy-loading` flag is set, the stable function
map is not fully deserialized up front. The binary format for the stable
function map now includes offsets and sizes to support lazy loading.
- The safety of lazy loading is guarded by the once flag per function
hash. This guarantees that even in a multi-threaded environment, the
deserialization for a given function hash will happen exactly once. The
first thread to request it performs the load, and subsequent threads
will wait for it to complete before using the data. For single-threaded
builds, the overhead is negligible (a single check on the once flag).
For multi-threaded scenarios, users can omit the flag to retain the
previous eager-loading behavior.
2025-08-20 06:15:04 -07:00
Qihan Cai
5f0515debd
[RISCV] Support Remaining P Extension Instructions for RV32/64 (#150379)
This patch implements pages 15-17 from
jhauser.us/RISCV/ext-P/RVP-instrEncodings-015.pdf

Documentation:
jhauser.us/RISCV/ext-P/RVP-baseInstrs-014.pdf
jhauser.us/RISCV/ext-P/RVP-instrEncodings-015.pdf
2025-08-20 22:54:07 +10:00
Steven Wu
30c5c48d87
[CAS][Tests] Fix unit tests that hangs on two cores (#154151) 2025-08-19 08:21:34 -07:00
Benjamin Maxwell
eb764040bc
[AArch64][SME] Implement the SME ABI (ZA state management) in Machine IR (#149062)
## Short Summary

This patch adds a new pass `aarch64-machine-sme-abi` to handle the ABI
for ZA state (e.g., lazy saves and agnostic ZA functions). This is
currently not enabled by default (but aims to be by LLVM 22). The goal
is for this new pass to more optimally place ZA saves/restores and to
work with exception handling.

## Long Description

This patch reimplements management of ZA state for functions with
private and shared ZA state. Agnostic ZA functions will be handled in a
later patch. For now, this is under the flag `-aarch64-new-sme-abi`,
however, we intend for this to replace the current SelectionDAG
implementation once complete.

The approach taken here is to mark instructions as needing ZA to be in a
specific ("ACTIVE" or "LOCAL_SAVED"). Machine instructions implicitly
defining or using ZA registers (such as $zt0 or $zab0) require the
"ACTIVE" state. Function calls may need the "LOCAL_SAVED" or "ACTIVE"
state depending on the callee (having shared or private ZA).

We already add ZA register uses/definitions to machine instructions, so
no extra work is needed to mark these.

Calls need to be marked by glueing Arch64ISD::INOUT_ZA_USE or
Arch64ISD::REQUIRES_ZA_SAVE to the CALLSEQ_START.

These markers are then used by the MachineSMEABIPass to find
instructions where there is a transition between required ZA states.
These are the points we need to insert code to set up or restore a ZA
save (or initialize ZA).

To handle control flow between blocks (which may have different ZA state
requirements), we bundle the incoming and outgoing edges of blocks.
Bundles are formed by assigning each block an incoming and outgoing
bundle (initially, all blocks have their own two bundles). Bundles are
then combined by joining the outgoing bundle of a block with the
incoming bundle of all successors.

These bundles are then assigned a ZA state based on the blocks that
participate in the bundle. Blocks whose incoming edges are in a bundle
"vote" for a ZA state that matches the state required at the first
instruction in the block, and likewise, blocks whose outgoing edges are
in a bundle vote for the ZA state that matches the last instruction in
the block. The ZA state with the most votes is used, which aims to
minimize the number of state transitions.
2025-08-19 10:00:28 +01:00
Mehdi Amini
89abccc9a6
[MLIR] Update GreedyRewriter to use the LDBG() debug log mechanism (NFC) (#153961)
Also improve a bit the LDBG() implementation
2025-08-18 21:05:34 +00:00
Damyan Pepper
cc49f3b3e1
[NFC][HLSL] Remove confusing enum aliases / duplicates (#153909)
Remove:

* DescriptorType enum - this almost exactly shadowed the ResourceClass
enum
* ClauseType aliased ResourceClass

Although these were introduced to make the HLSL root signature handling
code a bit cleaner, they were ultimately causing confusion as they
appeared to be unique enums that needed to be converted between each
other.

Closes #153890
2025-08-18 08:58:33 -07:00
Benjamin Maxwell
81c06d198e
Reland "[AArch64][SME] Port all SME routines to RuntimeLibcalls" (#153417)
This updates everywhere we emit/check an SME routines to use
RuntimeLibcalls to get the function name and calling convention.
2025-08-18 14:53:40 +01:00
Nikita Popov
ba45ac61b6 [CAS] Temporarily disable broken test
This test hangs forever if executed with less than three cores
available, see:
https://github.com/llvm/llvm-project/pull/114096#issuecomment-3196698403
2025-08-18 15:09:08 +02:00
林克
6842cc5562
[RISCV] Add SpacemiT XSMTVDot (SpacemiT Vector Dot Product) extension. (#151706)
The full spec can be found at spacemit-x60 processor support scope:
Section 2.1.2.2 (Features):

https://developer.spacemit.com/documentation?token=BWbGwbx7liGW21kq9lucSA6Vnpb#2.1

This patch only supports assembler.
2025-08-18 18:03:17 +08:00
Carl Ritson
97d5d483ec
[MsgPack] Add code for floating point assignment and writes (#153544)
Allow assignment of float to DocType and support output of float in
writeToBlob method.
Expand tests coverage to various missing basic I/O operations.

Co-authored-by: Xavi Zhang <Xavi.Zhang@amd.com>
2025-08-18 10:03:40 +09:00
Matt Arsenault
3e5d8a1439 Reapply "RuntimeLibcalls: Generate table of libcall name lengths (#153… (#153864)
This reverts commit 334e9bf2dd01fbbfe785624c0de477b725cde6f2.

Check if llvm-nm exists before building the benchmark.
2025-08-16 09:53:50 +09:00
Craig Topper
e67ec12640
[RISCV] Remove experimental from Smctr and Ssctr. (#153903)
These extensions were ratified in November 2024.
2025-08-15 17:18:09 -07:00
gulfemsavrun
334e9bf2dd
Revert "RuntimeLibcalls: Generate table of libcall name lengths (#153… (#153864)
…210)"

This reverts commit 9a14b1d254a43dc0d4445c3ffa3d393bca007ba3.

Revert "RuntimeLibcalls: Return StringRef for libcall names (#153209)"

This reverts commit cb1228fbd535b8f9fe78505a15292b0ba23b17de.

Revert "TableGen: Emit statically generated hash table for runtime
libcalls (#150192)"

This reverts commit 769a9058c8d04fc920994f6a5bbb03c8a4fbcd05.

Reverted three changes because of a CMake error while building llvm-nm
as reported in the following PR:
https://github.com/llvm/llvm-project/pull/150192#issuecomment-3192223073
2025-08-15 13:32:27 -07:00
Sterling-Augustine
5b0619e79b
Move function info word into its own data structure (#153627)
The sframe generator needs to construct this word separately from FDEs
themselves, so split them into a separate data structure.
2025-08-15 13:16:34 -07:00
Matt Arsenault
cb1228fbd5
RuntimeLibcalls: Return StringRef for libcall names (#153209)
Does not yet fully propagate this down into the TargetLowering
uses, many of which are relying on null checks on the returned
value.
2025-08-15 09:55:39 +09:00
Steven Wu
7e46f5db21
[Support] Add mapped_file_region::sync(), equivalent to msync (#153632) 2025-08-14 17:05:33 -07:00
Matt Arsenault
769a9058c8
TableGen: Emit statically generated hash table for runtime libcalls (#150192)
a96121089b9c94e08c6632f91f2dffc73c0ffa28 reverted a change
to use a binary search on the string name table because it
was too slow. This replaces it with a static string hash
table based on the known set of libcall names. Microbenchmarking
shows this is similarly fast to using DenseMap. It's possibly
slightly slower than using StringSet, though these aren't an
exact comparison. This also saves on the one time use construction
of the map, so it could be better in practice.

This search isn't simple set check, since it does find the
range of possible matches with the same name. There's also
an additional check for whether the current target supports
the name. The runtime constructed set doesn't require this,
since it only adds the symbols live for the target.

Followed algorithm from this post
http://0x80.pl/notesen/2023-04-30-lookup-in-strings.html

I'm also thinking the 2 special case global symbols should
just be added to RuntimeLibcalls. There are also other global
references emitted in the backend that aren't tracked; we probably
should just use this as a centralized database for all compiler
selected symbols.
2025-08-15 09:02:56 +09:00
Kyungwoo Lee
07d3a73d70 Revert "[CGData] Lazy loading support for stable function map (#151660)"
This reverts commit 76dd742f7b32e4d3acf50fab1dbbd897f215837e.
2025-08-14 16:56:54 -07:00
Zhaoxuan Jiang
76dd742f7b
[CGData] Lazy loading support for stable function map (#151660)
The stable function map could be huge for a large application. Fully
loading it is slow and consumes a significant amount of memory, which is
unnecessary and drastically slows down compilation especially for
non-LTO and distributed-ThinLTO setups. This patch introduces an opt-in
lazy loading support for the stable function map. The detailed changes
are:

- `StableFunctionMap`
- The map now stores entries in an `EntryStorage` struct, which includes
offsets for serialized entries and a `std::once_flag` for thread-safe
lazy loading.
- The underlying map type is changed from `DenseMap` to
`std::unordered_map` for compatibility with `std::once_flag`.
- `contains()`, `size()` and `at()` are implemented to only load
requested entries on demand.

- Lazy Loading Mechanism
- When reading indexed codegen data, if the newly-introduced
`-indexed-codegen-data-lazy-loading` flag is set, the stable function
map is not fully deserialized up front. The binary format for the stable
function map now includes offsets and sizes to support lazy loading.
- The safety of lazy loading is guarded by the once flag per function
hash. This guarantees that even in a multi-threaded environment, the
deserialization for a given function hash will happen exactly once. The
first thread to request it performs the load, and subsequent threads
will wait for it to complete before using the data. For single-threaded
builds, the overhead is negligible (a single check on the once flag).
For multi-threaded scenarios, users can omit the flag to retain the
previous eager-loading behavior.
2025-08-14 13:49:09 -07:00
Florian Hahn
177f27d220
[VPlan] Add incoming_[blocks,values] iterators to VPPhiAccessors (NFC) (#138472)
Add 3 new iterator ranges to VPPhiAccessors

* incoming_values(): returns a range over the incoming
  values of a phi 
* incoming_blocks(): returns a range over the incoming 
  blocks of a phi
* incoming_values_and_blocks: returns a range over pairs of
   incoming values and blocks.

Depends on https://github.com/llvm/llvm-project/pull/124838.

PR: https://github.com/llvm/llvm-project/pull/138472
2025-08-14 16:47:04 +01:00
Jakub Kuderski
1633e0ba8b
[ADT] Add from_range constructor for (Small)DenseMap (#153515)
This follows how we support range construction for (Small)DenseSet.
2025-08-14 08:53:52 -04:00
Lang Hames
3bc3b4cf5f [ORC] Add cloneExternalModuleToContext API.
cloneExternalModuleToContext can be used to clone an LLVM module onto a given
ThreadSafeContext. Callers of this function are responsible for ensuring
exclusive access to the source module and its LLVMContext.
2025-08-14 21:21:17 +10:00
Aiden Grossman
47e62e846b Revert "[APFloat] Fix getExactInverse for DoubleAPFloat"
This reverts commit f4941319cba19d7691baa6ec783c84be4d847637.

This broke llvm/test/CodeGen/Thumb2/mve-vcvt-fixed-to-float.ll which
took out a ton of buildbots and also broke premerge.
2025-08-14 04:39:50 +00:00
Pedro Lobo
08eff57444
[ADT] Add signed and unsigned mulExtended to APInt (#153399)
Adds `mulsExtended` and `muluExtended` methods to `APInt`, as suggested
in #153293.
These are based on the `MULDQ` and `MULUDQ` x86 intrinsics.
2025-08-13 22:37:33 +01:00
David Majnemer
f4941319cb [APFloat] Fix getExactInverse for DoubleAPFloat
Some background: getExactInverse()'s callers expect that
the result is not subnormal.

DoubleAPFloat implemented getExactInverse() by going through
semPPCDoubleDoubleLegacy.

This means that numbers like 0x1p1022 which would have a normal inverse
in semPPCDoubleDouble would not in semPPCDoubleDoubleLegacy.

This commit refactors the logic into a single method on APFloat which
uses getExactLog2Abs() and scalbn() to calculate the inverse without
having to compute a reciprocal and test if it is inexact.
This approach works for both IEEEFloat and DoubleAPFloat.
2025-08-13 11:39:53 -07:00
Steven Wu
7350112c40
[CAS] Disable CAS unittests that requires threads (#153434)
Disable parallel CAS tests when LLVM is configured to not having
threads.
2025-08-13 11:11:56 -07:00
Steven Wu
3ecd331808
[CAS] Fix MSVC warning after #114096 (#153430)
Correct use of `SCOPED_TRACE` and fix MSVC warning.
2025-08-13 09:51:05 -07:00
Orlando Cazalet-Hyams
f316009997
[RemoveDIs][NFC] Remove more dbg.assign intrinsics code paths (#153371) 2025-08-13 16:37:04 +01:00
Nikita Popov
48beed5b71
Revert "[AArch64][SME] Port all SME routines to RuntimeLibcalls" (#153392)
This introduced a 5% compile-time regression on AArch64, see
https://llvm-compile-time-tracker.com/compare.php?from=b9138bde3562de5c28a239dbd303caf2406678c6&to=271688b87abe7cf45aceaff8266270a25eb7b436&stat=instructions:u.

Reverts llvm/llvm-project#152505.
2025-08-13 11:54:39 +00:00
Benjamin Maxwell
271688b87a
[AArch64][SME] Port all SME routines to RuntimeLibcalls (#152505)
This updates everywhere we emit/check an SME routines to use
RuntimeLibcalls to get the function name and calling convention.

Note: RuntimeLibcallEmitter had some issues with emitting non-unique
variable names for sets of libcalls, so I tweaked the output to avoid
the need for variables.
2025-08-13 08:48:59 +01:00
David Majnemer
acef1db3b2 [APFloat] Remove some overly optimistic assertions
An earlier draft of DoubleAPFloat::convertToSignExtendedInteger had
arranged for overflow to be handled in a different way.  However, these
assertions are now possible if Hi+Lo are out of range and Lo != 0.

A test has been added to defend against a regression.
2025-08-12 18:32:58 -07:00
David Majnemer
f6d143fd1f [APFloat] Properly implement frexp(DoubleAPFloat)
The prior implementation did not consider that the Lo component may
underflow when it undergoes scaling.  This means that we need to
carefully handle things like binade crossings or how to handle
roundTowardZero when Hi and Lo have different signs.

Particularly annoying is roundTiesToAway when Hi and Lo have different
signs.  It basically requires us to implement roundTiesTowardZero.
2025-08-12 17:03:27 -07:00
David Majnemer
e722ef4956 Reapply "[APFloat] Properly implement DoubleAPFloat::convertToSignExtendedInteger"
This reverts commit 8b44945a9231d4d7be0858a1c5d9c13d397bc512.

The compilation failure under !NDEBUG has been fixed.
2025-08-12 17:01:49 -07:00
Philip Reames
4d629f9744
[MIR] Remove std::variant from multiple save/restore point handling [nfc] (#153226)
In review of bbde6b, I had originally proposed that we support the
legacy text format. As review evolved, it bacame clear this had been a
bad idea (too much complexity), but in order to let that patch finally
move forward, I approved the change with the variant. This change undoes
the variant, and updates all the tests to just use the array form.
2025-08-12 11:23:05 -07:00
Steven Wu
dda996b875
[CAS] Add LLVMCAS library with InMemoryCAS implementation (#114096)
Add llvm::cas::ObjectStore abstraction and InMemoryCAS as a in-memory
CAS object store implementation.

The ObjectStore models its objects as:
* Content: An array of bytes for the data to be stored.
* Refs: An array of references to other objects in the ObjectStore.
And each CAS Object can be idenfied with an unqine ID/Hash.

ObjectStore supports following general action:
* Expected<ID> store(Content, ArrayRef<Ref>)
* Expected<Ref> get(ID)

It also introduces following types to interact with a CAS ObjectStore:
* CASID: Hash representation for an CAS Objects with its context to help
  print/compare CASIDs.
* ObjectRef: A light-weight ref for an object in the ObjectStore. It is
  implementation defined so it can be optimized for
  read/store/references depending on the implementation.
* ObjectProxy: A proxy for the users of CAS to interact with the data
  inside CAS Object. It bundles a ObjectHandle and an ObjectStore
  instance.
2025-08-12 10:25:43 -07:00
Nikita Popov
9d96d01b42 [IR] Add offset stripping test with mixed const/variable offsets (NFC)
Regression test for:
a7edc95c79 (commitcomment-163691175)
2025-08-12 15:12:16 +02:00
Andreas Jonson
ca7ffaaeeb [ConstantRange] add nuw support to truncate (NFC) (#152990) 2025-08-12 12:26:35 +02:00
Kazu Hirata
f90ded5b94
[ADT] Reduce memory allocation in SmallPtrSet::reserve() (#153126)
Previously, reserve() allocated double the required number of buckets.
For example, for NumEntries in the range [49, 96], it would reserve
256 buckets when only 128 are needed to maintain the load factor.

This patch removes "+ 1" in the NewSize calculation.
2025-08-11 22:51:32 -07:00
Helena Kotas
fb1035cfb4
[DirectX] Fix resource binding analysis incorrectly removing duplicates (#152253)
The resource binding analysis was incorrectly reducing the size of the
`Bindings` vector by one element after sorting and de-duplication. This
led to an inaccurate setting of the `HasOverlappingBinding` flag in the
`DXILResourceBindingInfo` analysis, as the truncated vector no longer
reflected the true binding state.

This update corrects the shrink logic and introduces an `assert` in the
`DXILPostOptimizationValidation` pass. The assertion will trigger if
`HasOverlappingBinding` is set but no corresponding error is detected,
helping catch future inconsistencies.

The bug surfaced when the `srv_metadata.hlsl` and `uav_metadata.hlsl`
tests were updated to include unbounded resource arrays as part of
https://github.com/llvm/llvm-project/issues/145422. These updated test
files are included in this PR, as they would cause the new assertion to
fire if the original issue remained unresolved.

Depends on #152250
2025-08-11 10:53:00 -07:00
Yingwei Zheng
84b31581f8
Revert "[PatternMatch] Add m_[Shift]OrSelf matchers." (#152953)
Reverts llvm/llvm-project#152924
According to
f67668b586,
it is not an NFC change.
2025-08-11 09:35:16 +02:00
Nikita Popov
35bad229c1
[PredicateInfo] Use bitcast instead of ssa.copy (#151174)
PredicateInfo needs some no-op to which the predicate can be attached.
Currently this is an ssa.copy intrinsic. This PR replaces it with a
no-op bitcast.
    
Using a bitcast is more efficient because we don't have the overhead of
an overloaded intrinsic. It also makes things slightly simpler overall.
2025-08-11 09:25:01 +02:00
Yingwei Zheng
1c499351d6
[PatternMatch] Add m_[Shift]OrSelf matchers. (#152924)
Address the comment
https://github.com/llvm/llvm-project/pull/147414/files#r2228612726.
As they are usually used to match integer packing patterns, it is enough
to handle constant shamts.
2025-08-11 09:58:16 +08:00