60265 Commits

Author SHA1 Message Date
David Green
5db67e1c86
[GlobalISel] Add a fadd 0.0 combine with nsz (#153748)
This is surprisingly helpful, coming up a lot from fadd reductions.
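
The nsz requirement can be seen on plain host doubles: under the default IEEE 754 rounding mode, `-0.0 + 0.0` is `+0.0`, so folding `x + 0.0` to `x` would flip the sign of the result when `x` is `-0.0`. A minimal sketch, not GlobalISel code; `addPositiveZero` and `foldChangesSign` are hypothetical helper names:

```cpp
#include <cmath>

// Hypothetical helper: computes X + 0.0 through a volatile so the host
// compiler cannot fold the addition away.
double addPositiveZero(double X) {
  volatile double Zero = 0.0;
  return X + Zero;
}

// True when replacing `X + 0.0` with `X` would change the sign bit.
bool foldChangesSign(double X) {
  return std::signbit(addPositiveZero(X)) != std::signbit(X);
}
```

Only the `-0.0` input is affected, which is the case the nsz flag allows the combine to ignore.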
2025-08-21 10:19:39 +01:00
Benjamin Maxwell
810ea69edd
[LiveRegUnits] Exclude runtime defined liveins when computing liveouts (#154325)
These liveins are not defined by predecessors, so should not be 
considered as liveouts in predecessor blocks. This resolves:

- https://github.com/llvm/llvm-project/pull/149062#discussion_r2285072001
- https://github.com/llvm/llvm-project/pull/153417#issuecomment-3199972351
2025-08-21 09:06:32 +01:00
Dominik Adamski
b69fd34e76
[Offload] Add oneInterationPerThread param to loop device RTL (#151959)
Currently, Flang can generate no-loop kernels for all OpenMP target
kernels in the program if the flags
-fopenmp-assume-teams-oversubscription or
-fopenmp-assume-threads-oversubscription are set.
If we add an additional parameter, we can choose
in the future which OpenMP kernels should be generated as no-loop
kernels.

This PR doesn't modify current behavior of oversubscription flags.

RFC for no-loop kernels:
https://discourse.llvm.org/t/rfc-no-loop-mode-for-openmp-gpu-kernels/87517
2025-08-21 09:03:56 +02:00
Steven Wu
deab049b5c
[CAS] Add ActionCache to LLVMCAS Library (#114097)
ActionCache is used to store a mapping from CASID to CASID. The current
implementation of the ActionCache can only be used to associate the
key/value from the same hash context.

ActionCache has two operations: `put` to store the key/value and `get`
to look up the key/value mapping. ActionCache uses the same
TrieRawHashMap data structure to store the mapping, where the CASID of
the key is the hash used to index the map.

While CASIDs for key/value are often associated with an actual CAS
ObjectStore, ActionCache does not guarantee that such an object exists
in any ObjectStore.
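
The `put`/`get` shape described above can be sketched with standard containers. This is a toy model only: `ToyID` and `ToyActionCache` are hypothetical names, a `std::map` stands in for TrieRawHashMap, and the policy of rejecting a conflicting `put` is an assumption of this sketch rather than documented behavior.

```cpp
#include <map>
#include <optional>
#include <string>

// Toy stand-in for a CASID: just the hash bytes held in a string.
using ToyID = std::string;

class ToyActionCache {
  std::map<ToyID, ToyID> Map; // indexed by the key's hash

public:
  // Associates Key with Value. Returns false if Key is already bound to a
  // different value (a policy chosen for this sketch only).
  bool put(const ToyID &Key, const ToyID &Value) {
    auto [It, Inserted] = Map.try_emplace(Key, Value);
    return Inserted || It->second == Value;
  }

  // Looks up Key; an empty optional means a cache miss.
  std::optional<ToyID> get(const ToyID &Key) const {
    auto It = Map.find(Key);
    if (It == Map.end())
      return std::nullopt;
    return It->second;
  }
};
```

A hit returns only an ID; as in the description above, nothing here checks that the ID names an object that exists in any store.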
2025-08-20 14:42:44 -07:00
David Majnemer
0a7eabcc56 Reapply "[APFloat] Fix getExactInverse for DoubleAPFloat"
The previous implementation of getExactInverse used the following check
to identify powers of two:

  // Check that the number is a power of two by making sure that only the
  // integer bit is set in the significand.
  if (significandLSB() != semantics->precision - 1)
    return false;

This condition verifies that the only set bit in the significand is the
integer bit, which is correct for normal numbers. However, this logic is
not correct for subnormal values.

APFloat represents subnormal numbers by shifting the significand right
while holding the exponent at its minimum value. For a power of two in
the subnormal range, its single set bit will therefore be at a position
lower than precision - 1. The original check would consequently fail,
causing the function to determine that these numbers do not have an
exact multiplicative inverse.
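
The arithmetic fact behind this paragraph can be checked on host doubles. The sketch below does not call APFloat and says nothing about what getExactInverse returns; it assumes IEEE 754 doubles with subnormals enabled, and the helper names are hypothetical.

```cpp
#include <cmath>

// True if X is a finite, nonzero power of two (normal or subnormal).
// std::frexp normalizes subnormals, so the mantissa is 0.5 in both cases.
bool isPowerOfTwo(double X) {
  if (!std::isfinite(X) || X == 0.0)
    return false;
  int Exp = 0;
  return std::fabs(std::frexp(X, &Exp)) == 0.5;
}

// True if X is a power of two whose reciprocal is finite and exact.
bool hasFiniteExactInverse(double X) {
  if (!isPowerOfTwo(X))
    return false;
  volatile double Inv = 1.0 / X; // volatile: keep the division at run time
  return std::isfinite(Inv) && Inv * X == 1.0;
}
```

For example, `std::ldexp(1.0, -1023)` is subnormal yet has the exact inverse `std::ldexp(1.0, 1023)`, while the reciprocal of the smallest subnormal overflows to infinity.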

The new logic calculated this correctly but it seems that
test/CodeGen/Thumb2/mve-vcvt-fixed-to-float.ll expected the old
behavior.

Seeing as how getExactInverse does not have tests or documentation, we
conservatively maintain (and document) this behavior.

This reverts commit 47e62e846beb267aad50eb9195dfd855e160483e.
2025-08-20 14:02:36 -07:00
Philip Reames
e6b4a21849
[IR] Add utilities for manipulating length of MemIntrinsic [nfc] (#153856)
Goal is simply to reduce direct usage of getLength and setLength so that
if we end up moving memset.pattern (whose length is in elements) there
are fewer places to audit.
2025-08-20 13:50:11 -07:00
Florian Hahn
4e6c88be7c
[TTI] Remove Args argument from getOperandsScalarizationOverhead (NFC). (#154126)
Remove the ArrayRef<const Value*> Args operand from
getOperandsScalarizationOverhead and require that the callers
de-duplicate arguments and filter constant operands.

Removing the Value * based Args argument enables callers where no Value
* operands are available to use the function in a follow-up: computing
the scalarization cost directly for a VPlan recipe.

It also allows more accurate cost-estimates in the future: for example,
when vectorizing a loop, we could also skip operands that are live-ins,
as those also do not require scalarization.

PR: https://github.com/llvm/llvm-project/pull/154126
2025-08-20 21:09:08 +01:00
Finn Plummer
15babbaf5d
[DirectX] Add boilerplate integration of objcopy for DXContainerObjectFile (#153079)
This PR implements the boilerplate required to use `llvm-objcopy` for
`DXContainer` object files.

It defines a minimal structure `object` to represent the `DXContainer`
header and the following parts.
This structure is a simple representation of the object data to allow
for simple modifications at the granularity of each part. It is modeled
on how the respective `object`s are defined for `ELF`,
`wasm`, `XCOFF`, etc.

This is the first step to implement
https://github.com/llvm/llvm-project/issues/150275 and
https://github.com/llvm/llvm-project/issues/150277 as compiler actions
that invoke `llvm-objcopy` for functionality.
2025-08-20 10:58:42 -07:00
Krzysztof Parzyszek
15cb06109d
[Frontend][OpenMP] Allow multiple occurrences of DYN_GROUPPRIVATE (#154549)
It was mistakenly placed in "allowOnceClauses" on the constructs that
allow it.
2025-08-20 10:56:30 -05:00
Steven Wu
2cfba9678d
[FileSystem] Allow exclusive file lock (#114098)
Add a parameter to the file lock API to allow an exclusive file lock.
Both Unix and Windows support locking a file exclusively for writing by
one process, and LLVM OnDiskCAS uses an exclusive file lock to
coordinate CAS creation.
2025-08-20 08:32:18 -07:00
Mehdi Amini
8b2028ced6
Update log_level for LLVM_DEBUG and associated macros (#154525)
During the review of #150855 we switched from 0 to 1 for the default log
level used, but this macro wasn't updated.
2025-08-20 13:31:13 +00:00
Nikita Popov
99119a5a81 [OpenMPIRBuilder] Add missing LLVM_ABI annotations 2025-08-20 15:19:08 +02:00
Nikita Popov
822496db7f [SampleContextTracker] Add missing LLVM_ABI annotations 2025-08-20 15:19:08 +02:00
Nikita Popov
7eb5031e2c [GlobalDCE] Add missing LLVM_ABI annotation 2025-08-20 15:19:08 +02:00
Nikita Popov
6a99ad2975 [Debug] Add missing LLVM_ABI annotations 2025-08-20 15:19:08 +02:00
Zhaoxuan Jiang
2738828c0e
[Reland] [CGData] Lazy loading support for stable function map (#154491)
This is an attempt to reland #151660 by including a missing STL header
found by a buildbot failure.

The stable function map could be huge for a large application. Fully
loading it is slow and consumes a significant amount of memory, which is
unnecessary and drastically slows down compilation especially for
non-LTO and distributed-ThinLTO setups. This patch introduces an opt-in
lazy loading support for the stable function map. The detailed changes
are:

- `StableFunctionMap`
- The map now stores entries in an `EntryStorage` struct, which includes
offsets for serialized entries and a `std::once_flag` for thread-safe
lazy loading.
- The underlying map type is changed from `DenseMap` to
`std::unordered_map` for compatibility with `std::once_flag`.
- `contains()`, `size()` and `at()` are implemented to only load
requested entries on demand.

- Lazy Loading Mechanism
- When reading indexed codegen data, if the newly-introduced
`-indexed-codegen-data-lazy-loading` flag is set, the stable function
map is not fully deserialized up front. The binary format for the stable
function map now includes offsets and sizes to support lazy loading.
- The safety of lazy loading is guarded by the once flag per function
hash. This guarantees that even in a multi-threaded environment, the
deserialization for a given function hash will happen exactly once. The
first thread to request it performs the load, and subsequent threads
will wait for it to complete before using the data. For single-threaded
builds, the overhead is negligible (a single check on the once flag).
For multi-threaded scenarios, users can omit the flag to retain the
previous eager-loading behavior.
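
The once-flag-per-hash scheme described above can be sketched with standard library types. All names here (`ToyEntryStorage`, `ToyLazyMap`, `Decode`) are hypothetical and the payload is a plain string; this is not the CGData implementation. Note that `std::once_flag` is neither copyable nor movable, which fits a node-based `std::unordered_map`, whose elements stay in place once constructed.

```cpp
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical per-hash slot: a once_flag plus the lazily decoded payload.
struct ToyEntryStorage {
  std::once_flag Loaded;
  std::string Payload;
};

class ToyLazyMap {
  std::unordered_map<uint64_t, ToyEntryStorage> Entries;
  std::function<std::string(uint64_t)> Decode; // stands in for deserialization

public:
  explicit ToyLazyMap(std::function<std::string(uint64_t)> D)
      : Decode(std::move(D)) {}

  // Registers a hash up front without decoding its payload.
  void addHash(uint64_t Hash) { Entries.try_emplace(Hash); }

  bool contains(uint64_t Hash) const { return Entries.count(Hash) != 0; }

  // Decodes the payload on first access; concurrent callers for the same
  // hash wait inside call_once until the first one finishes.
  const std::string &at(uint64_t Hash) {
    ToyEntryStorage &E = Entries.at(Hash);
    std::call_once(E.Loaded, [&] { E.Payload = Decode(Hash); });
    return E.Payload;
  }
};
```

Repeated calls to `at()` for the same hash run `Decode` once; later calls pay only for the check on the flag.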
2025-08-20 06:15:04 -07:00
jyli0116
9df7ca1f0f
[GlobalISel] Legalize Saturated Truncate instructions and intrinsics (#154340)
Adds legalization support for `G_TRUNC_SSAT_S`, `G_TRUNC_SSAT_U`,
`G_TRUNC_USAT_U` instructions for GlobalISel.
2025-08-20 10:37:22 +01:00
Gang Chen
ef68d1587d
[AMDGPU] upstream barrier count reporting part1 (#154409) 2025-08-19 16:42:31 -07:00
Krzysztof Parzyszek
292faf6133
[Frontend][OpenMP] Add definition of groupprivate directive (#153799)
This is the common point for clang and flang implementations.
2025-08-19 08:27:29 -05:00
Orlando Cazalet-Hyams
da45b6c71d
[RemoveDIs][NFC] Remove dbg intrinsic version of calculateFragmentIntersect (#153378) 2025-08-19 13:44:25 +01:00
David Green
a7df02f83c
[InstCombine] Make strlen optimization more resilient to different gep types. (#153623)
This makes the optimization in optimizeStringLength for strlen(gep
@glob, %x) -> sub endof@glob, %x a little more resilient, and maybe a
bit more correct for geps with non-array types.
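
The identity behind the fold can be checked in ordinary C++: for a NUL-terminated constant with no embedded NULs and an in-range offset, the length from the offset equals the terminator index minus the offset. A minimal sketch; `Glob`, `lenViaStrlen` and `lenViaSub` are illustrative names, not code from the patch:

```cpp
#include <cstddef>
#include <cstring>

static const char Glob[] = "hello world"; // 11 characters plus the NUL

// What the source program computes: strlen of a pointer into the constant.
std::size_t lenViaStrlen(std::size_t X) { return std::strlen(Glob + X); }

// What the fold computes: (index of the terminator) - X.
std::size_t lenViaSub(std::size_t X) { return (sizeof(Glob) - 1) - X; }
```

The two functions agree for every `X` from 0 to 11 inclusive.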
2025-08-19 10:37:17 +01:00
Aditi Medhane
948abf1bf5
[PowerPC] Add BCDCOPYSIGN and BCDSETSIGN Instruction Support (#144874)
Support the following BCD format conversion builtins for PowerPC.

- `__builtin_bcdcopysign` – Conversion that returns the decimal value of
the first parameter combined with the sign code of the second parameter.
- `__builtin_bcdsetsign` – Conversion that sets the sign code of the
input parameter in packed decimal format.

> Note: This built-in function is valid only when all following
conditions are met:
> -qarch is set to utilize POWER9 technology.
> The bcd.h file is included.

## Prototypes

```c
vector unsigned char __builtin_bcdcopysign(vector unsigned char, vector unsigned char);
vector unsigned char __builtin_bcdsetsign(vector unsigned char, unsigned char);
```

## Usage Details

`__builtin_bcdsetsign`: Returns the packed decimal value of the first
parameter combined with the sign code.
The sign code is set according to the following rules:
- If the packed decimal value of the first parameter is positive, the
following rules apply:
     - If the second parameter is 0, the sign code is set to 0xC.
     - If the second parameter is 1, the sign code is set to 0xF.
- If the packed decimal value of the first parameter is negative, the
sign code is set to 0xD.
> notes:
>     The second parameter can only be 0 or 1.
> You can determine whether a packed decimal value is positive or
negative as follows:
> - Packed decimal values with sign codes **0xA, 0xC, 0xE, or 0xF** are
interpreted as positive.
> - Packed decimal values with sign codes **0xB or 0xD** are interpreted
as negative.
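
The sign-code rules above can be modeled in scalar C++. This sketch covers only the sign-code selection, not the vector builtin itself; the function names are hypothetical, and any code other than 0xB or 0xD is treated as positive for simplicity.

```cpp
#include <cstdint>

// Sign codes interpreted as negative per the rules above: 0xB and 0xD.
bool isNegativeSignCode(uint8_t Code) { return Code == 0xB || Code == 0xD; }

// Models the sign code chosen by __builtin_bcdsetsign. PreferredSign must
// be 0 or 1, matching the builtin's second parameter.
uint8_t resultSignCode(uint8_t InputSignCode, unsigned PreferredSign) {
  if (isNegativeSignCode(InputSignCode))
    return 0xD; // negative inputs always get 0xD
  return PreferredSign == 0 ? 0xC : 0xF;
}
```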

---------

Co-authored-by: Aditi-Medhane <aditi.medhane@ibm.com>
2025-08-19 14:47:27 +05:30
David Sherwood
13d8ba7dea
[LV][TTI] Calculate cost of extracting last index in a scalable vector (#144086)
There are a couple of places in the loop vectoriser where we
want to calculate the cost of extracting the last lane in a
vector. However, we wrongly assume that asking for the cost
of extracting lane (VF.getKnownMinValue() - 1) is an accurate
representation of the cost of extracting the last lane. For
SVE at least, this is non-trivial as it requires the use of
whilelo and lastb instructions.

To solve this problem I have added a new
getReverseVectorInstrCost interface where the index is used
in reverse from the end of the vector. Suppose a vector has
a given ElementCount EC, the extracted/inserted lane would be
EC - 1 - Index. For scalable vectors this index is unknown at
compile time. I've added an AArch64 hook that better represents
the cost, and also a RISCV hook that maintains compatibility
with the behaviour prior to this PR.
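
The index arithmetic can be sketched as follows. These helpers are hypothetical and work on plain integers rather than ElementCount; they only illustrate why lane `VF.getKnownMinValue() - 1` is the last lane solely when the runtime multiplier (called `VScale` here) is 1.

```cpp
// Maps a reverse index (0 = last lane) to the forward lane number for a
// vector with NumElts lanes. Only computable when NumElts is known.
constexpr unsigned forwardLane(unsigned NumElts, unsigned ReverseIndex) {
  return NumElts - 1 - ReverseIndex;
}

// For a scalable vector the runtime lane count is MinElts * VScale, so the
// lane MinElts - 1 coincides with the last lane only when VScale == 1.
constexpr bool minLaneIsLast(unsigned MinElts, unsigned VScale) {
  return forwardLane(MinElts * VScale, 0) == MinElts - 1;
}
```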

I've also taken the liberty of adding support in vplan for
calculating the cost of VPInstruction::ExtractLastElement.
2025-08-19 09:31:37 +01:00
Matt Arsenault
19ebfa6d0b
RuntimeLibcalls: Move exception call config to tablegen (#151948)
Also starts pruning out these calls if the exception model is
forced to none.

I worked backwards from the logic in addPassesToHandleExceptions
and the pass content. There appears to be some tolerance
for mixing and matching exception modes inside of a single module.
As far as I can tell _Unwind_CallPersonality is only relevant for
wasm, so just add it there.

As usual, the arm64ec case makes things difficult and is
missing test coverage. The set of calls in list form is necessary
to use foreach for the duplication, but in every other context a
dag is more convenient. You cannot use foreach over a dag, and I
haven't found a way to flatten a dag into a list.

This removes the last manual setLibcallImpl call in generic code.
2025-08-19 10:35:59 +09:00
Matt Arsenault
fe67267d19
MSP430: Move __mspabi_mpyll calling conv config to tablegen (#153988)
There are several libcall choices for MUL_I64 which depend on the
subtarget, but this is the base case. The manual custom ISelLowering
is still overriding the decision until we have a way to control
lowering choices, but we can still get the calling convention
set for now.
2025-08-19 10:25:10 +09:00
Mehdi Amini
89abccc9a6
[MLIR] Update GreedyRewriter to use the LDBG() debug log mechanism (NFC) (#153961)
Also slightly improve the LDBG() implementation.
2025-08-18 21:05:34 +00:00
Krzysztof Parzyszek
8429f7faaa
[flang][OpenMP] Parsing support for DYN_GROUPPRIVATE (#153615)
This does not perform semantic checks or lowering.
2025-08-18 13:35:02 -05:00
Tobias Stadler
8135b7c1ab
[LV] Emit all remarks for unvectorizable instructions (#153833)
If ExtraAnalysis is requested, emit all remarks caused by unvectorizable instructions - instead of only the first.
This is in line with how other places handle DoExtraAnalysis and it can be quite helpful to get info about all instructions in a loop that prevent vectorization.
2025-08-18 18:04:53 +01:00
Damyan Pepper
cc49f3b3e1
[NFC][HLSL] Remove confusing enum aliases / duplicates (#153909)
Remove:

* DescriptorType enum - this almost exactly shadowed the ResourceClass
enum
* ClauseType aliased ResourceClass

Although these were introduced to make the HLSL root signature handling
code a bit cleaner, they were ultimately causing confusion as they
appeared to be unique enums that needed to be converted between each
other.

Closes #153890
2025-08-18 08:58:33 -07:00
Alex MacLean
d12f58ff11
[NVVM] Add various intrinsic attrs, cleanup and consolidate td (#153436)
- llvm.nvvm.reflect - Use a PureIntrinsic (adding speculatable); this
will be replaced by a constant prior to lowering, so speculation is
fine.
- llvm.nvvm.tex.* - Add [IntrNoCallback, IntrNoFree, IntrWillReturn]
- llvm.nvvm.suld.* - Add [IntrNoCallback, IntrNoFree] and
[IntrWillReturn] when not using "clamp" mode
- llvm.nvvm.sust.* - Add [IntrNoCallback, IntrNoFree, IntrWriteMem] and
[IntrWillReturn] when not using "clamp" mode
- llvm.nvvm.[suq|txq|istypep].* - Use DefaultAttrsIntrinsic
- llvm.nvvm.read.ptx.sreg.* - Add [IntrNoFree, IntrWillReturn] to
non-constant reads as well.
2025-08-18 08:33:23 -07:00
Kazu Hirata
07eb7b7692
[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068)
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>.  Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:

  template <typename PointeeType, unsigned N>
  class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
  {};

We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
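
The "redirection" is an ordinary partial specialization. The sketch below reproduces the pattern with standard containers; `ToySet` and `ToyPtrSet` are stand-ins invented for illustration, not the LLVM classes.

```cpp
#include <set>
#include <type_traits>
#include <unordered_set>

// Stand-in for the pointer-optimized set.
template <typename T> class ToyPtrSet : public std::unordered_set<T> {};

// Primary template: the generic set.
template <typename T, unsigned N> class ToySet : public std::set<T> {};

// Partial specialization: pointer element types derive from ToyPtrSet.
template <typename PointeeType, unsigned N>
class ToySet<PointeeType *, N> : public ToyPtrSet<PointeeType *> {};

// The pointer case is a ToyPtrSet underneath; the non-pointer case is not.
static_assert(std::is_base_of<ToyPtrSet<int *>, ToySet<int *, 4>>::value, "");
static_assert(!std::is_base_of<ToyPtrSet<int>, ToySet<int, 4>>::value, "");
```

Because the specialization only forwards to another class, spelling the pointer-set type directly at the use site says the same thing with one less step for the reader.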
2025-08-18 07:01:29 -07:00
Benjamin Maxwell
81c06d198e
Reland "[AArch64][SME] Port all SME routines to RuntimeLibcalls" (#153417)
This updates everywhere we emit/check an SME routine to use
RuntimeLibcalls to get the function name and calling convention.
2025-08-18 14:53:40 +01:00
jofrn
e8e3e6e893
[LiveVariables] Mark use without implicit if defined at instr (#119446)
LiveVariables will mark instructions with their implicit subregister
uses. However, it will also mark the subregister as an implicit use if its
own definition is a subregister of it, i.e. `$r3 = OP val, implicit-def
$r0_r1_r2_r3, ..., implicit $r2_r3`, even if it is otherwise unused,
which defines $r3 on the same line it is used.

This change ensures such uses are marked without implicit, i.e. `$r3 =
OP val, implicit-def $r0_r1_r2_r3, ..., $r2_r3`.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-08-18 08:34:59 -04:00
Nikita Popov
238c3dcd0d
[CodeGen][Mips] Remove fp128 libcall list (#153798)
Mips requires fp128 args/returns to be passed differently than i128. It
handles this by inspecting the pre-legalization type. However, for soft
float libcalls, the original type is currently not provided (it will
look like an i128 call). To work around that, MIPS maintains a list of
libcalls working on fp128.

This patch removes that list by providing the original, pre-softening
type to calling convention lowering. This is done by carrying additional
information in CallLoweringInfo, as we unfortunately do need both types
(we want the un-softened type for OrigTy, but we need the softened type
for the actual register assignment etc.)

This is in preparation for completely removing all the custom
pre-analysis code in the Mips backend and replacing it with use of
OrigTy.
2025-08-18 09:22:41 +02:00
Kazu Hirata
b6a62a496f
[ADT] Use range-based for loops in SetVector (NFC) (#154058) 2025-08-17 23:46:43 -07:00
Jim Lin
9c02d66255
[LegalizeTypes][VP] Teach isVPBinaryOp to recognize vp.sadd/saddu/ssub/ssubu.sat (#154047)
Those vp intrinsics also are vp binary operations. Similar to
https://reviews.llvm.org/D135753.
2025-08-18 13:10:00 +08:00
Nadharm
83a1b40b16
[NFC] Fix unary minus operator on unsigned type warning (#153887)
Fixes: `warning C4146: unary minus operator applied to unsigned type,
result still unsigned`
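
One common way to keep the same value without applying unary minus to an unsigned operand is to subtract from zero or to use the two's-complement identity. Whether the patch uses either form is not stated here, so treat this as a generic sketch with hypothetical function names:

```cpp
#include <cstdint>

// Unsigned arithmetic wraps modulo 2^32, so 0u - X yields the same bit
// pattern as -X without triggering the warning.
uint32_t negateUnsigned(uint32_t X) { return 0u - X; }

// Equivalent form using bitwise complement plus one.
uint32_t negateViaComplement(uint32_t X) { return ~X + 1u; }
```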
2025-08-17 20:08:44 -07:00
Carl Ritson
97d5d483ec
[MsgPack] Add code for floating point assignment and writes (#153544)
Allow assignment of float to DocType and support output of float in
writeToBlob method.
Expand test coverage to various missing basic I/O operations.

Co-authored-by: Xavi Zhang <Xavi.Zhang@amd.com>
2025-08-18 10:03:40 +09:00
Fangrui Song
34c7b7ccae MCSymbol: Remove setUndefined
The name is misleading, as setting Fragment to nullptr does not
necessarily make it undefined - common and equated symbols have
a nullptr fragment as well.
2025-08-17 15:57:27 -07:00
Fangrui Song
2cedb286b8 MCSymbol: Remove unused IsTarget parameter from declareCommon 2025-08-16 15:47:39 -07:00
Fangrui Song
aa96e20dce MCSymbol: Remove AMDGPU-specific Kind::TargetCommon
The SymContentsTargetCommon kind introduced by
https://reviews.llvm.org/D61493 lacks significance and should be treated
as a regular common symbol with a different section index.

Update ELFObjectWriter to respect the specified section index.
The new representation also works with Hexagon's SHN_HEXAGON_SCOMMON.
2025-08-16 15:39:33 -07:00
Fangrui Song
190778a8ba MCSymbol: Rename SymContents to kind
The names "SymbolContents" and "SymContents*" members are confusing.
Rename to kind and Kind::XXX similar to lld/ELF/Symbols.h

Rename SymContentsVariable to Kind::Equated as the formal term is
"equated symbol", not "variable".
2025-08-16 15:10:35 -07:00
Kazu Hirata
1c8da29f48
[ADT] Use small_buckets() in SmallPtrSetImpl::remove_if (NFC) (#153962) 2025-08-16 13:15:36 -07:00
Fangrui Song
1893caa9bc MCSymbol: Decrease the bitfield size of SymbolContents
Follow-up to 57b0843f68f5f349c73d1bf54e321a1a6d1800bf

The size of MCSymbol has been reduced to 24 bytes on 64-bit systems.
2025-08-16 10:43:05 -07:00
Mircea Trofin
c971c25544
[licm] don't drop MD_prof when dropping other metadata (#152420)
Part of Issue #147390
2025-08-16 07:26:13 -07:00
Kazu Hirata
0ede7ace0d
[ADT] Use llvm::copy in SmallPtrSet.cpp (NFC) (#153930)
This patch uses llvm::copy in combination with buckets() and
small_buckets().
2025-08-16 06:47:18 -07:00
Mingjie Xu
a293573c4e
[SSAUpdater] Only iterate blocks modified by CheckIfPHIMatches() in RecordMatchingPHIs() (#153596)
In https://github.com/llvm/llvm-project/pull/100281, we use
`TaggedBlocks` to record blocks modified by `CheckIfPHIMatches()`, so we
do not need to clear every block in `BlockList` if `CheckIfPHIMatches()`
fails to match.

If `CheckIfPHIMatches()` succeeds, we can reuse `TaggedBlocks` to
record matching PHIs only for modified blocks, and avoid checking every
block in `BlockList` to see if `PHITag` is set.
2025-08-16 19:59:10 +08:00
Kazu Hirata
627f8018fe
[ADT] Rename NumNonEmpty to NumEntries in SmallPtrSet (NFC) (#153757)
Without this patch, we use NumNonEmpty, which keeps track of the
number of valid entries plus tombstones even though we have a separate
variable to keep track of the number of tombstones.

This patch simplifies the metadata.  Specifically, it changes the name
and semantics of the variable to NumEntries to keep track of the
number of valid entries.

The difference in semantics requires some code changes aside from
mechanical replacements:

- size() just returns NumEntries.

- erase_imp() and remove_if() need to decrement NumEntries in the
  large mode.

- insert_imp_big() increments NumEntries for successful insertions,
  regardless of whether a tombstone is being replaced with a valid
  entry.  It also computes the number of non-tombstone empty slots as:

  CurArraySize - NumEntries - NumTombstones

- Grow() no longer needs NumNonEmpty -= NumTombstones.

Overall, the resulting code should look more intuitive and more
consistent with DenseMapSet.
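
The bookkeeping described above can be modeled with three counters. `ToyCounters` is a hypothetical struct, not the SmallPtrSet implementation; the NumEntries updates follow the description, while the tombstone adjustments are assumptions of this sketch.

```cpp
struct ToyCounters {
  unsigned CurArraySize;
  unsigned NumEntries;    // valid entries only
  unsigned NumTombstones; // erased slots not yet reclaimed

  unsigned size() const { return NumEntries; }

  // Non-tombstone empty slots, as given in the description.
  unsigned numEmpty() const {
    return CurArraySize - NumEntries - NumTombstones;
  }

  // A successful insertion always increments NumEntries.
  void insertIntoEmpty() { ++NumEntries; }
  void insertOverTombstone() { ++NumEntries; --NumTombstones; }

  // Erasing in large mode decrements NumEntries and leaves a tombstone.
  void erase() { --NumEntries; ++NumTombstones; }
};
```

With 8 slots, two insertions followed by one erase leave `size()` at 1 and `numEmpty()` at 6; reinserting over the tombstone raises `size()` to 2 while `numEmpty()` stays at 6.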
2025-08-15 21:22:37 -07:00
joaosaffran
37729d8ceb
[HLSL] Refactoring DXILABI.h to not depend on scope printer (#153840)
This patch refactors DXILABI to remove the dependency on scope printer. 
Closes: #153827

---------

Co-authored-by: Joao Saffran <{ID}+{username}@users.noreply.github.com>
2025-08-15 21:33:44 -04:00
Matt Arsenault
3e5d8a1439 Reapply "RuntimeLibcalls: Generate table of libcall name lengths (#153… (#153864)
This reverts commit 334e9bf2dd01fbbfe785624c0de477b725cde6f2.

Check if llvm-nm exists before building the benchmark.
2025-08-16 09:53:50 +09:00