981 Commits

Author SHA1 Message Date
Sam Elliott
0d08cb0e70
[outliners] Turn nooutline into an Enum Attribute (#163665)
This change turns the `"nooutline"` attribute into an enum attribute
called `nooutline`, and adds an auto-upgrader for bitcode to make the
same change to existing IR.

This IR attribute disables both the Machine Outliner (enabled at Oz for
some targets), and the IR Outliner (disabled by default).
2026-02-10 21:44:17 -08:00
Matt Arsenault
2502e3b7ba
IR: Promote "denormal-fp-math" to a first class attribute (#174293)
Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first
class denormal_fpenv attribute. Previously the query for the effective
denormal mode involved two string attribute queries with parsing. I'm
introducing more uses of this, so it makes sense to convert this
to a more efficient encoding. The old representation was also awkward
since it was split across two separate attributes. The new encoding
just stores the default and float modes as bitfields, largely avoiding
the need to consider if the other mode is set.

The syntax in the common cases looks like this:
  `denormal_fpenv(preservesign,preservesign)`
  `denormal_fpenv(float: preservesign,preservesign)`
  `denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)`

I wasn't sure about reusing the float type name instead of adding a
new keyword. It's parsed as a type but only accepts float. I'm also
debating switching the name to subnormal to match the current
preferred IEEE terminology (also used by nofpclass and other
contexts).

This has a behavior change when using the command flag debug
options to set the denormal mode. The behavior of the flag
ignored functions with an explicit attribute set, per
the default and f32 version. Now that these are one attribute,
the flag logic can't distinguish which of the two components
were explicitly set on the function. Only one test appeared to
rely on this behavior, so I just avoided using the flags in it.

This also does not perform all the code cleanups this enables.
In particular the attributor handling could be cleaned up.

I also guessed at how to support this in MLIR. I followed
MemoryEffects as a reference; it appears bitfields are expanded
into arguments to attributes, so the representation there is
a bit uglier with the 2 2-element fields flattened into 4 arguments.
2026-02-05 13:31:26 +00:00
Vladislav Dzhidzhoev
b9cecee3fb
Reland "[DebugMetadata][DwarfDebug] Support function-local types in lexical block scopes (4/7)" (#165032)
This is an attempt to merge https://reviews.llvm.org/D144006 with LTO
fix.

The last merge attempt was
https://github.com/llvm/llvm-project/pull/75385.
The issue with it was investigated in
https://github.com/llvm/llvm-project/pull/75385#issuecomment-2386684121.
The problem happens when 
1. Several modules are being linked.
2. There are several DISubprograms that initially belong to different
modules but represent the same source code function (for example, a
function included from the same source code file).
3. Some of such DISubprograms survive IR linking. It may happen if one
of them is inlined somewhere or if the functions that have these
DISubprograms attached have internal linkage.
4. Each of these DISubprograms has a local type that corresponds to the
same source code type. These types are initially from different modules,
but have the same ODR identifier.

If the same (in the sense of ODR identifier/ODR uniquing rules) local
type is present in two modules, and these modules are linked together,
the type gets uniqued. A DIType, that happens to be loaded first,
survives linking, and the references on other types with the same ODR
identifier from the modules loaded later are replaced with the
references on the DIType loaded first. Since defintion subprograms, in
scope of which these types are located, are not deduplicated, the linker
output may contain multiple DISubprogram's having the same (uniqued)
type in their retainedNodes lists.
Further compilation of such modules causes crashes.

To tackle that,
* previous solution to handle LTO linking with local types in
retainedNodes is removed (cloneLocalTypes() function),
* for each loaded distinct (definition) DISubprogram, its retainedNodes
list is scanned after loading, and DITypes with a scope of another
subprogram are removed. If something from a Function corresponding to
the DISubprogram references uniqued type, we rely on cross-CU links.

Additionally:
* a check is added to Verifier to report about local types located in a
wrong retainedNodes list,

Original commit message follows.
---------

RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544

Similar to imported declarations, the patch tracks function-local types in
DISubprogram's 'retainedNodes' field. DwarfDebug is adjusted in accordance with
the aforementioned metadata change and provided a support of function-local
types scoped within a lexical block.

The patch assumes that DICompileUnit's 'enums field' no longer tracks local
types and DwarfDebug would assert if any locally-scoped types get placed there.

Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Co-authored-by: Jeremy Morse <jeremy.morse@sony.com>
2026-02-04 00:34:52 +01:00
Teresa Johnson
b30971c4bb
[ThinLTO] Remove unused relative block frequency support (#177215)
This removes most of the handling of the relative block frequency
support added in 2018 in c73cec84c99e5a63dca961fef67998a677c53a3c, which
was disabled by default and never utilized in the thin link as expected.

Support for reading old Bitcode containing the record is maintained as
required for backwards compatibility requirements, as is the support for
parsing old LLVM assembly containing that information. Tests ensure that
this backwards compatibility is maintained.

This came up in the context of redundant BFI/DT computations which
existed largely for the purpose of computing this information
and are being addressed in PR176646.
2026-01-21 11:39:57 -08:00
Aiden Grossman
e2d7cd685d
[IR] Make dead_on_return attribute optionally sized
This patch makes the dead_on_return parameter attribute optionally require a number
of bytes to be passed in to specify the number of bytes known to be dead
upon function return/unwind. This is aimed at enabling annotating the
this pointer in C++ destructors with dead_on_return in clang. We need
this to handle cases like the following:

```
struct X {
  int n;
  ~X() {
    this[n].n = 0;
  }
};
void f() {
  X xs[] = {42, -1};
}
```

Where we only certain that sizeof(X) bytes are dead upon return of ~X.
Otherwise DSE would be able to eliminate the store in ~X which would not
be correct.

This patch only does the wiring within IR. Future patches will make
clang emit correct sizing information and update DSE to only delete
stores to objects marked dead_on_return that are provably in bounds of
the number of bytes specified to be dead_on_return.

Reviewers: nikic, alinas, antoniofrighetto

Pull Request: https://github.com/llvm/llvm-project/pull/171712
2026-01-21 08:22:05 -08:00
Luke Lau
cee36b23cc
[IR] Allow non-constant offsets in @llvm.vector.splice.{left,right} (#174693)
Following on from #170796, this PR implements the second part of
https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974
by allowing non-constant offsets in the vector splice intrinsics.

Previously @llvm.vector.splice had a restriction enforced by the
verifier that the offset had to be known to be within the range of the
vector at compile time. Because we can't enforce this with non-constant
offsets, it's been relaxed so that offsets that would slide the vector
out of bounds return a poison value, similar to
insertelement/extractelement.

@llvm.vector.splice.left also previously only allowed offsets within the
range 0 <= Offset < N, but this has been relaxed to 0 <= Offset <= N so
that it's consistent with @llvm.vector.splice.right.

In lieu of the verifier checks that were removed, InstSimplify has been
taught to fold splices to poison when the offset is out of bounds.

The cost model isn't implemented in this PR, and just returns invalid
for any non-constant offsets for now. I think the correct way to cost
these non-constant offets isn't through getShuffleCost because they
can't handle variable masks, but instead just through
getIntrinsicInstCost.
2026-01-21 10:58:40 +00:00
Matt Arsenault
0d4a35d560
IR: Remove llvm.convert.to.fp16 and llvm.convert.from.fp16 intrinsics (#174484)
These are long overdue for removal. These were originally a hack
to support loading half values before there was any / decent support
for the half type through the backend. There's no reason to continue
supporting these, they're equivalent to fpext/fptrunc with a bitcast.

SelectionDAG stopped translating these directly, and used the
bitcast + fp cast since f7a02c17628e825, so there's been no reason
to use these since 2014.
2026-01-21 09:50:28 +00:00
Nikita Popov
af7c10618b [BitcodeReader] Improve error messages
Avoid using "Invalid record" for all errors. At least mention
what kind of record it is.
2026-01-19 14:28:40 +01:00
Shilei Tian
5a63367b15
Reapply "[AMDGPU] Rework the clamp support for WMMA instructions" (#174674) (#174697)
This reverts commit 0b2f3cfb72a76fa90f3ec2a234caabe0d0712590.
2026-01-07 06:12:19 +00:00
dyung
0b2f3cfb72
Revert "[AMDGPU] Rework the clamp support for WMMA instructions" (#174674)
Reverts llvm/llvm-project#174310

This change is causing 2 cross-project-test failures on
https://lab.llvm.org/buildbot/#/builders/174/builds/29695
2026-01-07 01:18:23 +00:00
Shilei Tian
ccca3b8c67
[AMDGPU] Rework the clamp support for WMMA instructions (#174310)
Fixes #166989.
2026-01-06 15:46:40 -05:00
Luke Lau
ad4bfac732
[IR] Split vector.splice into vector.splice.left and vector.splice.right (#170796)
This PR implements the first change outlined in
https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974?u=lukel

In order to allow non-immediate offsets in the llvm.vector.splice
intrinsic, we need to separate out the "shift left" and "shift right"
modes into two separate intrinsics, which were previously determined by
whether or not the offset is positive or negative.

The description in the LangRef has also been reworded in terms of
sliding elements left or right and extracting either the upper or lower
half as opposed to extracting from a certain index, which brings it
inline with the definition of `llvm.fshr.*`/`llvm.fshl.*`.

This patch teaches AutoUpgrade.cpp to upgrade the old intrinsics into
their new equivalent one based on their offset, so existing uses of
vector.splice should still work.

Uses of llvm.vector.splice in `llvm/test/CodeGen` haven't been replaced
in this PR to keep the diff small and kick the tyres on the AutoUpgrader
a bit. I planned to do this in a follow up NFC but can include it in
this PR if reviewers prefer.

Similarly the shuffle costing kind `SK_Splice` has just been kept the
same for now, to be split into `SK_SpliceLeft` and `SK_SpliceRight`
later.
2026-01-06 15:41:26 +08:00
Shilei Tian
c97de4387b
Revert "[AMDGPU] add clamp immediate operand to WMMA iu8 intrinsic (#171069)" (#174303)
This reverts commit 2c376ffeca490a5732e4fd6e98e5351fcf6d692a because it
breaks assembler.

```
$ llvm-mc -triple=amdgcn -mcpu=gfx1250 -show-encoding <<< "v_wmma_i32_16x16x64_iu8 v[16:23], v[0:7], v[8:15], v[16:23] matrix_b_reuse"
  v_wmma_i32_16x16x64_iu8 v[16:23], v[0:7], v[8:15], v[16:23] clamp ; encoding: [0x10,0x80,0x72,0xcc,0x00,0x11,0x42,0x1c]
```

We have a fundamental issue in the clamp support in VOP3P instructions,
which will need more changes.
2026-01-04 02:13:21 +00:00
Muhammad Abdul
2c376ffeca
[AMDGPU] add clamp immediate operand to WMMA iu8 intrinsic (#171069)
Fixes #166989 

- Adds a clamp immediate operand to the AMDGPU WMMA iu8 intrinsic and
threads it through LLVM IR, MIR lowering, Clang builtins/tests, and MLIR
ROCDL dialect so all layers agree on the new operand
- Updates AMDGPUWmmaIntrinsicModsAB so the clamp attribute is emitted,
teaches VOP3P encoding to accept the immediate, and adjusts Clang
codegen/builtin headers plus MLIR op definitions and tests to match
- Documents what the WMMA clamp operand do
- Implement bitcode AutoUpgrade for source compatibility on WMMA IU8
Intrinsic op

Possible future enhancements:
- infer clamping as an optimization fold based on the use context

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-12-27 12:51:29 -05:00
Teresa Johnson
e3c621c50b
[ThinLTO][MemProf] Add option to override max ICP with larger number (#171652)
Adds an option -module-summary-max-indirect-edges, and wiring into the
ICP logic that collects promotion candidates from VP metadata, to
support a larger number of promotion candidates for use in building the
ThinLTO summary. Also use this in the MemProf ThinLTO backend handling
where we perform memprof ICP during cloning.

The new option, essentially off by default, can be used to override the
value of -icp-max-prom, which is checked internally in ICP, with a
larger max value when collecting candidates from the VP metadata.

For MemProf in particular, where we synthesize new VP metadata targets
from allocation contexts, which may not be all that frequent, we need to
be able to include a larger set of these targets in the summary in order
to correctly handle indirect calls in the contexts. Otherwise we will
not set up the callsite graph edges correctly.
2025-12-15 10:16:06 -08:00
anjenner
27651133e2
AMDGPU: Drop and upgrade llvm.amdgcn.atomic.csub/cond.sub to atomicrmw (#105553)
These both perform conditional subtraction, returning the minuend and
zero respectively, if the difference is negative.
2025-12-09 23:13:33 +00:00
Paul Walker
b5a3b8b704
[LLVM][SVE] Remove aarch64.sve.rev intrinsic, using vector.reverse instead. (#169654) 2025-11-28 11:59:34 +00:00
Paul Walker
8401a8d0be [NFC][LLVM] Add bitcode tests for llvm.aarch64.sve.rev 2025-11-27 10:42:29 +00:00
Peter Collingbourne
d2379effe9
Add deactivation symbol operand to ConstantPtrAuth.
Deactivation symbol operands are supported in the code generator by
building on the previously added support for IRELATIVE relocations.

Reviewers: ojhunt, fmayer, ahmedbougacha, nikic, efriedma-quic

Reviewed By: fmayer

Pull Request: https://github.com/llvm/llvm-project/pull/133537
2025-11-26 12:39:40 -08:00
Peter Collingbourne
6227eb90da
Add IR and codegen support for deactivation symbols.
Deactivation symbols are a mechanism for allowing object files to disable
specific instructions in other object files at link time. The initial use
case is for pointer field protection.

For more information, see the RFC:
https://discourse.llvm.org/t/rfc-deactivation-symbols/85556

Reviewers: ojhunt, nikic, fmayer, arsenm, ahmedbougacha

Reviewed By: fmayer

Pull Request: https://github.com/llvm/llvm-project/pull/133536
2025-11-26 12:37:09 -08:00
CarolineConcatto
200793ac21
Extend MemoryEffects to Support Target-Specific Memory Locations (#148650)
This patch introduces preliminary support for additional memory
locations.
They are: target_mem0 and target_mem1 and they model memory locations
that cannot be represented with existing memory locations.

It was a solution suggested in :
https://discourse.llvm.org/t/rfc-improving-fpmr-handling-for-fp8-intrinsics-in-llvm/86868/6

Currently, these locations are not yet target-specific. The goal is to
enable the compiler to express read/write effects on these resources.
2025-11-18 11:10:58 +00:00
Jay Foad
f037f41350
[IR] Add new function attribute nocreateundeforpoison (#164809)
Also add a corresponding intrinsic property that can be used to mark
intrinsics that do not introduce poison, for example simple arithmetic
intrinsics that propagate poison just like a simple arithmetic
instruction.

As a smoke test this patch adds the new property to
llvm.amdgcn.fmul.legacy.
2025-11-04 12:00:44 +00:00
Orlando Cazalet-Hyams
aa5fe56db4
[DebugInfo] Add dataSize to DIBasicType to add DW_AT_bit_size to _BitInt types (#164372)
DW_TAG_base_type DIEs are permitted to have both byte_size and bit_size
attributes "If the value of an object of the given type does not fully
occupy the storage described by a byte size attribute"

* Add DataSizeInBits to DIBasicType (`DIBasicType(... dataSize: n ...)` in IR).
* Change Clang to add DataSizeInBits to _BitInt type metadata.
* Change LLVM to add DW_AT_bit_size to base_type DIEs that have non-zero
  DataSizeInBits.

TODO: Do we need to emit DW_AT_data_bit_offset for big endian targets?
See discussion on the PR.

Fixes [#61952](https://github.com/llvm/llvm-project/issues/61952)

---------

Co-authored-by: David Stenberg <david.stenberg@ericsson.com>
2025-10-29 15:23:46 +00:00
Michael Buch
49f918d4c3
[llvm][Bitcode][ObjC] Fix order of setter/getter argument to DIObjCProperty constructor (#165421)
Depends on:
* https://github.com/llvm/llvm-project/pull/165401

We weren't testing `DIObjCProperty` roundtripping. So this was never
caught.

The consequence of this is that the `setter:` would have the getter name
and `getter:` would have the setter name.
2025-10-29 12:14:56 +00:00
paperchalice
4a95cd14b3
[test][Bitcode] Remove unsafe-fp-math uses (NFC) (#164743)
Post cleanup for #164534.
2025-10-23 16:38:22 +08:00
Teresa Johnson
eb74d8e03c
[ThinLTO] Add index flag for internalization/promotion status (#164530)
Add an index-wide flag indicating whether index-based internalization
and promotion have completed. This will be used in a follow on change.
2025-10-22 07:30:43 -07:00
Daniel Kiss
048070ba6f
[ARM][AArch64] BTI,GCS,PAC Module flag update. (#86212)
Module flag is used to indicate the feature to be propagated to the
function. As now the frontend emits all attributes accordingly let's
help the auto upgrade to only do work when old and new bitcodes are
merged.

Depends on #82819 and #86031
2025-10-22 09:29:06 +02:00
Nikita Popov
573ca36753
[IR] Replace alignment argument with attribute on masked intrinsics (#163802)
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.

This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)

It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
2025-10-20 08:50:09 +00:00
Michael Buch
cf1cdde24e
[llvm][DebugInfo] Add 'sourceLanguageVersion' field support to DICompileUnit (#162632)
Depends on:
* https://github.com/llvm/llvm-project/pull/162445

In preparation to emit DWARFv6's `DW_AT_language_version`.
2025-10-15 16:52:45 +01:00
Michael Buch
c32753a77a
[llvm][DebugInfo] Add 'sourceLanguageName' field support to DICompileUnit (#162445)
Depends on:
* https://github.com/llvm/llvm-project/pull/162255
* https://github.com/llvm/llvm-project/pull/162434

Part of a patch series to support the DWARFv6
`DW_AT_language_name`/`DW_AT_language_version` attributes.
2025-10-10 09:54:04 +01:00
Marco Elver
224873d7ac
[AllocToken] Introduce sanitize_alloc_token attribute and alloc_token metadata (#160131)
In preparation of adding the "AllocToken" pass, add the pre-requisite
`sanitize_alloc_token` function attribute and `alloc_token` metadata.

---

This change is part of the following series:
  1. https://github.com/llvm/llvm-project/pull/160131
  2. https://github.com/llvm/llvm-project/pull/156838
  3. https://github.com/llvm/llvm-project/pull/162098
  4. https://github.com/llvm/llvm-project/pull/162099
  5. https://github.com/llvm/llvm-project/pull/156839
  6. https://github.com/llvm/llvm-project/pull/156840
  7. https://github.com/llvm/llvm-project/pull/156841
  8. https://github.com/llvm/llvm-project/pull/156842
2025-10-07 12:51:42 +02:00
Nikita Popov
f31bc666f4
[IR] Handle addrspacecast in findBaseObject() (#162076)
Make findBaseObject() look through addrspacecast, so that
getAliaseeObject() works with an aliasee that uses and addrspacecast.
This fixes a crash during module summary index emission.

Fixes https://github.com/llvm/llvm-project/issues/161646.
2025-10-06 16:18:12 +02:00
Tom Tromey
296fddc89e
Allow DW_OP_rot, DW_OP_neg, and DW_OP_abs in DIExpression (#160757)
The Ada front end can emit somewhat complicated DWARF expressions for
the offset of a field. While working in this area I found that I needed
DW_OP_rot (to implement a branch-free computation -- it looked more
difficult to add support for branching); and DW_OP_neg and DW_OP_abs
(just basic functionality).
2025-10-03 14:36:17 +01:00
Florian Hahn
4d4cb757f9
[LLVMContext] Add OB_align assume bundle op ID. (#158078)
Assume operand bundles are emitted in a few more places now, including
used in various places in libc++. Add a dedicated ID for them.

PR: https://github.com/llvm/llvm-project/pull/158078
2025-09-24 16:03:46 +00:00
Sander de Smalen
17e008db17
[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637)
The partial reduction intrinsics are no longer experimental, because
they've been used in production for a while and are unlikely to change.
2025-09-17 11:44:47 +01:00
Antonio Frighetto
370607065d
[llvm] Regenerate test checks including TBAA semantics (NFC)
Tests exercizing TBAA metadata (both purposefully and not), and
previously generated via UTC, have been regenerated and updated
to version 6.
2025-09-12 20:01:17 +02:00
Alexander Richardson
3a4b351ba1
[IR] Introduce the ptrtoaddr instruction
This introduces a new `ptrtoaddr` instruction which is similar to
`ptrtoint` but has two differences:

1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance
2) `ptrtoaddr` only extracts (and then extends/truncates) the low
   index-width bits of the pointer

For most architectures, difference 2) does not matter since index (address)
width and pointer representation width are the same, but this does make a
difference for architectures that have pointers that aren't just plain
integer addresses such as AMDGPU fat pointers or CHERI capabilities.

This commit introduces textual and bitcode IR support as well as basic code
generation, but optimization passes do not handle the new instruction yet
so it may result in worse code than using ptrtoint. Follow-up changes will
update capture tracking, etc. for the new instruction.

RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54

Reviewed By: nikic

Pull Request: https://github.com/llvm/llvm-project/pull/139357
2025-08-08 10:12:39 -07:00
Diana Picus
20d8398825
[AMDGPU] ISel & PEI for whole wave functions (#145858)
Whole wave functions are functions that will run with a full EXEC mask.
They will not be invoked directly, but instead will be launched by way
of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in
a future patch). These functions are meant as an alternative to the
`llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics.

Whole wave functions will set EXEC to -1 in the prologue and restore the
original value of EXEC in the epilogue. They must have a special first
argument, `i1 %active`, that is going to be mapped to EXEC. They may
have either the default calling convention or amdgpu_gfx. The inactive
lanes need to be preserved for all registers used, active lanes only for
the CSRs.

At the IR level, arguments to a whole wave function (other than
`%active`) contain poison in their inactive lanes. Likewise, the return
value for the inactive lanes is poison.

This patch contains the following work:
* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN
  used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return
  a SReg_1 representing `%active`, which needs to be passed into
  SI_WHOLE_WAVE_FUNC_RETURN.
* SelectionDAG support for generating these 2 new pseudos and the
  special handling of %active. Since the return may be in a different
  basic block, it's difficult to add the virtual reg for %active to
  SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF
  which is later replaced via a custom inserter.
* Expansion of the 2 pseudos during prolog/epilog insertion. PEI also
  marks any used VGPRs as WWM registers, which are then spilled and
  restored with the usual logic.

Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic
and a lot of optimization work (especially in order to reduce spills
around function calls).

---------

Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Co-authored-by: Shilei Tian <i@tianshilei.me>
2025-07-21 10:39:09 +02:00
Jeremy Morse
51f4e2cda2
[Bitcode][NFC] Add abbrev for FUNC_CODE_DEBUG_LOC (#147211)
DILocations that are not attached to instructions are encoded using
METADATA_LOCATION records which have an abbrev. DILocations attached to
instructions are interleaved with instruction records as FUNC_CODE_DEBUG_LOC
records, which do not have an abbrev (and FUNC_CODE_DEBUG_LOC_AGAIN
which have no operands).

Add a new FUNCTION_BLOCK abbrev FUNCTION_DEBUG_LOC_ABBREV for
FUNC_CODE_DEBUG_LOC records.

This reduces the bc file size by up to 7% in CTMark, with many between 2-4%
smaller.

[per-file file size
compile-time-tracker](https://llvm-compile-time-tracker.com/compare.php?from=75cf826849713c00829cdf657e330e24c1a2fd03&to=1e268ebd0a581016660d9d7e942495c1be041f7d&stat=size-file&details=on)
(go to stage1-ReleaseLTO-g).

This optimisation is motivated by #144102, which adds the new Key Instructions
fields to bitcode records. The combined patches still overall look to be a
slight improvement over the base.

(Originally reviewed in PR #146497)

Co-authored-by: Orlando Cazalet-Hyams <orlando.hyams@sony.com>
2025-07-06 22:30:31 +01:00
Nikita Popov
0a656d8e57
[Bitcode] Add abbreviations for additional instructions (#146825)
Add abbreviations for icmp/fcmp, store and br, which are the most common
instructions that don't have abbreviations yet. This requires increasing
the abbreviation size to 5 bits.

This gives about 3-5% bitcode size reductions for the clang build.
2025-07-03 14:28:32 +02:00
Antonio Frighetto
f1cc0b607b [IR] Introduce dead_on_return attribute
Add `dead_on_return` attribute, which is meant to be taken advantage
by the frontend, and states that the memory pointed to by the argument
is dead upon function return. As with `byval`, it is supposed to be
used for passing aggregates by value. The difference lies in the ABI:
`byval` implies that the pointer is explicitly passed as argument to
the callee (during codegen the copy is emitted as per byval contract),
whereas a `dead_on_return`-marked argument implies that the copy
already exists in the IR, is located at a specific stack offset within
the caller, and this memory will not be read further by the caller upon
callee return – or otherwise poison, if read before being written.

RFC: https://discourse.llvm.org/t/rfc-add-dead-on-return-attribute/86871.
2025-07-02 09:29:36 +02:00
Timothy Werquin
a8e486bfc4
[Bitcode] Fix constexpr expansion creating invalid PHIs (#141560)
Fixes errors about duplicate PHI edges when the input had duplicates
with constexprs in them. The constexpr translation makes new basic
blocks, causing the verifier to complain about duplicate entries in PHI
nodes.
2025-05-27 15:51:48 +02:00
Jonathan Thackray
6e49f73825
Reland [llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#137701)
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.

These mirror the `llvm.maximum.*` and `llvm.minimum.*` instructions, but
are atomic and use IEEE754 2019 handling for NaNs, which is different to
`fmax` and `fmin`. See:
     https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.

Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
2025-04-30 22:06:37 +01:00
Jonathan Thackray
7ee0097b48
Revert "[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions" (#137657)
Reverts llvm/llvm-project#136759 due to bad interaction with c792b25e4
2025-04-28 16:53:36 +01:00
Jonathan Thackray
ba420d8122
[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#136759)
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.

These mirror the `llvm.maximum.*` and `llvm.minimum.*` instructions, but
are atomic and use IEEE754 2019 handling for NaNs, which is different to
`fmax` and `fmin`. See:
     https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.

Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
2025-04-28 15:31:44 +01:00
Matt Arsenault
c5ac63e4fc
ThinLTO: Add flag to print uselistorder in bitcode writer pass (#133230)
This is needed in llvm-reduce to avoid perturbing the uselistorder in
intermediate steps. Really llvm-reduce wants pure serialization with
no dependency on the pass manager. There are other optimizations mixed
in to the serialization here depending on metadata in the module, which
is also bad.
    
Part of #63621
2025-04-14 20:34:02 +02:00
Shilei Tian
a45b133d40
[AMDGPU][Verifier] Mark calls to entry functions as invalid in the IR verifier (#134910) 2025-04-11 15:32:37 -04:00
Shilei Tian
61a7289077 [NFC] Remove trailing whitespaces in llvm/test/Bitcode/calling-conventions.3.2.ll 2025-04-10 13:01:39 -04:00
Jeremy Morse
40f9bb9e25
[DebugInfo][RemoveDIs] Eliminate another debug-info variation flag (#133917)
The "preserve input debug-info format" flag allowed some tooling to opt
into not seeing the new debug records yet, and to not autoupgrade. This
was good at the time, but un-necessary now that we'll be ditching
intrinsics shortly.

It also hides errors now: verify-uselistorder was hardcoding this flag
to on, and as a result it hasn't seen debug records before. Thus, we
missed a uselistorder variation: constant-expressions such as GEPs can
be contained within debug records and completely isolated from the value
hierachy, see the metadata-use-uselistorder.ll test. These Values didn't
get ordered, but were legitimate uses of constants like "i64 0", and we
now run into difficulty handling that. The patch to AsmWriter seeks
Values to order even through debug-info now.

Finally there are a few intrinsics-tests relying on this flag that we
can just delete, such as one in llvm-reduce and another few in the
LocalTest unit tests. For the fast-isel test, it was added in
https://reviews.llvm.org/D67703 explicitly for checking the size of
blocks without debug-info and in 1525abb9c94 the codepath it tests moved
towards being sunsetted. It'll be totally redundant once RemoveDIs is on
permanently.

Note that there's now no explicit test for the textual-IR autoupgrade
path. I submit that we can rely on the thousands of .ll files where
we've only been bothered to update the outputs, not the inputs, to debug
records.
2025-04-09 18:00:28 +01:00
Matt Arsenault
28a391848c Bitcode: Convert test to opaque pointers 2025-04-07 21:59:11 +07:00