1513 Commits

Author SHA1 Message Date
Prabhu Rajasekaran
b7dfc429c3
[llvm][AsmPrinter] Call graph section Flag field enum (#176309)
This enum is required in llvm-readobj ELFDumper.cpp as well for parsing
the call graph section generated. To avoid duplication of the Flag field
enum, moving this to llvm/object/ELFTypes.h.
2026-02-03 11:27:20 -08:00
Wael Yehia
e1f69ee8e8
[AIX] Implement the ifunc attribute. (#153049)
Currently, the AIX linker and loader do not provide a mechanism to
implement ifuncs similar to GNU_ifunc on ELF Linux.
On AIX, we will lower `__attribute__((ifunc("resolver"))` to the llvm
`ifunc` as other platforms do. The llvm `ifunc` in turn will get lowered
at late stages of the optimization pipeline to an AIX-specific
implementation. No special linkage or relocations are needed when
generating assembly/object output.

On AIX, a function `foo` has two symbols associated with it: a function
descriptor (`foo`) residing in the `.data` section, and an entry point
(`.foo`) residing in the `.text` section. The first field of the
descriptor is the address of the entry point. Typically, the address
field in the descriptor is initialized once: statically, at load time
(?), or at runtime if runtime linking is enabled.

Here we would like to use the address field in the descriptor to
implement the `ifunc` semantics. Specifically, the ifunc function will
become a stub that jumps to the entry point in the address field. A
constructor function is linked into every linkage module. The
constructor walks an array of `{descriptor, resolver}` pairs, calling
the resolver and saving the result in the address field in the
descriptor (thus setting `foo`'s descriptor to point to the resolved
version early during program runtime).

Known limitations:
- Due to bug #161576, which affects object generation path, you will
need either `-ffunction-sections` or `-fno-integrated-as` to generate a
correct/linkable object file.
- aliases to ifuncs are not supported, a testcase has been added and
marked XFAIL. I'm planning to address in a follow-up PR because it's not
important enough, IMHO, for this PR
- dead ifuncs in a CU that contains at least one live ifunc, will result
in all ifuncs being kept by the linker. The fix for this is common with
a similar problem we have with PGO. PR #159435 is trying to provide a
mechanism that will allow the ifunc and PGO implementations to avoid the
dead code retention at the link step.
- the resolver must return a function that is in the same DSO as the
ifunc; the compiler will try to detect if this condition is violated and
report it, but it cannot detect it in general. To be safe, all candidate
functions (returned by a particular resolver) must either be static or
have hidden/protected visibility. This is so that the ifunc stub doesn't
have to save and restore the TOC register r2. In future work, this case
will be supported and the requirement will be lifted.

---------

Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
2026-02-03 14:15:16 -05:00
Laxman Sole
1a23bca645
[DebugInfo][NVPTX] Adding support for inlined_at debug directive in NVPTX backend (#170239)
This change adds support for emitting the enhanced PTX debugging
directives `function_name` and `inlined_at` as part of the `.loc`
directive in the NVPTX backend.

`.loc` syntax - 
>.loc file_index line_number column_position

`.loc` syntax with `inlined_at` attribute - 
>.loc file_index line_number column_position,function_name label {+
immediate }, inlined_at file_index2 line_number2 column_position2

`inlined_at` attribute specified as part of the `.loc` directive
indicates PTX instructions that are generated from a function that got
inlined. It specifies the source location at which the specified
function is inlined. `file_index2`, `line_number2`, and
`column_position2` specify the location at which the function is
inlined.

The `function_name` attribute specifies an offset in the DWARF section-
`.debug_str`. Offset is specified as a label expression or a label +
immediate expression, where label is defined in the `.debug_str`
section. DWARF section `.debug_str` contains ASCII null-terminated
strings that specify the name of the function that is inlined.

These attributes were introduced in PTX ISA version 7.2 (see NVIDIA’s
documentation:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#debugging-directives-loc
).

To support these features, the PR introduces a new `NVPTXDwarfDebug`
class derived from `DwarfDebug`, which implements NVPTX-specific logic
for emitting these directives. The base DwarfDebug infrastructure is
extended with new virtual functions (`initializeTargetDebugInfo()` and
`recordTargetSourceLine()`) that enable the NVPTX backend to generate
this additional debug information.

The MC layer is also updated to emit the NVPTX-specific `.loc`
attributes (function_name and inlined_at). The implementation applies to
PTX ISA 7.2 and later when the debug-info emission kind is either
lineTableOnly or DebugDirectiveOnly. A new command-line option,
`--line-info-inlined-at=<true/false>`, is added to control whether the
inlined_at attribute is generated.

Note - The `NVCC` compiler already emits the `.loc` directive with
`inlined_at` when compiled with `-lineinfo` option.
2026-02-01 22:33:53 -08:00
Marina Taylor
7994daccd9
[AsmPrinter] Add a command-line option to emit stack usage files (#178908)
Preparation for #178005.

This will allow stack usage files to be requested during the linking
step in LTO builds, in a more straightforward way than via
TargetOptions.
2026-01-30 18:10:08 +00:00
Marina Taylor
2eaaaf1912
NFC: Rename CodeGenOptions::StackUsageOutput to StackUsageFile (#178898)
Preparation for #178005.

"Output" has too many different interpretations: it could be an
enabled/disabled, a file format, etc. Clarify that it's the destination
file.
2026-01-30 15:03:54 +00:00
Jameson Nash
d10b2b566a
[NFCI] replace getValueType with new getGlobalSize query (#177186)
Returns uint64_t to simplify callers. The goal is eventually replace
getValueType with this query, which should return the known minimum
reference-able size, as provided (instead of a Type) during create.
Additionally the common isSized query would be replaced with an
isExactKnownSize query to test if that size is an exact definition.
2026-01-22 13:55:53 -05:00
Jameson Nash
a28a2f6e50
[AsmPrinter] Analyze GlobalAlias more carefully with getAliaseeObject (#176996)
Move the `GA.getAliaseeObject()` call to the top of `emitGlobalAlias`
and reuse the result throughout the function. Since `getAliaseeObject()`
can return null, switch to `isa_and_nonnull<>` for correctness.

This is just a drive-by fix I noticed in reading the code, not
something I actually encountered in practice. This seems to have been
last improved in 924696d271cabdda066088c40a0fa98bd240b86a, and I think
this version now even closer matches the intent of the comment here.
2026-01-21 10:00:01 -05:00
Daniel Paoliello
483c6834a2
[NFC][win] Use an enum for the cfguard module flag (#176461)
Currently the `cfguard` module flag can be set to 1 (emit tables only,
no checks) or 2 (emit tables and checks).

This change formalizes that definition by moving these values into an
enum, instead of just having them documented in comments.

Split out from #176276
2026-01-16 18:23:25 -08:00
Prabhu Rajasekaran
4b31ad94e0
[UEFI] Codeview do not crash when no llvm.dbg.cu (#174460)
PR #142970 Added for Windows targets to emit minimal codeview metadata
even when debug info is disabled. This crashes the backend for UEFI
x86_64-uefi triple as llvm.dbg.cu is expected unconditionally there.
Handling it correctly in AsmPrinter and adding a regression test.
2026-01-06 09:52:24 -08:00
Rahman Lavaee
ba6c5f8e4e
Insert symbols for prefetch targets read from basic blocks section profile. (#168439)
This is the first PR to enable the prefetch optimization via Propeller
based on our
[RFC](https://discourse.llvm.org/t/rfc-code-prefetch-insertion/88668/22).
It enables emitting special symbols prefixed with
`__llvm_prefetch_target` to point to the prefetch targets as specified
via directives in the profile. A prefetch target is uniquely identified
by its function name, basic block ID, and the subblock index (used when
the target is after a call instruction).

A new pass is added which sets a field in basic blocks which have
prefetch targets. The next PR will add the prefetch insertion logic into
the same pass.
2026-01-05 15:36:59 -08:00
Rahman Lavaee
53005fd435
Use the Propeller CFG profile in the PGO analysis map if it is available. (#163252)
This PR implements the emitting of the post-link CFG information in PGO
analysis map, as explained in the
[RFC](https://discourse.llvm.org/t/rfc-extending-the-pgo-analysis-map-with-propeller-cfg-frequencies/88617).
This is enabled by a flag `pgo-analysis-map-emit-bb-sections-cfg`.

This PR bumps the SHT_LLVM_BB_ADDR_MAP version to 5.
Also includes some refactoring changes related to storing the CFG in the
Basic block sections profile reader.
2025-12-17 14:19:18 -08:00
Jan Svoboda
8e999e3d78
[llvm][clang] Sandbox filesystem reads (#165350)
This PR introduces a new mechanism for enforcing a sandbox around
filesystem reads coming from the compiler. A fatal error is raised
whenever the `llvm::sys::fs`, `llvm::MemoryBuffer::getFile*()` APIs get
used directly instead of going through the "blessed" virtual interface
of `llvm::vfs::FileSystem`.
2025-12-11 15:42:13 -08:00
JaydeepChauhan14
9b6b52b534
[AsmPrinter][NFC] Reuse Target Triple variable (#171612) 2025-12-11 12:28:59 +01:00
Owen Anderson
ba3208e19f
[LLVM/CodeGen] Use the correct address space when building structor tables. (#171247)
No in-tree target exercises this, but it's needed for CHERI, and I
believe its correctness is verifiable by inspection.

Co-authored-by: Alex Richardson <alexrichardson@google.com>
2025-12-08 22:44:07 -06:00
Daniel Thornburgh
5f08fb4d72
[IR] llvm.reloc.none intrinsic for no-op symbol references (#147427)
This intrinsic emits a BFD_RELOC_NONE relocation at the point of call,
which allows optimizations and languages to explicitly pull in symbols
from static libraries without there being any code or data that has an
effectual relocation against such a symbol.

See issue #146159 for context.
2025-11-06 08:52:46 -08:00
Prabhu Rajasekaran
f60e69315e
[llvm] Emit canonical linkage correct function symbol (#166487)
In the call graph section, we were emitting the temporary label
pointing to the start of the function instead of the canonical linkage
correct function symbol. This patch fixes it and updates the
corresponding tests.
2025-11-05 09:22:08 -08:00
Rahman Lavaee
e9368a056d
[SHT_LLVM_BB_ADDR] Implement ELF and YAML support for Propeller CFG data in PGO analysis map. (#164914)
This PR implements the ELF support for PostLink CFG in PGO analysis map
as discussed in
[RFC](https://discourse.llvm.org/t/rfc-extending-the-pgo-analysis-map-with-propeller-cfg-frequencies/88617/2).

A later PR will implement the Codegen Support.
2025-10-30 13:12:06 -07:00
wdx727
d8d80b659a
Adding Matching and Inference Functionality to Propeller-PR2 (#162963)
Adding Matching and Inference Functionality to Propeller. For detailed
information, please refer to the following RFC:
https://discourse.llvm.org/t/rfc-adding-matching-and-inference-functionality-to-propeller/86238.
This is the second PR, which includes the calculation of basic block
hashes and their emission to the ELF file. It is associated with the
previous PR at https://github.com/llvm/llvm-project/pull/160706.

co-authors: lifengxiang1025
[lifengxiang@kuaishou.com](mailto:lifengxiang@kuaishou.com); zcfh
[wuminghui03@kuaishou.com](mailto:wuminghui03@kuaishou.com)

Co-authored-by: lifengxiang1025 <lifengxiang@kuaishou.com>
Co-authored-by: zcfh <wuminghui03@kuaishou.com>
Co-authored-by: Rahman Lavaee <rahmanl@google.com>
2025-10-23 09:38:12 -07:00
Rahman Lavaee
d55b10a4b9
Add comment about static_cast requirement. 2025-10-21 17:42:12 -07:00
Aiden Grossman
b85733094c Revert "Remove unnecessary static_cast<bool> in AsmPrinter.cpp."
This reverts commit 0e8ee0ec78dc370a1bf2688411cf2db36c3a4cd0.

This breaks Windows premerge.
https://lab.llvm.org/staging/#/builders/21/builds/6831
2025-10-22 00:11:25 +00:00
Rahman Lavaee
0e8ee0ec78
Remove unnecessary static_cast<bool> in AsmPrinter.cpp. 2025-10-21 14:15:03 -07:00
Jan Svoboda
9deba01c1d
[support] Use VFS in SourceMgr for loading includes (#162903)
Most `SourceMgr` clients don't make use of include files, but those that
do might want to specify the file system to use. This patch enables that
by making it possible to pass a `vfs::FileSystem` instance into
`SourceMgr`.
2025-10-15 09:24:36 -07:00
Prabhu Rajasekaran
cac8bdb56c
[NFC][llvm] Update call graph section's name. (#163429)
Call graph section emitted by LLVM was named `.callgraph`. Renaming it
to `.llvm.callgraph`.
2025-10-15 07:52:54 -07:00
wdx727
5eeae08f7e
Adding Matching and Inference Functionality to Propeller (#160706)
We have optimized the implementation of introducing the "matching and
inference" technique into Propeller. In this new implementation, we have
made every effort to avoid introducing new compilation parameters while
ensuring compatibility with Propeller's current usage. Instead of
creating a new profile format, we reused the existing one employed by
Propeller. This new implementation is fully compatible with Propeller's
current usage patterns and reduces the amount of code changes. For
detailed information, please refer to the following RFC:
https://discourse.llvm.org/t/rfc-adding-matching-and-inference-functionality-to-propeller/86238.
We plan to submit the relevant changes in several pull requests (PRs).
The current one is the first PR, which adds the basic block hash to the
SHT_LLVM_BB_ADDR_MAP section.

co-authors: lifengxiang1025 <lifengxiang@kuaishou.com>; zcfh
<wuminghui03@kuaishou.com>

Co-authored-by: lifengxiang1025 <lifengxiang@kuaishou.com>
Co-authored-by: zcfh <wuminghui03@kuaishou.com>
Co-authored-by: Rahman Lavaee <rahmanl@google.com>
2025-10-14 10:34:14 -07:00
Prabhu Rajasekaran
6fb87b231f
[llvm][AsmPrinter] Call graph section format. (#159866)
Make .callgraph section's layout efficient in space. Document the layout
of the section.
2025-10-10 12:20:11 -07:00
ssijaric-nv
24c1bb60e3
[MC] Make .note.GNU-stack explicit for the trampoline case (#151754)
In the presence of trampolines, the .note.GNU-stack section is not emitted. The
absence of .note.GNU-stack results in the stack marked executable by some
linkers. But others require an explict .note.GNU-stack section.

The GNU ld 2.43 on x86 machines, for example, issues the following:

missing .note.GNU-stack section implies executable stack
NOTE: This behaviour is deprecated and will be removed in a future version of the linker

On one of the ARM machines, the absence of .note.GNU-stack results in the stack
marked as non-executable:

STACK off 0x0000000000000000 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**4
filesz 0x0000000000000000 memsz 0x0000000000000000 flags rw-

This change just emits the explicit .note.GNU-stack and marks it executable if required.
2025-10-04 15:22:54 -07:00
Michael Liao
fc6cc4009f [AsmPrinter] Remove unnecessary casts. NFC 2025-10-01 14:23:42 -04:00
Prabhu Rajasekaran
42b195e1bf
[llvm][AsmPrinter] Add direct calls to callgraph section (#155706)
Extend CallGraphSection to include metadata about direct calls. This
simplifies the design of tools that must parse .callgraph section to not
require dependency on MC layer.
2025-09-22 14:34:01 -07:00
Tobias Stadler
dfbd76bda0
[Remarks] Restructure bitstream remarks to be fully standalone (#156715)
Currently there are two serialization modes for bitstream Remarks:
standalone and separate. The separate mode splits remark metadata (e.g.
the string table) from actual remark data. The metadata is written into
the object file by the AsmPrinter, while the remark data is stored in a
separate remarks file. This means we can't use bitstream remarks with
tools like opt that don't generate an object file. Also, it is confusing
to post-process bitstream remarks files, because only the standalone
files can be read by llvm-remarkutil. We always need to use dsymutil
to convert the separate files to standalone files, which only works for
MachO. It is not possible for clang/opt to directly emit bitstream
remark files in standalone mode, because the string table can only be
serialized after all remarks were emitted.

Therefore, this change completely removes the separate serialization
mode. Instead, the remark string table is now always written to the end
of the remarks file. This requires us to tell the serializer when to
finalize remark serialization. This automatically happens when the
serializer goes out of scope. However, often the remark file goes out of
scope before the serializer is destroyed. To diagnose this, I have added
an assert to alert users that they need to explicitly call
finalizeLLVMOptimizationRemarks.

This change paves the way for further improvements to the remark
infrastructure, including more tooling (e.g. #159784), size optimizations
for bitstream remarks, and more.

Pull Request: https://github.com/llvm/llvm-project/pull/156715
2025-09-22 16:41:39 +01:00
Rahman Lavaee
a61ff1487b
[SHT_LLVM_BB_ADDR_MAP] Change the callsite feature to emit end of callsites. (#155041)
This PR simply moves the callsite anchors from the beginning of
callsites to their end.

Emitting the end of callsites is more sensible as it allows breaking the
basic block into subblocks which end with control transfer instructions.
2025-08-25 10:17:29 -07:00
Alexander Richardson
3a4b351ba1
[IR] Introduce the ptrtoaddr instruction
This introduces a new `ptrtoaddr` instruction which is similar to
`ptrtoint` but has two differences:

1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance
2) `ptrtoaddr` only extracts (and then extends/truncates) the low
   index-width bits of the pointer

For most architectures, difference 2) does not matter since index (address)
width and pointer representation width are the same, but this does make a
difference for architectures that have pointers that aren't just plain
integer addresses such as AMDGPU fat pointers or CHERI capabilities.

This commit introduces textual and bitcode IR support as well as basic code
generation, but optimization passes do not handle the new instruction yet
so it may result in worse code than using ptrtoint. Follow-up changes will
update capture tracking, etc. for the new instruction.

RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54

Reviewed By: nikic

Pull Request: https://github.com/llvm/llvm-project/pull/139357
2025-08-08 10:12:39 -07:00
Fangrui Song
b51ff2705f MCSymbolELF: Migrate away from classof
The object file format specific derived classes are used in context
where the type is statically known. We don't use isa/dyn_cast and we
want to eliminate MCSymbol::Kind in the base class.
2025-08-03 16:05:35 -07:00
Fangrui Song
e640ca8b9a MCSymbolELF: Migrate away from classof
The object file format specific derived classes are used in context
where the type is statically known. We don't use isa/dyn_cast and we
want to eliminate MCSymbol::Kind in the base class.
2025-08-03 15:45:36 -07:00
Prabhu Rajasekaran
7c6a1c3d15
[llvm][AsmPrinter] Emit call graph section
Collect the necessary information for constructing the call graph
section, and emit to .callgraph section of the binary.

MD5 hash of the callee_type metadata string is used as the numerical
type id emitted.

Reviewers: ilovepi

Reviewed By: ilovepi

Pull Request: https://github.com/llvm/llvm-project/pull/87576
2025-07-31 13:38:20 -07:00
Fangrui Song
dd4ebe6514 MCSectionCOFF: Remove classof
The object file format specific derived classes are used in context like
MCStreamer and MCObjectTargetWriter where the type is statically known.
We don't use isa/dyn_cast and we want to eliminate
MCSection::SectionVariant in the base class.
2025-07-25 22:10:48 -07:00
sivadeilra
b933f0c376
Fix Windows EH IP2State tables (remove +1 bias) (#144745)
This changes how LLVM constructs certain data structures that relate to
exception handling (EH) on Windows. Specifically this changes how
IP2State tables for functions are constructed. The purpose of this
change is to align LLVM to the requires of the Windows AMD64 ABI, which
requires that the IP2State table entries point to the boundaries between
instructions.

On most Windows platforms (AMD64, ARM64, ARM32, IA64, but *not* x86-32),
exception handling works by looking up instruction pointers in lookup
tables. These lookup tables are stored in `.xdata` sections in
executables. One element of the lookup tables are the `IP2State` tables
(Instruction Pointer to State).

If a function has any instructions that require cleanup during exception
unwinding, then it will have an IP2State table. Each entry in the
IP2State table describes a range of bytes in the function's instruction
stream, and associates an "EH state number" with that range of
instructions. A value of -1 means "the null state", which does not
require any code to execute. A value other than -1 is an index into the
State table.

The entries in the IP2State table contain byte offsets within the
instruction stream of the function. The Windows ABI requires that these
offsets are aligned to instruction boundaries; they are not permitted to
point to a byte that is not the first byte of an instruction.

Unfortunately, CALL instructions present a problem during unwinding.
CALL instructions push the address of the instruction after the CALL
instruction, so that execution can resume after the CALL. If the CALL is
the last instruction within an IP2State region, then the return address
(on the stack) points to the *next* IP2State region. This means that the
unwinder will use the wrong cleanup funclet during unwinding.

To fix this problem, compilers should insert a NOP after a CALL
instruction, if the CALL instruction is the last instruction within an
IP2State region. The NOP is placed within the same IP2State region as
the CALL, so that the return address points to the NOP and the unwinder
will locate the correct region.

This PR modifies LLVM so that it inserts NOP instructions after CALL
instructions, when needed. In performance tests, the NOP has no
detectable significance. The NOP is rarely inserted, since it is only
inserted when the CALL is the last instruction before an IP2State
transition or the CALL is the last instruction before the function
epilogue.

NOP padding is only necessary on Windows AMD64 targets. On ARM64 and
ARM32, instructions have a fixed size so the unwinder knows how to "back
up" by one instruction.

Interaction with Import Call Optimization (ICO):

Import Call Optimization (ICO) is a compiler + OS feature on Windows
which improves the performance and security of DLL imports. ICO relies
on using a specific CALL idiom that can be replaced by the OS DLL
loader. This removes a load and indirect CALL and replaces it with a
single direct CALL.

To achieve this, ICO also inserts NOPs after the CALL instruction. If
the end of the CALL is aligned with an EH state transition, we *also*
insert a single-byte NOP. **Both forms of NOPs must be preserved.** They
cannot be combined into a single larger NOP; nor can the second NOP be
removed.

This is necessary because, if ICO is active and the call site is
modified by the loader, the loader will end up overwriting the NOPs that
were inserted for ICO. That means that those NOPs cannot be used for the
correct termination of the exception handling region (the IP2State
transition), so we still need an additional NOP instruction. The NOPs
cannot be combined into a longer NOP (which is ordinarily desirable)
because then ICO would split one instruction, producing a malformed
instruction after the ICO call.
2025-07-22 09:18:13 -07:00
Fangrui Song
6201761e96 MC: Rename isVirtualSection to isBssSection
The term BSS (Block Started by Symbol) is a standard, widely recognized
term, available in the a.out object file format and adopted by formats
like COFF, XCOFF, Mach-O (called S_ZEROFILL while `__bss` is also used),
and ELF. To avoid introducing unfamiliar terms, we should use
isBSSSection instead of isVirtualSection.
2025-07-20 10:39:17 -07:00
Rahman Lavaee
fcecf177c1
[SHT_LLVM_BB_ADDR_MAP] Emit callsite offsets in the SHT_LLVM_BB_ADDR_MAP section. (#146563)
Callsite offsets will help map addresses to the right position in the
basic block (before or after a callsite).

This PR also bumps the BBAddrMap version to 3.

The encoding/decoding ability is already pushed upstream
8d7a8fcc3ab9f6d4c4a7e4312876fe94ed3d6c4f.
2025-07-09 11:11:15 -07:00
Fangrui Song
68494ae072 [XRay] xray_fn_idx: fix alignment directive
Use `emitValueToAlignment` as the section does not contain code.
`emitCodeAlignment` would lead to ALIGN relocations on RISC-V and
LoongArch with linker relaxation.

In addition, change the alignment to wordsize, sufficient for the
runtime requirement (`XRayFunctionSledIndex`).

Related to #147322
2025-07-08 21:52:53 -07:00
Rahman Lavaee
6b623a6622
[SHT_LLVM_BB_ADDR_MAP] Remove support for versions 1 and 0 (SHT_LLVM_BB_ADDR_MAP_V0). (#146186)
Version 2 was added more than two years ago
(6015a045d7).
So it should be safe to deprecate older versions.
2025-07-02 10:31:52 -07:00
dianqk
630d55cce4
[AsmPrinter] Always emit global equivalents if there is non-global uses (#145648)
A case found from https://github.com/rust-lang/rust/issues/142752:
https://llvm.godbolt.org/z/T7ce9saWh.

We should emit `@bar_0` for the following code:

```llvm
target triple = "x86_64-unknown-linux-gnu"

@rel_0 = private unnamed_addr constant [1 x i32] [
  i32 trunc (i64 sub (i64 ptrtoint (ptr @bar_0 to i64), i64 ptrtoint (ptr @rel_0 to i64)) to i32)]
@bar_0 = internal unnamed_addr constant ptr @foo_0, align 8
@foo_0 = external global ptr, align 8

define void @foo(ptr %arg0) {
  store ptr @bar_0, ptr %arg0, align 8
  ret void
}
```
2025-06-25 18:39:36 +08:00
Fangrui Song
290fc1ea11
MC,AsmPrinter: Report redefinition error instead of crashing in more cases
* Fix the crash for `.equiv b, undef; b:` (.equiv equates a symbol to an expression and reports an error if the symbol was already defined).
* Remove redundant isVariable check from emitFunctionEntryLabel

Pull Request: https://github.com/llvm/llvm-project/pull/145460
2025-06-23 23:00:03 -07:00
Rahman Lavaee
8d7a8fcc3a
[SHT_LLVM_BB_ADDR_MAP] Encode and decode callsite offsets in a newly-introduced SHT_LLVM_BB_ADDR_MAP version. (#144426)
Recently, we have been looking at some optimizations targeting
individual calls. In particular, we plan to extend the address mapping
technique to map to individual callsites. For example, in this piece of
code for a basic blocks:

```
<BB>:
1200:    lea 0x1(%rcx), %rdx
1204:    callq foo
1209:    cmpq 0x10, %rdx
120d:    ja  L1
```

We want to emit 0x9 as the call site offset for `callq foo` (the offset
from the block entry to right after the call), so that we know if a
sampled address is before the call or after.

This PR implements the decode/encode/emit capability. The Codegen change
will be implemented in a later PR.
2025-06-23 09:25:14 -07:00
Matt Arsenault
6b129d6bbf
AsmPrinter: Do not use report_fatal_error for unhandled ConstantExpr (#145275) 2025-06-23 16:29:58 +09:00
Matt Arsenault
db051e8800
AsmPrinter: Do not use report_fatal_error for unknown appending linkage (#145269) 2025-06-23 16:26:27 +09:00
Matt Arsenault
338ee673bd
AsmPrinter: Do not use report_fatal_error for AIX XXStructor error (#145273) 2025-06-23 16:25:53 +09:00
Tobias Stadler
5835f1e0a3
[AsmPrinter] Fix crash when remarks section is unsupported (#144724)
Emit a warning and bail out instead of segfault-ing when the current
object file format does not have support for emitting a remarks section.
2025-06-20 12:55:11 +01:00
Jacek Caban
be5c96bfac
[CodeGen][COFF] Always emit CodeView compiler info on Windows targets (#142970)
MSVC always emits minimal CodeView metadata with compiler information,
even when debug info is otherwise disabled. Other tools may rely on this
metadata being present. For example, linkers use it to determine whether
hotpatching is enabled for the object file.
2025-06-13 22:48:29 +02:00
Eli Friedman
9f82ac5738
Remove GlobalObject::getAlign/setAlignment (#143188)
Currently, GlobalObject has an "alignment" property... but it's
basically nonsense: alignment doesn't mean the same thing for variables
and functions, and it's completely meaningless for ifuncs.

This "removes" (actually marking protected) the methods from
GlobalObject, adds the relevant methods to Function and GlobalVariable,
and adjusts the code appropriately.

This should make future alignment-related cleanups easier.
2025-06-09 13:51:03 -07:00
Simon Tatham
56acb06bc6
[ARM,AArch64] Don't put BTI at asm goto branch targets (#141562)
In 'asm goto' statements ('callbr' in LLVM IR), you can specify one or
more labels / basic blocks in the containing function which the assembly
code might jump to. If you're also compiling with branch target
enforcement via BTI, then previously listing a basic block as a possible
jump destination of an asm goto would cause a BTI instruction to be
placed at the start of the block, in case the assembly code used an
_indirect_ branch instruction (i.e. to a destination address read from a
register) to jump to that location. Now it doesn't do that any more:
branches to destination labels from the assembly code are assumed to be
direct branches (to a relative offset encoded in the instruction), which
don't require a BTI at their destination.

This change was proposed in https://discourse.llvm.org/t/85845 and there
seemed to be no disagreement. The rationale is:

1. it brings clang's handling of asm goto in Arm and AArch64 in line
with gcc's, which didn't generate BTIs at the target labels in the first
place.

2. it improves performance in the Linux kernel, which uses a lot of 'asm
goto' in which the assembly language just contains a NOP, and the
label's address is saved elsewhere to let the kernel self-modify at run
time to swap between the original NOP and a direct branch to the label.
This allows hot code paths to be instrumented for debugging, at only the
cost of a NOP when the instrumentation is turned off, instead of the
larger cost of an indirect branch. In this situation a BTI is
unnecessary (if the branch happens it's direct), and since the code
paths are hot, also a noticeable performance hit.

Implementation:

`SelectionDAGBuilder::visitCallBr` is the place where 'asm goto' target
labels are handled. It calls `setIsInlineAsmBrIndirectTarget()` on each
target `MachineBasicBlock`. Previously it also called
`setMachineBlockAddressTaken()`, which made `hasAddressTaken()` return
true, which caused a BTI to be added in the Arm backends.

Now `visitCallBr` doesn't call `setMachineBlockAddressTaken()` any more
on asm goto targets, but `hasAddressTaken()` also checks the flag set by
`setIsInlineAsmBrIndirectTarget()`. So call sites that were using
`hasAddressTaken()` don't need to be modified. But the Arm backends
don't call `hasAddressTaken()` any more: instead they test two more
specific query functions that cover all the reasons `hasAddressTaken()`
might have returned true _except_ being an asm goto target.

Testing:

The new test `AArch64/callbr-asm-label-bti.ll` is testing the actual
change, where it expects not to see a `bti` instruction after
`[[LABEL]]`. The rest of the test changes are all churn, due to the
flags on basic blocks changing. Actual output code hasn't changed in any
of the existing tests, only comments and diagnostics.

Further work:

`RISCVIndirectBranchTracking.cpp` and `X86IndirectBranchTracking.cpp`
also call `hasAddressTaken()` in a way that might benefit from using the
same more specific check I've put in `ARMBranchTargets.cpp` and
`AArch64BranchTargets.cpp`. But I'm not sure of that, so in this commit
I've only changed the Arm backends, and left those alone.
2025-06-03 08:44:13 +01:00