In LTO, part of LLVM's middle-end runs after linking has finished. LTO's
semantics depend on the complete set of extracted bitcode files being
known at this time. If the middle-end inserts new calls to library
functions (libfuncs) that are implemented in bitcode, this could extract
new bitcode object files into the link. These cannot be compiled,
leading to undefined symbol references.
Additionally, the middle-end in LTO may reason that such library
functions have no references, and it may internalize them, then
manipulate their API or even delete them. Afterwards, it may emit a call
to them, again producing undefined symbol references.
This patch resolves the former issue by ensuring that the middle end
emits no new references to symbols defined in bitcode, and it resolves
the latter issue by ensuring that extracted bitcode for libfuncs is
considered external, since new calls may be emitted to them at any time.
The new semantics are not yet established for MachO LLD, which does not
yet appear to have any special handling for libcalls in LTO. It also
does not yet support distributed ThinLTO; doing so would require
additional (de)serialization work.
This is the patch referenced in @ilovepi's and my talk at the last LLVM
devmeeting: "LT-Uh-Oh"
Gemini 3.1 was used in porting to COFF and WASM LLDs.
Add a SanCov flag for calling dedicated hook functions on function entry
and exit. This flag can be used either in combination with
-fsanitize-coverage=trace-pc (in which case this patch changes which
hook is called for the entry BB, and generates an additional hook call
before return), or it can be used by itself (in which case only the
dedicated entry/exit callbacks are invoked).
This can be used to track the call stack throughout a sancov trace.
cc @vitalybuka @dvyukov
Without this change, passing -fthinlto-index causes -fpass-plugin
arguments to be ignored. We want to be able to use plugins with
distributed thin-lto, so add support for this.
Wire the llvm-mc --reloc-section-sym={all,internal,none} option through
the clang driver (-Wa,--reloc-section-sym=) and cc1as
(--reloc-section-sym=). The option is only valid for ELF targets.
GNU Assembler will add the option as well.
Preparation for #178005.
"Output" has too many different interpretations: it could be an
enabled/disabled, a file format, etc. Clarify that it's the destination
file.
This plumbs the option --gsframe through the various levels needed to
support it in the assembler.
This is the final step in assembler-level sframe support for x86. With
it in place, clang produces sframe-sections that successfully link with
gnu-ld.
LLD support is pending some discussion.
The previous PR (https://github.com/llvm/llvm-project/pull/165322) had a
bad merge, but the only comments were as below. Both done.
1. Fix some stray formatting.
2. Add tests that:
the option is passed on to cc1
the correct error is emitted when an unsupported platform is used
the assembler behaves as expected (even if this has been tested
elsewhere, an end-to-end "integration" test would be useful)
The test for assembler behavior simply checks that an sframe section is
added, and is modeled after the similar one for crel.
This ensures that both the midend and backend see consistent values for
VecLib, so that for example frem can be selected correctly.
The test involves running the front-end, midend (to vectorize) and
backend to prove it can lower the vector frem call.
Fixes#172483
Introduce `__builtin_allow_sanitize_check("name")` which returns true if
the specified sanitizer is enabled for the function (after inlining).
Supported sanitizers are "address", "thread", "memory", "hwaddress", and
their "kernel-" variants, matching the names of the
`no_sanitize("name")`
usage.
This builtin enables conditional execution of explicit checks only when
the sanitizer is enabled, respecting `no_sanitize` attributes, even when
used from `always_inline` functions that may be used in sanitized or
no_sanitize functions.
Since we must defer until after inlining and cannot determine the result
statically, Clang must lower to the `llvm.allow.sanitize.*` intrinsics,
which are then resolved by the `LowerAllowCheckPass`.
*Original Motivation:* The Linux kernel has a number of low-level
primitives that use inline assembly not visible to the sanitizers, but
use explicitly inserted checks to avoid coverage loss. Many of those
low-level helpers, however, are also used from so-called `noinstr`
functions, which use `no_sanitize(..)` to prohibit instrumentation;
these are used for very brittle code (such as when the kernel sets up a
task context *before* normal memory is accessible), and any
instrumentation, incl. from explicit instrumentation, is prohibited.
Many such helpers themselves are macros or `always_inline`, however, are
unable to be used from such brittle contexts because they contain
explicit instrumentation. This requires awkward workarounds to avoid the
instrumentation.
The ideal solution is this new builtin, that can be used to determine if
instrumentation is enabled in a given function or not, which the helper
can then use to insert instrumentation only where instrumentation is
allowed.
A recent such case came up in [1], where file-level instrumentation had
already been disabled for KASAN and KCSAN, which had not been necessary
if the new builtin were available.
[1] https://lore.kernel.org/all/20251208-gcov-inline-noinstr-v1-0-623c48ca5714@google.com/
Single-use AddEmitPasses is inlined into RunCodegenPipeline for clarity
in comparing the parameters to the plugin and the parameters passed to
addPassesToEmitFile.
Pull Request: https://github.com/llvm/llvm-project/pull/171872
This permits pass plugins to use llvm:🆑:opt. Additionally, add a test
of -fpass-plugin, this was previously not tested at all.
I'm not sure whether using the LLVM Bye.so in the tests is possible this
way (e.g., if Clang is built standalone).
Reland after #173279.
Pull Request: https://github.com/llvm/llvm-project/pull/173287
This avoid pulling in the entire Passes library with all passes as
dependencies when just referring to PassPlugin, which is in fact
independent of the Passes themselves.
Pull Request: https://github.com/llvm/llvm-project/pull/173279
This permits pass plugins to use llvm:🆑:opt. Additionally, add a test
of -fpass-plugin, this was previously not tested at all.
I'm not sure whether using the LLVM Bye.so in the tests is possible this
way (e.g., if Clang is built standalone).
Pull Request: https://github.com/llvm/llvm-project/pull/171868
This avoid pulling in the entire Passes library with all passes as
dependencies when just referring to PassPlugin, which is in fact
independent of the Passes themselves.
Pull Request: https://github.com/llvm/llvm-project/pull/172478
This PR introduces a new mechanism for enforcing a sandbox around
filesystem reads coming from the compiler. A fatal error is raised
whenever the `llvm::sys::fs`, `llvm::MemoryBuffer::getFile*()` APIs get
used directly instead of going through the "blessed" virtual interface
of `llvm::vfs::FileSystem`.
This patch adds Clang support for speculative devirtualization and
integrates the related pass into the pass pipeline.
It's building on the LLVM backend implementation from PR #159048.
Speculative devirtualization transforms an indirect call (the virtual
function) to a guarded direct call.
It is guarded by a comparison of the virtual function pointer to the
expected target.
This optimization is still safe without LTO because it doesn't do direct
calls, it's conditional according to the function ptr.
This optimization:
- Opt-in: Disabled by default, enabled via `-fdevirtualize-speculatively`
- Works in non-LTO mode
- Handles publicly-visible objects.
- Uses guarded devirtualization with fallback to indirect calls when the
speculation is incorrect.
For this C++ example:
```
class Base {
public:
__attribute__((noinline))
virtual void virtual_function1() { asm volatile("NOP"); }
virtual void virtual_function2() { asm volatile("NOP"); }
};
class Derived : public Base {
public:
void virtual_function2() override { asm volatile("NOP"); }
};
__attribute__((noinline))
void foo(Base *BV) {
BV->virtual_function1();
}
void bar() {
Base *b = new Derived();
foo(b);
}
```
Here is the IR without enabling speculative devirtualization:
```
define dso_local void @_Z3fooP4Base(ptr noundef %BV) local_unnamed_addr #0 {
entry:
%vtable = load ptr, ptr %BV, align 8, !tbaa !6
%0 = load ptr, ptr %vtable, align 8
tail call void %0(ptr noundef nonnull align 8 dereferenceable(8) %BV)
ret void
}
```
IR after enabling speculative devirtualization:
```
define dso_local void @_Z3fooP4Base(ptr noundef %BV) local_unnamed_addr #0 {
entry:
%vtable = load ptr, ptr %BV, align 8, !tbaa !12
%0 = load ptr, ptr %vtable, align 8
%1 = icmp eq ptr %0, @_ZN4Base17virtual_function1Ev
br i1 %1, label %if.true.direct_targ, label %if.false.orig_indirect, !prof !15
if.true.direct_targ: ; preds = %entry
tail call void @_ZN4Base17virtual_function1Ev(ptr noundef nonnull align 8 dereferenceable(8) %BV)
br label %if.end.icp
if.false.orig_indirect: ; preds = %entry
tail call void %0(ptr noundef nonnull align 8 dereferenceable(8) %BV)
br label %if.end.icp
if.end.icp: ; preds = %if.false.orig_indirect, %if.true.direct_targ
ret void
}
```
The libcall lowering decisions should be program dependent,
depending on the current module's RuntimeLibcallInfo. We need
another related analysis derived from that plus the current
function's subtarget to provide concrete lowering decisions.
This takes on a somewhat unusual form. It's a Module analysis,
with a lookup keyed on the subtarget. This is a separate module
analysis from RuntimeLibraryAnalysis to avoid that depending on
codegen. It's not a function pass to avoid depending on any
particular function, to avoid repeated subtarget map lookups in
most of the use passes, and to avoid any recomputation in the
common case of one subtarget (and keeps it reusable across
repeated compilations).
This also switches ExpandFp and PreISelIntrinsicLowering as
a sample function and module pass. Note this is not yet wired
up to SelectionDAG, which is still using the LibcallLoweringInfo
constructed inside of TargetLowering.
Remove explicit insertion of the AllocTokenPass, which is now handled by
the PassBuilder. Emit AllocToken configuration as LLVM module flags to
persist into the backend.
Specifically, this also means it will now be handled by LTO backend
phases; this avoids interference with other optimizations (e.g. PGHO)
and enable late heap-allocation optimizations with LTO enabled.
Unconditionally add AllocTokenPass to the optimization pipelines, and
ensure that it runs last in LTO backend pipelines. The latter ensures
that AllocToken instrumentation can be moved later in the LTO pipeline
to avoid interference with other optimizations (e.g. PGHO) and enable
late heap-allocation optimizations.
In preparation of removing AllocTokenPass being added by Clang, add
support for AllocTokenPass to read configuration options from LLVM
module flags.
To optimize given the pass is now runs unconditionally, only retrieve
TargetLibraryInfo and OptimizationRemarkEmitter when necessary.
d076608d58d1ec55016eb747a995511e3a3f72aa moved some deps around to avoid
cycles and left clang/Frontend/FrontendDiagnostic.h as a shim that
simply includes clang/Basic/DiagnosticFrontend.h. This PR inlines it so
that nothing in tree still includes clang/Frontend/FrontendDiagnostic.h.
Doing this will help prevent future layering issues. See #162865.
Frontend already depends on Basic, so no new deps need to be added
anywhere except for places that do strict dep checking.
After the base branch was moved to main, this somehow ended up
adding a second definition of RTLCI, instead of modifying the
existing one.
Also fix other build error with gcc bots.
This fixes the -fveclib flag getting lost on its way to the backend.
Previously this was its own cl::opt with a random boolean. Move the
flag handling into CommandFlags with other backend ABI-ish options,
and have clang directly set it, rather than forcing it to go through
command line parsing.
Prior to de68181d7f, codegen used TargetLibraryInfo to find the vector
function. Clang has special handling for TargetLibraryInfo, where it
would
directly construct one with the vector library in the pass pipeline.
RuntimeLibcallsInfo currently is not used as an analysis in codegen, and
needs to know the vector library when constructed.
RuntimeLibraryAnalysis could follow the same trick that
TargetLibraryInfo is using in the future, but a lot more boilerplate changes
are needed to thread that analysis through codegen. Ideally this would come
from an IR module flag, and nothing would be in TargetOptions. For now, it's
better for all of these sorts of controls to be consistent.
Update all uses of variadic `.Cases` to use the initializer list
overload instead. I plan to mark variadic `.Cases` as deprecated in a
followup PR.
For more context, see https://github.com/llvm/llvm-project/pull/163117.
Implement code generation for `__builtin_infer_alloc_token()`. The
`AllocToken` pass is now registered to run unconditionally in the
optimization pipeline. This ensures that all instances of the
`llvm.alloc.token.id` intrinsic are lowered to constant token IDs,
regardless of whether `-fsanitize=alloc-token` is enabled. This
guarantees that the builtin always resolves to a token value, providing
a consistent and reliable mechanism for compile-time token querying.
This completes `__builtin_infer_alloc_token(<malloc-args>, ...)` to
allow compile-time querying of the token ID, where the builtin arguments
mirror those normally passed to any allocation function. The argument
expressions are unevaluated operands. For type-based token modes, the
same type inference logic is used as for untyped allocation calls.
For example the ID that is passed to (with `-fsanitize=alloc-token`):
some_malloc(sizeof(Type), ...)
is equivalent to the token ID returned by
__builtin_infer_alloc_token(sizeof(Type), ...)
The builtin provides a mechanism to pass or compare token IDs in code
that needs to be explicitly allocation token-aware (such as inside an
allocator, or through wrapper macros).
A more concrete demonstration of __builtin_infer_alloc_token's use is
enabling type-aware Slab allocations in the Linux kernel:
https://lore.kernel.org/all/20250825154505.1558444-1-elver@google.com/
Notably, any kind of allocation-call rewriting is a poor fit for the
Linux kernel's kmalloc-family functions, which are macros that wrap
(multiple) layers of inline and non-inline wrapper functions. Given the
Linux kernel defines its own allocation APIs, the more explicit builtin
gives the right level of control over where the type inference happens
and the resulting token is passed.
This PR passes the VFS to LLVM's sanitizer passes from Clang, so that
the configuration files can be loaded in the same way all other compiler
inputs are.
Implement KCFI (Kernel Control Flow Integrity) backend support for
ARM32, Thumb2, and Thumb1. The Linux kernel has supported ARM KCFI via
Clang's generic KCFI implementation, but this has finally started to
[cause problems](https://github.com/ClangBuiltLinux/linux/issues/2124)
so it's time to get the KCFI operand bundle lowering working on ARM.
Supports patchable-function-prefix with adjusted load offsets. Provides
an instruction size worst case estimate of how large the KCFI bundle is
so that range-limited instructions (e.g. cbz) know how big the indirect
calls can become.
ARM implementation notes:
- Four-instruction EOR sequence builds the 32-bit type ID byte-by-byte
to work within ARM's modified immediate encoding constraints.
- Scratch register selection: r12 (IP) is preferred, r3 used as fallback
when r12 holds the call target. r3 gets spilled/reloaded if it is
being used as a call argument.
- UDF trap encoding: 0x8000 | (0x1F << 5) | target_reg_index, similar
to aarch64's trap encoding.
Thumb2 implementation notes:
- Logically the same as ARM
- UDF trap encoding: 0x80 | target_reg_index
Thumb1 implementation notes:
- Due to register pressure, 2 scratch registers are needed: r3 and r2,
which get spilled/reloaded if they are being used as call args.
- Instead of EOR, add/lsl sequence to load immediate, followed by
a compare.
- No trap encoding.
Update tests to validate all three sub targets.
Move the `AllocTokenMax` from `CodeGenOptions` and introduces a new
`AllocTokenMode` to `LangOptions`. Note, `-falloc-token-mode=`
deliberately remains an internal experimental option.
This refactoring is necessary because these options influence frontend
behavior, specifically constexpr evaluation of `__builtin_infer_alloc_token`.
Placing them in `LangOptions` makes them accessible during semantic analysis,
which occurs before codegen.
> clang -flto -funified-lto -save-temps test.c
Fails with the following error message:
```bash
module flag identifiers must be unique (or of 'require' type)
!"UnifiedLTO"
fatal error: error in backend: Broken module found, compilation aborted!
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
```
Here is what the driver does when `-save-temps` flag is set:
> clang -flto -funified-lto -save-temps test.c -ccc-print-phases
> +- 0: input, "test.c", c
> +- 1: preprocessor, {0}, cpp-output
> +- 2: compiler, {1}, ir
> +- 3: backend, {2}, lto-bc
> 4: linker, {3}, image
The IR output of "compiler" step has "UnifiedLTO" module flag. "backend"
step adds another module flag with "UnifiedLTO" identifier, which
invalidates the LLVM IR module.
This PR starts using the correct VFS in `GCOVProfilerPass` instead of
using the real FS directly. This matches compiler's behavior for other
input files.
This PR loads the path from `-fembed-offload-object=<path>` through the
VFS rather than going straight to the real file system. This matches the
behavior of other input files of the compiler. This technically changes
behavior in that `-fembed-offload-object=-` no longer loads the file
from stdin, but I don't think that was the intention of the original
code anyways.
This PR loads the path from `-fbasic-block-sections=list=<path>` through
the VFS rather than going straight to the real file system. This matches
the behavior of other input files of the compiler.
Some LLVM passes need access to the filesystem to read configuration
files and similar. In some places, this is achieved by grabbing the VFS
from `PGOOptions`, but some passes don't have access to these and resort
to just calling `vfs::getRealFileSystem()`. This PR allows setting the
VFS directly on `PassBuilder` that's able to pass it down to all passes
that need it.
Currently there are two serialization modes for bitstream Remarks:
standalone and separate. The separate mode splits remark metadata (e.g.
the string table) from actual remark data. The metadata is written into
the object file by the AsmPrinter, while the remark data is stored in a
separate remarks file. This means we can't use bitstream remarks with
tools like opt that don't generate an object file. Also, it is confusing
to post-process bitstream remarks files, because only the standalone
files can be read by llvm-remarkutil. We always need to use dsymutil
to convert the separate files to standalone files, which only works for
MachO. It is not possible for clang/opt to directly emit bitstream
remark files in standalone mode, because the string table can only be
serialized after all remarks were emitted.
Therefore, this change completely removes the separate serialization
mode. Instead, the remark string table is now always written to the end
of the remarks file. This requires us to tell the serializer when to
finalize remark serialization. This automatically happens when the
serializer goes out of scope. However, often the remark file goes out of
scope before the serializer is destroyed. To diagnose this, I have added
an assert to alert users that they need to explicitly call
finalizeLLVMOptimizationRemarks.
This change paves the way for further improvements to the remark
infrastructure, including more tooling (e.g. #159784), size optimizations
for bitstream remarks, and more.
Pull Request: https://github.com/llvm/llvm-project/pull/156715
This reverts commit 895cda70a95529fd22aac05eee7c34f7624996af.
And fix attempt: 06f671e57a574ba1c5127038eff8e8773273790e.
Performance regressions and broken sanitizers, see #142686.
This patch adds the flag -fexperimental-loop-fuse to the clang and flang
drivers. This is primarily useful for experiments as we envision to
enable the pass one day.
The options are based on the same principles and reason on which we have
`floop-interchange`.
---------
Co-authored-by: Madhur Amilkanthwar <madhura@nvidia.com>
Remove "approx-func-fp-math" attribute and related command line option,
users should always use afn flag in IR.
Resolve FIXME in `TargetMachine::resetTargetOptions` partially.
This PR removes the command line parsing workaround introduced in
https://github.com/llvm/llvm-project/pull/146342 by moving
`LangOptions::ExceptionHandling` to `CodeGenOptions` that get parsed
even for IR input. Additionally, this improves layering, where the
codegen library now checks `CodeGenOptions` instead of `LangOptions` for
exception handling. (This got enabled by
https://github.com/llvm/llvm-project/pull/146422.)