923 Commits

Author SHA1 Message Date
Hassnaa Hamdi
9e7ce77573
[Clang]: Support opt-in speculative devirtualization (#159685)
This patch adds Clang support for speculative devirtualization and
integrates the related pass into the pass pipeline.
It's building on the LLVM backend implementation from PR #159048.
Speculative devirtualization transforms an indirect call (the virtual
function) to a guarded direct call.
It is guarded by a comparison of the virtual function pointer to the
expected target.
This optimization is still safe without LTO because it doesn't do direct
calls, it's conditional according to the function ptr.
This optimization:
- Opt-in: Disabled by default, enabled via `-fdevirtualize-speculatively`
- Works in non-LTO mode
- Handles publicly-visible objects.
- Uses guarded devirtualization with fallback to indirect calls when the
speculation is incorrect.

For this C++ example:
```
class Base {
public:
    __attribute__((noinline))
    virtual void virtual_function1() { asm volatile("NOP"); }
    virtual void virtual_function2() { asm volatile("NOP"); }
};
class Derived : public Base {
public:
    void virtual_function2() override { asm volatile("NOP"); }
};
__attribute__((noinline))
void foo(Base *BV) {
    BV->virtual_function1();
}
void bar() {
    Base *b = new Derived();
    foo(b);
}
```
Here is the IR without enabling speculative devirtualization:
```
define dso_local void @_Z3fooP4Base(ptr noundef %BV) local_unnamed_addr #0 {
entry:
  %vtable = load ptr, ptr %BV, align 8, !tbaa !6
  %0 = load ptr, ptr %vtable, align 8
  tail call void %0(ptr noundef nonnull align 8 dereferenceable(8) %BV)
  ret void
}
```
IR after enabling speculative devirtualization:
```
define dso_local void @_Z3fooP4Base(ptr noundef %BV) local_unnamed_addr #0 {
entry:
  %vtable = load ptr, ptr %BV, align 8, !tbaa !12
  %0 = load ptr, ptr %vtable, align 8
  %1 = icmp eq ptr %0, @_ZN4Base17virtual_function1Ev
  br i1 %1, label %if.true.direct_targ, label %if.false.orig_indirect, !prof !15

if.true.direct_targ:                              ; preds = %entry
  tail call void @_ZN4Base17virtual_function1Ev(ptr noundef nonnull align 8 dereferenceable(8) %BV)
  br label %if.end.icp

if.false.orig_indirect:                           ; preds = %entry
  tail call void %0(ptr noundef nonnull align 8 dereferenceable(8) %BV)
  br label %if.end.icp

if.end.icp:                                       ; preds = %if.false.orig_indirect, %if.true.direct_targ
  ret void
}
```
2025-12-07 14:24:30 +00:00
Matt Arsenault
04c81a9973
CodeGen: Add LibcallLoweringInfo analysis pass (#168622)
The libcall lowering decisions should be program dependent,
depending on the current module's RuntimeLibcallInfo. We need
another related analysis derived from that plus the current
function's subtarget to provide concrete lowering decisions.

This takes on a somewhat unusual form. It's a Module analysis,
with a lookup keyed on the subtarget. This is a separate module
analysis from RuntimeLibraryAnalysis to avoid that depending on
codegen. It's not a function pass to avoid depending on any
particular function, to avoid repeated subtarget map lookups in
most of the use passes, and to avoid any recomputation in the
common case of one subtarget (and keeps it reusable across
repeated compilations).

This also switches ExpandFp and PreISelIntrinsicLowering as
a sample function and module pass. Note this is not yet wired
up to SelectionDAG, which is still using the LibcallLoweringInfo
constructed inside of TargetLowering.
2025-12-03 22:00:12 +01:00
Marco Elver
3c6864ab88
[Clang][CodeGen] Remove explicit insertion of AllocToken pass (#169360)
Remove explicit insertion of the AllocTokenPass, which is now handled by
the PassBuilder. Emit AllocToken configuration as LLVM module flags to
persist into the backend.

Specifically, this also means it will now be handled by LTO backend
phases; this avoids interference with other optimizations (e.g. PGHO)
and enable late heap-allocation optimizations with LTO enabled.
2025-12-02 16:13:45 +01:00
Marco Elver
96c69b7393
[LTO][AllocToken] Support AllocToken instrumentation in backend (#169358)
Unconditionally add AllocTokenPass to the optimization pipelines, and
ensure that it runs last in LTO backend pipelines. The latter ensures
that AllocToken instrumentation can be moved later in the LTO pipeline
to avoid interference with other optimizations (e.g. PGHO) and enable
late heap-allocation optimizations.

In preparation of removing AllocTokenPass being added by Clang, add
support for AllocTokenPass to read configuration options from LLVM
module flags.

To optimize given the pass is now runs unconditionally, only retrieve
TargetLibraryInfo and OptimizationRemarkEmitter when necessary.
2025-12-02 10:50:39 +01:00
Florian Mayer
e2a29eca56 [UBSan] Use -fsanitize-handler-preserve-all-regs in codegen
Pull Request: https://github.com/llvm/llvm-project/pull/168645
2025-11-26 17:29:48 -08:00
Jordan Rupprecht
3d3307ecd8
[clang][NFC] Inline Frontend/FrontendDiagnostic.h -> Basic/DiagnosticFrontend.h (#162883)
d076608d58d1ec55016eb747a995511e3a3f72aa moved some deps around to avoid
cycles and left clang/Frontend/FrontendDiagnostic.h as a shim that
simply includes clang/Basic/DiagnosticFrontend.h. This PR inlines it so
that nothing in tree still includes clang/Frontend/FrontendDiagnostic.h.

Doing this will help prevent future layering issues. See #162865.

Frontend already depends on Basic, so no new deps need to be added
anywhere except for places that do strict dep checking.
2025-11-21 03:39:49 +00:00
Matt Arsenault
862d34666f
opt: Fix bad merge of #167996 (#168110)
After the base branch was moved to main, this somehow ended up
adding a second definition of RTLCI, instead of modifying the
existing one.

Also fix other build error with gcc bots.
2025-11-14 12:03:26 -08:00
Matt Arsenault
590ab43e8a
RuntimeLibcalls: Move VectorLibrary handling into TargetOptions (#167996)
This fixes the -fveclib flag getting lost on its way to the backend.

Previously this was its own cl::opt with a random boolean. Move the
flag handling into CommandFlags with other backend ABI-ish options,
and have clang directly set it, rather than forcing it to go through
command line parsing.

Prior to de68181d7f, codegen used TargetLibraryInfo to find the vector
function. Clang has special handling for TargetLibraryInfo, where it
would
directly construct one with the vector library in the pass pipeline.
RuntimeLibcallsInfo currently is not used as an analysis in codegen, and
needs to know the vector library when constructed.

RuntimeLibraryAnalysis could follow the same trick that
TargetLibraryInfo is using in the future, but a lot more boilerplate changes 
are needed to thread that analysis through codegen. Ideally this would come 
from an IR module flag, and nothing would be in TargetOptions. For now, it's 
better for all of these sorts of controls to be consistent.
2025-11-14 11:19:21 -08:00
Jakub Kuderski
4c21d0cb14
[ADT] Prepare to deprecate variadic StringSwitch::Cases. NFC. (#166020)
Update all uses of variadic `.Cases` to use the initializer list
overload instead. I plan to mark variadic `.Cases` as deprecated in a
followup PR.

For more context, see https://github.com/llvm/llvm-project/pull/163117.
2025-11-02 00:12:33 +00:00
Marco Elver
8c8f2df232
[Clang][CodeGen] Implement code generation for __builtin_infer_alloc_token() (#156842)
Implement code generation for `__builtin_infer_alloc_token()`. The
`AllocToken` pass is now registered to run unconditionally in the
optimization pipeline.  This ensures that all instances of the
`llvm.alloc.token.id` intrinsic are lowered to constant token IDs,
regardless of whether `-fsanitize=alloc-token` is enabled. This
guarantees that the builtin always resolves to a token value, providing
a consistent and reliable mechanism for compile-time token querying.

This completes `__builtin_infer_alloc_token(<malloc-args>, ...)` to
allow compile-time querying of the token ID, where the builtin arguments
mirror those normally passed to any allocation function. The argument
expressions are unevaluated operands. For type-based token modes, the
same type inference logic is used as for untyped allocation calls.

For example the ID that is passed to (with `-fsanitize=alloc-token`):

    some_malloc(sizeof(Type), ...)

is equivalent to the token ID returned by

    __builtin_infer_alloc_token(sizeof(Type), ...)

The builtin provides a mechanism to pass or compare token IDs in code
that needs to be explicitly allocation token-aware (such as inside an
allocator, or through wrapper macros).

A more concrete demonstration of __builtin_infer_alloc_token's use is
enabling type-aware Slab allocations in the Linux kernel:

  https://lore.kernel.org/all/20250825154505.1558444-1-elver@google.com/

Notably, any kind of allocation-call rewriting is a poor fit for the
Linux kernel's kmalloc-family functions, which are macros that wrap
(multiple) layers of inline and non-inline wrapper functions. Given the
Linux kernel defines its own allocation APIs, the more explicit builtin
gives the right level of control over where the type inference happens
and the resulting token is passed.
2025-10-28 16:55:29 +01:00
Jan Svoboda
c1f652883b
[llvm][clang] Explicitly pass the VFS to sanitizer passes (#165267)
This PR passes the VFS to LLVM's sanitizer passes from Clang, so that
the configuration files can be loaded in the same way all other compiler
inputs are.
2025-10-27 10:52:27 -07:00
Kees Cook
d130f40264
[ARM][KCFI] Add backend support for Kernel Control-Flow Integrity (#163698)
Implement KCFI (Kernel Control Flow Integrity) backend support for
ARM32, Thumb2, and Thumb1. The Linux kernel has supported ARM KCFI via
Clang's generic KCFI implementation, but this has finally started to
[cause problems](https://github.com/ClangBuiltLinux/linux/issues/2124)
so it's time to get the KCFI operand bundle lowering working on ARM.

Supports patchable-function-prefix with adjusted load offsets. Provides
an instruction size worst case estimate of how large the KCFI bundle is
so that range-limited instructions (e.g. cbz) know how big the indirect
calls can become.

ARM implementation notes:
- Four-instruction EOR sequence builds the 32-bit type ID byte-by-byte
  to work within ARM's modified immediate encoding constraints.
- Scratch register selection: r12 (IP) is preferred, r3 used as fallback
  when r12 holds the call target. r3 gets spilled/reloaded if it is
  being used as a call argument.
- UDF trap encoding: 0x8000 | (0x1F << 5) | target_reg_index, similar
  to aarch64's trap encoding.

Thumb2 implementation notes:
- Logically the same as ARM
- UDF trap encoding: 0x80 | target_reg_index

Thumb1 implementation notes:
- Due to register pressure, 2 scratch registers are needed: r3 and r2,
  which get spilled/reloaded if they are being used as call args.
- Instead of EOR, add/lsl sequence to load immediate, followed by
  a compare.
- No trap encoding.

Update tests to validate all three sub targets.
2025-10-23 08:27:13 -07:00
Marco Elver
f7fb52aea0
[Clang] Move AllocToken frontend options to LangOptions (#163635)
Move the `AllocTokenMax` from `CodeGenOptions` and introduces a new
`AllocTokenMode` to `LangOptions`. Note, `-falloc-token-mode=`
deliberately remains an internal experimental option.

This refactoring is necessary because these options influence frontend
behavior, specifically constexpr evaluation of `__builtin_infer_alloc_token`.
Placing them in `LangOptions` makes them accessible during semantic analysis,
which occurs before codegen.
2025-10-22 14:58:06 +02:00
paperchalice
bc7e3ff5c5
[clang] Remove rest TargetOptions::UnsafeFPMath use (#164525)
This should be the last use in clang.
2025-10-22 11:43:41 +08:00
Alexey Bader
4a44f03271
[Clang][LTO] Fix use of funified-lto and save-temps flags together (#162763)
> clang -flto -funified-lto -save-temps test.c

Fails with the following error message:

```bash
module flag identifiers must be unique (or of 'require' type)
!"UnifiedLTO"
fatal error: error in backend: Broken module found, compilation aborted!
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
```

Here is what the driver does when `-save-temps` flag is set:

> clang -flto -funified-lto -save-temps test.c -ccc-print-phases
>          +- 0: input, "test.c", c
>       +- 1: preprocessor, {0}, cpp-output
>    +- 2: compiler, {1}, ir
> +- 3: backend, {2}, lto-bc
> 4: linker, {3}, image

The IR output of "compiler" step has "UnifiedLTO" module flag. "backend"
step adds another module flag with "UnifiedLTO" identifier, which
invalidates the LLVM IR module.
2025-10-15 13:46:39 -07:00
Prabhu Rajasekaran
e6b49ceeaa
[clang] Introduce CallGraphSection codegen option (#117037) 2025-10-11 13:59:18 -07:00
Marco Elver
774ffe5cce
[Clang] Wire up -fsanitize=alloc-token (#156839)
Wire up the `-fsanitize=alloc-token` command-line option, hooking up
the `AllocToken` pass -- it provides allocation tokens to compatible
runtime allocators, enabling different heap organization strategies,
e.g. hardening schemes based on heap partitioning.

The instrumentation rewrites standard allocation calls into variants
that accept an additional `size_t token_id` argument. For example,
calls to `malloc(size)` become `__alloc_token_malloc(size, token_id)`,
and a C++ `new MyType` expression will call
`__alloc_token__Znwm(size, token_id)`.

Currently untyped allocation calls do not yet have `!alloc_token`
metadata, and therefore receive the fallback token only. This will be
fixed in subsequent changes through best-effort type-inference.

One benefit of the instrumentation approach is that it can be applied
transparently to large codebases, and scales in deployment as other
sanitizers.

Similarly to other sanitizers, instrumentation can selectively be
controlled using `__attribute__((no_sanitize("alloc-token")))`. Support
for sanitizer ignorelists to disable instrumentation for specific
functions or source files is implemented.

See clang/docs/AllocToken.rst for more usage instructions.

Link:
https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434

---

This change is part of the following series:
  1. https://github.com/llvm/llvm-project/pull/160131
  2. https://github.com/llvm/llvm-project/pull/156838
  3. https://github.com/llvm/llvm-project/pull/162098
  4. https://github.com/llvm/llvm-project/pull/162099
  5. https://github.com/llvm/llvm-project/pull/156839
  6. https://github.com/llvm/llvm-project/pull/156840
  7. https://github.com/llvm/llvm-project/pull/156841
  8. https://github.com/llvm/llvm-project/pull/156842
2025-10-08 20:59:24 +02:00
Jan Svoboda
cbfe89f0ec
[llvm][clang] Use the VFS in GCOVProfilerPass (#161260)
This PR starts using the correct VFS in `GCOVProfilerPass` instead of
using the real FS directly. This matches compiler's behavior for other
input files.
2025-09-29 21:37:18 +00:00
Jan Svoboda
37282bcee1
[clang] Load -fembed-offload-object= through the VFS (#160906)
This PR loads the path from `-fembed-offload-object=<path>` through the
VFS rather than going straight to the real file system. This matches the
behavior of other input files of the compiler. This technically changes
behavior in that `-fembed-offload-object=-` no longer loads the file
from stdin, but I don't think that was the intention of the original
code anyways.
2025-09-26 10:35:22 -07:00
Jan Svoboda
078a4e93da
[clang] Load -fbasic-block-sections=list= through the VFS (#160785)
This PR loads the path from `-fbasic-block-sections=list=<path>` through
the VFS rather than going straight to the real file system. This matches
the behavior of other input files of the compiler.
2025-09-25 15:25:08 -07:00
Jan Svoboda
a5569b4bd7
[llvm] Add vfs::FileSystem to PassBuilder (#160188)
Some LLVM passes need access to the filesystem to read configuration
files and similar. In some places, this is achieved by grabbing the VFS
from `PGOOptions`, but some passes don't have access to these and resort
to just calling `vfs::getRealFileSystem()`. This PR allows setting the
VFS directly on `PassBuilder` that's able to pass it down to all passes
that need it.
2025-09-25 10:15:47 -07:00
Tobias Stadler
dfbd76bda0
[Remarks] Restructure bitstream remarks to be fully standalone (#156715)
Currently there are two serialization modes for bitstream Remarks:
standalone and separate. The separate mode splits remark metadata (e.g.
the string table) from actual remark data. The metadata is written into
the object file by the AsmPrinter, while the remark data is stored in a
separate remarks file. This means we can't use bitstream remarks with
tools like opt that don't generate an object file. Also, it is confusing
to post-process bitstream remarks files, because only the standalone
files can be read by llvm-remarkutil. We always need to use dsymutil
to convert the separate files to standalone files, which only works for
MachO. It is not possible for clang/opt to directly emit bitstream
remark files in standalone mode, because the string table can only be
serialized after all remarks were emitted.

Therefore, this change completely removes the separate serialization
mode. Instead, the remark string table is now always written to the end
of the remarks file. This requires us to tell the serializer when to
finalize remark serialization. This automatically happens when the
serializer goes out of scope. However, often the remark file goes out of
scope before the serializer is destroyed. To diagnose this, I have added
an assert to alert users that they need to explicitly call
finalizeLLVMOptimizationRemarks.

This change paves the way for further improvements to the remark
infrastructure, including more tooling (e.g. #159784), size optimizations
for bitstream remarks, and more.

Pull Request: https://github.com/llvm/llvm-project/pull/156715
2025-09-22 16:41:39 +01:00
Jan Svoboda
152a2162a1
[llvm][clang] Pass VFS to llvm::cl command line handling (#159174)
This PR passes the VFS down to `llvm::cl` functions so that they don't
assume the real file system.
2025-09-18 14:53:40 -07:00
Madhur Amilkanthwar
a134b06217
Reapply "Introduce -fexperimental-loop-fusion to clang and flang (#158844)
This PR is a reapplication of
https://github.com/llvm/llvm-project/pull/142686
2025-09-16 16:43:37 +05:30
Vitaly Buka
cedceeb800
Revert "Introduce -fexperimental-loop-fuse to clang and flang (#142686)" (#158764)
This reverts commit 895cda70a95529fd22aac05eee7c34f7624996af.
And fix attempt: 06f671e57a574ba1c5127038eff8e8773273790e.

Performance regressions and broken sanitizers, see #142686.
2025-09-16 02:30:18 +00:00
Sebastian Pop
895cda70a9
Introduce -fexperimental-loop-fuse to clang and flang (#142686)
This patch adds the flag -fexperimental-loop-fuse to the clang and flang
drivers. This is primarily useful for experiments as we envision to
enable the pass one day.

The options are based on the same principles and reason on which we have
`floop-interchange`.

---------

Co-authored-by: Madhur Amilkanthwar <madhura@nvidia.com>
2025-09-15 18:48:32 +05:30
paperchalice
205d461a19
[IR][CodeGen] Remove "approx-func-fp-math" attribute (#155740)
Remove "approx-func-fp-math" attribute and related command line option,
users should always use afn flag in IR.
Resolve FIXME in `TargetMachine::resetTargetOptions` partially.
2025-08-29 09:52:07 +08:00
AZero13
f2fe4718aa
[ObjCARC] Completely remove ObjCARCAPElimPass (#150717)
ObjCARCAPElimPass has been made obsolete now that we remove unused
autorelease pools.
2025-07-26 08:07:27 -07:00
Jan Svoboda
76058c0907
[clang] Move ExceptionHandling from LangOptions to CodeGenOptions (#148982)
This PR removes the command line parsing workaround introduced in
https://github.com/llvm/llvm-project/pull/146342 by moving
`LangOptions::ExceptionHandling` to `CodeGenOptions` that get parsed
even for IR input. Additionally, this improves layering, where the
codegen library now checks `CodeGenOptions` instead of `LangOptions` for
exception handling. (This got enabled by
https://github.com/llvm/llvm-project/pull/146422.)
2025-07-16 07:11:13 -07:00
Stephen Tozer
242996efee
[Clang][DLCov][NFCish] Fix debugloc coverage tracking macro in Clang (#146521)
In a previous commit, the llvm-config-defined macro
LLVM_ENABLE_DEBUGLOC_COVERAGE_TRACKING was renamed to
LLVM_ENABLE_DEBUGLOC_TRACKING_COVERAGE. One instance of this in Clang
remains unchanged; this patch renames it, and adds an explicit
llvm-config inclusion to ensure the define doesn't silently get removed.

NFC outside of coverage tracking builds, which we do not currently test.
2025-07-02 15:57:56 +01:00
Florian Mayer
8d2034cf68
[clang] Add flag fallow-runtime-check-skip-hot-cutoff (#145999)
Co-authored-by: Kazu Hirata <kazu@google.com>
2025-06-27 13:46:54 -07:00
Andrew Rogers
a451fff1ad
[llvm] fix extern cl::opt definitions for DLL export (#145374)
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch ensures a few `cl::opt` declarations
are properly annotated with `LLVM_ABI`. The annotations currently have
no meaningful impact on the LLVM build; however, they are a prerequisite
to support an LLVM Windows DLL (shared library) build.

## Background
This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).

## Overview
- Remove local `extern` declarations of `llvm::PrintPipelinePasses`
because it is already correctly declared with an `LLVM_ABI` annotation
in `llvm\Passes\PassBuilder.h`. Leaving these declarations results in a
gcc compile warning unless they are also annotated with `LLVM_ABI`.
- Similarly, remove local `extern` declarations of
`ProfileSummaryCutoffHot` and `UseContextLessSummary` from
`llvm/tools/llvm-profgen/ProfileGenerator.cpp` since they are declared
with `LLVM_ABI` in `llvm\ProfileData\ProfileCommon.h`.
- Explicitly annotate the extern declaration of `ProfileCorrelate` in
`clang/lib/CodeGen/BackendUtil.cpp` since it is not declared in a
header. The definition of `ProfileCorrelate` in
`llvm\lib\Transforms\Instrumentation\InstrProfiling.cpp` is already
annotated with `LLVM_ABI`.

## Validation
Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:

- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
2025-06-24 16:23:00 -07:00
FYK
52d34865b9
Fix and reapply IR PGO support for Flang (#142892)
This PR resubmits the changes from #136098, which was previously
reverted due to a build failure during the linking stage:

```
undefined reference to `llvm::DebugInfoCorrelate'  
undefined reference to `llvm::ProfileCorrelate'
```

The root cause was that `llvm/lib/Frontend/Driver/CodeGenOptions.cpp`
references symbols from the `Instrumentation` component, but the
`LINK_COMPONENTS` in the `llvm/lib/Frontend/CMakeLists.txt` for
`LLVMFrontendDriver` did not include it. As a result, linking failed in
configurations where these components were not transitively linked.

### Fix:

This updated patch explicitly adds `Instrumentation` to
`LINK_COMPONENTS` in the relevant `llvm/lib/Frontend/CMakeLists.txt`
file to ensure the required symbols are properly resolved.

---------

Co-authored-by: ict-ql <168183727+ict-ql@users.noreply.github.com>
Co-authored-by: Chyaka <52224511+liliumshade@users.noreply.github.com>
Co-authored-by: Tarun Prabhu <tarunprabhu@gmail.com>
2025-06-13 12:05:16 -06:00
Nuri Amari
347186b259
Avoid Assertion Failure Using -fcs-profile-generate with distributed thin-lto (#129736)
When using `-fcs-generate-profile` with distributed thin-lto in the same
fashion we do for local thin-lto, we hit the following assertion:


6041c745f3/llvm/lib/Support/PGOOptions.cpp (L36)

Using local thin-lto with LLD for MachO, we set the missing path
automatically to a default value: https://reviews.llvm.org/D151589. In
this fix we add the same behavior.

---------

Co-authored-by: Nuri Amari <nuriamari@fb.com>
2025-06-06 17:58:19 -04:00
Snehasish Kumar
16c7b3c9f5
[MemProf] Split MemProfiler into Instrumentation and Use. (#142811)
Most of the recent development on the MemProfiler has been on the Use part. The instrumentation has been quite stable for a while. As the complexity of the use grows (with undrifting, diagnostics etc) I figured it would be good to separate these two implementations.
2025-06-05 07:36:50 -07:00
Tarun Prabhu
597340b5b6
Revert "Add IR Profile-Guided Optimization (IR PGO) support to the Flang compiler" (#142159)
Reverts llvm/llvm-project#136098
2025-05-30 08:27:08 -06:00
FYK
d27a210a77
Add IR Profile-Guided Optimization (IR PGO) support to the Flang compiler (#136098)
This patch implements IR-based Profile-Guided Optimization support in
Flang through the following flags:

- `-fprofile-generate` for instrumentation-based profile generation

- `-fprofile-use=<dir>/file` for profile-guided optimization

Resolves #74216 (implements IR PGO support phase)

**Key changes:**

- Frontend flag handling aligned with Clang/GCC semantics

- Instrumentation hooks into LLVM PGO infrastructure

- LIT tests verifying:

    - Instrumentation metadata generation

    - Profile loading from specified path

    - Branch weight attribution (IR checks)

**Tests:**

- Added gcc-flag-compatibility.f90 test module verifying:

    -  Flag parsing boundary conditions

    -  IR-level profile annotation consistency

    -  Profile input path normalization rules

- SPEC2006 benchmark results will be shared in comments

For details on LLVM's PGO framework, refer to [Clang PGO
Documentation](https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization).

This implementation was developed by [XSCC Compiler
Team](https://github.com/orgs/OpenXiangShan/teams/xscc).

---------

Co-authored-by: ict-ql <168183727+ict-ql@users.noreply.github.com>
Co-authored-by: Tom Eccles <t@freedommail.info>
2025-05-30 08:13:53 -06:00
Kazu Hirata
8075c15f54
[CodeGen] Remove unused includes (NFC) (#141418)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-05-25 10:55:28 -07:00
Kees Cook
c7b2d98c93
[sancov] Introduce optional callback for stack-depth tracking (#138323)
Normally -fsanitize-coverage=stack-depth inserts inline arithmetic to
update thread_local __sancov_lowest_stack. To support stack depth
tracking in the Linux kernel, which does not implement traditional
thread_local storage, provide the option to call a function instead.

This matches the existing "stackleak" implementation that is supported
in Linux via a GCC plugin. To make this coverage more performant, a
minimum estimated stack depth can be chosen to enable the callback mode,
skipping instrumentation of functions with smaller stacks.

With -fsanitize-coverage-stack-depth-callback-min set greater than 0,
the __sanitize_cov_stack_depth() callback will be injected when the
estimated stack depth is greater than or equal to the given minimum.
2025-05-07 05:41:24 -07:00
Stephen Tozer
92195f6fc8 Reapply "[DLCov] Implement DebugLoc coverage tracking (#107279)"
Reapplied after fixing the config issue that was causing issues following
the previous merge.

This reverts commit fdbf073a86573c9ac4d595fac8e06d252ce1469f.
2025-04-30 11:39:29 +01:00
Reid Kleckner
db2315afa8
[clang] Merge gtest binaries into AllClangUnitTests (#134196)
This reduces the size of the clang/unittests build directory by 64% and
my overall build dir size by 5%. Static linking is the real driving
factor here, but even if the default build configuration used shared
libraries, I don't see why we should be building so many unit test
binaries.

To make the project more approachable for new contributors, I'm
attempting to make the build a bit less resource-intensive. Build
directory size is a common complaint, and this is low-hanging fruit.

I've noticed that incremental builds leave behind the old, stale gtest binaries, and lit will keep running them. This mostly doesn't matter unless they use shared libraries, which will eventually stop working after successive builds. You can clean up the old test binaries with this command in the build directory:
  $ find tools/clang/unittests/ -iname '*Tests' -type f | xargs rm

... or you can simply clean the build directory in a more holistic way.

---------

Co-authored-by: Petr Hosek <phosek@google.com>
2025-04-29 06:32:03 -07:00
Stephen Tozer
fdbf073a86 Revert "[DLCov] Implement DebugLoc coverage tracking (#107279)"
This reverts commit a9d93ecf1f8d2cfe3f77851e0df179b386cff353.

Reverted due to the commit including a config in LLVM headers that is not
available outside of the llvm source tree.
2025-04-25 00:36:28 +01:00
Stephen Tozer
a9d93ecf1f
[DLCov] Implement DebugLoc coverage tracking (#107279)
This is part of a series of patches that tries to improve DILocation bug
detection in Debugify; see the review for more details. This is the patch
that adds the main feature, adding a set of `DebugLoc::get<Kind>`
functions that can be used for instructions with intentionally empty
DebugLocs to prevent Debugify from treating them as bugs, removing the
currently-pervasive false positives and allowing us to use Debugify (in
its original DI preservation mode) to reliably detect existing bugs and
regressions. This patch does not add uses of these functions, except for
once in Clang before optimizations, and in
`Instruction::dropLocation()`, since that is an obvious case that
immediately removes a set of false positives.
2025-04-24 19:41:25 +01:00
Alex Voicu
1bcec036e1
[HIP][HIPSTDPAR][NFC] Re-order & adapt hipstdpar specific passes (#134753)
The `hipstdpar` specific passes were not ordered ideally, especially for
`fgpu-rdc` compilations, which meant that we'd eagerly run accelerator
code selection and remove symbols that might end up used. This change
corrects that aspect by ensuring that accelerator code selection is only
done after linking (this will have to be revisited in the future once
the closed-world assumption no longer holds). Furthermore, we take the
opportunity to move allocation interposition so that it properly gets
printed when print-pipeline-passes is requested. NFC.
2025-04-15 00:47:09 +03:00
Nikita Popov
f137c3d592
[TargetRegistry] Accept Triple in createTargetMachine() (NFC) (#130940)
This avoids doing a Triple -> std::string -> Triple round trip in lots
of places, now that the Module stores a Triple.
2025-03-12 17:35:09 +01:00
Nikita Popov
979c275097
[IR] Store Triple in Module (NFC) (#129868)
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.

For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibilty purses, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.

The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
2025-03-06 10:27:47 +01:00
Wael Yehia
8e61aae4a8
[profile] Add a clang option -fprofile-continuous that enables continuous instrumentation profiling mode (#124353)
In Continuous instrumentation profiling mode, profile or coverage data
collected via compiler instrumentation is continuously synced to the
profile file. This feature has existed for a while, and is documented
here:

https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program
This PR creates a user facing option to enable the feature.

---------

Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
2025-02-08 17:25:07 -05:00
Sjoerd Meijer
612df14c00
[Clang][Driver] Add an option to control loop-interchange (#125830)
This introduces options `-floop-interchange` and `-fno-loop-interchange`
to enable/disable the loop-interchange pass. This is part of the work
that tries to get that pass enabled by default (#124911), where it was
remarked that a user facing option to control this would be convenient
to have. The option name is the same as GCC's.
2025-02-07 10:31:24 +00:00
Thurston Dang
9c0606a08b
Reapply "[ubsan] Connect -fsanitize-skip-hot-cutoff to LowerAllowCheckPass<cutoffs>" (#125032) (#125037)
This reverts commit 928cad49beec0120686478f502899222e836b545 i.e.,
relands dccd27112722109d2e2f03e8da9ce8690f06e11b, with a fix to avoid
use-after-scope by changing the lambda to capture by value.
2025-01-30 09:37:16 -08:00
Thurston Dang
928cad49be
Revert "[ubsan] Connect -fsanitize-skip-hot-cutoff to LowerAllowCheckPass<cutoffs>" (#125032)
Reverts llvm/llvm-project#124857 due to buildbot breakage
(https://lab.llvm.org/buildbot/#/builders/46/builds/11310)
2025-01-29 22:03:05 -08:00