3367 Commits

Author SHA1 Message Date
Kazu Hirata
890c4bece2
[memprof] Use SmallVector for InlinedCallStack (NFC) (#114599)
We can stay within 8 inlined elements more than 99% of the time while
building a large application.
2024-11-01 19:52:11 -07:00
Thurston Dang
e549ec529c
[msan] Add handleIntrinsicByApplyingToShadow; support NEON tbl/tbx (#114490)
This adds a general function that handles intrinsics by applying the
intrinsic to the shadows, and applies it to the specific case of Arm
NEON TBL/TBX intrinsics.

This also updates the tests from
https://github.com/llvm/llvm-project/pull/114462
2024-11-01 14:58:45 -07:00
Lei Wang
bef3b54ea1
[InstrPGO] Avoid using global variable to fix potential data race (#114364)
https://github.com/llvm/llvm-project/pull/109837 set a global
variable (`PGOInstrumentColdFunctionOnly`) in PassBuilderPipelines.cpp,
which introduced a data race detected by TSan. To fix this, I decoupled
the flag setting: the flags are now set separately
(`instrument-cold-function-only-path` must be used together
with `--pgo-instrument-cold-function-only`).
2024-10-31 21:28:13 -07:00
Dmitry Chernenkov
d924a9ba03 Revert "[InstrPGO] Support cold function coverage instrumentation (#109837)"
This reverts commit e517cfc531886bf6ed64b4e7109bb3141ac7f430.
2024-10-31 10:55:17 +00:00
Lei Wang
e517cfc531
[InstrPGO] Support cold function coverage instrumentation (#109837)
This patch adds support for cold function coverage instrumentation based
on sampling PGO counts. The major motivation is to detect dead functions
for the services that are optimized with sampling PGO. If a function is
covered by sampling profile count (e.g., those with an entry count > 0),
we choose to skip instrumenting those functions, which significantly
reduces the instrumentation overhead.

More details about the implementation and flags:
- Added a flag `--pgo-instrument-cold-function-only` in
`PGOInstrumentation.cpp` as the main switch to control skipping the
instrumentation.
- Built the extra instrumentation passes (a bundle of passes in
`addPGOInstrPasses`) under the sampling PGO pipeline. This is controlled by
the `--instrument-cold-function-only-path` flag.
- Added a driver flag `-fprofile-generate-cold-function-coverage`, which:
  1. Configures the flags in one place, i.e. adds
  `--instrument-cold-function-only-path=<...>` and
  `--pgo-function-entry-coverage`. Note that the instrumentation file path
  is passed through `--instrument-sample-cold-function-path`, because we
  cannot use `PGOOptions.ProfileFile`, as it's already used by
  `-fprofile-sample-use=<...>`.
  2. Makes the linker link the `compiler_rt.profile` lib (see
  [ToolChain.cpp#L1125-L1131](https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChain.cpp#L1125-L1131)).
- Added a flag (`--pgo-cold-instrument-entry-threshold`) to configure the entry
count used to determine whether a function is cold.

Overall, the full command is like:

```
clang++ -O2 -fprofile-generate-cold-function-coverage=<...> -fprofile-sample-use=<...>  code.cc -o code
```
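
The skip decision described in this message can be sketched as a standalone predicate. This is a simplified illustration, not the code in `PGOInstrumentation.cpp`: the helper name `skipColdInstrumentation` is hypothetical, and the entry count is modelled as a plain integer rather than LLVM's profile metadata.

```cpp
#include <cstdint>

// Hypothetical helper, not an LLVM API.
// EntryCount: the function's entry count taken from the sampling profile.
// Threshold:  plays the role of --pgo-cold-instrument-entry-threshold.
// Returns true when the function is already covered by the sampling
// profile, so coverage instrumentation can be skipped for it.
constexpr bool skipColdInstrumentation(uint64_t EntryCount,
                                       uint64_t Threshold = 0) {
  return EntryCount > Threshold;
}

// With the threshold at 0, any sampled function is skipped.
static_assert(skipColdInstrumentation(1));
// A function never seen by the sampler still gets instrumented.
static_assert(!skipColdInstrumentation(0));
// Raising the threshold widens the set of functions treated as cold.
static_assert(!skipColdInstrumentation(5, 10));
```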
2024-10-28 10:13:45 -07:00
davidtrevelyan
4102625380
[rtsan][llvm][NFC] Rename sanitize_realtime_unsafe attr to sanitize_realtime_blocking (#113155)
# What

This PR renames the newly-introduced llvm attribute
`sanitize_realtime_unsafe` to `sanitize_realtime_blocking`. Likewise,
sibling variables such as `SanitizeRealtimeUnsafe` are renamed to
`SanitizeRealtimeBlocking` respectively. There are no other functional
changes.


# Why?

- There are a number of problems that can cause a function to be
real-time "unsafe",
- we wish to communicate what problems rtsan detects and *why* they're
unsafe, and
- a generic "unsafe" attribute is, in our opinion, too broad a net -
which may lead to future implementations that need extra contextual
information passed through them in order to communicate meaningful
reasons to users.
- We want to avoid this situation and make the runtime library boundary
API/ABI as simple as possible, and
- we believe that restricting the scope of attributes to names like
`sanitize_realtime_blocking` is an effective means of doing so.

We also feel that the symmetry between `[[clang::blocking]]` and
`sanitize_realtime_blocking` is easier to follow as a developer.

# Concerns

- I'm aware that the LLVM attribute `sanitize_realtime_unsafe` has been
part of the tree for a few weeks now (introduced here:
https://github.com/llvm/llvm-project/pull/106754). Given that it hasn't
been released in version 20 yet, am I correct in considering this to not
be a breaking change?
2024-10-26 13:06:11 +01:00
Vitaly Buka
cf8d24531e
[msan] Reduces overhead of #113200, by 10% (#113201)
CTMark #113200 size overhead was 5.3%, now it's 4.7%.

The patch affects only signed integers.

https://alive2.llvm.org/ce/z/Lv5hyi

* The patch replaces code which extracted the sign bit,
maximized/minimized it, then packed it back, with a
simple sign bit flip. Another way to think about the
transformation is as a subtraction of MIN_SINT from
A/B. We then map MIN_SINT to 0, 0 to -MIN_SINT, and
MAX_SINT to MAX_UINT.

* Then, to maximize/minimize A/B, we don't need
to extract the sign bit; we can apply the shadow the
same way as to the other bits.

* After the sign bit flip, we had to switch to the unsigned
versions of the predicates.

* After the change above, getHighestPossibleValue/getLowestPossibleValue
became very similar, so we can combine them into a single function.

* Because the function does the sign bit flip and
requires unsigned predicates to be used for the returned values,
there is no point in keeping it as a class member;
to hide it, we switch to a function-local lambda.
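
The sign bit flip can be illustrated outside of MSan with plain 32-bit integers. The sketch below only demonstrates the arithmetic identity (a signed comparison equals an unsigned comparison after flipping the sign bit of both operands); it does not model shadow propagation, and the helper names are made up for this example.

```cpp
#include <cstdint>

// Flip the sign bit of a 32-bit value. Modulo 2^32 this is the same as
// subtracting INT32_MIN, so the signed range is shifted onto the unsigned
// range while keeping the order of values.
constexpr uint32_t flipSignBit(int32_t V) {
  return static_cast<uint32_t>(V) ^ 0x80000000u;
}

// Signed "less than" expressed through the unsigned predicate.
constexpr bool signedLessViaUnsigned(int32_t A, int32_t B) {
  return flipSignBit(A) < flipSignBit(B);
}

// The mapping described in the commit message:
static_assert(flipSignBit(INT32_MIN) == 0u);         // MIN_SINT -> 0
static_assert(flipSignBit(0) == 0x80000000u);        // 0 -> -MIN_SINT
static_assert(flipSignBit(INT32_MAX) == UINT32_MAX); // MAX_SINT -> MAX_UINT

// The unsigned comparison agrees with the signed one.
static_assert(signedLessViaUnsigned(-1, 0));
static_assert(signedLessViaUnsigned(INT32_MIN, INT32_MAX));
static_assert(!signedLessViaUnsigned(5, -7));
```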
2024-10-24 20:46:49 -07:00
Michael O'Farrell
10f0c1aadd
[PGO] Ensure non-zero entry-count after populateCounters (#112029)
With sampled instrumentation (#69535), profile counts may appear corrupt
and `fixFuncEntryCount` may assert. In particular a function can have a
0 block count for its entry, while later blocks are non zero. This is
only likely to happen for colder functions, so it is reasonable to take
any action that does not crash. Here we simply bail from fixing the
entry count.
2024-10-22 16:05:40 -07:00
Michael O'Farrell
b4fcaa137f
[PGO][SampledInstr] Correct off by 1s and allow 100% sampling (#113350)
This corrects a couple off by ones related to the sampling of
**instrumented** counters, and enables setting 100% rates for burst
sampling (burst duration = period).

Off by ones:
Prior to this change it was impossible to set a period of 65535 because
this was converted to fast sampling, which rolls over at USHRT_MAX + 1
(65536). Similarly, the burst durations would collect burst duration + 1
counts, as they used a ULE comparison.

100% sampling:
Although this is not useful for a productionized use case, it does allow
for more deterministic testing with the sampling checks in place. After
all the off by ones are fixed, allowing for 100% sampling is a matter of
letting burst duration = period.
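
Both off-by-ones can be reproduced with a toy model. This is a sketch under simplifying assumptions, not the instrumentation that LLVM emits: `incrementsUntilWrap` and `samplesPerPeriod` are hypothetical helpers, and the sampling counter is modelled as a loop index.

```cpp
#include <cstdint>

// A 16-bit counter returns to its start after USHRT_MAX + 1 = 65536
// increments, so relying on natural wrap-around cannot express 65535.
inline uint32_t incrementsUntilWrap() {
  uint16_t Counter = 0;
  uint32_t Steps = 0;
  do {
    ++Counter; // wraps from 65535 back to 0
    ++Steps;
  } while (Counter != 0);
  return Steps;
}

// Counts how many of the Period ticks fall inside the burst window.
// Inclusive = true models a "<=" (ULE) check, false models "<" (ULT).
constexpr uint32_t samplesPerPeriod(uint32_t BurstDuration, uint32_t Period,
                                    bool Inclusive) {
  uint32_t Collected = 0;
  for (uint32_t I = 0; I < Period; ++I) {
    bool InBurst = Inclusive ? I <= BurstDuration : I < BurstDuration;
    if (InBurst)
      ++Collected;
  }
  return Collected;
}

static_assert(samplesPerPeriod(3, 10, true) == 4);    // one count too many
static_assert(samplesPerPeriod(3, 10, false) == 3);   // exactly the duration
static_assert(samplesPerPeriod(10, 10, false) == 10); // 100% sampling
```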
2024-10-22 16:01:13 -07:00
Vitaly Buka
c77d8edf80
Revert "Revert "[msan] Switch to -msan-handle-icmp-exact my default"" (#113379)
Reverts llvm/llvm-project#113376

Fixed with #113378
2024-10-22 14:05:35 -07:00
Vitaly Buka
71792dc570
[NFC][msan] Workaround arg evaluation order diff GCC vs Clang (#113378) 2024-10-22 13:31:46 -07:00
Vitaly Buka
c3aa8b7dd6
Revert "[msan] Switch to -msan-handle-icmp-exact my default" (#113376)
Reverts llvm/llvm-project#113200

Breaks bots, see llvm/llvm-project#113200
2024-10-22 13:05:59 -07:00
Vitaly Buka
395093ec15
[msan] Switch to -msan-handle-icmp-exact my default (#113200)
Fixes #111212.

This grows .text by 5.3% on CTMark (or 2.6% on a large internal binary).
Perf regressed by 1.6%. We will try to improve this in follow-up patches.

It is worth paying some performance regression to fix
correctness and avoid issues like #111212.
2024-10-22 12:35:18 -07:00
goldsteinn
c85611e858
[SimplifyLibCall][Attribute] Fix bug where we may keep range attr with incompatible type (#112649)
In a variety of places we change the bitwidth of a parameter but don't
update the attributes.

The issue in this case is from the `range` attribute when inlining
`__memset_chk`. `optimizeMemSetChk` will replace an `i32` with an
`i8`, and if the `i32` had a `range` attr associated with it, this will
cause an error.

Fixes #112633
2024-10-17 10:32:55 -05:00
Jay Foad
85c17e4092
[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706)
Convert many instances of:
  Fn = Intrinsic::getOrInsertDeclaration(...);
  CreateCall(Fn, ...)
to the equivalent CreateIntrinsic call.
2024-10-17 16:20:43 +01:00
Qiongsi Wu
f9d0789064
[PGO] Initialize GCOV Writeout and Reset Functions in the Runtime on AIX (#108570)
This PR registers the writeout and reset functions for `gcov` for all
modules in the PGO runtime, instead of registering them
using global constructors in each module. The change is made for AIX
only, but the same mechanism works on Linux on Power.

When registering such functions using global constructors in each module
without `-ffunction-sections`, the AIX linker cannot garbage collect
unused undefined symbols, because such symbols are grouped in the same
section as the `__sinit` symbol. Keeping such undefined symbols causes
link errors (see test case
https://github.com/llvm/llvm-project/pull/108570/files#diff-500a7e1ba871e1b6b61b523700d5e30987900002add306e1b5e4972cf6d5a4f1R1
for this scenario). This PR implements the initialization in the
runtime, hence avoiding introducing `__sinit` into each module.

The implementation adds a new global variable `__llvm_covinit_functions`
to each module. This new global variable contains the function pointers
to the `Writeout` and `Reset` functions. `__llvm_covinit_functions`'s
section is the named section `__llvm_covinit`. The linker will aggregate
all the `__llvm_covinit` sections from each module
to form one single named section in the final binary. The pair of
functions
```
const __llvm_gcov_init_func_struct *__llvm_profile_begin_covinit();
const __llvm_gcov_init_func_struct *__llvm_profile_end_covinit();
```
are implemented to return the start and end address of this named
section in the final binary, and they are used in function
```
__llvm_profile_gcov_initialize()
```
(which is a constructor function in the runtime) so the runtime knows
the addresses of all the `Writeout` and `Reset` functions from all the
modules.
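
The begin/end walk can be mimicked in ordinary C++ by letting an array stand in for the linker-aggregated section. Everything in this sketch is an assumption made for illustration: `CovInitEntry` imitates `__llvm_gcov_init_func_struct`, `FakeSection` imitates the `__llvm_covinit` section, and the real runtime registers the functions through its own mechanism rather than a `std::vector`.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Stand-in for one module's pair of function pointers.
struct CovInitEntry {
  void (*Writeout)();
  void (*Reset)();
};

inline int WriteoutCalls = 0;
inline void writeoutModuleA() { ++WriteoutCalls; }
inline void writeoutModuleB() { ++WriteoutCalls; }
inline void resetNoop() {}

// Stand-in for the aggregated section: one entry per module.
static const CovInitEntry FakeSection[] = {
    {writeoutModuleA, resetNoop},
    {writeoutModuleB, resetNoop},
};

// Stand-ins for the functions returning the section's start and end address.
inline const CovInitEntry *fakeSectionBegin() { return std::begin(FakeSection); }
inline const CovInitEntry *fakeSectionEnd() { return std::end(FakeSection); }

// What a runtime constructor could do: walk the section once and record
// every module's functions, so no per-module global constructor is needed.
inline std::vector<CovInitEntry> registerAllModules() {
  std::vector<CovInitEntry> Registered;
  for (const CovInitEntry *I = fakeSectionBegin(), *E = fakeSectionEnd();
       I != E; ++I)
    Registered.push_back(*I);
  return Registered;
}
```

Calling `Writeout` on every element of the returned vector runs each module's writeout function exactly once.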

One noticeable implementation detail relevant to AIX is that to preserve
the `__llvm_covinit` from the linker's garbage collection, a `.ref`
pseudo instruction is inserted into them, referring to the section that
contains the `__llvm_gcov_ctr` variables, which are used in the
instrumented code. The `__llvm_gcov_ctr` variables did not belong to
named sections before, but this PR added them to the
`__llvm_gcov_ctr_section` named section, so we can add a `.ref` pseudo
instruction that refers to them in the `__llvm_covinit` section.
2024-10-17 09:32:10 -04:00
thetruestblue
927af63fdd
[SanitizerCoverage] Add an option to gate the invocation of the tracing callbacks (#108328)
Implement -sanitizer-coverage-gated-trace-callbacks to gate the
invocation of the tracing callbacks based on the value of a global
variable, which is stored in a specific section.
When this option is enabled and the global variable is not set, the
instrumentation does not call into the runtime-provided tracing callbacks,
incurring only a trivial branch instead of a function call. It is up to the
runtime to toggle the value of the global variable in order to enable tracing.
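
In source-level terms, the gated form behaves roughly like the sketch below. The names `TraceGate`, `traceCallback`, and `instrumentedEdge` are hypothetical stand-ins; the real gate lives in a dedicated section and the real callback is provided by the runtime.

```cpp
#include <cstdint>

// Stand-in for the gate global that the runtime toggles.
inline bool TraceGate = false;

// Stand-in for a runtime tracing callback such as the trace-pc-guard one.
inline int CallbackCalls = 0;
inline void traceCallback(uint32_t *Guard) {
  (void)Guard;
  ++CallbackCalls;
}

// What the instrumentation at an edge amounts to with gating enabled:
// a load and a branch, and only then the function call.
inline void instrumentedEdge(uint32_t *Guard) {
  if (TraceGate)
    traceCallback(Guard);
}
```

While the gate is off, each instrumented edge costs one predictable branch; flipping the gate turns tracing on without recompiling.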

This option is only supported for trace-pc-guard. 

Note: will add additional support for trace-cmp in a follow up PR.

Patch by Filippo Bigarella

rdar://101626834
2024-10-16 21:52:38 -07:00
Jay Foad
9255850e89 [LLVM] Remove unused variables after #112546 2024-10-16 16:15:34 +01:00
Jay Foad
d9c95efb6c
[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112546)
Convert almost every instance of:
  CreateCall(Intrinsic::getOrInsertDeclaration(...), ...)
to the equivalent CreateIntrinsic call.
2024-10-16 15:43:30 +01:00
Rahul Joshi
6924fc0326
[LLVM] Add Intrinsic::getDeclarationIfExists (#112428)
Add `Intrinsic::getDeclarationIfExists` to lookup an existing
declaration of an intrinsic in a `Module`.
2024-10-16 07:21:10 -07:00
Howard Roark
e36b22f3bf Revert "[PGO] Preserve analysis results when nothing was instrumented (#93421)"
This reverts commit 23c64beeccc03c6a8329314ecd75864e09bb6d97.
2024-10-16 10:50:48 +03:00
Yuta Saito
d4efc3e097
[Coverage][WebAssembly] Add initial support for WebAssembly/WASI (#111332)
Currently, WebAssembly/WASI target does not provide direct support for
code coverage.
This patch set fixes several issues to unlock the feature. The main
changes are:

1. Port `compiler-rt/lib/profile` to WebAssembly/WASI.
2. Adjust profile metadata sections for Wasm object file format.
    - [CodeGen] Emit `__llvm_covmap` and `__llvm_covfun` as custom sections
      instead of data segments.
    - [lld] Align the interval space of custom sections at link time.
    - [llvm-cov] Copy misaligned custom section data if the start address is
      not aligned.
    - [llvm-cov] Read `__llvm_prf_names` from data segments
3. [clang] Link with profile runtime libraries if requested

See each commit message for more details and rationale.
This is part of the effort to add code coverage support in Wasm target
of Swift toolchain.
2024-10-15 02:41:43 +09:00
Pavel Samolysov
23c64beecc
[PGO] Preserve analysis results when nothing was instrumented (#93421)
The `PGOInstrumentationGen` pass should preserve all analysis results
when nothing was actually instrumented. Currently, only modules that
contain at least a single function definition are instrumented. When a
module contains only function declarations and, optionally, global
variable definitions (a module for the regular-LTO phase for thin-LTO
when LTOUnit splitting is enabled, for example), such module is not
instrumented (yet?) and there is no reason to invalidate any analysis
results.

NFC.
2024-10-12 06:29:55 +03:00
Rahul Joshi
fa789dffb1
[NFC] Rename Intrinsic::getDeclaration to getOrInsertDeclaration (#111752)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e., just lookup, no creation).
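
The difference between the two behaviors can be shown with a toy symbol table. This is not LLVM code: `ToyModule`, `getOrInsert`, and `lookupOnly` are hypothetical names that only mirror the "create if missing" versus "just lookup" semantics.

```cpp
#include <map>
#include <string>

// Toy stand-in for a module's table of declarations.
struct ToyModule {
  std::map<std::string, int> Decls;
  int NextId = 0;
};

// "get or insert": returns the existing declaration, creating it if needed.
// Calling it can therefore modify the module.
inline int &getOrInsert(ToyModule &M, const std::string &Name) {
  auto [It, Inserted] = M.Decls.try_emplace(Name, M.NextId);
  if (Inserted)
    ++M.NextId;
  return It->second;
}

// "just lookup": never modifies the module; returns nullptr when the
// declaration does not exist.
inline const int *lookupOnly(const ToyModule &M, const std::string &Name) {
  auto It = M.Decls.find(Name);
  return It == M.Decls.end() ? nullptr : &It->second;
}
```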
2024-10-11 05:26:03 -07:00
Youngsuk Kim
f0ed31ce4b
[llvm][PGOCtxProfLowering] Avoid Type::getPointerTo() (NFC) (#111857)
`Type::getPointerTo()` is to be deprecated & removed soon.
2024-10-10 16:02:13 -04:00
Florian Mayer
5f36042508
[NFC] [HWASan] [MTE] factor out threadlong increment (#110340) 2024-10-08 15:53:01 -07:00
davidtrevelyan
4547d6042a
[llvm][rtsan] Add transform pass for sanitize_realtime_unsafe (#109543) 2024-10-03 06:32:21 -07:00
NAKAMURA Takumi
6c331e50e4
[MC/DC] Rework tvbitmap.update to get rid of the inlined function (#110792)
Per the discussion in #102542, it is safe to insert BBs under
`lowerIntrinsics()` since #69535 has made it tolerant of modifying BBs.

So, I can get rid of using the inlined function `rmw_or`, introduced in
#96040.
2024-10-03 17:57:03 +09:00
Mingming Liu
34f0edd509
[TypeProf][PGO]Support skipping vtable comparisons for a class and its derived ones (#110575)
Performance-critical core libraries can be highly optimized for arch
or micro-arch features. For instance, the absl crc library specializes
different templated classes for different hardware [1]. In a
practical setting, it's likely that instrumented profiles are collected
on one type of machine and used to optimize binaries that run on
multiple types of hardware.

While this kind of specialization is rare in terms of lines of code,
compiler can do a better job to skip vtable-based ICP.
* The per-class `Extend` implementation is arch-specific as well. If an
instrumented profile is collected on one arch and applied to another
arch where `Extend` implementation is different, `Extend` might be
regarded as unlikely function in the latter case. `ABSL_ATTRIBUTE_HOT`
annotation alleviates the problem by putting all `Extend` implementation
into the hot text section [2]

This change introduces a comma-separated list to specify the mangled
vtable names, and ICP pass will skip vtable-based comparison if a vtable
variable definition is shown to be in its class hierarchy (per LLVM type
metadata).

[1]
c6b27359c3/absl/crc/internal/crc_x86_arm_combined.cc (L621-L650)
[2]
c6b27359c3/absl/crc/internal/crc_x86_arm_combined.cc (L370C3-L370C21)
2024-10-02 10:23:54 -07:00
Vitaly Buka
b2180481ec
[hwasan] Consider order of mapping copts (#109621)
Flags "-hwasan-mapping-offset" and
"-hwasan-mapping-offset-dynamic" are mutually
exclusive, use the last one.
2024-09-24 21:11:13 -07:00
Vitaly Buka
4ca4460bae [hwasan] Add "-hwasan-with-frame-record" (#109620)
It should not be implied from mapping settings.
No longer disable frame records for fixed offset.
2024-09-24 19:46:23 -07:00
Vitaly Buka
0673642cab
[hwasan] Replace "-hwasan-with-ifunc" and "-hwasan-with-tls" options (#109619)
The relationship between "-hwasan-mapping-offset",
"-hwasan-with-ifunc", and "-hwasan-with-tls" can
be too hard to understand.

Now we will have "-hwasan-mapping-offset", the
presence of which implies a fixed shadow.

"-hwasan-mapping-offset-dynamic" will select one
of the 3 available dynamic shadows.

As-is, "-hwasan-mapping-offset" has precedence over
"-hwasan-mapping-offset-dynamic". In follow-up
patches we need to use the one with the last
occurrence.
2024-09-23 17:13:25 -07:00
Vitaly Buka
083f0fa454
[NFC][hwasan] Remove code duplication in ShadowMapping::init (#109618)
The goal is to reorder this function so that
initialization happens in the following order:
1. Defaults
2. Target specific overrides
3. Explicit copt<> overrides
2024-09-23 16:55:42 -07:00
Vitaly Buka
8dbb739ffb
[NFC][hwasan] Use enum class in ShadowMapping (#109617) 2024-09-23 15:51:56 -07:00
Vitaly Buka
c9e2c38f2c
[NFC][hwasan] Convert ShadowMapping into class (#109616)
In the next patch we can switch to enum.
2024-09-23 15:34:12 -07:00
Mircea Trofin
783bac7ffb
[ctx_prof] Handle select and its step instrumentation (#109185)
The `step` instrumentation shouldn't be treated, during use, like an `increment`. The latter is treated as a BB ID. The step isn't that, it's more of a type of value profiling. We need to distinguish between the 2 when really looking for BB IDs (==increments), and handle appropriately `step`s. In particular, we need to know when to elide them because `select`s may get elided by function cloning, if the condition of the select is statically known.
2024-09-23 15:21:25 -07:00
Nikita Popov
ecb98f9fed [IRBuilder] Remove uses of CreateGlobalStringPtr() (NFC)
Since the migration to opaque pointers, CreateGlobalStringPtr()
is the same as CreateGlobalString(). Normalize to the latter.
2024-09-23 16:30:50 +02:00
Vitaly Buka
10266279c3 [NFC][hwasan] Add a few of {} 2024-09-22 18:12:59 -07:00
Florian Mayer
0cab475d11
[NFC] [HWASan] pull removeFnAttributes into function (#109488) 2024-09-20 20:37:13 -07:00
Florian Mayer
cdf29709d7 [NFC] [HWASan] fix LLVM style guide violations 2024-09-20 16:29:45 -07:00
Youngsuk Kim
d31e314131 [llvm] Don't call raw_string_ostream::flush() (NFC)
Don't call raw_string_ostream::flush(), which is essentially a no-op.
As specified in the docs, raw_string_ostream is always unbuffered.
( 65b13610a5226b84889b923bae884ba395ad084d for further reference )
2024-09-20 12:19:59 -05:00
Alex Rønne Petersen
72a218056d
[llvm][Triple] Add Environment members and parsing for glibc/musl parity. (#107664)
This adds support for:

* `muslabin32` (MIPS N32)
* `muslabi64` (MIPS N64)
* `muslf32` (LoongArch ILP32F/LP64F)
* `muslsf` (LoongArch ILP32S/LP64S)

As we start adding glibc/musl cross-compilation support for these
targets in Zig, it would make our life easier if LLVM recognized these
triples. I'm hoping this'll be uncontroversial since the same has
already been done for `musleabi`, `musleabihf`, and `muslx32`.

I intentionally left out a musl equivalent of `gnuf64` (LoongArch
ILP32D/LP64D); my understanding is that Loongson ultimately settled on
simply `gnu` for this much more common case, so there doesn't *seem* to
be a particularly compelling reason to add a `muslf64` that's basically
deprecated on arrival.

Note: I don't have commit access.
2024-09-20 08:53:03 +08:00
Pavel Skripkin
8a34f6dba1
[ASAN] Do not consider alignment during object size calculations (#109120)
It was found that ASAN logic optimizes away out-of-bound access
instrumentation for over-aligned arrays. See #108287 for complete code
examples.

Fix it by not considering alignment during object size calculation,
since out-of-bounds access for over-aligned object is still UB and
should be reported by ASAN.
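
A small model shows why rounding the object size up to its alignment hides the bug. The helpers below are hypothetical and only imitate the "provably in bounds, so skip the check" reasoning; they are not ASAN's implementation.

```cpp
#include <cstdint>

// Round Size up to the next multiple of Align.
constexpr uint64_t roundUpToAlign(uint64_t Size, uint64_t Align) {
  return (Size + Align - 1) / Align * Align;
}

// True when an access is statically known to stay inside the object, in
// which case a sanitizer may decide not to instrument it.
constexpr bool isProvablyInBounds(uint64_t Offset, uint64_t AccessSize,
                                  uint64_t ObjectSize) {
  return Offset + AccessSize <= ObjectSize;
}

// Model of a 3-byte array with 16-byte alignment, accessed one byte past
// its end (offset 3).
constexpr uint64_t ArraySize = 3;
constexpr uint64_t ArrayAlign = 16;

// With the size rounded up to 16, the out-of-bounds access looks safe.
static_assert(isProvablyInBounds(3, 1, roundUpToAlign(ArraySize, ArrayAlign)));
// With the real size, the access is not provably safe and stays checked.
static_assert(!isProvablyInBounds(3, 1, ArraySize));
```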

Closes: #108287
2024-09-19 10:16:28 -07:00
Jay Foad
e03f427196
[LLVM] Use {} instead of std::nullopt to initialize empty ArrayRef (#109133)
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
2024-09-19 16:16:38 +01:00
Mircea Trofin
12d94850cd [ctx_prof] Avoid llvm::append_range to fix some build bots
Example: https://lab.llvm.org/buildbot/#/builders/169/builds/3381

The CI allowed the `llvm::append_range` instantiation, but
on the other hand it's quite unnecessary here.
2024-09-18 21:19:28 -07:00
Mircea Trofin
ce9209f50e
[ctx_prof] Fix ProfileAnnotator::allTakenPathsExit (#109183)
Added tests to the validator and fixed issues stemming from the previous skipping over BBs with single successors, which is incorrect. Such issues would now be caught by the added tests, where the assertions are expected to be triggered.
2024-09-18 21:08:34 -07:00
Mircea Trofin
b2d3c315d5
[ctx_prof] Fix checks in PGOCtxprofFlattening (#108467)
The assertion that all out-edges of a BB can't be 0 is incorrect: they
can be, if that branch is on a cold subgraph.

Added validators and asserts about the expected properties of the
propagated counters.
2024-09-17 18:19:20 -07:00
Antonio Frighetto
942e872d5b [Instrumentation] Do not request sanitizers for naked functions
Sanitizer instrumentation may be incompatible with naked functions,
which lack a standard prologue/epilogue.
2024-09-17 09:23:39 +02:00
Antonio Frighetto
2ae968a0d9
[Instrumentation] Move out to Utils (NFC) (#108532)
Utility functions have been moved out to Utils. Minor opportunity to
drop the header where not needed.
2024-09-15 21:07:40 -07:00
Mircea Trofin
82266d3a2b
[nfc][ctx_prof] Factor the callsite instrumentation exclusion criteria (#108471)
Reusing this in the logic fetching the instrumentation in `CtxProfAnalysis`.
2024-09-13 21:25:47 -07:00