2926 Commits

Author SHA1 Message Date
Bjorn Pettersson
a20f7efbc5 Remove several no longer needed includes. NFCI
Mostly removing includes of InitializePasses.h and Pass.h in
passes that no longer have support for the legacy PM.
2023-04-17 13:54:19 +02:00
Kazu Hirata
c83c4b58d1 [Transforms] Apply fixes from performance-for-range-copy (NFC) 2023-04-16 08:25:28 -07:00
Kazu Hirata
4bac5f8344 Apply fixes from performance-faster-string-find (NFC) 2023-04-16 00:51:27 -07:00
Nikita Popov
62ef97e063 [llvm-c] Remove PassRegistry and initialization APIs
Remove C APIs for interacting with PassRegistry and pass
initialization. These are legacy PM concepts, and are no longer
relevant for the new pass manager.

Calls to these initialization functions can simply be dropped.

Differential Revision: https://reviews.llvm.org/D145043
2023-04-14 12:12:48 +02:00
Ellis Hoag
244be0b0de [InstrProf] Temporal Profiling
As described in [0], this extends IRPGO to support //Temporal Profiling//.

When `-pgo-temporal-instrumentation` is used we add the `llvm.instrprof.timestamp()` intrinsic to the entry of functions which in turn gets lowered to a call to the compiler-rt function `INSTR_PROF_PROFILE_SET_TIMESTAMP()`. A new field in the `llvm_prf_cnts` section stores each function's timestamp. Then in `llvm-profdata merge` we convert these function timestamps into a //trace// and add it to the indexed profile.

Since these traces could significantly increase the profile size, we've added `-max-temporal-profile-trace-length` and `-temporal-profile-trace-reservoir-size` to limit the length of a trace and the number of traces in a profile, respectively.
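
The two limits described above can be illustrated with a minimal reservoir-sampling sketch. This is not the `llvm-profdata` implementation; the function name `sample_traces` and its parameters are hypothetical, and a trace is modeled as a plain list of function names.

```python
import random


def sample_traces(traces, reservoir_size, max_trace_length, rng):
    """Keep at most `reservoir_size` traces, each cut to `max_trace_length`.

    Hypothetical helper: classic reservoir sampling (Algorithm R), so every
    trace seen so far has the same chance of ending up in the reservoir.
    """
    reservoir = []
    for seen, trace in enumerate(traces):
        trace = trace[:max_trace_length]  # limit the length of a trace
        if len(reservoir) < reservoir_size:
            reservoir.append(trace)
        else:
            # Keep the new trace with probability reservoir_size / (seen + 1).
            slot = rng.randrange(seen + 1)
            if slot < reservoir_size:
                reservoir[slot] = trace
    return reservoir
```

Passing an explicit `random.Random` instance keeps the sketch deterministic for a given seed.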

In a future diff we plan to use these traces to construct an optimized function order to reduce the number of page faults during startup.

Special thanks to Julian Mestre for helping with reservoir sampling.

[0] https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068

Reviewed By: snehasish

Differential Revision: https://reviews.llvm.org/D147287
2023-04-11 08:30:52 -07:00
Evgenii Stepanov
e0f7ef4b9c [msan] Fix handling of ParamTLS overflow.
Ironically, MSan copies uninitialized data off the stack into
VAArgTLSCopy in the callee-side handling of va_start. Clamp the copy
size to the actual length of the buffer, and zero-initialize the
remainder.
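
As a rough model of the fix, assuming the TLS buffer and the copy are plain byte buffers (the name `copy_va_shadow` is hypothetical and not part of MSan):

```python
def copy_va_shadow(tls_buffer: bytes, requested_size: int) -> bytearray:
    """Model of the clamped copy: never read past the end of `tls_buffer`."""
    copy = bytearray(requested_size)  # zero-initialized, covers the remainder
    n = min(requested_size, len(tls_buffer))  # clamp to the real buffer length
    copy[:n] = tls_buffer[:n]
    return copy
```

With an 8-byte buffer and a 12-byte request, the first 8 bytes are copied and the last 4 stay zero.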

Differential Revision: https://reviews.llvm.org/D146858
2023-04-04 13:52:09 -07:00
Ellis Hoag
167e8f8b6b [InstrProf] Minimal Block Coverage
This diff implements minimal block coverage instrumentation. When the `-pgo-block-coverage` option is used, basic blocks will be instrumented for block coverage using single byte booleans. The coverage of some basic blocks can be inferred from others, so not every basic block is instrumented. In fact, we found that only ~60% of basic blocks need to be instrumented. These differences lead to less size overhead when compared to instrumenting block counts. For example, block coverage on the clang binary has an overhead of 20 Mi (17%) compared to 56 Mi (47%) with block counts.

Even though block coverage profiles have less precision than block count profiles, they can still be used to guide optimizations. In `PGOUseFunc` we use block coverage to populate edge weights such that BFI gives nonzero counts to only covered blocks. We do this by 1) setting the entry count of covered functions to a large value, i.e., 10000 and 2) populating edge weights using block coverage. In the next diff https://reviews.llvm.org/D125743 we use BFI to guide the machine outliner to avoid outlining covered blocks. This `-pgo-block-coverage` option provides a trade off of generating less precise profiles for faster and smaller instrumented binaries.

The `BlockCoverageInference` class defines the algorithm to find the minimal set of basic blocks that need to be instrumented for coverage. This is different from the Kirchhoff circuit law optimization that is used for edge **counts** because that does not work for block **coverage**. The reason for this is that edge counts can be added together to find a missing count while block coverage cannot since they store boolean values. So we need a new algorithm to find which blocks must be instrumented.
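
A small illustration of that difference, assuming a diamond-shaped CFG (A branches to B or C, and both rejoin at D). The helper names are hypothetical and this is not the `BlockCoverageInference` algorithm itself:

```python
def infer_count_c(count_a: int, count_b: int) -> int:
    # Counts obey flow conservation: every execution of A goes to B or C,
    # so the missing count is a simple subtraction.
    return count_a - count_b


def candidates_for_c(cov_a: bool, cov_b: bool) -> list:
    # Coverage only obeys "A is covered iff B or C is covered", so list every
    # boolean value of C that is consistent with the observed A and B.
    return [c for c in (False, True) if cov_a == (cov_b or c)]


def infer_a_and_d(cov_b: bool, cov_c: bool) -> tuple:
    # Instrumenting B and C is enough: A and D run exactly when B or C runs.
    covered = cov_b or cov_c
    return covered, covered
```

Here `candidates_for_c(True, True)` returns both `False` and `True`: knowing A and B does not determine C, which is why the set of instrumented blocks has to be chosen differently for coverage than for counts.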

The details on this algorithm can be found in this paper titled "Minimum Coverage Instrumentation": https://arxiv.org/abs/2208.13907

Special thanks to Julian Mestre for creating this block coverage inference algorithm.

Binary size of `clang` using `-O2`:

* Base
  * `.text`: 65.8 Mi
  * Total: 119 Mi
* IRPGO (`-fprofile-generate -mllvm -disable-vp -mllvm -debug-info-correlate`)
  * `.text`: 93.0 Mi
  * `__llvm_prf_cnts`: 14.5 Mi
  * Total: 175 Mi
* Minimal Block Coverage (`-fprofile-generate -mllvm -disable-vp -mllvm -debug-info-correlate -mllvm -pgo-block-coverage`)
  * `.text`: 82.1 Mi
  * `__llvm_prf_cnts`: 1.38 Mi
  * Total: 139 Mi

Reviewed By: spupyrev, kyulee

Differential Revision: https://reviews.llvm.org/D124490
2023-03-29 16:24:20 -07:00
Krzysztof Drewniak
916425b2d1 [llvm] Use pointer index type for more GEP offsets (pre-codegen)
Many uses of getIntPtrType() were using that type to calculate the
needed type for GEP offset arguments. However, some time ago,
DataLayout was extended to support pointers where the size of the
pointer is not equal to the size of the values used to index it.

Much code was already migrated to, for example, use getIndexSizeInBits
instead of getPtrSizeInBits, but some rewrites still used
getIntPtrType() to get the type for GEP offsets.

This commit changes uses of getIntPtrType() to getIndexType() where
they are involved in a GEP-related calculation.

In at least one case (bounds check insertion) this resolves a compiler
crash that the new test added here would previously trigger.

This commit does not impact
- C library-related rewrites (memcpy()), which operate under
the assumption that intptr_t == size_t. While all the mechanisms for
breaking this assumption now exist, doing so is outside the scope of
this commit.
- Code generation and below. Note that the use of getIntPtrType() in
CodeGenPrepare will be changed in a future commit.
- Usage of getIntPtrType() in any backend

Depends on D143435

Reviewed By: arichardson

Differential Revision: https://reviews.llvm.org/D143437
2023-03-28 16:41:02 +00:00
Philip Reames
2cfd06ba67 [BoundsChecking] Don't crash on scalable vector sizes 2023-03-23 08:53:41 -07:00
Philip Reames
5eb9acf9be [HWASAN] Instrument scalable load/store without crashing
We can simply push them down the existing call slowpath with some minor changes to how we compute the size argument.
2023-03-23 08:24:30 -07:00
Philip Reames
5bcb4c4da9 [MSAN] Support load and stores of scalable vector types
This adds support for scalable vector types - at least far enough to get basic load and store cases working. It turns out that load/store without origin tracking already worked; I apparently got that working with one of the pre patches to use TypeSize utilities and didn't notice. The code changes here are required to enable origin tracking.

For origin tracking, a 4 byte value - the origin - is broadcast into a shadow region whose size exactly matches the type being accessed. This origin is only written if the shadow value is non-zero. The details of how shadow is computed from the original value being stored aren't relevant for this patch.

The code changes involve two related primitives.

First, we need to be able to perform that broadcast into a scalable sized memory region. This requires the use of a loop, and appropriate bound. The fixed size case optimizes with larger stores and alignment; I did not bother with that for the scalable case for now. We can optimize this codepath later if desired.

Second, we need a way to test if the shadow is zero. The mechanism for this in the code is to convert the shadow value into a scalar, and then zero check that. There's an assumption that this scalar is zero exactly when all elements of the shadow value are zero. As a result, we use an OR reduction on the scalable vector. This is analogous to how e.g. an array is handled. I landed a bunch of cleanup changes to remove other direct uses of the scalar conversion to convince myself there were no other undocumented invariants.
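
The two primitives can be modeled in a few lines. This is a sketch in plain Python rather than the IR the pass emits; `shadow_is_poisoned` and `store_origin` are hypothetical names, the shadow is modeled as a list of integer lanes, and the region size is assumed to be a multiple of 4 bytes that fits inside `origin_mem`.

```python
from functools import reduce
from operator import or_

ORIGIN_SIZE = 4  # bytes per origin value


def shadow_is_poisoned(shadow_lanes) -> bool:
    # OR reduction: the scalar is zero exactly when every lane is zero.
    return reduce(or_, shadow_lanes, 0) != 0


def store_origin(origin_mem, offset, size_in_bytes, origin, shadow_lanes):
    # The origin is only written if the shadow value is non-zero.
    if not shadow_is_poisoned(shadow_lanes):
        return
    word = origin.to_bytes(ORIGIN_SIZE, "little")
    # A loop with a runtime bound stands in for the scalable-size broadcast.
    for i in range(0, size_in_bytes, ORIGIN_SIZE):
        origin_mem[offset + i : offset + i + ORIGIN_SIZE] = word
```

The loop writes one origin word at a time, mirroring the unoptimized scalable path described above rather than the wider stores used for fixed sizes.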

Differential Revision: https://reviews.llvm.org/D146157
2023-03-23 07:37:56 -07:00
Arthur Eubanks
60ebe901eb [HWAsan] Fix returned PreservedAnalyses
Initialization modifies the module.
2023-03-16 09:42:09 -07:00
Kazu Hirata
398af9b43b [llvm] Use *{Map,Set}::contains (NFC) 2023-03-15 18:06:32 -07:00
Philip Reames
e3dac9e93f [MSAN] Replace another open-coded convertToBool instance [nfc]
Note that getCleanShadow always returns Constant::getNullValue so the prior code is equivalent to convertToBool.
2023-03-15 10:24:49 -07:00
Philip Reames
75e22e8699 [MSAN] Inline getShadowTyNoVec into convertShadowToScalar [nfc]
This is an implementation detail of the flattening scheme, so hide it in the implementation thereof.  This does require one caller to go through the appropriate utility, but doing that makes the code cleaner anyways.
2023-03-15 10:24:49 -07:00
Nikita Popov
77c90ebeb7 [ASAN] Use AI.getAllocationSize() helper (NFC) 2023-03-15 16:51:52 +01:00
Philip Reames
de71056a7d [ASAN] Initial support memory checks on scalable vector typed allocas
This patch adjusts the memory instrumentation to account for scalable vector types in allocas. Note that we don't allow scalable vector globals, so we don't need to update that codepath.

A couple points.

First, this simply disables the optimization for scalable allocas. We can revisit this in the future, but it requires a bit of plumbing to get scalable object sizes through the visitor to be useful.

Second, I am simply disabling stack poisoning for scalable vector allocas. This is mostly for staging the change as I can't write a working test for memory instrumentation without doing so. I don't think it's unreasonable to do on its own basis as without the bailout, we crash the compiler.

Differential Revision: https://reviews.llvm.org/D145259
2023-03-15 07:59:42 -07:00
Philip Reames
a98ac8ea55 [MSAN] Minor refactor to reduce future diff [nfc] 2023-03-14 13:18:24 -07:00
Philip Reames
9227f286ac Move utility for acting on each lane of ElementCount to common code [nfc]
This was first written for AddressSanitizer, but I'm about to reuse it for MemorySanitizer as well.
2023-03-14 10:38:02 -07:00
Philip Reames
5d2ddb129c [ASAN] Extract out a helper routine for foreach lane on vectors [nfc]
The new API matches a case we also need in MSAN.  For the moment, I'm staging this as a local-to-ASAN commit, but I expect to move this to a shared location and reuse in the next day or two.
2023-03-13 15:06:58 -07:00
Philip Reames
991e573046 [MSAN] Use TypeSize and related utilities [nfc-ish]
This is part of prework for supporting scalable vector types.  This isn't NFC because it shifts the point of failure (i.e. which assert triggers first), but should be NFC for all non-scalable vector inputs.
2023-03-13 14:10:37 -07:00
Philip Reames
a835577269 [MSAN] Remove usage of FixedVectorType where trivial [nfc]
This is a prepass on generalizing for scalable vectors; I'm just picking off the easy bits.
2023-03-13 13:18:37 -07:00
Philip Reames
dae682ce92 [IRBuilder] Add utilities for materializing scalable values [nfc]
These idioms already appear a number of places in code, and upcoming changes to the various sanitizers continue to need more instances of the same patterns.

Differential Revision: https://reviews.llvm.org/D145945
2023-03-13 11:54:19 -07:00
Philip Reames
368cb421c3 [ASAN] Support memory checks on scalable vector typed masked load and store
This takes the approach of using the loop based formation for scalable vectors only. We could potentially use the loop form for fixed vectors only, but we'd lose the unroll and specialize on constant vector logic which is already present. I don't have a strong opinion on whether the existing logic is worthwhile, I kept it mostly to minimize test churn.

Worth noting is that there is a better lowering available. The plain vector lowering appears to check only the first and last byte. By analogy, we should be able to check only the first active and last active byte in the masked op. This is a more invasive change to asan, and I decided simply supporting scalable vectors at all was a better starting place.

Differential Revision: https://reviews.llvm.org/D145198
2023-03-10 16:20:25 -08:00
Philip Reames
c24d44fd86 [ASAN] Address a style issue noticed during review of D145175 [nfc] 2023-03-09 08:50:59 -08:00
Philip Reames
32188047fc [ASAN] Support memory checks on scalable vector typed loads and stores
This only covers the common load/store case. There will be further patches required for masked load/store and some of the fast-path optimization cases.

Differential Revision: https://reviews.llvm.org/D145175
2023-03-09 07:55:58 -08:00
Marco Elver
61ed64954b [SanitizerBinaryMetadata] Do not add to GPU code
SanitizerBinaryMetadata should only apply to host code, and not GPU
code. Recently AMD GPU target code has experimental sanitizer support.

If we're compiling a mixed host/device source file, only add sanitizer
metadata to host code.

Differential Revision: https://reviews.llvm.org/D145519
2023-03-09 10:15:28 +01:00
Kazu Hirata
912404db78 [ControlHeightReduction] Freeze potentially poisonous conditions
This patch freezes potentially poisonous conditions in conditional
branches so that we do not "move up" conditional branches
"br i1 poison".

Differential Revision: https://reviews.llvm.org/D145008
2023-03-07 13:20:21 -08:00
Nikita Popov
ffe8f47d72 [IR] Add operator<< overload for CmpInst::Predicate (NFC)
I regularly try and fail to use this while debugging.
2023-03-07 15:10:56 +01:00
Philip Reames
67dab19453 [ASAN] Rename TypeSize to TypeStoreSize [mostly NFC]
This is a mechanical prep change for scalable vector support.  All it does is move the point of TypeSize to unsigned (i.e. the unsafe cast) closer to point of use.
2023-03-02 10:47:28 -08:00
Philip Reames
45b6a33b51 [ASAN] Use TypeSize in InterestingMemoryOperand [mostly NFC]
This is a mechanical prep change for scalable vector support.  All it does is move the point of TypeSize to unsigned (i.e. the unsafe cast) closer to point of use.
2023-03-02 07:57:40 -08:00
Kazu Hirata
368c9f8813 Revert "[ControlHeightReduction] Don't combine a "poison" branch"
This reverts commit 38a64aab4a3fbaaeb383638ff654247902796556.

llvm-clang-x86_64-expensive-checks-debian is failing:

https://lab.llvm.org/buildbot/#/builders/16/builds/44249
2023-02-28 17:58:45 -08:00
Kazu Hirata
38a64aab4a [ControlHeightReduction] Don't combine a "poison" branch
Without this patch, the control height reduction pass would combine a
"poison" branch with an earlier well-defined branch, turning the
earlier branch into a "poison" branch also.

This patch fixes the problem by rejecting "poison" conditional
branches.

Differential Revision: https://reviews.llvm.org/D145008
2023-02-28 16:13:30 -08:00
Nikita Popov
26202a57e5 [CGProfile] Don't fetch BFI without profile (NFCI)
Don't fetch BFI if the function has no entry count. Peculiarly,
the implementation was already doing this for the (no longer
existing) legacy PM implementation, but the same principle applies
to the new pass manager. The only reason why the new PM doesn't
have LazyBFI is that with the new pass manager all passes are
lazy.

This improves compile-time for non-PGO builds.
2023-02-28 15:23:07 +01:00
Nikita Popov
b82083b32b [CGProfile] Remove unnecessary analysis callbacks (NFC)
These were used to abstract between NewPM and LegacyPM. Now that
the LegacyPM implementation is gone, we can fetch the analyses
directly from the FAM.
2023-02-28 14:49:38 +01:00
Nikita Popov
43b725f834 [CHR] Do not fetch BFI without profile summary (NFCI)
Do not compute BFI if PGO is not used. This addresses the
compile-time regression from https://reviews.llvm.org/D144769.
2023-02-28 11:31:29 +01:00
Matthew Voss
4364e2429c [NFC][PGO] Prefix duplicate profile MemOp entry diagnostic with 'warning:'
Adding this prefix will indicate clearly that the compiler doesn't exit
when it hits this diagnostic. Searches for other non-fatal diagnostics
will also be able to find this diagnostic easily.
2023-02-27 14:04:27 -08:00
Rong Xu
666731660c [Pass][CHR] Move ControlHeightReduction to module optimization pipeline
This is a modified version of commit b374423304a8 by
Arthur (https://reviews.llvm.org/D143424).

Here we invoke the pass independently of PGOOPT. We now check if the
profile is available through the program summary. This ensures CHR is
called in distributed ThinLTO BE compilation (where PGOOPT might not
be created).

Differential Revision: https://reviews.llvm.org/D144769
2023-02-27 11:47:54 -08:00
Qiongsi Wu
401f768061 [PGO] Setting ValueProfNode Array's Alignment
`instrprof` currently does not set `__llvm_prf_vnds`'s alignment after creating it. The consequence is that the alignment is set to 16 later (c0f3ac1d00/llvm/lib/IR/DataLayout.cpp (L1019)). This can lead to undefined behaviour when we calculate `NumVNodes` in `lprofGetLoadModuleSignature` (c0f3ac1d00/compiler-rt/lib/profile/InstrProfilingMerge.c (L32)). The reason is that when the `__llvm_prf_vnds` array is 16 byte aligned, `__llvm_profile_end_vnodes() - __llvm_profile_begin_vnodes()` may not be a multiple of the size of ValueProfNode (which is 24, 20 on 32 bit targets).

This patch sets `__llvm_prf_vnds`'s alignment to its ABI alignment, which always divides its size. Then `__llvm_profile_end_vnodes() - __llvm_profile_begin_vnodes()` will be a multiple of `sizeof(ValueProfNode)`.
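
The arithmetic can be checked with a simplified layout model. It assumes that several object files each contribute one `__llvm_prf_vnds` array and that the linker pads each contribution up to its alignment; `section_span` is a hypothetical helper, and the alignment of 8 is an assumed ABI alignment for a 64-bit target.

```python
NODE_SIZE = 24  # size of ValueProfNode on 64-bit targets, per the text above


def align_up(offset: int, alignment: int) -> int:
    return (offset + alignment - 1) // alignment * alignment


def section_span(array_sizes, alignment: int) -> int:
    """Bytes from the start of the first array to the end of the last one."""
    end = 0
    for size in array_sizes:
        end = align_up(end, alignment) + size
    return end


# Two contributions of one node each.
span_16 = section_span([NODE_SIZE, NODE_SIZE], 16)  # padded layout
span_8 = section_span([NODE_SIZE, NODE_SIZE], 8)    # assumed ABI alignment
```

Under these assumptions `span_16` is 56 bytes, which is not a multiple of 24, while `span_8` is 48 bytes, which is.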

Reviewed By: w2yehia, MaskRay

Differential Revision: https://reviews.llvm.org/D144302
2023-02-23 19:07:29 -05:00
Liren Peng
529ee9750b [NFC] Use single quotes for single char output during printPipeline
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144365
2023-02-22 02:35:13 +00:00
Kazu Hirata
f8f3db2756 Use APInt::count{l,r}_{zero,one} (NFC) 2023-02-19 22:04:47 -08:00
Marco Elver
421215b919 [SanitizerBinaryMetadata] Support ignore list
For large projects it will be required to opt out entire subdirectories.
In the absence of fine-grained control over the flags passed via the
build system, introduce -fexperimental-sanitize-metadata-ignorelist=.

The format is identical to other sanitizer ignore lists, and its effect
will be to simply not instrument either functions or entire modules
based on the rules in the ignore list file.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D143664
2023-02-10 10:25:48 +01:00
Marco Elver
8c469d1693 [SanitizerBinaryMetadata] Make constructors/destructors hidden
By switching them to external with default visibility, DSOs may not call
their own constructor/destructor. This is incorrect, because they pass
different parameters.

Fix it by marking the ctors/dtors as external linkage but with hidden
visibility.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D143611
2023-02-09 00:46:28 +01:00
Marco Elver
bf9814b705 [SanitizerBinaryMetadata] Emit constants as ULEB128
Emit all constant integers produced by SanitizerBinaryMetadata as
ULEB128 to further reduce binary space used. Increasing the version is
not necessary given this change depends on (and will land) along with
the bump to v2.

To support this, the !pcsections metadata format is extended to allow
for per-section options, encoded in the first MD operand which must
always be a string and contain the section: "<section>!<options>".
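
For reference, a minimal sketch of the standard ULEB128 scheme (7 payload bits per byte, high bit set on every byte except the last). The function names are hypothetical and this is not the code the pass uses to emit metadata:

```python
def encode_uleb128(value: int) -> bytes:
    if value < 0:
        raise ValueError("ULEB128 encodes unsigned integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F  # low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # last byte, high bit clear
            return bytes(out)


def decode_uleb128(data: bytes) -> int:
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:  # high bit clear marks the final byte
            break
    return result
```

Values below 128 take a single byte, which is where the space saving over fixed 4-byte integers comes from.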

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D143484
2023-02-08 13:12:34 +01:00
Marco Elver
3d53b52730 [SanitizerBinaryMetadata] Optimize used space for features and UAR stack args
Optimize the encoding of "covered" metadata by:

 1. Reducing feature mask from 4 bytes to 1 byte (needs increase once we
    reach more than 8 features).

 2. Only emitting UAR stack args size if it is non-zero, saving 4 bytes
    in the common case.

One caveat is that the emitted metadata for function PC (offset), size,
and UAR size (if enabled) are no longer aligned to 4 bytes.

SanitizerBinaryMetadata version base is increased to 2, since the change
is backwards incompatible.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D143482
2023-02-08 13:12:33 +01:00
Fangrui Song
6ce8e716bf [SanitizerBinaryMetadata] Make module_[cd]tor external
If a COMDAT key has a local linkage, it behaves as `comdat nodeduplicate` and
llvm/lib/Linker/LinkModules.cpp does not deduplicate its members.
This is not intended. Switch to an external linkage to allow deduplication.

See also https://maskray.me/blog/2021-07-25-comdat-and-section-group#grp_comdat

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D143530
2023-02-08 00:58:39 -08:00
Archibald Elliott
62c7f035b4 [NFC][TargetParser] Remove llvm/ADT/Triple.h
I also ran `git clang-format` to get the headers in the right order for
the new location, which has changed the order of other headers in two
files.
2023-02-07 12:39:46 +00:00
Ilya Leoshkevich
322e150e33 [MSan] Fix calling pointers to varargs functions on SystemZ
VarArgSystemZHelper.visitCallBase() checks whether the callee has the
"use-soft-float" attribute, but if the callee is a function pointer, a
null pointer dereference happens.

Fix by checking this attribute on the current function. Alternatively,
one could try the callee first, but this is pointless, since one should
not be mixing hardfloat and softfloat code anyway.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D143296
2023-02-06 23:35:13 +01:00
Marco Elver
960b4c3b5d [SanitizerBinaryMetadata] Treat constant globals and non-escaping addresses specially
For atomics metadata, we can make data race analysis more efficient by
entirely ignoring functions that include memory accesses but which only
access non-escaping (non-shared) and/or non-mutable memory. Such
functions will not be considered to be covered by "atomics" metadata,
resulting in the following benefits:

  1. reduces "covered" metadata; and
  2. allows data race analysis to skip such functions.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D143159
2023-02-03 15:35:24 +01:00
Steven Wu
516e301752 [NFC][Profile] Access profile through VirtualFileSystem
Make access to profile data go through the virtual file system so the
inputs can be remapped. In the context of caching, this ensures that
we capture the inputs and provide an immutable input as profile data.

Reviewed By: akyrtzi, benlangmuir

Differential Revision: https://reviews.llvm.org/D139052
2023-02-01 09:25:02 -08:00