8 Commits

Author SHA1 Message Date
Dean Michael Berris
21d4a1eec7 [XRay][compiler-rt] Avoid InternalAlloc(...) in Profiling Mode
Summary:
We avoid using dynamic memory allocated with the internal allocator in
the profile collection service used by profiling mode. We use aligned
storage for globals and in-struct storage of objects we dynamically
initialize.
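
As a rough illustration of the "aligned storage for globals" idea, the sketch below constructs an object in a statically allocated, suitably aligned buffer with placement new, so no heap allocator is involved. The names `ProfileState`, `StateStorage`, `getState` and `resetState` are hypothetical and are not taken from the XRay sources.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical stand-in for the profile collector's state (not an XRay type).
struct ProfileState {
  std::uint64_t Samples;
  explicit ProfileState(std::uint64_t Initial) : Samples(Initial) {}
};

// Raw, suitably aligned bytes with static storage duration. No constructor
// runs at start-up and no dynamic allocation is needed for this buffer.
alignas(ProfileState) static unsigned char StateStorage[sizeof(ProfileState)];
static ProfileState *State = nullptr;

// Constructs the object inside the static buffer on first use.
// Assumption: callers serialise access; this sketch is not thread-safe.
ProfileState &getState() {
  if (State == nullptr)
    State = new (StateStorage) ProfileState(0);
  return *State;
}

// Destroys the object in place so the buffer can be initialised again.
void resetState() {
  if (State != nullptr) {
    State->~ProfileState();
    State = nullptr;
  }
}
```

The object's address is always inside `StateStorage`, so its lifetime can be restarted with `resetState()` without touching any allocator.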

We also remove the dependency on `Vector<...>` which also internally
uses the dynamic allocator in sanitizer_common (InternalAlloc) in favour
of the XRay allocator and segmented array implementation.

This change addresses llvm.org/PR38577.

Reviewers: eizan

Reviewed By: eizan

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50782

llvm-svn: 339978
2018-08-17 01:57:42 +00:00
David Carlier
12be7b7bf7 [Xray] fix c99 warning build about flexible array semantics
Reviewers: dberris

Reviewed By: dberris

Differential Revision: https://reviews.llvm.org/D49590

llvm-svn: 337536
2018-07-20 09:22:22 +00:00
Dean Michael Berris
4719c52455 [XRay][compiler-rt] Segmented Array: Simplify and Optimise
Summary:
This is a follow-on to D49217 which simplifies and optimises the
implementation of the segmented array. In this patch we co-locate the
book-keeping for segments in the `__xray::Array<T>` with the data it's
managing. We take the chance in this patch to actually rename `Chunk` to
`Segment` to better align with the high-level description of the
segmented array.

With measurements using benchmarks landed in D48879, we've identified
that calls to `pthread_getspecific` started dominating the cycles, which
led us to revert the change made in D49217 to use C++ thread_local
initialisation instead (it reduces the cost by a huge margin, since we
save one PLT-based call to pthread functions in the hot path). In
particular, this is in `__xray::getThreadLocalData()`.
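
A minimal sketch of the C++ `thread_local` approach described above, assuming a hypothetical `ThreadLocalData` layout (the real structure is not shown in this message). The accessor returns a function-local `thread_local` object, so no `pthread_getspecific` call is written in the hot path.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-thread state; the fields are illustrative only.
struct ThreadLocalData {
  std::uint64_t Calls = 0;
};

// Each thread gets its own instance, initialised on that thread's first
// call. Repeated calls on the same thread return the same object.
ThreadLocalData &getThreadLocalData() {
  thread_local ThreadLocalData TLD;
  return TLD;
}

// Example hot-path use: bump a per-thread counter.
void recordCall() { ++getThreadLocalData().Calls; }
```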

We also took the opportunity to remove the least-common-multiple based
calculation and instead pack as much data as possible into each segment
of the array. This greatly simplifies the API of the container, which
now hides as much of the implementation detail as possible. For
instance, we calculate the number of elements we need for each segment
internally in the Array instead of making it part of the type.
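
One way such an internal calculation could look is sketched below. The segment byte budget, the header layout and the names `SegmentHeader`, `SegmentLayout` and `kSegmentBytes` are assumptions made for illustration, not the values or types used by `__xray::Array<T>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLineSize = 64;                // assumed
constexpr std::size_t kSegmentBytes = 4 * kCacheLineSize; // assumed budget

// Book-keeping that is co-located with the data it manages.
struct SegmentHeader {
  SegmentHeader *Prev;
  SegmentHeader *Next;
};

// Rounds N up to the next multiple of A.
constexpr std::size_t roundUp(std::size_t N, std::size_t A) {
  return (N + A - 1) / A * A;
}

// The element count is derived from T inside the container, so callers
// never pass it as a template argument.
template <class T> struct SegmentLayout {
  static constexpr std::size_t HeaderSize =
      roundUp(sizeof(SegmentHeader), alignof(T));
  static constexpr std::size_t ElementsPerSegment =
      (kSegmentBytes - HeaderSize) / sizeof(T);
  static_assert(ElementsPerSegment > 0, "T does not fit in one segment");
};
```

With this layout, a user would only write something like `Array<T>`, and the number of elements per segment follows from `sizeof(T)` and the header size.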

With the changes here, we're able to get a measurable improvement on the
performance of profiling mode on top of what D48879 already provides.

Depends on D48879.

Reviewers: kpw, eizan

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49363

llvm-svn: 337343
2018-07-18 02:08:39 +00:00
Dean Michael Berris
9d6b7a5f2b [XRay][compiler-rt] Simplify Allocator Implementation
Summary:
This change simplifies the XRay Allocator implementation to self-manage
an mmap'ed memory segment instead of using the internal allocator
implementation in sanitizer_common.

Profiling the benchmarks in D48879 showed that using the internal
allocator in sanitizer_common introduces a bottleneck, because memory
is allocated through a central spinlock. This change allows
thread-local allocators to eliminate contention on the centralized
allocator.
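
A minimal sketch of a self-managed, mmap-backed allocator, assuming a POSIX system and a fixed 64-byte block size. `MmapArena`, `kBlockSize` and `threadArena` are hypothetical names; the real XRay `Allocator` differs in its details. Because each thread can own its own arena, the allocation path needs no lock.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

constexpr std::size_t kBlockSize = 64; // assumed cache-line-sized block

// Bump allocator over one anonymous mapping. It hands out fixed-size
// blocks and never reuses them; the mapping is released on destruction.
class MmapArena {
  unsigned char *Base = nullptr;
  std::size_t Capacity;
  std::size_t Used = 0;

public:
  explicit MmapArena(std::size_t MaxBytes) : Capacity(MaxBytes) {
    void *P = mmap(nullptr, Capacity, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (P != MAP_FAILED)
      Base = static_cast<unsigned char *>(P);
  }
  MmapArena(const MmapArena &) = delete;
  MmapArena &operator=(const MmapArena &) = delete;
  ~MmapArena() {
    if (Base != nullptr)
      munmap(Base, Capacity);
  }

  // Returns the next block, or nullptr once the memory budget is spent
  // (or if the mapping could not be created).
  void *allocateBlock() {
    if (Base == nullptr || Used + kBlockSize > Capacity)
      return nullptr;
    void *P = Base + Used;
    Used += kBlockSize;
    return P;
  }
};

// One arena per thread, so threads never contend on a shared lock.
// The 64 KiB budget is an arbitrary value chosen for this sketch.
MmapArena &threadArena() {
  thread_local MmapArena Arena(1 << 16);
  return Arena;
}
```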

To get the most benefit from this approach, we also use a managed
allocator for the chunk elements used by the segmented array
implementation. This gives us the chance to amortize the cost of
allocating memory when creating these internal segmented array data
structures.

We also took the opportunity to remove the preallocation argument from
the allocator API, simplifying the usage of the allocator throughout the
profiling implementation.

In this change we also tweak some of the flag values to reduce the
maximum amount of memory we use/need for each thread, when requesting
memory through mmap.

Depends on D48956.

Reviewers: kpw, eizan

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49217

llvm-svn: 337342
2018-07-18 01:53:39 +00:00
Dean Michael Berris
0dd4f9f22f [XRay][compiler-rt] xray::Array Freelist and Iterator Updates
Summary:
We found a bug while working on a benchmark for the profiling mode which
manifests as a segmentation fault in the profiling handler's
implementation. This change adds unit tests which replicate the
issues in isolation.

We've tracked this down as a bug in the implementation of the Freelist
in the `xray::Array` type. This happens when we trim the array by a
number of elements, where we've been incorrectly assigning pointers for
the links in the freelist of chunk nodes. We've taken the chance to add
more debug-only assertions to the code path, which allow us to verify
these assumptions in debug builds.

In the process, we also took the opportunity to use iterators to
implement both `front()` and `back()` which exposes a bug in the
iterator decrement operation.  In particular, when we decrement past a
chunk size boundary, we end up moving too far back and reaching the
`SentinelChunk` prematurely.
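
The boundary condition can be illustrated with a simplified cursor over linked chunks. This is a sketch under assumed names (`Segment`, `Cursor`, `kElemsPerSegment`), not the `xray::Array` iterator itself: the cursor moves to the previous chunk only when it leaves the first slot of its current chunk, and the one-past-the-end position stays on the last chunk.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kElemsPerSegment = 4; // assumed for the sketch

struct Segment {
  Segment *Prev;
  Segment *Next;
  int Data[kElemsPerSegment];
};

struct Cursor {
  Segment *S;         // chunk holding the current element
  std::size_t Offset; // global index of the current element
  std::size_t Size;   // number of live elements in the array

  int &operator*() const { return S->Data[Offset % kElemsPerSegment]; }

  Cursor &operator++() {
    ++Offset;
    // Advance to the next chunk at a boundary, unless this is the end.
    if (Offset % kElemsPerSegment == 0 && Offset != Size)
      S = S->Next;
    return *this;
  }

  Cursor &operator--() {
    assert(Offset > 0);
    std::size_t Previous = Offset--;
    // Step back exactly one chunk, and only when leaving slot 0. The end
    // position never advanced past the last chunk, so it must not step.
    if (Previous != Size && Previous % kElemsPerSegment == 0)
      S = S->Prev;
    return *this;
  }
};
```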

This change unblocks contributing the non-crashing version of the
benchmarks to the test-suite as well.

Reviewers: kpw

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D48653

llvm-svn: 336644
2018-07-10 08:25:44 +00:00
Dean Michael Berris
ca856e07de [XRay] Fixup: Address some warnings breaking build
Follow-up to D45758.

llvm-svn: 333625
2018-05-31 04:55:11 +00:00
Dean Michael Berris
238aa1366e [XRay][compiler-rt] Relocate a DCHECK to the correct location.
Fixes a bad DCHECK where the condition being checked is still valid (for
iterators pointing to sentinels).

Follow-up to D45756.

llvm-svn: 332212
2018-05-14 04:21:12 +00:00
Dean Michael Berris
26e81209ef [XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.

Here we implement an arena allocator for the fixed-size buffers used in
a segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.

Key features of the `Allocator` type:

*  It uses cache-aligned blocks, intended to host the actual data. These
   blocks are cache-line-size multiples of contiguous bytes.

*  The `Allocator` has a maximum memory budget, set at construction
   time. This allows us to cap the amount of data each specific
   `Allocator` instance is responsible for.

*  Upon destruction, the `Allocator` will clean up the storage it's
   used, handing it back to the internal allocator used in
   sanitizer_common.

Key features of the `Array` type:

*  Each segmented array is always backed by an `Allocator`, which is
   either user-provided or uses a global allocator.

*  When an `Array` grows, it grows by appending a segment that's
   fixed-sized. The size of each segment is determined by the number of
   elements of type `T` that can fit into cache line multiples.

*  An `Array` does not return memory to the `Allocator`, but it can keep
   track of the current number of "live" objects it stores.

*  When an `Array` is destroyed, it will not return memory to the
   `Allocator`. Users should clean up the `Allocator` independently of
   the `Array`.

*  The `Array` type keeps a freelist of the chunks it's used before, so
   that trimming and growing will re-use previously allocated chunks.
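
The freelist behaviour in the last bullet can be sketched as follows. `Pool` is a hypothetical stand-in for the `Allocator`, and `SegmentedArray` is a simplified, `int`-only illustration of the idea rather than the `Array` type introduced by this change.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kElems = 4; // assumed elements per chunk

struct Segment {
  Segment *Prev;
  Segment *Next;
  int Data[kElems];
};

// Hypothetical fixed pool standing in for the Allocator; it never frees.
class Pool {
  Segment Storage[8];
  std::size_t Used = 0;

public:
  Segment *allocate() { return Used < 8 ? &Storage[Used++] : nullptr; }
  std::size_t allocated() const { return Used; }
};

class SegmentedArray {
  Pool &Alloc;
  Segment *Tail = nullptr;     // last chunk holding live elements
  Segment *Freelist = nullptr; // trimmed chunks, linked through Next
  std::size_t Size = 0;

  // Prefer a previously used chunk before asking the pool for a new one.
  Segment *newSegment() {
    if (Freelist != nullptr) {
      Segment *S = Freelist;
      Freelist = S->Next;
      return S;
    }
    return Alloc.allocate();
  }

public:
  explicit SegmentedArray(Pool &P) : Alloc(P) {}
  std::size_t size() const { return Size; }

  bool append(int V) {
    if (Size % kElems == 0) { // array is empty or the tail chunk is full
      Segment *S = newSegment();
      if (S == nullptr)
        return false;
      S->Prev = Tail;
      S->Next = nullptr;
      if (Tail != nullptr)
        Tail->Next = S;
      Tail = S;
    }
    Tail->Data[Size % kElems] = V;
    ++Size;
    return true;
  }

  // Drops up to N elements from the back. Chunks that become empty go
  // onto the freelist; nothing is returned to the pool.
  void trim(std::size_t N) {
    while (N > 0 && Size > 0) {
      --Size;
      --N;
      if (Size % kElems == 0) { // the tail chunk is now empty
        Segment *S = Tail;
        Tail = S->Prev;
        if (Tail != nullptr)
          Tail->Next = nullptr;
        S->Prev = nullptr;
        S->Next = Freelist;
        Freelist = S;
      }
    }
  }

  int back() const {
    assert(Size > 0);
    return Tail->Data[(Size - 1) % kElems];
  }
};
```

Growing the array again after a trim draws chunks from the freelist first, so the pool's allocation count stays flat until the freelist is exhausted.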

These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.

Reviewers: echristo, pelikan, kpw

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D45756

llvm-svn: 331141
2018-04-29 13:46:30 +00:00