Summary:
We avoid using dynamic memory allocated with the internal allocator in
the profile collection service used by profiling mode. Instead, we use
aligned storage for globals and in-struct storage for objects we
initialize dynamically.
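A minimal sketch of this pattern, assuming a hypothetical
`ProfileCollectorState` type (the real names in the service differ):

```
// Hedged sketch: a global backed by aligned raw storage, constructed lazily
// with placement new, so no dynamic allocation is needed.
#include <new>
#include <type_traits>

struct ProfileCollectorState {  // hypothetical stand-in for the service state
  int ProfileCount = 0;
};

// Suitably aligned raw storage for the global; never touches InternalAlloc.
static typename std::aligned_storage<sizeof(ProfileCollectorState),
                                     alignof(ProfileCollectorState)>::type
    CollectorStorage;
static ProfileCollectorState *Collector = nullptr;

// Construct the object in place on first use.
static void initCollectorOnce() {
  if (Collector == nullptr)
    Collector = new (&CollectorStorage) ProfileCollectorState();
}
```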
We also remove the dependency on `Vector<...>`, which internally uses
the dynamic allocator in sanitizer_common (InternalAlloc), in favour of
the XRay allocator and segmented array implementation.
This change addresses llvm.org/PR38577.
Reviewers: eizan
Reviewed By: eizan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50782
llvm-svn: 339978
Summary:
This is a follow-on to D49217 which simplifies and optimises the
implementation of the segmented array. In this patch we co-locate the
book-keeping for segments in `__xray::Array<T>` with the data they
manage. We also take the chance to rename `Chunk` to `Segment` to
better align with the high-level description of the segmented array.
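A rough sketch of the co-located layout (the field names are
illustrative, not the actual definitions):

```
// Hedged sketch: the segment's book-keeping (links to its neighbours) is part
// of the same allocation as the element storage it manages, instead of being
// tracked in a separate structure.
#include <cstddef>

struct Segment {
  Segment *Prev;                   // previous segment in the list
  Segment *Next;                   // next segment in the list
  alignas(std::max_align_t) unsigned char Data[1];  // element storage follows
};

// Bytes to request for a segment holding `DataBytes` bytes of elements.
constexpr std::size_t segmentAllocationSize(std::size_t DataBytes) {
  return offsetof(Segment, Data) + DataBytes;
}
```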
With measurements using the benchmarks landed in D48879, we've
identified that calls to `pthread_getspecific` started dominating the
cycles, which led us to revert the change made in D49217 and use C++
thread_local initialisation instead (this reduces the cost by a huge
margin, since we save one PLT-based call to the pthread functions in
the hot path). In particular, this affects
`__xray::getThreadLocalData()`.
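The shape of that change, sketched with a hypothetical `ThreadData`
payload (not the real type):

```
// Hedged sketch: a function-local thread_local object replaces a pthread_key
// lookup, so the hot path does a TLS-relative load instead of a PLT-based
// call to pthread_getspecific.
struct ThreadData {  // hypothetical stand-in for the per-thread XRay state
  void *Buffer = nullptr;
  unsigned long Generation = 0;
};

static ThreadData &getThreadLocalData() {
  thread_local ThreadData TLD;  // constructed on first use per thread
  return TLD;
}
```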
We also took the opportunity to remove the least-common-multiple based
calculation and instead pack as much data as possible into each segment
of the array. This greatly simplifies the API of the container, which
now hides as much of the implementation detail as possible. For
instance, we calculate the number of elements we need for each segment
internally in the Array instead of making it part of the type.
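For illustration, the per-segment element count can be derived along
these lines (the constants here are assumptions, not the values used in
the implementation):

```
#include <cstddef>

// Hedged sketch: compute how many elements of type T fit into a segment that
// is a small multiple of the cache line size, entirely inside the container.
constexpr std::size_t CacheLineSize = 64;                 // assumed
constexpr std::size_t SegmentBytes = CacheLineSize * 16;  // illustrative

template <class T> constexpr std::size_t elementsPerSegment() {
  // Hold at least one element per segment, otherwise as many as fit.
  return SegmentBytes / sizeof(T) ? SegmentBytes / sizeof(T) : 1;
}

static_assert(elementsPerSegment<long long>() > 0, "segments must hold data");
```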
With the changes here, we're able to get a measurable improvement in
the performance of profiling mode on top of what D48879 already
provides.
Depends on D48879.
Reviewers: kpw, eizan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49363
llvm-svn: 337343
Summary:
This change simplifies the XRay Allocator implementation to self-manage
an mmap'ed memory segment instead of using the internal allocator
implementation in sanitizer_common.
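A minimal sketch of the self-managed approach, assuming POSIX `mmap`
and leaving out error handling and the real block layout:

```
#include <cstddef>
#include <sys/mman.h>

// Hedged sketch: reserve one region up front with mmap and hand out pieces by
// bumping an offset, instead of routing every allocation through the
// sanitizer_common internal allocator and its central lock.
struct MmapArena {
  unsigned char *Base = nullptr;
  std::size_t Size = 0;
  std::size_t Offset = 0;

  bool init(std::size_t MaxBytes) {
    void *P = mmap(nullptr, MaxBytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (P == MAP_FAILED)
      return false;
    Base = static_cast<unsigned char *>(P);
    Size = MaxBytes;
    return true;
  }

  void *allocate(std::size_t Bytes) {
    if (Base == nullptr || Offset + Bytes > Size)
      return nullptr;  // budget exhausted; the caller has to cope
    void *Result = Base + Offset;
    Offset += Bytes;
    return Result;
  }

  void destroy() {
    if (Base != nullptr)
      munmap(Base, Size);
    Base = nullptr;
  }
};
```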
Through the benchmarks landed in D48879 and profiling of those
benchmarks, we've found that using the internal allocator in
sanitizer_common introduces a bottleneck, because every allocation goes
through a central spinlock. This change allows thread-local allocators
to eliminate contention on the centralized allocator.
To get the most benefit from this approach, we also use a managed
allocator for the chunk elements used by the segmented array
implementation. This gives us the chance to amortize the cost of
allocating memory when creating these internal segmented array data
structures.
We also took the opportunity to remove the preallocation argument from
the allocator API, simplifying the usage of the allocator throughout the
profiling implementation.
In this change we also tweak some of the flag values to reduce the
maximum amount of memory we use/need for each thread when requesting
memory through mmap.
Depends on D48956.
Reviewers: kpw, eizan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49217
llvm-svn: 337342
Summary:
We found a bug while working on a benchmark for the profiling mode,
which manifests as a segmentation fault in the profiling handler's
implementation. This change adds unit tests which replicate the issue
in isolation.
We've tracked this down to a bug in the implementation of the Freelist
in the `xray::Array` type. It happens when we trim the array by a
number of elements, where we had been incorrectly assigning pointers
for the links in the freelist of chunk nodes. We've taken the chance to
add more debug-only assertions to this code path, which allow us to
verify these assumptions in debug builds.
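The general shape of the freelist handling can be sketched like this
(simplified; the real code lives inside `xray::Array`, also adjusts
element counts, and carries the new debug-only assertions):

```
// Hedged sketch: when trim() empties a chunk, its links must be rewired so
// the chunk sits correctly at the head of the freelist and can be reused.
struct ChunkNode {
  ChunkNode *Prev = nullptr;
  ChunkNode *Next = nullptr;
};

struct Freelist {
  ChunkNode *Head = nullptr;

  // Link `C` at the head of the freelist, keeping both directions consistent.
  void push(ChunkNode *C) {
    C->Prev = nullptr;
    C->Next = Head;
    if (Head != nullptr)
      Head->Prev = C;
    Head = C;
  }

  // Pop a previously freed chunk for reuse when the array grows again.
  ChunkNode *pop() {
    ChunkNode *C = Head;
    if (C != nullptr) {
      Head = C->Next;
      if (Head != nullptr)
        Head->Prev = nullptr;
      C->Next = nullptr;
    }
    return C;
  }
};
```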
In the process, we also took the opportunity to use iterators to
implement both `front()` and `back()`, which exposed a bug in the
iterator decrement operation. In particular, when we decrement past a
chunk-size boundary, we end up moving too far back and reaching the
`SentinelChunk` prematurely.
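The boundary case in the decrement looks roughly like this (the member
names are stand-ins for the real iterator state):

```
#include <cstddef>

// Hedged sketch: when stepping back across a chunk boundary, move to the
// previous chunk exactly once and land on its last element, instead of
// walking back to the SentinelChunk prematurely.
struct ChunkNode {
  ChunkNode *Prev = nullptr;
  ChunkNode *Next = nullptr;
};

struct IteratorSketch {
  ChunkNode *CurrentChunk;
  std::size_t Offset;     // global element index
  std::size_t ChunkSize;  // elements per chunk

  IteratorSketch &operator--() {
    // Only cross into the previous chunk when leaving the front of this one;
    // (Offset - 1) % ChunkSize then indexes the last slot of that chunk.
    if (Offset % ChunkSize == 0)
      CurrentChunk = CurrentChunk->Prev;
    --Offset;
    return *this;
  }
};
```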
This change unblocks contributing the non-crashing version of the
benchmarks to the test-suite as well.
Reviewers: kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D48653
llvm-svn: 336644
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator for the fixed-size buffers used in
a segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for its segments.
Key features of the `Allocator` type (a rough interface sketch follows
this list):
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it has
used, handing it back to the internal allocator used in
sanitizer_common.
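A rough interface sketch capturing these properties; the names are
illustrative, `operator new`/`delete` stand in for the sanitizer_common
internal allocator calls, and `std::vector` is only book-keeping for
the sketch:

```
#include <cstddef>
#include <new>
#include <vector>

constexpr std::size_t kCacheLineSize = 64;  // assumed cache line size

// Hedged sketch, not the actual XRay interface: cache-aligned, fixed-size
// blocks handed out against a budget fixed at construction, with everything
// released only when the allocator is destroyed.
class ArenaAllocatorSketch {
public:
  struct Block {
    void *Data = nullptr;  // BlockSize bytes of cache-line-aligned storage
  };

  ArenaAllocatorSketch(std::size_t BlockSize, std::size_t MaxMemory)
      : BlockSize(BlockSize), MaxMemory(MaxMemory) {}

  // Hands out one block, or an empty block once the budget is exhausted.
  Block Allocate() {
    if (Allocated + BlockSize > MaxMemory)
      return {};
    void *P = ::operator new(BlockSize, std::align_val_t(kCacheLineSize));
    Blocks.push_back(P);
    Allocated += BlockSize;
    return {P};
  }

  // All storage is returned when the allocator itself goes away.
  ~ArenaAllocatorSketch() {
    for (void *P : Blocks)
      ::operator delete(P, std::align_val_t(kCacheLineSize));
  }

private:
  std::size_t BlockSize;      // a multiple of kCacheLineSize
  std::size_t MaxMemory;      // hard cap on memory this instance owns
  std::size_t Allocated = 0;  // bytes handed out so far
  std::vector<void *> Blocks;
};
```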
Key features of the `Array` type (a behavioural sketch follows this
list):
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a fixed-size segment.
The size of each segment is computed from the number of elements of
type `T` that can fit into cache-line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it has used before,
so that trimming and growing will re-use previously allocated chunks.
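A behavioural sketch of growing and trimming under these rules
(`std::vector` stands in for allocator-provided segments, and the names
do not match the real `__xray::Array` API):

```
#include <cstddef>
#include <utility>
#include <vector>

// Hedged sketch of the growth and trim semantics only: memory is never
// returned here; trimmed segments sit on a freelist until the whole container
// goes away, mirroring the description above.
template <class T> class SegmentedArraySketch {
  static constexpr std::size_t ElementsPerSegment = 16;  // illustrative

  std::vector<std::vector<T>> Live;      // segments currently holding elements
  std::vector<std::vector<T>> Freelist;  // previously used, now-empty segments
  std::size_t Size = 0;                  // count of "live" elements

public:
  void Append(const T &E) {
    // Grow by one whole fixed-size segment, reusing a freelisted one first.
    if (Size == Live.size() * ElementsPerSegment) {
      if (!Freelist.empty()) {
        Live.push_back(std::move(Freelist.back()));
        Freelist.pop_back();
      } else {
        std::vector<T> S;
        S.reserve(ElementsPerSegment);
        Live.push_back(std::move(S));
      }
    }
    Live.back().push_back(E);
    ++Size;
  }

  void trim(std::size_t Elements) {
    // Drop elements from the back; fully emptied segments move to the
    // freelist instead of being returned to the allocator.
    while (Elements > 0 && Size > 0) {
      Live.back().pop_back();
      --Size;
      --Elements;
      if (Live.back().empty()) {
        Freelist.push_back(std::move(Live.back()));
        Live.pop_back();
      }
    }
  }

  std::size_t size() const { return Size; }
};
```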
These basic data structures are used by the XRay Profiling Mode
implementation to provide efficient and cache-aware storage for data
that's typically read- and write-heavy while tracking latency
information. We're relying on the cache-line characteristics of the
architecture to give us good data isolation and cache friendliness when
we're performing operations like searching for elements and/or updating
data hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141