Add a new configuration option QuarantineDisabled that allows all of the
quarantine code to be compiled out.
Add new tests that verify that the code is removed properly.
On Android, this saves ~4000 bytes for 32 bit and ~6000 bytes for 64
bit.
On Android, I used some microbenchmarks that do malloc/free in a loop.
For allocations in the primary, performance is about the same for both
32 bit and 64 bit. For secondary allocations, I saw an ~8% speedup on
32 bit and a ~3% speedup on 64 bit, which feels like it could just be
due to the code size improvements.
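A minimal sketch of the mechanism (names assumed, not scudo's actual
code): a compile-time config flag lets `if constexpr` drop the
quarantine path from the binary entirely.
```
#include <cstdio>

struct Config {
  // Assumed flag name mirroring the new option.
  static constexpr bool QuarantineDisabled = true;
};

template <typename C> void deallocate(void *Ptr) {
  if constexpr (C::QuarantineDisabled) {
    // The else-branch is never instantiated, so none of its code ends
    // up in the binary.
    std::printf("freeing %p directly\n", Ptr);
  } else {
    std::printf("placing %p in quarantine\n", Ptr);
  }
}

int main() {
  int X = 0;
  deallocate<Config>(&X);
}
```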
The current code always unmaps a secondary allocation when MTE is
enabled. Fix this to match the comment, namely only unmap if MTE was
enabled and is no longer enabled after acquiring the lock.
In addition, allow quarantine to work in the secondary even if MTE is not
enabled.
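A minimal sketch of the corrected condition, with all names assumed:
```
#include <atomic>
#include <mutex>

std::mutex CacheMutex;
std::atomic<bool> MteEnabled{true};

void storeOrUnmap(void *H, void (*Unmap)(void *), void (*Cache)(void *)) {
  const bool WasEnabled = MteEnabled.load();
  std::lock_guard<std::mutex> L(CacheMutex);
  if (WasEnabled && !MteEnabled.load())
    Unmap(H); // MTE was just disabled, so the tagged mapping is stale.
  else
    Cache(H); // The old code unconditionally unmapped whenever MTE was on.
}
```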
In the caching part of the secondary path, when about to try to release
memory to the OS, we always wait while acquiring the lock. However, if
multiple threads attempt this at the same time, all but one of them will
likely do nothing when the release call is made. Change the algorithm to
skip the release if another release is in progress.
Also, pull the lock outside the releaseOlderThan function. This way, in
the store path, we use tryLock and skip if another thread is releasing;
but in the path where a forced release call is made, that call waits for
the release to complete, which guarantees that all entries are released
when requested.
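A sketch of the two paths, using assumed names:
```
#include <mutex>

std::mutex ReleaseLock;

void releaseOlderThan(long Time) {
  // Walk the cache and release entries older than Time; elided here.
  (void)Time;
}

// Store path: skip if another thread is already releasing, since a
// second scan right behind it would find nothing to release.
void maybeRelease(long Time) {
  std::unique_lock<std::mutex> L(ReleaseLock, std::try_to_lock);
  if (!L.owns_lock())
    return;
  releaseOlderThan(Time);
}

// Forced path: wait on the lock so that all entries are guaranteed to
// have been released by the time the call returns.
void forceRelease() {
  std::lock_guard<std::mutex> L(ReleaseLock);
  releaseOlderThan(0);
}
```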
Add an optional flag for the secondary allocator called
`EnableGuardPages` to enable/disable the use of guard pages. By default,
this option is enabled.
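For illustration, a hedged sketch (not scudo's implementation) of what
guard pages around a secondary mapping look like on a POSIX system:
```
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Map Size usable bytes (assumed page-aligned), optionally bracketed by
// PROT_NONE guard pages so linear overflows/underflows fault instead of
// silently corrupting a neighboring mapping.
void *mapWithGuards(size_t Size, bool EnableGuardPages) {
  const size_t Page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  const size_t Total = Size + (EnableGuardPages ? 2 * Page : 0);
  void *Base =
      mmap(nullptr, Total, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  assert(Base != MAP_FAILED);
  char *Usable = static_cast<char *>(Base) + (EnableGuardPages ? Page : 0);
  // Only the interior becomes accessible; the guards stay PROT_NONE.
  mprotect(Usable, Size, PROT_READ | PROT_WRITE);
  return Usable;
}
```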
Secondary cache entries are now released to the OS from least recent to
most recent entries. This helps to avoid unnecessary scans of the cache
since entries ready to be released (specifically, entries that are
considered old relative to the configurable release interval) will
always be at the tail of the list of committed entries by the LRU
ordering. For this same reason, the `OldestTime` variable is no longer
needed to indicate when releases are necessary, so it has been removed.
`MaxReleasedCachePages` has been set to 4. Initially, in #105009, we
set `MaxReleasedCachePages` to 0 so that the partial chunk heuristic
could be introduced incrementally as we observed its impact on retrieval
order and, more generally, performance.
Co-authored-by: Joshua Baehring <josh.baehring@yale.edu>
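A sketch of the tail-first scan this enables (types and names assumed):
```
#include <cstdint>
#include <list>

struct CachedBlock {
  uint64_t Time; // When the entry was stored.
};

// Because the committed list is kept in LRU order, every entry newer
// than the first too-young entry found from the tail is also too young,
// so the scan stops early instead of walking the whole cache.
void releaseOlderThan(std::list<CachedBlock> &CommittedLRU, uint64_t Cutoff) {
  while (!CommittedLRU.empty() && CommittedLRU.back().Time <= Cutoff) {
    // madvise() the entry's pages and move it to the decommitted list;
    // elided in this sketch.
    CommittedLRU.pop_back();
  }
}
```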
gcc interprets a backslash '\\' that is the last character before a
newline as a line-continuation character, even in a comment context.
This can produce an "error: multi-line comment [-Werror=comment]".
This PR adds delimiters so that the comment can compile with gcc.
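A minimal reproduction of the warning (illustrative code, not the
actual scudo comment):
```
// A backslash at the end of this comment splices the next line into it \
int Hidden = 0; // gcc: "error: multi-line comment [-Werror=comment]";
                // the definition above is swallowed by the preprocessor.

// The fix: a delimiter so '\' is no longer the last character: \.
int Visible = 0; // Compiles cleanly, no -Wcomment diagnostic.
```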
Previously the secondary cache retrieval algorithm would not allow
retrievals of memory chunks where the number of unused bytes would be
greater than `MaxUnreleasedCachePages * PageSize` bytes. This meant
that even if a memory chunk satisfied the requirements of the optimal
fit algorithm, it may not be returned. This remains true if memory
tagging is enabled. However, if memory tagging is disabled, a new
heuristic has been put in place. Specifically, if a memory chunk is a
non-optimal fit, the cache retrieval algorithm will attempt to release
the excess memory to force a cache hit while keeping RSS down.
In the event that a memory chunk is a non-optimal fit, the retrieval
algorithm will release excess memory as long as the amount of memory to
be released is less than or equal to 4 pages. If the amount of memory to
be released exceeds 4 pages, the retrieval algorithm will not consider
that cached memory chunk valid for retrieval.
This change also addresses an alignment issue in a test case submitted
in #104807.
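A sketch of the decision the new heuristic makes; the constants and the
exact arithmetic are simplified/assumed:
```
#include <cstddef>

constexpr size_t PageSize = 4096;             // Assumed.
constexpr size_t MaxUnreleasedCachePages = 4; // Assumed.
constexpr size_t MaxReleasedCachePages = 4;   // Per the description.

enum class Decision { UseAsIs, ReleaseExcessThenUse, Skip };

Decision classify(size_t UnusedBytes, bool MemoryTaggingEnabled) {
  // Small leftovers never needed releasing in the first place.
  if (UnusedBytes <= MaxUnreleasedCachePages * PageSize)
    return Decision::UseAsIs;
  if (MemoryTaggingEnabled)
    return Decision::Skip; // The old, stricter behavior is kept.
  // Otherwise release the excess to force a cache hit, but only when
  // that release is cheap: at most MaxReleasedCachePages pages.
  const size_t PagesToRelease = UnusedBytes / PageSize;
  return PagesToRelease <= MaxReleasedCachePages
             ? Decision::ReleaseExcessThenUse
             : Decision::Skip;
}
```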
Previously the secondary cache retrieval algorithm would not allow
retrievals of memory chunks where the number of unused bytes would be
greater than `MaxUnusedCachePages * PageSize` bytes. This meant
that even if a memory chunk satisfied the requirements of the optimal
fit algorithm, it may not be returned. This remains true if memory
tagging is enabled. However, if memory tagging is disabled, a new
heuristic has been put in place. Specifically, if a memory chunk is a
non-optimal fit, the cache retrieval algorithm will attempt to release
the excess memory to force a cache hit while keeping RSS down.
In the event that a memory chunk is a non-optimal fit, the retrieval
algorithm will release excess memory as long as the amount of memory to
be released is less than or equal to 16 KB. If the amount of memory to
be released exceeds 16 KB, the retrieval algorithm will not consider
that cached memory chunk valid for retrieval.
Initially, the LRU list stored all mapped entries with no distinction
between the committed (non-madvise()'d) entries and decommitted
(madvise()'d) entries. Now these two types of entries are separated into
two lists, allowing future cache logic to branch depending on whether or
not entries are committed or decommitted. Furthermore, the retrieval
algorithm will prioritize committed entries over decommitted entries.
Specifically, committed entries that satisfy the MaxUnusedCachePages
requirement are retrieved before optimal-fit, decommitted entries.
This commit addresses the compiler errors raised
[here](https://github.com/llvm/llvm-project/pull/100818#issuecomment-2261043261).
The test fixture simplifies some of the logic for allocations, and
mmap-based allocations are separated from the cache to allow for more
direct cache tests. Additionally, a couple of end-to-end tests for the
cache and the LRU algorithm are added.
Initially, the LRU list stored all mapped entries with no distinction
between the committed (non-madvise()'d) entries and decommitted
(madvise()'d) entries. Now these two types of entries are separated into
two lists, allowing future cache logic to branch depending on whether or
not entries are committed or decommitted. Furthermore, the retrieval
algorithm will prioritize committed entries over decommitted entries.
Specifically, valid-fit, committed entries (not necessarily optimal-fit)
are retrieved before optimal-fit, decommitted entries.
The logic for emptying the cache now follows an LRU eviction policy.
When the cache is full on any given free operation, the oldest entry in
the cache is evicted, and the memory associated with that cache entry is
unmapped.
Finding empty cache entries is now a constant-time operation, using a
stack of available cache entries.
Through the LRU structure, the cache retrieval algorithm now only
iterates through valid entries of the cache. Furthermore, the retrieval
algorithm will first search cache entries that have not been decommitted
(i.e. madvise() has not been called on their corresponding memory
chunks) to reduce the likelihood of returning a memory chunk to the user
that would induce a page fault.
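A simplified sketch of the resulting structure; the real code uses
fixed-size arrays plus a stack of free indices rather than std::list,
and all names here are assumed:
```
#include <cstddef>
#include <list>

struct CachedBlock {
  size_t Size = 0; // Base address, timestamps, etc. elided.
};

struct Cache {
  static constexpr size_t Capacity = 32; // Assumed.
  std::list<CachedBlock> Committed;      // Front = MRU, back = LRU.
  std::list<CachedBlock> Decommitted;    // madvise()'d entries.

  void store(const CachedBlock &B) {
    // LRU eviction: when full, unmap the oldest committed entry.
    if (Committed.size() + Decommitted.size() >= Capacity &&
        !Committed.empty())
      Committed.pop_back(); // unmap() of the evicted entry elided.
    Committed.push_front(B);
  }

  // Committed entries are searched first: returning one avoids the
  // page faults a decommitted chunk would incur.
  CachedBlock *retrieve(size_t Size) {
    for (CachedBlock &B : Committed)
      if (B.Size >= Size)
        return &B;
    for (CachedBlock &B : Decommitted)
      if (B.Size >= Size)
        return &B;
    return nullptr;
  }
};
```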
Initially, the scudo allocator would return an error if the user
attempted to set the cache capacity (i.e. the number of possible entries
in the cache) above the maximum cache capacity. Now the allocator will
resort to using the maximum cache capacity in this event. An error will
still be returned if the user attempts to set the number of entries to a
negative value.
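A sketch of the new clamping behavior (names assumed):
```
#include <algorithm>

constexpr int MaxEntriesCount = 64; // Assumed maximum.

// Returns false only for negative values; values above the maximum are
// clamped to it instead of being rejected.
bool setCacheCapacity(int EntriesCount, int &Result) {
  if (EntriesCount < 0)
    return false;
  Result = std::min(EntriesCount, MaxEntriesCount);
  return true;
}
```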
Instead of explicitly disabling a feature by declaring the variable and
setting it to false, this change makes such flags optional, i.e., you
can skip certain flags if you are not using them.
This optional feature supports both forms:
1. Value: A parameter for a feature. E.g., EnableRandomOffset
2. Type: A C++ type implementing a feature. E.g., ConditionVariableT
Access to the flags goes through one of the wrappers:
BaseConfig/PrimaryConfig/SecondaryConfig/CacheConfig
(CacheConfig is embedded in SecondaryConfig). These wrappers have the
getters to access the value and the type. When adding a new feature, we
need to add it to `allocator_config.def` and mark the new variable with
either *_REQUIRED_* or *_OPTIONAL_* macro so that the accessor will be
generated properly.
In addition, remove the need for `UseConditionVariable` to flip the
condition variable on and off. Now we only need to define the type of
the condition variable.
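One way such an optional accessor can be generated underneath the
macros; this is a sketch of the general SFINAE technique, not the exact
code in allocator_config.def:
```
#include <type_traits>

// A user config that omits the optional flag entirely.
struct MinimalConfig {};

// Fallback when Config has no EnableRandomOffset member.
template <typename Config, typename = void>
struct GetEnableRandomOffset {
  static constexpr bool Value = false; // Assumed default.
};

// Chosen when the member exists.
template <typename Config>
struct GetEnableRandomOffset<
    Config, std::void_t<decltype(Config::EnableRandomOffset)>> {
  static constexpr bool Value = Config::EnableRandomOffset;
};

static_assert(!GetEnableRandomOffset<MinimalConfig>::Value,
              "omitted flag falls back to its default");
```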
Don't use multiple tagged pages at the beginning of an allocation, since
it prevents using such allocations for memrefs, and mappings aren't
reused anyway since Trusty uses MapAllocatorNoCache.
Upstreamed from https://r.android.com/2537251.
Co-authored-by: Marco Nelissen <marcone@google.com>
All of the code assumes that when the pages are released, the entry is
zeroed, so use the correct function. On most systems, this does not
change anything.
Added a variable that tracks the FragmentedBytes in the secondary
cache. This information is updated after successful retrievals and
stores, and dumped in getStats().
Reviewed By: Chia-hungDuan
Differential Revision: https://reviews.llvm.org/D157527
Changed the cache retrieval algorithm to an "optimal-fit", which
immediately returns blocks that are less than 110% of the requested
size. This reduces memory waste while still allowing for an early
return without traversing the entire array of cached blocks.
Reviewed By: cferris, Chia-hungDuan
Differential Revision: https://reviews.llvm.org/D157155
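A sketch of the 110% early-return bound described above:
```
#include <cstddef>

// A cached block within 110% of the request is taken immediately;
// anything larger would waste too much of the mapping. (Assumes
// RequestedSize is small enough that the addition cannot overflow.)
bool isOptimalFit(size_t BlockSize, size_t RequestedSize) {
  return BlockSize >= RequestedSize &&
         BlockSize <= RequestedSize + RequestedSize / 10;
}
```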
Keeps track of CallsToRetrieve and how many of them are
SuccessfulRetrieves for cached block allocations, and dumps this data
in the MapAllocatorCache::getStats() function.
Reviewed By: cferris, Chia-hungDuan
Differential Revision: https://reviews.llvm.org/D157154
A cached block may have a nearly aligned address for a certain
alignment, so we don't have to round up the size in advance. The
rounding should only happen when determining the availability of a
cached block.
Reviewed By: cferris
Differential Revision: https://reviews.llvm.org/D156582
RoundedSize is supposed to be used only for direct mapping. To
determine whether a cached block is feasible, the size doesn't need to
be rounded in advance. As a result, the use of RoundedSize may miss
some chances of using cached blocks.
This reverts commit 4c6b8bb87b3452d0bcef83cd0ea712d8426603b8.
Reviewed By: frs513
Differential Revision: https://reviews.llvm.org/D156583
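A sketch of the distinction (names assumed): round up only for the
direct-mmap path, and probe the cache with the raw size:
```
#include <cstddef>

constexpr size_t PageSize = 4096; // Assumed.

inline size_t roundUp(size_t Size, size_t Boundary) {
  return (Size + Boundary - 1) & ~(Boundary - 1);
}

// The direct mapping really does need the rounded size.
size_t directMapSize(size_t Size) { return roundUp(Size, PageSize); }

// The cache probe should not: a cached block whose start address is
// already nearly aligned can satisfy the request even though it is
// smaller than roundUp(Size, PageSize).
bool cachedBlockUsable(size_t BlockSize, size_t Size) {
  return BlockSize >= Size; // Not: BlockSize >= directMapSize(Size).
}
```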
Modify all places that use the Options structure to take it by const
reference. The underlying structure is a u32, so making it a reference
doesn't really do anything. However, if the structure changes in the
future, the code already works, and it avoids future coders wondering
why a structure is being passed by value. This also makes it clear that
the Options should not be modified in those functions.
Reviewed By: Chia-hungDuan
Differential Revision: https://reviews.llvm.org/D156372
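The signature change, sketched:
```
#include <cstdint>

struct Options {
  uint32_t Val; // Currently just a packed set of flag bits.
};

// Before: void deallocateImpl(Options Opts);
// After: the const reference costs nothing today and documents that
// the callee never modifies the options.
void deallocateImpl(const Options &Opts);
```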
Made checking for invalid cache entries and setting invalid cache
entries more implicit and clear.
Reviewed By: cferris
Differential Revision: https://reviews.llvm.org/D155983
Split cache::retrieve() into two separate functions: one that retrieves
the cached block, and another that sets the header and the MTE
environment. These were split so that the retrieve function can be more
easily changed in the future, and so that it has the sole purpose of
retrieving a CachedBlock.
Reviewed By: cferris
Differential Revision: https://reviews.llvm.org/D155660
Exclude cached blocks with an invalid start address. This is mainly
concerned with cached blocks that are still available/unused.
Reviewed By: Chia-hungDuan, cferris
Differential Revision: https://reviews.llvm.org/D154148
When I moved the primary to use the faster get-time syscall, I missed
the use in the secondary. Now fix the secondary to use this function
too.
Reviewed By: Chia-hungDuan
Differential Revision: https://reviews.llvm.org/D154012
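On Linux, the faster time source is typically the coarse monotonic
clock, which trades resolution for a much cheaper read; a sketch of
such a helper (scudo's actual implementation may differ):
```
#include <cstdint>
#include <ctime>

uint64_t getMonotonicTimeFast() {
  timespec TS;
  // Linux-specific: coarse clock, roughly tick-granularity timestamps.
  clock_gettime(CLOCK_MONOTONIC_COARSE, &TS);
  return static_cast<uint64_t>(TS.tv_sec) * 1000000000ULL +
         static_cast<uint64_t>(TS.tv_nsec);
}
```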
Dumped some basic info about what is being cached and about the cache
itself. Output of test below:
...
Stats: MapAllocatorCache: EntriesCount: 33, MaxEntriesCount: 64, MaxEntrySize: 1048576
StartBlockAddress: 0x6d342c1000, EndBlockAddress: 0x6d342d2000, BlockSize: 69632
StartBlockAddress: 0x6fc45ff000, EndBlockAddress: 0x6fc462f000, BlockSize: 196608
...
Reviewed By: Chia-hungDuan
Differential Revision: https://reviews.llvm.org/D153483
This patches an error flagged by Fuchsia builds (e.g.
https://ci.chromium.org/ui/p/turquoise/builders/global.try/core.x64-asan/b8779376650819379137/overview):
```
build failed:
[87176/332302](525) CXX user.libc_x64-asan-ubsan/obj/zircon/system/ulib/c/scudo/gwp-asan-info.gwp_asan_info.cc.o
FAILED: user.libc_x64-asan-ubsan/obj/zircon/system/ulib/c/scudo/gwp-asan-info.gwp_asan_info.cc.o
../../prebuilt/third_party/python3/linux-x64/bin/python3.8 -S ../../build/rbe/cxx_remote_wrapper.py --exec_strategy=remote_local_fallback -- ../../prebuilt/third_party/clang/linux-x64/bin/clang++ -MD -MF user.libc_x64-asan-ubsan/obj/zircon/system/ulib/c/scudo/gwp-asan-info.gwp_asan_info.cc.o.d -o user.libc_x64-asan-ubsan/obj/zircon/system/ulib/c/scudo/gwp-asan-info.gwp_asan_info.cc.o -D_LIBCPP...
In file included from ../../zircon/system/ulib/c/scudo/gwp_asan_info.cc:7:
In file included from ../../third_party/scudo/src/allocator_config.h:12:
In file included from ../../third_party/scudo/src/combined.h:22:
../../third_party/scudo/src/secondary.h:67:13: error: 'static' function 'unmap' declared in header file should be declared 'static inline' [-Werror,-Wunneeded-internal-declaration]
static void unmap(LargeBlock::Header *H) {
^
1 error generated.
```
Differential Revision: https://reviews.llvm.org/D152038
To define a custom allocator configuration, you only need to put the
configuration in custom_scudo_config.h and define two required aliases;
then you will be switched to the customized config, and the tests will
also run with your configuration.
In this CL, we also do a minor refactor of the configuration structure.
The essential fields are now put under the associated hierarchy, which
will make defining new configurations easier.
Reviewed By: cferris
Differential Revision: https://reviews.llvm.org/D150481
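A hypothetical custom_scudo_config.h sketch; the struct contents and
alias names here are assumed for illustration:
```
// custom_scudo_config.h
namespace scudo {

struct CustomConfig {
  // Base/primary/secondary/cache settings go here, following the same
  // hierarchy as the stock configs.
};

// The two required aliases that switch the allocator and the tests
// over to the custom configuration.
typedef CustomConfig Config;
typedef CustomConfig DefaultConfig;

} // namespace scudo
```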