Add a new configuration option QuarantineDisabled that allows all of the
quarantine code to be compiled out.
Add new tests that verify that the code is removed properly.
On Android, this saves ~4000 bytes for 32 bit and ~6000 bytes for 64
bit.
On Android, I used some microbenchmarks that do malloc/free in a loop
and for allocations in the primary, the performance is about the same
for both 32 bit and 64 bit. For secondary allocations, I saw ~8% speed
up on 32 bit and ~3% on 64 bit speed up which feels like it could just
be code size improvements.
The current code always unmaps a secondary allocation when MTE is
enabled. Fix this to match the comment, namely only unmap if MTE was
enabled and is no longer enabled after acquiring the lock.
In addition, allow quaratine to work in the secondary even if MTE is not
enabled.
The original version of ResidentMemorySize could be a little flaky.
Replace the test with a version that verifies exactly how much of the
map is resident.
The previous test simply tried to double free the pointer in the
EXPECT_DEATH macro. Unfortunately, the gtest infrastructure can allocate
a pointer that happens to be the previously freed pointer. Thus the free
doesn't fail since the spawned process does not attempt to free all of
the pointers allocated in the original test.
NOTE: Scudo should be checked to make sure that the TSD is not always
returning pointers in the same order they are freed. Although this
appears to be a problem with a program that only does a small number of
allocations.
This allows us to change the number of blocks stored according to the
size of BatchClass.
Also change the name `TransferBatch` to `Batch` given that it's never
the unit of transferring blocks.
Mark as many of the reportXX functions that take pointers const. This
avoid the need to use const_cast when calling these functions on an
already const pointer.
Fix reportHeaderCorruption calls where an argument was passed into an
append call that didn't use them.
In the caching part of the secondary path, when about to try to release
memory to the OS, we always wait while acquiring the lock. However, if
multiple threads are attempting this at the same time, all other threads
will likely do nothing when the release call is made. Change the
algorithm to skip the release if there is another release in process.
Also, pull the lock outside the releaseOlderThan function. This is so
that in the store path, we use the tryLock and skip if another thread is
releasing. But in the path where a forced release call is being made,
that call will wait for release to complete which guarantees that all
entries are released when requested.
Update the error message to be explicit that this is likely due to
memory corruption.
In addition, check if the chunk header is all zero, which could mean
corruption or an attempt to free a pointer after the memory has been
released to the kernel. This case results in a slightly different error
message to also indicate this could still be a double free.
Add an optional flag for the secondary allocator called
`EnableGuardPages` to enable/disable the use of guard pages. By default,
this option is enabled.
Remove all redundant code and create a couple of structs to handle
automatic init and destruction. This replaces the test fixtures in
prepartion for passing in multiple configs for some of these tests. This
is necessary because not all of the gtest features are supported here,
and there is no easy way to create a test fixture with a template.
Change names to all begin with ScudoSecondary and change tests names
appropriately.
Move the cache option test to the cache test fixture.
Force the allocator test to use the no cached config so that all of
the allocations always fully exercise the allocator function and
don't skip this by using a previously cached element.
This is a quick fix for b71c44b9be17dc6295eb733d685b38e797f3c846
"last released" was removed by accident in primary64.h and the update of
"NumReleasesAttempted" was missing.
* Finished the type and size verification
* Remove the TODO for checking if array size can be fit into LinkTy
because if there's a truncation happens, other DCHECK like offset
checking will catch the failure. In addition, it's supposed to be a rare
case.
Fixes bug where a device that supports tagged pointers doesn't use
the tagged pointer when computing the checksum.
Add tests to verify that double frees result in chunk state error
not corrupted header errors.
For the block smaller than a page size, one block is unlikely to
introduce more unused pages (at most 2 if it acrosses the page boundary
and both touched pages are unused). So it's better to apply the
threshold to reduce the time of scanning groups that can't release any
new pages.
We introduce a new strategy to track how many bytes are not released
because of the contraint of release interval. This will change the
`TryReleaseThreshold` adaptively so that we avoid releasing the same
pages multiple times (and wasting time on the case of no pages to
release).
On Android, the number of release attempts decreases 33% (572 to 382)
and the worst case drops from 251 to 33. At the same time, it maintains
almost the same RSS usage (with some improvements as well).
Note that in this CL, this is only applied to non small blocks. We will
bring the strategy to all the size classes later.
If called on an address that is actually not owned, the header tag might not
match. This would cause an MTE fault in Chunk::isValid.
Disable tag checks in isOwned().
Secondary cache entries are now released to the OS from least recent to
most recent entries. This helps to avoid unnecessary scans of the cache
since entries ready to be released (specifically, entries that are
considered old relative to the configurable release interval) will
always be at the tail of the list of committed entries by the LRU
ordering. For this same reason, the `OldestTime` variable is no longer
needed to indicate when releases are necessary so it has been removed.
Currently, only Android supports using a hard-code page size. Make this
a bit more generic so any platform that wants to can use this.
In addition, add a getPageSizeLogCached() function since this value is
used in release.h and can avoid keeping this value around in objects.
Finally, change some of the release.h page size multiplies to shifts
using the new page size log value.
`MaxReleasedCachePages` has been set to 4. Initially, in #105009 , we
set `MaxReleasedCachePages` to 0 so that the partial chunk heuristic
could be introduced incrementally as we observed its impact on retrieval
order and more generally, performance.
Co-authored-by: Joshua Baehring <josh.baehring@yale.edu>
gcc interprets a backslash '\\' as the last char before a new line as a
line continuation character, even in a comment context. This can produce
an "error: multi-line comment [-Werror=comment]".
This PR adds delimiters so that the comment can compile with gcc.
Previously the secondary cache retrieval algorithm would not allow
retrievals of memory chunks where the number of unused bytes would be
greater than than `MaxUnreleasedCachePages * PageSize` bytes. This meant
that even if a memory chunk satisfied the requirements of the optimal
fit algorithm, it may not be returned. This remains true if memory
tagging is enabled. However, if memory tagging is disabled, a new
heuristic has been put in place. Specifically, If a memory chunk is a
non-optimal fit, the cache retrieval algorithm will attempt to release
the excess memory to force a cache hit while keeping RSS down.
In the event that a memory chunk is a non-optimal fit, the retrieval
algorithm will release excess memory as long as the amount of memory to
be released is less than or equal to 4 Pages. If the amount of memory to
be released exceeds 4 Pages, the retrieval algorithm will not consider
that cached memory chunk valid for retrieval.
This change also addresses an alignment issue in a test case submitted
in #104807.
Previously the secondary cache retrieval algorithm would not allow
retrievals of memory chunks where the number of unused bytes would be
greater than than `MaxUnusedCachePages * PageSize` bytes. This meant
that even if a memory chunk satisfied the requirements of the optimal
fit algorithm, it may not be returned. This remains true if memory
tagging is enabled. However, if memory tagging is disabled, a new
heuristic has been put in place. Specifically, If a memory chunk is a
non-optimal fit, the cache retrieval algorithm will attempt to release
the excess memory to force a cache hit while keeping RSS down.
In the event that a memory chunk is a non-optimal fit, the retrieval
algorithm will release excess memory as long as the amount of memory to
be released is less than or equal to 16 KB. If the amount of memory to
be released exceeds 16 KB, the retrieval algorithm will not consider
that cached memory chunk valid for retrieval.
The nodes of list may be managed in an array. Instead of managing them
with pointers, using the array index will save some memory in some
cases.
When using the list linked with index, remember to call init() to set up
the base address of the array and a `EndOfListVal` which is nil of the
list.
Initially, the LRU list stored all mapped entries with no distinction
between the committed (non-madvise()'d) entries and decommitted
(madvise()'d) entries. Now these two types of entries re separated into
two lists, allowing future cache logic to branch depending on whether or
not entries are committed or decommitted. Furthermore, the retrieval
algorithm will prioritize committed entries over decommitted entries.
Specifically, committed entries that satisfy the MaxUnusedCachePages
requirement are retrieved before optimal-fit, decommitted entries.
This commit addresses the compiler errors raised
[here](https://github.com/llvm/llvm-project/pull/100818#issuecomment-2261043261).