Summary:
There were several gaps that prevented the allocator from working under NVIDIA's independent thread scheduling model. This commit fixes the ones I know of. Generally this required using correct masks, synchronizing before a few dependent operations, and overhauling the allocate function to stick with the existing mask instead of re-querying it.

The general idiom is that we obtain a single mask at the start and use it opportunistically. Every use must explicitly sync that same subset of threads, i.e. query the mask once and never change it.

This passes most tests, but I have encountered two issues:

1. A bug in `nvlink` that fails to link symbols called in `free`.
2. A deadlock under heavy divergence caused by IPSCCP altering control flow.

I will address these later, but for now this makes the *source* correct so that anyone who needs it can enable it.
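The single-mask idiom described above can be illustrated with a small host-side model. This is only a sketch of the mask arithmetic, not the allocator's actual code: the names `count_lanes`, `leader_lane`, `lane_rank`, and `assign_slots` are hypothetical, and the 64-bit mask width is an assumption. The point it shows is that the leader, the reservation size, and each thread's index are all derived from the one captured mask, so no later step depends on a fresh query that might return a different set of threads.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of the "query the mask once" idiom.
// Bit i of `mask` is set when lane i was active at the single query point.

constexpr uint32_t kNoSlot = UINT32_MAX;

// Count the set bits in a lane mask (portable; no compiler builtins needed).
inline uint32_t count_lanes(uint64_t mask) {
  uint32_t n = 0;
  for (; mask != 0; mask &= mask - 1)
    ++n;
  return n;
}

// The lowest active lane acts as the leader for the whole captured group.
inline uint32_t leader_lane(uint64_t mask) {
  assert(mask != 0 && "the captured mask must contain at least one lane");
  uint32_t lane = 0;
  while (((mask >> lane) & 1) == 0)
    ++lane;
  return lane;
}

// Rank of `lane` inside the captured mask: the number of active lanes below it.
inline uint32_t lane_rank(uint64_t mask, uint32_t lane) {
  return count_lanes(mask & ((uint64_t{1} << lane) - 1));
}

// Model one group allocation. On a device, only the leader would bump the
// counter (atomically) and the base would be broadcast while synchronizing
// on the same captured mask; here that step is a plain read and add.
std::vector<uint32_t> assign_slots(uint64_t mask, uint32_t &counter) {
  std::vector<uint32_t> slots(64, kNoSlot);
  if (mask == 0)
    return slots;
  const uint32_t base = counter;  // value the leader would obtain
  counter += count_lanes(mask);   // reserve one slot per captured lane
  for (uint32_t lane = 0; lane < 64; ++lane)
    if ((mask >> lane) & 1)
      slots[lane] = base + lane_rank(mask, lane);
  return slots;
}
```

For example, with lanes 2, 3, and 5 captured and a counter starting at 10, `assign_slots` advances the counter by three and hands those lanes consecutive indices in lane order, while lanes outside the mask receive `kNoSlot`.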
LLVM libc
=========

This directory and its subdirectories contain source code for llvm-libc, a retargetable implementation of the C standard library.

LLVM is open source software. You may freely distribute it under the terms of the license agreement found in LICENSE.txt.