Vulkan Memory Allocator
Version 2.0.0-alpha.3 (2017-09-12)
Source repository: VulkanMemoryAllocator project on GitHub
Product page: Vulkan Memory Allocator on GPUOpen
Documentation of members grouped: Modules
Documentation of all members: vk_mem_alloc.h
In your project code, include "vk_mem_alloc.h" wherever you want to use the library, and in exactly one C++ file define the following macro before that include to build the library implementation:

    #define VMA_IMPLEMENTATION
    #include "vk_mem_alloc.h"
At program startup:
1. Initialize Vulkan to have VkPhysicalDevice and VkDevice objects.
2. Fill VmaAllocatorCreateInfo structure and create a VmaAllocator object by calling vmaCreateAllocator().

    VmaAllocatorCreateInfo allocatorInfo = {};
    allocatorInfo.physicalDevice = physicalDevice;
    allocatorInfo.device = device;

    VmaAllocator allocator;
    vmaCreateAllocator(&allocatorInfo, &allocator);
When you want to create a buffer or image:
1. Fill VkBufferCreateInfo / VkImageCreateInfo structure.
2. Fill VmaAllocationCreateInfo structure.
3. Call vmaCreateBuffer() / vmaCreateImage() to get VkBuffer / VkImage with memory already allocated and bound to it.

    VkBufferCreateInfo bufferInfo = { VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO };
    bufferInfo.size = 65536;
    bufferInfo.usage = VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT;

    VmaAllocationCreateInfo allocInfo = {};
    allocInfo.usage = VMA_MEMORY_USAGE_GPU_ONLY;

    VkBuffer buffer;
    VmaAllocation allocation;
    vmaCreateBuffer(allocator, &bufferInfo, &allocInfo, &buffer, &allocation, nullptr);
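Creating an image works the same way, through vmaCreateImage() and vmaDestroyImage(). Here is a minimal sketch; the extent, format and usage flags below are illustrative assumptions, not values taken from this documentation:

    VkImageCreateInfo imageInfo = { VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO };
    imageInfo.imageType = VK_IMAGE_TYPE_2D;
    imageInfo.extent = { 1024, 1024, 1 };
    imageInfo.mipLevels = 1;
    imageInfo.arrayLayers = 1;
    imageInfo.format = VK_FORMAT_R8G8B8A8_UNORM;
    imageInfo.tiling = VK_IMAGE_TILING_OPTIMAL;
    imageInfo.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
    imageInfo.usage = VK_IMAGE_USAGE_SAMPLED_BIT | VK_IMAGE_USAGE_TRANSFER_DST_BIT;
    imageInfo.samples = VK_SAMPLE_COUNT_1_BIT;

    VmaAllocationCreateInfo imageAllocInfo = {};
    imageAllocInfo.usage = VMA_MEMORY_USAGE_GPU_ONLY;

    VkImage image;
    VmaAllocation imageAllocation;
    vmaCreateImage(allocator, &imageInfo, &imageAllocInfo, &image, &imageAllocation, nullptr);

Such an image is later destroyed with vmaDestroyImage(allocator, image, imageAllocation).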
Don't forget to destroy your objects when no longer needed:
    vmaDestroyBuffer(allocator, buffer, allocation);

    vmaDestroyAllocator(allocator);
If you need to map memory on host, it may happen that two allocations are assigned to the same VkDeviceMemory block, so if you map them both at the same time, it will cause an error, because mapping a single memory block multiple times is illegal in Vulkan.

It is safer, more convenient and more efficient to use a special feature designed for that: persistently mapped memory. Allocations made with the VMA_ALLOCATION_CREATE_PERSISTENT_MAP_BIT flag set in VmaAllocationCreateInfo::flags are returned from device memory blocks that stay mapped all the time, so you can just access the CPU pointer to them. The VmaAllocationInfo::pMappedData pointer is already offset to the beginning of the particular allocation. Example:
    VkBufferCreateInfo bufCreateInfo = { VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO };
    bufCreateInfo.size = 1024;
    bufCreateInfo.usage = VK_BUFFER_USAGE_TRANSFER_SRC_BIT;

    VmaAllocationCreateInfo allocCreateInfo = {};
    allocCreateInfo.usage = VMA_MEMORY_USAGE_CPU_ONLY;
    allocCreateInfo.flags = VMA_ALLOCATION_CREATE_PERSISTENT_MAP_BIT;

    VkBuffer buf;
    VmaAllocation alloc;
    VmaAllocationInfo allocInfo;
    vmaCreateBuffer(allocator, &bufCreateInfo, &allocCreateInfo, &buf, &alloc, &allocInfo);

    // Buffer is immediately mapped. You can access its memory.
    memcpy(allocInfo.pMappedData, myData, 1024);
Memory in Vulkan doesn't need to be unmapped before using it e.g. for transfers, but if you are not sure whether it's HOST_COHERENT (here it surely is, because it's created with VMA_MEMORY_USAGE_CPU_ONLY), you should check it. If it's not, you should call vkInvalidateMappedMemoryRanges() before reading and vkFlushMappedMemoryRanges() after writing to mapped memory on CPU. Example:
    VkMemoryPropertyFlags memFlags;
    vmaGetMemoryTypeProperties(allocator, allocInfo.memoryType, &memFlags);
    if((memFlags & VK_MEMORY_PROPERTY_HOST_COHERENT_BIT) == 0)
    {
        VkMappedMemoryRange memRange = { VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE };
        memRange.memory = allocInfo.deviceMemory;
        memRange.offset = allocInfo.offset;
        memRange.size   = allocInfo.size;
        vkFlushMappedMemoryRanges(device, 1, &memRange);
    }
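The opposite direction, reading data that the GPU wrote to a mapped, non-coherent allocation, would invalidate the range before the read. A minimal sketch along the same lines; myReadbackData is a hypothetical destination buffer and memFlags/allocInfo are assumed to describe that readback allocation:

    if((memFlags & VK_MEMORY_PROPERTY_HOST_COHERENT_BIT) == 0)
    {
        VkMappedMemoryRange memRange = { VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE };
        memRange.memory = allocInfo.deviceMemory;
        memRange.offset = allocInfo.offset;
        memRange.size   = allocInfo.size;
        // Make GPU writes visible to the CPU before reading through pMappedData.
        vkInvalidateMappedMemoryRanges(device, 1, &memRange);
    }
    memcpy(myReadbackData, allocInfo.pMappedData, 1024);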
On AMD GPUs on Windows, Vulkan memory from the type that has both DEVICE_LOCAL and HOST_VISIBLE flags should not be mapped for the time of any call to vkQueueSubmit() or vkQueuePresent(). Although legal, that would cause performance degradation because WDDM migrates such memory to system RAM. To ensure this, you can unmap all persistently mapped memory using just one function call, and map it back afterwards. For details, see functions vmaUnmapPersistentlyMappedMemory() and vmaMapPersistentlyMappedMemory().
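The intended pattern looks roughly like this. It assumes both functions take only the VmaAllocator (check their documentation for exact signatures and return values); queue, submitInfo and fence are assumed application objects:

    // Temporarily unmap all persistently mapped memory for the duration of the submit.
    vmaUnmapPersistentlyMappedMemory(allocator);

    vkQueueSubmit(queue, 1, &submitInfo, fence);
    // ... vkQueuePresentKHR() would also happen inside this unmapped window ...

    // Map everything back. Cached VmaAllocationInfo::pMappedData pointers may have
    // changed, so re-query them with vmaGetAllocationInfo() if you stored them.
    vmaMapPersistentlyMappedMemory(allocator);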
The library automatically creates and manages a default memory pool for each memory type available on the device. A pool contains a number of VkDeviceMemory blocks. You can also create custom pools and allocate memory out of them. This can be useful if you want to keep certain kinds of allocations separate from others, enforce a particular size of Vulkan memory blocks, or limit the maximum amount of memory allocated from a given pool.

To use custom memory pools:

1. Fill VmaPoolCreateInfo structure.
2. Call vmaCreatePool() to obtain a VmaPool handle.
3. When making an allocation, set VmaAllocationCreateInfo::pool to this handle.

Example:
    // Create a pool that could have at most 2 blocks, 128 MB each.
    VmaPoolCreateInfo poolCreateInfo = {};
    poolCreateInfo.memoryTypeIndex = ...
    poolCreateInfo.blockSize = 128ull * 1024 * 1024;
    poolCreateInfo.maxBlockCount = 2;

    VmaPool pool;
    vmaCreatePool(allocator, &poolCreateInfo, &pool);

    // Allocate a buffer out of it.
    VkBufferCreateInfo bufCreateInfo = { VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO };
    bufCreateInfo.size = 1024;
    bufCreateInfo.usage = VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT;

    VmaAllocationCreateInfo allocCreateInfo = {};
    allocCreateInfo.pool = pool;

    VkBuffer buf;
    VmaAllocation alloc;
    VmaAllocationInfo allocInfo;
    vmaCreateBuffer(allocator, &bufCreateInfo, &allocCreateInfo, &buf, &alloc, &allocInfo);
You have to free all allocations made from this pool before destroying it.
    vmaDestroyBuffer(allocator, buf, alloc);

    vmaDestroyPool(allocator, pool);
Interleaved allocations and deallocations of many objects of varying size can cause fragmentation, which can lead to a situation where the library is unable to find a continuous range of free memory for a new allocation even though there is enough free space in total, just scattered across many small free ranges between existing allocations.

To mitigate this problem, you can use vmaDefragment(). Given a set of allocations, this function can move them to compact used memory, ensure more continuous free space and possibly also free some VkDeviceMemory blocks. It can work only on allocations made from memory types that are HOST_VISIBLE. Allocations are modified to point to the new VkDeviceMemory and offset. Data in this memory is also memmove-ed to the new place. However, if you have images or buffers bound to these allocations (and you certainly do), you need to destroy them, then recreate and bind them to the new place in memory.
For further details and example code, see documentation of function vmaDefragment().
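Until then, a rough sketch of the overall flow. It assumes the vmaDefragment() parameter order documented in vk_mem_alloc.h (allocator, allocation array and count, optional per-allocation "changed" flags, optional VmaDefragmentationInfo and VmaDefragmentationStats); allocations, allocCount and RecreateBuffer() are hypothetical application-side names:

    // 'allocations' points to allocCount VmaAllocation handles of HOST_VISIBLE
    // allocations that the GPU is currently not using.
    std::vector<VkBool32> allocationsChanged(allocCount, VK_FALSE);
    vmaDefragment(allocator, allocations, allocCount, allocationsChanged.data(), nullptr, nullptr);

    for(size_t i = 0; i < allocCount; ++i)
    {
        if(allocationsChanged[i])
        {
            // The allocation was moved. Destroy the old buffer, then recreate it and
            // bind it with vkBindBufferMemory() using the deviceMemory/offset that
            // vmaGetAllocationInfo() now reports for this allocation.
            RecreateBuffer(i);
        }
    }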
If your game oversubscribes video memory, it may work OK in previous-generation graphics APIs (DirectX 9, 10, 11, OpenGL), because resources are automatically paged to system RAM. In Vulkan you can't do that, because when you run out of memory, an allocation just fails. If you have more data (e.g. textures) than can fit into VRAM and you don't need it all at once, you may want to upload it to the GPU on demand and "push out" data that hasn't been used for a long time to make room for new data, effectively using VRAM (or a certain memory pool) as a form of cache. Vulkan Memory Allocator can help you with that by supporting a concept of "lost allocations".
To create an allocation that can become lost, include the VMA_ALLOCATION_CREATE_CAN_BECOME_LOST_BIT flag in VmaAllocationCreateInfo::flags. Before using a buffer or image bound to such an allocation in every new frame, you need to check that it hasn't been lost: call vmaGetAllocationInfo() and see if VmaAllocationInfo::deviceMemory is not VK_NULL_HANDLE. If the allocation is lost, you should not use it or the buffer/image bound to it, and you mustn't forget to destroy the allocation and that buffer/image.
To create an allocation that can make some other allocations lost to make room for it, use the VMA_ALLOCATION_CREATE_CAN_MAKE_OTHER_LOST_BIT flag. You will usually use both flags VMA_ALLOCATION_CREATE_CAN_MAKE_OTHER_LOST_BIT and VMA_ALLOCATION_CREATE_CAN_BECOME_LOST_BIT at the same time.
Warning! The current implementation uses a quite naive, brute-force algorithm, which can make allocation calls that use the VMA_ALLOCATION_CREATE_CAN_MAKE_OTHER_LOST_BIT flag quite slow. A new, more optimal algorithm and data structure to speed this up is planned for the future.
When interleaving creation of new allocations with usage of existing ones, how do you make sure that an allocation won't become lost while it's used in the current frame?
It is ensured because vmaGetAllocationInfo() not only returns allocation parameters and checks whether the allocation is not lost, but when it's not lost, it also atomically marks it as used in the current frame, which makes it impossible for it to become lost in that frame. It uses a lockless algorithm, so it works fast and doesn't involve locking any internal mutex.
What if my allocation may still be in use by the GPU while it's rendering a previous frame, even though I'm already submitting a new frame on the CPU?
You can make sure that allocations "touched" by vmaGetAllocationInfo() will not become lost for a number of additional frames back from the current one by specifying this number as VmaAllocatorCreateInfo::frameInUseCount (for default memory pool) and VmaPoolCreateInfo::frameInUseCount (for custom pool).
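For instance, a minimal sketch for the default pools, assuming two frames in flight (the value 1 here is an illustrative assumption, not a recommendation from this documentation):

    VmaAllocatorCreateInfo allocatorInfo = {};
    allocatorInfo.physicalDevice = physicalDevice;
    allocatorInfo.device = device;
    // With frameInUseCount = 1, an allocation touched in frame N cannot become lost
    // in frame N or N+1.
    allocatorInfo.frameInUseCount = 1;

    VmaAllocator allocator;
    vmaCreateAllocator(&allocatorInfo, &allocator);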
How do you inform the library when a new frame starts?
You need to call function vmaSetCurrentFrameIndex().
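A sketch of the per-frame call; frameIndex is an assumed uint32_t counter maintained by the application:

    // At the beginning of each new frame:
    ++frameIndex;
    vmaSetCurrentFrameIndex(allocator, frameIndex);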
Example code:
    struct MyBuffer
    {
        VkBuffer m_Buf = VK_NULL_HANDLE;
        VmaAllocation m_Alloc = VK_NULL_HANDLE;

        // Called when the buffer is really needed in the current frame.
        void EnsureBuffer();
    };

    void MyBuffer::EnsureBuffer()
    {
        // Buffer has already been created.
        if(m_Buf != VK_NULL_HANDLE)
        {
            // Check if its allocation is not lost + mark it as used in current frame.
            VmaAllocationInfo allocInfo;
            vmaGetAllocationInfo(allocator, m_Alloc, &allocInfo);
            if(allocInfo.deviceMemory != VK_NULL_HANDLE)
            {
                // It's all OK - safe to use m_Buf.
                return;
            }
        }

        // Buffer doesn't exist yet or is lost - destroy and recreate it.
        vmaDestroyBuffer(allocator, m_Buf, m_Alloc);

        VkBufferCreateInfo bufCreateInfo = { VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO };
        bufCreateInfo.size = 1024;
        bufCreateInfo.usage = VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT;

        VmaAllocationCreateInfo allocCreateInfo = {};
        allocCreateInfo.usage = VMA_MEMORY_USAGE_GPU_ONLY;
        allocCreateInfo.flags = VMA_ALLOCATION_CREATE_CAN_BECOME_LOST_BIT |
            VMA_ALLOCATION_CREATE_CAN_MAKE_OTHER_LOST_BIT;

        vmaCreateBuffer(allocator, &bufCreateInfo, &allocCreateInfo, &m_Buf, &m_Alloc, nullptr);
    }
When using lost allocations, you may see some Vulkan validation layer warnings about overlapping regions of memory bound to different kinds of buffers and images. This is still valid as long as you implement proper handling of lost allocations (like in the example above) and don't actually use the lost ones or the resources bound to them.
The library uses the following algorithm for allocation, in order:

1. Try to find a free range of memory in existing blocks.
2. If failed, try to create a new block of VkDeviceMemory, with preferred block size.
3. If failed and the VMA_ALLOCATION_CREATE_CAN_MAKE_OTHER_LOST_BIT flag was specified, try to find space in existing blocks, possibly making some other allocations lost.
4. If failed, try to allocate a separate VkDeviceMemory for this allocation, just like when you use VMA_ALLOCATION_CREATE_OWN_MEMORY_BIT.
5. If failed, return VK_ERROR_OUT_OF_DEVICE_MEMORY.

Please check the "CONFIGURATION SECTION" in the code to find macros that you can define before each include of this file, or change directly in this file, to provide your own implementation of basic facilities like assert, min() and max() functions, mutex, etc. The C++ STL is used by default, but changing these allows you to get rid of any STL usage if you want, as many game developers tend to do.
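For example, a sketch of overriding one of these macros. VMA_ASSERT is assumed here to be among the macros listed in the CONFIGURATION SECTION; check that section for the exact names available in this version:

    #include <cstdio>
    #include <cstdlib>

    // Route assertion failures to a custom handler instead of the default assert().
    #define VMA_ASSERT(expr) do { \
            if(!(expr)) { std::fprintf(stderr, "VMA assertion failed: %s\n", #expr); std::abort(); } \
        } while(false)

    #define VMA_IMPLEMENTATION
    #include "vk_mem_alloc.h"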
The library uses Vulkan functions straight from the vulkan.h header by default. If you want to provide your own pointers to these functions, e.g. fetched using vkGetInstanceProcAddr() and vkGetDeviceProcAddr(), define the following macro to 0 before including this file:

    #define VMA_STATIC_VULKAN_FUNCTIONS 0
If you use a custom allocator for CPU memory rather than the default operator new and delete from C++, you can make this library use your allocator as well, by filling the optional member VmaAllocatorCreateInfo::pAllocationCallbacks. These functions will be passed to Vulkan, as well as used by the library itself to make any CPU-side allocations.
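A minimal sketch of wiring such an allocator in. MyAlloc, MyRealloc, MyFree and the MyAligned* helpers are hypothetical placeholders; the parameter lists follow the standard Vulkan PFN_vkAllocationFunction, PFN_vkReallocationFunction and PFN_vkFreeFunction signatures:

    void* VKAPI_PTR MyAlloc(void* pUserData, size_t size, size_t alignment, VkSystemAllocationScope scope)
    {
        return MyAlignedMalloc(size, alignment); // hypothetical custom allocator
    }

    void* VKAPI_PTR MyRealloc(void* pUserData, void* pOriginal, size_t size, size_t alignment, VkSystemAllocationScope scope)
    {
        return MyAlignedRealloc(pOriginal, size, alignment); // hypothetical
    }

    void VKAPI_PTR MyFree(void* pUserData, void* pMemory)
    {
        MyAlignedFree(pMemory); // hypothetical
    }

    VkAllocationCallbacks cpuAllocationCallbacks = {};
    cpuAllocationCallbacks.pfnAllocation = &MyAlloc;
    cpuAllocationCallbacks.pfnReallocation = &MyRealloc;
    cpuAllocationCallbacks.pfnFree = &MyFree;

    VmaAllocatorCreateInfo allocatorInfo = {};
    allocatorInfo.physicalDevice = physicalDevice;
    allocatorInfo.device = device;
    allocatorInfo.pAllocationCallbacks = &cpuAllocationCallbacks;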
The library makes calls to vkAllocateMemory() and vkFreeMemory() internally. You can set up callbacks to be informed about these calls, e.g. for the purpose of gathering some statistics. To do it, fill the optional member VmaAllocatorCreateInfo::pDeviceMemoryCallbacks.
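A sketch of what such statistics gathering might look like. The callback parameter lists shown here (allocator, memory type, VkDeviceMemory handle, size) are an assumption based on the PFN_vmaAllocateDeviceMemoryFunction / PFN_vmaFreeDeviceMemoryFunction declarations in vk_mem_alloc.h, so check the header for the exact signatures:

    void VKAPI_PTR OnAllocateDeviceMemory(
        VmaAllocator allocator, uint32_t memoryType, VkDeviceMemory memory, VkDeviceSize size)
    {
        printf("Allocated %llu bytes from memory type %u\n", (unsigned long long)size, memoryType);
    }

    void VKAPI_PTR OnFreeDeviceMemory(
        VmaAllocator allocator, uint32_t memoryType, VkDeviceMemory memory, VkDeviceSize size)
    {
        printf("Freed %llu bytes from memory type %u\n", (unsigned long long)size, memoryType);
    }

    VmaDeviceMemoryCallbacks deviceMemoryCallbacks = {};
    deviceMemoryCallbacks.pfnAllocate = &OnAllocateDeviceMemory;
    deviceMemoryCallbacks.pfnFree = &OnFreeDeviceMemory;

    VmaAllocatorCreateInfo allocatorInfo = {};
    allocatorInfo.physicalDevice = physicalDevice;
    allocatorInfo.device = device;
    allocatorInfo.pDeviceMemoryCallbacks = &deviceMemoryCallbacks;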
If you want to test how your program behaves with a limited amount of Vulkan device memory available (without switching your graphics card to one that really has less VRAM), you can use a feature of this library intended for this purpose. To do it, fill the optional member VmaAllocatorCreateInfo::pHeapSizeLimit.
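A sketch, assuming pHeapSizeLimit points to an array with one VkDeviceSize entry per memory heap and that VK_WHOLE_SIZE marks heaps left unlimited; check the member's documentation in vk_mem_alloc.h for the exact contract:

    // Limit heap 0 to 256 MiB; leave the other heaps unlimited.
    VkDeviceSize heapSizeLimit[VK_MAX_MEMORY_HEAPS];
    for(uint32_t i = 0; i < VK_MAX_MEMORY_HEAPS; ++i)
        heapSizeLimit[i] = VK_WHOLE_SIZE;
    heapSizeLimit[0] = 256ull * 1024 * 1024;

    VmaAllocatorCreateInfo allocatorInfo = {};
    allocatorInfo.physicalDevice = physicalDevice;
    allocatorInfo.device = device;
    allocatorInfo.pHeapSizeLimit = heapSizeLimit;

    VmaAllocator allocator;
    vmaCreateAllocator(&allocatorInfo, &allocator);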
The library can be used from multiple threads, with some restrictions:

- The library has no global state, so separate VmaAllocator objects can be used independently.
- By default, all calls to functions that take VmaAllocator as first parameter are safe to call from multiple threads simultaneously because they are synchronized internally when needed.
- When the allocator is created with the VMA_ALLOCATOR_EXTERNALLY_SYNCHRONIZED_BIT flag, calls to functions that take such VmaAllocator object must be synchronized externally.
- Access to a VmaAllocation object must be externally synchronized. For example, you must not call vmaGetAllocationInfo() and vmaDefragment() from different threads at the same time if you pass the same VmaAllocation object to these functions.