Interleaved allocations and deallocations of many objects of varying size can cause fragmentation over time, which can lead to a situation where the library is unable to find a continuous range of free memory for a new allocation despite there is enough free space, just scattered across many small free ranges between existing allocations.

To mitigate this problem, you can use defragmentation feature: structure VmaDefragmentationInfo2, function vmaDefragmentationBegin(), vmaDefragmentationEnd(). Given set of allocations, this function can move them to compact used memory, ensure more continuous free space and possibly also free some VkDeviceMemory blocks.

What the defragmentation does is:

Updates VmaAllocation objects to point to new VkDeviceMemory and offset. After allocation has been moved, its VmaAllocationInfo::deviceMemory and/or VmaAllocationInfo::offset changes. You must query them again using vmaGetAllocationInfo() if you need them.
Moves actual data in memory.

What it doesn't do, so you need to do it yourself:

Recreate buffers and images that were bound to allocations that were defragmented and bind them with their new places in memory. You must use vkDestroyBuffer(), vkDestroyImage(), vkCreateBuffer(), vkCreateImage() for that purpose and NOT vmaDestroyBuffer(), vmaDestroyImage(), vmaCreateBuffer(), vmaCreateImage(), because you don't need to destroy or create allocation objects!
Recreate views and update descriptors that point to these buffers and images.

Defragmenting CPU memory

Following example demonstrates how you can run defragmentation on CPU. Only allocations created in memory types that are HOST_VISIBLE can be defragmented. Others are ignored.

The way it works is:

It temporarily maps entire memory blocks when necessary.
It moves data using memmove() function.

// Given following variables already initialized:
VkDevice device;
VmaAllocator allocator;
std::vector<VkBuffer> buffers;
std::vector<VmaAllocation> allocations;
const uint32_t allocCount = (uint32_t)allocations.size();
std::vector<VkBool32> allocationsChanged(allocCount);
VmaDefragmentationInfo2 defragInfo = {};
defragInfo.allocationCount = allocCount;
defragInfo.pAllocations = allocations.data();
defragInfo.pAllocationsChanged = allocationsChanged.data();
defragInfo.maxCpuBytesToMove = VK_WHOLE_SIZE; // No limit.
defragInfo.maxCpuAllocationsToMove = UINT32_MAX; // No limit.
VmaDefragmentationContext defragCtx;
vmaDefragmentationBegin(allocator, &defragInfo, nullptr, &defragCtx);
vmaDefragmentationEnd(allocator, defragCtx);
for(uint32_t i = 0; i < allocCount; ++i)
{
    if(allocationsChanged[i])
    {
        // Destroy buffer that is immutably bound to memory region which is no longer valid.
        vkDestroyBuffer(device, buffers[i], nullptr);
        // Create new buffer with same parameters.
        VkBufferCreateInfo bufferInfo = ...;
        vkCreateBuffer(device, &bufferInfo, nullptr, &buffers[i]);
            
        // You can make dummy call to vkGetBufferMemoryRequirements here to silence validation layer warning.
            
        // Bind new buffer to new memory region. Data contained in it is already moved.
        VmaAllocationInfo allocInfo;
        vmaGetAllocationInfo(allocator, allocations[i], &allocInfo);
        vkBindBufferMemory(device, buffers[i], allocInfo.deviceMemory, allocInfo.offset);
    }
}

Filling VmaDefragmentationInfo2::pAllocationsChanged is optional. This output array tells whether particular allocation in VmaDefragmentationInfo2::pAllocations at the same index has been modified during defragmentation. You can pass null, but you then need to query every allocation passed to defragmentation for new parameters using vmaGetAllocationInfo() if you might need to recreate and rebind a buffer or image associated with it.

If you use Custom memory pools, you can fill VmaDefragmentationInfo2::poolCount and VmaDefragmentationInfo2::pPools instead of VmaDefragmentationInfo2::allocationCount and VmaDefragmentationInfo2::pAllocations to defragment all allocations in given pools. You cannot use VmaDefragmentationInfo2::pAllocationsChanged in that case. You can also combine both methods.

Defragmenting GPU memory

It is also possible to defragment allocations created in memory types that are not HOST_VISIBLE. To do that, you need to pass a command buffer that meets requirements as described in VmaDefragmentationInfo2::commandBuffer. The way it works is:

It creates temporary buffers and binds them to entire memory blocks when necessary.
It issues vkCmdCopyBuffer() to passed command buffer.

Example:

// Given following variables already initialized:
VkDevice device;
VmaAllocator allocator;
VkCommandBuffer commandBuffer;
std::vector<VkBuffer> buffers;
std::vector<VmaAllocation> allocations;
const uint32_t allocCount = (uint32_t)allocations.size();
std::vector<VkBool32> allocationsChanged(allocCount);
VkCommandBufferBeginInfo cmdBufBeginInfo = ...;
vkBeginCommandBuffer(commandBuffer, &cmdBufBeginInfo);
VmaDefragmentationInfo2 defragInfo = {};
defragInfo.allocationCount = allocCount;
defragInfo.pAllocations = allocations.data();
defragInfo.pAllocationsChanged = allocationsChanged.data();
defragInfo.maxGpuBytesToMove = VK_WHOLE_SIZE; // Notice it's "GPU" this time.
defragInfo.maxGpuAllocationsToMove = UINT32_MAX; // Notice it's "GPU" this time.
defragInfo.commandBuffer = commandBuffer;
VmaDefragmentationContext defragCtx;
vmaDefragmentationBegin(allocator, &defragInfo, nullptr, &defragCtx);
vkEndCommandBuffer(commandBuffer);
// Submit commandBuffer.
// Wait for a fence that ensures commandBuffer execution finished.
vmaDefragmentationEnd(allocator, defragCtx);
for(uint32_t i = 0; i < allocCount; ++i)
{
    if(allocationsChanged[i])
    {
        // Destroy buffer that is immutably bound to memory region which is no longer valid.
        vkDestroyBuffer(device, buffers[i], nullptr);
        // Create new buffer with same parameters.
        VkBufferCreateInfo bufferInfo = ...;
        vkCreateBuffer(device, &bufferInfo, nullptr, &buffers[i]);
            
        // You can make dummy call to vkGetBufferMemoryRequirements here to silence validation layer warning.
            
        // Bind new buffer to new memory region. Data contained in it is already moved.
        VmaAllocationInfo allocInfo;
        vmaGetAllocationInfo(allocator, allocations[i], &allocInfo);
        vkBindBufferMemory(device, buffers[i], allocInfo.deviceMemory, allocInfo.offset);
    }
}

You can combine these two methods by specifying non-zero maxGpu* as well as maxCpu* parameters. The library automatically chooses best method to defragment each memory pool.

You may try not to block your entire program to wait until defragmentation finishes, but do it in the background, as long as you carefully fullfill requirements described in function vmaDefragmentationBegin().

Additional notes

While using defragmentation, you may experience validation layer warnings, which you just need to ignore. See Validation layer warnings.

If you defragment allocations bound to images, these images should be created with VK_IMAGE_CREATE_ALIAS_BIT flag, to make sure that new image created with same parameters and pointing to data copied to another memory region will interpret its contents consistently. Otherwise you may experience corrupted data on some implementations, e.g. due to different pixel swizzling used internally by the graphics driver.

If you defragment allocations bound to images, new images to be bound to new memory region after defragmentation should be created with VK_IMAGE_LAYOUT_PREINITIALIZED and then transitioned to their original layout from before defragmentation using an image memory barrier.

Please don't expect memory to be fully compacted after defragmentation. Algorithms inside are based on some heuristics that try to maximize number of Vulkan memory blocks to make totally empty to release them, as well as to maximimze continuous empty space inside remaining blocks, while minimizing the number and size of allocations that need to be moved. Some fragmentation may still remain - this is normal.

Writing custom defragmentation algorithm

If you want to implement your own, custom defragmentation algorithm, there is infrastructure prepared for that, but it is not exposed through the library API - you need to hack its source code. Here are steps needed to do this:

Main thing you need to do is to define your own class derived from base abstract class VmaDefragmentationAlgorithm and implement your version of its pure virtual method. See definition and comments of this class for details.
Your code needs to interact with device memory block metadata. If you need more access to its data than it's provided by its public interface, declare your new class as a friend class e.g. in class VmaBlockMetadata_Generic.
If you want to create a flag that would enable your algorithm or pass some additional flags to configure it, define some enum with such flags and use them in VmaDefragmentationInfo2::flags.
Modify function VmaBlockVectorDefragmentationContext::Begin to create object of your new class whenever needed.