When developing a graphics-intensive game or program, it is important to avoid allocating more GPU memory than is physically available. When memory is over-committed, various bad things can happen, depending on the specific GPU, graphics driver, and operating system:

It may just work without any problems.
The application may slow down because some memory blocks are moved to system RAM and the GPU has to access them through the PCI Express bus.
A new allocation may take a very long time to complete, even a few seconds, and possibly freeze the entire system.
The new allocation may fail with VK_ERROR_OUT_OF_DEVICE_MEMORY.
It may even result in a GPU crash (TDR), observed as VK_ERROR_DEVICE_LOST returned somewhere later.

Querying for budget

To query the current memory usage and available budget, use the function vmaGetBudget(). The returned VmaBudget structure contains quantities expressed in bytes, per Vulkan memory heap.

Please note that this function returns different information and works faster than vmaCalculateStats(). vmaGetBudget() can be called every frame or even before every allocation, while vmaCalculateStats() is intended to be used rarely, only to obtain statistical information, e.g. for debugging purposes.
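As a minimal sketch of how the per-heap numbers might be used, the helpers below compute how many bytes remain before a heap reaches its budget. HeapBudget is a hypothetical stand-in that mirrors only the usage and budget members of VmaBudget, so the example compiles without Vulkan headers. In a real application the array would be filled by vmaGetBudget() rather than by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for VmaBudget (an assumption of this sketch):
// only the two members used here, both expressed in bytes.
struct HeapBudget {
    std::uint64_t usage;   // estimated current memory usage of the program
    std::uint64_t budget;  // estimated amount of memory available to the program
};

// True when the heap already uses more memory than its budget allows.
bool isOverBudget(const HeapBudget& heap) {
    return heap.usage > heap.budget;
}

// Bytes left in the given heap before the budget is reached.
// Returns 0 for an over-budget heap instead of wrapping around.
std::uint64_t remainingInHeap(const std::vector<HeapBudget>& heaps,
                              std::size_t heapIndex) {
    assert(heapIndex < heaps.size());  // one entry per Vulkan memory heap
    const HeapBudget& heap = heaps[heapIndex];
    return isOverBudget(heap) ? 0 : heap.budget - heap.usage;
}
```

Because the query is cheap, a check like this could run once per frame, for example to decide whether to keep streaming in optional assets.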

It is recommended to use the VK_EXT_memory_budget device extension to obtain information about the budget from the Vulkan device. VMA is able to use this extension automatically. When it is not enabled, the allocator behaves the same way, but it estimates current usage and available budget from its internal information and the Vulkan memory heap sizes, which may be less precise. In order to use this extension:

Make sure that the extension VK_EXT_memory_budget and the extension VK_KHR_get_physical_device_properties2, which it requires, are available, and enable them. Please note that the first is a device extension and the second is an instance extension!
Use the flag VMA_ALLOCATOR_CREATE_EXT_MEMORY_BUDGET_BIT when creating the VmaAllocator object.
Make sure to call vmaSetCurrentFrameIndex() every frame. The budget is queried from Vulkan inside this function to avoid the overhead of querying it with every allocation.
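The availability check in the first step can be sketched as follows. This is only an illustration: the function names are hypothetical, and the extension lists are plain strings so the example compiles without Vulkan headers. In a real application the names would come from vkEnumerateInstanceExtensionProperties() and vkEnumerateDeviceExtensionProperties().

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Extension name strings as defined by the Vulkan specification.
const std::string kMemoryBudgetExt = "VK_EXT_memory_budget";
const std::string kPhysicalDeviceProps2Ext = "VK_KHR_get_physical_device_properties2";

// Hypothetical helper: true if `name` appears in the list of extension names.
bool hasExtension(const std::vector<std::string>& names, const std::string& name) {
    assert(!name.empty());  // an empty extension name is a programming error
    return std::find(names.begin(), names.end(), name) != names.end();
}

// Hypothetical helper: true when both extensions are available, i.e. when it
// is safe to enable them and to pass VMA_ALLOCATOR_CREATE_EXT_MEMORY_BUDGET_BIT.
// Note the two different lists: one is a device extension, the other is an
// instance extension.
bool canUseMemoryBudget(const std::vector<std::string>& instanceExtensions,
                        const std::vector<std::string>& deviceExtensions) {
    return hasExtension(instanceExtensions, kPhysicalDeviceProps2Ext) &&
           hasExtension(deviceExtensions, kMemoryBudgetExt);
}
```

When the function returns false, the allocator can still be created without the flag and falls back to its own estimates.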

Controlling memory usage

There are many ways in which you can try to stay within the budget.

First, when making a new allocation requires allocating a new memory block, the library automatically tries not to exceed the budget. If a block of the default recommended size (e.g. 256 MB) would go over budget, a smaller block is allocated, possibly even dedicated memory for just this resource.

If the size of the requested resource plus current memory usage is more than the budget, by default the library still tries to create it, leaving it to the Vulkan implementation whether the allocation succeeds or fails. You can change this behavior by using the VMA_ALLOCATION_CREATE_WITHIN_BUDGET_BIT flag. With it, the allocation is not made if it would exceed the budget or if the budget is already exceeded. Some other allocations become lost instead to make room for it, if the mechanism of lost allocations is used. If that is not possible, the allocation fails with VK_ERROR_OUT_OF_DEVICE_MEMORY. An example usage pattern is to pass the VMA_ALLOCATION_CREATE_WITHIN_BUDGET_BIT flag when creating resources that are not essential for the application (e.g. the texture of a specific object) and not to pass it when creating critically important resources (e.g. render targets).
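The usage pattern above can be modeled with a small sketch. It is not the library's implementation: all names are hypothetical, the mechanism of lost allocations is ignored, and the usage and budget values are assumed to come from a budget query for the relevant heap.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of resources by how critical they are.
enum class Importance { Essential, Optional };

// Hypothetical outcome of the decision modeled in this sketch.
enum class Outcome { AttemptAllocation, FailOutOfDeviceMemory };

// Policy from the text: only non-essential resources would be created with
// VMA_ALLOCATION_CREATE_WITHIN_BUDGET_BIT.
bool shouldRequestWithinBudget(Importance importance) {
    return importance == Importance::Optional;
}

// True if the budget is already exceeded or usage + size would exceed it.
// Written without the addition so that it cannot overflow.
bool wouldExceedBudget(std::uint64_t usage, std::uint64_t budget,
                       std::uint64_t size) {
    assert(size > 0);  // Vulkan does not allow zero-sized allocations
    if (usage > budget) return true;
    return size > budget - usage;
}

// Models the described behavior: with the flag, an over-budget request fails;
// without it, the request is always handed to the Vulkan implementation.
Outcome decide(Importance importance, std::uint64_t usage,
               std::uint64_t budget, std::uint64_t size) {
    if (shouldRequestWithinBudget(importance) &&
        wouldExceedBudget(usage, budget, size)) {
        return Outcome::FailOutOfDeviceMemory;
    }
    return Outcome::AttemptAllocation;
}
```

An application following this pattern would treat the failure for an optional resource as recoverable, for example by keeping a lower-resolution texture in use.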

Finally, you can also use the VMA_ALLOCATION_CREATE_NEVER_ALLOCATE_BIT flag to make sure a new allocation is created only when it fits inside one of the existing memory blocks. If it would require allocating a new block, it fails instead with VK_ERROR_OUT_OF_DEVICE_MEMORY. This also ensures that the function call is very fast because it never goes to Vulkan to obtain a new block.

Please note that creating custom memory pools with VmaPoolCreateInfo::minBlockCount set to more than 0 will try to allocate memory blocks without checking whether they fit within the budget.
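Since the library does not perform this check for preallocated pool blocks, an application may want to do it before creating the pool. The sketch below is one possible way, with hypothetical function names. It assumes an explicit, nonzero VmaPoolCreateInfo::blockSize and takes usage and budget from a budget query for the heap the pool will use.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Bytes the pool would allocate up front: blockSize * minBlockCount.
// Saturates at the maximum value instead of overflowing.
std::uint64_t preallocatedBytes(std::uint64_t blockSize,
                                std::uint64_t minBlockCount) {
    assert(blockSize > 0);  // this sketch assumes an explicit block size
    const std::uint64_t maxValue = std::numeric_limits<std::uint64_t>::max();
    if (minBlockCount > maxValue / blockSize) return maxValue;
    return blockSize * minBlockCount;
}

// True if the preallocated blocks fit into what remains of the heap budget.
bool poolFitsWithinBudget(std::uint64_t blockSize, std::uint64_t minBlockCount,
                          std::uint64_t usage, std::uint64_t budget) {
    if (usage > budget) return false;  // budget is already exceeded
    return preallocatedBytes(blockSize, minBlockCount) <= budget - usage;
}
```

If the check fails, the application could lower minBlockCount or the block size before creating the pool.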