Appropriate use of render pass attachments

Overview

Vulkan render-passes use attachments to describe input and output render targets. This sample shows how loading and storing attachments might affect performance on mobile.

When creating a render-pass, you can specify one or more color attachments and a depth-stencil attachment. Each of them is described by a VkAttachmentDescription struct, whose fields include the load operation (loadOp) and the store operation (storeOp). This sample lets you switch between different combinations of these operations at runtime.

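VkAttachmentDescription desc = {};
// One of VK_ATTACHMENT_LOAD_OP_LOAD, _CLEAR or _DONT_CARE
desc.loadOp  = VK_ATTACHMENT_LOAD_OP_CLEAR;
// One of VK_ATTACHMENT_STORE_OP_STORE or _DONT_CARE
desc.storeOp = VK_ATTACHMENT_STORE_OP_STORE;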

Color attachment load operation

The sample renders a scene with a render-pass using one color attachment, which is a swapchain image used for presentation. Since we do not need to read its content at the beginning of the pass, it would make sense to use LOAD_OP_DONT_CARE in order to avoid spending time loading it.

However, with LOAD_OP_DONT_CARE the initial contents of the attachment are undefined. If we do not draw over the entire framebuffer, the areas we leave untouched might show random colors or pixels drawn during previous frames. The solution is to use LOAD_OP_CLEAR, which clears the framebuffer to a user-specified color.

VkAttachmentDescription color_desc = {};
color_desc.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;

// Remember to set the clear value when beginning the render pass
VkClearValue clear = {};
clear.color = {0.5f, 0.5f, 0.5f, 1.0f};

VkRenderPassBeginInfo begin = {VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO};
begin.clearValueCount = 1;
begin.pClearValues    = &clear;

Using LOAD_OP_LOAD is the wrong choice in this case. Not only do we never use the previous content of the attachment during this render-pass, but loading it also costs extra read bandwidth.
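The reasoning in this section can be condensed into a small decision helper. This is only a sketch: LoadOp is a local stand-in for VkAttachmentLoadOp so that the snippet compiles without vulkan.h, and choose_load_op and its two parameters are hypothetical names that are not part of the sample.

```cpp
#include <cassert>

// Local stand-in for VkAttachmentLoadOp (assumption: used only so the sketch
// builds without the Vulkan headers).
enum class LoadOp { Load, Clear, DontCare };

// Hypothetical helper mirroring the discussion above:
// - previous contents are read during the pass      -> Load
// - nothing is read and every pixel is overwritten   -> DontCare
// - nothing is read but some pixels stay untouched   -> Clear
inline LoadOp choose_load_op(bool needs_previous_contents, bool covers_every_pixel)
{
    if (needs_previous_contents)
    {
        return LoadOp::Load;
    }
    return covers_every_pixel ? LoadOp::DontCare : LoadOp::Clear;
}

// Self-check covering the three cases.
inline void check_load_op_choices()
{
    assert(choose_load_op(true, false) == LoadOp::Load);
    assert(choose_load_op(false, true) == LoadOp::DontCare);
    assert(choose_load_op(false, false) == LoadOp::Clear);
}
```

For the swapchain image in this sample, nothing is read back and the scene does not necessarily cover every pixel, so the helper would return LoadOp::Clear.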

Below is a screenshot showing a scene rendered using LOAD_OP_LOAD. We can estimate the bandwidth cost of loading or storing an uncompressed attachment as width * height * bpp/8 * FPS, which gives bytes per second. In this case we get an estimate of 2220 * 1080 * (32/8) * 61.7 ≈ 591.7 million bytes/s, that is about 592 MB/s or 564 MiB/s.
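The estimate above is plain arithmetic, so it is easy to reproduce for other resolutions and formats. The following is a minimal sketch; the function names are hypothetical and not taken from the sample. Note that the product is in bytes per second: dividing by 10^6 gives about 591.7, while dividing by 2^20 gives about 564.3.

```cpp
#include <cassert>
#include <cstdint>

// Bytes per second moved when one uncompressed attachment is loaded (or
// stored) once per frame: width * height * (bpp / 8) * fps.
inline double attachment_bytes_per_second(std::uint32_t width,
                                          std::uint32_t height,
                                          std::uint32_t bits_per_pixel,
                                          double        fps)
{
    assert(fps >= 0.0);
    return static_cast<double>(width) * height * (bits_per_pixel / 8.0) * fps;
}

// Unit conversions: decimal megabytes (10^6) and binary mebibytes (2^20).
inline double to_megabytes(double bytes)
{
    return bytes / 1.0e6;
}

inline double to_mebibytes(double bytes)
{
    return bytes / (1024.0 * 1024.0);
}
```

Calling attachment_bytes_per_second(2220, 1080, 32, 61.7) reproduces the figure quoted above for the 32-bit swapchain image.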

Using LOAD_OP_LOAD

Comparing the read bandwidth values, we observe a reduction of 5099.5 - 4453.8 ≈ 645.7 MiB/s when we select LOAD_OP_CLEAR:

Using LOAD_OP_CLEAR

The savings will be lower if the images are compressed; see Enabling AFBC in your Vulkan Application.

Depth attachment store operation

The render-pass also uses a depth attachment. If we needed to read it in a second render-pass, the right operation would be STORE_OP_STORE, because with STORE_OP_DONT_CARE the contents are undefined after the pass and the second render-pass could load the wrong values. The sample does not have a second render-pass, so there is no need to store the depth attachment.

VkAttachmentDescription depth_desc = {};
depth_desc.storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;

It is worth noting that we can create the depth image with the LAZILY_ALLOCATED memory property, which means that physical memory only has to be committed if the image is actually written out to memory.

VmaAllocationCreateInfo depth_alloc = {};
depth_alloc.preferredFlags = VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT;

Using LOAD_OP_CLEAR and STORE_OP_DONT_CARE

In this case the write transactions were reduced by 769.6 - 239.5 ≈ 530 MiB/s, again roughly what we would expect from storing one uncompressed image per frame at ~60 FPS.

Streamline

The Streamline trace gives a more in-depth view of what is going on in the GPU. The difference between LOAD_OP_LOAD and LOAD_OP_CLEAR is visible at 10.4s, after which there are consistently fewer external reads. The difference between STORE_OP_STORE and STORE_OP_DONT_CARE is visible at 18.1s, where the external write graphs drop sharply.

Depth image usage

Beyond setting the DEPTH_STENCIL_ATTACHMENT usage bit on the depth image, we can also set the TRANSIENT_ATTACHMENT bit to tell the driver that the image will only live for the duration of a single render-pass. If the image is then backed by LAZILY_ALLOCATED memory, it may not need physical storage at all.

VkImageCreateInfo depth_info = {VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO};
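// TRANSIENT_ATTACHMENT is set in addition to the depth-stencil usage, not instead of it
depth_info.usage = VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT | VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT;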
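Not every device exposes a lazily allocated memory type, so an application that manages memory by hand usually needs a fallback. The sketch below illustrates one way to prefer LAZILY_ALLOCATED memory and fall back to device-local memory. It is an illustration under stated assumptions rather than the sample's code: the constants are local copies of the VkMemoryPropertyFlagBits values so the snippet builds without vulkan.h, the vector stands in for the propertyFlags of each entry in VkPhysicalDeviceMemoryProperties, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Local copies of VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT and
// VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT (assumption: kept local so the
// sketch compiles without the Vulkan headers).
constexpr std::uint32_t kDeviceLocalBit     = 0x01;
constexpr std::uint32_t kLazilyAllocatedBit = 0x10;

// Returns the index of the first memory type that is allowed by
// allowed_type_bits (as in VkMemoryRequirements::memoryTypeBits) and has all
// wanted_flags set, or -1 if there is none.
inline int find_memory_type(const std::vector<std::uint32_t> &type_flags,
                            std::uint32_t                     allowed_type_bits,
                            std::uint32_t                     wanted_flags)
{
    for (std::size_t i = 0; i < type_flags.size() && i < 32; ++i)
    {
        const bool allowed   = (allowed_type_bits & (1u << i)) != 0;
        const bool has_flags = (type_flags[i] & wanted_flags) == wanted_flags;
        if (allowed && has_flags)
        {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Prefers lazily allocated memory for a transient attachment and falls back
// to device-local memory when no lazily allocated type is available.
inline int pick_transient_memory_type(const std::vector<std::uint32_t> &type_flags,
                                      std::uint32_t                     allowed_type_bits)
{
    const int lazy = find_memory_type(type_flags, allowed_type_bits, kLazilyAllocatedBit);
    return lazy >= 0 ? lazy : find_memory_type(type_flags, allowed_type_bits, kDeviceLocalBit);
}
```

Setting preferredFlags in VmaAllocationCreateInfo, as shown earlier, expresses the same "prefer but do not require" intent when the allocation is delegated to the Vulkan Memory Allocator.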

Best-practice summary

Do

Don’t

Impact

Debugging