This document introduces how to use multisampling efficiently in Vulkan.
Multisampling in Vulkan with resolve attachments
Figure: the same quad, rendered without multisampling.
Note: The source for this sample can be found in samples/multisampling in the SDK.
Introduction
In this sample, we will look at how to implement multisampled anti-aliasing (MSAA) efficiently on Mali GPUs. There are two main approaches to choose from, and one of them is dramatically better than the other.
We will base the sample on Rotating Texture so we can focus on the differences from rendering without MSAA to rendering with MSAA.
Rendering to Multisampled Texture, Resolving Later (slow)
The traditional way of doing multisampling is to first create a multisampled texture, render to it, then perform an explicit "resolve" step. This is highly inefficient. To implement this, the GPU needs to write a full 4xMSAA buffer out to main memory, which is four times the size of a regular texture, then read it back from memory in order to resolve the final pixel values.
vkCmdResolveImage(cmd, srcImage, dstImage, ...);
It is highly recommended to avoid this path on Mali.
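To make the cost of the explicit resolve concrete, the sketch below estimates the main-memory traffic per frame for one color target. It is a back-of-the-envelope model only: the function and struct names are hypothetical, and it ignores caches, framebuffer compression, and depth/stencil traffic, so real numbers on a given device will differ.

```cpp
#include <cstdint>

// Estimated bytes moved to and from main memory per frame for one color
// target. Illustrative arithmetic only, not a measurement.
struct FrameTraffic
{
    std::uint64_t written;
    std::uint64_t read;
};

// Explicit resolve: the full multisampled image is written out, read back
// for the resolve, and the resolved image is then written as well.
inline FrameTraffic explicitResolveTraffic(std::uint64_t width, std::uint64_t height,
                                           std::uint64_t bytesPerPixel, std::uint64_t samples)
{
    const std::uint64_t resolved = width * height * bytesPerPixel;
    const std::uint64_t multisampled = resolved * samples;
    return { multisampled + resolved, multisampled };
}

// Resolve attachment with a transient multisampled target: in the ideal
// case only the resolved image ever leaves the tile.
inline FrameTraffic onTileResolveTraffic(std::uint64_t width, std::uint64_t height,
                                         std::uint64_t bytesPerPixel)
{
    return { width * height * bytesPerPixel, 0 };
}
```

Under these assumptions, a 1920x1080 target with 4 bytes per pixel at 4xMSAA moves roughly 74.6 MB per frame with an explicit resolve (41,472,000 bytes written plus 33,177,600 bytes read), against roughly 8.3 MB when only the resolved image is written.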
Resolving a transient multisampled texture to non-multisampled texture (optimal)
Vulkan exposes a fast path which takes full advantage of tiled architectures. On Mali, we can obtain 4xMSAA practically "free" (typically 1-2 % speed hit) by making use of resolve attachments in Vulkan.
Setting up the VkRenderPass
For multisampled rendering, we need to change how we set up our render pass. We will need two attachments: one multisampled texture that we render to, and one single-sampled texture that receives the resolved result.
VkAttachmentDescription attachments[2] = { { 0 } };
attachments[0].format = format;
attachments[0].samples = VK_SAMPLE_COUNT_4_BIT;
attachments[0].loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
attachments[0].storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
attachments[0].stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
attachments[0].stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
attachments[0].initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
attachments[0].finalLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
attachments[1].format = format;
attachments[1].samples = VK_SAMPLE_COUNT_1_BIT;
attachments[1].loadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
attachments[1].storeOp = VK_ATTACHMENT_STORE_OP_STORE;
attachments[1].stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
attachments[1].stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
attachments[1].initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
attachments[1].finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
It is critically important that we set up the storeOp correctly for the multisampled attachment. After the resolve, there is no reason to keep the multisampled data, so we set it to STORE_OP_DONT_CARE. This allows the driver to keep the multisampled buffer on-tile only, instead of writing it to main memory.
Now we specify our subpass. We only have one subpass, but it references two attachments: one color buffer and one resolve buffer.
VkAttachmentReference colorRef = { 0, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };
VkAttachmentReference resolveRef = { 1, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };
VkSubpassDescription subpass = { 0 };
subpass.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;
subpass.colorAttachmentCount = 1;
subpass.pColorAttachments = &colorRef;
subpass.pResolveAttachments = &resolveRef;
VkRenderPassCreateInfo rpInfo = { VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO };
rpInfo.attachmentCount = 2;
rpInfo.pAttachments = attachments;
rpInfo.subpassCount = 1;
rpInfo.pSubpasses = &subpass;
VK_CHECK(vkCreateRenderPass(pContext->getDevice(), &rpInfo, nullptr, &renderPass));
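The pairing of a color attachment with a resolve attachment follows a few rules in the Vulkan specification: the color attachment must be multisampled, the resolve attachment must be single-sampled, and both must share the same format. The check below is a minimal sketch of those rules. AttachmentSummary and isValidResolvePair are hypothetical stand-ins written for illustration so the example compiles without the Vulkan headers; they are not part of Vulkan or the SDK.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two VkAttachmentDescription fields that
// matter for a resolve: format and sample count.
struct AttachmentSummary
{
    std::uint32_t format;  // stands in for VkFormat; any consistent value works here
    std::uint32_t samples; // 1, 2, 4, ... (same numeric values as VK_SAMPLE_COUNT_*_BIT)
};

// Returns true when `color` can be resolved into `resolve` within a subpass:
// the source is multisampled, the destination is single-sampled, and the
// formats match.
inline bool isValidResolvePair(const AttachmentSummary &color, const AttachmentSummary &resolve)
{
    return color.samples > 1 && resolve.samples == 1 && color.format == resolve.format;
}
```

For the render pass above, attachment 0 (4 samples) and attachment 1 (1 sample) use the same format, so the pair passes this check.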
Setting up the VkPipeline
In the VkPipeline, there aren't many changes. We need to specify that we are rendering with 4x multisampling.
VkPipelineMultisampleStateCreateInfo multisample = { VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO };
multisample.rasterizationSamples = VK_SAMPLE_COUNT_4_BIT;
multisample.sampleShadingEnable = VK_FALSE;
multisample.alphaToCoverageEnable = VK_FALSE;
multisample.alphaToOneEnable = VK_FALSE;
Setting up the VkFramebuffers
When creating the framebuffers, we just have to include our multisampled texture. Note that multisampledRenderTarget comes first, since we specified in the VkRenderPass that attachment 0 is the multisampled one.
VkFramebufferCreateInfo fbInfo = { VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO };
fbInfo.renderPass = renderPass;
fbInfo.attachmentCount = 2;
const VkImageView attachments[] = { multisampledRenderTarget.view, backbuffer.view };
fbInfo.pAttachments = attachments;
fbInfo.width = width;
fbInfo.height = height;
fbInfo.layers = 1;
VK_CHECK(vkCreateFramebuffer(device, &fbInfo, nullptr, &backbuffer.framebuffer));
Creating a Transient, Lazily Allocated Texture
We know that the multisampled texture never actually needs to be written out to main memory. It only lives as a temporary entity on-tile while the render pass executes.
We can express this by using TRANSIENT_ATTACHMENT_BIT.
VkImageCreateInfo info = { VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO };
info.imageType = VK_IMAGE_TYPE_2D;
info.format = format;
info.extent.width = width;
info.extent.height = height;
info.extent.depth = 1;
info.mipLevels = 1;
info.arrayLayers = 1;
info.samples = VK_SAMPLE_COUNT_4_BIT;
info.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
info.tiling = VK_IMAGE_TILING_OPTIMAL;
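// A transient image must also declare the attachment usage it serves.
info.usage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT | VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT;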
info.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
VkImage image;
VkDeviceMemory memory;
VK_CHECK(vkCreateImage(device, &info, nullptr, &image));
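When we allocate memory for this texture, we can choose a lazy allocation, which only actually allocates memory for the texture when it is written to (which, here, is never).
// Query the size and the allowed memory types for the image.
VkMemoryRequirements memReqs;
vkGetImageMemoryRequirements(device, image, &memReqs);
VkMemoryAllocateInfo alloc = { VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO };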
alloc.allocationSize = memReqs.size;
alloc.memoryTypeIndex =
findMemoryTypeFromRequirementsWithFallback(memReqs.memoryTypeBits, VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT);
VK_CHECK(vkAllocateMemory(device, &alloc, nullptr, &memory));
VK_CHECK(vkBindImageMemory(device, image, memory, 0));
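The helper findMemoryTypeFromRequirementsWithFallback is not shown in this document. The sketch below illustrates one way such a lookup could work, assuming the fallback is simply "any memory type the image allows" when no lazily allocated type exists. The function names and the plain vector of property flags are hypothetical stand-ins for VkPhysicalDeviceMemoryProperties, chosen so the example compiles without the Vulkan headers; this is not the SDK's implementation.

```cpp
#include <cstdint>
#include <vector>

// Same numeric value as VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT.
constexpr std::uint32_t kLazilyAllocatedBit = 0x10;

// Sentinel returned when no memory type matches (hypothetical convention).
constexpr std::uint32_t kNoMemoryType = UINT32_MAX;

// typeFlags[i] stands in for memoryTypes[i].propertyFlags.
// typeBits stands in for VkMemoryRequirements::memoryTypeBits.
// Returns the first allowed type whose flags contain every bit in `required`.
inline std::uint32_t findMemoryType(const std::vector<std::uint32_t> &typeFlags,
                                    std::uint32_t typeBits, std::uint32_t required)
{
    for (std::uint32_t i = 0; i < typeFlags.size() && i < 32; i++)
    {
        const bool allowed = (typeBits & (1u << i)) != 0;
        if (allowed && (typeFlags[i] & required) == required)
            return i;
    }
    return kNoMemoryType;
}

// Tries the preferred properties first, then falls back to any allowed type.
inline std::uint32_t findMemoryTypeWithFallback(const std::vector<std::uint32_t> &typeFlags,
                                                std::uint32_t typeBits, std::uint32_t preferred)
{
    const std::uint32_t index = findMemoryType(typeFlags, typeBits, preferred);
    if (index != kNoMemoryType)
        return index;
    return findMemoryType(typeFlags, typeBits, 0);
}
```

The fallback matters because not every implementation exposes a lazily allocated memory type. With a fallback, the same code still runs on such devices, at the cost of the multisampled image being backed by real memory.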