Vulkan requires the application to manage image layouts, so that all render pass
attachments are in the correct layout when the render pass begins.
This is usually done using pipeline barriers or the initialLayout and finalLayout
parameters of the render pass.
If the rendering pipeline is complex, transitioning each image to its correct layout
is not trivial, as it requires some sort of state tracking.
If previous image contents are not needed, there is an easy way out: setting oldLayout to UNDEFINED in the pipeline barrier.
While this is functionally correct, it can have performance implications as it may
prevent the GPU from performing some optimizations.
This tutorial will cover an example of such optimizations and how to avoid the performance overhead from using sub-optimal layouts.
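The state tracking mentioned above can be sketched as a small helper that remembers the last known layout of each image and hands it back as the oldLayout for the next barrier. This is only a minimal illustration, not the sample's implementation: `ImageLayout`, `ImageId`, `Transition` and `LayoutTracker` are hypothetical names, and the enum is a stand-in for `VkImageLayout` so that the sketch builds without the Vulkan SDK.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Stand-in for VkImageLayout (assumption: only a few layouts are modeled).
enum class ImageLayout {
    Undefined,
    ColorAttachmentOptimal,
    ShaderReadOnlyOptimal,
    TransferDstOptimal
};

// Hypothetical image key; a real application would likely key on VkImage.
using ImageId = std::uint64_t;

// The pair of layouts that would be written into an image memory barrier.
struct Transition {
    ImageLayout oldLayout;
    ImageLayout newLayout;
};

class LayoutTracker {
public:
    // Returns the transition to record and remembers newLayout for next time.
    // An image that has never been seen is reported as Undefined, since it
    // has no previous contents to preserve.
    Transition transition(ImageId image, ImageLayout newLayout) {
        // Vulkan does not allow transitioning *to* the undefined layout.
        assert(newLayout != ImageLayout::Undefined);

        auto it = layouts_.find(image);
        const ImageLayout oldLayout =
            (it == layouts_.end()) ? ImageLayout::Undefined : it->second;

        layouts_[image] = newLayout;
        return {oldLayout, newLayout};
    }

private:
    std::unordered_map<ImageId, ImageLayout> layouts_;
};
```

With a helper like this, only the first transition of an image starts from the undefined layout; every later barrier can use the tracked layout instead.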
Mali GPUs feature transaction elimination, a technology that saves framebuffer write bandwidth for static regions of the framebuffer. This is especially beneficial for games that contain many static opaque overlays.
Transaction elimination is used for an image under the following conditions:
The driver keeps a signature buffer for the image to check for redundant frame buffer writes. The signature buffer must always be in sync with the actual contents of the image, which is the case when an image is only used within the tile write path. In practice, this corresponds to only using layouts that are either read-only or can only be written to by fragment shading. These “safe” layouts are:
All other layouts, including the UNDEFINED layout, are considered “unsafe” as they allow writes
to an image outside the tile write path.
When an image is transitioned via an “unsafe” layout, the signature buffer must be invalidated
to prevent the signature and the data from becoming desynchronized.
Note that the swapchain image is a slightly special case, as it is considered “safe” even
when transitioned from UNDEFINED.
In addition, signature invalidation could happen as part of a pipeline barrier or
vkCmdWaitEvents(), or as part of a render pass
if the color attachment reference layout is different from the final layout.
The vkCmdBlitImage() framebuffer transfer stage operation will also always invalidate
the signature buffer, so shader-based blits will likely be more efficient.
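The “safe” versus “unsafe” rule described above can be modeled with a small predicate. This is a hedged sketch of the reasoning only, not driver behavior: `ImageLayout`, `isSafeLayout` and `invalidatesSignature` are hypothetical names, the enum is a stand-in for `VkImageLayout`, and the set of “safe” layouts is an assumption limited to two layouts that match the criterion in the text (read-only, or written only by fragment shading). The swapchain special case is deliberately left out.

```cpp
#include <cassert>

// Stand-in for VkImageLayout so the sketch builds without the Vulkan SDK.
enum class ImageLayout {
    Undefined,
    General,
    ColorAttachmentOptimal,
    ShaderReadOnlyOptimal,
    TransferDstOptimal
};

// ASSUMPTION: illustrative subset of "safe" layouts, chosen from the
// criterion in the text rather than from an authoritative list.
bool isSafeLayout(ImageLayout layout) {
    switch (layout) {
        case ImageLayout::ColorAttachmentOptimal:  // written only by fragment shading
        case ImageLayout::ShaderReadOnlyOptimal:   // read-only
            return true;
        default:                                   // may be written outside the tile write path
            return false;
    }
}

// Models whether a transition passes through an "unsafe" layout, in which
// case the signature buffer would have to be invalidated.
bool invalidatesSignature(ImageLayout oldLayout, ImageLayout newLayout) {
    return !isSafeLayout(oldLayout) || !isSafeLayout(newLayout);
}
```

Under this model, a G-buffer image that moves between the color attachment and shader read-only layouts keeps its signature, while the same image transitioned from the undefined layout does not.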
The sample sets up deferred rendering using two render passes, to show the effect of transitioning
G-buffer images from
UNDEFINED rather than their last known layout.
Note that a deferred rendering implementation using subpasses might be more efficient overall; see the subpasses tutorial for more detail.
The base case is with all color images being transitioned from
UNDEFINED, as shown in the image below.
When we switch to using the last known layout as
oldLayout in the pipeline barriers, transaction elimination
can take place.
This is highlighted in the counters, which show about double the number of tiles killed by CRC match,
along with a ~10% reduction in write bandwidth.
A reduction in memory bandwidth will reduce the power consumption of the device, resulting in less overheating and longer battery life. Additionally, this may improve performance in games that are bandwidth-limited.
Use the COLOR_ATTACHMENT_OPTIMAL image layout for color attachments.
Use storeOp = DONT_CARE rather than UNDEFINED layouts to skip unneeded render target writes.
Avoid using vkCmdBlitImage() to copy constant data between two images; shader-based blits are likely to be more efficient as they will preserve the signature integrity.