What you will get from this page: Graphics optimization tips to ensure your console games run fast. These optimizations were applied to an especially difficult scene to ensure smooth 30 fps (frames per second) performance. Thanks to Rob Thompson, a console graphics developer at Unity who presented these at Unite, for the tips.
10. Utilize Async Compute
Async Compute is a way to fill periods of underutilization on the GPU with useful compute shader work. It's supported on PS4 and became available on Xbox One with the 2019 cycle. It's accessible through Unity's Command Buffer interface and is meant mainly, though not exclusively, for use with the SRP. Code examples are available in the BOTD assets and the HDRP sources.
The depth-only phase, which is what you're doing with shadow mapping, is traditionally a point where you're not making full use of the GPU's potential. Async Compute lets you move compute shader work to run in parallel with your graphics queue, so that it uses resources the graphics queue is leaving idle.
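To make the idea of filling idle capacity concrete, here is a toy timing model in Python. It is only a sketch under strong assumptions: the function name, the notion of a single "utilization" fraction, and the assumption that compute work can absorb all spare capacity perfectly are illustrative and do not describe how any real GPU scheduler behaves.

```python
def frame_time_ms(shadow_ms, shadow_utilization, compute_ms, use_async):
    """Toy estimate of the combined cost of a shadow pass and a compute job.

    shadow_ms:          duration of the depth-only shadow pass (graphics queue).
    shadow_utilization: assumed fraction (0..1) of the GPU that pass keeps busy.
    compute_ms:         duration of the compute job with the GPU to itself.
    use_async:          if True, the compute job overlaps the shadow pass.
    """
    if not 0.0 <= shadow_utilization <= 1.0:
        raise ValueError("shadow_utilization must be between 0 and 1")
    if not use_async:
        # Serial case: the compute job waits for the shadow pass to finish.
        return shadow_ms + compute_ms
    # Idealised assumption: compute work soaks up all idle capacity.
    spare_ms = (1.0 - shadow_utilization) * shadow_ms
    leftover_ms = max(0.0, compute_ms - spare_ms)
    return shadow_ms + leftover_ms


serial = frame_time_ms(4.0, 0.5, 1.5, use_async=False)
overlapped = frame_time_ms(4.0, 0.5, 1.5, use_async=True)
print(serial, overlapped)
```

In this model the saving is capped by the spare capacity of the shadow pass, which is why the technique pays off most when the overlapped graphics work leaves much of the GPU idle. Real gains need to be measured with a profiler.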
BOTD uses Async Compute for its tiled light list gather, which is part of the deferred lighting, most of which is done with compute shaders on console in HDRP. It also uses it for its SSAO calculations. Both of these overlap with the shadow map rendering to fill in the gaps in the wavefront utilization.
For a run-through of some conceptual code where Async Compute is employed, tune into Rob’s Unite session at 35:30.
The Built-in Render Pipeline, the URP, and the High Definition Render Pipeline (HDRP) also use the Deferred Shading rendering path. In Deferred Shading, lighting is not calculated per object.
Instead, Deferred Shading postpones heavy rendering, such as lighting, to a later stage and uses two passes. In the first pass, also called the G-buffer geometry pass, Unity renders the GameObjects. This pass retrieves several types of geometric properties and stores them in a set of textures.
G-buffer textures can include:
- Diffuse and Specular colors
- Surface smoothness
- World Space normals
- Emission + ambient + reflections + lightmaps
In the second pass, or lighting pass, Unity renders the scene’s lighting based on the G-buffer. Imagine iterating over each pixel and calculating the lighting information based on the buffer instead of the individual objects. Adding more non-shadow-casting lights in Deferred Shading does not incur the same performance hit as with Forward rendering.
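The two passes described above can be sketched in a few lines of Python. This is a conceptual toy, not Unity's implementation: the one-dimensional "screen", the dictionary layout of objects and lights, the single-channel albedo, and the simple Lambert term are all assumptions chosen to keep the example short.

```python
def geometry_pass(objects, width):
    """Pass 1: write each object's surface data into per-pixel G-buffer slots."""
    gbuffer = [None] * width
    for obj in objects:
        for x in obj["pixels"]:
            # No depth test in this toy: later objects simply overwrite earlier ones.
            gbuffer[x] = {"albedo": obj["albedo"], "normal": obj["normal"]}
    return gbuffer


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def lighting_pass(gbuffer, lights):
    """Pass 2: shade every covered pixel from the G-buffer alone."""
    image = []
    for texel in gbuffer:
        if texel is None:
            image.append(0.0)  # nothing was rendered to this pixel
            continue
        total = 0.0
        for light in lights:
            # "direction" is the unit vector from the surface toward the light.
            n_dot_l = max(0.0, dot(texel["normal"], light["direction"]))
            total += texel["albedo"] * light["intensity"] * n_dot_l
        image.append(total)
    return image


objects = [{"pixels": [0, 1], "albedo": 0.5, "normal": (0, 0, 1)}]
lights = [
    {"direction": (0, 0, 1), "intensity": 1.0},
    {"direction": (0, 0, -1), "intensity": 2.0},
]
image = lighting_pass(geometry_pass(objects, width=3), lights)
```

Note that `lighting_pass` never looks at `objects`. Its cost grows with the number of covered pixels and lights rather than with the number of objects, which is the property the text describes.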
While choosing a rendering path is not exactly an optimization per se, it can affect how you optimize your project. The other techniques and workflows in this section vary depending on what render pipeline and path you select.
Both HDRP and URP support Shader Graph, a node-based visual interface for shader creation. It allows users without experience in shader programming to create complex shading effects.
Over 150 nodes are currently available in Shader Graph. Additionally, you can make your own custom nodes with the API.
Each shader in a Shader Graph begins with a Master Node, which determines the graph’s output. Construct the shader logic by adding and connecting nodes and operators within the visual interface.
The Shader Graph then passes into the render pipeline’s backend. The final result is a ShaderLab shader that’s functionally similar to one written in HLSL or Cg.
Optimizing a Shader Graph follows many of the same rules that apply to traditional HLSL or Cg shaders; an important one being that the more processing your Shader Graph does, the more it will impact the performance of your application.
If you are CPU-bound, optimizing your shaders won’t improve frame rate, but it might improve your battery life for mobile platforms.
If you are GPU-bound, follow these guidelines for improving performance with Shader Graph:
- Remove unused nodes: Don’t change any defaults or connect nodes unless those changes are necessary. Shader Graph compiles out any unused features automatically. When possible, bake values into textures. For example, instead of using a node to brighten a texture, apply the extra brightness into the texture asset itself.
- Use a smaller data format when possible: Consider using Vector2 instead of Vector3, or reducing precision (e.g., half instead of float), if your project allows for it.
- Reduce math operations: Shader operations run many times per second, so try to optimize math operators when possible. Aim to blend results instead of creating a logical branch. Use constants and combine scalar values before applying vectors. Finally, convert any properties that do not need to appear in the Inspector to inline nodes. All of these incremental enhancements can help your frame budget.
- Branch a preview: As your graph gets larger, it can become slower to compile. Simplify your workflow with a separate, smaller branch containing only the operations you want to preview at the moment. Then, iterate more quickly on this smaller branch until you achieve the desired results. If the branch is not connected to the Master Node, you can safely leave the preview branch in your graph. Unity removes nodes that do not affect the final output during compilation.
- Manually optimize: Even experienced graphics programmers can still use a Shader Graph to lay down some boilerplate code for a script-based shader. Select the Shader Graph asset, then select Copy Shader from the Context menu. Create a new HLSL/Cg shader and then paste in the copied Shader Graph. This is a one-way operation, but it lets you squeeze additional performance with manual optimizations.
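The advice in the list above to blend results instead of branching can be illustrated with the arithmetic alone. The sketch below uses Python to show that a `step` mask fed into a `lerp` gives the same result as an `if`. The function names and the rock/snow example are hypothetical, and in a real graph you would use the equivalent Step and Lerp nodes rather than this code.

```python
def lerp(a, b, t):
    """Linear blend: returns a when t == 0 and b when t == 1."""
    return a * (1.0 - t) + b * t


def step(edge, x):
    """Mask that is 1.0 when x >= edge, otherwise 0.0."""
    return float(x >= edge)


def shade_branching(height, snow_line, rock, snow):
    # Branching version: picks one of two values with a conditional.
    if height >= snow_line:
        return snow
    return rock


def shade_blended(height, snow_line, rock, snow):
    # Blended version: same result, expressed as pure arithmetic on a mask.
    return lerp(rock, snow, step(snow_line, height))


for h in (0.0, 0.49, 0.5, 0.8):
    print(h, shade_branching(h, 0.5, 0.25, 1.0), shade_blended(h, 0.5, 0.25, 1.0))
```

Whether the blended form is actually faster depends on the target GPU and on how expensive both inputs are to compute, so treat it as something to profile rather than a guaranteed win.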
Remove built-in shader settings
Remove every unused shader from the list called Always Included Shaders, which is found in the Graphics settings (Edit > Project Settings > Graphics). Add any shaders here that are needed for the lifetime of the application.
Strip shader variants
Use the Shader compilation pragma directives to adapt the compiling of a shader to each target platform. Then use a shader keyword (or Shader Graph keyword node) to create shader variants with certain features enabled or disabled.
Shader variants can be useful for platform-specific features, but also increase build times and file size. You can prevent shader variants from being included in your build if you know that they’re not required.
First, parse the Editor.log for shader timing and size. Then locate the lines that begin with Compiled shader and Compressed shader.
This example log shows the following statistics:
Compiled shader 'TEST Standard (Specular setup)' in 31.23s
d3d9 (total internal programs: 482, unique: 474)
d3d11 (total internal programs: 482, unique: 466)
metal (total internal programs: 482, unique: 480)
glcore (total internal programs: 482, unique: 454)
Compressed shader 'TEST Standard (Specular setup)' on d3d9 from 1.04MB to 0.14MB
Compressed shader 'TEST Standard (Specular setup)' on d3d11 from 1.39MB to 0.12MB
Compressed shader 'TEST Standard (Specular setup)' on metal from 2.56MB to 0.20MB
Compressed shader 'TEST Standard (Specular setup)' on glcore from 2.04MB to 0.15MB
These stats tell you a few things about the shader:
- It expands into 482 variants due to the #pragma multi_compile and shader_feature directives.
- Unity compresses the shader included in the game data to roughly the sum of the compressed sizes: 0.14+0.12+0.20+0.15 = 0.61 MB.
- At runtime, Unity keeps the compressed data in memory (0.61 MB), while the data for your currently used graphics API remains uncompressed. For instance, if your current API is Metal, that would account for 2.56 MB.
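As a rough sketch of how the figures above could be extracted automatically, the Python below parses lines shaped like the sample log and reproduces the 0.61 MB total. The regular expressions are derived only from the sample lines shown on this page, so they are an assumption about the log format, and adding the compressed total to the active API's uncompressed size is one reading of the runtime-memory point above rather than a documented formula.

```python
import re

PROGRAMS = re.compile(
    r"^(?P<api>\w+) \(total internal programs: (?P<total>\d+), unique: (?P<unique>\d+)\)$"
)
COMPRESSED = re.compile(
    r"^Compressed shader '(?P<name>.+)' on (?P<api>\w+) "
    r"from (?P<raw>[\d.]+)MB to (?P<packed>[\d.]+)MB$"
)

SAMPLE_LOG = """\
Compiled shader 'TEST Standard (Specular setup)' in 31.23s
d3d9 (total internal programs: 482, unique: 474)
d3d11 (total internal programs: 482, unique: 466)
metal (total internal programs: 482, unique: 480)
glcore (total internal programs: 482, unique: 454)
Compressed shader 'TEST Standard (Specular setup)' on d3d9 from 1.04MB to 0.14MB
Compressed shader 'TEST Standard (Specular setup)' on d3d11 from 1.39MB to 0.12MB
Compressed shader 'TEST Standard (Specular setup)' on metal from 2.56MB to 0.20MB
Compressed shader 'TEST Standard (Specular setup)' on glcore from 2.04MB to 0.15MB
""".splitlines()


def parse_shader_stats(lines):
    """Collect per-API variant counts and sizes from log lines."""
    stats = {"variants": {}, "raw_mb": {}, "packed_mb": {}}
    for line in lines:
        line = line.strip()
        m = PROGRAMS.match(line)
        if m:
            stats["variants"][m["api"]] = int(m["total"])
            continue
        m = COMPRESSED.match(line)
        if m:
            stats["raw_mb"][m["api"]] = float(m["raw"])
            stats["packed_mb"][m["api"]] = float(m["packed"])
    return stats


def runtime_memory_mb(stats, active_api):
    """Compressed data for every API plus uncompressed data for the active one."""
    return sum(stats["packed_mb"].values()) + stats["raw_mb"][active_api]


stats = parse_shader_stats(SAMPLE_LOG)
print(round(sum(stats["packed_mb"].values()), 2))  # compressed total in MB
print(round(runtime_memory_mb(stats, "metal"), 2))  # estimate with Metal active
```

Running the same functions over a full Editor.log would let you rank shaders by variant count or size before deciding what to strip, provided your Unity version writes lines in this shape.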
After a build, the Project Auditor (experimental) can parse the Editor.log to display a list of all shaders, shader keywords, and shader variants compiled into the project. It can also analyze the Player.log after the game is run. This shows you what variants the application actually compiled and used at runtime.
Employ this information to build a scriptable shader stripping system and reduce the number of variants. This can improve build times, build sizes, and runtime memory usage.
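The core decision in such a stripping system is a set difference: variants that were compiled but never observed at runtime are candidates for removal. The Python below sketches only that logic. Representing a variant as a shader name plus a set of keywords, the function name, and the `always_keep` safety list are all hypothetical; in a real project the decision would run inside Unity's build-time shader processing rather than in a standalone script.

```python
def variants_to_strip(compiled, used, always_keep=frozenset()):
    """Return compiled variants never seen at runtime and not explicitly protected."""
    return {v for v in compiled if v not in used and v not in always_keep}


def variant(shader, *keywords):
    """Hypothetical variant identity: shader name plus an unordered keyword set."""
    return (shader, frozenset(keywords))


compiled = {
    variant("Standard"),
    variant("Standard", "FOG_LINEAR"),
    variant("Standard", "FOG_LINEAR", "SHADOWS_SOFT"),
    variant("Standard", "SHADOWS_SOFT"),
}
used = {
    variant("Standard"),
    variant("Standard", "SHADOWS_SOFT"),
}
# Protect a variant that only appears in a rarely visited level.
keep = {variant("Standard", "FOG_LINEAR")}

strip = variants_to_strip(compiled, used, always_keep=keep)
print(len(strip))
```

A protected list matters because a play session only records the variants it happened to exercise. Stripping a variant that is needed later can produce missing or incorrectly rendered materials at runtime.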
Read the Stripping scriptable shader variants article to see this process in detail.
Smooth edges with anti-aliasing
Anti-aliasing contributes to smoother image quality by reducing jagged edges and minimizing Specular aliasing.
If you are using Forward rendering with the Built-in Render Pipeline, Multisample Anti-aliasing (MSAA) is available in the Quality settings. MSAA produces high-quality anti-aliasing, but can be expensive. The setting called MSAA Sample Count from the drop-down menu defines how many samples the renderer uses to evaluate the effect (None, 2X, 4X, 8X). If you are using Forward rendering with URP or HDRP, you can enable MSAA on the URP Asset or HDRP Asset respectively.
Alternatively, you can add anti-aliasing as a post-processing effect. This appears on the Camera component (under Anti-aliasing) with a couple of options:
- Fast Approximate Anti-aliasing (FXAA) smoothes the edges on a per-pixel level. This is the least resource-intensive type of anti-aliasing. It slightly blurs the final image.
- Subpixel Morphological Anti-aliasing (SMAA) blends pixels based on the borders of an image. It offers much sharper results than FXAA and is suited for flat, cartoon-like, or clean art styles.
In HDRP, you can also use FXAA and SMAA in the Post-processing Anti-aliasing on the Camera with an additional option:
- Temporal Anti-aliasing (TAA) smoothes edges using frames from the history buffer. It generally produces higher-quality results than FXAA and can also improve Ambient Occlusion and Volumetrics, but it requires motion vectors and extra resources, and can produce occasional ghosting artifacts.
The quickest option to create lighting is one that doesn’t need to be computed per frame. Use Lightmapping to bake static lighting just once, instead of calculating it in real-time.
Add dramatic lighting to your static geometry using Global Illumination (GI). Check the Contribute GI option for objects to store high-quality lighting in the form of lightmaps.
The process of generating a lightmapped environment takes longer than just placing a light in the scene, but it provides key benefits such as:
- Running two to three times faster for two per-pixel lights
- Improved visuals via Global Illumination, which can calculate realistic-looking, direct and indirect lighting, while the lightmapper smoothes and denoises the resulting map
- Baked shadows and lighting render without the performance hit that typically results from real-time lighting and shadows
More complex scenes can require long bake times. If your hardware supports the Progressive GPU Lightmapper (in preview), this option can dramatically speed up your lightmap generation by using the GPU instead of the CPU.
Follow this guide to get started with Lightmapping in Unity.
Minimize Reflection Probes
While Reflection Probes can create realistic reflections, they can be costly in terms of batches. As such, try these optimization tips to minimize impact on performance:
- Use low-resolution cubemaps, culling masks, and texture compression to improve runtime performance.
- Use Type: Baked to avoid per-frame updates.
- If the use of Type: Real-time is necessary in URP, try to avoid Every Frame whenever possible. Adjust the Refresh Mode and Time Slicing settings to reduce the update rate. You can also control the refresh with the Via Scripting option and render the probe from a custom script.
- If the use of Type: Real-time is necessary in HDRP, select the On Demand mode. You can also modify the Frame Settings in Project Settings > HDRP Default Settings.
- Reduce the quality and features under Real-time Reflection for improved performance.
Shadow casting can be disabled per Mesh Renderer and light. Disable shadows whenever possible to reduce draw calls. You can also create fake shadows using a blurred texture applied to a simple mesh or quad underneath your characters. Otherwise, you can create blob shadows with custom shaders.
In particular, avoid enabling shadows for Point Lights. Each shadow-casting Point Light requires six shadow map passes; compare that to a single shadow map pass for a Spot Light. Consider replacing Point Lights with Spot Lights where dynamic shadows are absolutely necessary. If you can avoid dynamic shadows, use a cubemap as a Light.cookie with your Point Lights instead.
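The pass counts quoted above make it easy to estimate what a lighting setup costs before you build it. The helper below is a back-of-the-envelope sketch. The function name and the string labels are hypothetical, and it only counts the passes mentioned in the text, ignoring other light types, shadow cascades, and caching.

```python
# Shadow map passes per shadow-casting light, as stated in the text.
PASSES_PER_LIGHT = {"point": 6, "spot": 1}


def shadow_map_passes(shadow_casting_lights):
    """Total shadow map passes for a list of light kinds ("point" or "spot")."""
    return sum(PASSES_PER_LIGHT[kind] for kind in shadow_casting_lights)


before = shadow_map_passes(["point", "point", "spot"])
after = shadow_map_passes(["spot", "spot", "spot"])
print(before, after)
```

Swapping the two Point Lights for Spot Lights in this example removes ten of the thirteen passes, which is why the replacement is worth considering wherever the narrower cone still lights the area you need.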
Substitute a shader effect
In some cases, you can apply simple tricks rather than adding multiple extra lights. For example, instead of creating a light that shines straight into the Camera to give a Rim lighting effect, use a shader to simulate Rim Lighting (see Surface Shader examples for an implementation of this in HLSL).
Use Light Layers
For complex scenes with many lights, separate your objects using layers, then confine each light’s influence to a specific Culling Mask.
Use Light Probes
Light Probes store baked lighting information about the empty space in your scene, while providing high-quality lighting (both direct and indirect). They use spherical harmonics, which calculate quickly compared to dynamic lights. This is especially useful for moving objects, which normally cannot receive Baked Lightmapping.
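To show why spherical harmonics evaluate quickly, here is a minimal Python sketch that reconstructs lighting for one direction from four coefficients (the first two bands). It is an illustration under assumptions: the coefficient ordering and sign convention are chosen for this example, it handles a single color channel, and production probes typically store more coefficients per channel.

```python
import math

# Real spherical harmonics basis constants for bands 0 and 1.
Y00 = 0.5 * math.sqrt(1.0 / math.pi)
Y1 = math.sqrt(3.0 / (4.0 * math.pi))


def eval_sh_l1(coeffs, direction):
    """Evaluate a 4-coefficient SH expansion for a unit direction (x, y, z).

    coeffs is (constant term, y term, z term, x term); this ordering is an
    assumption of the sketch, not a statement about any engine's layout.
    """
    x, y, z = direction
    c0, c1, c2, c3 = coeffs
    return c0 * Y00 + Y1 * (c1 * y + c2 * z + c3 * x)


# A probe with some ambient light plus extra light arriving from +z ("above").
probe = (1.0, 0.0, 0.5, 0.0)
up = eval_sh_l1(probe, (0.0, 0.0, 1.0))
down = eval_sh_l1(probe, (0.0, 0.0, -1.0))
print(up > down)
```

The whole evaluation is a handful of multiplies and adds per direction, with no loop over lights, which is what makes it cheap enough to run for every moving object.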
Light Probes can apply to static meshes as well. In the Mesh Renderer component, locate the Receive Global Illumination drop-down menu and toggle it from Lightmaps to Light Probes.
Continue using Lightmapping for your prominent level geometry, but switch to Light Probes for lighting smaller details. Light Probe illumination does not require proper UVs, saving you the extra step of unwrapping your meshes. Probes also reduce disk space since they don’t generate Lightmap textures.