Last updated January 2020. 15 min read.

10 tips for optimizing console game graphics

What you will get from this page: Graphics optimization tips to ensure your console games run fast. These optimizations were made to an especially difficult scene to ensure smooth 30 fps (frames per second) performance. Thanks to Rob Thompson, a console graphics developer at Unity (who presented these at Unite), for the tips.

Image: Unity - Book of the Dead - graphics optimization

A focus on GPU optimization

The Book of the Dead (BOTD) was produced by Unity's demo team. It's a real-time rendered animated short that showcases the visual quality possible with the High-Definition Render Pipeline (HDRP).

The HDRP is a high-fidelity Scriptable Render Pipeline built by Unity to target modern (compute shader-capable) platforms. The HDRP utilizes physically based lighting techniques, linear lighting, HDR lighting, and a configurable hybrid tile/cluster deferred/forward lighting architecture.

All of the assets and script code for BOTD are available for you in the Asset Store.

The objective with the demo was to offer an interactive experience where people could wander around inside the environment, get their hands on it, and experience it in a way familiar from a traditional AAA game. Specifically, we wanted to show BOTD running on Xbox One and PS4, with a performance requirement of 1080p at 30 fps or better.

As it’s a demo, and not a full game, the main focus for optimizations was on the rendering. 

Generally, performance for BOTD is fairly consistent as it doesn’t have any scenes with thousands of particles suddenly spawning into life, for example, or loads of animated characters appearing. 

Rob and the demo team found the view that was performing most poorly in terms of GPU load, shown in the above image.

What's going on in the scene is pretty much constant; what varies is what’s within the view of the camera. If they could make savings on this scene, they’d ultimately increase performance throughout the entire demo.

The reason why this scene performed poorly is that it's an exterior view of the level looking into the center of it, so the vast majority of assets in the scene are in the camera frustum. This results in a lot of draw calls. 

In brief, here is how this scene was rendered:  

  • With the HDRP
  • Most of the artist-authored textures are between 1K and 2K sized maps, with a handful at 4K. 
  • It uses baked occlusion and baked GI for the indirect lighting, and a single dynamic shadow-casting light source for direct lighting from the sun. 
  • It issues a few thousand draw calls and compute shader dispatches at any point.
  • At the start of the optimization pass, the view was GPU bound on PS4 Pro at around 45 milliseconds.
Image: Unity - optimizing console game graphics

Finding the performance bottlenecks

Rob and the team looked at the GPU frame step by step, and saw the following performance: 

  • The Gbuffer was at 11ms
  • Motion vectors and screen space ambient occlusion were fast, at 0.25 ms and 0.6 ms respectively
  • Shadow map rendering for the dynamic shadow-casting directional light came in at a whopping 13.9 ms
  • Deferred lighting was at 4.9ms
  • Atmospheric scattering was at 6.6ms

The image above shows what their GPU frame looked like, from start to finish: 

As you can see, they’re at 45 milliseconds and the two vertical orange lines show where they needed to be to hit 30fps and 60fps respectively.

Let’s look at 10 things the team did to improve performance for this scene.  

1. Control the batch count

CPU performance was not a big issue for the team because BOTD is a demo, so it doesn’t have the complexities of the script code that goes along with all of the systems necessary for a full game. 

However, keeping the batch count low is still a valuable tip for any platform. If your project uses the built-in render pipeline, you can do this by using occlusion culling and, primarily, GPU instancing. Avoid using dynamic batching on consoles unless you are sure it's providing a performance win. 

If you are using one of the SRPs then you can control batch count with the SRP Batcher. The SRP Batcher reduces the GPU setup between DrawCalls by batching a sequence of Bind and Draw GPU commands. To get the maximum performance for your rendering, these batches must be as large as possible. To achieve this, you can use as many different Materials with the same Shader as you want, but you must use as few Shader Variants as possible.
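GPU instancing can also be driven directly from script. The sketch below (names and counts are illustrative, not from the BOTD project) shows many copies of one mesh submitted as a single instanced batch rather than hundreds of individual draw calls:

```csharp
// Hypothetical sketch: drawing many copies of one mesh in instanced batches.
// The material must have "Enable GPU Instancing" ticked in the Inspector.
using UnityEngine;

public class InstancedRocks : MonoBehaviour
{
    public Mesh rockMesh;          // assumed mesh asset
    public Material rockMaterial;  // assumed instancing-enabled material

    Matrix4x4[] transforms = new Matrix4x4[500];

    void Start()
    {
        // Scatter 500 instances around the origin.
        for (int i = 0; i < transforms.Length; i++)
            transforms[i] = Matrix4x4.TRS(
                Random.insideUnitSphere * 50f,
                Random.rotation,
                Vector3.one);
    }

    void Update()
    {
        // One call submits up to 1023 instances as a single batch.
        Graphics.DrawMeshInstanced(rockMesh, 0, rockMaterial, transforms);
    }
}
```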

Another takeaway: The number of individual assets used to create this scene is actually very small. By using good quality assets, and placing them intelligently, the team created complex scenes that don't look repetitive.

2. Use multiple cores

Both Xbox One and PS4 are multicore devices, and in order to get the best CPU performance, we need to try to keep those cores busy all of the time.

Unity's new high-performance multithreaded system, DOTS, makes it possible for your game to fully utilize the multicore processors available today (and in the future). DOTS comprises three subsystems: the Entity Component System, the C# Job System, and the Burst Compiler.

Please note that some of the DOTS packages are in Preview, and therefore we do not recommend using them for production. 
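The C# Job System and Burst Compiler on their own are a practical way to spread work across cores. A minimal sketch (the job and data are illustrative assumptions) of a Burst-compiled parallel job that Unity splits across worker threads:

```csharp
// Minimal C# Job System sketch: a Burst-compiled job whose iterations are
// distributed across all available worker cores.
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

public class VelocityIntegration : MonoBehaviour
{
    [BurstCompile]
    struct IntegrateJob : IJobParallelFor
    {
        public NativeArray<Vector3> positions;
        [ReadOnly] public NativeArray<Vector3> velocities;
        public float deltaTime;

        public void Execute(int i)
        {
            positions[i] += velocities[i] * deltaTime;
        }
    }

    public void RunJob(NativeArray<Vector3> positions,
                       NativeArray<Vector3> velocities)
    {
        var job = new IntegrateJob
        {
            positions = positions,
            velocities = velocities,
            deltaTime = Time.deltaTime
        };
        // 64 iterations per batch per worker; Complete() waits for all cores.
        JobHandle handle = job.Schedule(positions.Length, 64);
        handle.Complete();
    }
}
```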

However, you can make use of multiple cores via the Graphics Jobs mode under Player Settings -> Other Settings. 

Graphics Jobs provide a performance optimization in almost all circumstances on console, unless you're only drawing a handful of batches. There are two types available: 

  • Legacy Jobs, available on PS4, and DirectX 11 for Xbox One

    • Takes pressure off the main thread by distributing work to other cores. Be aware that in very large scenes it can become a bottleneck in the "Render Thread", a thread that Unity uses to talk to the platform holder's graphics API. 
  • Native Jobs, (available as the default in 2019.3 for new projects) on PS4, and DirectX 12 for Xbox One 
    • Distributes the most work across available cores and is the best option for large scenes. 
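Besides the Player Settings checkbox, the same option can be toggled from an editor script. A small sketch (the menu path is an illustrative assumption):

```csharp
// Editor-only sketch: enabling Graphics Jobs from script, equivalent to
// ticking the box under Player Settings -> Other Settings.
using UnityEditor;

public static class EnableGraphicsJobs
{
    [MenuItem("Tools/Enable Graphics Jobs")]
    static void Enable()
    {
        PlayerSettings.graphicsJobs = true;
        // Whether Legacy or Native jobs are used depends on the target
        // platform and graphics API, as described above.
    }
}
```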

Learn more about multithreaded rendering and graphics jobs here

3. Use platform-specific analysis tools

Microsoft and Sony provide excellent tools for analyzing your project's performance on both the CPU and the GPU. These tools are available for free if you're developing on console. Learn them early on and keep using them throughout your development cycle. PIX for Xbox One and the Razor suite for PlayStation are key tools in your arsenal when it comes to optimization on these platforms. 

4. Profile your post-process effects

Post-processing effects can take up a great deal of frame time. Often this is caused by post-processing assets downloaded from the Asset Store that were authored primarily for PC. They appear to run fine on console but in fact are not optimized to do so.

When applying such effects, profile how long they take on the GPU, and iterate until you find a happy balance between visual quality and performance. Then leave them alone: because they comprise a static cost in every frame, you know exactly how much GPU time is left over to work with.
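One way to make an effect's GPU cost visible is to wrap it in named profiler samples, so it shows up as a distinct block in the Unity Profiler and in PIX/Razor captures. A sketch, where the material and sample name are assumptions for illustration:

```csharp
// Sketch: wrapping a post-process blit in profiler samples so its GPU cost
// appears as a named block when profiling.
using UnityEngine;
using UnityEngine.Rendering;

public static class ProfiledBloom
{
    public static void Render(CommandBuffer cmd,
                              RenderTargetIdentifier src,
                              RenderTargetIdentifier dst,
                              Material bloomMaterial)
    {
        cmd.BeginSample("Bloom");
        cmd.Blit(src, dst, bloomMaterial);  // the effect being measured
        cmd.EndSample("Bloom");
    }
}
```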

5. Avoid using tessellation (unless for a good reason)

In general, don't use tessellation in console game graphics. In most cases, you're better off using equivalent artist-authored assets than tessellating geometry at runtime on the GPU. 

But, in the case of BOTD, there was a good reason for using tessellation: rendering the bark of the trees. 

Tessellated displacement allowed them to add the deep recesses and gnarly details into the geometry so that it self-shadows correctly in a way that normal mapping won't. 

As the trees are "hero" objects in much of BOTD, the cost was justified. The same mesh was used on the trees at LOD 0 and LOD 1; the only difference is that the tessellated displacement was scaled back so that it is no longer in effect by the time LOD 1 is reached.

6. Aim for healthy wavefront occupancy at all times on the GPU

You can think of a wavefront as a packet of GPU work. When you submit a draw call or a compute shader dispatch to the GPU, that work is split into many wavefronts, and those wavefronts are distributed throughout all of the SIMDs within all of the compute units available on the GPU.

Each SIMD has a maximum number of wavefronts that can be running at any one time, and therefore there is a maximum total number of wavefronts that can run in parallel on the GPU at any one point. How many of those wavefronts we are using is referred to as wavefront occupancy, and it's a useful metric for understanding how well you are using the GPU's potential for parallelism.

PIX and Razor can show wavefront occupancy in great detail. The graphs above are from PIX for Xbox One. On the left is an example of good wavefront occupancy: along the bottom, the green strip shows vertex shader wavefronts running, and above that, in blue, pixel shader wavefronts.

On the right, though, there's a performance issue: a lot of vertex shader work that isn't resulting in much pixel shader activity. This is an underutilization of the GPU's potential. How does this come about? This scenario is typical when we're doing vertex shader work that doesn't result in pixels, which brings us to the next optimization tip.

7. Utilize Depth Prepass

Some more analysis in PIX and Razor showed that the team was getting a lot of overdraw during the Gbuffer pass. This is particularly bad on console for alpha-tested objects.

On console, if you issue pixel discard instructions or write directly to depth in your pixel shader, you can't take advantage of early depth rejection. Those pixel shader wavefronts get run anyway, even though the work is going to be thrown out at the end. 

The solution here was to add a Depth Prepass. A Depth Prepass involves rendering the scene in advance to depth only, using very light shaders, which can then be the basis of more intelligent depth rejection when your heavier Gbuffer shaders are bound.

The HDRP includes a Depth Prepass for all alpha-tested objects, but you can also switch on a full Depth Prepass if you want. The settings for controlling HDRP, which render passes are used, and which features are enabled are all made available via the HD Render Pipeline Asset.

If you search an HDRP project for the HD Render Pipeline Asset, you'll find a great big swath of checkboxes that control everything HDRP is doing. 

For BOTD, using a Depth Prepass was a great GPU win, but keep in mind that it does have the overhead of adding more batches to be drawn on the CPU.
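Outside HDRP, the idea can be sketched conceptually with a command buffer: render opaques depth-only with a cheap shader first, then let the expensive color pass benefit from early depth rejection. Everything here (the material, the pass structure) is an illustrative assumption, not HDRP's actual implementation:

```csharp
// Conceptual depth prepass sketch. "depthOnlyMaterial" is assumed to use a
// minimal shader that writes depth and no color.
using UnityEngine;
using UnityEngine.Rendering;

public static class DepthPrepassSketch
{
    public static CommandBuffer Build(Renderer[] opaques,
                                      Material depthOnlyMaterial)
    {
        var cmd = new CommandBuffer { name = "Depth Prepass" };
        foreach (var r in opaques)
            cmd.DrawRenderer(r, depthOnlyMaterial);  // depth write only
        return cmd;
        // The heavy Gbuffer pass then renders with ZWrite Off / ZTest Equal,
        // so only pixels that survived the prepass run the expensive shaders.
    }
}
```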

Image: Unity - optimizing console game graphics

8. Reduce the size of your shadow mapping render targets

As mentioned earlier, the shadow maps in this scene are generated for a single shadow-casting directional light. Four shadow map splits were used, and initially they were rendered to a 4K shadow map at 32-bit depth, as this is the default for HDRP projects. When rendering to shadow maps, the resolution of the shadow map is almost always the limiting factor; this was backed up by analysis in PIX and Razor.

Reducing the resolution of the shadow map was the obvious solution, even though it could impact quality. 

The shadow map resolution was dropped to 3K, which provided a perfectly acceptable trade-off between quality and performance. The demo team also added an option specifically to allow developers to render to 16-bit depth shadow maps. If you want to give that a go yourself, download the project assets. 

Finally, after changing the resolution of their shadow map, they also had to change some settings on the light. 

At this point, the team had made their shadow map revisions and repositioned their shadow mapping camera to try and get the best utilization out of the newly-reduced resolution they had. So, what did they do next?
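In a standard Unity project, the per-light resolution override and a 16-bit depth target of the kind described above can be sketched like this (the specific numbers are assumptions for illustration):

```csharp
// Sketch: per-light shadow map resolution, and a 16-bit depth shadow target
// similar in spirit to the team's custom 16-bit option.
using UnityEngine;

public static class ShadowSettingsSketch
{
    public static void Apply(Light sun)
    {
        // Override the quality-settings shadow resolution for this light only.
        sun.shadowCustomResolution = 3072;  // a "3K" shadow map
    }

    public static RenderTexture MakeShadowTarget(int size)
    {
        // 16 bits of depth instead of the 32-bit default halves the
        // memory bandwidth spent writing the shadow map.
        return new RenderTexture(size, size, 16, RenderTextureFormat.Shadowmap);
    }
}
```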

Image: Unity - optimizing console game graphics - GPU frame

9. Only draw the last (most zoomed-out) Shadow map split once on level load

As the shadow mapping camera doesn’t move much, they could get away with this. That most zoomed-out split is typically used for rendering the shadows that are furthest from the Player camera. 

They did not see a drop in quality. It turned out to be a very clever optimization, because it saved GPU frame time and reduced batch counts on the CPU.

After this series of optimizations, their shadow map creation phase went from 13 ms to just under 8 ms; the lighting pass went from 4.9 ms to 4.4 ms, and the atmospherics pass went from 6.6 ms to 4.2 ms. 

This is where the team was at the end of the shadow mapping optimization. They were now within the boundary where they could run at 30fps on PS4 Pro.

10. Utilize Async Compute

Async Compute is a method for minimizing periods of underutilization on the GPU by filling them with useful compute shader work. It's supported on PS4, and it became available on Xbox One in the 2019 cycle. It's accessible through Unity's Command Buffer interface, and it's meant to be used mainly, though not exclusively, with the SRPs. Code examples are available in the BOTD assets and the HDRP source.

The depth-only phase, which is what you're doing during shadow mapping, is traditionally a point where you're not making full use of the GPU's potential. Async Compute allows you to move your compute shader work to run in parallel with the graphics queue, thereby making use of resources that the graphics queue is underutilizing.

BOTD uses Async Compute for its tiled light list gather, which is part of the deferred lighting (mostly done with compute shaders on console in HDRP), and for its SSAO calculations. Both of these overlap with the shadow map rendering to fill in the gaps in wavefront utilization.
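The Command Buffer interface for this can be sketched as follows. This is a hedged outline, not BOTD's actual code: the compute shader, kernel name, and dispatch dimensions are all assumptions for illustration.

```csharp
// Sketch: submitting a compute dispatch to the async queue, fenced so the
// graphics queue only waits when it needs the results.
using UnityEngine;
using UnityEngine.Rendering;

public static class AsyncComputeSketch
{
    public static GraphicsFence DispatchAsync(ComputeShader lightCulling)
    {
        var cmd = new CommandBuffer { name = "Tiled Light List (async)" };
        int kernel = lightCulling.FindKernel("CSMain");  // assumed kernel name
        cmd.DispatchCompute(lightCulling, kernel, 32, 18, 1);

        // Fence signaled when the async work completes.
        GraphicsFence fence = cmd.CreateAsyncGraphicsFence();

        // Runs on the async compute queue, overlapping e.g. shadow rendering.
        Graphics.ExecuteCommandBufferAsync(cmd, ComputeQueueType.Background);
        return fence;
    }
}
```

A command buffer on the graphics queue would then call `WaitOnAsyncGraphicsFence` on the returned fence right before the deferred lighting pass consumes the light list.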

For a run-through of some conceptual code where Async Compute is employed, tune into Rob’s Unite session at 35:30. 
