How to design your multiplayer game for scale
Mastering your multiplayer game launches
In our on-demand webinar, Principal Partner Engineer Aaron Moon discusses how to master your multiplayer game launches – from server architecture to scale testing.
Or, read on below for insight into:
- Designing a game architecture that scales
- Designing your game loop for scale
- Why your backend matters
How to design a game architecture that scales
The key to designing game architecture is considering efficiency right from the beginning. Focusing on metrics, logging, telemetry, and monetization too early may result in excessive bloat in your game architecture. Packing all of that into your game server at runtime can actually reduce return on investment (ROI) and increase total cost of ownership.
Instead, start with a minimum viable product (MVP) as early as possible. Get the server running, the client working, your netcode and matchmaker system locked in, and then build from there. You should also consider locking in your hosting provider early – they may provide prebuilt solutions and infrastructure that can save you development time.
When it comes to your game design, what does simple look like? The image above shows a complex design with not just a game server instance, but many ancillary processes on the same machine, including matchmakers, logging, and metrics executables.
This complexity doesn’t necessarily become apparent until you start getting into scale testing, which is when you may start realizing that these things don’t work well together when there are thousands of players. Also, ancillary services might not have any resource fencing, so they may start cannibalizing resources on the machine and impacting player performance. Keeping things simple can help mitigate these issues.
In terms of game server design, simple means packing things into a single server instance, like a single executable, alongside a resilient matchmaker (maybe hosted on a backend service rather than on a machine). That way, when one of the game servers is lost, it doesn’t take other players with it and the affected area remains small. At scale, a malfunctioning game server with this design affects only its own session rather than your wider player experience.
To help minimize cost risk, don’t keep ancillary processes like debugging, watchers, and matchmaking on the same machine. If a game server dies on a cloud machine that it has scaled into, its ancillary processes are left “zombied.” They’re still consuming resources, which costs your studio money, and you have no way to shut them down.
Instead, consider making the ancillary processes such as matchmaking and debugging sub-processes of the game server – keep it simple. Then, if you lose the game server, it takes the sub-processes with it, rather than leaving them running in the background. In this instance, if the server stops working, you can spin up another one without the extra resources and costs associated with “zombied” processes.
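As a rough illustration of the sub-process idea, here is a minimal Python sketch in which a game server owns its helper processes and stops them when it shuts down. The `GameServer` class, the `start_helper` and `shutdown` methods, and the sleeping stand-in helper are all hypothetical names made up for this example, not part of any particular engine or hosting product.

```python
import subprocess
import sys
import time


class GameServer:
    """Hypothetical game server wrapper that owns its helper processes."""

    def __init__(self):
        self.children = []

    def start_helper(self, args):
        # Start the helper as a child process so the server keeps a handle to it.
        proc = subprocess.Popen(args)
        self.children.append(proc)
        return proc

    def shutdown(self, timeout=5.0):
        # Ask every running helper to exit.
        for proc in self.children:
            if proc.poll() is None:
                proc.terminate()
        # Wait for each helper, and force-kill any that ignore the request.
        for proc in self.children:
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
        self.children.clear()


def run_demo():
    server = GameServer()
    # Stand-in helper: a Python process that only sleeps.
    helper = server.start_helper(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    try:
        time.sleep(0.1)  # stand-in for the game loop
    finally:
        # Runs on normal exit and on exceptions, so helpers are not orphaned.
        server.shutdown()
    return helper.returncode
```

Note that this only covers shutdowns where the `finally` block gets to run. If the server process is killed outright, the cleanup code never executes, so a production setup would likely also need an operating-system-level mechanism to tie the helpers’ lifetime to the server.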
Designing your game loop for scale
It’s a good idea to keep in mind how your game loop interacts with the infrastructure, and how the infrastructure supports your game. For example, if your game has lobbies and matchmakers, there should be a reason for players to move in and out of those lobbies, and into sessions from them.
Think about what kind of session design you have – are you building a persistent game like an MMO, or a short-session game where runtimes reboot every time? Each game loop design can have risks and rewards. Here are some key considerations when it comes to designing short, long, and persistent game sessions.
In multiplayer games with long-running sessions, there can be issues like memory leaks, escalating RAM usage, and more – which may not appear until you’re running your game at scale.
Here are some risks associated with long-running game sessions:
- DDoS attacks: Since the IP of the game instance doesn’t change with a persistent game session model, there’s a possibility of DDoS attacks.
- High cloud costs: It takes a lot of costly resources to maintain a game that’s always, or almost always, active – even if players aren’t playing the game.
- Interruptions due to patching: Patching can be difficult to manage, because there might be active matches that would need to end in order to patch the game, resulting in a poor player experience.
There are still risks and player experience considerations for short game sessions. Even if your two-player game lasts only two minutes, facilitating hundreds of thousands (or more) of those simultaneous matches can be costly and present risk.
Here are some considerations for short game sessions:
- High backend load: Constant API calls to your backend to facilitate short game sessions can be a huge load, so your backend needs to be resilient.
- Hard on infrastructure: When the server restarts between short sessions, you may see the CPU and RAM spike at the beginning of loading the processes – which can be expensive.
- Requires robust matchmaker: With short sessions, you need an effective matchmaker to handle the matchmaking tickets for new sessions, reconnections, and more.
- Quality of Service (QoS): You’ll need a provider that lets you send your game client a list of IP addresses in real time, so the server region can be chosen based on real telemetry and data rather than physical location.
- Needs metrics and telemetry data: If something goes wrong, a player is more likely to quit and start another session rather than report an error. If you’re not getting metrics and telemetry from those short sessions, you might miss things that are going wrong in the game.
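To make the Quality of Service point above more concrete, here is a minimal Python sketch of choosing a region from measured latency instead of geography. It assumes the client has already pinged the addresses it was given and collected round-trip times per region. The `pick_region` function, the region names, and the sample numbers are all hypothetical and only for illustration.

```python
from statistics import median


def pick_region(latency_samples_ms):
    """Return the region with the lowest median round-trip time.

    latency_samples_ms maps a region name to a list of measured RTTs in ms.
    Regions with no successful samples are skipped.
    """
    best_region = None
    best_rtt = float("inf")
    for region, samples in latency_samples_ms.items():
        if not samples:
            continue  # no reply from this region, so it cannot be ranked
        rtt = median(samples)
        if rtt < best_rtt:
            best_region, best_rtt = region, rtt
    return best_region


# Illustrative measurements only (hypothetical regions and values).
samples = {
    "eu-west": [48.0, 52.0, 250.0],
    "us-east": [95.0, 97.0, 99.0],
    "ap-south": [],
}
```

With these sample values, `pick_region(samples)` selects `"eu-west"`. The median is used here rather than the mean so that a single slow ping (the 250 ms sample) doesn’t rule out an otherwise good region; other statistics may suit your game better.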
Here are the pros and cons of short session-based games:
Pros:
- No need to stabilize for long runtimes
- Enables faster patching
- Ease of scaling down
- No idle state
- Log files for each session enable easier troubleshooting
Cons:
- More callbacks, which means more load at scale
- Cloud compute cost for restarts is not trivial at scale
- Race conditions
- MEM/CPU spikes on startup
- More complexities for matchmaking
- Machine performance issues can be hidden
In multiplayer games that are persistent (like MMOs), there can be certain problems and risks. For example, supporting situations like player migrations between servers means that you need a more robust backend system – including costly servers and powerful hard drives.
Here are some considerations for persistent game session designs:
- Budgeting may be a concern: More robust backend systems are required to make persistent sessions perform, so you’ll need to calculate how much the maintenance costs will be based on users to understand if this is the right session design for you.
- Cloud might not be feasible: Due to the high demands of a persistent game session, cloud may be too costly for your project.
- Network quality is extremely important: Latency and latency management will make or break your persistent game session for players, so testing your network early and rigorously is essential for a good player experience.
- Patching is difficult and risky: Maintenance-related downtime is never a great experience for players – even if it’s necessary to improve the overall experience. Communicating with your community about maintenance time is key. You may also want to consider having multiple game versions running concurrently and bleeding in patches over time.
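One possible way to “bleed in” a patch, sketched below in Python, is to assign each player to a stable bucket and send a growing percentage of buckets to the new version. This is an assumption about how such a rollout could work, not a description of any specific service. The function names `rollout_bucket` and `version_for_player` and the version strings are hypothetical.

```python
import hashlib


def rollout_bucket(player_id):
    """Map a player ID to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(player_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def version_for_player(player_id, old_version, new_version, rollout_percent):
    """Send roughly rollout_percent of players to new_version."""
    if rollout_bucket(player_id) < rollout_percent:
        return new_version
    return old_version
```

Because the bucket is derived from a hash of the player ID, a player who has moved to the new version stays on it as `rollout_percent` increases, and setting the percentage back to 0 returns everyone to the old version.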
Here are the pros and cons of persistent session-based games:
Pros:
- Servers are always up and running
- Shorter game loop possible
- Less load on your backend
Cons:
- More difficult to design
- Instability over time
- Sessions cleanup
- Idle state design necessary
- Needs to be matchmaker aware
- A/B patching might take longer
It’s a good idea to prepare for potential player experience issues at scale. Launching, running, and updating a multiplayer game can be chaotic, so it’s important to do situational testing for “chaos resilience” and ensure that your game’s backend is set up to handle that chaos.
For example, what happens if everyone crashes out of your game and attempts to come back into matchmaking at the same time? Figuring out the backend’s response to that situation and having it set up to handle that issue can save you headaches (and help protect your reputation) in the long run.
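One common mitigation for this kind of reconnect surge, which sits on the client side and complements the backend work described above, is to retry with exponential backoff plus random jitter so that clients don’t all return at the same instant. The Python sketch below is a minimal example of that technique. The function name `reconnect_delay` and the default `base` and `cap` values are assumptions chosen for illustration.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Return a random delay in seconds for the given retry attempt.

    The upper bound doubles with each attempt (base * 2**attempt) up to cap,
    and the actual delay is drawn uniformly between 0 and that bound.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

A client would call `reconnect_delay(0)` before its first retry, `reconnect_delay(1)` before the second, and so on. Spreading retries out this way doesn’t remove the need to test the backend against a mass reconnect, but it can flatten the peak the backend has to absorb.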
It’s likely that you will have to patch your game during launch. That’s why it’s important to build your infrastructure with the ability to patch during production and launch. This can help make the chaos of launch day and on-the-fly patching much quicker and smoother, and the impact on players can be limited.
One way to approach this is to run multiple versions of your game simultaneously. However, you’ll also have to make sure that your infrastructure can handle multiple versions. Additionally, you’ll need to have a sandbox with all of the different versions.
If you have already built in the ability for multiple versions to run simultaneously, there’s no downtime or disruption to players when you patch.
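As a simplified sketch of what running multiple versions side by side could look like, the Python example below allocates each client to a server running the same build, and retires an old build by refusing new allocations while leaving running matches untouched. The `Server` class, the `allocate` and `retire_build` functions, and the build strings are hypothetical and not taken from any real hosting API.

```python
class Server:
    """Hypothetical record for one game server instance."""

    def __init__(self, server_id, build):
        self.server_id = server_id
        self.build = build
        self.draining = False  # True once this build is being retired
        self.in_match = False


def allocate(servers, client_build):
    """Return a free, non-draining server running the client's build, or None."""
    for server in servers:
        if (
            server.build == client_build
            and not server.draining
            and not server.in_match
        ):
            server.in_match = True
            return server
    return None


def retire_build(servers, build):
    """Stop new allocations on a build; running matches are left to finish."""
    for server in servers:
        if server.build == build:
            server.draining = True
```

In this sketch, players already in a match on the old build are never interrupted. Once their match ends, the draining server simply receives no further allocations and can be shut down.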
Frequent scale testing is critical, so finding a provider that can help you with it should be a major consideration when selecting your services.
One major thing to scale test is server tessellation and defragmentation. Server tessellation is an important cost consideration. Essentially, you want to fill inexpensive bare-metal machines first. As your player base fluctuates, you’ll also want to remove the more expensive cloud machines quickly, which is more cost effective.
Game Server Hosting (Multiplay) avoids allocating to machines that are more costly, which enables us to remove them more quickly when the number of players is declining.
The ability of our system to do this depends on the lifespan of your match. Shorter match durations allow us to more quickly end allocations on costly machines. Longer matches mean that we can’t shut down the machines until the match finishes.
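The general idea of preferring cheaper machines can be sketched in a few lines of Python. This is not the actual Game Server Hosting allocation logic, only a minimal illustration under assumed inputs: each machine is a dictionary with hypothetical `name`, `hourly_cost`, `capacity`, and `used` keys, and the costs are made-up numbers.

```python
def pick_machine(machines):
    """Pick the cheapest machine with a free slot, filling busy machines first."""
    candidates = [m for m in machines if m["used"] < m["capacity"]]
    if not candidates:
        return None
    # Cheapest first. Among machines of equal cost, prefer the fullest one so
    # that lightly used machines can empty out and be removed.
    return min(candidates, key=lambda m: (m["hourly_cost"], -m["used"]))


def removable(machines, cost_threshold):
    """Return the names of costly machines that have no running matches."""
    return [
        m["name"]
        for m in machines
        if m["used"] == 0 and m["hourly_cost"] >= cost_threshold
    ]


# Hypothetical fleet: one cloud machine and two cheaper bare-metal machines.
fleet = [
    {"name": "cloud-1", "hourly_cost": 1.0, "capacity": 2, "used": 0},
    {"name": "metal-1", "hourly_cost": 0.4, "capacity": 2, "used": 1},
    {"name": "metal-2", "hourly_cost": 0.4, "capacity": 2, "used": 0},
]
```

With this fleet, `pick_machine(fleet)` chooses `metal-1`, and `removable(fleet, 1.0)` reports `cloud-1` as safe to remove because it is hosting no matches. A machine that is still hosting a match never appears in that list, which is why shorter matches free up costly machines sooner.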
Ready to build your next multiplayer game? Here are a few resources to get you started – learn more about scaling with Game Server Hosting, check out our on-demand webinar on Game Server Hosting and Matchmaker, and explore the multiplayer solutions we offer below.