When demand on a server spikes dramatically, sometimes you need to improvise to keep things online. An interesting example is provided by CCP Games, which operates EVE Online, a science fiction gaming universe in which factions of players battle with fleets of spaceships.
When an enormous battle recently broke out on a node with limited resources, the engineers at EVE Online managed extraordinary server loads by using “time dilation” – altering time within the game universe to effectively throttle activity to match system resources.
EVE Online is unusual in that it functions as a single game environment, with a single copy of its universe running on a massive cluster of servers. Each solar system is supported by a particular server, and players and spaceships can move between solar systems. That means a burst of activity in one sector of the EVE Online universe can create scalability problems. Administrators can shift load by moving activity to other servers, but that interrupts the player experience, so it is not ideal when large space battles break out.
One Bad Click Tests Capacity
A single misclick would test the system. On Jan. 27 a player accidentally “warped” an extremely valuable Titan spaceship into the midst of a large enemy fleet (more details at Penny Arcade and PC Gamer). Both sides called in reinforcements, and in short order more than 2,750 players were waging a hectic battle on a server that doesn’t normally see anywhere near that level of activity.
“The customer service duders (GMs) keep an eye out for gigantic fights like this,” recounted CCP Veritas, an engineer at CCP. “We’ve got a cluster status webpage that shows big red numbers when a node gets overloaded like it was by this fight, so it’s pretty easy to see what’s up.”
Admins isolated the battle by quickly moving non-combatants to other servers. That’s where time dilation comes in.
“A large majority of the load in large engagements is tied to the clock – modules, physics, travel, warp-outs, all of these things happen over a time period, so spacing out time will lower their load impact proportionally,” writes CCP Veritas. “So, the idea here is to slow down the game clock enough to maintain a very small queue of waiting tasklets, then when the load clears, raise time back up to normal as we can handle it. This will be done dynamically and in very fine increments; there’s no reason we can’t run at 98% time if we’re just slightly overloaded.”
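The feedback loop CCP Veritas describes can be sketched in a few lines of Python. This is only an illustration, not CCP's code: the function names, the one-percent step and the target queue length are assumptions made for the example, while the 10% floor is the configured limit reported for this battle.

```python
# Illustrative sketch only; the names and tuning values below are hypothetical.
MIN_FACTOR = 0.10   # floor: the 10% limit reported for this battle
STEP = 0.01         # assumed size of one fine-grained adjustment
TARGET_QUEUE = 5    # assumed acceptable number of waiting tasklets


def next_time_factor(factor, queued_tasklets):
    """Return the game-clock factor for the next tick (1.0 means real time)."""
    if queued_tasklets > TARGET_QUEUE:
        factor -= STEP  # overloaded: slow the game clock slightly
    elif queued_tasklets < TARGET_QUEUE:
        factor += STEP  # load has cleared: move back toward normal speed
    # Clamp to the allowed range and tidy floating-point drift.
    return round(min(1.0, max(MIN_FACTOR, factor)), 2)


def game_seconds(wall_seconds, factor):
    """Game time that elapses during a wall-clock interval at this factor."""
    return wall_seconds * factor


if __name__ == "__main__":
    factor = 1.0
    # Made-up queue lengths: a spike in load, then recovery.
    for queued in [40, 40, 3, 3]:
        factor = next_time_factor(factor, queued)
        print(f"queue={queued:3d}  time factor={factor:.0%}")
```

Under these assumptions, two overloaded ticks in a row bring the clock to the 98% mentioned in the quote, and a 30-second module cycle at the 10% floor would take five minutes of real time.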
The Jan. 27 event, known in EVE as the Battle of Asakai, tested that approach, which kept the game functioning until the battle was completed. “Even though Time Dilation was pushed to its configured limit of 10%, it still allowed a more graceful degradation than the unpredictable battles of old,” CCP Veritas shared. “We’re pretty sure that without the recent efforts on the software and hardware front, such a fight of this scale would simply not have been possible.”