Kris Beevers is the CEO & Founder of NSONE.
The concept of load shedding in web infrastructure is a pretty simple one. If a particular system in an application's delivery stack is overloaded, "shed" some of the load so the system can at least continue to provide service for a subset of requests, rather than falling over and dying.
In today's application stacks, which are often multi-faceted and distributed across data center environments for resiliency and performance, load shedding plays a powerful role in application delivery, preventing outages and making the most of the infrastructure you already have. To support modern application architectures, today's most advanced DNS and traffic management providers build load-shedding capabilities directly into their services, so any application can take advantage of the technique.
Here are five ways that load shedding can ensure your distributed application is delivered reliably and efficiently to your users.
Load Shedding Prevents Cascading Failures
A cascading failure is an incident that starts in one system or data center in your architecture and causes a chain reaction of failures in your other systems and data centers. For example, if a load balancer fails in one of your delivery facilities, a naive failover approach might be to shift all that traffic to the "next best" data center, say, the next closest. The sudden flood of failover traffic may overwhelm the load balancer in the failover data center, which then fails over to the next facility in turn, and so on.
A traffic management platform that supports load shedding can take in data from your systems, like system load metrics or connection counts from your load balancers, and ensure none of your systems are pushed beyond their limits. With load shedding, when a load balancer in one of your data centers fails, the bulk of its traffic can be shifted over to the next closest data center, up to a load watermark or threshold for that secondary facility. After that, the rest of the traffic can be shifted to a tertiary data center to avoid overloading the secondary one. Load shedding can cascade your traffic across a number of facilities and avoid overloading any of them.
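The spillover described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any provider's actual implementation: the function name `shed_load`, the facility names, and the load and watermark figures are all hypothetical.

```python
def shed_load(demand, facilities):
    """Place `demand` on facilities in preference order, never exceeding a watermark.

    facilities: list of (name, current_load, watermark) tuples, nearest first.
    Returns (allocation, unplaced), where `unplaced` is demand that no
    facility had headroom for.
    """
    allocation = {}
    remaining = demand
    for name, current_load, watermark in facilities:
        headroom = max(0, watermark - current_load)  # room left below the watermark
        take = min(remaining, headroom)
        allocation[name] = take
        remaining -= take
    return allocation, remaining


# Hypothetical scenario: DC1 has failed and its 900 req/s must be placed elsewhere.
facilities = [("dc2", 600, 1000), ("dc3", 300, 1000), ("dc4", 500, 1000)]
allocation, unplaced = shed_load(900, facilities)
```

Here the secondary facility (`dc2`) absorbs traffic only up to its watermark, and the remainder spills to the tertiary facility (`dc3`) instead of overloading the secondary one.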
Load Shedding Prevents 'Flapping'
The "brute force" way to deal with an overloaded data center is to shut it off and handle the traffic elsewhere. This binary approach is ill-suited for today's traffic spikes, and in fact causes painful "flapping" in some situations. For example, if DC1 gets more traffic than it can handle and is brought offline, traffic is routed elsewhere. When DC1 recovers and comes back online, traffic is shifted back, quickly overloading DC1 again. Rinse and repeat.
Instead of a binary "on-off" approach to overload situations, load shedding enables you to use a data center to capacity but not beyond, bleeding away to your other facilities only the traffic that would cause an overload. There is no need to shut off traffic altogether, as overflow traffic that would cause delays or failures is easily shifted elsewhere, keeping all systems humming along at capacity.
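To make the contrast concrete, here is a toy simulation of the two policies. It is a sketch under simplifying assumptions (constant offered traffic, instant recovery once traffic is removed); the function names and numbers are made up for illustration.

```python
def simulate_binary(offered, capacity, ticks):
    """On/off policy: an overloaded data center is pulled, then restored once it recovers.

    Returns a list of booleans, one per tick, saying whether DC1 was online.
    """
    online = True
    history = []
    for _ in range(ticks):
        history.append(online)
        if online:
            # All offered traffic lands on DC1; an overload takes it offline.
            online = offered <= capacity
        else:
            # With no traffic, DC1 recovers and is put back in rotation.
            online = True
    return history


def simulate_shedding(offered, capacity, ticks):
    """Shedding policy: DC1 serves up to capacity and the overflow goes elsewhere.

    Returns a list of (served_by_dc1, shed_elsewhere) pairs, one per tick.
    """
    return [(min(offered, capacity), max(0, offered - capacity)) for _ in range(ticks)]


binary = simulate_binary(offered=130, capacity=100, ticks=6)
shedding = simulate_shedding(offered=130, capacity=100, ticks=6)
```

Under the binary policy DC1 alternates between online and offline on every tick, while under the shedding policy it stays online at full capacity and only the excess 30 units per tick are served elsewhere.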
Load Shedding Makes Efficient Use of Infrastructure
In a distributed application, there's a push and pull between optimizing for performance and optimizing for heavy traffic. If you distribute your spend (and traffic) effectively across a number of data centers and push your content to the "edge," users will see reduced response times and have a better experience with your application. But spreading your infrastructure too thin can be disastrous when there is a localized traffic spike or attack, causing flapping or cascading failures.
With load shedding, it's okay to run lean in your edge facilities and build mostly for the common case, optimizing delivery performance. When an abnormally high, localized spike does hit, the excess traffic is smoothly transitioned to your other facilities.
Load Shedding = Infinite Capacity, or at Least, Graceful Failures
Even if you're stuck in a single data center, or with a limited infrastructure budget, load shedding can help you deal with large traffic spikes without falling over entirely. When your global capacity is maxed out, load shedding can shift overflow traffic to a lightweight "failwhale" page hosted by a third party or in on-demand infrastructure. You'll continue to provide good service to most of your users, and a better failure experience for the rest.
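As a rough sketch of that last-resort behavior, the routine below sends each request to the first data center with spare capacity and falls back to a static overflow page when none has room. The endpoint name `static-overflow-page`, the `route` function, and the capacity figures are hypothetical.

```python
# Hypothetical name for the lightweight "failwhale" endpoint.
FALLBACK = "static-overflow-page"


def route(loads, capacities, order):
    """Return the target for one request, updating `loads` in place.

    Tries data centers in preference `order`; if every one is at capacity,
    the request is shed to the fallback page instead of overloading anything.
    """
    for dc in order:
        if loads[dc] < capacities[dc]:
            loads[dc] += 1
            return dc
    return FALLBACK


# A single data center that can handle two concurrent requests.
loads = {"dc1": 0}
capacities = {"dc1": 2}
targets = [route(loads, capacities, ["dc1"]) for _ in range(5)]
```

The first two requests are served normally; the remaining three get the fallback page rather than an error or a timeout.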
Load Shedding Plays Nicely with Other Traffic Management Tools
Modern DNS services provide many different advanced traffic management technologies in addition to load shedding, such as geographic routing, network-based fencing, latency-based routing and more. Chances are, you'll want to take advantage of these tools to distribute your "nominal" traffic across your data centers and leverage load shedding to help deal with traffic spikes or other extraordinary situations. The good news is, platforms with advanced traffic management tools, like load shedding, make it easy to mix and match routing algorithms to achieve complex behaviors, helping you improve your application's performance, maximize your infrastructure ROI and delight your users.
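One way to picture this mixing and matching is as a chain of small filters applied to the list of candidate data centers. The sketch below is an assumption-laden illustration, not any provider's actual API: the filter names, record fields, and load figures are invented, and a real platform would typically shed traffic gradually rather than with a hard cutoff.

```python
def shed_filter(candidates):
    """Drop candidates at or above their load watermark (keep all if none qualify)."""
    below = [c for c in candidates if c["load"] < c["watermark"]]
    return below or candidates


def geo_filter(candidates, client_region):
    """Prefer candidates in the client's region (keep all if none match)."""
    local = [c for c in candidates if c["region"] == client_region]
    return local or candidates


def resolve(records, client_region):
    """Apply load shedding first, then geographic routing; return candidate names."""
    candidates = shed_filter(records)
    candidates = geo_filter(candidates, client_region)
    return [c["name"] for c in candidates]


nominal = [
    {"name": "dc-east", "region": "us-east", "load": 500, "watermark": 900},
    {"name": "dc-west", "region": "us-west", "load": 400, "watermark": 900},
    {"name": "dc-eu", "region": "eu", "load": 300, "watermark": 900},
]
# Same facilities, but dc-east is now above its watermark.
spike = [dict(r, load=950) if r["name"] == "dc-east" else r for r in nominal]

nominal_answer = resolve(nominal, "us-east")
spike_answer = resolve(spike, "us-east")
```

Under nominal load a client in `us-east` is answered by geography alone; during the spike the overloaded facility is filtered out first, so the same client is directed to the remaining data centers.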
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.