“Twitter experiences outage during Rihanna’s Super Bowl Halftime Show.”
“Microsoft suffers three-hour outage, shutting down core products like Teams, Outlook and M365.”
“Gamers report that Xbox Live is offline.”
These recent headlines are nightmare fuel for data center managers. If these outages occur within the biggest tech companies, how do cloud-based startups, SMBs, and other large-scale enterprises protect against similar scenarios?
While there are multiple failovers that can prevent data center outages, the key to ensuring network resiliency is intentional diversity. Organizations that are intentional about their diversity protocols both into and within their data centers offer the greatest protection against costly downtime -- keeping customers happy and operations running smoothly.
What Is Intentional Diversity?
For every data center operator, the highest priority is uptime. Customers have come to expect “five-nines” of service availability -- operational reliability 99.999 percent of the time.
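To put “five-nines” in concrete terms, the short Python sketch below converts an availability target into a yearly downtime budget. The function name and the choice of a 365-day year are illustrative assumptions, not part of any standard.

```python
# Illustrative sketch: convert an availability target into a yearly downtime budget.
# Assumes a 365-day (non-leap) year; the function name is hypothetical.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the downtime allowed per year, in minutes, for a given target."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability -> {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, a five-nines target leaves a budget of roughly five minutes of downtime for the entire year.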
Intentional diversity is critical to keep organizations running during service interruptions. It includes a comprehensive plan designed to ensure uptime across an organization’s entire network, including the design of the data center itself. Intentional diversity views an organization’s network system from a holistic perspective that is centered on resiliency.
Historically, providers did not focus on secondary provisioning within the data center. “Diversity” typically referred to a single meet-me room and a single cross-connect over to a customer. So organizations may have completed some heavy lifting to provide backup coming into the facility, but dropped the ball within the data center.
Today, data center providers are more strategic about how they configure the routes within the data center itself. Whether the data is coming from the fiber landing area to the meet-me room or another configuration, they will plan for alternate routes and ways to cross-connect to a customer cage or cabinet.
Intentional plans consider questions such as “How do I get to those routers, servers, and switches in a diverse manner?” Designing with failover in mind, so that redundant paths never overlap or cross in the same ladder rack, adds resiliency and helps ensure availability of data.
Why is operational reliability important? It all comes down to the bottom line.
The Rising Cost of Downtime & New Solutions
According to the Uptime Institute Global Data Center Survey 2022, the number of data center outages is declining. However, the cost of data center outages is rising sharply.
Among the survey’s respondents, 25% reported that data center outages cost their organization more than $1 million in both direct and indirect expenses, a significant jump from 2021, when 15% reported outages at that cost level. Uptime attributes these rising costs to a variety of factors -- including inflation, fines, labor costs, call outs, and replacement parts. More importantly, service outages directly impact businesses through customer dissatisfaction and lost revenue.
The causes of downtime are wide-ranging, from hardware failures and routine maintenance to natural disasters and human error. Regardless of what causes connections to go down, the results are the same: your critical applications are not accessible, causing business disruption and possible customer service problems.
Cable mining is one example of how human error can cause service disruption. Data centers use cable mining to remove unwanted and unused cables, and sometimes active cables are inadvertently cut during that process. Without adequate failovers, these fiber plant cuts can cause outages lasting eight to 12 hours. And without secondary provisions, this sort of outage translates to a full day of lost business and potentially many unhappy customers.
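As a rough illustration of what a single eight-to-12-hour outage does to a yearly availability figure, consider the sketch below. It assumes a 365-day year and no other downtime during that year; the function name is hypothetical.

```python
# Illustrative sketch: yearly availability after one outage of a given length.
# Assumes a 365-day year and no other downtime; the function name is hypothetical.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def availability_after_outage(outage_hours: float) -> float:
    """Return the yearly availability, as a percentage, after one outage."""
    return 100 * (1 - outage_hours / HOURS_PER_YEAR)


for hours in (8, 12):
    print(f"{hours}-hour outage -> {availability_after_outage(hours):.4f}% availability")
```

Under these assumptions, one such outage leaves the year at roughly 99.9% availability or slightly below, well short of a 99.999% target.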
Reflexively, when an outage occurs, the focus turns to answering the question of how safeguards failed. Even organizations that thought they had a diverse data center configuration discover a missing element when service is interrupted.
Mesh networks are gaining popularity as one solution for safeguarding network connections. Architecture featuring multiple routes adds an extra level of protection to ensure uptime.
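One way to reason about whether a topology really offers multiple routes is to remove each link in turn and check whether every site can still reach every other site. The sketch below does this for a small, made-up network; the site names, link lists, and function names are hypothetical and do not describe any real deployment.

```python
# Illustrative sketch: find links whose loss would split the network.
# Site names and topologies are hypothetical examples.
from collections import deque


def is_connected(nodes, links):
    """Return True if every node can reach every other node over the links."""
    node_set = set(nodes)
    adjacency = {node: set() for node in node_set}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = next(iter(node_set))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search from the start node
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == node_set


def single_points_of_failure(nodes, links):
    """Return the links whose removal alone disconnects the network."""
    return [
        link
        for link in links
        if not is_connected(nodes, [other for other in links if other != link])
    ]


sites = ["A", "B", "C", "D"]
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
chain = [("A", "B"), ("B", "C"), ("C", "D")]

print(single_points_of_failure(sites, ring))   # ring: every site has two routes
print(single_points_of_failure(sites, chain))  # chain: every link is critical
```

In this toy example, the ring survives any single link cut, while every link in the chain is a single point of failure.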
Data centers that feature multiple service providers offer additional value because their customers enjoy alternate routes and an added layer of backup. For example, FiberLight provides a diverse fiber footprint between data centers across Texas. That gives our customers redundancy on our network and enables better performance.
Four Steps to Ensure Intentional Diversification
1. Understand your architecture.
It is very important to understand the underlying service provider's entrances: how they come in and how they land physically. Look at your layer zero connectivity (layer zero refers to the physical routing/path on the fiber that the customer data is traversing).
2. Create an internal diversification plan.
Include a detailed plan that ensures your cross-connect paths reach your underlying customer and stay completely diverse to protect against human error. While this step is fundamental, it is often overlooked when setting up a brand new data center.
3. Communicate your goals.
Another critical piece of the process includes integrating input from all partners when designing your failover plan. At FiberLight, we provide transport -- whether that is fiber or lit services from point to point. But we are not just providing circuits and connections; we make it a priority to work in partnership with our customers to deeply understand their architecture. We approach our service as part of a consultative design process where we strive to enable a truly diverse data center, so our customers do not end up with an outage and wonder what failed.
4. Organize your physical assets.
While this seems obvious, too many in our industry have experienced an outage when they believed they had secondary provisioning, only to discover there were two cross-connects in the same ladder tray. Being mindful about ordering, managing, and organizing physical assets -- even with something as basic as color coding for cross-connects -- can make all the difference in establishing a thorough backup plan. Additionally, reviewing the way lines are drawn on a Visio document with a qualified data center operator and scrutinizing maps are excellent strategies for exceptional uptime. Be deliberate about organizing and managing physical assets to save time when repairs are needed and ensure failovers will function when needed.
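The “two cross-connects in the same ladder tray” problem can be caught on paper before it causes an outage. As a minimal sketch, assuming each cross-connect route is recorded as a list of the physical segments it passes through, a simple set comparison flags any segment the two routes share. The segment labels and function name below are hypothetical.

```python
# Illustrative sketch: flag physical segments shared by two "diverse" routes.
# Segment labels (meet-me rooms, trays, cage entries) are hypothetical examples.


def shared_segments(primary_path, secondary_path):
    """Return the physical segments that appear in both routes, sorted by name."""
    return sorted(set(primary_path) & set(secondary_path))


primary = ["MMR-1", "tray-A1", "tray-A2", "cage-12-east"]
secondary = ["MMR-2", "tray-B1", "tray-A2", "cage-12-west"]

overlap = shared_segments(primary, secondary)
if overlap:
    print("Not fully diverse; shared segments:", overlap)
else:
    print("No shared segments found in the recorded routes.")
```

A check like this is only as good as the records behind it, which is why walking the routes with a qualified operator still matters.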
Once you have designed your data center, the next step is finding partners that provide diversity and enable the resiliency and uptime your organization expects.
Advice for Customers Looking for Data Center Partners
When researching data center partners, take note of the questions they are asking. Are they asking about your architecture? Can they offer solutions that meet your diversity and latency needs? Be prepared to provide as much detail as possible about your performance metrics up front so you can determine whether the partnership meets your business needs. The more intentional you are with your diversification plan, the better your data center team can sleep at night knowing you are protected against outages.
About the Author
Jay Anderson is the Chief Technology Officer at FiberLight. FiberLight builds one-of-a-kind fiber networks that ignite digital transformation and has decades of experience in designing, engineering, building, and optimizing fiber optic networks. Jay is responsible for evolving FiberLight’s infrastructure and technical capabilities to ensure the company can respond quickly to the changing digital ecosystem needs of its customers.