Brandon Whichard is director of product management at Boundary.
In a recent adaptation of his bestselling book about high-frequency trading, “Flash Boys: A Wall Street Revolt,” author Michael Lewis reports on a five-millisecond window in late 2013 during which General Electric (GE) experienced heavy bid and offer activity resulting in 44 trades. Just keep in mind, these are milliseconds. Human beings perceive one second as a very short amount of time. A computer sees one second as a long time during which thousands of things can (and do) happen. One minute of computing can produce tens of thousands of application flows, and an hour, millions.
The book recounts how, once a few curious traders realized that the mind-blowing speed at which trades were happening was costing them money, they developed a trading strategy that helped guarantee they could get the trades they wanted without having stock prices artificially inflated by high-frequency traders. While milliseconds may not matter outside the stock market, seconds do matter, and to more industries than you might think.
More than Just Investing
If you are streaming a movie, music or a video game and the content buffers for a few seconds, you’re going to be annoyed, right? If it happens for a few minutes, you’re going to shut it down altogether. HBO Go recently buckled during the April 7 Season Four premiere of “Game of Thrones,” leaving many anxious viewers in the dark as it tried to unclog the network.
Take a more staid industry, such as hotels. It’s spring break, and dozens of families are checking in at the same time. Suddenly, your reservation system stalls and people are waiting, impatiently. The line grows longer. Will those people come back next year? Perhaps not. We can only guess at what (if any) checks, balances and redundancies the ObamaCare website had when it launched. Real-time alerting on application and network status may have been part of the strategy, but the site and its underlying infrastructure were clearly not designed for Web scale.
For years, companies have operated under the presumption that monitoring system health every few minutes is adequate. In the days of client-server and, more recently, three-tier web applications, application models were relatively static and engineered for highly predictable workloads in terms of load, usage patterns and application functionality. Now applications deal with unpredictable loads and scale dynamically to meet them. The applications themselves are changing faster than ever before, as time to market becomes a critical component of business success.
Monitoring Time Frames
This means that today, monitoring performance metrics every minute or two is woefully inadequate, because you miss the data points that signal a brewing problem (see graph below). The bottom line: the longer it takes you to detect a problem, the larger its impact on your customers. You have to monitor more frequently to detect problems faster and prevent issues from affecting users. Waiting to address a problem after the fact is too late.
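The effect of sampling interval is easy to demonstrate. The sketch below uses a hypothetical latency metric with a 10-second spike: polling once a minute never sees the spike, while polling once a second catches it immediately.

```python
# A minimal sketch (synthetic metric, illustrative threshold) of why
# sampling interval matters for problem detection.

def metric(t):
    """Synthetic latency in ms: baseline 20 ms, with a spike to 500 ms
    lasting from t=15s to t=25s."""
    return 500 if 15 <= t < 25 else 20

def samples(interval, duration=60):
    """Poll the metric every `interval` seconds over `duration` seconds."""
    return [metric(t) for t in range(0, duration, interval)]

THRESHOLD = 100  # alert if latency exceeds 100 ms

coarse = samples(60)  # one data point per minute
fine = samples(1)     # one data point per second

print(any(v > THRESHOLD for v in coarse))  # False: the spike is missed entirely
print(any(v > THRESHOLD for v in fine))    # True: the spike is caught
```

The ten-second incident falls entirely between two one-minute polls, so at coarse resolution the system looks healthy the whole time.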
Operating differently, using second by second knowledge across a fault-tolerant, open-source architecture is sometimes called “Web Scale” IT. Gartner describes it as “a pattern of global-class computing that delivers the capabilities of large cloud service providers within an enterprise IT setting.” The research firm predicts that by 2017, Web Scale IT will be an architectural approach found in 50 percent of global enterprises, up from less than 10 percent in 2013.
Organizations that focus on the tenets of Web Scale IT (open architecture, full automation, rigorous testing and continual monitoring of their service) not only prevent problems but can also influence buying decisions faster and grow sales, move markets or even help elect a president. President Obama’s 2012 reelection campaign staff made daily decisions affecting fundraising, messaging and many other strategies, relying on the ability to crunch massive data sets around the clock. The campaign’s website, databases and overall technology infrastructure were hosted on Amazon and incorporated a slew of open-source technologies. You can bet that frequent testing and modern operations management tools were used to keep the machine running smoothly.
Web Scale IT Requires Rethinking How We Run IT
Here are the core methods that progressive companies are using to build a Web-Scale IT operation:
1. Architect for horizontal scalability. The high-volume, web-based architecture is built on peer-to-peer relationships, not a single control point. It provides high tolerance for failure, because its many components work with each other yet remain completely independent of each other. If one node goes down, the whole system isn’t brought to its knees; it dynamically adjusts to accommodate the change. Open-source software models are intrinsic to this architecture.
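As a minimal sketch of that failure tolerance (node names and routing scheme are hypothetical), a pool of peer nodes can route requests by hashing the request ID; when one node is marked unhealthy, the pool simply routes around it with no single control point involved.

```python
# A minimal sketch (hypothetical node names) of routing around a failed
# node in a peer pool: losing one node does not take the system down.

import hashlib

class NodePool:
    def __init__(self, nodes):
        self.healthy = set(nodes)

    def mark_failed(self, node):
        """Remove an unhealthy node from rotation."""
        self.healthy.discard(node)

    def route(self, request_id):
        """Pick a healthy node deterministically by hashing the request ID."""
        nodes = sorted(self.healthy)
        h = int(hashlib.md5(request_id.encode()).hexdigest(), 16)
        return nodes[h % len(nodes)]

pool = NodePool(["node-a", "node-b", "node-c"])
before = pool.route("order-42")
pool.mark_failed(before)        # the serving node goes down...
after = pool.route("order-42")  # ...and the request is re-routed
print(after != before)          # True: the pool adjusts to the change
```

Production systems typically use consistent hashing to minimize reshuffling when nodes come and go, but the principle is the same: no node is indispensable.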
2. Automate everything. IT automation and change management tools like Puppet and Chef are popular these days because they help companies deploy tens of thousands of instances in a few seconds. That’s not a task anybody can do manually, and if you are a Web Scale business, automation is your friend. Such tools can auto-scale components based on load, so that while you sleep, the system scales up and down as needed to maintain performance levels. These corrective actions are based on scripted, well-defined processes, and they require detailed upfront work and ongoing maintenance. Yet automating all facets of operations is what makes Web Scale IT possible.
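The scripted, well-defined scaling logic described above can be sketched as a simple policy function (thresholds and bounds here are illustrative, not any particular tool's defaults):

```python
# A minimal sketch (illustrative thresholds) of an auto-scaling policy:
# scale out when average load is high, scale in when it is low, within
# fixed bounds.

def desired_instances(current, avg_cpu, min_n=2, max_n=20):
    """Return the target instance count for the observed average CPU %."""
    if avg_cpu > 80:                      # overloaded: add capacity
        return min(current * 2, max_n)
    if avg_cpu < 20 and current > min_n:  # idle: shed capacity
        return max(current // 2, min_n)
    return current                        # within the comfort band

print(desired_instances(4, 95))   # 8 -> scale out under load
print(desired_instances(8, 10))   # 4 -> scale in overnight
print(desired_instances(4, 50))   # 4 -> steady state, no change
```

In practice this decision runs on every monitoring interval, which is exactly why the measurement frequency discussed earlier matters: the policy can only react to the data points it actually sees.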
3. DevOps all the way. DevOps enables rapid release cycles and that is part of the Web Scale attribute set. Without a well-managed DevOps culture enabled by modern, collaborative tools, Web Scale isn’t possible.
4. Use the cloud. Web Scale infrastructure is viable thanks to cloud computing. The cloud, with its global reach and elasticity, is the ideal place to house a Web Scale application. You need a provider that can let you spin up hundreds or thousands of instances to accommodate spikes in demand. You also want to make sure you can spread your instances across multiple availability zones so problems in one geographic region don’t take down your whole site. Amazon Web Services (AWS) is the preferred cloud platform for many organizations today and is a logical place to start.
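The multi-zone placement idea can be sketched in a few lines (zone names are hypothetical): spread instances round-robin across zones, and the loss of any single zone leaves most of your capacity intact.

```python
# A minimal sketch (hypothetical zone names) of spreading instances
# across availability zones so one zone failure can't take down the site.

from collections import Counter

ZONES = ["us-east-1a", "us-east-1b", "us-east-1c"]

def place(instance_count, zones=ZONES):
    """Assign each instance to a zone round-robin, keeping zones balanced."""
    return [zones[i % len(zones)] for i in range(instance_count)]

placement = place(9)
print(Counter(placement))  # 3 instances in each zone

# If one zone fails, two-thirds of capacity survives elsewhere:
surviving = [z for z in placement if z != "us-east-1a"]
print(len(surviving))  # 6
```

Real deployments would pair this with health checks and a load balancer that stops sending traffic to the failed zone, but the capacity math is the point: never put everything in one place.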
It won’t be long before every business will need to operate at Web scale, just to stay alive. Being prepared begins by having conversations with your team to assess just how far along you are in the journey. With more and more business being done online and consumer expectations growing all the time for speed, seconds are fast becoming the lifeblood of IT.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena. See our guidelines and submission process for information on participating. View previously published Industry Perspectives in our Knowledge Library.