Engineers at SlideShare, the popular online service for hosting and sharing slide decks that LinkedIn acquired three years ago, have moved the application stack out of a managed hosting provider’s data center and into a LinkedIn data center – a project that took more than a year to finish.
Data center consolidation after one company buys another is never quick and easy, and companies usually wait a long time to start moving systems between facilities. In another recent example, Instagram moved its application stack from Amazon Web Services into Facebook’s own data centers last year – two years after the social networking giant bought it.
In addition to consolidating infrastructure, in both cases the acquired companies cited the opportunity to use the new parent company’s technological resources as a reason to move.
LinkedIn site reliability engineer Anatoly Shiroglazov described the migration process in detail in a blog post this week. “It was clear that to sustain growth and integrate the best parts of both products, SlideShare needed to move to LinkedIn data centers,” he wrote.
LinkedIn had a growing data center infrastructure and needed its systems to work across multiple sites, while SlideShare appears to have hosted its stack in a single location. The parent company also had a much larger engineering team that could make bigger investments in technology and had already built sophisticated search and analytics systems. It also had large reliability engineering and database administration teams.
LinkedIn’s data center needs are growing rapidly. The company recently made changes to its data center strategy, switching from retail colocation to wholesale facilities, where it is for the first time using a custom infrastructure design.
The social network’s storage and compute requirements grew 30 percent over the last 12 months. It currently uses 30 MW of data center capacity in the US and overseas, and is working to add more capacity in Oregon and Singapore.
The SlideShare team had to change a lot to adjust to the way LinkedIn’s infrastructure was set up. The parent company, for example, doesn’t allow all of its servers to have access to the internet; for security purposes, only servers in the demilitarized zone, or DMZ, have external access. A network DMZ acts as a buffer between a company’s internal network and the rest of the world.
All SlideShare hosts had access to the internet, and the company’s software development cycle depended heavily on this capability, Shiroglazov wrote.
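In a setup like the one described above, internal hosts that lose direct internet access typically have their outbound traffic funneled through a forward proxy running in the DMZ. The blog post doesn’t describe LinkedIn’s actual mechanism, so the following is only a minimal sketch of the general pattern; the host names, proxy address, and `egress_route` function are all illustrative, not LinkedIn’s.

```python
from typing import Optional

# Hypothetical egress policy: only DMZ hosts may reach the internet
# directly; every other host must go through a DMZ forward proxy.
DMZ_HOSTS = {"dmz-proxy-1", "dmz-proxy-2"}   # hosts allowed direct egress
DMZ_PROXY = "http://dmz-proxy-1:3128"        # squid-style forward proxy (assumed)

def egress_route(hostname: str) -> Optional[str]:
    """Return None if the host may talk to the internet directly
    (it sits in the DMZ), otherwise the proxy URL it must use."""
    return None if hostname in DMZ_HOSTS else DMZ_PROXY

print(egress_route("dmz-proxy-1"))  # direct egress: None
print(egress_route("app-42"))       # internal host: routed via the proxy
```

A migration like SlideShare’s would then mean rewriting anything in the build and deploy pipeline that fetched from the internet directly to respect such a policy.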
Another example is the operating system. LinkedIn was using a much more recent Red Hat distribution of Linux than SlideShare was, and a lot of SlideShare’s code had to be recertified on the new OS. The team also had to change its Puppet code for infrastructure management and database operation practices.
To make sure the migration didn’t bring the service down, the SlideShare team first deployed the new stack in the managed hosting data center it had been using and verified that it worked, then started diverting traffic to LinkedIn data centers – read traffic first, write traffic second.
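The staged cutover described above – reads move first while the old site remains the source of truth for writes, then writes follow once reads look healthy – can be sketched as a simple two-phase router. This is not SlideShare’s actual implementation; the `Router` class and backend names are hypothetical.

```python
class Router:
    """Toy two-phase traffic router: reads and writes can be cut over
    to a new backend independently, reads first."""

    def __init__(self) -> None:
        self.read_backend = "old-dc"
        self.write_backend = "old-dc"

    def cut_over_reads(self) -> None:
        # Phase 1: serve reads from the new site; writes still land on
        # the old site, which remains the source of truth.
        self.read_backend = "new-dc"

    def cut_over_writes(self) -> None:
        # Phase 2: once reads are stable, move writes too, after which
        # the old site can be decommissioned.
        self.write_backend = "new-dc"

    def route(self, op: str) -> str:
        return self.read_backend if op == "read" else self.write_backend

router = Router()
router.cut_over_reads()
print(router.route("read"), router.route("write"))  # new-dc old-dc
router.cut_over_writes()
print(router.route("write"))                        # new-dc
```

Splitting the cutover this way keeps a single authoritative write path at every step, so a problem during the read phase can be rolled back without reconciling divergent data.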
Timeline of the transition from SlideShare’s hosting data center to a LinkedIn facility over 14 days. (Source: LinkedIn Engineering blog)
Once the transition was completed, the company decommissioned its infrastructure in the managed hosting facility. The next step is to modernize its software stack even further: SlideShare is now working on breaking its monolithic application down into microservices, phasing out legacy components as they are replaced by LinkedIn equivalents, according to Shiroglazov.