Twitter is expanding its use of Google Cloud, moving more of its computing infrastructure from its own data centers to the cloud platform and using more of Google’s data tools to empower more of its staff to innovate.
Google announced Thursday that the two are taking their years-long relationship to a new level.
"As Twitter continues to scale, we're excited to partner with Google on more industry-leading technology innovation in the data and machine learning space," Twitter CTO Parag Agrawal said in a statement.
Twitter is one of the most prominent examples of the hybrid multi-cloud approach to technical infrastructure, which is gaining in popularity. It’s keeping a portion of its infrastructure on premises while using different cloud providers for the specific capabilities it can benefit from the most.
Last year the social network signed a cloud deal with AWS to use Amazon’s cloud infrastructure for serving its timelines to users around the world. The point of that deal was to improve performance for users by serving the timelines from data centers closer to where more people live, Agrawal said at the time.
In the latest Google Cloud deal, Twitter will be moving its “offline analytics, data processing, and machine learning workloads” to Google’s cloud. It will adopt Google’s data tools including BigQuery, Dataflow, Cloud Bigtable, and machine learning tools (the announcement didn’t specify which).
The point is to give Twitter employees who aren’t data scientists or engineers the ability to query its vast trove of data using SQL. Such analysis has traditionally been done by technical staff who developed large custom processing jobs, the company said.
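The difference between a custom processing job and a SQL query can be sketched in a few lines. The example below is only an illustration, not code from Twitter or Google: it uses Python's built-in sqlite3 module as a stand-in for a data warehouse such as BigQuery, and the events table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical engagement events (action, country); illustrative data only.
events = [
    ("like", "US"),
    ("retweet", "US"),
    ("like", "JP"),
    ("like", "US"),
    ("block", "JP"),
]


def count_likes_custom(rows):
    """Hand-written processing job: scan every record and tally likes per country."""
    counts = {}
    for action, country in rows:
        if action == "like":
            counts[country] = counts.get(country, 0) + 1
    return counts


def count_likes_sql(rows):
    """The same question asked declaratively in SQL (sqlite3 as a stand-in engine)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (action TEXT, country TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT country, COUNT(*) FROM events "
        "WHERE action = 'like' GROUP BY country"
    ).fetchall()
    conn.close()
    return dict(result)
```

Both functions answer the same question, but the SQL version states what is wanted rather than how to compute it, which is what makes this kind of query approachable for staff who don't write processing jobs.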
Twitter engineers looked into moving all of the company’s infrastructure to the cloud about five years ago and decided that such a move would be too disruptive. So they agreed on a more targeted approach, moving only the parts that would benefit the most from cloud capabilities.
In a deal announced in 2018, Twitter moved two of the four types of Hadoop clusters it operates to Google Cloud: cold storage clusters and “ad-hoc” clusters, ones that run occasional one-off analytics jobs, Joep Rottinghuis, a senior engineering manager at Twitter, wrote in a blog post.
Now, Twitter is moving a third category of Hadoop clusters, its “processing clusters,” which run regularly scheduled production jobs and have dedicated capacity.
That leaves only the fourth category of Twitter’s Hadoop clusters running in the company’s own (actually leased) data centers. These are “real-time clusters,” which are the first point of arrival for data generated by users as they tweet, re-tweet, comment, like, share, and block each other.