Traditionally, data ingress and egress challenges have been the Achilles’ heel of cloud-based data analytics. Google says it is solving that issue through a new offering, BigQuery Omni, that will make it possible to analyze data across multi-cloud and hybrid cloud infrastructure without having to move the data first.
Google announced BigQuery Omni on July 14, pitching it as “a flexible, multi-cloud analytics solution that lets you cost-effectively access and securely analyze data across Google Cloud, Amazon Web Services (AWS), and Azure (coming soon).”
In other words, the tool extends Google’s BigQuery data warehouse platform by adding support for data that is stored in clouds other than Google’s.
Analyze Multi-Cloud Data
It’s easy to see the on-paper value of this type of offering: Traditionally, if you had a multi-cloud architecture that included data stored on each cloud, and you wanted to analyze that data comprehensively, you had two less-than-ideal options.
One was to run separate analytics operations on each cloud. That would require you to use multiple data analytics tools, one for each cloud. It would also leave your data siloed, making it difficult to identify trends that stretched across all datasets on all clouds.
The other option was to move all of your data into just one cloud, then analyze it there. The drawback here is that moving massive amounts of data could take a long time, given the bandwidth limitations of transferring data from one cloud to another over the internet. In addition, because public cloud providers typically charge fees whenever you move data out of their clouds, consolidating multi-cloud data into a single location is not particularly cost-efficient.
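To put rough numbers on that tradeoff, here is a minimal back-of-the-envelope sketch. Every input in it (a 100 TB dataset, a dedicated 1 Gbps link, a flat egress rate of $0.09 per GB) is a hypothetical assumption chosen for illustration, not a figure from Google or any cloud provider; real egress pricing varies by provider, region, and volume tier.

```python
# Back-of-the-envelope estimate of the time and money needed to consolidate
# data from one cloud into another over the internet.
# All inputs are hypothetical; real egress prices vary by provider,
# region, and volume tier.

SECONDS_PER_DAY = 86_400


def transfer_days(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move `data_tb` terabytes (10**12 bytes) over a link of
    `link_gbps` gigabits per second, running at `efficiency` of line rate."""
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be positive and efficiency in (0, 1]")
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / SECONDS_PER_DAY


def egress_cost(data_tb: float, price_per_gb: float) -> float:
    """Egress fee for `data_tb` terabytes at a flat, hypothetical per-GB rate."""
    return data_tb * 1_000 * price_per_gb


if __name__ == "__main__":
    data_tb = 100          # hypothetical dataset size, in TB
    link_gbps = 1          # hypothetical dedicated bandwidth, in Gbps
    price_per_gb = 0.09    # hypothetical flat egress rate, in USD per GB

    print(f"~{transfer_days(data_tb, link_gbps):.1f} days at full line rate")
    print(f"~{transfer_days(data_tb, link_gbps, efficiency=0.7):.1f} days at 70% efficiency")
    print(f"~${egress_cost(data_tb, price_per_gb):,.0f} in egress fees")
```

With these made-up inputs, the script prints roughly 9.3 days at full line rate, 13.2 days at 70 percent efficiency, and $9,000 in egress fees, and both figures scale linearly with the size of the dataset.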
BigQuery Omni offers a third approach: You can use a single platform -- BigQuery -- to analyze data in multiple public clouds at once. You don’t need to move the data first, or deploy different analytics tools for each cloud.
BigQuery Omni Limitations
That said, some data scientists, developers, and cloud admins might see limitations in the Omni offering.
For one, it only works with BigQuery. If you prefer other data warehousing platforms, like Amazon Redshift, you’re out of luck. In this sense, although BigQuery Omni might seem to be a move by Google to become friendlier toward other providers’ clouds, you could also interpret it as an effort to steal market share from competitors’ cloud-based data platforms. The story would be different if Google were making its own cloud storage compatible with third-party data warehousing tools, but it’s not.
I also wonder how many organizations have significant amounts of data distributed across multiple clouds. Typically, if you build a multi-cloud architecture, you use one cloud for one type of service (like data storage) and another cloud for a different one (like compute). Storing some data in one cloud and the rest in another complicates management and makes security and compliance more difficult, because you have to manage these needs separately for each cloud. For that reason, it would be rare for a cloud architect to say, “Hey, let’s store a third of our business-critical data in AWS, another third in Azure, and the rest in GCP.”
Granted, there are some organizations out there whose data is siloed across multiple clouds due to legacy systems that make it hard to centralize data in one cloud, or simply as a result of poor architectural planning. But overall, I suspect the number of companies with a clear use case for BigQuery Omni in its current form is limited.
Public Cloud Data Warehousing for Private Data Centers?
There is, however, another potential use case for BigQuery Omni that is more interesting than analyzing data across multiple public clouds: extending public-cloud data warehousing into private data centers.
BigQuery Omni is powered by Anthos, Google’s Kubernetes-based solution for unifying workloads that are spread across multiple clouds or a hybrid cloud architecture. With Anthos, you can use the same management interface and tooling to deploy a workload even if the underlying infrastructure spans a private data center and a public cloud, or more than one public cloud.
It’s easy to see how BigQuery Omni and Anthos go hand-in-hand: Anthos provides the abstraction layer that Omni uses to manage data stored on any public cloud.
But because Anthos can also integrate with private data centers, it could make it possible to use Omni in conjunction with data stored in a private data center, or with data hosted in a colocation facility. Although Google’s announcement didn’t mention this type of use case, there’s no clear reason why it cannot be supported as well, given Anthos’s support for hybrid cloud architectures.
And there would seem to be more logic behind using the tool for this purpose. Right now, a major limitation of BigQuery and similar cloud-based data warehousing solutions from public cloud vendors is that they require your data to be in the public cloud. If you want to keep your data in a private data center -- which you may well have a good security-, compliance-, or performance-related reason to do -- you can’t use a tool like BigQuery to interact with it. If Omni extends support to private data centers through a hybrid cloud model, however, that would change.
Companies with data inside private data centers that would find a solution like this useful probably far outnumber organizations with large volumes of data spread across multiple public clouds.
For now, this isn’t a use case Omni enables. The tool “does not yet support on-premises (and therefore hybrid) data sources,” a Google Cloud representative told Data Center Knowledge in an email. BigQuery Omni users will have to settle for integrating just their data stored on GCP and AWS, given that those are the only clouds that the tool supports currently. (Azure support remains in development, presumably because Anthos itself does not yet offer full support for Azure.)
But perhaps sometime in the relatively near future, you’ll be using BigQuery to help analyze data in your private or colocation data center, too.