How Does Google Decide Where to Build Its Data Centers?
Google chooses the locations of its data centers based on a combination of factors that include customer location, available workforce, proximity to transmission infrastructure, tax rebates, utility rates and other related factors. Its recent focus on expanding its cloud infrastructure has added more considerations, such as enterprise cloud customer demand for certain locations and proximity to high-density population centers.
The choice of St. Ghislain, Belgium for a data center (which opened in 2010) was based on the combination of energy infrastructure, developable land, strong local support for high-tech jobs and the presence of a technology cluster of businesses that actively supports technology education in the nearby schools and universities.
A positive business climate is another factor. That, coupled with available land and power, made Oklahoma particularly attractive, according to Google’s senior director of operations when the Pryor Creek site was announced. In Oregon, the positive business environment means locating in a state that has no sales tax. Local Wasco County commissioners also exempted Google from most of its property taxes while requiring it to make a one-time payment of $1.7 million to local governments and payments of at least $1 million each year afterward.
Proximity to renewable energy sources is becoming increasingly important, too. Google is strategically invested in renewable resources and considers its environmental footprint when siting new data centers.
Do Google Data Centers Use Renewable Energy?
Google buys more renewable energy than any other corporation in the world. In 2016 it bought enough renewable energy to account for more than half its energy usage. In 2017 the company expects to offset all of its energy usage with renewable energy. To do that, Google has signed 20 purchase agreements for 2.6 gigawatts (GW) of renewable energy. This means that, while renewable energy may not be available everywhere or in the quantities Google needs, Google purchases the same amount of renewable energy as it consumes.
Google also has committed $2.5 billion in equity funding to develop solar and wind energy that can be added to the power grid throughout the world. That willingness to fund renewable projects is an attempt to gradually expand the renewable energy market, both in terms of availability and by changing the ways renewable energy can be purchased. In the process, using renewable sources becomes easier and more cost effective for everyone.
Sustainability is a focus inside data centers, too. The St. Ghislain, Belgium, data centers were Google’s first to rely entirely on free cooling. That facility’s on-site water purification plant also allows the data centers there to recycle water from an industrial canal rather than tapping the region’s fresh water supply.
How Much Energy Do Google Data Centers Use?
Data center energy use represents a sizeable chunk of the 5.7 terawatt hours its parent company, Alphabet, used in 2015. With an average PUE of 1.12 (versus the industry average of 1.7), Google says its data centers use half the energy of a typical data center. A growing portion of this is renewable, supplied through power purchase agreements.
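To make the PUE figures concrete: power usage effectiveness is total facility energy divided by the energy delivered to IT equipment, so a PUE of 1.12 means 0.12 units of overhead (cooling, power distribution and so on) for every unit of IT energy. The sketch below works through that arithmetic in Python. The 1,000 kWh IT load is a hypothetical figure chosen only for illustration, and the function names are ours, not Google's.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def facility_energy(it_equipment_kwh: float, pue_value: float) -> float:
    """Total facility energy implied by a given IT load and PUE."""
    return it_equipment_kwh * pue_value


# Hypothetical IT load, chosen only to keep the arithmetic easy to follow.
IT_LOAD_KWH = 1000.0

google_total = facility_energy(IT_LOAD_KWH, 1.12)   # PUE quoted in the article
industry_total = facility_energy(IT_LOAD_KWH, 1.7)  # industry average quoted in the article

overhead_google = google_total - IT_LOAD_KWH
overhead_industry = industry_total - IT_LOAD_KWH

# Fraction of total facility energy saved for the same IT load.
saving_fraction = 1 - google_total / industry_total

print(f"Overhead at PUE 1.12: {overhead_google:.0f} kWh")
print(f"Overhead at PUE 1.7:  {overhead_industry:.0f} kWh")
print(f"Total energy saved for the same IT load: {saving_fraction:.1%}")
```

For the same IT load, the PUE difference alone works out to roughly a one-third reduction in total facility energy, so the "half the energy" figure presumably also reflects efficiency gains that PUE does not capture, such as more efficient servers.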
What Kind of Hardware and Software Does Google Use in Its Data Centers?
It’s no secret that Google has built its own Internet infrastructure since 2004 from commodity components, resulting in nimble, software-defined data centers. The resulting hierarchical mesh design is standard across all its data centers.
The hardware is dominated by Google-designed custom servers and Jupiter, the switching fabric Google introduced in 2012. With its economies of scale, Google contracts directly with manufacturers to get the best deals.
Google’s servers and networking software run a hardened version of the Linux open source operating system. Individual programs have been written in-house. They include, to the best of our knowledge:
- Google Web Server (GWS) – custom Linux-based Web server that Google uses for its online services
- Storage systems:
  - Colossus – the cluster-level file system that replaced the Google File System
  - BigTable – a high performance NoSQL database service for large analytical and operational workloads
  - Spanner – a globally-distributed NewSQL database
  - Google F1 – a distributed, relational database that replaced MySQL
- Chubby lock service – provides coarse-grained locking and reliable, low-volume storage for loosely coupled distributed systems
- Programming languages – C++, Java and Python dominate
- Indexing/search systems:
  - Caffeine – a continuous indexing system launched in 2010 to replace TeraGoogle
  - Hummingbird – major search index algorithm introduced in 2013
- Borg – a cluster manager that runs hundreds of thousands of jobs from thousands of applications across multiple clusters on thousands of machines
Google also has developed several abstractions that it uses for storing most of its data:
- Protocol Buffers – a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more
- SSTable (Sorted Strings Table) – a persistent, ordered, immutable map from keys to values, where both keys and values are arbitrary byte strings. It is also used as one of the building blocks of BigTable
- RecordIO – a file defining IO interfaces compatible with Google’s IO specifications
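The SSTable description above (an ordered, immutable map from byte-string keys to byte-string values) can be illustrated with a minimal in-memory sketch. This is not Google's implementation: the class name `SSTableSketch` and its methods are hypothetical, and persistence to disk, block indexes and compression are left out entirely. It only shows why keeping keys sorted makes point lookups and range scans cheap.

```python
import bisect
from typing import Iterator, Mapping, Optional, Tuple


class SSTableSketch:
    """In-memory stand-in for a sorted, immutable bytes-to-bytes map.

    Hypothetical illustration only: a real SSTable is a file on disk.
    """

    def __init__(self, entries: Mapping[bytes, bytes]) -> None:
        # Sort once at build time; tuples keep the table immutable afterward.
        items = sorted(entries.items())
        self._keys = tuple(k for k, _ in items)
        self._values = tuple(v for _, v in items)

    def get(self, key: bytes) -> Optional[bytes]:
        """Binary-search for an exact key; return None if it is absent."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for i in range(lo, hi):
            yield self._keys[i], self._values[i]


table = SSTableSketch({b"row2": b"beta", b"row1": b"alpha", b"row3": b"gamma"})
print(table.get(b"row2"))
print(list(table.scan(b"row1", b"row3")))
```

Because the table never changes after construction, readers need no locking, and updates are handled by writing a new table rather than modifying an existing one.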
How Does Google Use Machine Learning in Its Data Centers?
Machine learning is integral to dealing with big data. As Ryan Den Rooijen, global capabilities lead, Insights & Innovation, said before the Big Data Innovation Summit in London (March 2017), “Most issues I have observed relate to how to make this data useful…to drive meaningful business impact.” In addition to using machine learning for products like Google Translate, Google uses its neural networks to predict the PUE of its data centers.
Google calculates PUE every 30 seconds, and continuously tracks IT load, external air temperature and the levels for mechanical and cooling equipment. This data lets Google engineers develop a predictive model that analyzes the complex interactions of many variables to uncover patterns that can be used to help improve energy management. For example, when Google took some servers offline for a few days, engineers used this model to adjust cooling to maintain energy efficiency and save money. The model is 99.6 percent accurate.
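The idea of fitting a model to logged readings and then predicting PUE for new conditions can be sketched in a few lines. Note the substitution: Google's model is a neural network over many variables, whereas this sketch swaps in ordinary least squares on a single input (outside air temperature) to keep the example self-contained. The readings are synthetic, made-up numbers for illustration, not Google data, and the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(slope: float, intercept: float, x: float) -> float:
    """Evaluate the fitted line at a new input value."""
    return slope * x + intercept


# Synthetic readings (NOT real measurements): outside air temperature in
# degrees Celsius and the PUE observed at that temperature.
outside_temp_c = [5.0, 10.0, 15.0, 20.0, 25.0]
observed_pue = [1.09, 1.10, 1.12, 1.13, 1.15]

slope, intercept = fit_line(outside_temp_c, observed_pue)
print(f"Predicted PUE at 18 C: {predict(slope, intercept, 18.0):.3f}")
```

A production model would take many more inputs, such as IT load and the state of the mechanical and cooling equipment mentioned above, and would be validated against held-out readings before being trusted for operational decisions.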
In July 2016, Google announced results from a test of an AI system by its British acquisition DeepMind. That system had reduced the energy consumption of its data center cooling units by as much as 40% and overall PUE overhead by 15%. The system predicts temperatures one hour in advance, allowing cooling to be adjusted in anticipation.
Does Google Lease Space in Other Companies’ Data Centers?
Yes. Google leases space from others when it makes sense. Not every Google data center has its name on the door. Instead, the company uses a variety of strategies to meet its data center needs. It leases space for caching sites, for example, and uses a mixed build-and-lease strategy for its global cloud data center rollout.