Algolia gives a website its own Google-like search engine, with millisecond-speed responses and search-as-you-type capabilities.
It has plenty of competitors, including giants like Microsoft Azure and Amazon Web Services as well as a multitude of startups, plus the open source Elasticsearch project. Competition is tough, and after the quality of the results, speed is perhaps the most important factor that makes a search product stand out.
And speed is what brought Algolia, which has offices in Paris and San Francisco, to San Jose, California, where it recently took some space at Equinix’s SV3 data center through the hosting company LeaseWeb. This is the startup’s first data center on the West Coast. Its servers also live on the East Coast, as well as in Europe and Asia.
“We have an important user base [in California] and we want to reduce latency of search as much as possible,” Julien Lemoine, Algolia CTO and co-founder, explained.
Algolia’s entire hardware and software stack is optimized to keep response times within a few milliseconds. Once that is done, the only way to make search even faster is to get physically closer to the users.
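Why distance matters can be shown with a back-of-envelope calculation. The sketch below assumes light in optical fiber covers roughly 200 km per millisecond and ignores routing detours, queuing and server time, so it gives only a lower bound on round-trip time. The distances used are illustrative, not measured routes to any Algolia site.

```python
# Light in optical fiber travels at about two-thirds of its vacuum speed,
# i.e. roughly 200 km per millisecond (an approximation).
FIBER_KM_PER_MS = 200.0


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on network round-trip time from propagation delay alone."""
    # The request travels out and the response travels back.
    return 2 * distance_km / FIBER_KM_PER_MS


# Illustrative distances only: a cross-continent hop vs. a nearby data center.
for km in (4000, 50):
    print(f"{km:>5} km -> at least {min_round_trip_ms(km):.1f} ms round trip")
```

At 4,000 km the physics alone costs about 40 ms per round trip, far more than a search engine answering in a few milliseconds, which is why moving servers closer to users is the remaining lever.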
LeaseWeb also hosts the company’s infrastructure on the East Coast and in Asia; in Europe, Algolia uses OVH. The company currently occupies only half a rack in San Jose but is scaling up weekly, according to Lemoine. “We already have several racks in US-East,” he wrote in an email.
The company has an interesting way of scaling its infrastructure. It never adds capacity in a single location; instead, it always deploys servers in sets of three, spread across three different availability zones. The three servers are perfect clones of each other.
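The placement rule can be expressed as a small check. This is a hypothetical sketch, not Algolia’s tooling: the `Host` type, the `provision_cluster` function and the zone names are invented for illustration. It only encodes the rule described above: every cluster gets exactly three identical hosts, each in a distinct availability zone.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Host:
    """Hypothetical host record: a name plus the availability zone it lives in."""
    name: str
    zone: str


def provision_cluster(cluster_id: str, zones: List[str]) -> List[Host]:
    """Return three cloned hosts, one per distinct availability zone.

    Hypothetical helper: capacity is never added as a single machine in a
    single zone, so fewer than three distinct zones is treated as an error.
    """
    distinct = list(dict.fromkeys(zones))  # drop duplicate zones, keep order
    if len(distinct) < 3:
        raise ValueError("need at least three distinct availability zones")
    # Each host gets the same role; only its name and zone differ.
    return [Host(f"{cluster_id}-{i + 1}", zone) for i, zone in enumerate(distinct[:3])]


hosts = provision_cluster("us-west-1", ["zone-a", "zone-b", "zone-b", "zone-c"])
print([h.zone for h in hosts])
```

Because the three hosts hold the same data, losing any single zone still leaves two full copies serving queries.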
One Algolia bare-metal host server has:
- Intel Xeon E5-2643 v2 processor
- 128 GB of RAM
- Three high-endurance 400 GB SSDs (1.2 TB per host)
- 1 Gbps connectivity in and out
Algolia developed its own protocol that replicates data across the hosts and keeps them synchronized using the Raft consensus algorithm. Raft is also used by etcd, a key-value store for keeping the servers in a cluster synchronized. Built by CoreOS, a hot San Francisco web-scale infrastructure startup, the open source etcd is part of Google’s Kubernetes, its open source Docker container management software, and of Pivotal’s open source Platform-as-a-Service system, Cloud Foundry.
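Algolia’s own protocol is not public, so the sketch below shows only the generic Raft rule for when a leader may treat a log entry as committed, applied to a three-node cluster like Algolia’s. An entry is committed once a majority of nodes (two of three) store it, and Raft only counts replicas for entries from the leader’s current term. Function and parameter names are illustrative.

```python
from typing import List


def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a quorum."""
    return cluster_size // 2 + 1


def advance_commit_index(match_index: List[int], log_terms: List[int],
                         current_term: int, commit_index: int) -> int:
    """Return the leader's new commit index under Raft's commit rule.

    match_index: highest log index known to be stored on each node,
                 including the leader itself (1-based, 0 means none).
    log_terms:   log_terms[i - 1] is the term of the entry at index i.
    """
    quorum = majority(len(match_index))
    new_commit = commit_index
    for n in range(commit_index + 1, len(log_terms) + 1):
        replicated = sum(1 for m in match_index if m >= n)
        # Only entries from the current term are committed by counting replicas;
        # committing entry n implicitly commits every earlier entry as well.
        if replicated >= quorum and log_terms[n - 1] == current_term:
            new_commit = n
    return new_commit


# Leader and one follower hold entries 1-5; the third node has only 1-3.
print(advance_commit_index([5, 5, 3], [1, 1, 2, 2, 2], current_term=2, commit_index=2))
```

With three replicas, any single slow or failed host does not block writes, since the other two still form a majority.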
All customer data is stored in memory and replicated on all three hosts, and each customer also has SSD capacity reserved for re-indexing equal to 10 times the size of its data.
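Combining this rule with the per-host specs above gives a rough ceiling on how much customer data one host can carry. This is a back-of-envelope sketch, not a figure Algolia has published: it assumes decimal gigabytes, that all RAM and all SSD space are available to customer data, and it ignores operating system and index overhead. The function names are hypothetical.

```python
RAM_GB = 128            # per host, from the spec list
SSD_GB = 3 * 400        # three 400 GB SSDs = 1,200 GB per host
REINDEX_FACTOR = 10     # SSD reserved per customer, as a multiple of its data size


def max_data_per_host_gb(ram_gb: float = RAM_GB, ssd_gb: float = SSD_GB,
                         factor: float = REINDEX_FACTOR) -> float:
    """Largest total customer data a host can hold under both constraints."""
    # Data must fit in memory, and `factor` times the data must fit on SSD.
    return min(ram_gb, ssd_gb / factor)


def fits_on_host(customer_sizes_gb, ram_gb: float = RAM_GB,
                 ssd_gb: float = SSD_GB, factor: float = REINDEX_FACTOR) -> bool:
    """Hypothetical admission check for a set of customers on one host."""
    total = sum(customer_sizes_gb)
    return total <= ram_gb and total * factor <= ssd_gb


print(max_data_per_host_gb())  # the SSD reservation, not RAM, is the tighter limit
```

Since the three hosts in a cluster are clones, the same ceiling applies to the cluster as a whole rather than tripling it.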
Algolia’s secret sauce, its search engine, runs as a C++ module inside an nginx (pronounced “engine-x”) web server. “All the backend technology was developed from the ground up with performance in mind,” Lemoine wrote.