Hulu surpassed the 6-million-paid-subscriber mark last April. The service is now accessible on 400 million internet-connected devices, ranging from new consoles and Chromecast to set-top boxes such as Amazon’s Fire TV. Andres Rangel, the company's senior software development lead, spoke about the intricacies of scaling the infrastructure to support this growth in the user base.
The application has to support many users generating large volumes of data, and it has to maintain sessions for those users across devices. In a three-way competition between Cassandra, HBase and Riak, Hulu turned to Cassandra as the database of choice to ensure a high-quality user experience.
Hulu depends on the open source distributed database to store and provide real-time access to subscriber watch history. Cassandra lets that service scale as demand grows while staying available.
“Two years ago, we decided to re-write our service,” Rangel said. The challenge was to make the service scale to keep up with user growth. The experience has to follow a user from device to device, saving the previous session and resuming a show where the user left off.
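The cross-device behavior described above can be pictured as a small progress store keyed by user and video. What follows is a minimal in-memory sketch, not Hulu's implementation: the names `Progress`, `ProgressStore`, `position_s` and the device labels are hypothetical, and the rule that the newest timestamp wins is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Progress:
    position_s: int     # playback position in seconds
    device: str         # device that reported the position (hypothetical label)
    updated_at: float   # timestamp supplied with the update, in seconds


class ProgressStore:
    """Hypothetical in-memory stand-in for a watch-history table."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], Progress] = {}

    def save(self, user_id: str, video_id: str, progress: Progress) -> None:
        key = (user_id, video_id)
        current = self._rows.get(key)
        # Assumption: the update with the newest timestamp wins,
        # so a delayed write from another device is ignored.
        if current is None or progress.updated_at >= current.updated_at:
            self._rows[key] = progress

    def resume_position(self, user_id: str, video_id: str) -> int:
        """Return where playback should start; 0 if the video was never watched."""
        row = self._rows.get((user_id, video_id))
        return row.position_s if row is not None else 0
```

In this sketch, a viewer who stops at the ten-minute mark on a console and later opens the same video on a phone would get `resume_position` returning 600, regardless of which device asks.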
“The problem was we couldn’t scale the writes,” he said. “The boxes couldn’t keep up with the growing database demands, and it wasn’t easy to increase the hardware without a lot of effort. It wouldn’t scale.” Rangel had a small team of people, and they needed something that would scale, but wasn’t overly complicated to manage.
Hulu's primary Cassandra cluster consists of 32 nodes split between two data centers, one on the east coast and the other on the west coast. Its watch history keyspace contains several billion CQL3 rows with approximately 1TB of unreplicated data per data center.
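To make the two-data-center layout concrete, here is a rough sketch that builds a CQL `CREATE KEYSPACE` statement using `NetworkTopologyStrategy` and estimates per-node storage. Several values are assumptions that the article does not state: the keyspace name `watch_history`, the data center names `dc_east` and `dc_west`, a replication factor of 3 in each data center, and an even split of 16 nodes per data center.

```python
from typing import Dict


def create_keyspace_cql(keyspace: str, replicas_per_dc: Dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement that replicates per data center."""
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items()]
    options = ", ".join(parts)
    return "CREATE KEYSPACE IF NOT EXISTS %s WITH replication = {%s};" % (
        keyspace,
        options,
    )


def replicated_gb_per_node(
    unreplicated_gb: float, replication_factor: int, nodes_in_dc: int
) -> float:
    """Average data per node once every row is stored replication_factor times."""
    return unreplicated_gb * replication_factor / nodes_in_dc


# Assumed names and replication factors, for illustration only.
statement = create_keyspace_cql("watch_history", {"dc_east": 3, "dc_west": 3})

# About 1TB (taken as 1000 GB) unreplicated per data center, from the article;
# the replication factor and the 16-node split are assumptions.
per_node_gb = replicated_gb_per_node(1000, 3, 16)
```

Under those assumptions each node would hold roughly 187.5 GB of watch history on average; a different replication factor or node split changes the figure proportionally.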
The cluster is queried every time someone watches a video or gets a recommendation from Hulu. It is also used to save sessions across devices.
Path to Cassandra
“We looked at HBase and Riak at first,” said Rangel. “Cassandra was an afterthought.”
Rangel says his experience with Riak was that it was able to scale, but performance wasn’t as good as with Cassandra. “We also needed it to be able to do range queries, and Riak didn’t do this at the time. Another problem was that no one on the team was experienced with Erlang [the language Riak is written in].” Rangel and his team exemplify a trend among modern IT teams, where a handful of people carry immense responsibilities at massive scale.
While Riak was powerful, it didn’t quite fit Hulu’s needs: it wasn’t easy enough to use and didn’t meet the team's real-time requirements.
The team then looked to HBase as the front-runner. “Hadoop instances take a lot of work to set up,” said Rangel. “HBase runs on top of HDFS, and the NameNode was a single point of failure. Also it was more complex to set up and maintain than Cassandra.” There were also concerns about how HBase handles failures (e.g., the team saw cascading failures take down all region servers).
When they turned to Cassandra, they found a match. “With Cassandra, it managed to handle the load, it’s very reliable, it allows range queries without limitations, and it’s easy to maintain,” said Rangel. “It’s night and day compared to HBase.” The team had to make some hardware changes because Cassandra's hardware requirements are different. Cassandra is optimized for SSDs, which improved performance. Rangel also said that Cassandra was better at replication.
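The range queries Rangel mentions work well in Cassandra because rows inside a partition are kept sorted by their clustering column, so a range becomes a contiguous slice. The sketch below is a simplified in-memory model of that idea, not Cassandra's storage engine; the `Partition` class and the use of an integer timestamp as the clustering key are hypothetical choices for illustration.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


class Partition:
    """Rows for one partition key, kept sorted by clustering key."""

    def __init__(self) -> None:
        self._keys: List[int] = []     # clustering keys, always sorted
        self._values: List[str] = []   # values aligned with self._keys

    def insert(self, clustering_key: int, value: str) -> None:
        i = bisect_left(self._keys, clustering_key)
        if i < len(self._keys) and self._keys[i] == clustering_key:
            self._values[i] = value    # same key: overwrite the row
        else:
            self._keys.insert(i, clustering_key)
            self._values.insert(i, value)

    def range(self, start: int, end: int) -> List[Tuple[int, str]]:
        """Return rows with start <= clustering key <= end, in key order."""
        lo = bisect_left(self._keys, start)
        hi = bisect_right(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))
```

For example, if one user's watch events are inserted with timestamps as clustering keys, `range(start, end)` locates both ends with a binary search and returns everything between them without scanning the rest of the partition.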
Hulu does use a Hadoop cluster for long-term storage, but Cassandra was the right choice when it came to maintaining real-time access. “We played around with the new version of HBase for high availability,” said Rangel. “The problem is you still need a lot of babysitting. If you have an already existing Hadoop cluster, then HBase makes sense.”
There are now four other services running on Cassandra besides real-time access across devices. They handle users' social data, some messaging, and a remote-control feature that lets a cell phone send commands to a connected device.
“If there are two things I’d like to say about Cassandra, it’s that we haven’t had any bad experiences with it, and it’s been better than expected,” said Rangel.