Microsoft-Powerset: All About the Data Centers?
June 30th, 2008 By: Rich Miller
Venture Beat is reporting that Microsoft may acquire the semantic search company Powerset for about $100 million. Both companies are mum about the reports, but a Microsoft-Powerset deal could solve the primary roadblock to real-world deployment of semantic search – the extraordinary amount of computing resources required to build a semantic index of the entire web.
One of Powerset’s biggest challenges is data center capacity: whether it can afford to buy or rent the computing power needed to apply its indexing technology to the entire World Wide Web. From its inception, Powerset has acknowledged that its “natural language” approach to contextual search produces an index that requires much more horsepower to compile than the keyword-driven indexes built by Google, Yahoo and Microsoft. Here’s an explanation from our conversation last year with Powerset co-founder Steve Newcomb (who has since left the company):
We capture each sentence rather than just the key words in that sentence, so our index size is many times larger than a keyword index. That has a couple of large impacts on the data center. To parse a single sentence is much more computationally expensive for us than for Google. Because of that, we have to create a massively parallel infrastructure on the processing side. We need a lot more compute power than any keyword search engine, so we need more powerful machines, and the cost of our data center is potentially a very significant cost.
Powerset has its own data center infrastructure, but turned to Amazon and its EC2 utility computing service to build its web index.
When Powerset began using Amazon EC2 in November 2006, founder and CEO Barney Pell noted that building traditional data center infrastructure would be “a significant barrier to seriously competing with companies like Google and Yahoo.” In mid-2007, Newcomb projected that using Amazon would allow Powerset to complete building its index by early 2008, after which the company would use its own data centers to maintain the index – which requires less computational horsepower than building it.
But when Powerset launched in May, it did so with a proof-of-concept index of Wikipedia, rather than the web-wide search originally envisioned by its founders.
Was this all Powerset was able to come up with after more than 18 months of indexing using Amazon EC2? The company says it still intends to eventually offer a semantic index of the entire web, but hasn’t indicated how much of that task has been completed (beyond Wikipedia). But the progress to date suggests that it will be difficult for Powerset to complete the task quickly on its own – hence the rumors that the company is seeking suitors.
With its enormous data center infrastructure, Microsoft has the resources to build a semantic index of the entire web using Powerset’s technology, and to do so faster than anyone else.