Being able to quickly extract data from multiple sources for processing has become a crucial business capability. AirByte, a startup that's just about eight months old, is building open source solutions to make that extraction easier for enterprises. The company recently raised a $5.2 million seed round, with participation by major VC players, to go after the opportunity.
AirByte's goal is to take the pain out of building and maintaining the pipelines needed to carry data from sources such as data warehouses, data lakes, and databases to destinations including cloud data warehouses, like Amazon Redshift, Snowflake, and BigQuery, or to on-premises storage for local processing.
This is becoming increasingly important today, as data is being collected and stored at every retail branch, factory, telco central office, cell tower, and so on. Much of that siloed data needs to be moved to an on-premises data center or a centralized cloud data center to be exploited through AI analytics or made available to accountants, human resources departments, or marketing organizations.
These pipelines are centered on connectors, which are pieces of software written to extract data from a source or to load it into a target destination. The connectors are specialized according to factors such as device type or workload, and if any of those factors changes significantly, the connector has to be rewritten or otherwise updated. That makes maintaining these pipelines very costly.
The pipeline process is known as "extract, transform, load" or "extract, load, transform," usually abbreviated as "ETL" or "ELT," depending on the order in which the data is moved and processed.
"We're building an open source data integration platform, focusing mostly on the EL part," John Lafleur, AirByte's co-founder and COO, told DCK. "We're helping you replicate data from any source, should it be APIs, databases, anything, to your data warehouse or databases as well."
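To make the "EL part" concrete, here is a minimal sketch of what a connector-based replication step could look like. It is purely illustrative and is not AirByte's actual API or specification: the names SourceConnector, DestinationConnector, InMemorySource, InMemoryDestination, and replicate are all hypothetical, and in-memory lists stand in for a real API, database, or warehouse.

```python
from typing import Dict, Iterable, Iterator, List, Protocol

# A record is modeled as a plain dictionary (an assumption for this sketch).
Record = Dict[str, object]


class SourceConnector(Protocol):
    """Hypothetical interface: the 'extract' side of a pipeline."""

    def read(self) -> Iterator[Record]: ...


class DestinationConnector(Protocol):
    """Hypothetical interface: the 'load' side of a pipeline."""

    def write(self, records: Iterable[Record]) -> int: ...


class InMemorySource:
    """Stand-in for a real source such as an API or a database."""

    def __init__(self, rows: List[Record]) -> None:
        self._rows = rows

    def read(self) -> Iterator[Record]:
        # Yield records one at a time so large sources need not fit in memory.
        yield from self._rows


class InMemoryDestination:
    """Stand-in for a real destination such as a data warehouse."""

    def __init__(self) -> None:
        self.rows: List[Record] = []

    def write(self, records: Iterable[Record]) -> int:
        before = len(self.rows)
        self.rows.extend(records)
        # Report how many records were loaded by this call.
        return len(self.rows) - before


def replicate(source: SourceConnector, destination: DestinationConnector) -> int:
    """Extract from the source and load into the destination, untransformed."""
    return destination.write(source.read())


if __name__ == "__main__":
    src = InMemorySource([{"id": 1, "city": "Paris"}, {"id": 2, "city": "Lyon"}])
    dst = InMemoryDestination()
    print(replicate(src, dst), "records replicated")
```

The point of the split is that the replication step only depends on the two interfaces, so a change at one source means rewriting one connector rather than the whole pipeline.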
The $5.2 Million Seed Round
AirByte's seed round was led by the venture capital firm Accel, with participation from 8VC and Y Combinator.
Individual investors also took part, including Calvin French-Owen, co-founder and CTO of the customer data platform Segment (which Twilio recently acquired for a reported $3.2 billion); Charles Zedlewski, former general manager of the cloud data company Cloudera, who is now a partner at the private equity company Symphony AI; and Alain Rossmann, founder and chairman of the ML-as-a-service startup Machinify.
Two other investors, Auren Hoffman, CEO of the location data company SafeGraph, and Travis May, co-founder and CEO of the healthcare data platform Datavant, are both former CEOs of the SaaS data connectivity platform LiveRamp, where AirByte's co-founder and CEO Michel Tricot worked for more than five years, rising through the ranks from senior software engineer to director of engineering and head of integrations.
Five of his former LiveRamp coworkers are now part of AirByte's engineering team. Tricot told us that during his tenure at LiveRamp, he and his team built 1,000 data ingestion connectors and another 1,000 data distribution connectors that moved more than 100TB per day.
His co-founder Lafleur, who previously founded three other startups, is no stranger to the data integration business.
"At my first startup and my third startup we had to build ETL pipelines for one year," he said. "So, I wanted to solve that problem as much as him."
For an eight-month-old startup that's successfully raised more than $5 million in get-started money, the company doesn't seem to be in a great hurry to start signing up customers. So far, it doesn't even have a product that customers can buy.
For the time being, AirByte is offering its containerized connectors, ready for deployment in a cloud-native environment, as an open source community project, with software available for free, licensed under the permissive MIT license. This means other vendors can use the software in their own proprietary products, which seems to be fine with both Tricot and Lafleur, both of whom said in our conversation that they "want to become the standard."
Eventually the company will offer a proprietary enterprise edition that will include features such as hosting management, data quality protocols, privacy compliance with regulations such as the GDPR and CCPA (California Consumer Privacy Act), role and access management, and single sign-on. The proprietary software will also be issued with the source code available, which can help not only with compliance auditing but also with troubleshooting performance issues.
Further down the road, the company also plans to offer a hosted solution for teams that don't want to manage the connectors in their infrastructure themselves.
"There are already some companies that exists today [in the data connector space] that are closed source," Tricot said. "The thing is, these companies are limited in the number of connectors that they can ship, because they'll just focus on the 60 percent of connectors that are the most used and that's basically it."
With an open source process, he said, companies will be able to build any unavailable connectors they need using AirByte's specifications and contribute them back upstream to be maintained by AirByte and the community around its open source platform.