Data Portability: Shortcomings of Containers and PaaS

Luke Marsden is the founder and CTO of ClusterHQ. Prior to this, he co-founded Digital Crocus and was an engineer at TweetDeck Inc.

Support for portable and resilient data volumes is a missing piece of the puzzle for Docker, and a significant challenge for PaaS offerings as a whole. Why should the heart of most applications live outside the platform?

By bringing data volumes back into the center of the application architecture, more and more workloads will be able to take advantage of the portability Linux containers provide.

To do this, a production-grade solution would need to do the following:

Enable applications and their data services to automatically scale (with disk, network, CPU load, etc.)
Eliminate single points of failure in our architectures, even where those single points of failure are whole data centers or cloud providers
Reduce total cost of ownership and reduce the complexity of the platforms we're deploying/building so we can manage them easily

Hidden inside these requirements are some really hard computer science problems. Current PaaS platforms and container frameworks don't handle these requirements very well. In particular, the current approach to both PaaS and containers is to punt the hard problem of data management out to "services," which either ends up tying us into a specific cloud provider, or forces the data services to get managed orthogonally to the stateless application tier in an old-fashioned way. What's more, having different data services in test, dev and production means that we violate the principle that we're working hard to establish: real consistency of execution environments across test, dev and production.

In practice, the application (as in a distributed application, across multiple containers, across multiple hosts) should include its dependent data services, because they are an integral part of its execution environment. The status quo, with data services in one silo and the scalable app tier in another, is a radically sub-optimal solution.

It's possible to develop something more like Google's data management layer, which includes the concept of a number of replicas. We believe such a layer should become embedded firmly within our container platforms ("orchestration frameworks"), so that they capture both the stateful data tier (left) and the stateless app tiers (right) in our applications.

What we have vs. what we need

[Figure: What Docker can capture today, shown in green.]

[Figure: What we believe the entire application should consist of, shown in green, with the parts that still need to be captured shown in red.]

Consistently managing containers: challenge < opportunity

To deliver on the promise of infrastructure as code and the portability of entire applications, we need a way of safely and consistently managing the stateful as well as the stateless components of our apps in an ops-friendly way, across dev, staging and production environments. Managing stateful and stateless containers in a consistent way simplifies operations by unifying what would otherwise be at least two systems into one.

In a cloud infrastructure world, it's necessary to do this with unreliable instances and effectively ephemeral "local" storage. EBS volumes have a failure rate which, at scale, you have to plan for. Shared storage, e.g. NFS, doesn't work well in the cloud and introduces a single point of failure.
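One way to picture planning for that kind of failure is to spread replicas of a data volume across independent failure domains. The sketch below is only an illustration under assumed names: `place_replicas`, `survives_domain_loss`, and the host and domain labels are hypothetical and do not describe any particular platform's API.

```python
from collections import defaultdict


def place_replicas(hosts, replica_count):
    """Pick `replica_count` hosts, each from a different failure domain.

    `hosts` maps a host name to its failure domain (for example a data
    center or a cloud provider). All names here are hypothetical.
    """
    by_domain = defaultdict(list)
    for host, domain in sorted(hosts.items()):
        by_domain[domain].append(host)
    if replica_count > len(by_domain):
        raise ValueError("not enough failure domains for the requested replicas")
    # Take the first host from each of the first `replica_count` domains.
    return [by_domain[domain][0] for domain in sorted(by_domain)[:replica_count]]


def survives_domain_loss(placement, hosts):
    """Return True if a replica remains after any single domain fails."""
    domains = {hosts[host] for host in placement}
    return len(domains) >= 2


hosts = {
    "host-a": "dc-east",
    "host-b": "dc-east",
    "host-c": "dc-west",
    "host-d": "other-cloud",
}
placement = place_replicas(hosts, 2)
print(placement, survives_domain_loss(placement, hosts))
```

A placement that puts both replicas in the same data center fails the check, which is the single point of failure the requirements above ask us to eliminate.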

We should be able to treat our data in the same way we treat our code: cheaply tag and branch it, and pull and push it around among locations. This allows us to become more agile with our data.
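To make the analogy with code concrete, here is a minimal in-memory sketch of tagging, branching and pushing volume snapshots. It is a toy model under stated assumptions: the `VolumeStore` class and its method names are hypothetical, and a real system would snapshot at the filesystem or block level rather than hold file contents in a dictionary.

```python
import hashlib
import json


class VolumeStore:
    """Toy snapshot store for a data volume (hypothetical, in-memory only)."""

    def __init__(self):
        self.snapshots = {}  # snapshot id -> dict of path -> contents
        self.branches = {}   # branch name -> snapshot id
        self.tags = {}       # tag name -> snapshot id

    def commit(self, branch, files):
        """Record a snapshot of `files` and point `branch` at it."""
        payload = json.dumps(files, sort_keys=True).encode()
        snap_id = hashlib.sha256(payload).hexdigest()[:12]
        self.snapshots[snap_id] = dict(files)
        self.branches[branch] = snap_id
        return snap_id

    def tag(self, name, branch):
        """Give the current snapshot of `branch` a fixed name."""
        self.tags[name] = self.branches[branch]

    def branch(self, new_branch, source_branch):
        """Cheap branch: copy a pointer, not the data."""
        self.branches[new_branch] = self.branches[source_branch]

    def checkout(self, branch):
        """Return a private, editable copy of the branch's files."""
        return dict(self.snapshots[self.branches[branch]])

    def push(self, branch, remote):
        """Copy the branch's current snapshot to another store."""
        snap_id = self.branches[branch]
        remote.snapshots[snap_id] = dict(self.snapshots[snap_id])
        remote.branches[branch] = snap_id


prod = VolumeStore()
prod.commit("main", {"users.db": "alice,bob"})
prod.tag("v1", "main")
prod.branch("staging-test", "main")

data = prod.checkout("staging-test")
data["users.db"] += ",carol"
prod.commit("staging-test", data)

laptop = VolumeStore()
prod.push("staging-test", laptop)
```

After these steps, the `main` branch and the `v1` tag still point at the original data, while the `staging-test` branch carries the change and exists in both locations.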

Managing containers with persistent state is a significant problem, and one that must be solved for Docker to scale and the full promise of containerization to be realized. Containerized databases are not suitable for production workloads without solutions for data migration, cloning and failover. Developing these solutions presents some of the greatest challenges, and opportunities, in the Docker ecosystem and the PaaS market today.

Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena. See our guidelines and submission process for information on participating. View previously published Industry Perspectives in our Knowledge Library.
