Apache Mesos, the popular open source software for managing large clusters, has been updated with new functionality. The 0.23.0 release adds features for managing resources at large scale, expanding its capabilities in the automated data center.
Mesos plays an important role in modern data centers. At companies that provide services at web scale, fewer people manage an increasing number of servers, acting as conductors of a distributed systems symphony. Mesos abstracts disparate resources, bundling them into virtual pools. The new features focus on better prediction and automation of resource utilization, and include oversubscription, persistent volumes, and dynamic reservations.
Mesos powers the backend of some heavy-duty services, such as Apple’s Siri and Twitter, and is credited with killing the “Fail Whale” that was often seen in Twitter’s early years. Mesos is also the kernel of Mesosphere’s Data Center Operating System (DCOS), which gains the new functionality as well.
The new oversubscription feature takes advantage of temporarily unused resources to run best-effort, lower-priority jobs, such as background analytics and video or image processing.
“High-priority user-facing services are typically provisioned on large clusters for peak load and unexpected load spikes. This means that for most of the time the cluster is under-utilized,” wrote Mesosphere data center application architect Michael Hausenblas. “In the context of the DCOS, this means that you will be able to get even more bang for your bucks: more applications with the same hardware investments or lower TCO.”
Oversubscription adds two new slave components: a resource estimator and a quality of service (QoS) controller. The resource estimator hooks into the resource monitor to get usage statistics and identify the amount of resources that can be oversubscribed. The QoS controller signals when corrective actions need to be taken.
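The division of labor between the two components can be illustrated with a minimal Python sketch. This is not the Mesos implementation or its API: the `Usage` record, the function names, the safety margin, and the load threshold are all hypothetical, chosen only to show one way an estimator might derive spare capacity from usage statistics and a QoS check might flag when best-effort work should be scaled back.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Usage:
    """Hypothetical usage sample for one running container."""
    allocated_cpus: float
    used_cpus: float


def estimate_oversubscribable(samples: List[Usage], safety_margin: float = 0.5) -> float:
    """Estimate CPUs that are allocated but idle, minus an assumed safety margin."""
    slack = sum(max(s.allocated_cpus - s.used_cpus, 0.0) for s in samples)
    return max(slack - safety_margin, 0.0)


def needs_correction(samples: List[Usage], total_cpus: float, threshold: float = 0.9) -> bool:
    """Flag a corrective action when actual usage exceeds an assumed share of the machine."""
    used = sum(s.used_cpus for s in samples)
    return used / total_cpus > threshold


samples = [Usage(allocated_cpus=4.0, used_cpus=1.0), Usage(allocated_cpus=2.0, used_cpus=1.5)]
spare = estimate_oversubscribable(samples)
overloaded = needs_correction(samples, total_cpus=8.0)
```

In this toy model the estimator only reports slack, and the decision to act on high load is kept separate, which mirrors the split between estimating and correcting described above.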
Mesos now also provides a mechanism to create persistent volumes for stateful services. Previously, distributed file systems, message queues, and other stateful services had to use out-of-band techniques to keep their data.
“With persistent volumes, these stateful services have now become first-class citizens, making the life easier for DCOS service developers and providing a more robust experience for DCOS end-users,” wrote Hausenblas.
The 0.23.0 release also includes experimental support for dynamic reservations. With dynamic reservations, a human operator doesn’t need to specify fixed, static reservations at startup. Instead, resources can be reserved as they are being offered, without a slave restart.
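The idea of reserving resources at offer time can be sketched as a toy Python model. The `Agent` class, its methods, and the role names below are hypothetical and do not reflect the actual Mesos API. They only illustrate resources moving between an unreserved pool and a per-role reservation while the node keeps running.

```python
class Agent:
    """Toy model of a node whose CPUs can be reserved for a role at runtime."""

    def __init__(self, cpus: float) -> None:
        self.unreserved = cpus
        self.reserved = {}  # role name -> CPUs reserved for that role

    def offer(self, role: str) -> dict:
        """Describe what the given role would currently be offered."""
        return {"unreserved": self.unreserved, "reserved": self.reserved.get(role, 0.0)}

    def reserve(self, role: str, cpus: float) -> None:
        """Move CPUs from the unreserved pool to the role, with no restart involved."""
        if cpus > self.unreserved:
            raise ValueError("not enough unreserved CPUs")
        self.unreserved -= cpus
        self.reserved[role] = self.reserved.get(role, 0.0) + cpus

    def unreserve(self, role: str, cpus: float) -> None:
        """Return previously reserved CPUs to the unreserved pool."""
        if cpus > self.reserved.get(role, 0.0):
            raise ValueError("role has not reserved that many CPUs")
        self.reserved[role] -= cpus
        self.unreserved += cpus


agent = Agent(cpus=8.0)
agent.reserve("ads", 2.0)          # reservation made while the agent is running
after_reserve = agent.offer("ads")
agent.unreserve("ads", 2.0)        # priorities changed, so the reservation is released
after_release = agent.offer("ads")
```

A static reservation would correspond to fixing the contents of `reserved` in the constructor. The dynamic variant lets that mapping change as business priorities shift.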
“This new feature means for DCOS operators less manual interventions when business requirements change priorities concerning the cluster usage,” wrote Hausenblas.
Other features include SSL encryption of traffic between the Mesos master, slaves, and services, as well as per-container network isolation. “Network isolation prevents a single container from exhausting the available network ports, consuming an unfair share of the network bandwidth or significantly delaying packet transmission for others,” wrote Hausenblas.