Dave Bermingham is a Microsoft Cloud and Datacenter Management MVP at SIOS Technology, and David Klee is Founder & Chief Architect at Heraflux Technologies.
The first article in this three-part Industry Perspectives series examined the different perspectives Sys and DB Admins have regarding high availability, with storage area networks as an example of how these differences can create conflict. This second part dives deeper into potential conflicts involving the high-availability provisions for SQL Server, VMware and public cloud offerings.
As the dominant database for mission-critical applications, SQL Server naturally has its own HA provisions. The most robust of these is the Always On Availability Group feature found in the Enterprise Edition. The problem is that licensing fees for the Enterprise Edition are significantly higher than for the Standard Edition.
In a Windows environment, the Always On Failover Cluster Instance (FCI) feature in the Standard Edition integrates with Windows Server Failover Clustering (WSFC) to form the foundation of an affordable and acceptable HA solution for all but the most mission-critical database applications.
But the situation is quite different in a Linux environment, where most distributions give IT departments two equally bad choices for HA: either pay more for SQL Server Enterprise Edition to implement Always On Availability Groups, or struggle to make complex, do-it-yourself open source configurations work reliably, which can be extraordinarily difficult. Sys Admins prefer the former; DB Admins prefer the latter.
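To give a sense of what the do-it-yourself route involves, here is a minimal sketch of protecting a SQL Server instance on Linux with a Pacemaker/Corosync stack managed via pcs. The cluster name, node names, IP address and device path are placeholders, exact pcs syntax varies by version, and this omits the fencing/STONITH and quorum configuration a production cluster would also need:

```shell
# Assumes pacemaker, corosync and pcs are installed on both nodes
# and the nodes have been authenticated (pcs host auth / pcs cluster auth,
# depending on the pcs version).
sudo pcs cluster setup sqlcluster node1 node2   # pcs 0.9 uses: --name sqlcluster node1 node2
sudo pcs cluster start --all

# Floating IP that clients connect to, so connections follow a failover
sudo pcs resource create sqlvip ocf:heartbeat:IPaddr2 ip=10.0.0.100 cidr_netmask=24

# Shared data volume and the SQL Server service itself
sudo pcs resource create sqldata ocf:heartbeat:Filesystem \
    device=/dev/sdb1 directory=/var/opt/mssql fstype=xfs
sudo pcs resource create sqlserver systemd:mssql-server

# Group the resources so the IP, storage and service fail over together, in order
sudo pcs resource group add sqlgroup sqlvip sqldata sqlserver
```

Even this simplified fragment hints at why DB Admins who take this path spend significant time on cluster plumbing rather than on the database itself.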
VMware presents a similar conundrum. VMware’s vSphere HA offering will appeal to Sys Admins based on its ease of implementation and operation. But as is often the case, the HA devil is in the HA details.
In a VMware or any other virtualized environment, the layers of abstraction complicate the way virtual machines (VMs) interface with physical devices, including in a storage area network (SAN) where the storage is also virtualized. These layers can adversely impact performance, creating potential problems for DB Admins.
And there is another devil lurking within those layers. To facilitate full compatibility with certain SAN and other shared-storage features, such as I/O fencing and SCSI reservations, vSphere utilizes Raw Device Mapping (RDM) to create a direct link through the hypervisor between the VM and the external storage system. The requirement for using RDM with shared storage exists for any cluster, including those using SQL Server FCIs and/or WSFC.
Although RDM makes a storage device or subsystem appear to the guest operating system as a virtual disk file in a VMware Virtual Machine File System (VMFS) volume, it does not support disk partitioning. This makes it necessary to use “raw” or whole LUNs (logical unit numbers), and mapping is not available for direct-attached block storage and certain RAID devices. And because RDM interferes with VMware features that rely on virtual machine disk (VMDK) files, DB Admins may be unable to fully utilize desirable features such as snapshots, VMware Consolidated Backup, templates and vMotion.
Even worse is how RDM exacerbates performance problems. When RDM is configured for maximum compatibility with shared SCSI and SAN storage (the configuration Sys Admins prefer), database applications are unable to use performance-enhancing capabilities like Flash Read Cache (something DB Admins use regularly).
Only in environments leveraging the new Virtual Volumes (VVOL) capability do the RDM requirements disappear. Even then, VVOL support remains rare among storage array vendors, and VVOLs are supported only on the newest versions of vSphere.
In addition to encountering these same issues with SQL Server, Linux and VMware, public cloud services can have their own peculiarities that complicate creating highly available configurations. Examples of these Platform-as-a-Service (PaaS) limitations include:

- a lack of support for the shared storage used in some HA solutions;
- master instances that can create only a single failover replica;
- a requirement that backups be performed on the master dataset;
- failovers triggered only by zone outages and not by other common failures; and
- the use of event logs to replicate data, which creates a “replication lag” that results in temporary outages during a failover.
While these limitations might be acceptable as part of a disaster recovery strategy for some enterprise applications, they are unacceptable for mission-critical database applications. The challenge, therefore, is to find one approach capable of working cost-effectively across all applications in public, private and hybrid clouds for both Windows and Linux.
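The replication-lag problem mentioned above can be illustrated with a toy model (not any vendor's actual API): an asynchronous, log-based replica runs a few commits behind the primary, so a failover promotes a replica that is missing the most recent transactions. The class and variable names here are purely illustrative:

```python
# Toy model of asynchronous log-shipping replication and the data
# that is unavailable at failover due to replication lag.

class ReplicatedLog:
    def __init__(self, lag):
        self.lag = lag        # number of commits the replica runs behind
        self.primary = []     # transactions committed on the primary
        self.replica = []     # transactions already applied on the replica

    def commit(self, txn):
        self.primary.append(txn)
        # The replica has applied everything except the trailing `lag` entries.
        applied = max(0, len(self.primary) - self.lag)
        self.replica = self.primary[:applied]

    def failover(self):
        # Promote the replica; anything it has not yet applied is unavailable
        # until the lagging log entries can be recovered, if they can be at all.
        lost = self.primary[len(self.replica):]
        return self.replica, lost

log = ReplicatedLog(lag=2)
for i in range(5):
    log.commit(f"txn-{i}")

surviving, lost = log.failover()
print(surviving)  # ['txn-0', 'txn-1', 'txn-2']
print(lost)       # ['txn-3', 'txn-4']
```

The lag window is exactly the temporary outage the PaaS limitations above describe: the larger the lag, the more recent work is stranded on the failed primary during a failover.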
Part 3 of this series will examine how one such approach, SANless failover clustering software, can accommodate the HA and other needs of both Sys and DB admins.
Opinions expressed in the article above do not necessarily reflect the opinions of Data Center Knowledge and Informa.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena. See our guidelines and submission process for information on participating.