Wai T. Lam is co-founder and CTO of Cirrus Data Solutions.
If Shakespeare were writing plays today, he would probably pen a drama or perhaps a comedy about the imbroglios storage administrators get tangled up in every day. The debates over whether to implement SSD for storage are great plot material.
As SSD prices continue to drop every few months, many administrators see these drives as viable replacements for the spinning variety. The main advantage of SSD is its superb performance. Yet even with all the promises of SSD, reality tells us nothing changes overnight in the commercial world of technology, especially in enterprise SAN storage.
For proof, we do not need to look further than the history of tape. In my previous life, the virtual tape libraries (VTLs) we built were bestsellers; yet, ironically, one of the most important sales features of a VTL was its ability to move data to physical tapes.
For all the putative magical properties of SSD, users still face many logistical considerations. The following are just a few examples:
- What to do with the vast infrastructure of existing storage technologies?
- What is the cost and effort required to migrate all the data?
- And, that most nightmarish question: What if I switch over and the application performance does not increase?
Clearly, the allure of SSD performance is irresistible for people looking to improve the performance of their systems. There is plenty of compelling data proving SSD can be very effective. For those who are ready and can afford to switch over, SSD is a very promising paradigm. On the other hand, for those who lack the luxury of switching immediately, the best way to get a taste of SSD’s advantages — while avoiding a frantic scramble with a new infrastructure — is through using a small amount of SSD to accelerate spinning disks (i.e., caching).
This way, without committing to a huge expenditure and a gargantuan effort, one can test the waters before jumping in.
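To make the idea concrete, here is a minimal sketch (not any vendor's implementation) of what "a small amount of SSD accelerating spinning disks" means in practice: a small, fast LRU cache standing in for the SSD tier, fronting a slow backing store standing in for the disk array. The class and names are invented for illustration.

```python
from collections import OrderedDict

class ReadCache:
    """Minimal read-through LRU cache sketch: a small, fast tier
    (standing in for SSD) in front of a slow backing store
    (standing in for spinning disk)."""

    def __init__(self, capacity_blocks, backing_store):
        self.capacity = capacity_blocks
        self.backing = backing_store   # dict-like: block number -> data
        self.cache = OrderedDict()     # block number -> data, in LRU order
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.cache:
            self.hits += 1
            self.cache.move_to_end(block)      # mark as most recently used
            return self.cache[block]
        self.misses += 1
        data = self.backing[block]             # slow path: the spinning disk
        self.cache[block] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict least recently used
        return data

# A skewed workload (a few hot blocks read repeatedly) benefits
# even from a cache far smaller than the backing store.
store = {b: f"data-{b}" for b in range(1000)}
cache = ReadCache(capacity_blocks=100, backing_store=store)
for b in [0, 1, 2, 0, 1, 3, 0, 1, 2, 0]:
    cache.read(b)
print(cache.hits, cache.misses)
```

The point of the sketch is the economics: if the workload is skewed toward hot blocks, a cache covering a small fraction of the data absorbs most of the reads; if the workload is a uniform scan, the same cache buys almost nothing, which is exactly why one wants to test the waters first.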
This sounds good, but embracing SSD caching is easier said than done, as cache comes with its own labyrinth of different issues, namely:
- Installing SSD caching in the application servers introduces risk, simply because new hardware and software are involved.
- The installation requires downtime that must be factored in.
- Typically, the SSD cache data cannot be shared across hosts.
- Adding SSD caching will be costly, and will likely involve vendor lock-in.
- Not all storage systems offer a cache option.
- Inserting a cache appliance into the SAN sounds better, but this raises the question of how much change will be required to accommodate it.
I think centralized caching is the most palatable option, particularly if the cache appliance can be inserted easily into a SAN environment. This solution will make SSD storage available to all applications in different hosts, without forcing the storage administrator to replace the entire storage system — and without costing too much.
Yet burning questions remain. What impact will caching have on an already convoluted SAN environment? Will it disrupt production? What if the performance is not better? How does one know whether the cache scheme is working? Is it possible to undo negative changes made to the storage environment?
Pondering all those questions, we can start to create a dream specification for a centralized cache appliance for SAN. The following points are a good start:
- A centralized cache appliance should provide a very large amount of SSD storage as cache — at least 10 percent of the existing storage.
- It should be transparently inserted into the storage links, such that nothing in the SAN environment needs to be changed, including LUN masking, zoning, application host configurations, and so on.
- The appliance should be removed as easily and transparently as it is inserted. This is especially important if one finds the system is not conducive to caching.
- It allows I/O traffic to be analyzed in detail, including the individual paths, initiators, targets, and LUNs, and it delivers complete historical profiles.
- The appliance enables individual hosts or LUNs to be identified and selected for caching.
- It provides detailed I/O access and data read/write patterns in real-time and over time, and clearly describes the reasons for cache hits and misses. All of this functionality helps pinpoint the exact amount of cache needed for each LUN.
- The appliance should be highly available, with no single point of failure.
- The cost of the appliance should be substantially lower than switching existing storage to all-SSD.
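One way such an appliance could "pinpoint the exact amount of cache needed for each LUN" is classic stack-distance (reuse-distance) analysis, which estimates the LRU miss ratio at many cache sizes from a single pass over an I/O trace. The following is a simple O(n²) sketch of that standard technique, with an invented per-LUN trace; it is not a description of any particular product's method.

```python
def miss_ratio_curve(trace, sizes):
    """Estimate the LRU miss ratio at several cache sizes from one
    pass over an I/O trace (stack-distance analysis, O(n^2) sketch)."""
    stack = []    # LRU stack: most recently used block at the end
    depths = []
    for block in trace:
        if block in stack:
            depth = len(stack) - stack.index(block)  # 1 = most recent
            stack.remove(block)
        else:
            depth = float("inf")                     # first access: cold miss
        depths.append(depth)
        stack.append(block)
    # Under LRU, an access hits iff its stack depth <= the cache size,
    # so one trace yields the miss ratio at every size of interest.
    return {s: sum(1 for d in depths if d > s) / len(depths) for s in sizes}

# Hypothetical read trace for one LUN (block IDs); a real appliance
# would collect this from the live I/O stream.
trace = ["A", "B", "C", "A", "B", "C", "A", "A", "B"]
curve = miss_ratio_curve(trace, sizes=[1, 2, 3])
print(curve)
```

A curve like this answers the sizing question directly: the smallest size at which the miss ratio flattens out is roughly the cache this LUN deserves, and a curve that never drops flags a LUN that is not conducive to caching at all.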
The bottom line is that the cache appliance should just plug in with minimal effort. Whether the application performance is improved or not, the appliance must provide definitive information on why the cache is efficacious or why it is not.
Based on this information, users will at least have a clear direction on what needs to be done. It may very well turn out that all one needs to do is to balance the data paths a bit, or redistribute the load of certain LUNs to eliminate bottlenecks. And if the appliance works out well, one may use it strategically — to defer switching over to SSD by methodically planning the best moves.
With today’s advanced technologies, a hard-working cache appliance (as described above) should be readily available. Such a device would remove the Hamlet-like angst storage administrators feel about “SSD or no SSD.”