HA is not enough – The Demand for 24 x 7 Continuous Access with Solid State Storage


Storing data on hard drives with RAID protection has been the norm for enterprise data storage for a very long time. High availability (HA) means keeping the hard drive based system up and running even if a component within the system fails.  These components include the network interface, hard drives, power supplies and fans.

But in the new 24×7 world of millennials, HA is not enough. The demand for 24×7 Continuous Access is forcing storage system suppliers to provide “100% system level redundancy”: two or more fully synchronized, identical data sets, each with independent controllers and network connections.

System availability is measured in “9’s”. The number of 9’s indicates the percentage of availability. Standard storage systems with RAID can guarantee 99.9% availability, or “three nines”. The current standard for high availability is 99.99%, or “four nines”. The current standard for continuous access is 99.999%, or “five nines”.

(Chart ha-table-1: downtime at each availability level)
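The downtime behind each number of nines is simple arithmetic. A minimal Python sketch, assuming a 365-day year (the function and constant names are illustrative and do not come from the original chart):

```python
# Maximum downtime per year at each availability level.
# Assumption: a 365-day year; names here are illustrative only.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the maximum downtime per year, in seconds."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


def format_duration(seconds: float) -> str:
    """Render a duration in the largest convenient unit."""
    if seconds >= 3600:
        return f"{seconds / 3600:.2f} hours"
    if seconds >= 60:
        return f"{seconds / 60:.2f} minutes"
    return f"{seconds:.1f} seconds"


LEVELS = [
    ("three nines", 99.9),
    ("four nines", 99.99),
    ("five nines", 99.999),
    ("six nines", 99.9999),
]

for name, percent in LEVELS:
    downtime = downtime_seconds_per_year(percent)
    print(f"{percent}% ({name}): {format_duration(downtime)} per year")
```

Under that assumption, three nines allows about 8.76 hours of downtime per year, four nines about 52.56 minutes, five nines about 5.26 minutes, and six nines about 31.5 seconds.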

Note on six nines – 99.9999% availability. 

This generally refers to mirrored independent systems in different geographies. This type of infrastructure is expensive when the interconnect between systems cannot run over a private campus LAN or MAN and lines must be purchased from commercial carriers. Industries that require this type of continuous access include banking, national security and defense. They maintain continuous access even if an entire data center experiences an outage or loses its network connection.

99.9% available storage systems generally offer only disk level redundancy using RAID. The next level, 99.99%, has RAID and also includes redundant power supplies, network interfaces and cooling fans. Five nines, also known as the “holy grail” of continuous access, moves beyond component redundancy and adds 100% redundant data where there are two or more copies of the data being stored, in real-time, on two or more independent RAID sets. 99.999% systems also include 100% redundant system controllers to manage the client requests and RAID supported data sets. A 99.999% system is basically two independent, synchronously mirrored systems within the same chassis. The only way a 99.999% system could fail is if the backplane or controller interconnect were to fail.
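To see why fully redundant systems raise availability, and why a shared backplane or interconnect still matters, the textbook parallel-redundancy formula can be sketched as follows. This is an idealized model, not a vendor calculation: it assumes the mirrored systems fail independently, and the function name and input values are hypothetical.

```python
def mirrored_availability(single: float, copies: int = 2, shared: float = 1.0) -> float:
    """Availability of `copies` independent mirrored systems behind one shared component.

    single: availability of one system on its own, as a fraction (0.999 = 99.9%)
    shared: availability of a component that every copy depends on, such as a
            backplane or controller interconnect (1.0 means it never fails)
    Assumption: failures are independent, which real systems only approximate.
    """
    if not 0.0 <= single <= 1.0 or not 0.0 <= shared <= 1.0:
        raise ValueError("availabilities must be fractions between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    all_copies_down = (1.0 - single) ** copies  # every mirror is down at once
    return shared * (1.0 - all_copies_down)


# Hypothetical inputs: two 99.9% systems, first with a perfect shared path,
# then with a shared component that is itself 99.999% available.
print(f"{mirrored_availability(0.999):.6%}")
print(f"{mirrored_availability(0.999, shared=0.99999):.6%}")
```

In this model the shared component caps the result: however many mirrors are added, overall availability cannot exceed that of the part they all depend on.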

Hard drive storage system manufacturers were faced with a huge problem when developing 99.999% systems – hard drives were too slow. To give you an example, in a two controller configuration, with synchronous data mirroring between the two controllers and RAID sets, the host would write to one controller within the system and that controller had to write the data to its hard drives and send a copy of the write to the second controller. The second controller would write the data to its hard drives and then acknowledge the write back to the first controller before a complete write acknowledgement could be sent back to the host. Although the data remained consistent between both systems and continuous access was greatly improved, the latency of writing to hard drives was far too high.
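The write path described above can be modeled in a few lines of Python. This is a simplified sketch, not any vendor's implementation: the `Controller` class and `synchronous_mirrored_write` function are hypothetical, latencies are modeled rather than measured, and the local write is assumed to run in parallel with the mirrored copy.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical storage controller with its own media and data set."""
    name: str
    media_write_latency_s: float  # modeled time to commit one write
    stored: dict = field(default_factory=dict)

    def write_local(self, key: str, data: bytes) -> float:
        """Store the data and return the modeled media write latency."""
        self.stored[key] = data
        return self.media_write_latency_s


def synchronous_mirrored_write(primary: Controller, secondary: Controller,
                               key: str, data: bytes,
                               link_one_way_s: float = 0.0) -> float:
    """Return the modeled time until the host receives its acknowledgement."""
    # The primary commits the write to its own media.
    local = primary.write_local(key, data)
    # The copy crosses the interconnect, the secondary commits it, and the
    # acknowledgement crosses the interconnect back to the primary.
    remote = link_one_way_s + secondary.write_local(key, data) + link_one_way_s
    # Assumption: both paths start together, so the host waits for the slower.
    return max(local, remote)


# 10 ms is the approximate enterprise hard drive write latency this article quotes.
hdd_a = Controller("A", media_write_latency_s=10e-3)
hdd_b = Controller("B", media_write_latency_s=10e-3)
latency = synchronous_mirrored_write(hdd_a, hdd_b, "block-0", b"payload")
print(f"host-visible write latency: {latency * 1e3:.1f} ms")
```

Even with a zero-latency interconnect, the host cannot be acknowledged until the slowest media write finishes, so media latency sets the floor for every mirrored write.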

Some hard drive systems use memory to cache the writes. But during spikes or continuous heavy workloads, the cache is quickly overrun with write and read commands and becomes slow and unusable.
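A rough way to see how a write cache gets overrun: it fills whenever writes arrive faster than the drives behind it can absorb them. The sketch below uses hypothetical numbers chosen only for illustration; none of them come from a measured system.

```python
from typing import Optional


def seconds_until_cache_full(cache_bytes: float,
                             incoming_bytes_per_s: float,
                             drain_bytes_per_s: float) -> Optional[float]:
    """Return how long a sustained burst takes to fill the write cache.

    Returns None when the drives drain the cache at least as fast as it fills.
    """
    net_fill_rate = incoming_bytes_per_s - drain_bytes_per_s
    if net_fill_rate <= 0:
        return None
    return cache_bytes / net_fill_rate


# Hypothetical values: 8 GB of cache, 1 GB/s of incoming writes,
# and drives that can absorb 0.2 GB/s.
fill_time = seconds_until_cache_full(8e9, 1.0e9, 0.2e9)
print(f"cache full after {fill_time:.0f} s of sustained load")
```

Once the cache is full, new writes proceed at the speed of the drives behind it, which is why a cache helps with short bursts but not with continuous heavy workloads.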

Hard drive systems are simply too slow to reliably support real time (synchronous) data mirroring across two or more storage controllers each with their own set of hard drives and data sets. To make 99.999% availability practical, especially under heavy workloads, a much more responsive storage technology is needed.

Solid State Drives and All Flash Storage Systems were the Answer

SSDs have been around for over 18 years and have become one of the most trusted technologies for storing data. SSDs are found in all environments as the preferred data storage technology when performance is critical.

In contrast to traditional hard drives, which can handle about 300 I/Os per second, SSDs are able to handle up to 450,000 random I/Os per second. Solid state drives are also more energy efficient, consuming only one-half to one-third of the power of HDDs. They also have extremely low latency and can deliver over 1,000 times more I/Os per second, greatly improving operational efficiency by transacting substantially more client requests in a fraction of the time.

“SSDs – They will continue to rapidly replace HDDs into PCs and notebooks, and up to high-end storage systems. Without moving parts, they are more and more reliable, much faster, now even offering more capacity than HDDs in smaller form factors.” (StorageNewsletter, January 9, 2017)

All flash arrays, sometimes referred to as Solid State Arrays (SSAs), are orders of magnitude faster than hard drive based systems. The typical write latency for an enterprise hard drive storage system is roughly 10ms (millisecond – thousandth of a second), but for a Solid State Array, write times are measured in microseconds. SSAs have been tested to have a latency of approximately 50μs (microsecond – millionth of a second), making them about 200 times faster on writes than a similar hard drive system.
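The 200x figure follows directly from the two latencies quoted above. A short Python check, with one added simplification that is an assumption rather than a benchmark: each write is issued only after the previous one is acknowledged.

```python
# Write latencies quoted in the text, expressed in seconds.
HDD_WRITE_LATENCY_S = 10e-3   # roughly 10 ms for an enterprise hard drive system
SSA_WRITE_LATENCY_S = 50e-6   # approximately 50 microseconds for a Solid State Array


def serial_writes_per_second(latency_s: float) -> float:
    """Acknowledged writes per second with one outstanding write at a time.

    Assumption: no queuing or parallelism, so this is a floor, not a benchmark.
    """
    return 1.0 / latency_s


speedup = HDD_WRITE_LATENCY_S / SSA_WRITE_LATENCY_S
print(f"write latency ratio: {speedup:.0f}x")
print(f"HDD system: {serial_writes_per_second(HDD_WRITE_LATENCY_S):,.0f} serial writes/s")
print(f"SSA:        {serial_writes_per_second(SSA_WRITE_LATENCY_S):,.0f} serial writes/s")
```

Under that one-write-at-a-time assumption, the hard drive system acknowledges about 100 writes per second and the Solid State Array about 20,000.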

With its extremely low latency, solid state is the only practical technology for delivering real time 99.999% continuous data access under heavy workloads. Solid state can deliver 99.999% availability across two synchronously mirrored independent controllers and mirrored data sets without compromising performance.

Note on the Interconnect between Controllers and Mirrored Data Sets

In addition to using solid state, it’s also important to use a low latency interconnection between the independent controllers. Most 99.999% mirrored solutions use GbE to perform synchronous mirroring between controllers and data sets. But there is a better choice.

InfiniBand (IB) is a computer-networking communications standard used in high-performance computing. IB features very high throughput and very low latency. It is used for data interconnect both among and within computers. InfiniBand is also used as either a direct or switched interconnect between servers and storage systems. This makes InfiniBand the clear choice for interconnecting mirrored controllers and data sets. As the chart below shows, IB has far lower latency than GbE.

(Chart ha-table-2: InfiniBand versus GbE comparison)

With IB, interconnect latencies are reduced by roughly a factor of 10.

SSD storage systems combined with IB overcome the challenge of synchronous mirroring across multiple independent controllers and mirrored data sets. Clients and data center managers are assured of 99.999% continuous access without severely impacting overall system responsiveness under different workloads.

Writer: Zophar Sante, Business Development, BiTMICRO Inc.

Date: 1/5/2017