SAN vs OpenSource


In this blog post I would like to discuss SAN (scale-up) vs scale-out storage, or better yet, proprietary SAN vs the open source options available, and why you should care.

Am I saying that SAN is all closed source?

Yes, that’s exactly what I am saying. There are some tool-sets that one may want to use to build a SAN system (like ZFS), however there is nothing that you can deploy and consider to be the SAN solution. With that we are back in proprietary space where the big guys rule, and the little guys… not so much.

So what is going to take over from proprietary SANs?

In my opinion the takeover will be done by storage that is easy to use and easy and inexpensive to scale, while providing all the features and stability of traditional enterprise storage. Perhaps that is asking for too much, but I believe we are not far away from that dream coming true.

Are we there yet?

Four years ago I probably would have said no, but now I believe we are getting closer and closer, and what has impressed me the most is Ceph distributed storage (block storage, to be more specific). It is open source, and therefore driven by the community; however, if you want support for it from the authors, Inktank is available to help you there. A great group of experts in the field of storage, btw! But my point here is not to discuss Ceph itself, because there are plenty of resources out there for understanding it, plus my prior post Ceph vs Gluster. Let’s stick with the topic.

SAN vs Open Source, or scale-out (distributed systems) vs SAN (scale-up), especially begins to shine once companies start rolling out their virtualization strategies (yes, surprisingly, not everyone is there yet! ;) ) and cloud strategies. I will talk more about the technical and performance advantages later in this post.

So why should you care?

The bottom line is Total Cost of Ownership (TCO). Now let’s dive deeper to understand what really matters to the customers.

Hardware SAN solutions vs Ceph (as an example and great representative of the open source alternative, for those who are forward-looking)

To answer this question, first let’s take a look at the key features of Ceph (very high level):

  • Scalability
    • Ability to scale storage and compute resources of the storage heads horizontally (that’s the key differentiator from SAN/vSAN)
  • 3 solutions in 1 cluster – Block, Object, File
  • Unique CRUSH algorithm and its advantages for scale out
  • Integration with OpenStack/CloudStack
  • Thin-provisioning
  • Protection (snapshotting and cloning) besides the fact that it is a clustered storage (i.e. by definition is highly available, no single point of failure)
  • Comprehensive  caching capabilities
  • Tiering (due Feb. 2014) – While Ceph already allows tiering to be configured through the CRUSH map, so that specific data types can be hosted on different types of hardware, this feature will allow migration of data from one hardware pool to another based on a set of policies.
  • Multi-Datacenter capabilities for H/A (released recently in Dumpling)
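
The heart of the CRUSH point above is that placement is *computed*, not looked up: any client can derive where an object lives from the cluster map alone, with no central metadata server in the data path. Here is a toy sketch of that property using rendezvous (highest-random-weight) hashing, which is not Ceph’s actual CRUSH algorithm; OSD names and the replica count are hypothetical.

```python
import hashlib

# Hypothetical cluster map: a flat list of OSDs (real CRUSH walks a
# weighted hierarchy of racks, hosts, and devices).
OSDS = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4"]

def place(obj, replicas=3):
    """Deterministically pick `replicas` distinct OSDs for an object.

    Rank every OSD by a hash of (object, osd) and take the top ones;
    like CRUSH, the answer is pure computation, so every client agrees
    on placement without consulting a directory.
    """
    ranked = sorted(
        OSDS,
        key=lambda osd: hashlib.sha256(f"{obj}:{osd}".encode()).hexdigest(),
    )
    return ranked[:replicas]

primary, *secondaries = place("rbd_data.volume1.chunk42")
# Same input, same cluster map -> same placement, on any client.
assert place("rbd_data.volume1.chunk42") == [primary, *secondaries]
```

This is also why scale-out comes cheap: adding OSDs changes the map, and clients simply recompute placements against the new map.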

As you can see, it very much resembles the features that most SANs provide, with scale-out on top (which will be covered later in this blog). As a result the value proposition is really all about money, and therefore TCO. Such solutions can provide a very significant reduction in cost.

Price of the feature

“I have SAN but cannot afford HA”

Yes, the software features of a SAN are sold to the customer as expensive add-on licenses. There are many enterprise and cloud providers that use SANs. But many also realize that while the SAN itself is very reliable (redundant storage subsystems, dual ports on the exposed targets, etc.), it is still a single point of failure. On a bigger scale, the data center itself is a single point of failure when you are building cloud infrastructure that must provide high availability of the hosted resources. To address both of these issues there are SAN features that replicate data between SAN instances (even between data centers); however, they are extremely expensive and many providers simply cannot afford them. As a result, when the decision comes down to significantly reducing cost while keeping all the features the customer has, plus the ones that were out of reach, it becomes a simple decision.

Price of scale-out

Yes, there are software SAN solutions where the price point is lower and perhaps some of the features are similar, but at that point you are paying the price of scalability (which becomes even more extreme with hardware SAN vendors).

  • Compute capacity vs. storage scale-out: When customers build “cloud” solutions (which could be just a subset of such, in the form of virtualization), scalability of the infrastructure becomes critical, and in a lot of cases that raises questions around the scalability of the SAN. Say you have LUNs exposed out of the SAN; over time the compute capacity of the storage subsystem gets fully consumed while raw storage is still available. What’s next? Adding another SAN is certainly a possibility, but it is expensive (keep in mind that you also have to make it HA), and meanwhile you still have workloads running on the existing SAN that is past its compute capacity. So you have a storage system that is underutilized for its actual purpose (i.e. storage). That contributes to significant latency on the SAN.
  • Which leads to another problem with SANs – latency: The time it takes for data to travel across the SAN and the network to the compute nodes, where the CPU and RAM do all the work, is significant enough to dramatically affect performance. It is really not a matter of bandwidth; it is a problem of latency. For this reason SANs put an upper bound on storage performance by virtue of the time it takes data to move back and forth between the compute nodes and the SAN.
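
The first bullet above can be put into a back-of-the-envelope model: in a scale-up array the controller heads are a fixed ceiling, while in a scale-out cluster each node brings its own controller. All the numbers below (IOPS per disk, per-controller capacity, disks per node) are hypothetical round figures, chosen only to make the shape of the problem visible.

```python
def scale_up_iops(disks, controllers=2, iops_per_disk=150,
                  iops_per_controller=50_000):
    # Aggregate IOPS is capped by the fixed controller heads,
    # no matter how many shelves of disks sit below them.
    return min(disks * iops_per_disk, controllers * iops_per_controller)

def scale_out_iops(nodes, disks_per_node=12, iops_per_disk=150):
    # Every node added brings its own controller, so the ceiling
    # moves with the cluster instead of staying fixed.
    return nodes * disks_per_node * iops_per_disk

# With these made-up numbers, past ~667 disks the scale-up array is
# controller-bound: doubling the disks buys nothing.
assert scale_up_iops(700) == scale_up_iops(1400) == 100_000
```

That flat line at the top of `scale_up_iops` is exactly the “compute capacity consumed while storage is still available” situation described above.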

So what is the fundamental difference between Scale-out (like Ceph) and Scale-up (SAN) storage?

Both of the use cases above outline issues with SANs that hit customers right in the wallet, and they make the SAN an old solution to a new problem, and as a result not a great fit for cloud infrastructures. If the cloud is to deliver high-performance, reliable and resilient storage, the cloud needs to move beyond the SAN.

Distributed storage is a clustered storage technology that provides massive scalability and ease of growth.  That speaks to the fundamental difference between the scale-up architecture of SANs (a limited number of controller heads with a large number of disks “under” them, scaling up) and the scale-out of a cluster.  Scale-up will inherently slow down as an ever-increasing number of disks sits below a static number of controllers.

The scale-out approach maintains ease of growth – but it also maintains performance, since performance scales out as the cluster grows: many networked controllers keep the controller-to-disk ratio (and therefore performance) consistent at any scale.

The tiering feature allows a performance SLA for high-I/O workloads and ILM functionality, with policy-driven migration from Tier 0/1 (SSD/15K) down to 7.2K drives as the data is accessed less over time.
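
The ILM idea above boils down to a policy scan: anything on the fast tier that has not been touched within the policy window is queued for demotion. Here is a minimal sketch of such a policy engine; the 30-day threshold, tier names, and object names are all hypothetical, not anything from Ceph’s actual tiering implementation.

```python
from datetime import datetime, timedelta

# Hypothetical policy: demote data not read for 30 days from SSD to 7.2K SATA.
DEMOTE_AFTER = timedelta(days=30)

def plan_migrations(objects, now):
    """objects: {name: (tier, last_access)}; returns (name, src, dst) moves."""
    moves = []
    for name, (tier, last_access) in objects.items():
        if tier == "ssd" and now - last_access > DEMOTE_AFTER:
            moves.append((name, "ssd", "sata-7.2k"))
    return moves

now = datetime(2014, 2, 1)
objects = {
    "hot-db-extent":   ("ssd", datetime(2014, 1, 30)),  # read 2 days ago: stays
    "cold-log-extent": ("ssd", datetime(2013, 11, 1)),  # idle 3 months: demoted
}
# plan_migrations(objects, now) -> [("cold-log-extent", "ssd", "sata-7.2k")]
```

A real engine would also promote data back up on access, but the demotion half is enough to show how the SLA is policy-driven rather than hand-managed.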

Let’s dig a bit deeper…

For those who would like to explore beyond the points mentioned above, here are some other aspects that distinguish distributed storage from a SAN:

  • Distributed block storage takes each local storage system and, in much the same way as RAID combines multiple drives into a single array, combines each storage/compute node into one huge array that is cloud-wide. Unlike a SAN, management of a distributed block storage array is federated, so there is no single point of failure in the management layer. Likewise, any data stored on any single node is fully replicated across other nodes. If any physical server were to fail, there would be no loss of data or access to data stored on that machine; virtual machines and/or hypervisors would simply access the data from other nodes in the cluster. Essentially there is no single point of failure in storage, delivering a highly available solution to customers.
  • Distributed block storage provides the ability to create live snapshots of drives even while they are in active use by a virtual server. Rollbacks to previous points in time are also possible in a seamless way. In essence, backups become implicit in the system through replication, with the added convenience of snapshots.
  • Concentrating on the performance side (which I mentioned earlier), distributed block storage has the ability to spread the load from a heavily used virtual drive across multiple storage nodes in a cluster. Whereas with local storage the load for a particular drive can only be spread within one RAID array, distributed block storage spreads the load from any one drive across a great many servers with separate disk arrays. The upshot is that drives in heavy use have a much more marginal impact on other cloud users, as their impact is thinly spread across many physical disk arrays. The key lesson here is that in the future, cloud storage will be able to deliver a much more reliable, less variable level of performance. This will make many customers happy who currently suffer from wide variations in their disk performance.
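
The “no single point of failure” claim in the first bullet is easy to see with a toy model: if every drive is replicated on several nodes, losing any one node leaves every drive reachable. This is a sketch of the property, not Ceph’s actual replication or recovery logic; the node names and placement map are hypothetical.

```python
# Hypothetical placement map: each virtual drive replicated on 3 of 5 nodes.
PLACEMENT = {
    "vm1-disk": ["node1", "node2", "node3"],
    "vm2-disk": ["node2", "node4", "node5"],
    "vm3-disk": ["node1", "node3", "node5"],
}

def surviving_replicas(drive, failed):
    # Replicas still reachable after the nodes in `failed` go down.
    return [n for n in PLACEMENT[drive] if n not in failed]

# Lose node2 entirely: every drive is still served from its other
# replicas, and the cluster would re-replicate in the background.
assert all(surviving_replicas(d, {"node2"}) for d in PLACEMENT)
```

Contrast this with a single SAN head going dark, which takes every LUN behind it offline at once.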


In this post I have explained the fundamental difference between scale-out storage (with Ceph as a representative) and scale-up (SAN) storage, both at the technology level and on the business side, where it significantly reduces TCO. That matters to customers as they start strategizing and/or building the next generation of scalable infrastructures, which must be very competitive in terms of both the feature set they provide and, most importantly, the cost.

As always drop a note on your thoughts, visions, strategies, etc.
