Choosing Data Storage - A difficult dance

IT has come a long way in the past 15 years, and definitely has advanced into the realm of commodity service. But there are still complexities under the hood of this commodity service. One of the most underestimated in complexity is data storage - it is taken for granted by everyone. For example, i frequently talk to a high ranking manager in a software company and he constantly states that all that is needed is another disk.



At the end of the day, data storage is very far from simple. Every organization needs to provide storage service for it's requirements. But storage is not only capacity, and one must be careful when choosing the appropriate solution for storage. There are three basic options at the moment:

  • Cloud storage services
  • Open Source based storage systems
  • Commercial enterprise storage systems

We will evaluate each service from the following key parameters of a storage system

  • Capacity - The first (and usually only) thing we think about when we talk about storage - and the easiest to achieve. Regardless of option for data storage, capacity is upgradeable. In open source storage systems which are based on commodity hardware, upgrades are limited to the abilities of the host server/box. The enterprise systems are much more upgradeable, but at high costs. For a cloud storage provider, capacity upgrade is nearly infinite (at least on paper). It is wise to plan ahead and consider whether future ability will support your requirements.
  • Input/Output Operations per Second (IOPS) - The usually forgotten and very difficult to assess parameter, but nonetheless very important. The IOPS should present the amount of operations that the system can perform on a storage within a time-frame of 1 second. But since read and write operations on a storage can vary (sequential or random, read or write, even there are front-end and back-end IOPS when using RAID configurations). Cloud storage services do not publish IOPS, Enterprise manufacturers always publish the IOPS number that is most beneficial to them and the open source solution mostly leaves the IOPS to the builder of the system. In any case the end result is, DO NOT TRUST THE NUMBERS. There are some nice estimation calculators online, like wmarow's iops calculator, but use them only for reference. The smart solution is to test the storage service in a configuration as close to the one you wish to use, and assess whether performance is acceptable.
  • Access Bandwidth - This is not disk bandwidth, which is calculated via the IOPS. The access bandwidth is the bandwidth between the server and the storage itself. Naturally, you want this to be as high as possible. For enterprise storage systems, discussing access bandwidth is moot, since such storage is mostly connecting through Fibre Channel which has multiple links of 2, 4 or 8 Gbps. For open source storage systems, which are mostly iSCSI based, the access bandwidth starts with 1 Gbps with Ethernet overhead. For cloud storage services, access bandwidth is a significant factor - cloud services are accessed through WAN links, where access bandwidth is limited and may be prone to congestion. When choosing a storage system, test your application with the bandwidth you are planning on using.
  • Redundancy and high availability - What kinds of failures and incidents can a storage system survive? Cloud services claim that they can survive a lot - short of a cataclysmic event or a nuclear bombing - but such claims should be tested. Enterprise storage systems are designed to survive nearly any hardware issue within them, and provide abilities to replicate to other systems which are at a distance of tens of kilometer (naturally, at a high high price). Open source storage systems redundancy is dependent on actual hardware redundancy of the box the customer built, and provide some technologies for replication, which are in a different level of maturity. Always consider placing the data based on the importance to the company - can you survive without it?
  • Actual hardware - storage systems are comprised of well known components - hard drives, controllers, interfaces, power supplies. For both enterprise storage systems and for cloud service the customer does not need to bother too much with the hardware - the provider constructs and combines the required hardware. On the other hand, when preparing an open source storage, the customer usually builds the hardware which means finding appropriate hard drives, RAID controllers, redundancy in power supplies, caching mechanisms, LAN and FC interfaces. Building a system from scratch is a great experience, but commodity devices may be prone to much more failures then specially built hardware. Testing is not very useful here, but think ahead of the very possible risk of failure of commodity components.
  • Reporting - Once the storage system starts working, reporting becomes an immediate issue. The customer will want to know the load on the system, on individual hard drives and logical devices, response times, utilization trends etc. Again, enterprise storage systems shine in this area with an excellent portfolio of reporting tools, albeit usually with exorbitant prices. Cloud storage services may provide some reporting but not too in-depth, and the open source systems usually lack poorly, since the open source project is focused on functionality, not reporting. When choosing any storage system, always ask to look at the live reports from the service/system you are planning on using.
  • Support - Again, once the storage system starts working, there will be problems. And I guarantee you - the problems will not be simple: either it works or it doesn't. There will be all kinds of complicated and seemingly impossible combinations of issues. And this is exactly where the customer will need support. But there is no clear-cut answer to which type of storage system has the best support. One must tread carefully here, because good support is about having trained support personnel, but also having very dedicated support personnel. By definition, enterprise storage systems have a great advantage in this area, but this advantage can easily be ruined by a support team that juggles many projects, is used as presales or is simply not dedicated to supporting a customer. Cloud services fall in much the same category, but it can be difficult to discuss storage issues with a cloud storage service: the engineers are impossible to reach, there is insufficient data to support an issue (reports, analysis) and the cloud service provider has usually a well crafted SLA to protect themselves from most issues. The open source systems are an issue of support in a different way - since the systems are built with software which is written by many, there are rarely any real experts to support such a system, unless you pay someone - and even then it may be a risk.
  • Vendor lock-in - Cloud storage services are the strongest player in this area - if the customer chooses a cloud storage system as an important part of your infrastructure, it will adjust it's operation to the cloud system and create a 'symbiotic' bond, thus making the migration very costly. Enterprise systems are much easier to migrate from, since they are basically just huge hard drives. If all else fails, an operating system level copy command will provide a very crude but always successful migration. Open source storage systems have no lock-in: simple hard drives, where migration is a copy-paste operation.

Conclusions
There are multiple pros and cons across our storage systems parameters, but at first glance, the enterprise storage systems have the upper hand. Bear in mind though, such systems always come with exorbitant pricing, especially on any upgrades after the initial purchase. Therefore, such systems may be well suited for the mission critical applications, but are too price prohibitive to be used for every and any use within a company.
The cloud services are extremely flexible in expansion capacity and redundancy (at least on paper). But quality of service and support may be lacking, as well as issues in speed of access. So cloud based storage may be only logical if you rent the full package - server plus storage in the cloud, to guarantee an overall service level. The remaining issue is lock-in: once you start using a cloud provider, leaving it may be a challenge, since you have adjusted your operation to it's service and it may be costly to shift providers.
The open source systems are an interesting project, and can provide a very cheap solution for a lower tier functions. But in order to actively use such a system would mean to dedicate an employee or a team of homegrown experts on the open source storage system, to properly support the system. Also, redundancy and high availability can become an issue in such systems.

In summary, do not choose only one storage solution: The enterprise system is well suited for the business support, but it is a huge overkill for a test or proof of concept systems. Cloud storage services are a good choice for a cloud based infrastructure, but the lock-in issue requires careful strategic approach before lock-in occurs. So use everything, and always evaluate any solution for at least 3 months before committing to it.


Talkback and comments are most welcome


Related posts
RAID and Disk Size - Search for Performance

Designed by Posicionamiento Web