Saturday, March 30, 2013

Why no cloud?

So, as promised, here's why I'm setting up normal Linux-based storage and normal KVM/ESXi compute servers for our new small business's network rather than an OpenStack private cloud.
  1. One risky technology per deployment. It's about risk management -- the ability to manage risks in a reasonable manner. If you stack multiple risky technologies, the interactions between their risks compound and quickly become unmanageable. Normal Linux-based storage is a mature technology with over a decade of active production deployment, with the one exception of the LIO iSCSI target. I concluded that LIO was a necessity in our environment because the tgtd target provided with stable Linux distributions has multiple serious deficiencies (see earlier postings) that render it little more than a toy, and our legacy infrastructure is built around iSCSI talking to that pile of ancient Intransa blue boxes (a minimal LIO configuration sketch follows this list). So I've already reached my limit on new technologies. OpenStack, meanwhile, is multiple immature technologies under active development. Add that to LIO and the existing VMware ESX/ESXi servers' need for block storage, and I'd need multiple storage networks to keep the risks manageable. Which brings up...
  2. Power and space budget. My power and space budget allows for one storage network with a total of 8U of space and 1000 watts of power consumption. I don't have power and space for two storage networks, one for OpenStack and one for ESX/ESXi.
  3. Performance. The majority of what my network provides to end users is file storage via NFS and CIFS. In an OpenStack deployment, file servers run as virtual machines talking to back-end storage via iSCSI. That scales very well in large installations, but I don't have the power and space budget for a large installation, so it's irrelevant here. Running the NAS storage stack directly on the storage boxes (a sketch of that setup also follows this list) gives much better responsiveness and real-world performance than running it in a virtual machine that talks to the storage boxes via iSCSI, even if the theoretical performance should be the same. The biggest drawback is that this limits the size and performance of any particular data store to one storage box, but in practice that isn't much of a limitation for our environment, since a single storage box has far more IOPS and capacity than any single data store here will use for quite some time. (My rule of thumb is that no ext4 data store will ever be over 1TB and no xfs data store will ever be over 2TB, due to various limitations of those filesystems in a NAS environment... any other filesystem runs into issue #1, one risky technology per deployment, and I've already hit that limit with LIO.)
  4. Deep understanding of the underlying technologies. The Linux storage stack has been mature for many years now, with the exception of LIO, and I know its source code at the kernel level fairly well. If there is an issue, I know how to resolve it, even to the point of poking bytes into headers on disk drive blocks to make things work (see the inspection sketch below). Recovery from failure is thus low-risk (see #1). OpenStack is a new and immature technology. If there is an issue, we could be down for days while I chase around in the source code trying to figure out what went wrong and how to fix it.
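For the curious, here is roughly what the LIO side of this looks like: a minimal sketch of exporting one block device to an ESXi host via the targetcli shell. The volume name, device path, IQNs, and portal address below are hypothetical placeholders, not our actual configuration, and this is a sketch rather than a hardening guide.

    # Inside the targetcli shell (LIO's configuration front end).
    # All names, IQNs, and addresses here are made-up examples.
    /backstores/block create name=vmstore0 dev=/dev/vg_storage/lv_vmstore0
    /iscsi create iqn.2013-03.com.example.storage1:vmstore0
    /iscsi/iqn.2013-03.com.example.storage1:vmstore0/tpg1/luns create /backstores/block/vmstore0
    /iscsi/iqn.2013-03.com.example.storage1:vmstore0/tpg1/portals create 10.0.0.10 3260
    /iscsi/iqn.2013-03.com.example.storage1:vmstore0/tpg1/acls create iqn.1998-01.com.vmware:esxi-host-1
    saveconfig

The attraction of LIO here is simply that a configuration like this behaves properly with the VMware initiators, where tgtd (per the earlier postings) falls short.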
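Likewise, "running the NAS storage stack directly on the storage boxes" in point 3 is nothing exotic: a kernel NFS export plus a Samba share on the same machine that owns the disks. The paths, network range, and group name below are hypothetical placeholders.

    # /etc/exports -- kernel NFS export served straight off the storage box
    /export/projects  10.0.0.0/24(rw,sync,no_subtree_check)

    # /etc/samba/smb.conf -- matching CIFS share for the Windows clients
    [projects]
        path = /export/projects
        read only = no
        valid users = @staff

Run exportfs -ra to publish the NFS export and reload smbd, and the clients talk to the box that actually has the spindles, with no iSCSI hop and no virtual machine in the data path.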
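And the "poking bytes into headers on disk drive blocks" in point 4 is less mysterious than it sounds; most of the battle is being able to read the on-disk metadata in the first place. The device names below are placeholders, but this is the kind of inspection I mean:

    # Dump the ext4 superblock, which lives 1024 bytes into the partition
    dd if=/dev/sdb1 bs=1024 skip=1 count=1 2>/dev/null | hexdump -C | less

    # Or let e2fsprogs decode it for you
    dumpe2fs -h /dev/sdb1

    # Peek at the LVM label and metadata text at the start of a physical volume
    dd if=/dev/sdb2 bs=512 count=16 2>/dev/null | strings | less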
Note that this is *not* a slam on OpenStack as a technology, nor am I saying that you should not use one of the OpenStack cloud providers such as RackSpace or HP. They have massive redundancy in their OpenStack deployments and people on staff with the expertise to manage them, and they do not have to deal with legacy infrastructure requirements such as our ESXi servers with their associated Windows payloads. They are also built around a totally different workload. Our in-house workload is primarily NAS storage for workstations, and our compute workload is a small number of virtualized test and build servers for our software in a variety of environments, plus a handful of infrastructure servers for things like DNS. What OpenStack mostly gives you is the ability to manage massive numbers of storage servers, compute servers, and virtual machines on those compute servers -- none of which describes our local workload.

The workload that RackSpace etc. are supporting is mostly Big Data and Big Compute in the cloud, or web server farms in the cloud. All of that has far larger space and power requirements than our little two-rack data center can ever provide, and the reality is that we simply use their infrastructure when we have those requirements rather than attempt to replicate it in-house. It simply isn't reasonable for a small business to try to replicate RackSpace or Amazon AWS in-house: we don't have the space and power for the massive amount of infrastructure they use to achieve redundancy and reliability, our local workload doesn't require it, and we don't have the in-house expertise. In the end, it's a case of using the appropriate technology for the appropriate task -- and for what I'm trying to achieve with the local infrastructure of a small business, NAS-based Linux storage was more appropriate than shoe-horning our workload into an infrastructure that would give us no more capability for our needs but would cost us in power, space, performance, and maintainability.

-ELG
