
Sunday, September 27, 2015

SSD: This changes everything

So someone commented on my last post, where I predicted that providing block storage to VMs and object storage for apps was going to be the future of storage, and pointed out some of the other ramifications of SSD. To wit: because SSD removes a lot of the I/O restrictions that have held back applications in the past, we are now at the point where the CPU, in many cases, is the restriction. This is especially true since Moore's Law has seemingly gone AWOL. The Westmere Xeon processors in my NAS box on the file cabinet beside my desk aren't much slower than the latest Ivy Bridge Xeon processors. The slight bump in CPU speed is far exceeded by the enormous bump in IOPS that comes with replacing rotational storage with SSDs.

I have seen that personally, watching a Grails application max out eight CPU cores while barely budging the I/O meter on a database server running off of SSDs. What that implies is that the days of simply throwing CPU at inefficient frameworks like Grails are numbered. In the future, efficient algorithms and languages are going to come back into fashion to make use of all this fast storage that is taking over the world.

But that's not what excites me about SSDs. That's just a shuffling of priorities. What excites me about SSDs is that they free us from the tyranny of the elevator. The elevator is the requirement that we sweep the disk drive heads from bottom to top, then from top to bottom, in order to optimize reads. This in turn puts some severe restrictions on how we lay out block storage -- the data must be laid out contiguously, so that filesystems layered on top of the block storage can properly schedule I/O out of their buffers to satisfy the elevator. This in turn means we're stuck with the RAID write hole unless we have battery-backed cache -- we can't do COW RAID stripe block replacement (that is, write the altered blocks of a RAID stripe at some new location on the device, then alter a stripe map table to point at those new locations and add the old locations to a free list), because a filesystem on top of the block device would not be able to schedule the elevator properly. The performance of the block storage system would fall over. That is why traditional iSCSI/Fibre Channel vendors present contiguous LUNs to their clients.
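
To make the mechanism concrete, here is a minimal Python sketch of the COW stripe-replacement idea described above. All of the names are hypothetical, invented for illustration -- this is not any shipping array's design -- but it shows why the trick closes the write hole: the map entry flips only after the new stripe is fully written.

```python
# Minimal sketch of COW stripe replacement (illustrative names only).

class CowStripeMap:
    """Maps logical stripe numbers to physical stripe slots on the device."""

    def __init__(self, num_stripes, num_physical_slots):
        # Initially, logical stripe i lives at physical slot i.
        self.table = {i: i for i in range(num_stripes)}
        # Slots beyond the initial mapping start out free.
        self.free_list = list(range(num_stripes, num_physical_slots))

    def replace_stripe(self, logical, write_stripe_at):
        """Write a modified stripe at a new location, then flip the map.

        write_stripe_at(slot) persists the new data + parity at the given
        physical slot as one full-stripe write (no read-modify-write).
        Because the map entry is updated only after the new stripe is
        fully on media, a crash mid-write leaves the old, consistent
        stripe in place -- no RAID write hole, no battery-backed cache.
        """
        new_slot = self.free_list.pop(0)   # pick a free physical location
        write_stripe_at(new_slot)          # full-stripe write to new slot
        old_slot = self.table[logical]
        self.table[logical] = new_slot     # atomic map flip (journaled in practice)
        self.free_list.append(old_slot)    # old location returns to the free list
```

It also shows why rotational storage couldn't afford this: after a few thousand calls to replace_stripe, logically adjacent stripes are physically scattered all over the device, which is fatal for an elevator-scheduled read stream and a non-event for SSD reads.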

As a result, when we've tried to do COW in the past, we've done it at the filesystem level so that the filesystem could properly schedule the elevator. Thus ZFS and BTRFS. They manage their own redundancy rather than using RAID at the block layer to handle it, and ideally want to manage the block devices directly. Unfortunately that really doesn't map well to a block storage back end that is based on LUNs, and furthermore, doesn't map well to virtual machine block devices represented as files on the LUN -- the virtual machines all have their own elevators doing what they think are sequential ordered writes, but the COW filesystem is writing at random places, so read performance inside the virtual machines becomes garbage. Thus VMware's VMFS, an extent-based clustered filesystem that, again, due to the tyranny of the elevator, keeps the blocks of a virtual machine's virtual disk file largely contiguous on the underlying block storage so that the individual virtual machines' elevators can schedule properly.

So VMFS talking to clustered block storage is one way of handling things, but then you run into limits on the number of servers that can talk to a single LUN. That in turn makes things difficult to manage, because you end up with hundreds of LUNs for hundreds of physical compute servers and have to schedule the LUNs so they're only active on the compute servers that have virtual machines on that specific LUN (in order to avoid hitting the limit on the number of servers allowed to access a single LUN). What is needed is the ability to allocate block storage on the back end on a per-virtual-machine basis, with the same capabilities on that back end that VMFS gives us on a single LUN -- the ability to do snapshots, the ability to do sparse LUNs, the ability to copy snapshots as new volumes, and so forth -- and have it all managed by the cloud infrastructure software. This was difficult back in the days of rotational storage, when we were slaves of the elevator and had to make sure that all this storage ended up contiguous. But now we don't -- the writes still want to be contiguous, due to the limitations of SSD, but reads don't. And it's the reads that forced the elevator -- scheduling contiguous streams of writes (from multiple virtual machines / multiple files on those virtual machines) has always been easy.
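
As a sketch of what that buys us once the elevator no longer matters, here is a toy Python model of per-virtual-machine volumes: sparse allocation, snapshots, and snapshot-as-new-volume cloning all fall out of a block map plus reference counts. The classes and names are mine, for illustration only, not any vendor's API.

```python
# Toy per-VM volume back end with sparse allocation and COW snapshots.

class VolumeStore:
    """Shared pool of physical blocks with reference counts."""
    def __init__(self):
        self.blocks = {}      # physical block id -> data
        self.refcount = {}    # physical block id -> owners (volumes/snapshots)
        self.next_id = 0

    def alloc(self, data):
        pid = self.next_id
        self.next_id += 1
        self.blocks[pid] = data
        self.refcount[pid] = 1
        return pid

class Volume:
    """Sparse volume: only blocks that have been written consume space."""
    def __init__(self, store, size_blocks, table=None):
        self.store, self.size = store, size_blocks
        self.table = dict(table or {})   # logical block -> physical block

    def write(self, lba, data):
        old = self.table.get(lba)
        if old is not None and self.store.refcount[old] == 1:
            self.store.blocks[old] = data        # sole owner: overwrite
            return
        if old is not None:                      # shared with a snapshot: COW
            self.store.refcount[old] -= 1
        self.table[lba] = self.store.alloc(data)

    def read(self, lba):
        pid = self.table.get(lba)
        return self.store.blocks[pid] if pid is not None else b"\0" * 4096

    def snapshot(self):
        """Cheap snapshot: share every block and bump its refcount."""
        for pid in self.table.values():
            self.store.refcount[pid] += 1
        return Volume(self.store, self.size, self.table)
```

Note that snapshot() is also the clone operation: a snapshot copied out as a new volume is just another Volume sharing the same reference-counted blocks until its writes diverge.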

I suspect this difficulty in managing VMFS on top of block storage LUNs for large numbers of ESXi compute servers is why Tintri decided to write their own extent-based filesystem and serve it to ESXi boxes as an NFS datastore rather than as block storage LUNs. NFS doesn't have the limits on the number of computers that can connect. But I'm not convinced that, going forward, this is going to be the way to do things. vSphere is a mature product that has likely reached the limits of its penetration. New startups today are raised in the cloud, primarily on Amazon's cloud, and they want a degree of flexibility in spinning virtual machines up and down that makes life difficult with a product that has license limits. They want to be able to spin up entire test constellations of servers to run multi-day tests on large data sets, then destroy them with a keystroke. They can do this with Amazon's cloud. They want to be able to do this on their local clouds too. The future is likely to be based on the KVM/QEMU hypervisor and virtualization layer, which can use NFS datastores but already has the ability to present an iSCSI LUN to a virtual machine as a block device. Add in some local SSD caching at the hypervisor level to speed up writes (as I explained last month), and you have both the flexibility of the cloud and the speed of SSD. You have the future -- a future that few storage vendors today seem to see, but one that the block storage vendors in particular are well equipped to capture if they're willing and able to pivot.
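
For what it's worth, QEMU can already do the direct-attachment half of this today via its libiscsi block driver (assuming your QEMU was built with libiscsi support). A minimal sketch, with placeholder portal address, IQN, and LUN:

```python
# Sketch: handing a per-VM iSCSI LUN straight to a KVM/QEMU guest as a
# block device. Requires QEMU built with libiscsi; all addresses, IQNs,
# and LUN numbers below are placeholders.
import subprocess

def launch_vm(name, portal, target_iqn, lun, memory_mb=2048):
    # iscsi://<portal>/<target-iqn>/<lun> is QEMU's URL syntax for a
    # direct iSCSI attachment -- no filesystem or VMFS layer in between.
    drive = f"file=iscsi://{portal}/{target_iqn}/{lun},format=raw,if=virtio"
    cmd = [
        "qemu-system-x86_64",
        "-name", name,
        "-enable-kvm",
        "-m", str(memory_mb),
        "-drive", drive,
    ]
    return subprocess.Popen(cmd)

# Hypothetical usage, one LUN per VM, allocated by the cloud layer:
# launch_vm("build-07", "10.0.0.5", "iqn.2015-09.com.example:vols.build-07", 0)
```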

Finally, there is a question as to whether storage and compute should be separate things altogether. Why not have compute in the same box as your storage? There are three problems with that, though: 1) you want to upgrade compute capability to faster processors on a regular basis without disrupting your data storage; 2) the density of compute servers is much higher than the density of data servers, i.e., you can put four compute blades into the same 2U space as a 24-bay data server, and as pointed out above, compute power is now going to be the limiting factor for many applications, not IOPS; and 3) you want the operational capability to add more compute servers as needed. When our team used up the full capacity of our compute servers, I just added another compute server -- I had plenty of storage. Because the demand for compute and memory just keeps going up as our team has more combinations of customer hardware and software to test, it's likely I'm going to continue to have to scale compute servers far more often than storage servers.

So this has gone on much too long, but the last thing to cover is this: will storage boxes go the way of the dodo, replaced by software-defined solutions like Ceph running on large numbers of standard Linux storage servers serving individual disks as JBODs? It's possible, I suppose -- but it seems unlikely, due to the latency of having to locate disk blocks scattered across a network. I do believe that commodity hardware is going to win everything except the high-end big-iron database business in the end, because the performance of commodity hardware has risen to the point where it's pointless to design your own hardware rather than purchase it off the shelf from a vendor like Supermicro. But there is still going to be a need for a storage stack tied to that hardware, because pure software-defined solutions are unable to do rudimentary things like using SES to blink the LED of a disk bay whose SSD has failed. In the end, providing an iSCSI LUN directly to a virtual machine requires both a software side that is clearly software-defined and a hardware side where the hardware is managed by the solution. This in turn implies that we'll continue to have storage vendors shipping storage boxes in the future -- albeit storage boxes that will incorporate increasingly large amounts of software that runs on infrastructure servers to provide important functions like, e.g., spinning up a virtual machine that has a volume attached with a given size and IOPS guarantee.
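
As one example of that hardware side: blinking a failed bay's locate LED over SES is typically done with something like sg_ses from the sg3_utils package. Here is a hedged Python wrapper -- the enclosure device path and slot index are placeholders, and the mapping from a failed SSD to its slot is assumed to come from the storage stack's own inventory.

```python
# Sketch of hardware-side plumbing a pure software-defined layer lacks:
# toggling a drive bay's locate/identify LED via SES, by shelling out to
# sg_ses (sg3_utils). Device path and slot index are placeholders.
import subprocess

def blink_failed_slot(enclosure_dev, slot_index, on=True):
    # --set=ident turns on the identify LED for the given enclosure
    # element; --clear=ident turns it back off.
    action = "--set=ident" if on else "--clear=ident"
    subprocess.run(
        ["sg_ses", f"--index={slot_index}", action, enclosure_dev],
        check=True,
    )

# Hypothetical usage once monitoring reports the SSD in slot 7 failed:
# blink_failed_slot("/dev/sg3", 7)
```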

-ELG

Thursday, April 25, 2013

Irresponsible

I must admit that I have a low opinion of journalists, tech journalists in particular. I've been interviewed several times over the years and only once has the result been accurate. In all the other cases, what I said was spun to fit the journalist's preconceived notion of what the story should be, and to bleep with the truth.

What I cannot understand is why, if a tech journalist cannot interview the people in the know because they had to sign an NDA in order to obtain certain assets for a specified price, said journalist would go ahead and publish a story based entirely upon speculation and a single source that may or may not know the details of whatever legal agreements were signed. It's not professional, it's not ethical, and it's not right. But it's the way tech "journalism" is done here in Silicon Valley. I guess making a living by being unprofessional and unethical doesn't bother some people. So it goes.

-ELG

Tuesday, November 2, 2010

Microsoft in a nutshell

It's no secret that Microsoft is a company in trouble. At one time they had a significant portion of the smartphone market; now they're an also-ran with single-digit market share. Their attempt to buy their way into the gaming console market has generated some market share, but also significant losses. Their Kin phone experiment -- the so-called "Zune phone" -- lasted only two months before ignoble abandonment. The only things they have that make money right now are their core Windows and Office franchises -- the entire rest of the company is one big black hole of suck, either technologically, financially, or both. And while their market share in desktop operating systems is secure for the foreseeable future, with no viable competitor anywhere in sight (don't even mention Linux unless you want to cause gales of laughter; Linux on the desktop is a mess), Office faces a threat from OpenOffice. Plus, their very profitable Windows Server franchise, which accounts for a small percentage of their unit sales but a large percentage of their revenue, is steadily eroding as it becomes clear to almost everyone who isn't tied to Microsoft Exchange that Linux rules the world. Amazon EC2 runs on Linux, not Windows -- as does every other cloud play on the Internet. 'Nuff said.

Today something happened which epitomized this suck. I opened up an email in Microsoft Hotmail. At the top of the email, in red, was the following message: "This message looks very suspicious to our SmartScreen filters, so we've blocked attachments, pictures, and links for your safety."

The title of the email: "TechNet Subscriber News for November".
The sender of the email: technote@microsoft.com

Siiiiigh... even their own spam filter thinks they suck.

-ELG

Saturday, July 31, 2010

The world's most reluctant Linux advocate

In the fall of 1995, I had successfully brought to completion the project to comply with new federal and state reporting standards for school discipline for the consortium of school districts that our consulting firm served, and was busy cleaning up the master student demographics suite to properly incorporate the new discipline screens rather than have them be a stand-alone subsystem reaching into the student database. It was a hard slog -- the code was a mess. The guy who had written it, whom I had replaced, was a math guy, not a computer guy; he had no inkling of simple things like comments or code reuse, and the product life cycle was a mystery to him. His notion of code reuse was to cut and paste the same code in multiple places, and there were some significant bugs that I was cleaning up. The only good news was that the code was heavily componentized -- it was basically a hundred small programs tied together by a menu system and a common database, though some of the programs seemed to be bigger programs because they forked out to other programs to provide more screens to school secretaries. All of this was running on SCO Xenix or Unix, depending upon the school and when it had bought our software suite.

So during all this my boss calls me into his office and says, "one of our districts has asked if we're investigating Linux as a possible way to bring down costs for school districts. What do you think?" Now, one thing to remember is that my boss was a big old ex-IBM bull, a no-BS kind of guy, and we got on about like you'd expect from two people with strong opinions but mutual respect. "Linux is freeware downloaded off the Internet," I replied. "Don't we have enough trouble maintaining our own code right now without having to maintain some freeware downloaded off the Internet too?" And that was pretty much that. Still, I thought, "hmm, I have that new Windows 95 machine at home, I bet it'd run Linux." I'm a geek, and what geek wouldn't want to play with a free operating system?

So, after work I headed off to the local Barnes & Noble to grab a book about Linux. The one I bought had something called "Slackware 95" on CD in the back. I took it home with me that night, installed it on a partition on my home computer, and figured out how to get "X" running and... well, it worked okay. fvwm was ugly and crude and limited, and there wasn't much desktop software -- no real word processor -- but I knew LaTeX and it had LaTeX, so that was good. It drove my laser printer fine too. So the next day at work, I went ahead and installed it on our eval machine, where we'd also installed Windows 95 to see what we could do about porting our UI to it. I compiled our source tree on Linux and... hmm, it just compiles, just like it compiles on SCO Unix? And it actually ran!

So I started developing on Linux instead of on SCO Unix, mostly because it was much easier to get Emacs up and going and I prefer Emacs to 'vi' (let the flame wars begin!), not to mention that the GNU tool suite is a lot nicer than the old-school Unix tools. When I finished a module and did initial smoke testing I'd then copy the code over to SCO Unix and compile again there. From time to time I'd also go into the menu system and create a Linux version of one of the SCO Unix system administration programs that we'd accumulated over the years to allow school technology coordinators to manage the system. But I still hadn't considered actually deploying Linux at schools. While it seemed the technology held up okay -- our software actually ran faster on Linux than on SCO Unix -- the business objections were formidable. "We don't want to trust our critical student data to some hackerware downloaded off the Internet!" was the least of it.

That changed in the spring of 1996, however, when Red Hat Software came out with version 3.0.3 of Red Hat Linux, which they marketed as "Linux for business". It came in a box! With a manual! From a real company! Complete with a shadow-man logo in a red hat, marching off to do business with his briefcase! For the first time, the possibility of actually using Linux as part of our business was not ridiculous. The only thing I really didn't like about 3.0.3 was that all of the system administration tools were Tcl/Tk GUI scripts, but given that I'd already written a number of menu-based system administration scripts, that didn't seem a fatal objection. I switched from Slackware to Red Hat 3.0.3, and kept on developing under Linux rather than SCO Unix.

So, early June 1996 came along, and we got another school district as a customer. My boss called me into his office again. "What would it take to port our software to Linux?" he asked. "I pretty much already have it ported," I said. "Maybe two weeks to do a thorough job of testing and filling in any system administration scripts that aren't yet rewritten, and it would be ready." "We have this new customer. With our winning bid we could make more money selling Linux rather than SCO Unix, and the OS isn't specified in the bid. Should we do SCO Unix or Linux?" "Well, Linux has some risks involved," I replied. "We still haven't tested it with real data. It should work, but there's no guarantee." He then said that we hadn't won the hardware bid, and there was no guarantee that the hardware would work with SCO. I then suggested a dual-OS strategy -- plan on using *either* of the operating systems, depending upon which one worked with the hardware when it came to us for us to install the OS and administrative software. Given the wholesale pricing I'd gotten from Red Hat Software, we could purchase an official boxed copy of Red Hat Linux 3.0.3 for each school to counter the "hackerware downloaded from the Internet!" objection, and its cost was basically lost in the noise compared to the significant cost of SCO Unix.

So the hardware came in, and the tape drive was supported by Linux, while it was not supported by SCO Unix. We had two options at that point -- delay the deployment until tape drives supported by SCO Unix could be procured, or deploy with Linux. We were scheduled in two weeks to have the machines at a high school gymnasium at the school district to train the school secretaries on how to use the software. It would take at least two weeks to argue with the school district and the hardware vendor about tape drives.

"We go Linux," I told my boss. And we did. I spent the next two weeks sweating the details, making sure everything worked, using real data from a real school district (with the student ID information masked out) to validate that all functions of the software itself did what they were supposed to do, going through all the management screens to make sure they worked properly with the hardware on the systems, and so forth. On the appointed day I drove the main Linux development machine to the school district myself, and stayed on hand while the secretaries all booted their machines, just in case something broke, and... it didn't. Everything Just Worked, without a hitch, all the demos went off as planned, and my inservice training on the discipline system went on as usual, the secretaries were quite attentive, laughed at the right points (i.e., when I produced an official state discipline form with a ridiculous discipline infraction for them to punch into their computers and made an offhand humorous comment about it), and... phew!

That's always the moment of truth: when the product hits the customer's hands. You either pass or fail at that point. I'm proud to say that we passed, and became one of the first of what eventually became a thundering storm of people migrating away from proprietary Unix systems to Linux. Over the next three years we transitioned all of our schools to Linux -- it simply made things easier to maintain only one set of administrative tools, and it wasn't as if it cost any money; we usually did it when they were upgrading old hardware, so we were getting paid for that service and Linux came along for the ride. And it Just Worked. What more can you say?

I suppose there are a couple of lessons there. First, don't dismiss Open Source software just because it's "some hackerware downloaded off the Internet." Second, don't use Open Source software just because it's Open Source if you can't make a business case for it in terms of risks vs. benefits. We couldn't make a business case for it in the fall of 1995; we simply did not have the engineering cycles to handle a transition to Linux given the state of Linux at that time, and the risks outweighed the possible benefits. By the summer of 1996, when the code base issues had been resolved, the primary customer objection to Linux had been answered, and the issues of hardware compatibility and profit became key, Linux simply Made Sense. It still wasn't the safe choice. But by then the risks were limited enough compared to the benefits to justify taking them.

-ELG

Thursday, July 15, 2010

The value of an education

For some reason Americans seem to believe education is something people receive. People go to college to "receive an education". This implies that students are simply receptacles. The professor opens up their heads and drops knowledge in, then sews them back up. This worries me when I'm looking at the quality of the young people entering the computer science field today, because while they've had exposure to a lot of technology, by and large it's been as users, not as participants. The actual technology is something they don't even think about -- it's transparent, just a part of their world, not something they actually see and think about.

The problem is that education isn't something you receive. Education is something you do. I graduated from a middle-tier university. Which means nothing at all, actually, because I knew my **** when I left there: I'd designed bit-slice CPUs and microcode for them, built hardware, and written programs in microprocessor assembly language *for fun*, while the guy across the street with the 4.0 GPA knew nothing except what was on the test -- I mean, he'd been writing software on Unix minicomputers for four years and he didn't even know what 'nroff' was, or that he was using Unix! So yeah, it's all about what use you make of the experience. I spent as much time in professors' offices talking about my latest projects as I spent studying, which hurt my GPA, but (shrug). I'm still employed in the computer field today. The guy across the street? Nope.

That is one reason why Open Source is exciting to me, and why people who have a background in the Open Source community interest me far more than people who have a 4.0 average from Big Name University. I'm looking for doers, not regurgitators. What gets software out the door isn't the ability to memorize what's going to be on tests; it's what, for lack of a better term, I call "get'r'done". The problem I see is that the technology has become so capable, so complex, so difficult to grasp that the number of people who could learn the basics on some simple technology like a Commodore 64 and then build up to writing significant Linux kernel subsystems has slowed to a dribble. Simple and relatively open technology like the Commodore 64, where you could grasp the entire design all by your lonesome (the programming manual came with a schematic of the computer in it!), simply doesn't exist anymore. For good reason in most cases -- today's computers have far better functionality -- but how are we going to get the people with the "big picture" today when there's no "little picture" like a Commodore 64 to build up from?

So anyhow: that's a problem. It's a problem I find with a lot of the younger software engineers. I've managed some very bright youngsters, but that lack of what I'll call big-picture thinking hinders them greatly. They simply don't understand why a busy loop waiting for input is not acceptable unless there is no alternative (and why, even then, it should have a timer to put the process to sleep between samples), or what the hardware looks like and how to program the front panel that's driven by a PIC processor. They're like ferrets -- it's all "oooh, shiny!" to them, with no rhyme or reason or understanding of what's actually happening under the surface. And I have no idea at all what's going to happen when all us older farts get put out to pasture, either via corporate executives calling us "too old and expensive" or simply getting too tired and retiring... there just aren't enough of the young folks who have the slightest clue. Not that we were the majority even when I was 21, but at least there was a sizable number who *did* have a clue then... and you can find a lot of their names in the early Linux kernel patch sets. But even the Linux kernel crowd is graying today... and what happens when we're no longer around, given that the number of young people today who understand technology at the same comprehensive level we do -- or that we did at age 21 -- is essentially zero?

-ELG

Tuesday, February 23, 2010

Standards and rent seeking behavior

Rent-seeking behavior is defined by economists as behavior intended to gain competitive advantage by manipulating the environment to your benefit, rather than by profiting from the production of goods and services. An example would be if a company X managed to get a law passed specifying that all goods purchased by the government must comply with ISO standard 3.052345.32431, where company X happens to hold a critical patent on the technology in that ISO standard. In this case company X is profiting not because it produced goods and services but, rather, because it manipulated the environment (got a law passed) so that everybody wanting to do business with the government must pay rent (patent fees) to company X.

William Vambenepe complains that cloud standards are being created in a secretive manner. He complains that this means that those of us actually implementing cloud computing software are being locked out of the process. And this is true. Yet this is not unusual. Why? Well, because there are certain large corporations who, for some reason, still believe that rent-seeking behavior is useful when it comes to the standards process -- i.e., that, as with getting a law passed dictating that everybody pay rent to them, they can set a standard that dictates that everybody pays rent to them.

Let me explain: the more complex the standard (and the more BigCorp-patented technologies included in it, of course!), the more resources it will take to fully implement it. The goal is to make the resources and patent licenses needed to fully implement the standard so onerously huge that only large organizations will have the resources to do so, meaning they are the only ones who are "standards-compliant" and they can slam any potential upstart competitors as not being "standards-compliant". Not going to name names here, but I'll just point out that simpler standards tend to drive out the more complex standards, thereby leaving the big companies high and dry with a product that nobody wants to buy. Has anybody here used the complex X.25 protocol lately? What, you're using the simpler TCP/IP protocol instead? Exactly.

Which points out why rent-seeking behavior is invariably self-defeating when it comes to standards. Unlike compliance with the law, compliance with standards is generally voluntary. If a standard is too complex or too expensive to implement, people simply won't use it, and a "standard" that nobody uses -- or that only customers of a few large corporations use -- is hardly a real standard. And keeping the standards discussions secretive is hardly in anybody's best interest either: it means that real problems with a "standard" will be overlooked until it is actually published, at which point all the effort used to produce it is wasted, because nobody will create products that implement it (thus rendering it *not* a standard). Yet we still see this sort of rent-seeking behavior on the part of certain large corporations that seem convinced it actually works. Inexplicable...

-ELG