So anyhow, I was scoping out the tape format that we were going to use. I had a pile of tape hardware that filled an entire lab, and a pile of tape drive vendor's SCSI programming manuals that was stacked about four feet high on my desk. We had decided that we were going to keep the tape catalog in a MySQL or PostGreSQL database, but we also wanted a copy of the tape catalog on the tape itself so that tapes could be quickly re-imported into a new install of the software in the event, say, of the building burning down and the off-site backups needing to be pulled in on a new server. Without an on-tape copy of the catalog importing a tape would be a slow and painful process of reading the entire tape from front to back. The question was, where would we put the tape catalog? My boss said, "what about a partition at the beginning of the tape?" That seemed rather interesting, because usually once you start writing a tape, all you can do is either overwrite it, or append to the end of it -- you can't come back to the start and start writing again (this is because of the way the tape drives' internal compression works, amongst other things). Could we in fact do this?
So I investigated that, and found that the tape partitioning ioctl in Linux only worked with Seagate 4mm DAT tape drives. It didn't work with Exabyte drives, or Tandberg drives, or that brand new HP LTO drive, just with Seagate drives. I investigated the CDB that was being sent by the tape driver and looked at the Seagate manual, Exabyte manual, and Tandberg manual, as well as at the SCSI SMC standards document itself, and realized: That "standard" had enough holes to drive a bus through. There were basically two different ways you could interpret it, and Seagate interpreted it one way, and everybody else interpreted another way. I came up with a way of detecting in real time which partition CDB format that the tape drive used based on reading the partition table via the mode page for that, and modified the tape driver to do it correctly for every tape drive that implemented tape partitioning support -- every single SCSI drive, other than Quantum's DLT. Which, uhm, was the most popular enterprise drive. Oh well. So we didn't use tape partitions, instead we simply set a tape mark at the end of a data stream and appended the catalog to the end of the tape instead.
So anyhow, I contacted the maintainer of the Linux tape driver with my patch and an explanation of what it was doing and why the old way only worked with DDS and this would work for (long list), and he looked at it and he didn't like it, so he re-wrote it a slightly more elegant way and sent it back to me. I tried it out, it didn't work on one of the drives, I fixed one thing he'd slightly mis-interpreted and sent it back to him. I think we did a couple of iterations of this, and finally it passed both my tests for whether it actually worked on my tape drives, and his taste for what he wanted his tape driver code to look like, and it went into the next version of the Linux kernel that was released.
The other issue with tape format was that we wanted to be able to quickly retrieve a single file. It turned out that even the primitive DLT drives implemented both block location inquiry and block seek. Our archive format already aligned headers at the start of tape blocks so you could start seeking at the start of a tape block and locate the file, this wasn't a gzip'ed stream, we did things a bit smarter than that (we did something similar to what ZFS does when you turn on ZFS compression). So I hacked our data engine to issue a block position ioctl at the start of each file (and also print the file name and position) and hacked it to accept a block location and issue a block seek ioctl to find a file, as a proof of concept to verify that this would work. This worked fine on Linux and worked fine on SGI Irix. It did not work at all on Solaris -- Sun hadn't implemented *any* of the modern features of tape drives in their tape driver. We hacked our data engine to use the Solaris sgen (SCSI Generic) device directly (i.e. issue raw SCSI CDB's to read, write, seek, etc.) rather than use the tape driver, but never were satisfied with that and put that aside.
Then there was FreeBSD. I liked FreeBSD. I ran FreeBSD on my own server at home. I tried this on FreeBSD and writing tapes was very... very... slow when I had the 'print position' code enabled. Ridiculously slow. So I looked at the FreeBSD tape driver and realized what was happening. There was a call to cache flush right before issuing the position request CDB. This caused every tape drive in our lab to basically come to a screaming halt every time the ioctl happened. I commented out that single line of code, and everything worked at full speed on every single enterprise-class SCSI tape drive that was currently shipping to paying customers anywhere in the Western world. So I contacted the FreeBSD tape driver maintainer to remove this unnecessary cache flush, and...
"But the SCSI standard says that the reported tape position may not be accurate unless you issue the cache flush."
"We've tested every single enterprise drive currently shipping on the planet. They all stall and become unusably slow when you issue a cache flush anywhere but at the end of a dataset, and they all report the position accurately without the cache flush, we can go back to the reported position and always find the data we expected there."
"But... the SCSI standard..."
"Have you found any drive that actually complied with the SCSI standard? The fact is that the cache flush makes the ioctl useless because it makes it too slow, and every tape vendor seems to ignore that little note in the SCSI standard and just do what we expect." (Followed with performance data on actual drives with / without cache flush and noting that this worked properly *without* the cache flush on both Linux and Irix, neither of which had the cache flush in their tape position ioctl).
It went on and on and on. I was an interloper from outside their little community, so I had to be put into my place, despite the fact that all I wanted was for the stupid ioctl to Just Work, as it did on Linux and Irix, so that we could support FreeBSD as a Tier 1 platform rather than relegated to a Tier 2 platform (one which could not host tape drives but merely be backed up to another server that was hosting tape drives). Eventually after much grumbling about "well okay, it is wrong according to the SCSI standard but if that's how the tape drives actually work, we'll accept this change" they queued it up for a future release of FreeBSD, but by that time it was rather too late in the process -- we'd moved on to actually implementing our product, and FreeBSD wasn't scoped because there was no version of FreeBSD currently shipping that had the tape driver functionality we needed for our product.
Whenever I think about that difference between the Linux and FreeBSD communities in 2000, the reason Linux rules the world and FreeBSD has been relegated to a side note becomes a bit clearer. At the beginning, FreeBSD was clearly technically superior to Linux. For example, the FreeBSD scheduler could handle heavy loads without falling over, and the FreeBSD memory management system could handle heavy memory pressure without falling over, and neither was true of Linux in 2000. But the deal is, the world is not a meritocracy. The majority of the world will accept "Good Enough" if it means they don't have to put up with a lot of friction. The FreeBSD community in 2000, reeling from loss after loss to Linux with their biggest deployment (Hotmail) being converted to lowly Windows, was defensive and paranoid and not very pleasant to deal with (it's considerably better now BTW). Experienced IT people going into that environment and getting that kind of attitude simply rolled their eyes and went with that Linux thing, which was Good Enough for a lot of things even if it wasn't as good as FreeBSD.
So what are my takeways? Well, if you're involved in an Open Source community, whether as a code contributor, supporter, or just general advocate, here's my takeaways:
- Stowe the attitude. If you appear defensive and hostile, experienced IT folks simply won't use your stuff.
- If someone contacts you with a problem, and a proposed solution, don't dismiss them as an idiot until you actually evaluate the technical content of their message. If you don't agree with them, give technical reasons why their solution doesn't work, don't make personal attacks upon the person calling them an idiot or stupid or incompetent or whatever.
- If it's clear that someone is entirely off base, either reply "Thank you for your contribution, we will consider it" or simply hit the DEL key.
- Police your fanboys if they're trashing your mailing list or forums giving attitude to folks who have problems, suggestions, or contributions. One of the things that hindered uptake of Linux in that era was the Linux fanboyz with their "Linux rulez Windows drulez" attitude. IT people want to solve problems, they're not interested in penis measuring contests, and they're going to use the solution to their problem that comes with the least baggage. It doesn't matter whether your project really *is* the best or not, I repeat once more: The real world is not a meritocracy. If some other product is Good Enough, Just Works, and comes without the baggage, guess whose product is going to win?
And that's my takeaway. Sorry about the long history lesson, but there was a purpose in it -- and hopefully you've figured that out by now. So it goes.
-ELG
No comments:
Post a Comment