Friday, November 9, 2012

How to upgrade to a bigger SSD

Okay, on a 17" HP Envy, here's how to upgrade from a small SSD to a new SSD:
  1. Make a Windows system repair disk via the control panel's Backup/Restore item.
  2. Put the new SSD into an external USB case
  3. Boot into a Linux live CD that supports your hardware, and dd the internal drive to the external drive.
  4. Unplug the external drive, reboot back into Windows.
  5. Now extend the partitions on the new drive so you have your new C:. Plug in the external drive, run EaseUS Partition Master, move your rescue and tools partitions to the end of the drive, and resize your C: partition on the new drive.
  6. Shutdown the system
  7. Remove the old SSD from the computer, replace it with the new SSD.
  8. Boot into the repair disk. It'll then whine that your boot needs repairing. Let it.
  9. Boot into the system, and then go to the control panel's 'System' item. Select "System Protection", then scroll down to the end of the list and you'll see something that says "C: (Unavailable)" that says System Protection is enabled. Click on it, then click "Configure", then "Disable".
  10. Click your new "C:" partition, select 'Enable', then whatever percentage you want to enable for System Restore snapshots.
There you go. All done.
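The dd clone in step 3 looks something like the sketch below. The device names are assumptions -- check yours with lsblk or fdisk -l before copying anything -- so the runnable part of the demo clones between throwaway image files instead of real disks:

```shell
# On real hardware, step 3 is roughly (device names are assumptions!):
#   dd if=/dev/sda of=/dev/sdb bs=4M conv=noerror
# Demonstrated safely here with throwaway image files:
dd if=/dev/urandom of=/tmp/old_ssd.img bs=1M count=4 2>/dev/null
dd if=/tmp/old_ssd.img of=/tmp/new_ssd.img bs=1M 2>/dev/null
# Verify the clone is bit-for-bit identical:
cmp -s /tmp/old_ssd.img /tmp/new_ssd.img && echo "clone verified"
```

The bs=4M just speeds things up; dd copies the partition table and every partition verbatim, which is why the rescue and tools partitions survive the move and only need relocating afterwards.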

Wednesday, November 7, 2012

UDP or TCP for lossy networks?

So I was discussing video streams today and how typical video streaming protocols like RTP react to loss on the network -- i.e., by dropping frames -- and someone made the statement, "TCP would be better for lossy networks." He was wrong, of course. But I hear you saying, "but... but... TCP recovers from data loss!" Well, yes... if you don't care about timing or throughput, only about reliability. But if you're talking video streaming, TCP could result in the loss of not just a few frames but of seconds' worth of frames, or could even cause video cameras, switches, or servers to crash.

At which point you're saying, "say wha?!" But here's the deal. If you lose a video packet with UDP, you lose a frame. RTP handles reassembling the packets into frames; if the next frame has come in and there's no packet to fill in slot X in the previous frame, RTP just throws that frame away. If you're getting 28 frames per second and you lose one frame out of that, big deal. Just replay the previous frame on playback; nobody will notice.

Now let's say you're using TCP. A missing packet occurs. TCP is expecting packet N... and doesn't get it. The TCP window goes SLAM at that point -- 4 megabytes of data back up behind the missing packet as TCP tries to fill its window around the missing piece, and TCP flow control assures that no more data is accepted until the 1.5 second timeout causes a retransmit on the sender side. What this does is add 1.5 seconds' worth of delay. Given 1.5 seconds of delay, a 4 megabyte window, and a gigabit Ethernet line, one lost packet every 1.5 seconds basically turns your gigabit Ethernet line into a 21 megabit/sec Ethernet line. Congratulations, you just returned to 1985. And if you have 200 megabits/sec of incoming data, all that data piles up in the network stack somewhere between you and the cameras, and you're in big trouble.
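The arithmetic behind that number is worth spelling out: at most one 4-megabyte window gets delivered per 1.5-second stall, no matter how fat the pipe underneath is:

```shell
# One 4 MB window (= 4 * 8 megabits) delivered per 1.5 s retransmit timeout:
awk 'BEGIN { printf "%.1f Mbit/s\n", (4 * 8) / 1.5 }'
# prints "21.3 Mbit/s"
```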

For streaming data, UDP thus handles a lossy network much more gracefully than TCP does. At least, that's true if you can tolerate some data loss thanks to redundancy in the data stream. I.e., frame 1000 in a video stream is very similar to frame 999 and frame 1001, so if frame 1000 goes missing it can be reconstructed as simply a duplicate of the preceding or following frame. So what if you *do* care about data loss?

The reality is that if you *do* care about data loss, you're much more likely to get good results when streaming data if you use out-of-band data recovery. That is, if you've tried filling your frame, you already have packets for a couple more frames incoming, and you're missing a couple of packets, request packet retransmission directly via a NAK rather than wait for the sender to time out waiting for an ACK. That's *one* reason why most reliable multicast protocols rely on recipients proactively sending a NAK rather than on the sender timing out waiting for an ACK before resending a packet. You can combine the two if you wish -- in this case, an ACK would let the sender know that all packets in the frame were received, so it doesn't need to retain the frame in order to potentially service a NAK -- if you really want both reliability and performance. But TCP doesn't do that. Which is why, on a lossy network, TCP is probably the worst protocol you could use for streaming data, and why the person who said "TCP is best for lossy networks" was wrong, at least for this application.


Monday, November 5, 2012

The importance of strong product management

There's a lot of folks who whine about Windows 8, "Why did Microsoft have to change the UI? I like the old one!" The thing is, the old one simply isn't working well for a lot of people anymore. Hard drives have gotten so big, and people have installed so many programs on their systems, that the Start menu has achieved a depth that nuclear submarines would envy. Because the population is aging and eye-hand coordination is declining, both seeing all the tiny print on that Start menu and navigating it through several levels of sub-menus has become increasingly hard for a large percentage of the population. And finally, the Start menu paradigm simply doesn't work for tablets. If eye-hand coordination diving through the menu is an issue with a mouse, with a tablet touchscreen it would simply be impossible.

In other words, the notion that the Windows 8 UI change is all about "marketing" is pretty much nonsense. It's been well known for quite some time that the Cairo user interface introduced with Windows 95 has reached its logical limits, and ideas for changing the UI to meet the new challenges of the 21st Century have been floating around inside Microsoft for years, if a look at the Microsoft Research web site is any guide. I'm sure that Marketing told engineering, "we need a UI that will be usable on a tablet! And oh, make it usable on a desktop too!", but at worst Marketing merely hurried what was already in progress, rather than being a direct cause of the changes in Windows 8. The writing was on the wall for the Cairo UI, and sooner or later it would have been consigned to the dustbin of history regardless of Marketing's frantic panic about tablets.

So unlike a lot of people, I'm not surprised at all that Windows 8 has a significant shift in UI functionality. What I *am* surprised at is that it was done so badly. Microsoft has a lot of good people, and Windows 8 has all the raw tools in it to be a great operating system. Yet there are some needless complexities in its operation that shouldn't be there, and some important functionality missing that should be there, such as iOS- or Android-style icon folders (without those, you're in endless sideways-scrolling territory to get all your most-used programs onto the start screen). So what gives?

In my opinion, the biggest issue with Windows 8 is caused by a clear failure of product management. Good product managers are hard to find because the job requires an understanding of customers at an intuitive level such that you can devise workable requirements to meet their needs, yet sufficient technical chops to understand what is doable and guide engineering toward producing the product that is going to meet those requirements. It also requires taste -- the ability to look at a product and say, "yes, that is tasteful and will please our customers" or look at a product and say "that is a pile of garbage, get it out of my sight until you do X, Y, and Z to it." Furthermore, product managers have to be empowered to make those sorts of judgments and have them stick. For better or for worse, Steve Jobs provides the template for what a strong product manager looks like -- opinionated, tasteful, with an intuitive understanding of the customer, with enough technical chops to understand what can be done, and power to make it stick.

Thing is, it's hard to find product managers like that because the geeks and nerds who typically run engineering departments wouldn't know good taste if it bit them on their bum, while the sales flunkies who typically run marketing departments wouldn't know technical chops if said chops bit off their ear. You almost need a Steve Jobs to do it. Unfortunately Microsoft doesn't appear to have a Steve Jobs to find good product managers, or if they do have good product managers, they haven't empowered said product managers to make critical decisions about the product. Which is a shame. Because Windows 8 has a lot of good ideas, and the underlying technology is good. It just fails because of a lack of good taste (and courage, but see my prior blog on that), not because of a lack of technical chops.

Which just goes to show that putting out a great product isn't a matter of having great technology. It has to be a team effort, and if you don't have that, what you'll get is either a product that doesn't meet the needs of the marketplace, or a product that's far less great than it should be. Something to think about, if you're thinking about forming or joining a new startup. Do you have the kind of team that it will take? Does the company you are thinking of joining have such a team? Important questions, yet pretty much every startup I've encountered is all about the technology, and the rest of what it takes to have a great product is completely ignored. Which is probably why so many startups fail. So it goes.


Thursday, November 1, 2012

Adding certificates for Windows 8 Mail

I run my own email server and of course SSL-encrypt both imap and smtp, but have a self-signed certificate, not a certification authority signed certificate. Outlook has no problem with that -- it whines about the certificate, but then gives me a dialog where I can import it. Once I import it, fine. But I don't have Outlook installed on my Windows 8 evaluation for a variety of reasons. So I tried Windows 8 Mail and rather than offer to import the certificates, I got a message that I needed to contact my system administrator to import some certificates. Erm, I *am* my system administrator! Hrm. So...

My email server is running Debian "squeeze" Linux with Exim4 as the smtp server and dovecot as the imap server. The first thing I needed to do was verify on the mail server that there were valid (self-signed) certificates for both exim4 and dovecot. This can be done with:

  • openssl x509 -in some.crt -text -noout
This will give you a bunch of information about the certificate, so you may wish to pipe it to 'less'.
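Since the immediate question is usually "has this certificate expired?", the `-dates` flag prints just the validity window. Here's a self-contained sketch that generates a throwaway self-signed certificate (the CN is made up for illustration) and then checks its dates:

```shell
# Generate a throwaway self-signed cert to inspect (CN is just an example):
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj "/CN=mail.example.com" \
    -keyout /tmp/demo.key -out /tmp/demo.crt 2>/dev/null
# Print only its validity window (notBefore / notAfter lines):
openssl x509 -in /tmp/demo.crt -noout -dates
```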

The exim4 certificate had expired, so I regenerated it with:

  • /usr/share/doc/exim4-base/examples/exim-gencert --force
So then I located the two certificates:
  • Dovecot - /etc/ssl/certs/dovecot.pem
  • Exim4 - /etc/exim4/exim.crt
I downloaded the two certificates to my Windows 8 system via sftp and renamed the dovecot certificate from dovecot.pem to dovecot.crt. The next thing I did was open the Microsoft Management Console by pressing Windows-R and typing 'mmc'. I then selected File, then "Add/Remove Snap-in", and added the Certificates snap-in.

Next, open up the Certificates tree until you see "Trusted Root Certification Authorities" and open it up to see "Certificates" underneath it. Right-click on Certificates and click "All Tasks" then "Import". Import your two certificates and there you are -- now all your self-signed certificates coming from your email server look as valid as any certificates, and are accepted by Windows 8 Mail just fine.

So, uhm... why does Microsoft make this so hard? I dunno, they're Microsoft, I guess. But the "ask your system administrator" bit is just BS, because Windows 8 Mail will *never* be used by anybody who actually has a system administrator other than themselves -- all businesses will be using Outlook as their email client for a number of reasons. Oh well, just another example of how Windows 8 is half-baked and characterized by an utter lack of understanding of, well, actual customers.


Tuesday, September 25, 2012

BTRFS vs ZFSonLinux: How do they compare?

  • Integration with Linux
    • ZFS: Not integrated. Has its own configuration database (not /etc/fstab), has its own boot order for mounting filesystems (not definable by you), cannot be told to bring a filesystem up after iSCSI comes up or down before iSCSI goes down.
    • BTRFS: It's just another Linux filesystem as far as the system is concerned. You bring a pool up by mounting it (preferably by label) in /etc/fstab and can define the mount order so it comes up after iSCSI.
  • Snapshots
    • ZFS: Full snapshot creation and removal capabilities, well exploited by the FreeBSD port 'zfs-periodic'. Snapshots appear in a special "dot" directory rather than cluttering up the main filesystem. This script is relatively easy to port to Linux.
    • BTRFS: Snapshots are created as "clones" of subvolumes, and destroyed as if they were subvolumes. They can be created either read-write or read-only.
  • RAID: Both of these use filesystem-level RAID where filesystem objects are stored redundantly, either as entire clones (RAID1) or, in the case of ZFS, via RAIDZ parity.
    • ZFS: RAID1 (mirroring) and RAIDZ (similar to RAID5, except that it never does partial-stripe writes because it uses a variable stripe size -- the size of an object is the size of a stripe). Note that due to ZFS's COW implementation, an update to a RAID stripe cannot be corrupted by a power loss halfway through the write (see: the RAID5 write hole) -- the old copy of the data (prior to the start of the write) is instead accessed when power comes back on.
    • BTRFS: RAID1 (mirroring). BTRFS currently has nothing like RAIDZ. Note that putting a BTRFS filesystem on top of a software mdadm RAID5 will not give you the same reliability and performance as RAIDZ, since you will still have the random write hit of partial-stripe writes and will still have the RAID5 write hole where, if a stripe update fails due to power loss halfway through the stripe write, the entire stripe is corrupted.
  • Portability
    • ZFS: A ZFS filesystem can be read / written on: Linux (via either ZFS/Fuse or ZFSonLinux), FreeBSD, OpenIndiana, and MacOS (via Zevo). Requires extra 3rd party software to be installed on Linux and MacOS, comes standard with FreeBSD and OpenIndiana.
    • BTRFS: Any recent Linux distribution (one with a 3.x vintage kernel) has BTRFS built in. Your BTRFS pools will be immediately available when you upgrade to a newer kernel or a newer Linux distribution, with no need to install any additional software. However, BTRFS doesn't run on any other OS.
  • Stability
    • On Linux, both BTRFS and ZFS are listed as "experimental". ZFSonLinux uses SEL (the Solaris Emulation Layer) as a "shim" between ZFS proper and Linux. Unfortunately this is sort of like nailing jello to a tree: while the underlying Linux block layer API hasn't changed in years, locking inside that block layer has been in constant turmoil ever since the 2.6.30 timeframe as the last vestiges of the Big Kernel Lock were ferreted out and sent to the great bit bucket in the sky. The end result is that code that *used* to work may or may not cause deadlocks or strange races that cause an oops with current Linux kernels -- *UNLESS* it was developed as part of that current Linux kernel, as BTRFS is, in which case the person who changes the locks is responsible for making sure that all other kernel modules that are part of the next kernel release change their locks to match.
    • Summary: On Linux, this is a tie. BTRFS is under rapid development. ZFS is attempting to nail jello to a tree from outside the Linux kernel. Using either system for production data on Linux is not recommended. If you want a production server running a production-quality modern snapshotting filesystem, use ZFS on FreeBSD.
Final summary:

If you must use Linux, and you must have a modern snapshotting filesystem, and you can live with a RAID1 limitation on data redundancy, I would strongly recommend going with BTRFS. The reason for this is that BTRFS is only going to get better on Linux, while ZFS is always going to be fighting the nail-jello-to-a-tree issue where Linux keeps changing underneath it and breaking things in weird ways. Unless ZFS is included as part of the Linux kernel -- and Oracle's lawyers will never allow changing the license to GPL in order to allow that -- there simply is no way ZFS will ever achieve stability except with specific kernel versions shipped with specific distributions. And even there I'm dubious.

If you need the stability of ZFS, I strongly recommend using FreeBSD and not using Linux. I have personal experience dealing with the issues that come with supporting an emulation layer on top of the Linux block layer, including dealing with some deadlocks and races caused by locking changes inside recent kernels that caused a six-week delay in the release of an important product, and I honestly cannot say that any current ZFSonLinux implementation will continue to work with the next kernel revision. I can reliably say that BTRFS will work with the next kernel revision. While production servers don't change kernel revisions often, only once every three or four years, if the next version of the server OS doesn't happen to be one that is well supported by ZFSonLinux's then-current SEL implementation, you have problems.

So: Linux -- BTRFS. If you need the functionality of ZFS -- FreeBSD. Enough said on that.
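As for the "just another Linux filesystem" point above: a BTRFS pool is mounted through /etc/fstab like anything else, so ordering it relative to iSCSI is one line of configuration. A hypothetical entry (the label and mountpoint are invented for illustration) might look like:

```
# /etc/fstab -- mount the BTRFS pool by label; _netdev holds the mount
# until the network (and therefore iSCSI) is up, and unmounts it before
# the network goes down at shutdown.
LABEL=datapool  /srv/data  btrfs  defaults,_netdev  0  0
```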

Sunday, September 23, 2012

Crack-smoking Linux zealots

It seems that every year, like the swallows flocking back to Capistrano, you get yet another burst of Linux zealots claiming that this is the year for the Linux desktop. As you know, I've said good things about Gnome 3 and its usability by mere mortals. Linux geeks whine about how they can't customize it blah de blah blah, but mere mortals don't customize their window manager -- other than setting the desktop background to a photo of their kids or grandkids, all they care about is whatever applications they're wanting to run. And from that perspective, Gnome 3 works just fine. But: An operating system is more than a window manager. And that is where Linux still is Utter Fail on the desktop.

Okay, let me tell you a story. I installed Fedora 17 on a 64GB SSD to manage my RAID arrays that serve data to my home network. For grins and giggles I installed Thunderbird and told it about a couple of my email accounts. I clicked through on a link in one of those email accounts that went to a page that had a video embedded. Well, that *would* have a video embedded, on Windows or MacOS. On Linux? Just a blank box and a message about a plugin that simply doesn't exist for Linux that needs to be installed, probably a Quicktime plugin. Now I'm sure you're saying, "but that's not Linux's fault, that's Apple's fault for patenting the algorithm used for that plugin!" But see, the deal is, end users don't care whose fault it is. All they know is that they can't see that embedded video with Linux, and they can see it with Windows or MacOS. BTW, same applies to playing MP3 files -- they simply won't play. Again, a patent issue, but Joe Sixpack doesn't care why his music files won't play on his Linux system -- all he cares about is that they don't play on his Linux system. End of game.

Now let's look at another issue, one where the excuse of patents does not hold. The SSD came out of an old Acer Aspire One netbook, which I'd put in as part of an attempt to build a mobile GPS system, one of a couple of failed attempts that eventually ended up successful with the iPad, thus rendering the need for an SSD in the Aspire One obsolete. So I grabbed an old 160GB hard drive to put into the Aspire One to take the place of the SSD, and imaged it using the rescue disks that I'd purchased to put an image onto the SSD in the first place. Except it didn't boot. Okay, no big deal, I'll just haul a Windows XP rescue disk over there ... err... I don't happen to have one burned? No biggie, I'll just burn one. And since I have my beautiful Linux desktop sitting here, I'll just use that to do so.

So I pop the blank CD/RW disk into the Asus SATA CD/DVD writer, and attempt to burn it using the stock Brasero disk burner... and it's utter fail. Won't do it. In fact, locks up the system for 30 seconds because it locks up the SATA bus while not doing it. What the BLEEP?! So I go check Google, and it looks like Brasero has been broken on Fedora FOR YEARS, it'll write DVD's (sometimes) but CD's? No way! So I try a couple of other programs, and have similar results. Then I go down to the underlying 'wodim' program and again the same results. At that point I'm, like, "okay, maybe it's broken CD recorder hardware". So I pull out an external USB CD recorder drive -- one that I recently used on another system that doesn't have an internal DVD-RW drive -- and attempt to use that. Same result.

So maybe it's a bad disk? So I sling the disk into my MacBook Pro, erase it, and record it. It Just Works. As would have been the case, I'm 99% certain, if I had been trying to do this under Windows 7 -- it would have Just Worked.

And lest you think this is just my physical chunk of hardware and the particular version of Linux here at home, I have similar (lack of) success recording CD-R's at work on my Centos 6.3 desktop system. There, again, I have to record CD's on my MacBook Pro because Linux simply refuses to do it. Which infuriates me, because I could write CD's for *years* under Linux, but it's broken now, and nobody seems to care (at least, it's been broken in every release of Fedora since Fedora 10, which was, what, four years ago? Five years ago?), so ...

So let's recap: I can't view videos, I can't play music, and I can't burn CD-R's. I *can* do all of those things on Windows or MacOS. What Linux does well is serving data to networks. It does that very well, my Linux box serves as the Time Machine backup for my MacBook Pro as well as being the central data repository for the various Windows laptops I have hanging around. But the desktop? When it can't do simple desktop tasks that Windows and MacOS have done for literally years? Crack. That's my only explanation for why anybody would ever make a ludicrous statement like "this is the year of desktop Linux!" given the current state of Linux.


Wednesday, September 12, 2012

This is the droid you're looking for

I have a new toy now. I ditched my aging iPhone 4 upon completion of its contract (and ported its number into Google Voice), and now have a brand new Samsung Galaxy S3.

So far it's mostly all good. Battery life is bad, but we already knew that. I tried several different home screen programs but I'm sticking with TouchWiz for now because the updated one for the Galaxy S3 works as well as anything else I tried, even the backported Jellybean launcher. It has lousy reception inside company HQ but so did the iPhone 4, just an AT&T thing I guess (my Verizon iPad has great reception inside company HQ). I'm still looking for a clean solution for automatically syncing my photos into iPhoto, but iSyncr is doing a reasonably good job of getting them onto my Macbook so I'm not too displeased.

Thus far I've found substitutes for everything I did on my iPhone except one: There is no good offline GPS program like the Magellan program that I used on the iPhone. Supposedly TomTom is going to be remedying that soon. We'll see.

So anyhow, I have found one bug in the Galaxy S3's ICS Android version: it does not handle exFAT very well. I found this out the hard way when my 64GB microSD card quit working and reported "Damaged SD Card". Indeed, checking the Internet, it appears that random exFAT corruption is epidemic on the Galaxy S3. This afflicts any microSD card over 32GB, since those ship formatted as exFAT -- Microsoft officially says FAT32 won't go over 32GB. That's a lie, of course -- FAT32 is quite capable of handling terabyte-sized filesystems -- but because Microsoft enforces the limit in all their filesystem tools, nobody knew until they actually looked at FAT32 and realized hey, this will work with bigger filesystems! (Though there is still that nasty 4GB limit on file size to contend with.)

So how did I resolve this problem? First, I put the flash into a Windows 7 laptop and ran chkdsk on it. This found and fixed some problems. But when I put it back into the Galaxy S3 it *still* said "Damaged SD Card" despite the fact that Windows 7 said it was clean. So I resolved to reformat as FAT32. I copied the data off, and then had to go find a tool that would actually format a 64GB microSD card as FAT32, since the Windows disk manager won't do so: EaseUS Partition Master.

At that point it was just a matter of copying the data back on, which was very... very... slow since Windows operates SD cards in sync mode. As in, an hour slow. I know where the async flag lives in Windows and could have flipped it, but it was trash night so I did chores around the house instead. At the end of the process I inserted the microSD into the Sammy and... no more "Damaged SD Card".

Executive summary: If you buy a microSD card with greater than 32GB capacity, it is likely formatted with Microsoft's proprietary exFAT filesystem and will not work well on Android unless you reformat it, even if it appears to work correctly at first. exFAT is not supported well because it is patented by Microsoft and thus lacks the magic of dozens of Open Source developers' eyes noticing and fixing bugs in it. So reformat it using the EaseUS tool above (NOT the internal Samsung formatter, which will put the buggy exFAT filesystem back onto it) *before* you put stuff on it. Otherwise you'll be going through this whole time-consuming dance yourself sooner or later. Fun, it was not.


Saturday, August 25, 2012

Windows 8 and taste

One thing about most geeks that I've noticed: They have horrible taste. You can look at their homes, their clothes, their cars, the trinkets scattered about their cubicles, it's all a horrible mishmash of ugly. The way that Apple addressed this was via the Stalinesque concept of the Chief Designer. You may laugh at that description, but the Soviet-era Soyuz rocket and space capsule, designed under the supervision of Chief Designer Sergei Korolev, are still flying today fifty years after their design process began because he had exactly the same kind of qualifications as Apple's Chief Designer -- good engineering taste that balanced simplicity, cost, performance, and capability into a pleasing whole.

What brings this to mind is Windows 8, which I'm using to type this while eval'ing the RTM product. I'm not disclosing any NDA stuff here, it's pretty much the same product you downloaded earlier as the "Consumer Preview" with a few pieces of missing functionality filled in (and undoubtedly many bugs fixed). Windows 8 is Microsoft's attempt to re-invent the user interface, but fails primarily because of two reasons: A lack of courage, and a lack of that chief designer.

The lack of courage part is where Microsoft flinched at the notion of completely re-inventing the desktop. As a result, they have the "classic" desktop available by hitting a button on the "Modern" desktop. The end result is a bizarre mishmash of two different desktop environments in one. It's twice the amount of stuff to learn if you're a user, because the "Classic" desktop environment doesn't work *exactly* the same as the well-known Windows 7 desktop environment, while the "Modern" desktop... well, it's entirely different, period. Twice the amount for end users to learn is user environment fail, period.

The lack of that chief designer, however, shows even more in the design of the "Modern" desktop. A good design is clean, looks simple (even if it isn't), everything's laid out in an obvious manner, there's a limited number of things for end users to learn in order to be productive, and, for lack of a better word, it is tasteful. It doesn't look like a mishmash of unrelated ideas from multiple independent teams all smashed together into one product.

That, however, doesn't describe the "Modern" desktop at all. One of the things I noted about Gnome 3 was that you had to know basically one gesture -- how to move your mouse pointer to the top left of the screen (or touch the top left of the screen on a touchscreen) -- to make it useful to you. Everything else is pretty obvious: touch an icon or touch-and-drag an icon (or click and click-and-drag with a mouse), or scroll up and down using the mouse wheel or two fingers. With the "Modern" desktop, every single corner of the screen does something -- and does something *different* (with the exception of the right-hand corners, which do something the *same*). Furthermore, moving to a corner, waiting for the hover timeout, then moving your mouse up and down does something even *more* different. And right-clicking does something different yet again. The confusing number of things you can do, indeed need to know how to do to make the environment useful, is well past the three things you need to know to use Gnome 3.

In essence, it's as if a bunch of geeks got together and decided to take every idea from every touchscreen environment ever created anywhere, and put them all into the same user interface. It's as if every geek critic of Gnome 3's tasteful design got together and designed their perfect touchscreen environment with every feature they could think of. It's as if Larry Wall designed the thing. Folks, Perl is many things, but clean and easy to use are not among those things -- it's an ugly, nasty piece of work that will spit in your eye if you look at it wrong, just like the camel on the cover of the definitive book on the language. Like said camel it also happens to be very useful (thus why I wrote the virtualization management infrastructure for our virtualized product line in Perl, because it was the most reasonable way to parse the output of all the low-level virtualization-related utilities that the various virtualization systems use for their low-level management), but nobody has ever suggested that end users be given Perl as their user interface to their computers.

So the question is, will Windows 8 succeed? Well, define "success". The majority of personal computers in the world next year will ship with Windows 8 pre-installed. And because everything in post-Gates Microsoft is an API and Microsoft is quite open with their API's (Apple, not Microsoft, is the "Do Evil" company in the post-Gates era), sooner or later someone is going to come up with a means to tame this mess. But I have to say that Windows 8 is, in the end, a disappointment to me. Microsoft had an opportunity to re-define how personal computers worked, and they have all the pieces needed in Windows 8 to do so. They just needed a tasteful Chief Designer with the power to impose order and taste upon this mess -- and, alas, it appears they have no Jony Ive or Sergei Korolev to do so.


Linux block layer, BTRFS, and ZFS On Linux

Long time no blog. Lately I've been stuck way down in the 2.6.32 kernel block device midlayer, both initiating I/O to block devices via the submit_bio interface, and also setting up a midlayer driver.

What I'm finding is that things are a bit of a mess in the 2.6.32 kernel when it comes to device pulls and removals. When I chug down into the SCSI midlayer I see that it's supposed to be completing all bios with -EIO, but there are still situations where, when I yank a drive out of the chassis, I don't get all of my endios back with errors because of races in the kernel between device removal and device teardown. The net result is I have no idea what actually made it to disk or not. Note that you will NOT see this racy behavior on a normal system, where the completion (almost) always wins the race; I was generating thousands of I/Os per second to 48 disks with CPU usage pretty much maxed out as part of load testing to see how things worked at the limits.

Now, that's no problem for my particular application, which is somewhat RAID-like, or for the Linux MD layer, or for single standalone block device filesystems for that matter. What's on the disk is on the disk, and when the standalone filesystem is remounted it'll know its state at that point by looking at the log. For the RAID type stacking drivers, when it comes back the RAID layer will note that its RAID superblock is out of date and rebuild the disk via mirror or ECC recovery, a somewhat slow process but the end result is a disk in known state. So when I get the disk removal event I mark the device as failed, quit accepting I/O for it, and mark all the work pending for that device that hasn't already endio'ed as errored, and if an endio sneaks in afterwards and tries to mark that work item again as errored, no big deal (although I put a check in the endio so that it simply noops if the work already was completed by the disk removal event). This means I have to keep a work pool around, but that's a good idea anyhow since otherwise I'd be thrashing kmalloc/kfree, and if the drive comes back I'll re-use that pool again.

So traditional RAID and standalone devices don't have a problem here. Where a problem exists is with filesystems like btrfs and zfs that do replication on a per-file-block level rather than on a per-block-device-block level. If they can't log whether a block made it or not because they never got an endio, they can get confused. btrfs appears to err on the side of caution (i.e., assumes it didn't get replicated, and replicates it elsewhere if possible) but when the missing volume comes back and has that additional replica on it, strange things may happen. ZFSonLinux is even worse, since its SPL (Solaris Porting Layer) appears to assume that bios always complete, and deadlocks waiting for bios to complete rather than properly handling disk remove events. (Note that I haven't really gone digging into the SPL, I'm basing this solely on observed behavior).

The good news: The popularity of btrfs among kernel developers appears to be motivating the Linux kernel team to fix this situation in newer kernels. I was chugging through the block subsystem in 3.5 to see if there was something that could be backported to make 2.6.32 behave a bit better here, and noticed some significant new functionality to make the block subsystem more flexible and robust. I didn't backport any of it because it was easier to just modify my kernel module to behave well with the default 2.6.32 behavior (I'm talking *extensive* changes in the block layer in recent kernels), but it appears that the end result is that btrfs on the 3.5 kernel should be *significantly* more reliable than the backported version of btrfs that Red Hat has put into their 2.6.32 kernel on RHEL 6.3.

So that's my recommendation: If you want to run btrfs right now, go with the latest kernel, whether it's on RHEL 6.3 or Fedora 17 or whatever. And you know the reason for my recommendation now. Red Hat has *not* backported any of those block layer changes back to RHEL 6.3, so what you have with btrfs on their stock 2.6.32 kernel is the equivalent of having a knight in shining armor that's missing the breastplate. Sort of renders the whole exercise useless, in the end.


Thursday, July 5, 2012

Open Source community and project failure

I had a nasty experience with an open source project recently where members were obnoxious and dismissive about a bug and possible work-arounds, and that reminded me of some things that happened back in what must have been the spring of 2000. I was working for a vendor of tape backup software then and we were scoping a new project that was going to take advantage of all the goodies that SCSI gave us. Prior to then, most tape backup for small computers was nasty QIC-80 type thingies attached to the floppy controller or parallel printer port, but SCSI gave us oodles of info and allowed us to do things the floppy streamers could never have dreamed of. Our goal was simple: To have plug-and-play use of tape drives and tape robots where the customer plugged in the backup hardware, installed our software, and it Just Worked. (There's those two words again!).

So anyhow, I was scoping out the tape format that we were going to use. I had a pile of tape hardware that filled an entire lab, and a pile of tape drive vendor's SCSI programming manuals that was stacked about four feet high on my desk. We had decided that we were going to keep the tape catalog in a MySQL or PostgreSQL database, but we also wanted a copy of the tape catalog on the tape itself so that tapes could be quickly re-imported into a new install of the software in the event, say, of the building burning down and the off-site backups needing to be pulled in on a new server. Without an on-tape copy of the catalog, importing a tape would be a slow and painful process of reading the entire tape from front to back. The question was, where would we put the tape catalog? My boss said, "what about a partition at the beginning of the tape?" That seemed rather interesting, because usually once you start writing a tape, all you can do is either overwrite it, or append to the end of it -- you can't come back to the start and start writing again (this is because of the way the tape drives' internal compression works, amongst other things). Could we in fact do this?

So I investigated that, and found that the tape partitioning ioctl in Linux only worked with Seagate 4mm DAT tape drives. It didn't work with Exabyte drives, or Tandberg drives, or that brand new HP LTO drive, just with Seagate drives. I investigated the CDB that was being sent by the tape driver and looked at the Seagate manual, Exabyte manual, and Tandberg manual, as well as at the SCSI SSC standards document itself, and realized: That "standard" had enough holes to drive a bus through. There were basically two different ways you could interpret it, and Seagate interpreted it one way, and everybody else interpreted it another way. I came up with a way of detecting in real time which partition CDB format the tape drive used based on reading the partition table via the mode page for that, and modified the tape driver to do it correctly for every tape drive that implemented tape partitioning support -- every single SCSI drive, other than Quantum's DLT. Which, uhm, was the most popular enterprise drive. Oh well. So we didn't use tape partitions, instead we simply set a tape mark at the end of a data stream and appended the catalog to the end of the tape instead.

So anyhow, I contacted the maintainer of the Linux tape driver with my patch and an explanation of what it was doing and why the old way only worked with DDS and this would work for (long list), and he looked at it and he didn't like it, so he re-wrote it a slightly more elegant way and sent it back to me. I tried it out, it didn't work on one of the drives, I fixed one thing he'd slightly mis-interpreted and sent it back to him. I think we did a couple of iterations of this, and finally it passed both my tests for whether it actually worked on my tape drives, and his taste for what he wanted his tape driver code to look like, and it went into the next version of the Linux kernel that was released.

The other issue with tape format was that we wanted to be able to quickly retrieve a single file. It turned out that even the primitive DLT drives implemented both block location inquiry and block seek. Our archive format already aligned headers at the start of tape blocks so you could start seeking at the start of a tape block and locate the file, this wasn't a gzip'ed stream, we did things a bit smarter than that (we did something similar to what ZFS does when you turn on ZFS compression). So I hacked our data engine to issue a block position ioctl at the start of each file (and also print the file name and position) and hacked it to accept a block location and issue a block seek ioctl to find a file, as a proof of concept to verify that this would work. This worked fine on Linux and worked fine on SGI Irix. It did not work at all on Solaris -- Sun hadn't implemented *any* of the modern features of tape drives in their tape driver. We hacked our data engine to use the Solaris sgen (SCSI Generic) device directly (i.e. issue raw SCSI CDB's to read, write, seek, etc.) rather than use the tape driver, but never were satisfied with that and put that aside.
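For the curious, the same block-position ioctls are exposed from the shell on Linux by the mt-st version of mt(1), so the proof of concept can be sketched like this (the device path is an assumption; `tell` and `seek` are the mt-st operation names):

```shell
# Record the drive's current block position at the start of a file...
mt -f /dev/nst0 tell
# ...and later seek straight back to that block to retrieve the file.
mt -f /dev/nst0 seek 123456
```

Our data engine issued the underlying ioctls directly rather than shelling out, but the mechanism is the same.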

Then there was FreeBSD. I liked FreeBSD. I ran FreeBSD on my own server at home. I tried this on FreeBSD and writing tapes was very... very... slow when I had the 'print position' code enabled. Ridiculously slow. So I looked at the FreeBSD tape driver and realized what was happening. There was a call to cache flush right before issuing the position request CDB. This caused every tape drive in our lab to basically come to a screaming halt every time the ioctl happened. I commented out that single line of code, and everything worked at full speed on every single enterprise-class SCSI tape drive that was currently shipping to paying customers anywhere in the Western world. So I contacted the FreeBSD tape driver maintainer to remove this unnecessary cache flush, and...

"But the SCSI standard says that the reported tape position may not be accurate unless you issue the cache flush."

"We've tested every single enterprise drive currently shipping on the planet. They all stall and become unusably slow when you issue a cache flush anywhere but at the end of a dataset, and they all report the position accurately without the cache flush, we can go back to the reported position and always find the data we expected there."

"But... the SCSI standard..."

"Have you found any drive that actually complied with the SCSI standard? The fact is that the cache flush makes the ioctl useless because it makes it too slow, and every tape vendor seems to ignore that little note in the SCSI standard and just do what we expect." (Followed with performance data on actual drives with / without cache flush and noting that this worked properly *without* the cache flush on both Linux and Irix, neither of which had the cache flush in their tape position ioctl).

It went on and on and on. I was an interloper from outside their little community, so I had to be put into my place, despite the fact that all I wanted was for the stupid ioctl to Just Work, as it did on Linux and Irix, so that we could support FreeBSD as a Tier 1 platform rather than relegated to a Tier 2 platform (one which could not host tape drives but merely be backed up to another server that was hosting tape drives). Eventually after much grumbling about "well okay, it is wrong according to the SCSI standard but if that's how the tape drives actually work, we'll accept this change" they queued it up for a future release of FreeBSD, but by that time it was rather too late in the process -- we'd moved on to actually implementing our product, and FreeBSD wasn't scoped because there was no version of FreeBSD currently shipping that had the tape driver functionality we needed for our product.

Whenever I think about that difference between the Linux and FreeBSD communities in 2000, the reason Linux rules the world and FreeBSD has been relegated to a side note becomes a bit clearer. At the beginning, FreeBSD was clearly technically superior to Linux. For example, the FreeBSD scheduler could handle heavy loads without falling over, and the FreeBSD memory management system could handle heavy memory pressure without falling over, and neither was true of Linux in 2000. But the deal is, the world is not a meritocracy. The majority of the world will accept "Good Enough" if it means they don't have to put up with a lot of friction. The FreeBSD community in 2000, reeling from loss after loss to Linux with their biggest deployment (Hotmail) being converted to lowly Windows, was defensive and paranoid and not very pleasant to deal with (it's considerably better now BTW). Experienced IT people going into that environment and getting that kind of attitude simply rolled their eyes and went with that Linux thing, which was Good Enough for a lot of things even if it wasn't as good as FreeBSD.

So what are my takeaways? Well, if you're involved in an Open Source community, whether as a code contributor, supporter, or just general advocate, here's my takeaways:

  • Stow the attitude. If you appear defensive and hostile, experienced IT folks simply won't use your stuff.
  • If someone contacts you with a problem, and a proposed solution, don't dismiss them as an idiot until you actually evaluate the technical content of their message. If you don't agree with them, give technical reasons why their solution doesn't work, don't make personal attacks upon the person calling them an idiot or stupid or incompetent or whatever.
  • If it's clear that someone is entirely off base, either reply "Thank you for your contribution, we will consider it" or simply hit the DEL key.
  • Police your fanboys if they're trashing your mailing list or forums giving attitude to folks who have problems, suggestions, or contributions. One of the things that hindered uptake of Linux in that era was the Linux fanboyz with their "Linux rulez Windows drulez" attitude. IT people want to solve problems, they're not interested in penis measuring contests, and they're going to use the solution to their problem that comes with the least baggage. It doesn't matter whether your project really *is* the best or not, I repeat once more: The real world is not a meritocracy. If some other product is Good Enough, Just Works, and comes without the baggage, guess whose product is going to win?
And this is rather too long, so one final takeaway: Treat every person who contacts you as if they are the second coming of Dennis Ritchie. It doesn't matter if your first impression is that they're the biggest moron on the planet. They are your customer. They are the people who you want to use your product, assuming you're writing your open source software to be used by other people, not just as a private project for your own personal use. Even the dumbest customer has something to contribute, if only a lesson in how stupid customers can be so that you can engineer your product to survive their unkind ministrations. Treat them as if they have something to contribute, and you may be surprised to find out that they do have something to contribute. Treat them like cr*p, and they'll return the favor and your project will die on the vine.

And that's my takeaway. Sorry about the long history lesson, but there was a purpose in it -- and hopefully you've figured that out by now. So it goes.


Wednesday, June 27, 2012

End of the FreeBSD ZFS experiment

So my write performance with ZFS on FreeBSD was abysmal. The final straw was when I was copying from one pool to another and it was running at roughly the same speed as a floppy disk. I made numerous attempts to tune both ZFS and the iSCSI initiator, and nothing that I tried made any real long-term difference. Things would speed up after I tweaked stuff, then slowly settle back down to a tedious crawl.

Out of frustration I created a 64-bit Centos 6.2 image with the exact same stats as the FreeBSD image and installed the native Linux port of ZFS. This requires a stub that goes into the kernel to meet licensing requirements, then compiles against that stub code. I then shut down the FreeBSD virtual machine, installed the iSCSI initiator on the Linux machine and scanned both of my iSCSI storage arrays to tell it about the initiator, then went to the storage arrays and unassigned the volumes from the FreeBSD machine and assigned them to the Linux machine instead. Then I scanned and logged them in on the Linux machine, and did the following command at a root login:

zpool import -f -F -m -a

They imported cleanly and everything came up.
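For reference, here is what each of those import flags asks for, as I understand them from the zpool man page:

```shell
# -f  force the import even though the pool was last used by another system
# -F  recovery mode: discard the last few transactions if that's what it
#     takes to get the pool importable
# -m  import even if a log device is missing
# -a  import all pools found on the scanned devices
zpool import -f -F -m -a
```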

So the next thing I did was set my copy going. I am ZFS-mirroring between the two iSCSI storage arrays and I only have a single gigabit Ethernet port from my ESXi box to the storage arrays, so this put a maximum throughput of roughly 100 megabytes per second for both read and write. ZFS did this throughput to the storage arrays handily.

So clearly the problem is not ZFS. And FreeBSD has been shown to have good ZFS performance with DASD (direct-attached storage devices). So the fundamental problem appears to be the FreeBSD iSCSI initiator. I don't care enough to diagnose why it's so terrible when used with ZFS despite the fact that I hit all the tuning flags to turn up the queue depth etc., but the end result is that ZFS combined with iSCSI on FreeBSD is a no-go.

On Linux, BTW, it Just Worked once I built the zfs RPM's and installed them. I'm performing at the full speed of my network. And remember, that's the ultimate role of computer technology -- to Just Work, leaving the hard stuff of deciding what's going to go onto that server to the humans. My goal was to move bytes from point A to point B as fast as my ESXi system could do so, given the fact that I need to set up Etherchannel on my antique SMC switch to do ESXi trunking to get data from point A to point B any faster than gigabit Ethernet will take me. (I don't know if the antique will even do it, this is a production ESXi box so I have to set an outage time at an oddball time before I can move the iSCSI network Ethernet wires to the supposedly Etherchannel-trunked ports and flip the ESXi vswitch bit to ip-ip hash to split the traffic between the two trunked ports). So while it sucks that I have to manually build and install ZFS on Linux, the final result appears to work far better than it really should, considering the very beta-quality appearance of that ZFS On Linux site and the rapid updates they're making to the software.


Tuesday, June 26, 2012

ZFS caveats and grumbles

This is a followup of how ZFS has worked in my test deployment.

First of all, don't even try ZFS deduplication on any real data. You require about 2GB of memory for every 100GB of deduplicated data, which means that on a 2 terabyte filesystem you'd need around 40GB of memory. If you don't have that much memory, ZFS will still work... at about the same speed as a 1985-vintage Commodore 1541 floppy drive or a 9600 baud modem. So reluctantly I have to say that ZFS's deduplication capabilities are pretty much a non-starter in most production environments.
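A quick back-of-the-envelope check of that rule of thumb (numbers taken from the paragraph above, units in GB):

```shell
# Rule of thumb: ~2 GB of RAM per 100 GB of deduplicated data.
pool_gb=2000        # a 2 TB pool, expressed in GB
ram_per_100gb=2
echo "$(( pool_gb / 100 * ram_per_100gb )) GB of RAM"   # → 40 GB of RAM
```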

Compression, on the other hand, appears to work much better. When I turn on compression the system gets no slower as data gets added, it still remains the same level of slow.

Finally: ZFS supposedly is a "zero startup time" filesystem. But the reality is that if a pool gets shut down uncleanly, when you start up ZFS it does a potentially very lengthy integrity check, as in, can take as long as an hour to run. While still better than NTFS or Linux ext4 (both of which can take hours for any reasonably-sized filesystem), "quick" recovery from an unclean system outage is relative -- don't rely on the system being back up within minutes if someone managed to yank both power cords out of the back of your server while trying to decant some *other* system from the rack.

Next up: FreeBSD has an iSCSI initiator. But it is not integrated into system boot in any way, and ZFS, if you enable it in rc.conf, comes up first thing, long before networking, so you cannot enable ZFS in rc.conf (or use it as a system filesystem) if your storage array is iSCSI-connected. My rc.local looks like this now:

bsdback# cat rc.local
iscontrol -c /etc/iscsi.conf -n tzpool1
iscontrol -c /etc/iscsi.conf -n mzpool1
iscontrol -c /etc/iscsi.conf -n tzpool2
iscontrol -c /etc/iscsi.conf -n mzpool2
sleep 5
/etc/rc.d/zfs onestart
/etc/rc.d/mountd stop
/etc/rc.d/mountd start

The mountd stop / start is necessary because mountd started up long before, noticed that the zfs mountpoints in /etc/exports didn't actually exist yet, and didn't export them. If you are exporting ZFS mountpoints via NFS on FreeBSD, this is the only way to make it happen correctly as far as I can tell -- even if you exported them via zfs, mountd starts up, looks at the zfs exports file, and refuses to export them if zfs isn't up yet.

And rc.shutdown.local looks like:

/etc/rc.d/mountd stop
/etc/rc.d/nfsd stop
/etc/rc.d/zfs onestop

This is the only way I can get the sequencing right.

Note that Red Hat Enterprise Linux has had the ability to properly sequence filesystem bringup so that iSCSI filesystems get mounted after networking (and iSCSI) comes up for quite some time -- since Red Hat Enterprise Linux 4, circa 2004, in fact. RHEL has also had the ability to automatically bring up the iSCSI targets after networking and before mounting network file systems via their equivalent of the rc.conf system (SysV Init) since 2004. This is an area in which FreeBSD lags, and should be justifiably flamed as lagging. You should not need to write custom rc.local scripting to bring up standard parts of the FreeBSD operating system in the correct order, it should Just Work(tm) after properly setting up the rc.d dependencies and ZFS flags to make it Just Work. ZFS needs to have a pre-networking and post-networking two-stage bringup so that any pools not located during the pre-networking stage can be searched for during the post-networking stage, and iSCSI needs to have its own rc.d script that brings it up at the end of the networking stage.
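To make that concrete, here is a sketch of the kind of rc.d script I'm asking for -- entirely hypothetical, FreeBSD shipped nothing like it at the time. rcorder(8) reads the PROVIDE/REQUIRE/BEFORE comment lines to sequence scripts, and the target names are the ones from my rc.local above:

```shell
#!/bin/sh
# Hypothetical rc.d glue: log in to the iSCSI targets once networking is
# up, so that a later ZFS/mountd stage can find its pools.
#
# PROVIDE: iscsi_sessions
# REQUIRE: NETWORKING
# BEFORE: zfs mountd

. /etc/rc.subr

name="iscsi_sessions"
start_cmd="iscsi_sessions_start"

iscsi_sessions_start()
{
    for tgt in tzpool1 mzpool1 tzpool2 mzpool2; do
        iscontrol -c /etc/iscsi.conf -n "$tgt"
    done
}

run_rc_command "$1"
```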

All in all, ZFS on FreeBSD is working for me in my test deployment in my IT infrastructure, but it's not as seamless as I had hoped. When I look for IT technology I look for something that Just Works(tm). ZFS on FreeBSD would Just Work(tm) if I were using DASD's, but since I'm using network-attached storage, it's more of an erector set scenario than I'd like.

Thursday, June 14, 2012

ZFS -- the killer app for FreeBSD?

Okay, so here's my issue. I have two iSCSI appliances. Nevermind why I have two iSCSI appliances, they were what was available, so that is what I'm using. So I want my backups mirrored between them. Furthermore, I want my backups to be versioned so I can access yesterday's backup or last month's backup without a problem. Furthermore, I want my backups to look like complete snapshots of the system that was backed up as of that specific point in time. Furthermore, because my data set is highly compressible in part and highly de-duplicable in part, I want it compressed as necessary and dedupe'ed as necessary.

So, how can I do this with Linux? Well... there is a FUSE version of ZFS that will sort of, maybe, do it in a way that regularly loses data and performs terribly. There is BTRFS which is basically a Linuxy re-implementation of ZFS that does compression but not deduplication, and which is still very much a beta-quality thing at present, they didn't even have a fsck program for it until this spring. And ... that's it. In short, I can only do it slowly and buggily.

So at present I have a FreeBSD virtual machine in my infrastructure happily digesting backups and bumping the snapshot counter along. And ZFS is a first-class citizen in FreeBSD land, not a castaway in FUSE-land like on Linux. I'd love to use BTRFS for this. But BTRFS today is at about the same stage as ZFS on Solaris in 2005, when it was an experimental feature in OpenSolaris, or ZFS on FreeBSD in 2008 when the first buggy port was released. ZFS on FreeBSD is stable and rock solid today, and BTRFS, realistically, isn't going to be stable and rock solid for another three or four years at least.

So if you haven't investigated ZFS on FreeBSD to manage large data sets in a versioned, compressed, and deduplicated fashion, perhaps you should. It solves this problem *today*, not a half decade from now. And a bird in hand is worth a dozen in four years.


Saturday, March 17, 2012

Random notes on iSCSI storage

When you're using ESXi/vSphere for your virtualization host, iSCSI storage is actually useful. That's because VMware's vmfs3 filesystem is by default a clustering filesystem. What that means is that if your iSCSI target is capable of operating in cluster mode -- i.e., accept initiators from multiple hosts connected at the same time -- iSCSI block storage can be used for ultra-quick failovers on your VMware servers (amongst other things it can be used for). And the performance is *significantly* better than NFS datastores, because VMware can store vmdk files as physically contiguous extents with vmfs3, while VMware has no control of how a NFS server physically lays out vmdk files on disk. This is important because all modern operating systems use an "elevator" algorithm for their filesystem cache flushes that assumes that the underlying block storage is physically contiguous from block 0 to block n, and if the underlying storage is *not* physically contiguous, you end up with either the possibility of lost writes (if the NFS host is running in asynchronous mode) or with the NFS host's disks thrashing all over the place and performance sucking like a male prostitute at a Republican convention.

So anyhow, just wanted to share a technique I used to rescue a failing machine. The machine involved was a Red Hat Enterprise Linux 4 machine that I wanted to migrate to virtualization for the simple reason that one of its drives had failed. 30GB of the first drive was used for actual data, most of the system was empty.

So first things first, I created a blank virtual machine on the ESXi host and told VMware to create a drive big enough to hold all the data on the old RHEL4 machine. Then I connected that virtual machine's hard drive to a Centos6 virtual machine as a virtual hard drive. Then I exported that virtual hard drive via tgtd / iSCSI to the RHEL4 machine and connected to that target from the RHEL4 machine's iSCSI initiator. On the RHEL4 machine I then dd'ed the first hundred blocks from its physical hard drive to the iSCSI hard drive (which was something like /dev/sdc, I'd checked /proc/partitions before telling the target to scan so I could know what showed up), did a 'sfdisk -R /dev/sdc' to re-read the partition table on /dev/sdc, then copied the /boot partition (after unmounting it) as a byte-by-byte copy: 'dd if=/dev/sda1 of=/dev/sdc1'. Then I did

  • pvcreate /dev/sdc2
  • vgcreate rootgroup /dev/sdc2
  • lvcreate -n rootvol -L 16G rootgroup
  • lvcreate -n swapvol -L 2G rootgroup
  • lvcreate -n extravol -L 16G rootgroup
  • vgscan -a
  • lvscan -a
  • mkfs -t ext3 /dev/mapper/rootgroup-rootvol
  • mkswap /dev/mapper/rootgroup-swapvol
  • mkfs -t ext3 /dev/mapper/rootgroup-extravol
I then mounted my new volumes in their correct hierarchy (so that when I chrooted to them I'd see /boot and etc. in their right places) and did your typical pipelined tar commands to do file-by-file copies of / and /extra to their new location, and while that was going on I edited /etc/fstab and chrooted to the new environment and mounted /proc and /sys and did a mkinitrd to capture the new root volume. Though I do suggest that you have the rescue disk handy as an ISO image on an ESXi datastore so you can mount it in case of problems -- which I did, but unrelated to any of this (it was related to the failure that caused me to do the migration in the first place).
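For reference, the pipelined-tar idiom mentioned above looks something like this; the /tmp paths here are a throwaway stand-in for the real / and /extra copies onto the newly mounted logical volumes:

```shell
# File-by-file copy of one tree into another via a tar-to-tar pipe,
# preserving permissions (the p flag on extraction). Demo paths only.
mkdir -p /tmp/tar-demo/src /tmp/tar-demo/dst
echo "hello" > /tmp/tar-demo/src/file.txt
( cd /tmp/tar-demo/src && tar cf - . ) | ( cd /tmp/tar-demo/dst && tar xpf - )
cat /tmp/tar-demo/dst/file.txt   # → hello
```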

So how did this data transfer perform? Well, basically at the full speed of the source hard drive, which was a 500GB IDE hard drive.

Anyhow, having used the Linux iSCSI target daemon, tgtd, here as well as extensively for other projects, let me just say that it sucks big-time compared to "real" targets. How does it suck? Let me count the ways:

  1. Online storage management simply doesn't exist with tgtd. You can't do *anything* to manage an iSCSI target that someone's already connected to, you can't even stop tgtd!
  2. For that matter, storage management period doesn't exist with tgtd. For example, you can't increase the size of a target once you've created it by adding more backing store to an already existing up and running iSCSI target, it simply is.
  3. tgtd gets into regular fights with the Linux kernel about who owns the block devices that it's trying to export. It's basically useless for exporting block devices because of that -- if there's a md array on the block device or a lvm volume set on the block device, the Linux kernel will claim it long before tgtd gets ownership of it. Thing is, you don't have any control over what the initiator puts onto a block device, so you're kind of stuck there, you have to manually stop the target, deactivate the RAID array and / or volume group, then manually start the target in order to get control over the physical device to export it.
  4. tgtd has the most obscure failure mode I've ever encountered: if it can't do something it will still happily export the volume, just as a 0-length volume. WTF?!
My conclusion: tgtd is a toy, useful only for experimenting and one-off applications. It doesn't have the storage management capabilities needed for a serious iSCSI target. Some of that storage management could be built around it, but the fact that you cannot modify a tgtd target while anybody is connected to it means that you can't do things that the big players -- or even the little guys like the Intransa appliance that I'm using for the backing store for my ESXi host -- have been able to do for years. Even on the antique nine-year-old Intransa realm that's hosting some of our older data (which is migrating to a new one but that takes time) I can expand the size of an iSCSI target in real time, for example. I then tell my initiator to re-scan, it notices "hey, my target has gotten bigger!" and informs the kernel of such, then I can use the OS's native utilities to expand a current filesystem to fill the additional space. None of that's possible with tgtd for the simple reason that tgtd won't do real-time live storage management. Toy. Just sayin'.


Friday, March 2, 2012

Best practices for virtualization

A series of notes...

  1. vSphere/ESXi: Expensive. Inscrutable licensing scheme -- they have more SKU's than my employer, almost impossible to tell what you need for your application. Closest thing to It Just Works in virtualization. Call them the Apple of virtualization.
  2. Xen : Paravirtualization gives it an advantage in certain applications such as virtualized Linux VM's in the cloud. Paravirtualization generally is faster than full virtualization, though most hypervisors now include paravirtualized device drivers to ease that pain. Xen doesn't Just Work, it's more an erector set. Citrix's XenServer is the closest that Xen gets to vSphere's 'Just Works', I need to download it and try it out.
  3. KVM : The future. Integrating the hypervisor and the OS allows much better performance. That's why VMware wrote their own kernel with integrated hypervisor. Current issues: Management is the biggest difficulty. There is difficulty creating clustered filesystems for swift failover or migration of virtual machines (ESXi's VMFS is a cluster file system -- point several ESXi systems at a VMFS filesystem on iSCSI or Fibre Channel block storage, and they'll all be able to access virtual machines on that system). Most KVM systems set up to do failover / migration in production use NFS instead, but NFS performs quite poorly for the typical virtualization workload for numerous reasons (may discuss later). Closest thing to VMFS performance for VM disks is using LVM volumes or clustered LVM (if using iSCSI block storage), but there are no management tools for KVM allowing you to set up LVM pools and manage them for virtual machine storage with snapshots and so forth. Virtual disk performance on normal Linux filesystems, via the qcow2 format, sucks whether you're talking ext4, xfs, or nfs. In short, the raw underlying bits and pieces are all there, but there is not a management infrastructure to use them. Best practice performance-wise for clustered setup: NFS share for metadata (xml description files of VM's, etc.), iSCSI or Fibre Channel block storage possibly sliced/diced with clustered LVM for the VM disks.
So what am I going to use today if I'm a busy IT guy who wants something that Just Works? VMware vSphere. Duh. If, on the other hand, I'm building a thousand-node cluster, then a) it's probably my full-time job, so I have time to spend futzing with things like clustered LVM, and b) the cost of vSphere for a cluster that large would be astronomical, making it decidedly more palatable to pay my salary to implement Xen or KVM on said cluster than to pay VMware.
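To make the clustered-LVM approach above concrete, here's a rough sketch of the commands involved. This is a hedged outline, not something to run as-is: the device path (/dev/sdb), volume group name (vmvg), and VM name (guest1) are all hypothetical, the commands require root, and a real cluster needs clvmd/dlm configured on every node first.

```
# Assumption: /dev/sdb is the shared iSCSI LUN, visible to every node.
pvcreate /dev/sdb
vgcreate -c y vmvg /dev/sdb    # -c y marks the volume group as clustered

# Carve out a logical volume to use as a raw VM disk -- no filesystem,
# no qcow2, so none of the file-backed overhead mentioned above.
lvcreate -L 20G -n guest1-disk vmvg

# Point the VM at the logical volume as a raw block device.
virt-install --name guest1 --ram 2048 \
  --disk path=/dev/vmvg/guest1-disk,format=raw \
  --import
```

The point is that the disk lives on shared block storage, so any node in the cluster can start the VM -- which is exactly what VMFS gives ESXi, minus the management tooling.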


Random notes on automating Windows builds

  1. Install your version control system of choice. In this case BitKeeper, but any other CLI-drivable version control system will work.
  2. Check out your directory tree of the various products you are going to be building.
  3. Install the *real* Microsoft Visual Studio 2010.
  4. Create a solution (Microsoft-speak for a "makefile", though it isn't) for each of the individual components you are building as part of your overall product, and make sure each one builds. The project file will be saved as Foo.vcxproj (for some project named Foo) in each solution's root directory.
  5. Add the directory that 'devenv' and 'nmake' live in to your PATH in your system config: Control Panel->System->Advanced System Settings->Environment Variables, then edit the user variable Path. My Path looks like: C:\Users\\bitkeeper;C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE;C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin
  6. Create a standard Unix-style makefile in the parent directory with a list of the subdirectories to recurse into, then in each subdirectory a Makefile that runs 'devenv /build Foo.vcxproj' to build and 'devenv /clean Foo.vcxproj' to clean.
  7. Test your Makefile with 'nmake' and make sure that the proper .exe files are produced in each of your subdirectories.
  8. With Visual Studio closed, install Wix and Votive
  9. Use Votive to build a WiX project and author its XML source file (product.wxs).
Once you've done this, you can edit the Makefile at the root of your project so that after it recurses into the subdirectories, it runs the WiX commands:
  1. candle product.wxs
  2. light product.wixobj
The output should be product.msi.
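Put together, the root Makefile might look something like the sketch below. The directory names (app, service) and project names are hypothetical placeholders; this assumes nmake-compatible syntax and that devenv and the WiX tools are on the PATH as set up above.

```
# Root Makefile -- recurse into each sub-project, then build the installer.
# Directory names (app, service) are illustrative.

all:
	cd app && nmake all && cd ..
	cd service && nmake all && cd ..
	candle product.wxs
	light product.wixobj

clean:
	cd app && nmake clean && cd ..
	cd service && nmake clean && cd ..

# Each subdirectory's Makefile then looks like:
#   all:
#   	devenv /build Foo.vcxproj
#   clean:
#   	devenv /clean Foo.vcxproj
```

Running 'nmake' at the root then builds every sub-project and leaves product.msi as the final artifact.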

Install Jenkins to integrate with the source control system: it will check from time to time for new checkins, then fire off a build via nmake when checkins happen. Jenkins does run under Windows, with some caveats -- see e.g. Setting up Jenkins as a Windows service. The biggest issue may be getting email to go out on Windows; I'll have to investigate that further once I get to that point.


Wednesday, February 29, 2012

Microsoft and the epitome of fail

Ever wonder why Microsoft lost the Internet? They did. Nobody uses Microsoft products for Internet services except a few losers who don't know any better. The cloud? It's a Linux world, baybee.

Well, here's the deal I've found out these past six weeks or so: developing for Microsoft products is a painful and expensive process. That means Microsoft has lost the student and hobbyist markets, and that's where the people who create new stuff come from -- not from stuffy old companies that have $10K lying around for Microsoft licenses for a single engineering team.

So anyhow, I recently created my first tray app. It used a couple of MFC functions for which I could not find an equivalent anywhere else in the system that would do what I needed done. Because of that, it would not compile with Microsoft's "student and hobbyist" product, Microsoft Visual C++ Express. Once I worked around that (a temporary install of the demo version of the full Microsoft Visual Studio, just to get access to the MFC include files and libraries), there was another issue: Visual C++ Express won't create a .msi installer file for distributing your program. I worked around this by using an open source program called WiX to build an XML template for my package, and successfully managed to create a .msi file. I installed it. I used the control panel and uninstalled it. Yay.

But here's the problem. I'm a hobbyist when it comes to Microsoft software. As a hobbyist I only write Open Source software (if you want closed source software out of me I'm happy doing it, just pay me $$$, but that's not my thing with hobby software). As a hobbyist I don't release software that can't be compiled unless you pay Microsoft hundreds of dollars for the full Microsoft Visual Studio, or do possibly-illegal workarounds like installing demo versions on top of crippleware. So you'll never see this tray app and it's probably the last time I write anything for Windows that I'm not paid to write.

Multiply that decision by tens of thousands of hobbyists who look at the same situation and instead go write software for Linux, and you understand why Microsoft lost the Internet. I did this because I needed to learn how to work with Microsoft's software to do some stuff at work, but most people simply aren't as driven as I am when it comes to learning new things. They take the easy path... and that's Linux.

Too bad, Microsoft... your stuff actually isn't that bad; it's ugly and a bit incoherent, but then so is Linux. But if you make it hard for people to get used to writing software for your platform on a hobby basis, you lose. It's that simple. Microsoft Visual Studio Express is useless -- you can't write real programs for Windows without MFC, and creating MSI files should Just Work rather than requiring a search for a third-party tool. With Linux, it Just Works -- fire up Eclipse and start developing C++ or Java programs, that simple. With MacOS, it Just Works -- fire up XCode and start developing Mac programs in Objective C or other supported languages, that simple. Windows? It's a painful experience trying to work around cripple-ware. And once that happens, there's one word to describe your company: Fail.

What's Microsoft's future? Microsoft really has no future other than as a legacy company. They lost the Internet and the Cloud because they preferred wringing money out of their devtools division (and it can't even be a huge amount of money) to fostering the next generation of innovators with free or inexpensive tools for writing software for Microsoft's platforms, and they'll lose the next major innovation to happen in computing too. And one of these days, the accumulated sum total of these innovations will render Microsoft as irrelevant as Unisys -- just another legacy company milking its legacy products for service income long after they've become irrelevant to the majority of the industry.


Sunday, February 5, 2012

Gnome 3, Mac OS, and the Geeks (Part 2)

So in part two of this comparison, we're going to talk about workspaces.

Both Gnome 3 and Mac OS Lion have the concept of a linear ribbon of "workspaces" or virtual desktops. Gnome 3 lays out its workspaces vertically, while Mac OS Lion lays out its workspaces horizontally. So how do they compare on some routine workspace operations? Let's see...

Switching to Workspaces:

Gnome 3: There are two basic ways to do this: 1) Press the Windows key or swoosh the mouse to the top left of the screen to activate the Activities screen, then move the mouse to the right side of the screen to make the workspace list pop out and select the workspace you wish to be in. Or: 2) press CTRL-ALT-down to go to the next workspace down, or CTRL-ALT-up to go to the next workspace up.

Mac OS Lion: 1) Press the F3 button on a recent Mac to go into Mission Control (you can also set a multi-touch gesture to do this -- mine is three fingers up on the trackpad). Your workspaces will be listed horizontally at the top of the screen; move your mouse to the one you want and click on it. 2) Assign a multi-touch gesture to next-workspace and previous-workspace (mine is three fingers left/right on the trackpad). 3) Assign a key sequence to next-workspace and previous-workspace (mine is CTRL-left and CTRL-right).

Because I have an Apple laptop, I can use multi-touch gestures to move left and right and to activate the workspace switcher, which gives me one more option on the Mac. But the reality is that navigating to a workspace is ridiculously easy on both systems, whether from the keyboard or via mouse/trackpad.

Creating Workspaces:

Gnome 3: There is always one "blank" workspace at the end of the list of workspaces. If you move to that blank workspace and open a window there, a new blank workspace is created after it automatically, without you having to do anything.

Mac OS Lion: Press the F3 function key or use the multi-touch gesture to get to Mission Control. Move your mouse pointer to the right top of the screen. A new shadowed-out workspace will pop out of the ether. Click on it. You will now have a blank workspace to work in.

The Gnome approach can be done without touching the mouse, and actually requires no intervention on your part to create the new workspace -- it simply gets created when you need it. It Just Works, which is what a computer is supposed to do -- work without you having to do fiddly things. It will be interesting to see whether Apple or Microsoft copies this feature in their next release.

Deleting workspaces

Gnome 3: When you close the last window on a workspace, the workspace is deleted and you are then placed on the Activities screen, from whence you can select another workspace to work in using either CTRL-ALT-up/down or the workspace pop-out at the right of the screen. This prevents workspace clutter where you have lots of those automatically-created workspaces hanging around. (Note: I am aware of the "persistent workspace" plugin, I am comparing stock configurations).

Lion: Activate Mission Control. Move to the top left corner of the little workspace icon you want to zap. A little X will appear. Click on that X. The workspace will go bye-bye, and any windows on it will be moved to the first workspace.

Again, the Gnome 3 approach to this appears to me to be much simpler than the Lion approach. There's no fine motor skill needed to move the mouse pointer to the exact point where the X will appear; you simply close your windows and poof, you're done. Nothing to remember, nothing to discover, it Just Works.

Moving Windows to a Workspace

For both Lion and Gnome 3 you simply move to the workspace containing the window you want to move, trigger the Activities or Mission Control screen, grab the window you want by clicking on it, and drop it on the icon of the workspace you want to move it to.


So that's workspaces. As you can see, Gnome 3 is quite competitive with the state of the art that is Mac OS Lion. So that brings up the next question: Why do so many of the hard-core Linux geeks hate Gnome 3? I'll discuss that in the next installation of this series.


Wednesday, February 1, 2012

Gnome 3, Mac OS Lion, and the geeks (Part 1)

I've been using Gnome 3 on Fedora 15/16 and Mac OS Lion on a new Macbook Pro for several months now. What I see are two platforms that have done a significant re-thinking of the user interface to deal with some unpleasant facts:

  1. The old paradigm of using menus to select programs has reached its expire-by date, because the menus have attained a depth that nuclear submarines would envy,
  2. The population of most major technological countries is aging, and our old eyes simply cannot read the print on those tiny little pull-down menus anymore,
  3. Fine motor control of old folks is pretty bad. We can manage to swoosh the cursor to the corner of the screen, or hit a big icon, but nudging a mouse into the fiddly little boxes of a pull-down menu is hard for us, and finally,
  4. Pull-down menus simply aren't compatible with small touch-screens, because fingers are too fat to select tiny little things and you can't see them on a small touch screen like on a tablet anyhow.
In the process of dealing with these unpleasant facts, Mac OS Lion and Gnome Shell have also done a major re-thinking of how you do many common tasks in order to reduce the number of mouse movements / key strokes needed to do them. So let's look at some common tasks...

Program selection, Gnome 3:

Method a: Swoosh the mouse to the top left of the screen (one movement). Swoosh the mouse to the 'Applications' tab (one movement). Click left button. Select program from list of icons (possibly using the scroll wheel to scroll up and down the list).

Method b: Press the Windows key. Type the first couple of characters of the program you want to run. Use the arrow up-down to move the highlight to the icon of the program you want to run. Press ENTER.

Program selection, Mac OS Lion

Method A: Move the mouse to the icon of a rocketship on the toolbar. Click. Move mouse pointer to icon of program to run, possibly using left-right wobble wheel on your mouse or left-right two finger swipes on trackpad to go to next page of programs. Click on program.

Method B: Set a hot corner in preferences (I set bottom left), swipe mouse to there, move mouse to program to run, click.

Method C: Set a hot key in preferences, such as control-opt-l, and use the left-right arrow keys to navigate pages. Unfortunately Mac OS has no way that I can find of using the keyboard to actually run one of those programs; you must navigate your mouse pointer to it and click it.

Verdict: If using a keyboard, Gnome 3 is very easy to navigate and uses a minimum of keystrokes to locate and run a program, requiring no mouse input at all. If using a mouse, the fact that you can get to the Applications icons immediately in MacOS, vs. having to click on the Applications tab after swooping to the corner, makes MacOS require one less mouse movement and one less mouse click. Score: TIE.

Select A Window, Gnome 3

Method 1: Ye olde alt-TAB, with a kick: applications with multiple windows show a down-arrow, and pressing the down-cursor key opens window previews. Select the window you want.

Method 2: Hit the Windows key or swoop mouse to left top of screen. The windows will then swoosh out into a thumbnail pane view. Click on the window you're interested in.

Method 3: Hit the Windows key or swoop mouse to top left of screen. Select the dock icon representing the program you want to switch to. Either click it to go to the topmost window, or right-click it to select which window you want.

Select A Window, Mac OS Lion:

Method 1: Command-tab to get a list of running programs. While still holding down Command, use the arrow keys to move left-right to the program whose window you want to see. Release Command while that program is highlighted. Note that unlike with Gnome 3, you do *not* get to choose which exact window of the program is going to be switched to -- you get whatever MacOS feels like giving you.

Method 2: Set a mouse button or gesture (I use the Page Forward button on my Logitech mouse or a triple-finger-up gesture on the trackpad). Invoke said mouse button or gesture. The windows will zoom out to pane/thumbnail view. Navigate mouse to the window you want and click on it.

Method 3: Move mouse to bottom of screen, click the dock icon of the program you want to switch to. Right-click will allow you to choose which of the windows to switch to.

Verdict: Gnome 3 wins on alt-tab (vs. Lion's command-tab); its window-switching function is full-featured and works well. It ties on mouse button or gesture. It loses on the dock, but only barely, because its dock is not always visible and you have to move your mouse to the top left of the screen to show it -- but that's only one extra mouse movement, so not a huge loss. So Gnome 3 by a nose.


So what have we learned thus far? Well, 1) for these two specific tasks, both Gnome 3 and Mac OS Lion have done a lot of work on reducing the number of mouse movements and/or keystrokes needed, and have almost completely eliminated any need for fine motor movements or reading of tiny print, and 2) Gnome 3 is quite competitive with Mac OS Lion at these tasks. In the next part of this series I will compare some other common operations, and finally I will summarize the results and examine one of the more bizarre things that has happened since the release of Gnome 3 -- the rabid condemnation of it by early Linux developers (including Linus Torvalds) who appear to despise it, and what that means for Linux.