Wednesday, June 27, 2012

End of the FreeBSD ZFS experiment

So my write performance with ZFS on FreeBSD was abysmal. The final straw was when I was copying from one pool to another and it was running at roughly the same speed as a floppy disk. I made numerous attempts to tune both ZFS and the iSCSI initiator, and nothing that I tried made any real long-term difference. Things would speed up after I tweaked stuff, then slowly settle back down to a tedious crawl.

Out of frustration I created a 64-bit CentOS 6.2 image with the exact same specs as the FreeBSD image and installed the native Linux port of ZFS. This requires a stub module that goes into the kernel to meet licensing requirements; the rest of ZFS then compiles against that stub code. I shut down the FreeBSD virtual machine, installed the iSCSI initiator on the Linux machine, and ran a discovery scan against both of my iSCSI storage arrays so they would learn about the new initiator. Then I went to the storage arrays, unassigned the volumes from the FreeBSD machine, and assigned them to the Linux machine instead. Finally I scanned and logged in to the targets on the Linux machine, and ran the following command at a root login:

zpool import -f -F -m -a

They imported cleanly and everything came up.
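For the record, the Linux side of that migration boils down to a handful of commands. This is a sketch, assuming the stock open-iscsi initiator; the portal addresses are hypothetical stand-ins for my two storage arrays:

```shell
# Discover the targets on each storage array (portal IPs are hypothetical)
iscsiadm -m discovery -t sendtargets -p 192.168.1.10
iscsiadm -m discovery -t sendtargets -p 192.168.1.11

# Log in to every target node that discovery found
iscsiadm -m node --login

# Import the pools that were last used on the FreeBSD box:
#   -f forces past the "pool was last used by another system" check
#   -F rolls back the last few transactions if the pool won't import cleanly
#   -m allows import even with a missing log device
#   -a imports every pool found on the scanned devices
zpool import -f -F -m -a
```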

So the next thing I did was set my copy going. I am ZFS-mirroring between the two iSCSI storage arrays, and I have only a single gigabit Ethernet port from my ESXi box to the storage arrays, so that caps throughput at roughly 100 megabytes per second combined for read and write. ZFS hit that throughput to the storage arrays handily.
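That ceiling is just the arithmetic of the wire. A back-of-the-envelope sketch; the exact usable figure depends on Ethernet, TCP/IP, and iSCSI framing overhead:

```shell
# 1 gigabit per second on the wire, converted to megabytes per second
raw_mb_per_s=$((1000000000 / 8 / 1000000))
echo "raw link rate: ${raw_mb_per_s} MB/s"
# That is 125 MB/s before protocol overhead; after Ethernet, TCP/IP, and
# iSCSI headers, roughly 100-115 MB/s of payload is what a single gigabit
# link can realistically deliver.
```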

So clearly the problem is not ZFS. And FreeBSD has been shown to have good ZFS performance with DASD (direct-attached storage). So the fundamental problem appears to be the FreeBSD iSCSI initiator. I don't care enough to diagnose why it's so terrible when used with ZFS, despite my hitting all the tuning flags to turn up the queue depth and so forth, but the end result is that ZFS combined with iSCSI on FreeBSD is a no-go.

On Linux, by the way, it Just Worked once I built the zfs RPMs and installed them. I'm running at the full speed of my network. And remember, that's the ultimate role of computer technology: to Just Work, leaving the hard stuff of deciding what goes onto that server to the humans. My goal was to move bytes from point A to point B as fast as my ESXi system could do so; moving them any faster than gigabit Ethernet will take me means setting up Etherchannel trunking on my antique SMC switch. (I don't know whether the antique will even do it. This is a production ESXi box, so I have to schedule an outage at an oddball time before I can move the iSCSI network Ethernet wires to the supposedly Etherchannel-trunked ports and flip the ESXi vswitch to IP hash to split the traffic between the two trunked ports.) So while it's a nuisance to have to manually build and install ZFS on Linux, the final result works far better than it really should, considering the very beta-quality appearance of the ZFS On Linux site and the rapid updates they're making to the software.
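For anyone wanting to repeat the build, it was roughly the following. This is a sketch from memory; the version numbers are illustrative of the 0.6.0 release candidates current in mid-2012 (grab the actual tarballs from the ZFS On Linux site), and the SPL kernel-shim package must be built and installed before ZFS itself:

```shell
# Build and install the SPL (the kernel compatibility stub) RPMs first
tar xzf spl-0.6.0-rc9.tar.gz && cd spl-0.6.0-rc9
./configure && make rpm
rpm -Uvh *.x86_64.rpm
cd ..

# Then build ZFS against the installed SPL and install its RPMs
tar xzf zfs-0.6.0-rc9.tar.gz && cd zfs-0.6.0-rc9
./configure && make rpm
rpm -Uvh *.x86_64.rpm

# Load the freshly built module
modprobe zfs
```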



  1. Shoot.

    On xenserver, it is a nightmare to get freebsd to be a fully-PV VM like my ubuntu servers.
    But it's so good at ZFS. But it sucks as iscsi!

    My ubuntu servers are great with iscsi. Easy in xenserver, but the zfs thing is a big question mark. We've played with ZOL and ZFS Fuse, and they both worked well but not as fast as FreeBSD.... then there's the whole "do i want to trust it on production" thing.

    I wish btrfs was ready.

  2. The problem is that the FreeBSD iSCSI initiator lacks immediate data mode, in which the write payload rides in the same PDU as the write command itself -- a one-round-trip write. Without it, every write costs extra TCP/IP turnarounds (the initiator must wait for the target before sending data), which is ridiculously slow.
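    On the Linux side, open-iscsi exposes this as a pair of session settings in /etc/iscsi/iscsid.conf. This is a sketch of the relevant knobs as I understand them; verify the defaults against your distribution's shipped file:

```shell
# /etc/iscsi/iscsid.conf -- the two settings that control one-round-trip
# writes in open-iscsi:
#
# node.session.iscsi.ImmediateData = Yes  # carry write data in the command PDU
# node.session.iscsi.InitialR2T = No      # don't wait for a Ready-To-Transfer
#                                         # before sending the first burst
```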

    ZFS On Linux has now reached production quality, for some definitions of "production". It still has some performance issues with streaming workloads (as testing elsewhere on this blog indicates), but for certain applications it is quite usable. I am not, unfortunately, in a position to do testing with FreeBSD to compare the performance of the two.

    I don't know whether btrfs is ever going to be "ready". I have thus far been underwhelmed by it; it appears to be a bad clone of ZFS, except that it has much better streaming performance on Linux. I do believe that in recent kernels (3.8/3.9), if you stay away from the new functionality, it's as stable as anything else on Linux other than ext4 (which, due to its fixed allocation areas, will always be more reliable than something that relies on dynamic allocation).