Sunday, February 24, 2013

Part III: Enter KVM

The next test is envisioned to be NTFS, which will require writing a small Java program to do what I did from the shell on Unix. Before that, though, I wanted to quantify the performance loss caused by KVM I/O virtualization.

I installed Fedora 18 on a KVM virtual machine via virt-manager and pushed /dev/md10 (the 6-disk RAID10 array) into the virtual machine as a virtio device. I then did raw I/O to /dev/vdb (which is how it showed up in the virtual machine) and found that I was getting roughly the same performance as native -- which, as you recall, was 311MB/sec. I was getting 308MB/sec, which is close enough to be no real difference. The downside was that I was using 130% of a CPU core between the virtio driver and kflushd (using write-back mode rather than write-through mode), i.e., using up one CPU core plus roughly a third of another to transfer the data from the VM to the LSI driver. For the purposes of this test, that is acceptable -- I have 8 cores in this machine, remember.
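To put a number on "close enough to be no real difference", here is a minimal sketch of the arithmetic in Java. The class and method names are made up for illustration, and the two rates are simply the raw figures quoted above.

```java
public class VirtOverhead {
    /** Percentage of throughput lost going from a baseline rate to a measured rate. */
    static double percentLoss(double baseline, double measured) {
        return (baseline - measured) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double nativeRate = 311.0; // raw native throughput quoted above
        double kvmRate = 308.0;    // raw throughput to /dev/vdb inside the VM
        System.out.printf("raw I/O loss under KVM: %.2f%%%n",
                percentLoss(nativeRate, kvmRate));
    }
}
```

That works out to just under 1%, which is well inside the run-to-run noise I would expect from a test like this.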

The next question was whether XFS performance would show the same excellent results in the VM that it showed natively. This proved to be somewhat disappointing. The final result was around 280MB/sec -- barely faster than what I was getting from ZFS. My guess is that natively XFS tries to align writes with RAID stripes for the sake of performance, but with the RAID array hidden behind the emulation layer provided by the virtualization system, it was not able to do so. That, combined with the fact that it only had half as much buffer cache to begin with (due to my splitting the RAM between the KVM virtual machine and the host OS, i.e., 10GB apiece), made it more difficult to schedule I/O effectively. I/O on the KVM side was "bursty" -- it would burst up to 1 gigabyte per second, then drop to zero, as shown by 'dstat'. This in turn caused I/O on the host side to be somewhat "bursty".

This also tends to support the assertion that it's the SPL (Solaris Porting Layer) that's causing ZFS's relatively poor streaming performance when compared to BTRFS, since the SPL effectively puts the filesystem behind an emulation layer too. It likewise supports the assertion that the Linux kernel developers have spent a *lot* of time optimizing the filesystem/block layer interface in recent Linux kernels.

It also raises the question of whether hardware RAID controllers -- which similarly hide the physical description of the actual RAID system behind a firmware-provided abstraction layer -- would have a similar negative impact upon filesystem performance. If I manage to snag a hardware RAID controller for cheap I might investigate that hypothesis, but it's rather irrelevant at present.

What this did bring out was that it is unlikely that testing NTFS throughput via a Windows virtual machine is going to produce accurate data. Still, I can compare it to the Linux XFS solution, which should at least tell me whether its performance is within an order of magnitude for streaming loads. So that's the next step of this four-part series, delayed because I need to write some Java code to do what my script with 'dd' did.
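For reference, the Java equivalent of a streaming 'dd' write might look roughly like the sketch below. This is not the program I ended up writing (see the update); the class name, the default file name, the block size, and the total size are all placeholders chosen for illustration. It writes zero-filled blocks sequentially, forces them to storage so the buffer cache doesn't flatter the result, and reports megabytes per second.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class StreamWriteBench {
    /**
     * Writes totalBytes of zeros to path in blockSize chunks and returns the
     * observed rate in MB/sec (1 MB = 10^6 bytes), including the final flush.
     */
    static double writeThroughput(Path path, long totalBytes, int blockSize)
            throws IOException {
        ByteBuffer block = ByteBuffer.allocateDirect(blockSize); // zero-filled
        long start = System.nanoTime();
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            long remaining = totalBytes;
            while (remaining > 0) {
                block.clear();
                block.limit((int) Math.min(blockSize, remaining));
                while (block.hasRemaining()) {
                    remaining -= ch.write(block);
                }
            }
            ch.force(true); // push data to storage before stopping the clock
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return totalBytes / 1e6 / seconds;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder defaults: target file and 256 MiB written in 1 MiB blocks.
        Path target = Paths.get(args.length > 0 ? args[0] : "bench.tmp");
        long total = 256L * 1024 * 1024;
        double rate = writeThroughput(target, total, 1 << 20);
        System.out.printf("%.1f MB/sec%n", rate);
        Files.deleteIfExists(target);
    }
}
```

For a real streaming test the total written should be several times the machine's RAM, otherwise the result mostly measures the buffer cache rather than the disks.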


Update: My scrap heap assemblage of spare parts disintegrated -- the motherboard suddenly decided it was in 6-beep "help I can't see memory!" heaven and no amount of processor and/or memory swapping made it happy -- and thus the NTFS test never got done. Oh well.


  1. I would ask that you repeat the XFS virtual machine test using a zvol from ZFS on the host. I don't expect amazing results, but I'd like to see how it holds up.

  2. I would definitely like to re-test ZFS now that it has reached what I consider to be release status. That said, some of the behavior I noticed from it appears to be inherent in the way it interacts with the Linux block layer. I'm sympathetic to their plight: the block layer is a pig and has quite obviously been one for quite some time (disclaimer -- I was the last maintainer of the Intransa block layer shim that integrated between the Intransa RAID subsystem and the block layer), and the reason btrfs gets excellent performance is that it basically abuses the block layer via functionality specifically added to the block layer in order to make btrfs faster and more reliable. But the ZFS developers can't modify the block layer to do their thing; they have to use it as-is due to licensing. So it goes.