Saturday, February 23, 2013

Part II: Enter XFS

So in the previous episode, I had benchmarked btrfs at 298MB/sec total throughput on 8 simultaneous simulated video streams to disk, and set up a Linux RAID10 array on my six 2TB 7200rpm drives. The raw drives each have a streaming throughput of roughly 110MB/sec. I left the RAID10 array rebuilding overnight and went to sleep.
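
For the record, I created the array with something along these lines (the device names are illustrative; substitute your own, and note that --chunk is in kilobytes):

    # Six drives in RAID10: three mirrored pairs, striped
    mdadm --create /dev/md10 --level=10 --raid-devices=6 \
          --chunk=512 /dev/sd[b-g]
    # Watch the initial resync/rebuild progress:
    watch cat /proc/mdstat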

So what is the raw throughput of the RAID10 array, and how much CPU does the kernel chew up to deliver it? I tested that today. The total raw throughput of three RAID0 stripes on those drives should be 330MB/sec. Through the MD layer with a single full-speed stream I got 311MB/sec, or roughly 6% overhead from the Linux kernel and the RAID10 layer. The RAID10 layer was using approximately 16% of one core, charged to the flush-9:10 kernel thread, which is quite reasonable for the amount of work being done.
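
A simple way to reproduce this kind of raw measurement is a single buffered dd stream to the MD device, watching the kernel threads in another terminal. The block size and count below are illustrative, and writing to the raw device destroys whatever is on it:

    # Single sequential stream to the MD device; conv=fdatasync makes
    # dd wait for writeback, so the throughput number is honest.
    # WARNING: this overwrites the array's contents.
    dd if=/dev/zero of=/dev/md10 bs=1M count=20000 conv=fdatasync
    # In another terminal, show per-thread CPU usage; flush-9:10 is
    # the writeback thread for device major:minor 9:10, i.e. md10:
    top -H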

Next step was to put an XFS filesystem onto this RAID device. Note that I did not even consider putting EXT4 onto a 6-terabyte filesystem: EXT4 is not suitable for video streaming, for a number of reasons I won't detail here. EXT4 is a fine general-purpose filesystem, far more reliable than it has any right to be considering its origin, but it has significant performance issues with very large files in a streaming application.

The first question is: does putting the XFS log on an SSD improve performance? So I created an XFS filesystem with the log device on the SSD and the filesystem proper on /dev/md10 (the RAID10 device), and ran my streaming tests again. This time throughput settled down to 303MB/sec, or roughly 8% overhead. Also, because XFS logs only metadata changes, I noted that virtually no I/O was going to the log device.
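
For anyone following along, the recipe looks roughly like this (the SSD log partition is shown as /dev/sdh1 and the mount point as /mnt/video, both illustrative):

    # Create the filesystem with an external log on the SSD:
    mkfs.xfs -l logdev=/dev/sdh1,size=128m /dev/md10
    # The external log must also be named at mount time:
    mount -o logdev=/dev/sdh1 /dev/md10 /mnt/video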

Note that XFS aggregates writes into far larger writes than my raw writes to the MD10 device, so you cannot say that XFS imposes only 2% overhead over direct I/O to the raw device; its aggregation, and its alignment of blocks to RAID stripes, actually reduce the MD10 layer's own overhead. Still, it is clear that XFS is the king of high-performance streaming I/O on Linux -- as has been true for the past decade.
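
mkfs.xfs normally reads the stripe geometry straight from the MD device, but you can also spell it out by hand. With the 512KB chunk and six-drive RAID10 assumed above (three data stripe units per full stripe), that would look like:

    # su = stripe unit (the MD chunk size), sw = stripe width in units;
    # values shown assume the 512KB-chunk, 6-drive RAID10 from earlier:
    mkfs.xfs -d su=512k,sw=3 -l logdev=/dev/sdh1,size=128m /dev/md10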

Of course, XFS also has its drawbacks. XFS values speed over everything else, and in practice its aggressive write re-ordering can result in corrupted files in the event of a power failure, kernel panic, or watchdog-forced reboot. That is quite acceptable for video recording data, where you may corrupt the last few seconds of video written to disk but will lose far more footage to the power outage itself. Add in the Linux MD layer's write hole, where partial-stripe updates cannot be reconciled after a crash (as opposed to the copy-on-write updates of BTRFS or ZFS, where the old data is still available and is reverted to if the new stripe did not complete, leaving a file that is at least consistent, though missing its last update), and it is clear that XFS should hold important data only on top of a hardware RAID subsystem with battery-backed cache. It should not be used for absolutely mission-critical data like, say, payroll, unless the features that make it perform so well on streaming loads are turned off. Appropriate tools for appropriate tasks, and all that...
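
On the recovery side, the metadata log is replayed automatically at mount time after an unclean shutdown; with an external log, the repair tools have to be told where it lives. Roughly, using the same illustrative device names as above:

    # Mounting after a crash replays the metadata log:
    mount -o logdev=/dev/sdh1 /dev/md10 /mnt/video
    # If the mount fails, repair, pointing at the external log:
    xfs_repair -l /dev/sdh1 /dev/md10
    # Last resort: force-zero a corrupt log (loses any unreplayed
    # metadata updates):
    xfs_repair -L -l /dev/sdh1 /dev/md10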

So in any event, it is clear that XFS, BTRFS, and ZFS are at present useful for entirely different subsets of problems, but for video streaming XFS remains king. Next, I will take a look at what Windows does when talking NTFS to that MD10 device via libvirtd and kvm, and compare that to what Linux does when talking XFS to the same device via libvirtd and kvm.

-Eric Lee Green
