Tuesday, September 25, 2012

BTRFS vs ZFSonLinux: How do they compare?

  • Integration with Linux
    • ZFS: Not integrated. Has its own configuration database (not /etc/fstab), has its own boot order for mounting filesystems (not definable by you), and cannot be told to bring a filesystem up after iSCSI comes up or to take it down before iSCSI goes down.
    • BTRFS: It's just another Linux filesystem as far as the system is concerned. You bring a pool up by mounting it (preferably by label) in /etc/fstab and can define the mount order so it comes up after iSCSI.
  • Snapshots
    • ZFS: Full snapshot creation and removal capabilities, well exploited by the FreeBSD port 'zfs-periodic', a script that is relatively easy to port to Linux. Snapshots appear in a special "dot" directory rather than cluttering up the main filesystem.
    • BTRFS: Snapshots are created as "clones" of subvolumes, and destroyed as if they were subvolumes. They can be created either read-write or read-only.
  • RAID: Both of these use filesystem-level RAID where filesystem objects are stored redundantly, either as entire clones (RAID1) or, in the case of ZFS, via RaidZ.
    • ZFS: Raid1 (mirroring) and RaidZ (similar to RAID5, except that it never does partial-stripe writes because it does variable stripe size -- the size of an object is the size of a stripe). Note that due to ZFS's COW implementation, an update to a RAID stripe cannot be corrupted by a power loss halfway through the write (see: RAID5 write hole) -- the old copy of the data (prior to the start of the write) is instead accessed when power comes back on.
    • BTRFS: Raid1 (mirroring). BTRFS currently has nothing like RaidZ. Note that putting a BTRFS filesystem on top of a software mdadm RAID5 will not give you the same reliability and performance as RaidZ: you will still take the random-write hit of partial-stripe writes, and you will still have the RAID5 write hole, where a stripe update that fails due to power loss halfway through the write corrupts the entire stripe.
  • Portability
    • ZFS: A ZFS filesystem can be read / written on: Linux (via either ZFS/Fuse or ZFSonLinux), FreeBSD, OpenIndiana, and MacOS (via Zevo). Requires extra 3rd party software to be installed on Linux and MacOS, comes standard with FreeBSD and OpenIndiana.
    • BTRFS: Any recent Linux distribution (one with a 3.x vintage kernel) has BTRFS built in. Your BTRFS pools will be immediately available when you upgrade to a newer kernel or a newer Linux distribution, with no need to install any additional software. However, BTRFS doesn't run on any other OS.
  • Stability
    • On Linux, both BTRFS and ZFS are listed as "experimental". ZFSonLinux uses SPL (the Solaris Porting Layer) as a "shim" between ZFS proper and Linux. Unfortunately this is sort of like nailing jello to a tree. While the underlying Linux block layer API hasn't changed in years, locking inside that block layer API has been in constant turmoil ever since the 2.6.30 timeframe as the last vestiges of the Big Kernel Lock were ferreted out and sent to the great bit bucket in the sky. The end result is that code that *used* to work may or may not cause deadlocks or strange races that cause an oops with current Linux kernels -- *UNLESS* it was developed as part of that current Linux kernel, as BTRFS is, in which case the person who changes the locks is responsible for making sure that all other kernel modules that are part of the next kernel release change their locks to match.
    • Summary: On Linux, this is a tie. BTRFS is under rapid development. ZFS is attempting to nail jello to a tree from outside the Linux kernel. Using either system for production data on Linux is not recommended. If you want a production server running a production-quality modern snapshotting filesystem, use ZFS on FreeBSD.
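
To make the RAID5 write hole from the RAID comparison concrete, here is a minimal Python sketch. It is a toy model, not how mdadm or ZFS is actually implemented: a stripe is three data blocks plus XOR parity, and all function names (xor_parity, reconstruct, cow_update) are made up for illustration. The first half updates a block in place and "loses power" before parity is rewritten; the second half imitates copy-on-write by writing a whole new stripe elsewhere and only then switching a root pointer.

```python
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR of equal-length blocks (toy stand-in for RAID5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the one missing data block from the survivors plus parity."""
    return xor_parity(list(surviving_blocks) + [parity])


# --- In-place update (classic RAID5) ---------------------------------
old_data = [b"AAAA", b"BBBB", b"CCCC"]
old_parity = xor_parity(old_data)

# Disk 0 receives its new block, then power fails before the parity
# disk is rewritten: data and parity on disk no longer agree.
torn_data = [b"ZZZZ", b"BBBB", b"CCCC"]
stale_parity = old_parity

# Later disk 1 dies and is rebuilt from disks 0 and 2 plus stale parity.
rebuilt_in_place = reconstruct([torn_data[0], torn_data[2]], stale_parity)


# --- Copy-on-write update (the idea behind the ZFS behaviour) --------
def cow_update(store, root, new_data, crash_before_commit):
    """Write a complete new stripe to a fresh location, then move the root.

    If the crash happens before the commit, the old root is returned and
    still points at an untouched, self-consistent stripe.
    """
    new_location = max(store) + 1
    store[new_location] = (new_data, xor_parity(new_data))
    if crash_before_commit:
        return root
    return new_location


store = {0: (old_data, old_parity)}
root = cow_update(store, 0, [b"ZZZZ", b"BBBB", b"CCCC"], crash_before_commit=True)
data, parity = store[root]
rebuilt_cow = reconstruct([data[0], data[2]], parity)

print(rebuilt_in_place == b"BBBB")  # False: an untouched block came back wrong
print(rebuilt_cow == b"BBBB")       # True: the old stripe is still intact
```

Note that in the in-place case the damaged block is one that was never being written, which is what makes the write hole nasty. In the copy-on-write case the interrupted update is simply lost, and the previous version of the data stays readable.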
Final summary:

If you must use Linux, and you must have a modern snapshotting filesystem, and you can live with a RAID1 limitation on data redundancy, I would strongly recommend going with BTRFS. The reason for this is that BTRFS is only going to get better on Linux, while ZFS is always going to be fighting the nail-jello-to-a-tree issue where Linux keeps changing underneath it and breaking things in weird ways. Unless ZFS is included as part of the Linux kernel -- and Oracle's lawyers will never allow changing the license to GPL in order to allow that -- there simply is no way ZFS will ever achieve stability except with specific kernel versions shipped with specific distributions. And even there I'm dubious.
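
If you go the BTRFS route, the /etc/fstab integration from the comparison above looks roughly like the following Python sketch, which just builds the fstab line. The helper name btrfs_fstab_line, the label "tank" and the mount point are hypothetical. The _netdev option is the standard mount option that marks a filesystem as needing the network; whether that is enough to order the mount after iSCSI depends on your distribution's init scripts, so treat this as a starting point rather than a recipe.

```python
def btrfs_fstab_line(label, mountpoint, extra_options=()):
    """Return one /etc/fstab line that mounts a BTRFS pool by label.

    "_netdev" asks the init system to treat the mount as network-backed,
    which is the usual way to delay it until network storage is up.
    The trailing "0 0" disables dump and boot-time fsck for this entry.
    """
    options = ["defaults", "_netdev", *extra_options]
    fields = [f"LABEL={label}", mountpoint, "btrfs", ",".join(options), "0", "0"]
    return "\t".join(fields)


# Hypothetical pool label and mount point, for illustration only.
print(btrfs_fstab_line("tank", "/srv/tank"))
print(btrfs_fstab_line("tank", "/srv/tank", extra_options=["noatime"]))
```

Mounting by label rather than by device name matters here because iSCSI device names are not guaranteed to be stable from one boot to the next.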

If you need the stability of ZFS, I strongly recommend using FreeBSD and not using Linux. I have personal experience dealing with the issues that come with supporting an emulation layer on top of the Linux block layer, including some deadlocks and races caused by locking changes inside recent kernels that led to a six-week delay in the release of an important product, and I honestly cannot say that any current ZFSonLinux implementation will continue to work with the next kernel revision. I can reliably say that BTRFS will work with the next kernel revision. While production servers don't change kernel revisions often, only once every three or four years, if the next version of the server OS doesn't happen to be one that is well supported by ZFSonLinux's then-current SPL implementation, you have problems.

So: Linux -- BTRFS. If you need the functionality of ZFS -- FreeBSD. Enough said on that.
