Sunday, September 27, 2015

SSD: This changes everything

So someone commented on my last post, where I predicted that providing block storage to VM's and object storage for apps was going to be the future of storage, and pointed out another ramification of SSD. To wit: because SSD removes a lot of the I/O restrictions that have held back applications in the past, we are now at the point where CPU, in many cases, is the restriction. This is especially true since Moore's Law has seemingly gone AWOL. The Westmere Xeon processors in my NAS box on the file cabinet beside my desk aren't much slower than the latest Ivy Bridge Xeon processors. The slight bump in CPU speed is far exceeded by the enormous bump in IOPS that comes with replacing rotational storage with SSD's.

I have seen that personally, watching a Grails application max out eight CPU cores while not budging the I/O meter on a database server running off of SSD's. What that implies is that the days of simply throwing CPU at inefficient frameworks like Grails are numbered. In the future, efficient algorithms and languages are going to come back into fashion to keep up with all this fast storage that is taking over the world.

But that's not what excites me about SSD's. That's just a shuffling of priorities. What excites me about SSD's is that they free us from the tyranny of the elevator. The elevator is the requirement that we sweep the disk drive heads from bottom to top, then from top to bottom, in order to optimize reads. This in turn puts severe restrictions on how we lay out block storage -- the data must be laid out contiguously so that filesystems layered on top of the block storage can properly schedule I/O out of their buffers to satisfy the elevator. This in turn means we're stuck with the RAID write hole unless we have battery-backed cache. We can't do COW RAID stripe block replacement (that is, write the altered blocks of a RAID stripe at some new location on the device, then alter a stripe map table to point at those new locations and add the old locations to a free list) because a filesystem on top of the block device would not be able to schedule the elevator properly; the performance of the block storage system would fall over. That is why traditional iSCSI/Fibre Channel vendors present contiguous LUNs to their clients.

As a result, when we've tried to do COW in the past, we've done it at the filesystem level so that the filesystem could properly schedule the elevator. Thus ZFS and BTRFS. They manage their own redundancy rather than using RAID at the block layer, and ideally want to manage the block devices directly. Unfortunately that really doesn't map well to a block storage back end based on LUNs, and furthermore doesn't map well to virtual machine block devices represented as files on the LUN -- the virtual machines all have their own elevators doing what they think are sequential ordered writes, but the COW filesystem is writing at random places, so read performance inside the virtual machines becomes garbage. Thus VMware's VMFS, an extent-based clustered filesystem that, again due to the tyranny of the elevator, keeps the blocks of a virtual machine's virtual disk file largely contiguous on the underlying block storage so that the individual virtual machines' elevators can schedule properly.

So VMFS talking to clustered block storage is one way of handling things, but then you run into limits on the number of servers that can talk to a single LUN. That in turn makes things difficult to manage, because you end up with hundreds of LUN's for hundreds of physical compute servers and have to schedule the LUNs so they're only active on the compute servers that have virtual machines on that specific LUN (in order to avoid hitting the limits on the number of servers allowed to access a single LUN). What is needed is the ability to allocate block storage on the back end on a per-virtual-machine basis, and to have the same capabilities on that back end that VMFS gives us on a single LUN -- the ability to do snapshots, sparse LUN's, copying snapshots as new volumes, and so forth -- all managed by the cloud infrastructure software. This was difficult back in the days of rotational storage because we were slaves of the elevator; we had to make sure that all this storage ended up contiguous. But now we don't. The writes still have to be contiguous, due to the limitations of SSD, but reads don't. And it's the reads that forced the elevator -- scheduling contiguous streams of writes (from multiple virtual machines / multiple files on those virtual machines) has always been easy.

I suspect this difficulty in managing VMFS on top of block storage LUNs for large numbers of ESXi compute servers is why Tintri decided to write their own extent-based filesystem and serve it to ESXi boxes as an NFS datastore rather than as block storage LUN's; NFS doesn't have the limits on the number of computers that can connect. But I'm not convinced that, going forward, this is going to be the way to do things. vSphere is a mature product that has likely reached the limits of its penetration. New startups today are raised in the cloud, primarily on Amazon's cloud, and they want a degree of flexibility in spinning virtual machines up and down that makes life difficult with a product that has license limits. They want to be able to spin up entire test constellations of servers to run multi-day tests on large data sets, then destroy them with a keystroke. They can do this with Amazon's cloud. They want to be able to do this on their local clouds too. The future is likely to be based on the KVM/QEMU hypervisor and virtualization layer, which can use NFS data stores but already has the ability to present an iSCSI LUN to a virtual machine as a block device. Add in some local SSD caching at the hypervisor level to speed up writes (as I explained last month), and you have both the flexibility of the cloud and the speed of SSD. You have the future -- a future that few storage vendors today seem to see, but one that the block storage vendors in particular are well equipped to capture if they're willing and able to pivot.
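
To make that concrete: under libvirt, handing a guest an iSCSI LUN directly as a block device is just a disk stanza along these lines. This is a sketch only -- the IQN and portal address here are hypothetical:

  <disk type='network' device='disk'>
    <driver name='qemu' type='raw'/>
    <source protocol='iscsi' name='iqn.2015-09.com.example:vmvol/0'>
      <host name='10.0.0.5' port='3260'/>
    </source>
    <target dev='vda' bus='virtio'/>
  </disk>

The guest just sees a /dev/vda, and the back end can snapshot or clone that one volume without any shared datastore filesystem in the middle.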

Finally, there is a question as to whether storage and compute should be separate things altogether. Why not have compute in the same box as your storage? There are two problems with that, though: 1) you want to upgrade compute capability to faster processors on a regular basis without disrupting your data storage, and 2) the density of compute servers is much higher than the density of data servers, i.e., you can put four compute blades into the same 2U space as a 24-bay data server. And as pointed out above, compute power is now going to be the limiting factor for many applications, not IOPS. Finally, you want the operational capability to add more compute servers as needed. When our team used up the full capacity of our compute servers, I just added another compute server -- I had plenty of storage. Because the demand for compute and memory just keeps going up as our team has more combinations of customer hardware and software to test, it's likely I'm going to continue to have to scale compute servers far more often than I have to scale storage servers.

So this has gone on much too long but the last thing to cover is this: Will storage boxes go the way of the dodo bird, replaced by software-defined solutions like Ceph on top of large numbers of standard Linux storage servers serving individual disks as JBOD's? It's possible, I suppose -- but it seems unlikely due to the latency of having to locate disk blocks scattered across a network. I do believe that commodity hardware is going to win everything except the high end big iron database business in the end because the performance of commodity hardware has risen to the point where it's pointless to design your own hardware rather than purchase it off the shelf from a vendor like Supermicro. But there is still going to be a need for a storage stack tied to that hardware in the end because pure software defined solutions are unable to do rudimentary things like, e.g., use SES to blink the LED of a disk bay whose SSD has failed. In the end providing an iSCSI LUN directly to a virtual machine requires both a software support side that is clearly software defined, and a hardware support side where the hardware is managed by the solution. This in turn implies that we'll continue to have storage vendors shipping storage boxes in the future -- albeit storage boxes that will incorporate increasingly large amounts of software that runs on infrastructure servers to define important functions like, e.g., spinning up a virtual machine that has a volume attached of a given size and IOPs guarantee.
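
(For what it's worth, the LED-blinking does have a software path on Linux when the stack knows about the enclosure: the ledmon package can typically drive SES-capable enclosures, along the lines of the following -- the device name is hypothetical:

  • ledctl locate=/dev/sdf
  • ledctl locate_off=/dev/sdf

The point stands, though: somebody still has to tie that knowledge of the physical enclosure into the storage stack.)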

-ELG

Sunday, August 11, 2013

The killer app for virtualization

The killer application for virtualization is... running legacy operating systems.

This isn't a new thought on my part. When I was designing the Intransa StorStac 7.20 storage appliance platform, I deliberately put virtualization drivers into it so that we could run Intransa StorStac as a virtual appliance on some future hardware platform not supported by the 2.6.32 kernel. And yes, that works -- no joke, I tried it, of course. The only thing that didn't work was sensors, and if Viakoo ever wants to deliver a virtualized IntransaBrand appliance, I know how to fix the sensors. My thought was future-proofing -- I could tell from the layoffs and from the unsold equipment piled up everywhere that Intransa was not long for the world, so I decided to leave whoever bought the carcass a platform that had some legs on it. It has drivers for the network chips in the X9-series SuperMicro motherboards (Sandy/Ivy Bridge) as well as the virtualization drivers. So there's now a pretty reasonable migration path to keep StorStac running into the next decade: first migrate it to Sandy/Ivy Bridge physical hardware, then once that's EOL'ed, migrate it to running on top of a virtual platform on top of Haswell or its successors.

But what brought it to mind today was ZFS. I need some of the features of the LIO iSCSI stack and some of the newer features of libvirtd for some things I am doing, so I ended up needing to run a recent Fedora on my big home server (which is now up to 48 gigabytes of memory and 14 terabytes of storage). The problem is that two of those storage drives are offsite backups from work (ZFS replication, duh) and I need ZFS to apply the ZFS diffsets that I haul home from work. That was not a problem for Linux kernels up to 3.9, but now Fedora 18/19 have rolled out 3.10, and ZFSonLinux won't compile against the 3.10 kernel. I found that out the hard way when the new kernel came in and DKMS spit up all over the floor because of ZFS.

The solution? Virtualization to the rescue! I rolled up a Centos 6.4 virtual machine, pushed all the ZFS drives into it, gave it a fair chunk of memory, and voila. One legacy platform that can sit there happily for the next few years doing its thing, while the Fedora underneath it changes with the seasons.
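
If you want to replicate that setup, pushing the physical drives into the guest is a one-liner per drive with libvirt. A sketch, assuming a guest named centos64 (the device path here is hypothetical; using /dev/disk/by-id paths keeps the mapping stable across reboots):

  • virsh attach-disk centos64 /dev/disk/by-id/ata-WDC_WD20EARS-EXAMPLE vdb --persistent

Repeat with vdc, vdd, etc. for the remaining drives, then run 'zpool import' inside the guest.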

Of course that is nothing new. A lot of the infrastructure that I migrated from Intransa's equipment onto Viakoo's equipment was virtualized servers dating in some cases all the way back to physical servers that Intransa bought in 2003 when they got their huge infusion of VC money. Still, it's just a practical reminder of the killer app for virtualization -- the fact that it allows your OS and software to survive despite underlying drivers and architectures changing with the seasons. Now making your computer work faster can be done without changing anything at all about it -- just buy a couple of new virtualization servers with the very latest fastest hardware and then migrate your virtual machines to them. Quick, easy, and terrifies OS vendors (especially Microsoft) like crazy because now you no longer need to buy a new OS to run on the new hardware, you can just keep using your old reliable OS forever.

-ELG

Saturday, April 27, 2013

Configuring shared access for KVM/libvirt VM's

Libvirt has some nice migration features in the latest RHEL/Centos 6.4 to let you move virtual machines from one server to the other, assuming that your virtual machine disk images live on a datastore shared between the servers. But if you try it with VM's set to auto-start on server startup, you'll swiftly run into problems the next time you reboot your compute servers -- the same VM will try to start up on multiple compute servers.

The reality is that, unlike ESXi -- which by default locks the VMDK file so that only a single virtual machine can use it at a time, meaning that the same VM set to start up on multiple servers will only start on the one that wins the race -- libvirtd by default does *not* include any sort of locking. You have to configure a lock manager to get it. In my case, I configured 'sanlock', which has integration with libvirtd. So, on each KVM host configured to access the shared VM datastore /shared/datastore:

  • yum install sanlock
  • yum install libvirt-lock-sanlock
Now set up sanlock to start at system boot, and start it up:
  • chkconfig wdmd on
  • chkconfig sanlock on
  • service wdmd start
  • service sanlock start
On the shared datastore, create a locking directory, give it ownership sanlock:sanlock, and set permissions so that anybody in group sanlock can write to it:
  • cd /shared/datastore
  • mkdir sanlock
  • chown sanlock:sanlock sanlock
  • chmod 775 sanlock
Finally, you have to update the libvirtd configuration to use the new locking directory. Edit /etc/libvirt/qemu-sanlock.conf with the following:
  • auto_disk_leases = 1
  • disk_lease_dir = /shared/datastore/sanlock
  • host_id = 1
  • user = "sanlock"
  • group = "sanlock"
Everything else in the file should be commented out or a blank line. The host ID must be different for each compute host; I started counting at 1 and counted up for each compute host. And edit /etc/libvirt/qemu.conf to set the lock manager:
  • lock_manager = "sanlock"
(the line is probably already there, just commented out. Un-comment it). At this point, stop all your VM's on this host (or migrate them to another host), and either reboot (to make sure all comes up properly) or just restart libvirtd with
  • service libvirtd restart
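To sanity-check that the lock daemon is up and has joined its lockspace after the restart, you can ask it directly (my own quick check, not part of the recipe; the lockspace should show up under the disk_lease_dir you configured):
  • sanlock client status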
Once you've done this on all servers, try starting up a virtual machine you don't care about on two different servers at the same time. The second attempt should fail with a locking error. At the end of the process it's always wise to shut down all your virtual machines and restart your entire compute infrastructure that's using the sanlock locking, to make sure everything comes up correctly. So-called "bounce tests" are painful, but they're the only way to be *sure* things won't go AWOL at system boot.

If you have more than three compute servers, I *strongly* suggest that you go to an OpenStack cloud instead, because things become unmanageable swiftly using this mechanism. At present the easiest way to deploy OpenStack appears to be Ubuntu, which has pre-compiled binaries on both their LTS and current distribution releases for OpenStack Grizzly, the latest production release of OpenStack as of this writing. OpenStack takes care of VM startup and shutdown cluster-wide and simply won't start a VM on two different servers at the same time. But that's something for another post.

-ELG

Saturday, March 30, 2013

Why no cloud?

So I promised I'd explain why I was setting up normal Linux-based storage and normal KVM/ESXi compute servers for our new small business's network rather than an OpenStack private cloud, so I'll do so.
  1. One risky technology per deployment. It's about risk management -- the ability to manage risks in a reasonable manner. If you have multiple risky technologies, the interactions between risks rise exponentially and cause risks to become unmanageable. Normal Linux-based storage is a mature technology with over a decade of active deployment in production environments, with the exception of the LIO iSCSI target. I concluded that the LIO iSCSI target was a necessity in our environment because the TGTD target provided with stable Linux distributions has multiple serious deficiencies (see earlier postings) that render it nothing more than a toy, and our legacy infrastructure was based around iSCSI talking to that pile of ancient Intransa blue boxes. So I've reached my limit on new technologies. Meanwhile, OpenStack is multiple immature technologies under active development. Add that to LIO and the existing VMware ESX/ESXi servers' need for block storage, and I'd require multiple storage networks to mitigate the risks. Which brings up...
  2. Power and space budget. My power and space budget allows for one storage network with a total of 8U of space and 1000 watts of power consumption. I don't have power and space for two storage networks, one for OpenStack and one for ESX/ESXi.
  3. Performance. The majority of what my network provides to end users is file storage via NFS and CIFS. In an OpenStack deployment, file servers run as virtual machines talking to back-end storage via iSCSI. This scales very well in large installations, but I don't have the power and space budget for a large installation, so that's irrelevant. Running the NAS storage stack directly on the storage boxes results in much better responsiveness and real-world performance than running the NAS storage stack on a virtual machine talking to the storage boxes via iSCSI, even if the theoretical performance should be the same. The biggest issue is that this limits the size and performance of any particular data store to one storage box, but the reality is that this isn't a particularly big limitation for our environment, since we have far more IOPS and storage on a single storage box than any single data store in our environment will use for quite some time. (My rule of thumb is that no ext4 data store will ever be over 1TB and no xfs data store will ever be over 2TB, due to various limitations of those filesystems in a NAS environment... any other filesystem runs into issue #1, one risky technology per deployment, and I already hit that with LIO.)
  4. Deep understanding of the underlying technologies. The Linux storage stack has been mature for many years now, with the exception of LIO. I know its source code at the kernel level fairly well. If there is an issue, I know how to resolve it, even to the point of poking bytes into headers on disk drive blocks to make things work. Recovery from failure thus is low risk (see #1). OpenStack is a new and immature technology. If there is an issue, we could be down for days while I chase around in the source code trying to figure out what went wrong and how to fix it.
Note that this is *not* a slam on OpenStack as a technology, or saying that you should not use one of the OpenStack cloud providers such as RackSpace or HP. They have massive redundancies in their OpenStack deployment and people on staff who have the expertise to manage it, and do not have to deal with legacy infrastructure requirements such as our ESXi servers with their associated Windows payloads. Plus they are based around a totally different workload. Our in-house workload is primarily a NAS workload for workstations, and our compute workload is primarily a small number of virtualized test servers or build servers for our software in a variety of environments as well as a handful of infrastructure servers to e.g. handle DNS. What OpenStack mostly gives you is the ability to manage massive numbers of storage servers and massive numbers of compute servers and massive numbers of virtual machines on those compute servers, none of which is our local workload.

The workload that RackSpace etc. are supporting is mostly about Big Data and Big Compute in the cloud or about web server farms in the cloud. All of that has far larger space and power requirements than our little two-rack data center can ever provide, and the reality is that we simply use their infrastructure when we have those requirements rather than attempt to replicate their infrastructure in-house. It simply isn't reasonable for a small business to try to replicate RackSpace or Amazon AWS in-house. We don't have the space and power for the massive amount of infrastructure they use to achieve redundancy and reliability, we don't have the requirement for our local workload, and we don't have the in-house expertise. In the end, it's a case of using the appropriate technology for the appropriate task -- and for what I'm attempting to achieve for the local infrastructure of a small business, using NAS-based Linux storage was more appropriate than attempting to shoe-horn our workload into an infrastructure that would give us no more capability for our needs but would cost us in terms of power, space, performance, and maintainability.

-ELG

Friday, March 2, 2012

Best practices for virtualization

A series of notes...

  1. vSphere/ESXi: Expensive. Inscrutable licensing scheme -- they have more SKU's than my employer, and it's almost impossible to tell what you need for your application. Closest thing to It Just Works in virtualization. Call them the Apple of virtualization.
  2. Xen: Paravirtualization gives it an advantage in certain applications, such as virtualized Linux VM's in the cloud. Paravirtualization is generally faster than full hardware-assisted virtualization, though most hypervisors now include paravirtualized device drivers to ease that pain. Xen doesn't Just Work; it's more of an erector set. Citrix's XenServer is the closest that Xen gets to vSphere's 'Just Works'; I need to download it and try it out.
  3. KVM: The future. Integrating the hypervisor and the OS allows much better performance -- that's why VMware wrote their own kernel with an integrated hypervisor. Current issues: management is the biggest difficulty. There is difficulty creating clustered filesystems for swift failover or migration of virtual machines (ESXi's VMFS is a cluster filesystem -- point several ESXi systems at a VMFS filesystem on iSCSI or Fibre Channel block storage, and they'll all be able to access virtual machines on that system). Most KVM systems set up to do failover / migration in production use NFS instead, but NFS performs quite poorly for the typical virtualization workload, for numerous reasons (I may discuss that later). The closest thing to VMFS performance for VM disks is using LVM volumes or clustered LVM if using iSCSI block storage (see the sketch below), but there are no management tools for KVM allowing you to set up LVM pools and manage them for virtual machine storage with snapshots and so forth. Virtual disk performance on normal Linux filesystems, via the qcow2 format, sucks whether you're talking ext4, xfs, or nfs. In short, the raw underlying bits and pieces are all there, but there is not a management infrastructure to use them. Best practice performance-wise for a clustered setup: NFS share for metadata (xml description files of VM's, etc.), iSCSI or Fibre Channel block storage, possibly sliced/diced with clustered LVM, for the VM disks.
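To make the LVM-backed approach concrete, here's roughly what it looks like by hand -- a sketch, with hypothetical volume group and VM names: carve out a logical volume, then hand it to the guest as a raw virtio disk.
  • lvcreate -L 40G -n vm01-disk0 virtgroup
  • virsh attach-disk vm01 /dev/virtgroup/vm01-disk0 vdb --persistent
You get near-raw disk performance, at the cost of doing all the pool management yourself.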
So what am I going to use today if I'm a busy IT guy who wants something that Just Works? VMware vSphere. Duh. If, on the other hand, I'm building a thousand-node cluster, a) it's probably my full time job so I have time to spend futzing with things like clustered LVM, and b) the cost of vSphere for a cluster that large would be astronomical so would decidedly make paying my salary to implement Xen or KVM on said cluster more palatable than paying VMware.

-ELG

Wednesday, July 20, 2011

Accessing raw drives from VirtualBox

In my previous virtualization series, I avoided VirtualBox because there was no GUI support for accessing raw drives, and I have a pair of 2TB Linux RAID drives that I wanted a Linux VM on my Windows host to assemble and export to my network as a set of network shares. However, when I wanted to see the Gnome Shell functionality of Fedora 15, I had no choice but to use VirtualBox -- it's the only virtualization solution out there that currently supports OpenGL acceleration for Linux virtual machines.

So given that incentive, I actually did it. The secret is VirtualBox's VMDK support for importing VMware virtual drives. The VMDK header format allows creating a virtual disk that is actually just a pointer to a physical drive. So I used the directions on the VirtualBox site to create two vmdk files pointing at physical drives, and then added them to my freshly installed Fedora 15 virtual machine as "existing" drives.

The first thing to note is that on Windows Vista or Windows 7, you MUST run VirtualBox as Administrator to access physical drives. Otherwise your VM simply won't start (and you can't even create your virtual VMDK's if you're not doing it as Administrator).

The next thing to note is where Oracle puts all the binaries you'll need. Pop open a Terminal window as the administrative user (i.e., right-click and "run as Administrator") and:

C:> path "%path%;C:\Program Files\Oracle\VirtualBox"

Then you'll need to find your virtual machines. For me:

C:> cd "\Users\eric\VirtualBox VMs"

did the trick.

Now you can run the commands. I knew from my VMware Player experiment what my two physical drives were identified as in Windows: they were identified as drives 0 and 1, while my boot drive (which plugs into the front) is identified as drive 2, so I went ahead and used those. If you don't already know, you may need to do a bit of poking around to figure out which drives to push. So anyhow:

C:> VBoxManage internalcommands createrawvmdk -filename Fedora15/Disk0.vmdk -rawdisk \\.\PhysicalDrive0
C:> VBoxManage internalcommands createrawvmdk -filename Fedora15/Disk1.vmdk -rawdisk \\.\PhysicalDrive1

Then I right-clicked my VirtualBox icon and ran it as Administrator (you will probably want to go into its settings via right-click on its icon and make that permanent for left-double-clicks too), clicked the Fedora15 virtual machine, then its Settings icon, hit "+", and added the two hard drives as "existing" virtual hard drives. I then started the VM and... success! I saw my two drives in /proc/partitions.

Well.... *almost* success. Fedora 15 didn't activate my arrays on that first boot. So: mdadm to activate the arrays, vgscan to detect my volume groups once the RAID arrays were up, then lvchange -ay to activate the detected logical volumes. But once I did all that I could mount my filesystems and add them to /etc/fstab, and Fedora 15 properly assembled everything on my next reboot.
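
In command form, that recovery sequence was roughly the following (the volume group name here is hypothetical -- substitute your own):

  • mdadm --assemble --scan
  • vgscan
  • lvchange -ay datagroup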

So what's the downside of VirtualBox so far? Well... it's hard to say. One thing I *have* noticed is that YouTube videos do not play properly. Multimedia playback in virtual machines is always problematic because of timing jitter, and I suspect that being on a Windows host with its really lousy clock system doesn't help. But then, I can just view multimedia on the Windows host -- that's why I have it, for games and stuff that doesn't work well under Linux. Other than that, everything else seems to be working... and you can't beat the price: Free (for personal use).

-ELG

Thursday, March 24, 2011

Virtualization solution #3: Windows Virtual PC

Note that my Windows platform is Windows 7, which basically includes a free Windows XP virtual machine for Windows Virtual PC. But I wanted to add physical hard drives. And... err... no. It won't do it.

My basic take: Windows Virtual PC works well if you're wanting some Windows "sandboxes" to play in. The drop-and-drag integration in particular is fairly impressive. But for what I want -- to create a virtual machine that's given ownership of a bunch of disks in order to software RAID-6 them and divvy slices of the resulting arrays out via iSCSI, CIFS, and AFP, i.e., to create a virtual storage appliance -- it simply lacks the basic functionality I need. Thus far my 64-bit Scientific Linux 6 is working quite well as a JBOD manager with VMware Player... and there aren't many virtualization solutions that can do that, and nothing other than VMware Player on the desktop.

-ELG

Sound, Flash, and VMware Player

These are some notes on how to get sound and Flash working on a Scientific Linux 64-bit guest (rebranded Red Hat Enterprise Linux 6) running inside VMware Player on Windows 7 64-bit:

Flash: Note that Chrome for 64-bit Linux does *not* include the integrated Flash player that all other platforms have. You'll need to download the 64-bit Flash beta from Adobe. At the time of this writing, that's at Adobe Labs. Once you extract the tgz file, you'll be left with a file "libflashplayer.so"; simply copy it to /usr/lib64/mozilla/plugins, restart Chrome or Firefox, and you're set.
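
In shell terms, that's just the following (the exact tarball name from Adobe Labs will differ):

  • tar xzf flashplayer*_linux.x86_64.tar.gz
  • cp libflashplayer.so /usr/lib64/mozilla/plugins/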

Sound: VMware's sound system crashes with the default Ubuntu / Red Hat sound configuration. This is apparently because VMware doesn't bother emulating all pieces of the hardware they say they're emulating, and when ALSA touches the missing bits, VMware disables the sound device. That's easy enough to fix though. From the Gnome menu, go to System->Preferences->Sound. In the Sound preferences, click on Hardware. Change the profile at the bottom from "Analog Stereo Duplex" to "Analog Stereo Output". Then on the little icon of the speaker on the bottom margin of the VMware Player, right-click it and select "Connect". After about a minute it'll turn green and you'll be able to play sound again.

Why this happens: the Ensoniq device being emulated is capable of stereo duplex operation (i.e., both recording and outputting at the same time) -- I know because I actually had one of the physical cards back in the day and used it for that purpose along with a multitrack recorder program -- but VMware's emulation of the device is not capable of such, and thus you must disable that capability. Unless you were intending to record audio within the Linux virtual machine (*not* recommended; VMware's timing is not sufficiently good to get good results there), this has no actual effect -- you can still play Flash videos from Chrome (and Firefox, presumably) and hear the sound.

So if you're running Windows 7 on your physical hardware because you need the graphics performance for games and aren't willing to do the dual-card IOMMU hack with Xen that I demonstrated previously (or your hardware simply doesn't have IOMMU support, or you're not willing to use the latest bleeding-edge OpenSUSE as your platform), you can still have the far more secure web browsing environment of Linux available, and with VMware Player you can give Linux any additional drives beyond your boot drive for Linux to manage. Linux works far better as a server than Windows 7 does -- it can provide CIFS, AFP, and iSCSI, and do it all on a much better software RAID stack than Microsoft's, as well as using LVM to manage that space, for which Windows 7 has no equivalent.

And VMware's Unity system actually works pretty well with SL6/RHEL6 on Windows, although not as well as it works in VMware Fusion on MacOS (the issue being that the little Unity menu icon gets put into the screen menu bar on MacOS and has a native look and feel, while it shows up as a clunky little usually-invisible icon above the Start menu on Windows). Which means that you can mix Linux Chrome windows, Windows IE windows (ick! But there's a couple of applications I need for customer support that require IE plugins that don't exist for any other browser), Linux shell windows, and hoary old Outlook all on the same screen, and manage them using the normal Aero Peek icons at the bottom or, if you have a Logitech mouse with the Logitech drivers installed, by assigning one of the side buttons to Window Switcher (their clone of Apple's Expose').

And unlike earlier versions of Windows, Windows 7 is stable -- in fact, I've never managed to make it crash. It's just very annoying... but that's why Apple is still in business, after all. Because Apple makes computers that are not annoying. For a price. A big price, alas...

-ELG

Tuesday, March 8, 2011

Virtualization solution #2: VirtualBox

So the next piece of virtualization software that I was going to try out is VirtualBox. Remember, my Linux is installed on an entirely separate hard drive pair that I mapped into VMware Player as two drives, then installed Scientific Linux 6 onto one of the RAID arrays previously configured on that drive pair for that purpose (a 20GB RAID1 pair). 'grub' handles that situation just fine: it skips the RAID header, loads the Linux kernel, and does its thing. At that point, I can see the remaining 1.8 terabytes of RAID'ed LVM volumes.

So I fire up VirtualBox and go to create a virtual machine via its GUI and... err... it doesn't allow me to assign physical drives to my VM. Which is one of those "WTF?!" moments, because the underlying QEMU that VirtualBox is based upon certainly has the *capability* to add physical drives into a virtual machine, but a bit of Googling around finds that you must do some cryptic command line hacking to make VirtualBox do it. The GUI won't do it.

At that point, realizing that VMware was point and click and ridiculously easy to set up and did what I wanted it to do without said hacking, I said "F*** that" and uninstalled VirtualBox. It may be that VirtualBox could perform better than VMware Player. But my time is more valuable than any tiny increment of performance that VirtualBox could potentially give me compared to VMware Player.

Next up: I check out Windows Virtual PC and see what I can do with it. For one thing, it'll be cool to try Windows XP Mode, even if it turns out not to be useful for virtualizing Linux...

-ELG

Saturday, March 5, 2011

Windows virtualization software

Previously I mentioned three bare-metal hypervisor solutions -- Xen, KVM, and VMware ESXi -- and what was required to push through a video card to a Windows VM living on those hypervisors. I have since managed to get that working with VMware ESXi, and I suspect if I install the very latest Fedora 14 host it'll work with KVM too.

That works fine with a desktop machine or rack server where I can mount multiple video cards into the box. The performance of Windows 7 virtualized is indistinguishable from Windows 7 raw. Here are my scores for Windows 7 raw on that box, with a Seagate Momentus XT 7200 RPM boot drive (due to the front-loading slot on my case that allows easy swapping of OS drives):

  • Processor: 7.5
  • Memory: 7.6
  • Graphics: 7.3
  • Gaming graphics: 7.3
  • Primary hard disk: 5.9
Compare with the virtualized numbers from my graphics passthrough series below -- they're essentially identical.

The problem, however, is when you want to go mobile. You simply can't add a second video card to a laptop. So if I want to play games on one of those new Dell Sandy Bridge desktop-replacement laptops, I need to run Windows native, and come up with a way to have Linux also running and handling my years of accumulated data, all of which is in a Unix filesystem tree and cannot simply be copied into Windows. I preferably want this Linux to be running on a raw partition -- not on a file within a Windows partition -- and it has to be able to access raw Linux-formatted USB and SATA drives. So, let's look at the first candidate... VMware Player.

VMware Player is VMware's entry-level desktop virtualization program. At one point in time VMware Player would only start up virtual machines that had been created by VMware Workstation, but now it's an almost-full version of VMware Workstation with various functionality like snapshots stripped out and with a limitation of only four cores allowed. VMware Player is "free" -- free for personal use; if you want to deploy it in a corporate environment you can license it for a fairly trivial per-seat fee (quite trivial, it'll be lost in the noise of your IT budget).

The test machine is my Xen server with Windows 7 installed as described above. I installed VMware Player on it without any problem. I created my Scientific Linux 6.0 virtual machine with no problem, giving it 2 gigabytes of memory and a 16GB root filesystem. VMware Tools installed easily into SL6 and allowed me to treat the Linux "X" desktop as if it were just any other Windows window: I could click into it, my mouse pointer could be moved outside of the window while I was typing into a Linux program, and so forth. Adding the two 2TB SATA physical hard drives to the virtual machine was as simple as point and click, though the VM had to be off to do so, because VMware Player's hot-plug functionality apparently does not work with physical drives. Once I booted SL6 up, it saw the two 2TB drives and assembled the Linux software RAID arrays on them automagically, though I had to do vgchange -ay to get SL6 to recognize the LVM volumes on the RAID arrays.

So how fast is access to those two SATA drives? On a subsequent reboot of my Linux VM, a RAID check got fired off. The two drives were being read at 105MB/sec apiece -- 210MB/sec total -- and it used less than 20% of one of my eight cores for VMware to virtualize this. My take on it is that VMware Player's fake SCSI device takes a fair chunk of CPU to virtualize, but modern multi-core CPU's are so bleepin' fast that you won't even notice (which I didn't, until I went to see).

The final thing I wanted to do was to export an NTFS-formatted volume to Windows via iSCSI. Windows 7 has Microsoft's iSCSI initiator built in. I gave both my Windows machine and my Linux VM fixed addresses (using bridged mode for the Linux VM's virtual network card), and installed the iSCSI target daemon and utils with 'yum install scsi-target-utils'. Then I added the already-existing logical volume /dev/datagroup/win7 to /etc/tgt/targets.conf (see that file for the exact format of what you need to add) and started up the daemon with "service tgtd start". Then I went to the Windows Administrative Tools (you can get to them from the Start Menu if you've configured them to appear there, or from the Control Panel), selected the iSCSI Initiator, told it to scan the IP address of my Linux VM, and voila, it popped up there and as a drive letter in Windows Explorer. Easy peasy! The only thing to remember is to poke a hole for iSCSI in both the Linux and Windows firewalls, or it doesn't work :). (Yes, been there, done that, heh!)
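
For reference, the targets.conf stanza for an export like that looks roughly like this (the IQN here is hypothetical; the comments in the stock file show the full format), along with the iSCSI firewall hole on the Linux side:

  <target iqn.2011-03.local.bigserver:win7>
      backing-store /dev/datagroup/win7
  </target>

  • iptables -I INPUT -p tcp --dport 3260 -j ACCEPT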

The final test was to attach a USB hard drive to the system and export it to my MacBook Pro via iSCSI, to use as a Time Machine device. When I attached the hard drive to the system it popped up in Windows, but clicking Virtual Machine -> Removable Devices showed the new drive and allowed me to add it to the virtual machine. I then added it to targets.conf and told the tgtd daemon about it, then went to my iSCSI initiator on the MacBook (the globalSAN iSCSI initiator) and added it, then used Disk Utility to format it as a Mac volume. Then I went to Time Machine and told Time Machine to use it for backup, and... voila. It started backing up at about 20MB/sec -- about 50% of practical USB2 speed, not too bad considering this is being done via WiFi, not a direct connection, and the iSCSI target is running in a virtual machine, not directly on the hardware. A copy in Windows of a 4GB file to a similar USB drive ran at 26MB/sec, and that should go faster than Time Machine writes because it's a big sequential write, not a lot of smaller files. So now I have the equivalent of one of those expensive Time Capsule thingies, except that a 2TB Western Digital drive in an external USB case costs a *lot* less! Why an external USB? Simple -- so if I ever have to restore my MacBook Pro after a disk crash (which has happened before), I can unplug it from my big server and plug it directly into the MBP for MacOS to restore the system back to its pre-crash state.

So... it's clear that VMware Player will do everything I want it to do here. There are two more options to look at before calling this competition done, however: Oracle's VirtualBox, which recently released a brand-new version (4.0.4), and Microsoft's own Windows Virtual PC, which doesn't officially support Linux but which has been made to do so. More on those later...

-ELG

Monday, November 15, 2010

Pushing a graphics card into a VM, part 5

Part 1 Part 2 Part 3 Part 4 Part 5

So here's the final thumbnail summary:

Hardware:

  • Video card #1: ATI 5750 (the 5770 should work too and is slightly faster, but the 5750 was on the Xen compatibility list)
  • Video card #2: nVidia Corporation VGA G98 [GeForce 8400 GS] *PCI* card (BIOS set to use PCI as first card)
  • Intel(r) Desktop Board DX58SO -- not a great motherboard, but it was available at Fry's and was on the Xen VT-d compatibility list
  • Intel Core I7-950 processor
  • 12GB of Crucial 3x4GB DDR3 RAM
  • Hard drives: 2 Hitachi 7200 rpm SATA 2TB drives, configured as RAID1 via Linux software RAID.
  • Antec Two Hundred V2 gamer case to handle swapping OS's via the front 2 1/2" drive port.
  • Various 2 1/2" drives to hold the Linux OS's that I was experimenting with
Software:
  • OpenSUSE 11.3, *stock*.
  • Windows 7, *stock*.
By adding the PCI card, my Linux console remains my Linux console, and Xen properly starts up my Windows DomU. My configuration is now complete. I may extend my Windows LVM volume to 200GB so I can install more games on it, but note that all of my personal files, ISO's, etc. live on Linux. Note that 5.9 is what the Windows Performance Index should be for that particular hard drive combo, so this Windows system is as good as most mid-range gaming systems performance-wise. I added the paravirtualization drivers for the networking and disk controller, but they didn't improve performance any -- all they did was reduce how much CPU the dom0 qemu was expending implementing the virtual disk and network controllers. Given that I have a surplus of CPU on this system (8 threads, 3.2GHz), it's in retrospect no surprise that I saw no performance gain on disk and network from going paravirtual -- all I did was free up more CPU for things like, say, video encoding.

Thoughts and conclusions:

One thing that was very clear through this entire process is that I'm very much pushing beyond the state of the art here. The software and hardware configurations needed for this to work were very twiddly -- there is exactly one (1) Linux distribution (OpenSUSE 11.3) which will do it at this point in time, and there were no GUI tools for OpenSUSE 11.3 which would create a Xen virtual machine with the proper PCI devices. Furthermore, the experimental Xen 4.0.1 software on OpenSUSE is almost entirely undocumented -- or, rather, it has man pages provided with it, but the man pages document an earlier version of Xen which is significantly different from what's actually shipped with OpenSUSE 11.3.

From a general virtualization perspective, comparing Xen, KVM, and ESXi: Xen currently wins on capabilities, but only by a hair, and those capabilities are almost totally undocumented -- or worse yet, don't work the way the documentation says they work. Xen's only fundamental technological advantage over KVM and ESXi right now is its ability to run paravirtualized Linux distributions without needing the VT-x and VT-d extensions -- a capability which is important for ISP's with tens of thousands of older servers without these extensions, but becoming increasingly less important as VT-x is now everywhere except in the low-end Atom processors. Comparing my Xen installation at home with my KVM installation at work, both of which I have now used extensively and pushed to their limits, I can see why Red Hat is pushing the KVM merger of hypervisor and operating system. KVM gives you significantly greater ability to monitor the overall performance of your system, vs. Xen where 'xm top' is a poor substitute for detailed monitoring. KVM is also significantly better at resource management, since the same resource manager handles everything (core hypervisor/dom0 plus VM's), and the Linux scheduler can consider everything when deciding what to schedule, rather than having the Xen hypervisor out in the background making decisions about which Xen domain to schedule next based upon very little information.

In short, my general conclusion is that KVM is the future of Linux virtualization. Unfortunately my experience with both KVM and Xen 4.0 is that both are somewhat immature compared to VMware's ESX and ESXi products. They are difficult to manage, their documentation is persistently out of date and often incorrect, and both have a bad tendency to crash cryptically when doing things that they're supposed to be able to do. Their core functionality works well -- I've been running Internet services on Xen domains for over five years now and for that problem domain it is bullet-proof, while at work I am developing for several different variants of Linux using KVM virtual machines on Fedora 14 as well as running a Windows VM to handle the VSphere management tools, and it's been bullet-proof. But they decidedly are not as polished as VMware at this point, other than Citrix's XenServer, which lacks the PCI passthrough capability of ESXi and thus was not useful for the projects I was considering.

My take on this, however, is that VMware's time as head of the virtualization pack is going to be short. There isn't much more that they can add to their platform that the KVM and Xen people aren't already working on. Indeed, the graphics passthrough capability of Xen is already beyond where VMware is. At some point VMware is going to find themselves in the same position vs. open source virtualization that SGI and Sun found themselves in vs. open source POSIX. You'll note that SGI and Sun are no longer in business...

-ELG

Sunday, November 14, 2010

Pushing a graphics card into a VM, part 4

Part 1 Part 2 Part 3 Part 4 Part 5

Okay, so virt-manager did pick up my new VM once I created it with xm create on a config file, but when I rebooted the system the VM was gone. So how can I fix this? Well, by taking advantage of functionality that OpenSUSE has had for auto-starting Xen virtual machines all along: Just move my config file into /etc/xen/auto and it'll auto start (and auto shutdown, if I have the xen tools installed) at system boot.

Of course, that requires a config file. Rather than paste it here, I'll let you view the config file as a text file. Note that 'gfx_passthru=1' is commented out. The Xen documentation says I need it, but if I put it there, my VM doesn't start up -- it crashes into the QEMU monitor. I also ran into another issue, a timing issue: pciback was grabbing the console away from Linux and leaving the video card half-initialized, and when Xen grabbed the video card and shoved it into the VM, the video card locked up the system solid when Windows tried to write to it. My solution to that was even simpler -- put the older of the nVidia cards back into the system, and load the 'nouveau' driver using YaST's System > Kernel > INITRD_MODULES and System > Kernel > MODULES_LOADED_ON_BOOT functionality. This flips the console away from the ATI card early enough that it doesn't conflict with Xen giving the video card to Windows. This also gives me a Linux console on the nVidia card that I can switch to by plugging a second keyboard into the front USB on my chassis (the one I did *not* push into the Windows VM) and flipping my monitor to its DVI input (rather than the HDMI coming from the ATI card).
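
In case that linked file ever goes away, the shape of such a config is roughly the following -- a sketch only, with hypothetical memory/vcpu/disk values; the pci list is the same set of devices hidden via pciback in Part 3:

  name = "win7"
  builder = "hvm"
  memory = 4096
  vcpus = 4
  disk = [ 'phy:/dev/virtgroup/win7,hda,w' ]
  pci = [ '02:00.0', '02:00.1', '00:1a.0', '00:1a.1', '00:1a.2', '00:1a.7', '00:1b.0' ]
  # gfx_passthru=1 -- documented as required, but it crashes this setup into the QEMU monitor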

With all of this done, I can now reboot my system and get Windows on video card 0, and Linux on video card 1. I suppose I could reverse the video cards (to give the boot video card to Linux), but unfortunately my board puts the second 16-lane PCIe slot too close to the bottom of the case, and a double-width PCIe card won't fit there. Maybe when I upgrade to one of those spiffy SuperMicro server motherboards with the IPMI and such, at which point I won't need a second video card anyhow because the on-board video will suffice for Linux...

Next up in Part 5: Thoughts and conclusions.

Saturday, November 13, 2010

Pushing a graphics card into a VM, Part 3

Part 1 Part 2 Part 3 Part 4 Part 5

OpenSUSE 11.3 was a quite easy install. I haven't used SUSE since the early 'oughts, but first impressions were pretty good. OpenSUSE 11.3 is KDE-based, which is a change from the other distributions I've been using for the past few years -- Ubuntu on my server at home, Debian on my web and email server, and various Red Hat derivatives at work -- and seems to be pretty well put together. The latest incarnation of YaST makes it more easily manageable from the command line over a slow network connection than the latest Ubuntu or Red Hat, which rely on GUI tools at the desktop. The biggest difference between Red Hat and SUSE is that SUSE uses a different package dependency manager, "zypper", which is roughly equivalent to Red Hat's "yum" and Debian's "apt-get" but with its own quirks. It appears to be slightly faster than "yum" and roughly the same speed as "apt-get". If you wonder why SUSE/Novell wrote "zypper": at the time they wrote it, "yum" was excruciatingly slow and utterly unusable unless you had the patience of Job. Red Hat has sped up "yum" significantly since that time, but SUSE has stuck with "zypper" nevertheless.

I also set up the bridging VLAN configuration that I mentioned in my previous post about how to do it on Fedora. Again SUSE has slightly different syntax than Red Hat for how to do this in /etc/sysconfig/network/* (note: *not* network-scripts), but again it was fairly easy to figure out by reading the ifup / ifdown scripts and consulting SUSE's documentation.

So anyhow, I installed the virtualization environment via YaST and rebooted into the Xen kernel it downloaded. At that point I created a "win7" LVM volume in LVM volume group "virtgroup" on my 2TB RAID array, went into the Red Hat "virt-manager" and attached to my Xen domain, then told it to use that LVM volume as the "C" drive and installed Windows 7 on it. I'm using an LVM volume because at work with KVM, I find that this gives significantly better disk I/O performance in my virtual machines than pointing the virtual disk drive at a file on a filesystem. Since both Xen and KVM use QEMU to provide the virtual disk drive to the VM, I figured that the same issue would apply to Xen, and adopted the same solution that I adopted at work -- just point it at an LVM volume, already. (More on that later, maybe.)

Okay, so now I have Windows 7 64-bit installed and running, so I shut it down and went to attach PCI devices to it via virt-manager and... err. No. Virt-manager wouldn't do it. Red Hat strikes again: it claims that Xen can't do PCI passthrough! So I went back to the handy Xen wiki and started figuring out, via trial and error, how to use the "xm" command line, where the 'man' page for xm doesn't in any way reflect the actual function of the program that you see when you type 'xm help'. So here we go...

First, claw back the physical devices you're going to use via booting with them attached to pciback. So my module line for the xen.gz kernel in /boot/grub/menu.lst looks like...

module /vmlinuz-2.6.34.7-0.5-xen root=/dev/disk/by-id/ata-WDC_WD5000BEVT-22ZAT0_WD-WXN109SE2104-part2 resume=/dev/datagroup/swapvol splash=silent showopts pciback.hide=(02:00.0)(02:00.1)(00:1a.0)(00:1a.1)(00:1a.2)(00:1a.7)(00:1b.0)

Note that while XenSource has renamed pciback to 'xen-pciback', OpenSUSE renames it back to 'pciback' for backward compatibility with older versions of Xen. So anyhow, on my system, this hides the ATI card and its sound card component, and the USB controller to which the mouse and keyboard are attached. I leave the other USB controller attached to Linux. I did not have any luck pushing USB devices directly to the VM; I had to push the entire controller instead. Apparently the Xen version of QEMU shipped with OpenSUSE 11.3 doesn't implement USB passthrough (or else I simply need to read the source). Note that you want to make sure your system boots *without* the pciback.hide before you boot *with* it, because once the kernel starts booting and sees those lines, your keyboard, mouse, and video go away!
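
A quick sanity check after rebooting with the hide in place (my own habit, not strictly required): list what pciback has actually claimed, and make sure every device you named shows up there.

  • ls /sys/bus/pci/drivers/pciback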

Okay, so now I'm booted. I ssh into the system via the network port (err, make sure that's set up before you boot with the pciback.hide too!) and go into virt-manager (via X11 displaying back to my Macbook, again make sure you have some way of displaying X11 remotely before you start this) and start up the VM. At that point I can do:

  • xm list
and see my domain running, as well as log into Windows via virt-manager. So next, I attach my devices...

  • xm pci-attach win7 0000:02:00.0
  • xm pci-attach win7 0000:02:00.1
  • xm pci-attach win7 0000:00:1a.0
  • xm pci-attach win7 0000:00:1a.1
  • xm pci-attach win7 0000:00:1a.2
  • xm pci-attach win7 0000:00:1a.7
  • xm pci-attach win7 0000:00:1b.0
Windows detects the devices, loads drivers, and prompts me to reboot to activate. So I tell Windows to reboot, and it comes back up, but nothing's showing up on my real (as vs. virtual) video screen. Then I go into Device Manager in Windows to see what happened. The two USB devices (keyboard and mouse) show up just fine. But the ATI video card shows up with an error. I look at what Windows tells me about the video card, and Windows tells me that there is a resource conflict with another video card -- the virtual video card provided by QEMU. So I disable the QEMU video card, reboot, and... SUCCESS! I now have Windows 7 on my main console with video and keyboard and mouse!

Windows Experience reports:

  • Calculations per second: 7.6
  • Memory: 7.8
  • Graphics: 7.2
  • Gaming graphics: 7.2
  • Primary hard disk: 5.9
Those are pretty good, quite sufficient for gaming, except for the disk performance which is mediocre because we're going through the QEMU-emulated hard drive adapter rather than a paravirtualized adapter. When doing network I/O to download Civilization V via Steam I also notice mediocre performance (and high CPU utilization on the dom0 host) for the same reason. We'll fix that later. But for playing games, we're set! Civilization V looks great on a modern videocard on a 1080P monitor with a fast CPU!

Okay, so now I have a one-off boot, but I want this to come up into Windows every time my server boots. I don't want to have to muck around with a remote shell and such every time I want to play Windows games on my vastly over-powered Linux server (let's face it, a Core i7-950 with 12GB of memory is somewhat undertasked pushing out AFP shares to a couple of laptops). And that, friends, is where part 4 comes in. But we'll talk about that tomorrow.

-ELG

Pushing a video card into a VM, Part 2

Part 1 Part 2 Part 3 Part 4 Part 5

The first issue I ran into was that my hardware was inadequate to the task. My old Core 2 Duo setup lacked VT-d support. So I went to the Xen compatibility list, found a motherboard which supported VT-d, and upgraded. At the same time I also upgraded my case to an Antec case that has a slot on the front for plugging in 2 1/2 inch drives. This was to make it easier to swap operating systems. Theoretically you can hot-swap, but I've not tested that and don't plan to.

Since I am inherently a lazy penguin (much like Larry Wall), the next thing I did was try the "virtualization environments". I found that XenServer was a very well designed environment for virtualizing systems in the cloud. Unfortunately it was also running a release 3.x version of Xen rather than the new 4.0 release, and did not implement PCI passthrough or USB passthrough to fully virtualized VM's natively. There were hacks you could do, but once you start doing hacks, the XenServer environment is not really a nice place to do them. So I moved on.

ProxMox VE is a somewhat oversimplified front-end to KVM and OpenVZ. It looks like a nice environment for running a web farm via web browser, but unfortunately it does not support PCI passthrough natively either. Again, you can start hacking on it, but again once you start doing that you might as well go to a non-dedicated environment.

Ubuntu 10.10 with KVM was my next bet. I *almost* got it running, but the VM wouldn't attach the graphics card. It turns out that was another issue altogether, but looking at the versions of QEMU and KVM provided, it appeared that Fedora 14 had one version newer (as you'd expect, since Fedora 14 came out almost a month later), so I went to Fedora 14 instead.

I got close -- really close -- with Fedora 14. But two different video cards -- an old nVidia 7800GT and a new nVidia GTS450 -- both ended up with error messages in the libvirtd logs saying there was an interrupt conflict that prevented attaching the PCI device. I ranted to a co-worker, "I thought MSI was supposed to solve that!" So I looked at enabling MSI on these nVidia cards and found out that... err... no. Not a good idea even if I wanted to; the cards generally crashed things hard if you tried. So I went back to the XenSource.com wiki on VGA passthrough again and followed the link to the list of video cards, and... err, okay, an ATI Radeon 5750 has been reported as running with Xen's VGA passthrough.

So, I swapped that out, and tried again with Fedora 14. This time the KVM module crashed with a kernel oops.

At this point I'm thinking, otay, KVM doesn't seem to want to do this. Xen, on the other hand, has a wiki and all documenting how to do this. So let's use Xen instead of KVM. The problem is that Xen is an operating system: it relies on having a special paravirtualized kernel for its "dom0" that handles the actual I/O driver work. Red Hat claims providing such a kernel would be too much work and that they won't do it until the dom0 patches are rolled into upstream by Linus. This despite the fact that Red Hat has patched their kernels to the point where Linus would barely recognize them if someone plunked the source onto his disk -- but it's that whole Not Invented Here thingy again: Red Hat invented KVM and was looking for an excuse not to include a Xen dom0 kernel, and there you go. I looked at downloading a dom0 kernel for Fedora 14, but then... hmm. Look. OpenSUSE 11.3 *comes* with a Xen dom0 kernel. So let's just install OpenSUSE 11.3.
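One handy sanity check once you've booted what you hope is a dom0 kernel: Xen exposes its capabilities through /proc, so you can tell a real dom0 from a bare kernel at a glance:

    # Prints "control_d" in a dom0; the file doesn't exist on a non-Xen kernel.
    cat /proc/xen/capabilities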

OpenSUSE 11.3 is what I eventually had success with. But to do that, I ended up having to fight Red Hat -- again. But more on that in Part 3.

-ELG

Pushing a graphics card into a Xen VM, Part 1

Part 1 Part 2 Part 3 Part 4 Part 5

One of the eternal bummers for Linux fanboys is the paucity of games for Linux. This is, in part, because Linux is not an operating system; Linux is a toolkit for building operating systems -- and each operating system built with the Linux toolkit is different, but all of them claim to be "Linux". Well, from a game designer's perspective there is no such thing as "Linux": each of the variants puts files in different places, each of the variants has a different way of configuring X11, and so forth. And speaking of X11, that's another issue. Mark Shuttleworth got a lot of heat for saying that desktop Linux was never going to be competitive as long as it was saddled with the decades of fail that are X11, when he proposed moving Ubuntu Linux to Wayland. But the only Unix variant that has ever gotten any traction on the desktop -- Mac OS X -- did so by abandoning X11 and going to its own lighter-weight GUI library, which forced a common interface upon all programs that ran on the platform (except for ported X11 programs, which were made deliberately ugly by the Mac OS X11 server that ran on top of the native UI in order to encourage people to port them to the native UI). Linux fanboys might argue that OpenGL over X11 is theoretically capable of handling gaming demands, etc. etc., but the proof is in the pudding: if it's so easy, why isn't anybody doing it?

So anyhow, one of the interesting things about the Intransa Video Appliance is that it looks like Windows if you sit down at the console... but behind the scenes, it's actually VMware on top of a Linux-based storage system. So why not, I wondered, just push the entire video subsystem into Windows via VT-d? I mean, it's not as if Linux user interfaces run any slower remotely displayed over VNC than they do locally, they're pretty light-weight by modern standards. So if you could push the display, keyboard, and mouse into a Windows virtual machine that was started up pretty much as soon as enough of Linux was up and going to support it, you could have a decently fast gaming machine, *and* have a good Linux development and virtualization server -- all on the same box.
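For the curious, the end state I was aiming for amounts to only a few lines in a Xen HVM guest config. A sketch -- the config file name and PCI address are placeholders for the actual card:

    # /etc/xen/windows.cfg (hypothetical excerpt)
    pci          = [ '01:00.0' ]   # hand the video card to the guest
    gfx_passthru = 1               # use it as the guest's primary VGA device
    usbdevice    = 'tablet'        # absolute-position pointer for the console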

So, I assembled my selection of operating systems -- the bleeding edge of Linux: Ubuntu 10.10, Fedora 14, Citrix XenServer 5.6.0, ProxMox VE version 1.6, and OpenSUSE 11.3 -- and set to work seeing what I could do with them...

Next up: Part 2: The distributions.

-ELG

Sunday, October 17, 2010

Opaque Linux

One of the things that is starting to annoy me is the increasing cluttering of Linux with opaque subsystems that have annoying bugs which are difficult to diagnose. Some are easy enough to work around -- udev's persistent-net rules, for example, might leave replicated virtual machines with no working network devices, but it's easy enough to simply remove the file (and fix the /etc/sysconfig/network-scripts files to remove any hardwired MAC address there too, of course, since using ovftool to push a virtual machine to ESX automatically gives it a new MAC address). But others -- like the NetworkManager subsystem that I mentioned last week -- either work or don't work for you. They're opaque black boxes that are pretty much impossible to work around, other than by completely disabling them.
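The workaround for the udev case, roughly, on a freshly cloned Red Hat-style guest (the interface name is a placeholder):

    # Let udev regenerate the rules for the new MAC address on next boot.
    rm -f /etc/udev/rules.d/70-persistent-net.rules
    # Drop the MAC address that was baked in at install time.
    sed -i '/^HWADDR=/d' /etc/sysconfig/network-scripts/ifcfg-eth0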

I just finished rebuilding my system with the latest goodies to do virtualization -- I now have a quad-core processor with 12 gigs of memory and VT-d support. As part of that I just upgraded to Ubuntu 10.10, their latest and greatest. I've been using Fedora 13, Red Hat's latest and greatest, at work for the past month. My overall conclusion is... erm. Well. Fedora has its issues, but Ubuntu is getting to the point of utter opaqueness. For example, Ubuntu has a grand new grub system that generates elaborate boot menus. The only problem: said elaborate boot menus are *wrong* for my system -- they all say (hd2) where they should say (hd0). And the system that generates these elaborate boot menus is entirely opaque... though at least I can go into the very elaborate grub.cfg file and manually edit it. Well, unless a software update has happened, at which point the elaborate boot menu generator subsystem runs again and whacks all your hd0s back to hd2s... even in entries that are old. Red Hat's grubby might be old and creaky, but at least it's never done *that* kind of silliness.
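A less fragile approach than hand-editing grub.cfg -- in principle, at least -- is to pin the drive mapping that the menu generator consults and then regenerate the menu. A sketch, assuming the actual boot disk is /dev/sda:

    # Tell grub2's generator which disk is (hd0)...
    echo '(hd0) /dev/sda' | sudo tee /boot/grub/device.map
    # ...then rebuild the menu so every entry picks up the mapping.
    sudo update-grub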

Now, granted, I am running rather unusual hardware in a far different configuration from what a desktop Linux user would want. If you want a desktop Linux system, I still believe Ubuntu is the best Linux you can put onto your system; I've run Ubuntu on the desktop for years and it serves well there. I especially like Ubuntu 10.10's ability to transparently encrypt your home directory, similar to the way MacOS can; this will resolve a lot of issues with lost laptops and stolen data, and while it is an opaque subsystem, it is necessarily so. You can also put the proprietary Nvidia video drivers onto your system with a simple menu item, while with Fedora 13 you have to fight to put the proprietary driver onto the system (the GPL driver is loaded *in the initrd*, which makes it a PITA to get rid of). In short, if I were running Linux on the desktop, 10.10 would be hard to beat. But for my purposes, doing virtualization research with KVM and VT-d, I'm wiping out Ubuntu in the morning and installing Fedora 13.

-ELG

Tuesday, June 22, 2010

The new alternative to VMware: KVM

Both the latest Ubuntu and the latest Red Hat are shipping with a new alternative to VMware Server called QEMU-KVM. I've been playing with it, and it is much faster and lighter weight than VMware Server, as well as being more flexible and easier to use.

To get started with QEMU-KVM on Ubuntu 10.04, first install kvm and qemu-kvm from aptitude. Then install virt-manager. After that, System Tools->Virtual Machine Manager will bring up your virtual machine management console.
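On Ubuntu 10.04 the installation amounts to something like this (package names as of that release):

    # Install the hypervisor bits and the management GUI.
    sudo aptitude install qemu-kvm libvirt-bin virt-manager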

You'll see two entries when you do this:

  • localhost (QEMU Usermode) - Not Connected
  • localhost (QEMU)
Double-click on localhost (QEMU) and it'll connect to the local root virtual machine manager. You could also connect to other machines' managers -- if you want to, say, manage the virtual machines on a host in your data center -- by using File->Add Connection. Now you'll probably want to set up a data pool for use by your new virtual machines. Most of us put the virtual machines on their own partition, not on the root partition, but the default data pool is in /var/lib/libvirt/images -- which is on the root partition. Ick. Never fear: right-click on the localhost (QEMU) entry and select 'Details', then click on the 'Storage' tab. Click "+" to add your new storage pool; once you define its location, click the green 'play' button to make it active, then hit the red delete button to get rid of the 'default' pool. You now have a new default storage pool at the location you desire.
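The same pool setup can be done from the command line with virsh; a sketch, where the pool name and target path are placeholders:

    # Define, start, and auto-start a directory-backed storage pool.
    virsh pool-define-as vmpool dir --target /data/vm
    virsh pool-start vmpool
    virsh pool-autostart vmpool
    # Optionally retire the old default pool on the root partition.
    virsh pool-destroy default
    virsh pool-undefine default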

Okay, so you have your data pool; now what about creating a virtual machine? The easiest way to do that is to use an ISO image of your favorite distribution. Just right-click on the localhost (QEMU) entry again and select 'New'. The resulting wizard is ridiculously easy to navigate as long as you remember that it's going to create the disk in whatever your enabled data pool is when you tell it to 'create a disk image on the computer's hard drive'.
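If you'd rather skip the wizard, virt-install does the same job from the command line; a sketch with hypothetical names and paths:

    # Create an 8GB guest from an ISO, with its disk carved out of the pool.
    virt-install --connect qemu:///system --name testvm --ram 1024 \
        --disk pool=vmpool,size=8 --cdrom /isos/ubuntu-10.04.iso --vnc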

So, after this you should be able to run the virtual machine and install your ISO on it. Remember that ctrl-alt gets you out of the QEMU console back into the regular Linux desktop environment, and you'll be fine. To open a console, just right-click and select 'Open'. Or once you have a VM set up and installed, you can shut it down and start it again from that same right-click menu.

Okay, so what are the limits of QEMU/KVM right now? First of all, don't expect to run graphical environments via the normal console with any kind of responsiveness. It emulates a very slow/old display card, which is then screen-scraped by a VNC server. KVM is mostly useful for running non-GUI setups, such as Asterisk servers or hosted virtual web servers. Secondly, some operating systems might not install at all into KVM due to driver support issues. Finally, there is no equivalent of "VMware Tools" to integrate with your host environment, so you can't move your mouse freely between the virtual machine terminal and the host OS. Your best bet, if you want a graphical console inside a virtual machine, is to install VNC in the virtual machine and then use VNC to view your graphical console.

But aside from those limitations, KVM appears to be working quite well. It is definitely better on Linux than VMware Server, and if you need to create a vmdk to import into VMware on some other non-Linux host, it's easy enough to just 'qemu-img convert -O vmdk VbAst32.img VbAst32.vmdk' and voila, the new virtual machine will import cleanly into VMware. And of course VPEP runs inside a KVM virtual machine just fine... :).
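That conversion really is a one-liner in each direction; the filenames here are just the ones from my own test VM:

    # Raw KVM disk image -> VMware vmdk...
    qemu-img convert -O vmdk VbAst32.img VbAst32.vmdk
    # ...and a vmdk back to a raw image, should you need to go the other way.
    qemu-img convert -O raw VbAst32.vmdk VbAst32.img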

-ELG

Monday, March 22, 2010

About work...

One thing you'll notice, reading this blog, is that I haven't blogged about anything happening at work. There's a reason for that: It is, in general, a bad idea. If an employer believes a post puts the company in a bad light or simply decides that you have leaked proprietary information without permission, it's a great way to get fired -- dozens of bloggers have been fired over the past decade for posting about things that happened at work.

So anyhow, who I work for is no secret -- you can click on my LinkedIn profile and see -- but now I will be blogging about things I'm doing at work on my employer's own group blog. My first posts are up. You might recognize one of them as a revised version of one of the posts on this blog, except now I can say what I could only hint at then :).

-ELG

Friday, November 6, 2009

Parallels 5 vs. VMware Fusion 3

So I have tried both of these virtualization solutions for MacOS Snow Leopard and the winner is... VMware by a landslide. Not because of performance. VMware's performance is acceptable for my purposes but I can definitely tell that I'm running in a virtualized environment. But, rather, because VMware WORKS, and Parallels doesn't. That's the bottom line. I can go into more detail, but I'm just too frustrated with Parallels right now and would use language not appropriate for polite conversation. Having Parallels crash my computer *TWICE*, and lock up three different times, simply does not make me happy.

I am saddened to say this, because I've owned Parallels since version 2.0, but this is it. This is the end. They are not getting any more money from me. With each new release of Parallels, they promise that they got it right this time. Each time, they break things badly -- for example, in Parallels 4, one of my mapping programs ended up going BLAMMO unless I turned off mouse pointer acceleration in the Windows control panel, and then the Parallels device driver simply refused to display any mouse pointer at all. Meanwhile VMware Fusion 3 is a rock-solid product. It might be slightly slower than Parallels on some benchmarks (hard to tell -- I could never keep Parallels running long enough to run the benchmarks I wanted to run), but it *works*, and the integration between Windows and MacOS Snow Leopard is quite good: no problems with cut-and-paste or sharing files between Windows and MacOS or anything like that. The competition between VMware and Parallels is over, and Parallels is done. Finished. Kaput. They had first-mover advantage and, like Netscape with web browsers, simply failed to execute.

Which reminds me of the time when my manager was the guy who had run Netscape's development process into the dirt. Needless to say, the common Linux fanboy notion that Microsoft ran Netscape out of business is utter nonsense -- Netscape's browser technology disintegrated without any help from Microsoft at all, under the weight of too many idiotic false deadlines and hacks, and the manager who presided over that then did the same thing to my then-employer's development process. But that's another ugly tale that tends to evoke unwise language, so instead I'll write something a bit more abstract about deadlines and why they're both useful and, in some cases, toxic.

-ELG

Numbers from Windows Experience quickie benchmark:

  • VMware 3:
    • Processor: 5.9
    • Memory: 3.9
    • Graphics: 2.9
    • Gaming graphics: 3.4
    • Primary hard disk: 6.3
  • Parallels 5:
    • Processor: 4.5
    • Memory: 3.9
    • Graphics: 2.9
    • Gaming graphics: 4.1
    • Primary hard disk: 5.9
Parallels has somewhat better 3D performance, somewhat poorer scores on the processor and hard disk tests, and matches VMware elsewhere. Parallels is probably better if you want to play games, but that's why Boot Camp was invented...