About Random Numbers and Virtual Machines
11 Jan 2013
Several applications need random numbers to operate correctly and securely. When an SSH server is installed on a system, public/private key pairs are generated, and this requires random numbers. The same goes for creating a GPG key pair. Initial TCP sequence numbers are randomized, and process IDs (PIDs) can be randomized too. Without such randomization, TCP sequence numbers or PIDs would be predictable, making it easy for attackers to break into servers or desktops.
On a system without any special hardware, Linux seeds its entropy pool from sources like keyboard and mouse input, disk IO, network IO, and any other sources whose kernel modules indicate they are capable of adding to the kernel's entropy pool (i.e., the interrupts they receive come from sufficiently non-deterministic sources). Servers rarely see keyboard or mouse input (most don't even have a keyboard or mouse connected). This makes getting true random numbers difficult: applications requesting random numbers from /dev/random may have to wait for indefinite periods to get the randomness they need (for example, when creating SSH host keys, typically during firstboot).
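To see how much entropy the kernel believes it has collected, Linux exposes its current estimate (in bits) in /proc/sys/kernel/random/entropy_avail. The sketch below reads that file; the helper names `parse_entropy` and `entropy_estimate` are hypothetical, and the code simply returns None on systems where the file is missing.

```python
from pathlib import Path
from typing import Optional

# Linux publishes the kernel's entropy estimate (in bits) in this procfs file.
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def parse_entropy(text: str) -> int:
    """Hypothetical helper: convert the file's contents (e.g. '256\\n') to an int."""
    return int(text.strip())


def entropy_estimate(path: Path = ENTROPY_AVAIL) -> Optional[int]:
    """Hypothetical helper: return the entropy estimate in bits, or None if unreadable."""
    try:
        return parse_entropy(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    bits = entropy_estimate()
    if bits is None:
        print("entropy_avail not available (not a Linux system?)")
    else:
        print(f"kernel entropy estimate: {bits} bits")
```

On an idle server or a freshly booted VM this number tends to stay low, which is exactly when reads from /dev/random block.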
Applications that need random numbers immediately, but can make do with slightly lower-quality randomness, can read from /dev/urandom instead, which never blocks. The catch is that there is no guarantee the numbers read from /dev/urandom reflect true randomness. Indiscriminate reading of /dev/urandom also reduces the system's entropy levels and can starve applications that need true random numbers. Randomness is a scarce resource on a system, so applications should fetch it only when it is needed, and read only as many bytes as they need.
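As a minimal sketch of "read only as many bytes as needed", the hypothetical helper below pulls exactly `nbytes` from /dev/urandom. The file is opened unbuffered because Python's default buffered reader would otherwise read ahead a whole buffer (several KiB) even when only 16 bytes are requested. In everyday Python code, `os.urandom(n)` does the same job.

```python
def random_token(nbytes: int = 16) -> str:
    """Hypothetical helper: read exactly `nbytes` from /dev/urandom, hex-encoded."""
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    # buffering=0 gives a raw file object, so no extra bytes are read ahead.
    with open("/dev/urandom", "rb", buffering=0) as f:
        data = f.read(nbytes)
    if len(data) != nbytes:
        raise OSError("short read from /dev/urandom")
    return data.hex()


if __name__ == "__main__":
    # 16 random bytes, e.g. for a session identifier.
    print(random_token(16))
```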
There are a few hardware random number generator devices that can be plugged into computers. These can be PCI or USB devices, and are fairly popular add-ons for servers. The Linux kernel has a hwrng (hardware random number generator) abstraction layer that selects an active hwrng device among the several that might be present, and asks that device for random data when the kernel's entropy pool falls below the low watermark. The rng-tools package provides rngd, a daemon that reads input from hwrngs and feeds it into the kernel's entropy pool.
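The hwrng layer can be inspected from userspace: when the kernel's hw_random core is loaded, /sys/class/misc/hw_random/rng_available lists the registered devices (space-separated) and rng_current names the active one, while the selected device's output is readable from /dev/hwrng, which is what rngd consumes. The sketch below reads those two sysfs files; the helper names are hypothetical, and the code returns None when the interface is absent.

```python
from pathlib import Path
from typing import List, Optional

# sysfs directory of the kernel's hw_random core (present only when it is loaded).
HWRNG_SYSFS = Path("/sys/class/misc/hw_random")


def parse_rng_list(text: str) -> List[str]:
    """Hypothetical helper: rng_available holds device names separated by spaces."""
    return text.split()


def hwrng_status(sysfs: Path = HWRNG_SYSFS) -> Optional[dict]:
    """Hypothetical helper: report available and currently selected hwrng devices."""
    try:
        available = parse_rng_list((sysfs / "rng_available").read_text())
        current = (sysfs / "rng_current").read_text().strip()
    except OSError:
        return None
    # The kernel reports "none" when no hwrng is selected.
    return {"available": available, "current": None if current == "none" else current}


if __name__ == "__main__":
    print(hwrng_status() or "no hw_random interface found")
```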
Virtual machines are similar to servers in this respect: there is very little going on in a VM's environment from which the guest kernel can source random data. A host running several VMs may see a lot of disk and network IO as a result of all its guests, but a single VM may not do enough on its own to generate sufficient entropy for its applications. One solution to sourcing random numbers in VMs, therefore, is to ask the host for a portion of the randomness it has collected and feed that into the guest's entropy pool. A paravirtualized hardware random number generator exists for KVM VMs. The device is called virtio-rng and, as the name suggests, it sits on top of the virtio paravirtualization (PV) framework. The Linux kernel gained support for virtio-rng devices in kernel 2.6.26 (released in 2008). The QEMU-side device was added in the recent 1.3 release.
On the host side, the virtio-rng device reads from the host's /dev/random by default and feeds that data into the guest. The source can be changed, of course. If the host lacks a hwrng, /dev/random is the best source to use; if the host does have a hwrng, using input from that device is recommended.
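In QEMU, the source is chosen through an `rng-random` backend object, which the guest-visible `virtio-rng-pci` device then references. The sketch below builds those command-line arguments; the helper name `virtio_rng_args` and the backend id `rng0` are hypothetical, and the rest of the QEMU command line (machine, disks, and so on) is left out.

```python
from typing import List


def virtio_rng_args(source: str = "/dev/random", backend_id: str = "rng0") -> List[str]:
    """Hypothetical helper: QEMU args attaching a virtio-rng device fed from `source`."""
    return [
        # Host-side backend that reads random bytes from the given file.
        "-object", f"rng-random,id={backend_id},filename={source}",
        # Guest-visible PCI device wired to that backend.
        "-device", f"virtio-rng-pci,rng={backend_id}",
    ]


if __name__ == "__main__":
    # Default: feed the guest from the host's /dev/random.
    print(" ".join(virtio_rng_args()))
    # Host with a hardware RNG: use its device node instead.
    print(" ".join(virtio_rng_args("/dev/hwrng")))
```

Keeping the backend separate from the device is what makes the source swappable without changing anything in the guest.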
Newer Intel architectures (Ivy Bridge onwards) have an instruction, RDRAND, that provides random numbers. This instruction can be exposed directly to guests. Guests probe for the presence of this instruction (using CPUID) and use it if available; this requires no modification to the guest. However, exposing this instruction to guests has one drawback: live migration. If not all hosts in a server farm have the same CPU, live-migrating a guest from a host that exposes this instruction to one that doesn't will not work. In this case, virtio-rng on the host can be configured to use RDRAND as its source, and the guest can continue to work as in the previous example. This is still sub-optimal, as we'd be passing random numbers to the guest (as in the case of /dev/random) instead of real entropy. The RDSEED instruction, to be introduced later (Broadwell onwards), will provide entropy that can be safely passed on to a guest via virtio-rng as a source of true random entropy, eliminating the need for a physical hardware random number generator device.
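A guest can check which of these instructions its virtual CPU exposes without executing CPUID itself: on x86 Linux, the kernel lists the `rdrand` and `rdseed` flags in /proc/cpuinfo when CPUID reports them. The sketch below parses that file; the helper names are hypothetical, and it reports the kernel's view of the CPU rather than probing the instructions directly.

```python
from pathlib import Path
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Hypothetical helper: collect feature flags from /proc/cpuinfo-style text (x86)."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def rng_instructions(cpuinfo_path: Path = Path("/proc/cpuinfo")) -> dict:
    """Hypothetical helper: report whether RDRAND and RDSEED are advertised."""
    try:
        flags = cpu_flags(cpuinfo_path.read_text())
    except OSError:
        flags = set()
    return {"rdrand": "rdrand" in flags, "rdseed": "rdseed" in flags}


if __name__ == "__main__":
    print(rng_instructions())
```

Running this on two hosts before a live migration is one quick way to spot the mismatch described above.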
It looks like QEMU/KVM is the only hypervisor that supports exposing a hardware random number generator to guests. (One could pass a real hwrng through to a guest, but that doesn't scale and isn't practical in all situations, e.g. live migration.) Fedora 19 will ship QEMU 1.4, which includes the virtio-rng device, and even older guests running on top of F19 will be able to use the device.
For more information on virtio-rng, see the QEMU feature page, and the Fedora feature page. LWN.net has an excellent article on random numbers, based on H. Peter Anvin's talk at LinuxCon EU 2012.
Updated 2013 May 22: Added info about RDSEED and the Fedora feature page, corrected a few typos.
Update 2 2015 Mar 09: More information about ongoing improvements to this feature in a new post.