Tuesday, October 06, 2009

"The Silicon Age: Virtual I/O
Since 2005, VMware and Xen have gradually reduced the performance overheads of virtualization, aided by the Moore’s law doubling in transistor count, which inexorably shrinks overheads over time. AMD’s Rapid Virtualization Indexing (RVI – 2007) and Intel’s Extended Page Tables (EPT – 2009) substantially improved performance for a class of recalcitrant workloads by offloading the mapping of machine-level pages to Guest OS “physical” memory pages, from software to silicon. In the case of operations that stress the MMU—like an Apache compile with lots of short lived processes and intensive memory access—performance doubled with RVI/EPT. (Xen showed similar challenges prior to RVI/EPT on compilation benchmarks.)
Some of the other performance advances have included interrupt coalescing, IPv6 TCP segmentation offloading and NAPI support in the new VMware vmxnet3 driver. However, the last year has also seen two big advances: direct device mapping, enabled by this generation of CPU’s (e.g. Intel VT-D first described back in 2006), and the first generation of i/o adapters that are truly virtualization-aware.
Before Intel VT-D, 10GigE workloads became CPU-limited out at around 3.5GB/s of throughput. Afterwards (and with appropriate support in the hypervisor), throughputs above 9.6 GB/s have been achieved. More important, however, is the next generation of i/o adapters that actually spin up mini-virtual NIC’s in hardware and connect them directly into virtual machines—eliminating the need to copy networking packets around. This is one of the gems in Cisco’s UCS hardware which tightly couples a new NIC design with matching switch hardware. We’re now at the stage that if you’re using this year’s VMwar"

No comments: