VMware: Defrag or Not?

Dave Lewis sent in a question, “There is such a quandary about disk fragmentation in the VMware environment. One says defrag and another says never. Who’s right? This has been a hard subject to track and define.”

I’m going to debunk “defragging” in a minute, but first consider VMware’s own best practice guide on improving performance (found here). Page 17 lists “adding more memory” as the top recommendation, and the second most important recommendation is to “defrag all guest machines.”

VMware is clearly aware that fragmentation impacts performance. The real question is how relevant the task of defragging is in today’s environment, with sophisticated storage services and new media like flash that should never be defragged. First of all, no storage administrator would defrag an entire “live” disk volume, because the changed-block activity it generates disrupts services like replication and thin provisioning. The only alternative is the tedious task of taking the volume offline, which means the problem goes ignored on HDD-based storage systems. Second, organizations that use flash can do nothing about the write amplification caused by fragmentation, or about the slow write performance that results from a surplus of small, fractured writes.

The beauty of V-locity® I/O reduction software in a virtual environment is that fragmentation is never an issue, because V-locity optimizes the I/O stream at the point of origin to ensure Windows executes writes in the most efficient manner possible. This means large, contiguous, sequential writes to the backend storage for every write and subsequent read, which boosts the performance of both HDD and SSD systems. As well as flash performs with random reads, it chokes badly on random writes. A typical SSD might spec random reads at 300,000 IOPS but drop to 23,000 IOPS for writes, due to the erase cycles and housekeeping that go into every write. This is why some organizations continue to use spindles for write-heavy apps that are sequential in nature.
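To put that read/write gap in perspective, here is a quick back-of-the-envelope sketch in Python. The two IOPS figures are the ones quoted above; the one-million-operation workload is purely an assumption for illustration, and real drives will vary with queue depth, block size, and how full the device is.

```python
# Spec-sheet style figures quoted in the text above.
READ_IOPS = 300_000
WRITE_IOPS = 23_000

# Hypothetical workload size, chosen only to make the gap tangible.
OPERATIONS = 1_000_000


def seconds_for(ops, iops):
    """Time to service `ops` operations at a sustained rate of `iops`."""
    return ops / iops


ratio = READ_IOPS / WRITE_IOPS
print(f"Random reads are about {ratio:.1f}x faster than random writes")
print(f"Reads:  {seconds_for(OPERATIONS, READ_IOPS):.1f} s")
print(f"Writes: {seconds_for(OPERATIONS, WRITE_IOPS):.1f} s")
```

In other words, every unnecessary small write costs roughly thirteen times what an unnecessary small read would on a drive with these specs.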

When most people think of fragmentation, they think of it as a physical layer issue on a mechanical disk. However, in an enterprise environment, Windows is abstracted from the physical layer. The real problem is IOPS inflation, where the relationship between I/O and data breaks down and there ends up being a surplus of small I/O that chews up performance no matter what storage media is used on the backend. Instead of using a single I/O to process a 64K file, Windows will break it down into smaller and smaller chunks, with each chunk requiring its own I/O operation to process.
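The arithmetic behind that 64K example is easy to sketch. The snippet below is only an illustration: the chunk sizes are hypothetical values I picked to show the trend, not measurements of what Windows actually issues, and `io_count` is a helper name invented for this example.

```python
import math


def io_count(file_bytes, max_io_bytes):
    """I/O operations needed when no single I/O can exceed max_io_bytes."""
    return math.ceil(file_bytes / max_io_bytes)


FILE_SIZE = 64 * 1024  # the 64K file from the example above

# Hypothetical chunk sizes (in KB), from one contiguous I/O down to 4K pieces.
for chunk_kb in (64, 32, 16, 8, 4):
    ops = io_count(FILE_SIZE, chunk_kb * 1024)
    print(f"{chunk_kb:>2}K chunks -> {ops:>2} I/O operations")
```

The same 64K of data goes from 1 I/O to 16 I/Os as the chunks shrink to 4K. The payload never changes, only the number of operations needed to move it.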

This is bad enough if one virtual server is being taxed by Windows write inefficiencies and sending down twice as many I/O requests as it should to process any given workload. Now amplify that same problem across all the VMs on the same host, and there ends up being a tsunami of unnecessary I/O overwhelming the host and the underlying storage subsystem.
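Here is a rough sketch of how that adds up at the host. The 2x inflation factor comes from the "twice as many I/O requests" figure above; the VM count and per-VM workload are assumptions made up for the example, and `host_iops` is a hypothetical helper, not part of any product.

```python
def host_iops(vm_count, useful_iops_per_vm, inflation=2.0):
    """Total IOPS the host must service when every VM inflates its I/O."""
    return vm_count * useful_iops_per_vm * inflation


# Hypothetical host: 20 VMs that each need 500 IOPS of real work.
VM_COUNT = 20
USEFUL_IOPS_PER_VM = 500

needed = host_iops(VM_COUNT, USEFUL_IOPS_PER_VM, inflation=1.0)
actual = host_iops(VM_COUNT, USEFUL_IOPS_PER_VM)  # 2x, as in the text
print(f"IOPS the workload needs: {needed:,.0f}")
print(f"IOPS the host sees:      {actual:,.0f}")
print(f"Unnecessary IOPS:        {actual - needed:,.0f}")
```

Under these assumptions, half of everything the storage subsystem services is overhead that did not need to exist.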

As great as virtualization has been for server efficiency, the one downside is the complexity it adds to the data path. I/O from Windows ends up much smaller, more fractured, and more random than it needs to be. As a result, performance suffers “death by a thousand cuts” from all this small I/O, which then gets further randomized at the hypervisor.
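The randomizing effect at the hypervisor can be shown with a toy model. This is a minimal sketch under simplifying assumptions: each VM writes a perfectly sequential run of block addresses in its own region of the disk, and the hypervisor interleaves them in strict round-robin order. Real hypervisors schedule I/O in more complicated ways, and all of the function names here are invented for the illustration.

```python
def sequential_fraction(lbas):
    """Share of requests whose address directly follows the previous one."""
    if len(lbas) < 2:
        return 0.0
    hits = sum(1 for prev, cur in zip(lbas, lbas[1:]) if cur == prev + 1)
    return hits / (len(lbas) - 1)


def vm_stream(vm_id, length, region_size=1_000_000):
    """A sequential run of block addresses inside this VM's own disk region."""
    start = vm_id * region_size
    return [start + i for i in range(length)]


def blend(streams):
    """Round-robin interleave, a simple stand-in for hypervisor I/O mixing."""
    return [lba for group in zip(*streams) for lba in group]


streams = [vm_stream(vm, 8) for vm in range(4)]
print("One VM on its own:", sequential_fraction(streams[0]))
print("Four VMs blended: ", sequential_fraction(blend(streams)))
```

Each VM's stream is fully sequential on its own, yet in this model the blended stream that reaches the storage contains no sequential pairs at all. The smaller each I/O is, the more requests there are to interleave, which is why small, fractured I/O from the guests makes the blending effect worse.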

So instead of taking VMware’s recommendation to “defrag,” take our recommendation to never worry about the issue again and put an end to all the small, split I/Os that are hurting performance the most.