It’s an unprecedented time for system administrators. Users have been sent home, yet they still must get work done on schedule. They need access to company data, and the ability to manipulate and save it. Their client devices must still be supported and must interact efficiently with company servers.

Because of these workplace restrictions, VDI technology is receiving renewed attention. It offers benefits such as greater user productivity, improved data security, streamlined IT management, central image management, and low-maintenance endpoints. But it also has drawbacks, chief among them complexity of deployment and management and, most pertinent here, the cost of hardware. Any measure that can reduce these costs is, to say the least, greatly desirable.

In the past, the TCO of VDI has proven higher than that of traditional dedicated endpoints. The circumstances of the current lockdown, however, are pushing enterprises toward VDI as the most effective way forward.

Companies are discovering the benefits of having workers telecommute. In fact, some are deciding to stay with this model and not return to the costly central office, citing convenience and increased productivity.

Where previously the TCO and ROI case for VDI was hard to make, another factor has now come into play that drastically changes the calculation: the reduced real-estate footprint.

Performance

With VDI, as with any VM scenario, performance is paramount. Hampered performance translates directly into slowed employee production, and with the economy in pause mode, lower production is not an option for any company that wants to survive.

The common remedy for VDI performance issues is the same as for general VM environments: adding more hardware. Unfortunately, the considerable expense of adding servers, faster drives, more memory, or faster CPUs, or of beefing up the network, is usually not justified by the actual performance gains, especially in the long run. That’s because the real cause of slow performance is not being addressed. You can achieve a considerable performance increase without adding costly hardware.

The Real Reason

A surface glance at the capacity of your hardware might lead to the first of several performance myths when it comes to VDI:

Myth number one: “I have more than enough IOPS to handle the workload.”

The reason this myth persists is the misconception that the number of I/Os per second equals performance. It doesn’t. Regardless of raw IOPS capacity, workloads process 30 to 40 percent slower than they need to because of the small, split, random I/O patterns generated by the Windows OS.

Because of the “I/O blender” effect (explained below) and the fact that, especially in VDI, multiple VMs hit peak work at the same time and cause overloads, your “adequate” IOPS may be inadequate exactly when your users need them most. Monitor the moments when all the systems are busiest and you will see extreme peaks.

To regain that 30 to 40 percent performance loss, then, we need to focus on the root cause, the structure of the data, and not on the hardware.

Myth number two: “Faster I/O response time is better.”

Again, let’s look at the size of the I/O. A smaller I/O transfers faster than a larger one; a 4k I/O, for example, will complete faster than a 60k I/O. But that faster individual transfer does not make overall performance better.

That faster I/O response time also doesn’t take into account the fact that I/Os are split up instead of contiguous, and are random instead of sequential. Both of these factors make a considerable difference in throughput.

For example, when 100MB of data is written out, it is far more efficient to write it as one large I/O than as many small ones.

The truth is that I/O response time will be faster when I/Os are tiny, but overall throughput will be slower. With contiguous, larger, sequential I/Os, throughput will always be higher.
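To see why, here is a minimal back-of-the-envelope model in Python. The per-I/O overhead and bandwidth figures are illustrative assumptions, not measurements of any particular storage system:

    # Back-of-the-envelope model: moving 100MB as a function of I/O size.
    # The overhead and bandwidth figures below are illustrative assumptions.
    PER_IO_OVERHEAD_S = 0.0005        # assumed fixed cost per random I/O (0.5 ms)
    BANDWIDTH_BPS = 500 * 1024**2     # assumed raw device bandwidth (500 MB/s)
    TOTAL_BYTES = 100 * 1024**2       # the 100MB workload from the example

    def effective_mbps(io_size_bytes):
        """Effective throughput once per-I/O overhead is charged per request."""
        num_ios = TOTAL_BYTES / io_size_bytes
        total_time = num_ios * PER_IO_OVERHEAD_S + TOTAL_BYTES / BANDWIDTH_BPS
        return TOTAL_BYTES / total_time / 1024**2

    for size_kb in (4, 64, 1024):
        print(f"{size_kb:4d}KB I/Os -> {effective_mbps(size_kb * 1024):6.1f} MB/s")

Under these assumptions, the same 100MB moves at under 10 MB/s in 4KB requests but at hundreds of MB/s in 1MB requests. The data is identical; only the I/O profile changes.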

Because the I/Os are so randomized, they create what is called the “I/O blender” effect, also known as I/O contention. The workload from VDI client #1 is impacted by client #37. Related or not, they share the same hardware, and even if they’re not on the same host, they share the same backend storage.

To sum up, an unhealthy I/O situation exists with small, split, random I/Os. It is the “perfect trifecta” for bad storage performance—death by a thousand cuts.

Here are possible solutions to this issue:

1. Throw more hardware at it, which we’ve already covered. It is an expensive gamble that may not pay off.

2. Optimize the image, which in practice can’t be done. You’re not going to rewrite application code, and the applications are in the image because your company requires them.

3. Optimize the Windows OS, in the way it handles I/Os. This is where V-locity® I/O transformation software enters the picture.

The Healthy I/O Scenario

We’ve now discussed, at length, the fact that one of the real performance problems for VDI environments is the way that Windows deals with I/Os. That being the case, what is the solution?

The solution is contiguous, large, sequential I/Os, which is exactly what V-locity provides. Let’s take a look at how V-locity brings this about.

First, V-locity is 100 percent software and installs directly into the Windows OS. In the stack, it sits above everything else: the hypervisor, the server, and the storage. Because it is contained entirely within Windows, it doesn’t need to interact with any other software, so there are no compatibility issues. Nor are there hardware issues: the hardware is already running Windows, and there are no special editions of Windows for different hardware platforms.

Because V-locity is at the top of the stack, it deals with the I/O problem right at the source, where the inefficiencies occur. It prevents small, split, random I/Os in the first place, eliminating the overwork and slow performance they cause.

I/O Transformation for VDI: Write I/Os

V-locity attacks the I/O problem from two different angles. The first is write optimization. Through its patented IntelliWrite® technology, V-locity eliminates the small, fractured files caused by the Windows file system splitting files into multiple write operations.

The benefits are:
• Clean, contiguous writes for more payload with every I/O operation
• Larger I/Os, which means fewer I/Os
• Sequential writes versus random writes

When an application opens a file to write data, it isn’t interacting directly with storage. The request is sent through the Windows OS, and more specifically through the NTFS file system. Because NTFS does not “know” how large a file creation or extension is going to be, it tends to write the data out in smaller, randomized I/Os.

Which do you think will be more efficient: a thousand tiny random I/Os, or a handful of large, more productive I/Os? You’ll find that the larger ones are far more efficient and will accomplish far more work in less time.
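You can get a feel for this on any machine with a rough experiment like the sketch below. It mainly demonstrates per-request overhead rather than NTFS’s actual allocation behavior, and the file names and chunk sizes are arbitrary choices:

    # Rough experiment: persist the same 100MB payload as thousands of 4KB
    # writes versus a single large write. Absolute timings vary by machine;
    # the relative gap is the point.
    import os
    import time

    PAYLOAD = os.urandom(100 * 1024 * 1024)   # 100MB of data to write out

    def timed_write(path, chunk_size):
        start = time.perf_counter()
        with open(path, "wb", buffering=0) as f:   # unbuffered: each write() is its own request
            for offset in range(0, len(PAYLOAD), chunk_size):
                f.write(PAYLOAD[offset:offset + chunk_size])
            os.fsync(f.fileno())                   # force the data down to the device
        elapsed = time.perf_counter() - start
        os.remove(path)
        return elapsed

    small = timed_write("many_small_ios.tmp", 4 * 1024)    # 4KB at a time
    large = timed_write("one_large_io.tmp", len(PAYLOAD))  # one write() call
    print(f"4KB writes: {small:.2f}s   single write: {large:.2f}s")

On most systems the 4KB version takes many times longer, even though both runs persist exactly the same bytes.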

IntelliWrite provides intelligence to the NTFS file system to enforce large sequential writes.

Just by itself, IntelliWrite reduces I/O traffic by about 30 percent. But now let’s look at the other part of the equation: reads.

I/O Transformation for VDI: Read I/Os

First, understand that data written contiguously will be read contiguously as well. That is the first method of optimizing reads.

But well beyond that, V-locity caches hot data reads server-side, using idle, available DRAM. This is accomplished through another patented technology, IntelliMemory®.

We like to refer to V-locity caching as a Tier Zero caching strategy because it leverages DRAM right within the VDI client.

This caching is completely dynamic. Before memory is allocated for caching, a certain amount of memory must be free. Should memory become a scarce resource, IntelliMemory automatically frees memory from cache. These dynamic actions prevent the system from becoming starved for memory resources.

A great deal of thought went into designing the product and deciding exactly what to put and keep in cache for optimal performance. V-locity doesn’t cache everything; in fact, it’s rather selective. The product contains algorithms that analyze read behavior and recognize the most frequently read requests and the data patterns that return the greatest performance gains: the kind causing problematic throughput issues. If these I/O requests can be satisfied directly from memory, the data transfer is 12 to 15 times faster than even flash or SSD. And since they are the most frequent requests, satisfying them yields the largest gain in overall performance.

This functionality provides a greater amount of workflow with just a small footprint of memory. And again, V-locity is only using memory that would otherwise be free, idle, available, and unused. If there is a demand for memory, V-locity will release it out of cache and return it to Windows.
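For readers who want a mental model, here is a minimal conceptual sketch of a selective, pressure-aware read cache. Every name, threshold, and policy in it is invented for illustration; IntelliMemory’s actual algorithms are proprietary and far more sophisticated:

    # Conceptual sketch only: a selective, frequency-based read cache that
    # backs off under memory pressure. Not IntelliMemory's actual algorithm.
    import collections

    class SelectiveReadCache:
        def __init__(self, min_free_bytes, read_threshold=3):
            self.min_free_bytes = min_free_bytes    # headroom the OS must keep
            self.read_threshold = read_threshold    # only cache proven-hot blocks
            self.read_counts = collections.Counter()
            self.cache = collections.OrderedDict()  # LRU order among cached blocks

        def read(self, block_id, fetch_from_storage, free_memory_bytes):
            # Under memory pressure, hand cache memory back to the OS first.
            while self.cache and free_memory_bytes < self.min_free_bytes:
                _, evicted = self.cache.popitem(last=False)
                free_memory_bytes += len(evicted)

            if block_id in self.cache:               # hit: served from DRAM
                self.cache.move_to_end(block_id)
                return self.cache[block_id]

            data = fetch_from_storage(block_id)      # miss: go to backend storage
            self.read_counts[block_id] += 1
            # Cache only blocks read often enough to justify the memory,
            # and only while the required headroom remains free.
            if (self.read_counts[block_id] >= self.read_threshold
                    and free_memory_bytes - len(data) >= self.min_free_bytes):
                self.cache[block_id] = data
            return data

The two behaviors that matter are visible here: blocks are only cached once they have proven themselves hot, and the cache is the first thing surrendered when free memory runs low.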

A Winning Combination

The bottom line is that in these tough times, everyone needs to make the most of what they’ve got. With V-locity’s patented technology, VDI density is greatly improved. Any server is limited in the number of virtual desktops it can support at one time; V-locity can double that number, and some users report supporting five times as many virtual desktops with V-locity as without it.

Get started with V-locity now for free.

 

*TCO = total cost of ownership
ROI = return on investment
VDI = virtual desktop infrastructure