Condusiv Technologies Blog

Blogging @Condusiv

The Condusiv blog shares insight into the issues surrounding system and application performance—and how I/O optimization software is breaking new ground in solving those issues.

Which Processes are Using All of My System Resources?

by Gary Quan 17. July 2018 05:50

Over time, as more files and applications are added to your system, you may notice that performance has degraded and want to find out what is causing it. A good starting point is to see how the system resources are being used and which processes and/or files are using them.

Both Diskeeper® and SSDkeeper® contain a lesser-known feature to assist you with this: the System Monitoring Report. It shows how the CPU and I/O resources are being utilized and then, digging down a bit deeper, which processes or files are using them.

Under Reports on the Main Menu, the System Monitoring Report provides you with data on the system’s CPU usage and I/O Activity.


The CPU Usage report takes the average CPU usage from the past 7 days, then provides a graph of the hourly usage on an average day. You can then see at which times the CPU resources are being hit the most and by how much.

Digging down some more, you can then see which processes utilized the most CPU resources.
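If you want a quick, scriptable cross-check of this view, the snippet below uses the cross-platform psutil library to rank processes by live CPU usage. Note this is a one-second sample, not the 7-day hourly average the System Monitoring Report computes, and psutil itself is our illustrative choice here, not part of the Diskeeper/SSDkeeper products.

```python
# Rank processes by CPU usage over a one-second sample window,
# using psutil (pip install psutil). This approximates the report's
# per-process CPU view; it is not the 7-day averaged data.
import time
import psutil

procs = list(psutil.process_iter(['name']))
for p in procs:
    try:
        p.cpu_percent(interval=None)      # prime each per-process counter
    except psutil.Error:
        pass

time.sleep(1.0)                           # measure over one second

usage = []
for p in procs:
    try:
        usage.append((p.cpu_percent(interval=None), p.info['name'] or '?'))
    except psutil.Error:
        pass                              # process exited or access denied

for pct, name in sorted(usage, key=lambda t: t[0], reverse=True)[:10]:
    print(f"{pct:5.1f}%  {name}")
```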


The Disk I/O Activity report takes the average disk I/O activity from the past 7 days, then provides a graph of the hourly activity on an average day. You can then determine at which times the I/O activity is the highest.

Digging down some more, you can then see which processes utilized the I/O resources the most, plus what processes are causing the most split (extra) I/Os.
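For a similar do-it-yourself look at per-process disk I/O, psutil also exposes cumulative I/O counters on Windows and Linux, as sketched below. It cannot see split I/Os, though; that level of detail is exactly where the System Monitoring Report (or Performance Monitor) comes in.

```python
# Rank processes by cumulative disk I/O since they started, using
# psutil (pip install psutil; io_counters() works on Windows/Linux).
# Split (extra) I/Os are not visible at this level.
import psutil

totals = []
for p in psutil.process_iter(['name']):
    try:
        io = p.io_counters()              # read/write counts and bytes
        totals.append((io.read_bytes + io.write_bytes,
                       io.read_count + io.write_count,
                       p.info['name'] or '?'))
    except psutil.Error:
        pass                              # access denied or process gone

for nbytes, nios, name in sorted(totals, key=lambda t: t[0], reverse=True)[:10]:
    print(f"{nbytes / 1024**2:10.1f} MB  {nios:>12,} I/Os  {name}")
```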


You can also see which file types have the highest I/O utilization as well as those causing the most split (extra) I/Os.  This can help indicate what files and related processes are causing this type of extra I/O activity.


So, if you are trying to see how your system is being used, perhaps to investigate performance issues, this report gives you a quick and easy look at how the CPU and disk I/O resources are being used on your system and which processes and file types are using them. This, along with other Microsoft utilities like Task Manager and Performance Monitor, can help you tune your system for optimum performance.

Dashboard Analytics: 13 Metrics and Why They Matter

by Rick Cadruvi, Chief Architect 11. July 2018 09:12


Our latest V-locity®, Diskeeper® and SSDkeeper® products include a built-in dashboard that reports the benefits our software is providing.  There are tabs in the dashboard that allow users to view very granular data that can help them assess the impact of our software.  In the dashboard Analytics tab we display hourly data for 13 key metrics.  This document describes what those metrics are and why we chose them as key to understanding your storage performance, which directly translates to your application performance.

To start with, let’s spend a moment trying to understand why 24-hour graphs matter. The times when you and/or your users really notice bottlenecks are generally peak usage periods. While some servers are truly at peak usage 24x7, most systems, including servers, have peak I/O periods. These almost always follow peak user activity.

There will also sometimes be spikes in the overnight hours when you are doing backups, virus scans, large report/data maintenance jobs, etc. While these may not be your major concern, some of our customers find that these overlap their daytime production and can therefore easily be THE major source of concern. For some people, making these jobs finish before the deluge of daytime work starts is the single biggest factor they deal with.

Regardless of what causes the peaks, it is at those peak moments when performance matters most. When little is happening, performance rarely matters. When a lot is happening, it is key. The 24-hour graphs let you visually identify the times when performance matters to you. You can also match metrics during specific hours to see where the bottlenecks are and which of our technologies are most effective during those hours.

Let’s move on to the actual metrics.


Total I/Os Eliminated


Total I/Os Eliminated measures the number of I/Os that would have had to go through to storage if our technologies were not eliminating them before they ever got sent to storage. We eliminate I/Os in one of two ways. First, via our patented IntelliMemory® technology, we satisfy I/Os from memory without the request ever going out to the storage device. Second, several of our other technologies, such as IntelliWrite®, cause the data to be stored more efficiently and densely, so that when data is requested, it takes fewer I/Os to get the same amount of data than would otherwise be required. The net effect is that your storage subsystems see fewer actual I/Os because we eliminated the need for the extra ones. That allows the I/Os that do go to storage to finish faster, because they aren’t waiting on the eliminated I/Os to complete.


IOPS

IOPS stands for I/Os Per Second. It is the number of I/Os that you are actually requesting. During the times with the most activity, the I/Os we eliminate actually allow this number to be much higher than would be possible with your storage subsystem alone. It is also a measure of the total amount of work your applications/systems are able to accomplish.


Data from Cache (GB)

Data from Cache tells you how much of that total throughput was satisfied directly from cache. This can be deceiving. Our caching algorithms are aimed at eliminating a lot of the small, noisy I/Os that jam up the works of the storage subsystem. By not having to process those, the data freeway is wide open. Think of a freeway with accidents: even though the cars have moved to the side, the traffic slows dramatically. Our cache is like accident avoidance. It may be just a subset of the total throughput, but you process a LOT more data because you aren’t waiting for those noisy but necessary I/Os that hold your applications/systems back.

Throughput (GB Total)

Throughput is the total amount of data you process, measured in gigabytes. Think of this like a freight train. The more railcars, the more total freight being shipped. The higher the throughput, the more work your system is doing.


Throughput (MB/Sec)

Throughput is a measure of the total volume of data flowing to/from your storage subsystem. This metric measures throughput in megabytes per second; think of it as your speedometer, where the GB total above is your odometer.
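To make the speedometer/odometer distinction concrete, here is a small back-of-the-envelope calculation showing how IOPS, GB total, and MB/sec relate over one hourly window. The counter values are invented for illustration; only the arithmetic matters.

```python
# Invented counters for one hourly window; the arithmetic is the point.
io_count = 1_800_000              # I/Os completed in the window
bytes_moved = 90 * 1024**3        # 90 GB of total throughput
window_s = 3600                   # one hour

iops = io_count / window_s                    # 500 I/Os per second
gb_total = bytes_moved / 1024**3              # 90 GB (the odometer)
mb_per_s = bytes_moved / window_s / 1024**2   # ~25.6 MB/sec (the speedometer)
avg_io_kb = bytes_moved / io_count / 1024     # ~52 KB moved per average I/O

print(f"{iops:.0f} IOPS, {gb_total:.0f} GB total, "
      f"{mb_per_s:.1f} MB/sec, {avg_io_kb:.0f} KB per I/O")
```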

I/O Time Saved (seconds)

The I/O Time Saved metric tells you how much time you didn’t have to wait for I/Os to complete because of the physical I/Os we eliminated from going to storage. This can be extremely important during your busiest times. Because I/O requests overlap across multiple processes and threads, this time can actually be greater than elapsed clock time. What that means to you is that the total amount of work that gets done can experience a multiplier effect, because systems and applications tend to multitask. It’s like having 10 people working on sub-tasks at the same time. The project finishes much faster than if 1 person had to do all the tasks by themselves. By allowing pieces to be done by different people and then just plugging them all together, you get more done faster. This metric measures that effect.
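A hypothetical worked example of that multiplier effect: because the savings are summed per request, and requests from different threads overlap in time, the total can legitimately exceed the clock window it was measured in. All figures below are invented.

```python
# Invented figures showing why "I/O Time Saved" can exceed clock time:
# savings are summed per request, and requests overlap across threads.
threads = 10                 # workers with overlapping I/O during the hour
saved_per_thread_s = 500     # I/O wait each worker avoided
elapsed_s = 3600             # the measurement window itself

total_saved_s = threads * saved_per_thread_s   # 5,000 s saved in a 3,600 s hour
print(f"{total_saved_s:,} s of I/O time saved in a {elapsed_s:,} s window")
```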


I/O Response Time

I/O Response time is sometimes referred to as Latency.  It is how long it takes for I/Os to complete.  This is generally measured in milliseconds.  The lower the number, the better the performance.

Read/Write %

Read/Write % is the percentage of Reads relative to Writes. If it is 75%, 3 out of every 4 I/Os are Reads. If it were 25%, there would be 3 Writes for each Read.


Read I/Os Eliminated

This metric tells you how many Read I/Os we eliminated. If your Read to Write ratio is very high, this may be one of the most important metrics for you. Remember, though, that eliminating Writes helps Reads too: the Reads that do go to storage do NOT have to wait for the eliminated Writes to complete, so they finish faster. And of course, the Reads we eliminate directly improve overall Read performance.

% Read I/Os Eliminated


% Read I/Os Eliminated tells you what percentage of your overall Reads were eliminated from having to be processed at all by your storage subsystem.


Write I/Os Eliminated

This metric tells you how many Write I/Os we eliminated.  This is due to our technologies that improve the efficiency and density of data being stored by the Windows NTFS file system.

% Write I/Os Eliminated 


% Write I/Os Eliminated tells you what percentage of your overall Writes were eliminated from having to be processed at all by your storage subsystem.

Fragments Prevented and Eliminated

Fragments Prevented and Eliminated gives you an idea of how we are causing data to be stored more efficiently and densely, thus allowing Windows to process the same amount of data with far fewer actual I/Os.

If you have our latest versions of V-locity, Diskeeper or SSDkeeper installed, you can open the Dashboard now, select the Analytics tab, and see all of these metrics.

If you don’t have the latest version installed and you have a current maintenance agreement, log in to your online account to download and install the software.

Not a customer yet but want to check out these dashboard metrics? Download a free trial at www.condusiv.com/try.

Solving the IO Blender Effect with Software-Based Caching

by Spencer Allingham 5. July 2018 07:30

First, let me explain exactly what the IO Blender Effect is, and why it causes a problem in virtualized environments such as those from VMware or Microsoft’s Hyper-V.

When everything is working well, storage IO traffic typically looks like this: you have the smallest number of storage IO packets, each carrying a large payload of data down to the storage. Because the data is arriving in large chunks at a time, the storage controller has the opportunity to create large stripes across its media, using the least number of storage-level operations before being able to acknowledge that the write has been successful.

Unfortunately, all too often the Windows Write Driver is forced to split data that it’s writing into many more, much smaller IO packets. These split IO situations cause data to be transferred far less efficiently, and this adds overhead to each write and subsequent read. Now that the storage controller is only receiving data in much smaller chunks at a time, it can only create much smaller stripes across its media, meaning many more storage operations are required to process each gigabyte of storage IO traffic.


This is not only true when writing data, but also if you need to read that data back at some later time.

But what does this really mean in real-world terms?

It means that an average gigabyte of storage IO traffic that should take perhaps 2,000 or 3,000 storage IO packets to complete is now taking 30,000 or 40,000 storage IO packets instead. The data transfer has been split into many more, much smaller, fractured IO packets. Each storage IO operation that has to be generated takes a measurable amount of time and system resources to process, and so this is bad for performance! It will cause your workloads to run slower than they should, and this will worsen over time unless you perform some time- and resource-costly maintenance.
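The arithmetic behind those packet counts is simple division: the same gigabyte costs vastly more packets as the payload per packet shrinks. The payload sizes below are illustrative choices that roughly bracket the figures quoted above.

```python
# Packets required to move 1 GB at various payload sizes. The payload
# sizes are illustrative; they bracket the figures quoted above.
GIB = 1024**3

def packets_needed(total_bytes: int, payload_bytes: int) -> int:
    """I/O packets needed to move total_bytes at a fixed payload size."""
    return -(-total_bytes // payload_bytes)   # ceiling division

for payload_kb in (512, 256, 64, 32):
    n = packets_needed(GIB, payload_kb * 1024)
    print(f"{payload_kb:>4} KB payload -> {n:>7,} packets per GB")
```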

So, what about the IO Blender Effect?

Well, the IO Blender Effect can amplify the performance penalty (or Windows IO Performance Tax) in a virtualized environment. Here’s how it works…


As the small, fractured IO traffic from several virtual machines passes through the physical host hypervisor (Hyper-V server or VMware ESX server), the hypervisor acts like a blender. It mixes these IO streams, which causes a randomization of the storage IO packets, before sending what is now a chaotic mess of small, fractured, and very random IO streams out to the storage controller.

It doesn’t matter what type of storage you have on the back-end. It could be direct-attached disks in the physical host machine or a Storage Area Network (SAN); either way, this type of storage IO profile couldn’t be less storage-friendly.

The storage is now only receiving data in small chunks at a time and won’t understand the relationship between the packets, so it only has the opportunity to create very small stripes across its media. That unfortunately means many more storage operations are required before it can send an acknowledgement of the data transfer back up to the Windows operating system that originated it.

How can RAM caching alleviate the problem?


Firstly, to be truly effective the RAM caching needs to be done at the Windows operating system layer. This provides the shortest IO path for read IO requests that can be satisfied from server-side RAM, provisioned to each virtual machine. By satisfying as many “Hot Reads” from RAM as possible, you now have a situation where not only are those read requests being satisfied faster, but those requests are now not having to go out to storage. That means less storage IO packets for the hypervisor to blend.
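The general shape of a read-through RAM cache is easy to sketch. To be clear, the toy below is not IntelliMemory’s design or its eviction policy; it simply illustrates the principle that a block served from memory is a storage IO packet the hypervisor never has to blend.

```python
# Toy read-through RAM cache with LRU eviction. This is NOT
# IntelliMemory's actual design -- just the general principle that a
# read served from RAM never reaches the storage (or the hypervisor).
from collections import OrderedDict

class ToyReadCache:
    def __init__(self, capacity_blocks, storage_read):
        self.capacity = capacity_blocks
        self.storage_read = storage_read    # fallback that really hits storage
        self.blocks = OrderedDict()         # ordering doubles as LRU tracking
        self.hits = self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # refresh LRU position
            self.hits += 1
            return self.blocks[block_id]        # served from RAM: no storage I/O
        self.misses += 1
        data = self.storage_read(block_id)      # this read still goes to storage
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict least recently used block
        return data

# Hypothetical usage: repeated "hot" reads become RAM hits.
cache = ToyReadCache(capacity_blocks=2, storage_read=lambda b: f"data-{b}")
for b in (1, 2, 1, 1, 3, 1):
    cache.read(b)
print(f"hits={cache.hits} misses={cache.misses}")   # hits=3 misses=3
```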

Furthermore, the V-locity® caching software from Condusiv Technologies also employs a patented technology called IntelliWrite®. This intelligently helps the Windows Write Driver make better choices when writing data out to disk, avoiding many of the split IO situations that would then be made worse by the IO Blender Effect. You get back to that ideal situation of healthy IO: large, sequential writes and reads.

Is RAM caching a disruptive solution?


No! Not at all, if done properly.

Condusiv’s V-locity software for virtualised environments is completely non-disruptive to live, running workloads such as SQL Servers, Microsoft Dynamics, Business Intelligence (BI) solutions such as IBM Cognos, and other important workloads such as SAP and Oracle.

In fact, all you need to do to test this for yourself is download a free trialware copy from:

www.condusiv.com/try

Just install it! There are no reboots required, and it will start working in just a couple of minutes. If you decide that it isn’t for you, then uninstall it just as easily. No reboots, no disruption!


Windows is still Windows Whether in the Cloud, on Hyperconverged or All-flash

by Brian Morin 5. June 2018 04:43

Let me start by stating two facts – facts that I will substantiate if you continue to the end.

Fact #1 - Windows suffers from severe write inefficiencies that dampen overall performance. The holy grail question as to how severe is answered below.

Fact #2 - Windows is still Windows whether running in the cloud, on hyperconverged systems, all-flash storage, or all three. Before you jump to the real-world examples below, let me first explain why.

No matter where you run Windows and no matter what kind of storage environment you run Windows on, Windows still penalizes optimal performance due to severe write inefficiencies in the hand-off of data to storage. Files are always broken down to be excessively smaller than they need to be. Since each piece means a dedicated I/O operation to process as a write or read, this means an enormous amount of noisy, unnecessary I/O traffic is chewing up precious IOPS, eroding throughput, and causing everything to run slower despite how many IOPS are at your disposal.

How much slower?

Now that the latest version of our I/O reduction software is being run across tens of thousands of servers and hundreds of thousands of PCs, we can empirically point out that no matter what kind of environment Windows is running on, there is always 30-40% of I/O traffic that is nothing but mere noise stealing resources and robbing optimal performance.

Yes, there are edge cases in which the inefficiency is as little as 10%, but also other edge cases where it is upwards of 70%. That being said, the median is solidly in the 30-40% range, and it has absolutely nothing to do with the backend media, whether spindle, flash, hybrid, hyperconverged, cloud, or local storage.

Even if running Windows on an all-flash hyperconverged system, SAN or cloud environment with low latency and high IOPS, if the I/O profile isn’t addressed by our I/O reduction software to ensure large, clean, contiguous writes and reads, then 30-40% more IOPS will always be required for any given workload, which adds up to unnecessarily giving away 30-40% of the IOPS you paid for while slowing the completion of every job and query by the same amount.

So what’s going on here? Why is this happening and how?

First of all, the behavior of Windows when it comes to processing write and read input/output (I/O) operations is identical regardless of the storage backend, local or network, and regardless of the media, spindles or flash. This is because Windows only ever sees a virtual disk - the logical disk within the file system itself. The OS is abstracted from the physical layer entirely. Windows doesn’t know and doesn’t care if the underlying storage is a local disk or SSD, an array full of SSDs, hyperconverged, or cloud. In the mind of the OS, the logical disk IS the physical disk when, in fact, it’s just a reference architecture. In the case of enterprise storage, the underlying storage controllers manage where the data physically lives. However, no storage device can dictate to Windows how to write (and subsequently read) in the most efficient manner possible.

This is why many enterprise storage controllers have their own proprietary algorithms to “clean up” the mess Windows gives them, by either buffering and coalescing files on a dedicated SSD or NVRAM tier or by physically moving pieces of the same file to line up sequentially. That does nothing for the first penalized write, nor for the several penalized reads after, since the algorithm first needs to identify a continued pattern before moving blocks. As much as storage controller optimization helps, it’s a far cry from an actual solution because it doesn’t address the root cause: even with backend storage controller optimizations, Windows will still make the underlying server-to-storage architecture execute many more I/O operations than are required to write and subsequently read a file, and every extra I/O takes a measure of time, in the same way that four partially loaded dump trucks will take longer to deliver the full load than one fully loaded dump truck. It bears repeating - no storage device can dictate to Windows how to best write and read files for the healthiest I/O profile, because only Windows controls how files are written to the logical disk. And that singular action is what determines the I/O density (or lack of it) from server to storage.

The reason this occurs is that no APIs exist between the Windows OS and the underlying storage system whereby free space at the logical layer can be intelligently synced and consolidated with the physical layer without change-block movement that would otherwise wear out SSDs and trigger copy-on-write activity that would blow up storage services like replication, thin provisioning, and more.

This means Windows has no choice but to choose the next available allocation at the logical disk layer within the file system itself instead of choosing the BEST allocation to write and subsequently read a file.

The problem is that the next available allocation is only ever the right size on day 1 on a freshly formatted NTFS volume. As time goes on, files are written, erased, re-written, and extended, and many temporary files are quickly created and erased, so the next available space is almost never the right size. When Windows is trying to write a 1MB file but the next available allocation at the logical disk layer is 4K, it will fill that 4K, split the file, generate another I/O operation, look for the next available allocation, fill, split, and rinse and repeat until the file is fully written, leaving your I/O profile cluttered with split I/Os. The result is an I/O degradation of excessively small writes and reads that penalizes performance with a “death by a thousand cuts” scenario.
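A deliberately simplified model makes the difference visible. The extent sizes below are invented, and this is nothing like NTFS’s real allocator, but it shows how “next available” splits a file across every small gap it meets while a “best” allocation lands it contiguously:

```python
# Simplified free-space model -- not NTFS's real allocator. Extents are
# free runs measured in 4 KB clusters; a 1 MB file needs 256 clusters.

def fragments_next_available(extents, need):
    """Fill free extents in the order found, splitting the file as it goes."""
    frags = 0
    for size in extents:
        if need <= 0:
            break
        need -= min(size, need)
        frags += 1
    return frags

def fragments_best_allocation(extents, need):
    """Prefer a single free extent big enough to hold the whole file."""
    if any(size >= need for size in extents):
        return 1                      # whole file lands contiguously
    # Otherwise fall back to the largest runs first to minimize splits.
    return fragments_next_available(sorted(extents, reverse=True), need)

free_extents = [1, 4, 2, 8, 1, 256]   # a volume aged into small leading gaps
print(fragments_next_available(free_extents, 256))   # 6 fragments -> split I/Os
print(fragments_best_allocation(free_extents, 256))  # 1 fragment  -> one clean write
```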

It’s for this reason that over 2,500 small, midsized, and large enterprises have deployed our I/O reduction software to eliminate all that noisy I/O robbing performance by addressing the root cause problem. Since Condusiv software sits at the storage driver level, our purview is able to supply patented intelligence to the Windows OS, enabling it to choose the BEST allocation for any file instead of the next available, which is never the right size. This ensures the healthiest I/O profile possible for maximum storage performance on every write and read. Above and beyond that benefit, our DRAM read caching engine (the same engine OEM’d by 9 of the top 10 PC manufacturers) eliminates hot reads from traversing the full stack from storage by serving them straight from idle, available DRAM. Customers who add anywhere from 4GB to 16GB of memory to key systems with a read bias to get more from that engine will offload 50-80% of all reads from storage, saving even more precious storage IOPS while serving from DRAM, which is 15X faster than SSD. Those who need the most performance possible or simply need to free up more storage IOPS will max out our 128GB threshold and offload 90-99% of reads from storage.

Let’s look at some real-world examples from customers.

Here is VDI in AWS shared by Curt Hapner (CIO, Altenloh Brinck & Co.). 63% of read traffic is being offloaded from underlying storage and 33% of write I/O operations. He was getting sluggish VDI performance, so he bumped up memory slightly on all instances to get more power from our software and the sluggishness disappeared.

Here is an Epicor ERP with SQL backend in AWS from Altenloh Brinck & Co. 39% of reads are being eliminated along with 44% of writes to boost the performance and efficiency of their most mission critical system.


Here’s one from one of the largest federal branches in Washington running Windows servers on an all-flash Nutanix. 45% of reads are being offloaded along with 38% of write traffic.


Here is a spreadsheet compilation of different systems from one of the largest hospitality and event companies in Europe who run their workloads in Azure. The extraction of the dashboard data into the CSV shows not just the percentage of read and write traffic offloaded from storage but how much I/O capacity our software is handing back to their Azure instances.


To illustrate that we use the software here at Condusiv on our own systems, this dashboard screenshot is from our own Chief Architect (Rick Cadruvi), who uses Diskeeper on his SSD-powered PC. You can see him share his own production data in the recent “live demo” webinar on V-locity 7.0 - https://youtu.be/Zn2QGxBHUzs

As you can see, 50% of reads are offloaded from his local SSD while 42% of write operations have been saved by displacing small, fractured files with large, clean, contiguous files. Not only is that extending the life of his SSD by reducing write amplification, but he has also saved over 6 days of I/O time in the last month.


Finally, regarding all-flash SAN storage systems, the full data is in this case study with the University of Illinois who used Condusiv I/O reduction software to more than double the performance of SQL and Oracle sitting on their all-flash arrays: http://learn.condusiv.com/rs/246-QKS-770/images/CS_University-Illinois.pdf?utm_campaign=CS_UnivIll_Case_Study

For a free trial, visit http://learn.condusiv.com/Try-V-locity.html. For best results, bump up memory on key systems if you can and make sure to install the software on all the VMs on the same host. If you have more than 10 VMs, you may want to Contact Us for SE assistance in spinning up our centralized management console to push everything at once – a 20-min exercise and no reboot required.

Please visit www.condusiv.com/v-locity for more than 20 case studies on how our I/O reduction software doubled the performance of mission critical applications like MS-SQL for customers of various environments.

Condusiv Smashes the I/O Performance Gap with New V-locity 7.0, Diskeeper 18, and SSDkeeper 2.0

by Brian Morin 6. April 2018 08:37

Condusiv is pleased to announce the release of V-locity® 7.0, Diskeeper® 18, and SSDkeeper 2.0 that smash the I/O Performance Gap on Windows servers and PCs as growing volumes of data continue to outpace the ability of underlying server and storage hardware to meet performance SLAs on mission critical workloads like MS-SQL.

The new 2018 editions of V-locity, Diskeeper, and SSDkeeper come with “no reboot” capabilities and enhanced reporting that offers a single pane view of all systems to show the exact benefit of I/O reduction software to each system in terms of number of noisy I/Os eliminated, percentage of read and write traffic offloaded from storage, and, most importantly, how much time is saved on each system as a result. It is also now easier than ever to quickly identify systems underperforming from a caching standpoint that could use more memory.

When a minimum of 30-40% I/O traffic from any Windows server is completely unnecessary, nothing but mere noise chewing up IOPS and throughput, it needs to be easy to see the exact levels of inefficiency on individual systems and what it means in terms of I/O reduction and “time saved” when Condusiv software is deployed to eliminate those inefficiencies. Since many customers choose to add a little more memory on key systems like MS-SQL to get even more from the software, it is now clearly evident what 50% or more reduction in I/O traffic actually means.

Our recent 4th annual I/O Performance Survey (no Condusiv customers included) found that MS-SQL performance problems are at their worst level in 4 years despite heavy investments in hardware infrastructure. 28% of mid-sized and large enterprises receive regular complaints from users regarding sluggish SQL-based applications. This is simply due to the growth of I/O outpacing the hardware stack’s ability to keep up. This is why it is more important than ever to consider I/O reduction software solutions that guarantee to solve performance issues instead of reactively throwing expensive new servers or storage at the problem.

Not only are the latest versions of V-locity, Diskeeper, and SSDkeeper easier to deploy and manage with “no reboot” capabilities, but reporting has been enhanced to enable administrators to quickly see the full value being provided to each system along with memory tuning recommendations for even more benefit.

A single pane view lists out all systems with associated workload data, memory data, and benefit data from I/O reduction software, and flags systems as red, yellow, or green according to caching effectiveness to help administrators quickly identify and prioritize systems that could use a little more memory to achieve a 50% or greater reduction in I/O to storage.

Regarding the new “no reboot” capabilities, this is something that the engineering team has been attempting to crack for some time.  Per Rick Cadruvi, SVP, Engineering, “All storage filter drivers require a reboot, which is problematic for admins who manage software across thousands of servers. However, due to our extensive knowledge of Windows Kernel internals dating back to Windows NT 3.51, we were able to find a way to properly synchronize and handle the load/unload sequences of our driver transparently to other drivers in the storage stack so as not to require a reboot when deploying or updating Condusiv software.” 

For more on Condusiv’s quest for “no reboot” capabilities, see this blog by Rick Cadruvi, SVP Engineering: The Inside Story of Condusiv's No Reboot Quest

This means that customers who are currently on V-locity 5.3 and higher or Diskeeper 15 and higher are able to upgrade to the latest version without a reboot. Customers on older versions will have to uninstall, reboot, then install the new version.

"As much as Condusiv I/O reduction software has been a real benefit to our applications running across 2,500+ Windows servers, we are happy to see a no reboot version of the software released so it is now truly "Set It & Forget It"®. My team is happy they no longer have to wrestle down a reboot window for hundreds of servers in order to update or deploy Condusiv software," said Blake W. Smith, MSME, System Director, Enterprise Infrastructure, CHRISTUS Health.

