Condusiv Technologies Blog

Condusiv Technologies Blog

Blogging @Condusiv

The Condusiv blog shares insight into the issues surrounding system and application performance—and how I/O optimization software is breaking new ground in solving those issues.

Windows is still Windows Whether in the Cloud, on Hyperconverged or All-flash

by Brian Morin 5. June 2018 04:43

Let me start by stating two facts – facts that I will substantiate if you continue to the end.

Fact #1 - Windows suffers from severe write inefficiencies that dampen overall performance. The holy grail question as to how severe is answered below.

Fact #2, Windows is still Windows whether running in the cloud, on hyperconverged systems, all-flash storage, or all three. Before you jump to the real-world examples below, let me first explain why.

No matter where you run Windows and no matter what kind of storage environment you run Windows on, Windows still penalizes optimal performance due to severe write inefficiencies in the hand-off of data to storage. Files are always broken down to be excessively smaller than they need to be. Since each piece means a dedicated I/O operation to process as a write or read, this means an enormous amount of noisy, unnecessary I/O traffic is chewing up precious IOPS, eroding throughput, and causing everything to run slower despite how many IOPS are at your disposal.

How much slower?

Now that the latest version of our I/O reduction software is being run across tens of thousands of servers and hundreds of thousands of PCs, we can empirically point out that no matter what kind of environment Windows is running on, there is always 30-40% of I/O traffic that is nothing but mere noise stealing resources and robbing optimal performance.

Yes, there are edge cases in which the inefficiency is as little as 10% but also other edge cases where the inefficiency is upwards of 70%. That being said, the median range is solidly in the 30-40% range and it has absolutely nothing to do with the backend media whether spindle, flash, hybrid, hyperconverged, cloud, or local storage.

Even if running Windows on an all-flash hyperconverged system, SAN or cloud environment with low latency and high IOPS, if the I/O profile isn’t addressed by our I/O reduction software to ensure large, clean, contiguous writes and reads, then 30-40% more IOPS will always be required for any given workload, which adds up to unnecessarily giving away 30-40% of the IOPS you paid for while slowing the completion of every job and query by the same amount.

So what’s going on here? Why is this happening and how?

First of all, the behavior of Windows when it comes to processing write and read input/output (I/O) operations is identical despite the storage backend whether local or network or media despite spindles or flash. This is because Windows only ever sees a virtual disk - the logical disk within the file system itself. The OS is abstracted from the physical layer entirely. Windows doesn’t know and doesn’t care if the underlying storage is a local disk or SSD, an array full of SSDs, hyperconverged, or cloud. In the mind of the OS, the logical disk IS the physical disk when, in fact, it’s just a reference architecture. In the case of enterprise storage, the underlying storage controllers manage where the data physically lives. However, no storage device can dictate to Windows how to write (and subsequently read) in the most efficient manner possible.

This is why many enterprise storage controllers have their own proprietary algorithms to “clean up” the mess Windows gives it by either buffering or coalescing files on a dedicated SSD or NVRAM tier or physically move pieces of the same file to line up sequentially, which does nothing for the first penalized write nor several penalized reads after as the algorithm first needs to identify a continued pattern before moving blocks. As much as storage controller optimization helps, it’s a far cry from an actual solution because it doesn’t solve the source of the larger root cause problem - even with backend storage controller optimizations, Windows will still make the underlying server to storage architecture execute many more I/O operations than are required to write and subsequently read a file, and every extra I/O required takes a measure of time in the same way that four partially loaded dump trucks will take longer to deliver the full load versus one fully loaded dump truck. It bears repeating - no storage device can dictate to Windows how to best write and read files for the healthiest I/O profile that delivers optimum performance because only Windows controls how files are written to the logical disk. And that singular action is what determines the I/O density (or lack of) from server to storage.

The reason this is occurring is because there are no APIs that exist between the Windows OS and underlying storage system whereby free space at the logical layer can be intelligently synced and consolidated with the physical layer without change block movement that would otherwise wear out SSDs and trigger copy-on-write activity that would blow up storage services like replication, thin provisioning, and more.

This means Windows has no choice but to choose the next available allocation at the logical disk layer within the file systems itself instead of choosing the BEST allocation to write and subsequently read a file.

The problem is that the next available allocation is only ever the right size on day 1 on a freshly formatted NTFS volume. But as time goes on and files are written and erased and re-written and extended and many temporary files are quickly created and erased, that means the next available space is never the right size. So, when Windows is trying to write a 1MB file but the next available allocation at the logical disk layer is 4K, it will fill that 4K, split the file, generate another I/O operation, look for the next available allocation, fill, split, and rinse and repeat until the file is fully written, and your I/O profile is cluttered with split I/Os. The result is an I/O degradation of excessively small writes and reads that penalizes performance with a “death by a thousand cuts” scenario.

It’s for this reason, over 2,500 small, midsized, and large enterprises have deployed our I/O reduction software to eliminate all that noisy I/O robbing performance by addressing the root cause problem. Since Condusiv software sits at the storage driver level, our purview is able to supply patented intelligence to the Windows OS, enabling it to choose the BEST allocation for any file instead of the next available, which is never the right size. This ensures the healthiest I/O profile possible for maximum storage performance on every write and read. Above and beyond that benefit, our DRAM read caching engine (the same engine OEM’d by 9 of the top 10 PC manufacturers), eliminates hot reads from traversing the full stack from storage by serving it straight from idle, available DRAM. Customers who add anywhere to 4GB-16GB of memory to key systems with a read bias to get more from that engine, will offload 50-80% of all reads from storage, saving even more precious storage IOPS while serving from DRAM which is 15X faster than SSD. Those who need the most performance possible or simply need to free up more storage IOPS will max our 128GB threshold and offload 90-99% of reads from storage.

Let’s look at some real-world examples from customers.

Here is VDI in AWS shared by Curt Hapner (CIO, Altenloh Brinck & Co.). 63% of read traffic is being offloaded from underlying storage and 33% of write I/O operations. He was getting sluggish VDI performance, so he bumped up memory slightly on all instances to get more power from our software and the sluggishness disappeared.

Here is an Epicor ERP with SQL backend in AWS from Altenloh Brinck & Co. 39% of reads are being eliminated along with 44% of writes to boost the performance and efficiency of their most mission critical system.

 

Here’s from one of the largest federal branches in Washington running Windows servers on an all-flash Nutanix. 45% of reads are being offloaded and 38% of write traffic.

 

Here is a spreadsheet compilation of different systems from one of the largest hospitality and event companies in Europe who run their workloads in Azure. The extraction of the dashboard data into the CSV shows not just the percentage of read and write traffic offloaded from storage but how much I/O capacity our software is handing back to their Azure instances.

 

To illustrate we use the software here at Condusiv on our own systems, this dashboard screenshot is from our own Chief Architect (Rick Cadruvi), who uses Diskeeper on his SSD-powered PC. You can see him share his own production data in the recent “live demo” webinar on V-locity 7.0 - https://youtu.be/Zn2QGxBHUzs

As you can see, 50% of reads are offloaded from his local SSD while 42% of writes operations have been saved by displacing small, fractured files with large, clean contiguous files. Not only is that extending the life of his SSD by reducing write amplification, but he has saved over 6 days of I/O time in the last month.

 

Finally, regarding all-flash SAN storage systems, the full data is in this case study with the University of Illinois who used Condusiv I/O reduction software to more than double the performance of SQL and Oracle sitting on their all-flash arrays: http://learn.condusiv.com/rs/246-QKS-770/images/CS_University-Illinois.pdf?utm_campaign=CS_UnivIll_Case_Study

For a free trial, visit http://learn.condusiv.com/Try-V-locity.html. For best results, bump up memory on key systems if you can and make sure to install the software on all the VMs on the same host. If you have more than 10 VMs, you may want to Contact Us for SE assistance in spinning up our centralized management console to push everything at once – a 20-min exercise and no reboot required.

Please visit www.condusiv.com/v-locity for more than 20 case studies on how our I/O reduction software doubled the performance of mission critical applications like MS-SQL for customers of various environments.

Condusiv Smashes the I/O Performance Gap with New V-locity 7.0, Diskeeper 18, and SSDkeeper 2.0

by Brian Morin 6. April 2018 08:37

Condusiv is pleased to announce the release of V-locity® 7.0, Diskeeper® 18, and SSDkeeper 2.0 that smash the I/O Performance Gap on Windows servers and PCs as growing volumes of data continue to outpace the ability of underlying server and storage hardware to meet performance SLAs on mission critical workloads like MS-SQL.

The new 2018 editions of V-locity, Diskeeper, and SSDkeeper come with “no reboot” capabilities and enhanced reporting that offers a single pane view of all systems to show the exact benefit of I/O reduction software to each system in terms of number of noisy I/Os eliminated, percentage of read and write traffic offloaded from storage, and, most importantly, how much time is saved on each system as a result. It is also now easier than ever to quickly identify systems underperforming from a caching standpoint that could use more memory.

When a minimum of 30-40% I/O traffic from any Windows server is completely unnecessary, nothing but mere noise chewing up IOPS and throughput, it needs to be easy to see the exact levels of inefficiency on individual systems and what it means in terms of I/O reduction and “time saved” when Condusiv software is deployed to eliminate those inefficiencies. Since many customers choose to add a little more memory on key systems like MS-SQL to get even more from the software, it is now clearly evident what 50% or more reduction in I/O traffic actually means.

Our recent 4th annual I/O Performance Survey (no Condusiv customers included) found that MS-SQL performance problems are at their worst level in 4 years despite heavy investments in hardware infrastructure. 28% of mid-sized and large enterprises receive regular complaints from users regarding sluggish SQL-based applications. This is simply due to the growth of I/O outpacing the hardware stack’s ability to keep up. This is why it is more important than ever to consider I/O reduction software solutions that guarantee to solve performance issues instead of reactively throwing expensive new servers or storage at the problem.

Not only are the latest versions of V-locity, Diskeeper, and SSDkeeper easier to deploy and manage with “no reboot” capabilities, but reporting has been enhanced to enable administrators to quickly see the full value being provided to each system along with memory tuning recommendations for even more benefit.

A single pane view lists out all systems with associated workload data, memory data, and benefit data from I/O reduction software and lists systems as red, yellow, and green according to caching effectiveness to help administers quickly identify and prioritize systems that could use a little more memory to achieve a 50% or more reduction in I/O to storage.

Regarding the new “no reboot” capabilities, this is something that the engineering team has been attempting to crack for some time.  Per Rick Cadruvi, SVP, Engineering, “All storage filter drivers require a reboot, which is problematic for admins who manage software across thousands of servers. However, due to our extensive knowledge of Windows Kernel internals dating back to Windows NT 3.51, we were able to find a way to properly synchronize and handle the load/unload sequences of our driver transparently to other drivers in the storage stack so as not to require a reboot when deploying or updating Condusiv software.” 

Fore more on Condusiv’s quest for “no reboot” capabilities, see this blog by Rick Cadruvi, SVP Engineering: The Inside Story of Condusiv's No Reboot Quest 

This means that customers who are currently on V-locity 5.3 and higher or Diskeeper 15 and higher are able to upgrade to the latest version without a reboot. Customers on older versions will have to uninstall, reboot, then install the new version.

"As much as Condusiv I/O reduction software has been a real benefit to our applications running across 2,500+ Windows servers, we are happy to see a no reboot version of the software released so it is now truly "Set It & Forget It"®. My team is happy they no longer have to wrestle down a reboot window for hundreds of servers in order to update or deploy Condusiv software," said Blake W. Smith, MSME, System Director, Enterprise Infrastructure, CHRISTUS Health.

 

MS-SQL Performance Woes Worsen Despite Massive Hardware Investments, Survey Confirms

by Brian Morin 26. February 2018 06:36

We just completed our 4th annual I/O Performance Survey from 870 IT Professionals. This is one of the industry’s largest studies of its kind and reveals the latest trends in applications driving performance demands, and how IT Professionals are responding.

The survey results consist of 20 detailed questions designed to identify the impact of I/O growth in the modern IT environment. In addition to multiple choice questions, the survey included optional open responses, allowing a respondent to provide commentary on why they selected a particular answer. All of the individual responses have been compiled across 68 pages to help readers dive deeply into any question.

The most interesting year over year take away when comparing the 2018 report against the 2017 report is that IT Professionals cite worsening performance from applications sitting on MS-SQL despite larger investments in multi-core servers and all-flash storage. More than 1 in 4 respondents cite staff and customer complaints regarding sluggish SQL queries and reports. No Condusiv customer was included in the survey.

Below is a word cloud representing hundreds of answers to visually show the applications IT Pros are having the most trouble to support from a performance standpoint.

To see what has either grown or shrunk, below is the same word cloud from 2017 responses.

 

The full report can be accessed by visiting http://learn.condusiv.com/IO-Performance-Study.html

As much as organizations continue to reactively respond to performance challenges by purchasing expensive new server and storage hardware, our V-locity® I/O reduction software offers a far more efficient path by guaranteeing to solve the toughest application performance challenges on I/O intensive systems like MS-SQL. This means organizations are able to offload 50% of I/O traffic from storage that is nothing but mere noise chewing up IOPS and dampening performance. As soon as we open up 50% of that bandwidth to storage, sluggish performance disappears and now there’s far more storage IOPS to be used for other things.

To learn more about how V-locity I/O reduction software eliminates the two big I/O inefficiencies in a virtual environment see 2-min Video: Condusiv I/O Reduction Software Overview

 

How to Achieve 2X Faster MS-SQL Applications

by Brian Morin 8. November 2017 05:31

By following the best practices outlined here, we can virtually guarantee a 2X or faster boost in your MS-SQL performance with our I/O reduction software.

  1) Don’t just run our I/O reduction software on the SQL Server instances but also on the application servers that run on top of MS-SQL

- It’s not just SQL performance that needs improvement, but the associated application servers that communicate with SQL. Our software will eliminate a minimum of 30-40% of the I/O traffic from those systems.

  2) Run our I/O reduction software on all the non-SQL systems on the same host/hypervisor

- Sometimes a customer is only concerned with improving their SQL performance, so they only install our I/O reduction software on the SQL Server instances. Keep in mind, the other VMs on the same host/hypervisor are interfering with the performance of your SQL instances due to chatty I/O that is contending for the same storage resources. Our software eliminates a minimum of 30-40% of the I/O traffic from those systems that is completely unnecessary, so they don’t interfere with your SQL performance.

- Any customer that is on the core or host pricing model is able to deploy the software to an unlimited number of guest machines on the same host. If you are on per system pricing, consider migrating to a host model if your VM density is 7 or greater.

  3) Cap MS-SQL memory usage, leaving at least 8GB left over

- Perhaps the largest SQL inefficiency is related to how it uses memory. SQL is a memory hog. It takes everything you give it then does very little with it to actually boost performance, which is why customers see such big gains with our software when memory has been tuned properly. If SQL is left uncapped, our software will not see any memory available to be used for cache, so only our write optimization engine will be in effect. Moreover, most DB admins cap SQL, leaving 4GB for the OS to use according to Microsoft’s own best practice.

- However, when using our software, it is best to begin by capping SQL a little more aggressively by leaving 8GB. That will give plenty to the OS, and whatever is leftover as idle will be dynamically leveraged by our software for cache. If 4GB is available to be used as cache by our software, we commonly see customers achieve 50% cache hit rates. It doesn’t take much capacity for our software to drive big gains.

  4) Consider adding more memory to the SQL Server

- Some customers will add more memory then limit SQL memory usage to what it was using originally, which leaves the extra RAM left over for our software to use as cache. This also alleviates concerns about capping SQL aggressively if you feel that it may result in the application being memory starved. The very best place to start is by adding 16GB of RAM then monitor the dashboard to see the impact. The software can use up to 128GB of DRAM. Those customers who are generous in this approach on read-heavy applications get into otherworldly kind of gains far beyond 2X with >90% of I/O served from DRAM. Remember, DRAM is 15X faster than SSD and sits next to the CPU.

  5) Monitor the dashboard for a 50% reduction in I/O traffic to storage

- When our dashboard shows a 50% reduction in I/O to storage, that’s when you know you have properly tuned your system to be in the range of 2X faster gains to the user, barring any network congestion issues or delivery issues.

- As much as capping SQL at 8GB may be a good place to start, it may not always get you to the desired 50% I/O reduction number. Monitor the dashboard to see how much I/O is being offloaded and simply tweak memory usage by capping SQL a little more aggressively. If you feel you may be memory constrained already, then add a little more memory, so you can cap more aggressively. For every 1-2GB of memory added, another 10-25% of read traffic will be offloaded.

 

Not a customer yet? Download a free trial of Condusiv I/O reduction software and apply these best practice steps at www.condusiv.com/try

 

New Dashboard Finally Answers the Big Question

by Brian Morin 25. October 2017 04:38

After surveying thousands of IT professionals, we’ve found that the vast majority agree that Windows performance degrades over time – they just don’t agree on how much. Unbeknown to most is what the problem actually is, which is I/O degradation as the size of writes and reads become excessively smaller than they should. This inefficiency is akin to moving a gallon of water across a room with dixie cups instead of a single gallon jug. Even if you have all-flash storage and can move those dixie cups quickly, you are still not processing data nearly as fast as you could.

In the same surveys, we’ve also found that the vast majority of IT professionals are aware of the performance penalty of the “I/O blender” effect in a virtual environment, which is the mixing and randomizing of I/O streams from the disparate virtual machines on the same host. What they don’t agree on is how much. And, they are not aware of how the issue is compounded by Windows write inefficiencies.

Now that the latest Condusiv in-product dashboard has been deployed across thousands of customer systems who have upgraded their Condusiv I/O reduction software to the latest version, customers are getting their first-ever granular view into what I/O reduction software is doing for their systems in terms of seeing the exact percentage and number of read and write I/O operations eliminated from storage and how much I/O time that saves any given system or group of systems. Ultimately, it’s a picture into the size of the problem – all the I/O traffic that is mere noise – all the unnecessary I/O that dampens system performance.

In our surveys, we found IT professionals all over the map on the size of the performance penalty from inefficiencies. Some are quite positive the performance penalty is no more than 10%. More put that range at 20%. Most put it at 30%. Then it dips back down with fewer believing a 40% penalty with the fewest throwing the dart at 50%.

As it turns out, our latest version has been able to drop a pin on that.

There are variances that move the extent of the penalty on any given workload such as system configuration and/or workload behavior. Some systems might be memory constrained, some workloads might be too light to matter, etc.

However, after thousands of installs over the last several months, we see a very consistent range on the vast majority of systems in which 30-40% of all I/O traffic is being offloaded from underlying storage with our software. Not only does that represent an immediate performance boost for users, but it also means 30-40% of I/O headroom is handed back to the storage subsystem that can now use those IOPS for other things.

The biggest factor to consider is the 30-40% improvement number represents systems where memory has not been increased beyond the typical configuration that most administrators use. Customers who offload 50% or more of I/O traffic from storage are the ones with read heavy workloads who beef up memory server-side to get more from the software. For every additional 1-2GB of memory added, another 10-25% of read traffic is offloaded. Some customers are more aggressive and leverage as much memory as possible server-side to offload 90% or more I/O traffic on read-heavy applications.

As expensive as new all-flash systems are, how much sense does it make to pay for all those IOPS only to allow 30-40% of those IOPS to be chewed up by unnecessary, noisy I/O? By addressing the two biggest penalties that dampen performance (Windows write inefficiencies compounded by the “I/O blender” effect), Condusiv I/O reduction software ensures optimal performance and protects the CapEx investments made into servers and storage by extending their useful life.

Tags:

Disruption, Application Performance, IOPS | virtualization | V-Locity

RecentComments

Comment RSS

Month List

Calendar

<<  January 2019  >>
MoTuWeThFrSaSu
31123456
78910111213
14151617181920
21222324252627
28293031123
45678910

View posts in large calendar