Condusiv Technologies Blog

Blogging @Condusiv

The Condusiv blog shares insight into the issues surrounding system and application performance—and how I/O optimization software is breaking new ground in solving those issues.

How to Improve Application Performance by Decreasing Disk Latency like an IT Engineer

by Spencer Allingham 13. June 2018 06:49

You might be responsible for a busy SQL server, for example, or a Web Server; perhaps a busy file and print server, the Finance Department's systems, documentation management, CRM, BI, or something else entirely.

Now, think about WHY these are the workloads that you care about the most.

 

Were YOU responsible for installing the application running the workload for your company? Is the workload business critical, or considered TOO BIG TO FAIL?

Or is it simply because users, or even worse, customers, complain about performance?

 

If the last question made you wince, because you know that YOU are responsible for some of the workloads running in your organisation that would benefit from additional performance, please read on. This article is just for you, even if you don't consider yourself a "Techie".

Before we get started, you should know that there are many variables that can affect the performance of the applications that you care about the most. The slowest, most restrictive of these is referred to as the "Bottleneck". Think of water being poured from a bottle. The water can only flow as fast as the neck of the bottle, the 'slowest' part of the bottle.

Don't worry though, in a computer the bottleneck will pretty much always fit into one of the following categories:

• CPU

• DISK

• MEMORY

• NETWORK

The good news is that if you're running Windows, it is usually very easy to find out which one the bottleneck is in, and here is how to do it (like an IT Engineer):

• Open Resource Monitor by clicking the Start menu, typing "resource monitor", and pressing Enter. Microsoft includes this as part of the Windows operating system and it is already installed.

• Do you see the graphs in the right-hand pane? When your computer is running at peak load, or users are complaining about performance, which of the graphs are 'maxing out'?

This is a great indicator of where your workload's bottleneck is to be found.
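If you prefer a scriptable spot-check to eyeballing the graphs, here is a minimal sketch that samples the same four categories from a script. It assumes Python with the third-party psutil package installed (pip install psutil); it is an illustration, not a replacement for Resource Monitor.

```python
# Rough bottleneck spot-check: sample CPU, memory, disk and network
# counters over a short window. Assumes the third-party psutil package
# is installed (pip install psutil).
import psutil

SAMPLE_SECONDS = 5

disk_before = psutil.disk_io_counters()
net_before = psutil.net_io_counters()

# cpu_percent() with an interval blocks for that long, which doubles
# as our sampling window for the disk and network deltas.
cpu_percent = psutil.cpu_percent(interval=SAMPLE_SECONDS)

disk_after = psutil.disk_io_counters()
net_after = psutil.net_io_counters()
mem_percent = psutil.virtual_memory().percent

io_ops = ((disk_after.read_count - disk_before.read_count) +
          (disk_after.write_count - disk_before.write_count))
io_busy_ms = ((disk_after.read_time - disk_before.read_time) +
              (disk_after.write_time - disk_before.write_time))
avg_disk_ms = io_busy_ms / io_ops if io_ops else 0.0

net_mbit_s = ((net_after.bytes_sent - net_before.bytes_sent) +
              (net_after.bytes_recv - net_before.bytes_recv)) * 8 / SAMPLE_SECONDS / 1e6

print(f"CPU busy:               {cpu_percent:.0f}%")
print(f"Memory in use:          {mem_percent:.0f}%")
print(f"Avg disk response time: {avg_disk_ms:.1f} ms per I/O")
print(f"Network traffic:        {net_mbit_s:.1f} Mbit/s")
```

Whichever number is pinned near its limit while users are complaining is almost certainly your bottleneck - the same conclusion you would draw from the 'maxing out' graph in Resource Monitor.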

 

SO, now that you have identified the slowest part of your 'compute environment' (continue reading for more details), what can you do to improve it?

The traditional approach to solving computer performance issues has been to throw hardware at the problem. This could be treating yourself to a new laptop, putting more RAM into your workstation, or, at the more extreme end, buying new servers or expensive storage solutions.

BUT how do you know when it is appropriate to spend money on new or additional hardware, and when it isn't? Well, the answer to the second part is: when you can get the performance that you need from the existing hardware infrastructure that you have already bought and paid for. You wouldn't replace your car just because it needed a service, would you?

Let's take disk speed as an example, and look at the Response Time column in Resource Monitor. Make sure the monitor is open full screen, or at least large enough to see the data, then expand the Disk Activity section so you can see the Response Time column. Do it now on the computer you're using to read this. (You didn't close Resource Monitor yet, did you?) This shows the Disk Response Time or, put another way, how long the storage is taking to read and write data. Of course, slower disk speed = slower performance, but what is considered good disk speed and what is bad?

To answer that question, I will refer to a great blog post by Scott Lowe, which you can read here:

https://www.techrepublic.com/blog/the-enterprise-cloud/use-resource-monitor-to-monitor-storage-performance/

In it, the author perfectly describes what to expect from faster and slower Disk Response Times:

"Response Time (ms). Disk response time in milliseconds. For this metric, a lower number is definitely better; in general, anything less than 10 ms is considered good performance. If you occasionally go beyond 10 ms, you should be okay, but if the system is consistently waiting more than 20 ms for response from the storage, then you may have a problem that needs attention, and it's likely that users will notice performance degradation. At 50 ms and greater, the problem is serious."

Hopefully when you checked on your computer, the Disk Response Time was below 20 milliseconds. BUT what about those other workloads that you were thinking about earlier? What are the Disk Response Times on that busy SQL server, the CRM or BI platform, or those Windows servers that the users complain about?

If the Disk Response Times are often higher than 20 milliseconds, and you need to improve the performance, then it's choice time and there are basically two options:

• In my opinion as an IT Engineer, the most sensible option is to use storage workload reduction software like Diskeeper for physical Windows computers, or V-locity for virtualised Windows computers. These will reduce Disk Response Times by allowing a good percentage of the data that your applications need to read to come from a RAM cache, rather than slower disk storage. This works because RAM is much faster than the media in your disk storage. Best of all, the only thing you need to do to try it is download a free copy of the 30-day trial. You don't even have to reboot the computer; just check and see if it is able to bring the Disk Response Times down for the workloads that you care about the most.

• If you have tried the Diskeeper or V-locity software and you STILL need faster disk access, then, I'm afraid, it's time to start getting quotations for new hardware. It does make sense, though, to take a couple of minutes to install Diskeeper or V-locity first, to see if this step can be avoided. The software solution to remove storage inefficiencies is typically much more cost-effective than having to buy hardware!

Visit www.condusiv.com/try to download Diskeeper and V-locity now, for your free trial.

 

Windows is still Windows Whether in the Cloud, on Hyperconverged or All-flash

by Brian Morin 5. June 2018 04:43

Let me start by stating two facts – facts that I will substantiate if you continue to the end.

Fact #1 - Windows suffers from severe write inefficiencies that dampen overall performance. The holy grail question as to how severe is answered below.

Fact #2 - Windows is still Windows whether running in the cloud, on hyperconverged systems, all-flash storage, or all three. Before you jump to the real-world examples below, let me first explain why.

No matter where you run Windows and no matter what kind of storage environment you run it on, Windows still penalizes optimal performance due to severe write inefficiencies in the hand-off of data to storage. Files are always broken down to be excessively smaller than they need to be. Since each piece requires a dedicated I/O operation to process as a write or read, an enormous amount of noisy, unnecessary I/O traffic ends up chewing up precious IOPS, eroding throughput, and causing everything to run slower no matter how many IOPS are at your disposal.

How much slower?

Now that the latest version of our I/O reduction software is being run across tens of thousands of servers and hundreds of thousands of PCs, we can empirically point out that no matter what kind of environment Windows is running on, there is always 30-40% of I/O traffic that is nothing but mere noise stealing resources and robbing optimal performance.

Yes, there are edge cases in which the inefficiency is as little as 10%, but also other edge cases where the inefficiency is upwards of 70%. That being said, the median is solidly in the 30-40% range, and it has absolutely nothing to do with the backend media, whether spindle, flash, hybrid, hyperconverged, cloud, or local storage.

Even if running Windows on an all-flash hyperconverged system, SAN or cloud environment with low latency and high IOPS, if the I/O profile isn’t addressed by our I/O reduction software to ensure large, clean, contiguous writes and reads, then 30-40% more IOPS will always be required for any given workload, which adds up to unnecessarily giving away 30-40% of the IOPS you paid for while slowing the completion of every job and query by the same amount.
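To make that concrete, here is a back-of-the-envelope illustration. The 20,000 IOPS figure is hypothetical; only the 30-40% noise range comes from the observations above.

```python
# Back-of-the-envelope illustration: if 30-40% of all I/O operations
# are noise, that share of your provisioned IOPS is doing no useful
# work. The 20,000 IOPS figure is hypothetical.
PROVISIONED_IOPS = 20_000

for noise_fraction in (0.30, 0.40):
    wasted = PROVISIONED_IOPS * noise_fraction
    useful = PROVISIONED_IOPS - wasted
    print(f"{noise_fraction:.0%} noisy I/O: {wasted:,.0f} IOPS wasted, "
          f"{useful:,.0f} IOPS left for real work")
```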

So what’s going on here? Why is this happening and how?

First of all, the behavior of Windows when it comes to processing write and read input/output (I/O) operations is identical regardless of the storage backend, whether local or networked, and regardless of the media, whether spindles or flash. This is because Windows only ever sees a virtual disk - the logical disk within the file system itself. The OS is abstracted from the physical layer entirely. Windows doesn’t know and doesn’t care if the underlying storage is a local disk or SSD, an array full of SSDs, hyperconverged, or cloud. In the mind of the OS, the logical disk IS the physical disk when, in fact, it’s just a reference architecture. In the case of enterprise storage, the underlying storage controllers manage where the data physically lives. However, no storage device can dictate to Windows how to write (and subsequently read) in the most efficient manner possible.

This is why many enterprise storage controllers have their own proprietary algorithms to “clean up” the mess Windows gives them, either by buffering or coalescing files on a dedicated SSD or NVRAM tier, or by physically moving pieces of the same file so they line up sequentially. This does nothing for the first penalized write, nor for several penalized reads afterwards, as the algorithm first needs to identify a continued pattern before moving blocks. As much as storage controller optimization helps, it’s a far cry from an actual solution because it doesn’t solve the larger root cause problem: even with backend storage controller optimizations, Windows will still make the underlying server-to-storage architecture execute many more I/O operations than are required to write and subsequently read a file, and every extra I/O takes a measure of time, in the same way that four partially loaded dump trucks will take longer to deliver the full load than one fully loaded dump truck. It bears repeating - no storage device can dictate to Windows how to best write and read files for the healthiest I/O profile that delivers optimum performance, because only Windows controls how files are written to the logical disk. And that singular action is what determines the I/O density (or lack of it) from server to storage.

The reason this occurs is that no APIs exist between the Windows OS and the underlying storage system whereby free space at the logical layer can be intelligently synced and consolidated with the physical layer without change-block movement that would otherwise wear out SSDs and trigger copy-on-write activity that would blow up storage services like replication, thin provisioning, and more.

This means Windows has no choice but to choose the next available allocation at the logical disk layer within the file system itself, instead of choosing the BEST allocation to write and subsequently read a file.

The problem is that the next available allocation is only ever the right size on day 1 on a freshly formatted NTFS volume. But as time goes on, as files are written, erased, re-written, and extended, and as many temporary files are quickly created and erased, the next available space is almost never the right size. So, when Windows is trying to write a 1MB file but the next available allocation at the logical disk layer is 4K, it will fill that 4K, split the file, generate another I/O operation, look for the next available allocation, fill, split, and rinse and repeat until the file is fully written, and your I/O profile is cluttered with split I/Os. The result is an I/O degradation of excessively small writes and reads that penalizes performance with a “death by a thousand cuts” scenario.
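The following toy model illustrates the effect being described. It is not a model of real NTFS allocation internals, and the free-space gap sizes are invented purely for illustration.

```python
# Toy model of the split-write behaviour described above: place a 1MB
# file into free-space gaps, next-available-first, and count the write
# I/Os it takes. The gap sizes are invented for illustration; this is
# not a model of real NTFS allocation internals.
def writes_needed(file_kb, free_gaps_kb):
    """Write I/Os needed to place file_kb across the given gaps."""
    remaining = file_kb
    ios = 0
    for gap_kb in free_gaps_kb:
        if remaining <= 0:
            break
        remaining -= min(gap_kb, remaining)
        ios += 1
    return ios

fragmented_gaps_kb = [4, 16, 8, 64, 128, 32, 256, 512, 1024]
print("Contiguous free space:", writes_needed(1024, [1024]), "write I/O")
print("Fragmented free space:", writes_needed(1024, fragmented_gaps_kb), "write I/Os")
```

With contiguous free space the 1MB file lands in a single write; against the fragmented gaps the same file costs nine write I/Os in this example, and every one of those splits comes back later as extra reads.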

It’s for this reason that over 2,500 small, midsized, and large enterprises have deployed our I/O reduction software to eliminate all that noisy I/O robbing performance by addressing the root cause problem. Since Condusiv software sits at the storage driver level, it is able to supply patented intelligence to the Windows OS, enabling it to choose the BEST allocation for any file instead of the next available one, which is never the right size. This ensures the healthiest I/O profile possible for maximum storage performance on every write and read. Above and beyond that benefit, our DRAM read caching engine (the same engine OEM’d by 9 of the top 10 PC manufacturers) eliminates hot reads from traversing the full stack from storage by serving them straight from idle, available DRAM. Customers who add anywhere from 4GB to 16GB of memory to key systems with a read bias to get more from that engine will offload 50-80% of all reads from storage, saving even more precious storage IOPS while serving from DRAM, which is 15X faster than SSD. Those who need the most performance possible, or simply need to free up more storage IOPS, will max out our 128GB threshold and offload 90-99% of reads from storage.
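As a rough illustration of why a modest RAM cache can offload such a large share of reads, here is a toy simulation of a 'hot set' read pattern against a simple LRU cache. The access skew, cache sizes, and block counts are invented for illustration and are not taken from our engine.

```python
# Toy simulation of a "hot set" read pattern against a simple LRU
# cache, to show why a modest RAM cache absorbs most reads. All of
# the sizes and the 80/20 access skew are invented for illustration.
import random
from collections import OrderedDict

random.seed(1)

def hit_rate(cache_blocks, reads=100_000, working_set=50_000, hot_set=2_000):
    cache = OrderedDict()
    hits = 0
    for _ in range(reads):
        # 80% of reads target a small hot set, 20% hit the wider working set.
        block = (random.randrange(hot_set) if random.random() < 0.8
                 else random.randrange(working_set))
        if block in cache:
            cache.move_to_end(block)
            hits += 1
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / reads

for size in (1_000, 4_000, 16_000):
    print(f"{size:>6} cached blocks -> {hit_rate(size):.0%} of reads served from RAM")
```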

Let’s look at some real-world examples from customers.

Here is VDI in AWS shared by Curt Hapner (CIO, Altenloh Brinck & Co.). 63% of read traffic is being offloaded from underlying storage and 33% of write I/O operations. He was getting sluggish VDI performance, so he bumped up memory slightly on all instances to get more power from our software and the sluggishness disappeared.

Here is an Epicor ERP system with a SQL backend in AWS, also from Altenloh Brinck & Co. 39% of reads are being eliminated along with 44% of writes, boosting the performance and efficiency of their most mission-critical system.

 

Here’s one from one of the largest federal branches in Washington, running Windows servers on an all-flash Nutanix platform. 45% of reads are being offloaded, along with 38% of write traffic.

 

Here is a spreadsheet compilation of different systems from one of the largest hospitality and event companies in Europe who run their workloads in Azure. The extraction of the dashboard data into the CSV shows not just the percentage of read and write traffic offloaded from storage but how much I/O capacity our software is handing back to their Azure instances.

 

To illustrate that we use the software here at Condusiv on our own systems, this dashboard screenshot is from our own Chief Architect (Rick Cadruvi), who uses Diskeeper on his SSD-powered PC. You can see him share his own production data in the recent “live demo” webinar on V-locity 7.0 - https://youtu.be/Zn2QGxBHUzs

As you can see, 50% of reads are offloaded from his local SSD, while 42% of write operations have been saved by displacing small, fractured files with large, clean, contiguous files. Not only is that extending the life of his SSD by reducing write amplification, but he has also saved over 6 days of I/O time in the last month.

 

Finally, regarding all-flash SAN storage systems, the full data is in this case study with the University of Illinois who used Condusiv I/O reduction software to more than double the performance of SQL and Oracle sitting on their all-flash arrays: http://learn.condusiv.com/rs/246-QKS-770/images/CS_University-Illinois.pdf?utm_campaign=CS_UnivIll_Case_Study

For a free trial, visit http://learn.condusiv.com/Try-V-locity.html. For best results, bump up memory on key systems if you can and make sure to install the software on all the VMs on the same host. If you have more than 10 VMs, you may want to Contact Us for SE assistance in spinning up our centralized management console to push everything at once – a 20-min exercise and no reboot required.

Please visit www.condusiv.com/v-locity for more than 20 case studies on how our I/O reduction software doubled the performance of mission critical applications like MS-SQL for customers of various environments.

It’s Diskeeper Groundhog Day

by Brian Morin 12. July 2017 06:08

Every week, I find myself sending the same email to at least one Diskeeper® customer. Almost every time, it’s a new manager/director/VP who joins the company and hears Diskeeper is running on their physical servers or clients. This is how it goes:

Hi Christopher,

I’m the SVP of WW Sales here and received news XYZ company may not be renewing support on your Diskeeper licenses because of concerns with it on MS-SQL servers attached to SAN storage with SSDs.

Since you own these licenses, I wanted to reach out for a quick 15-min tech conversation only to make sure you understand what you have in the latest version. Many still have legacy ideas of Diskeeper when it was a “defrag” product and not applicable to the new world order of SSDs and modern SANs.

I guarantee the current version of our product will offload anywhere from 30-40% of your I/O traffic from your underlying storage to provide a nice performance boost and give some precious I/O headroom back to that subsystem. Many customers offload >50% of I/O by simply adding more memory, enabling them to sweat their storage assets significantly and use those IOPS for other things. It’s a free upgrade while you are active on your maintenance.

As a primer, you can read this case study we published last month with the University of Illinois in which we doubled performance of their SQL and Oracle applications sitting on all-flash arrays. And if you want the short, short summary of what the technology does now, here’s the 2-min video: https://www.youtube.com/watch?v=Ge-49YYPwBM. This is why Gartner named us Cool Vendor of the Year a couple years back.

The good news is that they have heard of Diskeeper. The bad news is that they still associate it with legacy versions that emphasized defragmentation applicable only to spinning disk. For those customers who virtualized and converted their licenses to V-locity®, we don’t run into this issue.

If you are a current Diskeeper customer and have difficulty educating new management, I suggest setting up a tech review between Condusiv and new team members. If you are running all-flash, an alternative approach others have taken is simply replacing their Diskeeper with SSDkeeper® so they don’t run into old “defrag” objections. The core features are identical and both auto-detect if the storage is HDD or SSD and apply the best optimization method.

Keep in mind, since Diskeeper now proactively eliminates excessively small writes and reads in the first place, "defragmentation" is largely a dead concept, except in some extenuating circumstances. An example would be a heavily fragmented volume on spinning disk that didn't have Diskeeper on it and could use a one-time clean-up. A good example of this is a new customer last week whose full backup was taking over a day to complete! After running Diskeeper on their physical servers, the backup time was cut in half.

Tags:

Defrag | Diskeeper | SAN | SSD, Solid State, Flash

University of Illinois Doubles SQL and Oracle Performance on All-Flash Arrays with V-locity® I/O Reduction Software

by Brian Morin 19. May 2017 11:00

The University of Illinois had already deployed an all-flash Dell Compellent storage array to support their hardest-hitting application, AssetWorks AiM, which runs on Oracle. After a year in service, performance began to erode due to growth in users and overall workload. Other hard-hitting MS-SQL applications supported by hybrid arrays were suffering performance degradation as well.

The one common denominator was that all these applications ran on Windows servers – Windows Server 2012 R2.

“As we learned through this exercise with Condusiv’s V-locity I/O reduction software, we were getting hit really hard by thousands of excessively small, tiny writes and reads that dampened performance significantly,” said Greg Landes, Manager of Systems Services. “Everything was just slower due to Windows Server write inefficiencies that break writes down to be much smaller than they need to be, and forces the all-flash SAN to process far more I/O operations than necessary for any given workload.”

Landes continued, “When you have a dump truck but are only filling it a shovelful at a time before sending it on, you’re not getting near the payload you should get with each trip. That’s the exact effect we were getting with a surplus of unnecessarily small, fractured writes and subsequent reads, and it was really hurting our storage performance, even though we had a really fast ‘dump truck.’ We had no idea how much this was hurting us until we tried V-locity to address the root-cause problem to get more payload with every write. When you no longer have to process three small, fractured writes for something that only needs one write and a single I/O operation, everything is just faster.”

When testing the before-and-after effect on their production system, Greg and his team first measured without V-locity. It took 4 hours and 3 minutes to process 1.57TB of data, requiring 13,910,568 I/O operations from storage. After V-locity, the same system processed 2.1TB of data in 1 hour and 6 minutes, while only needing to process 2,702,479 I/O operations from underlying storage. “We processed half a terabyte more in a quarter of the time,” said Landes.
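Working those before-and-after figures through (the data sizes, times, and I/O counts come straight from the paragraph above; only the arithmetic is added, assuming decimal terabytes) gives a sense of the change in throughput and average I/O size:

```python
# The before/after figures quoted above, worked through into throughput
# and average I/O size (decimal terabytes assumed; only the arithmetic
# is added here).
TB = 10**12

def summarize(label, data_tb, minutes, io_ops):
    seconds = minutes * 60
    mb_per_s = data_tb * TB / seconds / 10**6
    kb_per_io = data_tb * TB / io_ops / 1024
    print(f"{label}: {mb_per_s:,.0f} MB/s throughput, ~{kb_per_io:,.0f} KB per I/O operation")

summarize("Before V-locity", 1.57, 4 * 60 + 3, 13_910_568)
summarize("After V-locity ", 2.10, 1 * 60 + 6, 2_702_479)
```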

“Not only did V-locity dramatically help our write-heavy MS-SQL and Oracle Servers by increasing performance 50–100% on several workloads, we saw even bigger gains on our read heavy applications that could take advantage of V-locity’s patented DRAM caching engine, that put our idle, unused memory to good use. Since we had provisioned adequate memory for these I/O-intensive systems, we were well positioned to get the most from V-locity.”

“We thought we were getting the most performance possible from our systems, but it wasn’t until we used V-locity that we realized how inefficient these systems really are if you’re not addressing the root cause performance issues related to the I/O profile from Windows servers. By solving the issue of small, fractured, random I/O, we’ve been able to increase the efficiency of our infrastructure and, ultimately, our people,” said Landes.

Tags:

General | SAN | Success Stories | virtualization | V-Locity | Windows Server 2012

How Can I/O Reduction Software Guarantee to Solve the Toughest Performance Problems?

by Brian Morin 14. January 2017 01:00

The #1 request I’ve been getting from customers is a white board video that succinctly explains the two silent killers of VM performance and how our I/O reduction software guarantees to solve performance problems, so applications run perfectly on every Windows server.

Expensive backend storage upgrades should ONLY take place when needing more capacity – not more performance. Anytime I tell someone our I/O reduction software guarantees to solve their toughest performance problems, the very first response is invariably the same: HOW? Not only have I answered this question hundreds of times, but our own customers also find themselves answering it repeatedly for other team members or new hires.

To make this easier, I’ve answered it all here in this 10-min White Board Video, or you can continue reading.

Most of us have been upgrading hardware to get more performance for as long as we can remember. It’s become so ingrained that it’s oftentimes the ONLY approach we think of when needing a performance upgrade.

Many organizations don’t necessarily need a performance boost on EVERY application; they need it on one or two I/O-intensive applications. Throwing a new all-flash array or new hybrid array at a performance problem ends up being the most expensive and disruptive way to solve it, when all you have to do is the same thing thousands of our customers have done: simply try our I/O reduction software on any Windows server and watch the application run at least 50% faster, and in many cases 2X-10X faster.

Most IT professionals are unaware of the fact that as great as virtualization has been for server efficiency, the one downside is how it adds complexity to the data path. On top of that, Windows doesn’t play well in a virtual environment (or any environment where it is abstracted from the physical layer). This means I/O characteristics that are a lot smaller, more fractured and more random than they need to be – the perfect trifecta for bad storage performance.

This “death by a thousand cuts” scenario means systems are processing workloads about 50% slower than they should. Condusiv’s I/O reduction software solves this problem by displacing many small, fractured writes and reads with large, clean, contiguous writes and reads. As huge as that patented engine is for our customers, it’s not the only thing we’re doing to make applications run smoothly. Performance is further electrified by establishing a tier-0 caching strategy - automatically using idle, available memory to serve hot reads. This is the same battle-tested technology that has been OEM’d by some of the largest players out there – Dell, Lenovo, HP, SanDisk, Western Digital, just to name a few.

Although we might be most known for our first patented engine that solves Windows write inefficiencies to HDDs or SSDs, more and more customers are discovering just how important our patented DRAM caching engine is. If any customer can maintain even just 4GB of available memory to be used for cache, they most often see cache hit rates in the range of 50%. That means serving data out of DRAM, which is 15X faster than SSD and opens up even more precious bandwidth to and from storage for everything else. Other customers who really need to crank up performance are simply provisioning more memory on those systems and seeing >90% cache hit rates.
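The arithmetic behind that claim is simple enough to sketch. Only the '15X faster than SSD' ratio comes from the text above; the absolute SSD latency figure is illustrative.

```python
# Effective read latency as a blend of DRAM and SSD service times,
# weighted by cache hit rate. The 90µs SSD figure is illustrative;
# only the "15X faster than SSD" ratio comes from the text above.
SSD_LATENCY_US = 90
DRAM_LATENCY_US = SSD_LATENCY_US / 15

def effective_latency_us(hit_rate):
    return hit_rate * DRAM_LATENCY_US + (1 - hit_rate) * SSD_LATENCY_US

for rate in (0.0, 0.5, 0.9):
    print(f"{rate:.0%} cache hit rate -> {effective_latency_us(rate):.0f} µs average read")
```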

See all this and more described in the latest Condusiv I/O Reduction White Board video that explains eeevvvveeerything you need to know about the problem, how we solve it, and the typical results that should be expected in the time it takes you to drink a cup of coffee. So go get a cup of coffee, sit back, relax, and see how we can solve your toughest performance problems – guaranteed.

 
