Condusiv Technologies Blog

Condusiv Technologies Blog

Blogging @Condusiv

The Condusiv blog shares insight into the issues surrounding system and application performance—and how I/O optimization software is breaking new ground in solving those issues.

Help! I hit “Save” instead of “Save As”!!

by Gary Quan 19. June 2018 06:30

Need to get back to a previous version of a Microsoft Office file before the changes you just made?  Undelete has you covered with its Versioning feature.

Have you or your users ever made some changes to a Word document, Excel spreadsheet, or a PowerPoint presentation, saved it and then realized later that what was saved did not contain the previous work? For example, and a true story, a CEO was working on a PowerPoint file he needed for a Board of Directors presentation that afternoon. He had worked about 4 hours that morning making changes and he was careful to periodically save the changes as he worked. The trouble was the last changes he saved had a large part of his previous changes accidentally overwritten.  The CEO then panicked as he just lost a majority of the 4 hours of work he just put in and was not sure he could redo it in time for his presentation deadline. He immediately called up his IT Manager who indicated the nightly backups would not help as they would not contain any of the changes he made that morning. The IT Manager then remembered he had Undelete installed on this file server. This was mainly to recover accidentally deleted files, but he recalled a Versioning feature that would allow recovery of previous versions of Microsoft Office files. He was then able use Undelete to retrieve the previous version of the CEO’s PowerPoint presentation and recover the work he did that morning. The CEO was extremely happy, and the IT Manager was a ‘hero’ to the CEO!

Another very common scenario is users making edits to original files and then selecting “Save” instead of “Save As” and then the original files are now gone. As an example, a customer had a budget file in Excel and several people had accessed it throughout the day. At some point, someone had inadvertently made multiple changes to it for his department, including deleting sections that were not relevant to his department all the while thinking he was working in his own Save-As copy. Boy, were the other department heads upset! The way our IT Admin customer tells the story it sounded like a riot was about to erupt! Well, he swoops in just in time and recovers the earlier version in minutes and saves the day. We hear stories daily about Word document overwrites that IT Admins are able to recover the previous versions of in just a few minutes, saving users hours of having to recreate their work.

While the most popular functionality of Undelete is the ability to recover accidentally deleted files instantly with the click of a mouse, the Undelete Versioning feature is certainly the runner up, so we wanted to remind users, or prospective users, that it’s also here to save the day for you, too.

The Undelete Versioning feature will automatically save the previous versions of specific file types, including Microsoft Office files. The default is to save the last 5 versions, but this is settable.  Undelete then allows you to see what and when versions were saved and are then easily recoverable. A vital data protection feature to have.

If you already have Undelete Server installed on your file servers, check out the Versioning feature. If you have any of your own “hero” stories you would like to share, email custinfo@condusiv.com

If you don’t have Undelete Server or Undelete Pro yet, you can purchase them from your favorite online reseller or you can buy online from our store http://www.condusiv.com/purchase/Undelete/

 

Tags:

Data Protection | Data Recovery | File Recovery | General | Success Stories | Undelete

How to Improve Application Performance by Decreasing Disk Latency like an IT Engineer

by Spencer Allingham 13. June 2018 06:49

You might be responsible for a busy SQL server, for example, or a Web Server; perhaps a busy file and print server, the Finance Department's systems, documentation management, CRM, BI, or something else entirely.

Now, think about WHY these are the workloads that you care about the most?

 

Were YOU responsible for installing the application running the workload for your company? Is the workload being run business critical, or considered TOO BIG TO FAIL?

Or is it simply because users, or even worse, customers, complain about performance?

 

If the last question made you wince, because you know that YOU are responsible for some of the workloads running in your organisation that would benefit from additional performance, please read on. This article is just for you, even if you don't consider yourself a "Techie".

Before we get started, you should know that there are many variables that can affect the performance of the applications that you care about the most. The slowest, most restrictive of these is referred to as the "Bottleneck". Think of water being poured from a bottle. The water can only flow as fast as the neck of the bottle, the 'slowest' part of the bottle.

Don't worry though, in a computer the bottleneck will pretty much always fit into one of the following categories:

•           CPU

•           DISK

•           MEMORY

•           NETWORK

The good news is that if you're running Windows, it is usually very easy to find out which one the bottleneck is in, and here is how to do it (like an IT Engineer):

 •          Open Resource Monitor by clicking the Start menu, typing "resource monitor", and pressing Enter. Microsoft includes this as part of the Windows operating system and it is already installed.

 •          Do you see the graphs in the right-hand pane? When your computer is running at peak load, or users are complaining about performance, which of the graphs are 'maxing out'?

This is a great indicator of where your workload's bottleneck is to be found.         

 

SO, now you have identified the slowest part of your 'compute environment' (continue reading for more details), what can you do to improve it?

The traditional approach to solving computer performance issues has been to throw hardware at the solution. This could be treating yourself to a new laptop, or putting more RAM into your workstation, or on the more extreme end, buying new servers or expensive storage solutions.

BUT, how do you know when it is appropriate to spend money on new or additional hardware, and when it isn't. Well the answer is; 'when you can get the performance that you need', with the existing hardware infrastructure that you have already bought and paid for. You wouldn't replace your car, just because it needed a service, would you?

Let's take disk speed as an example.  Let’s take a look at the response time column in Resource Monitor. Make sure you open the monitor to full screen or large enough to see the data.  Then open the Disk Activity section so you can see the Response Time column.  Do it now on the computer you're using to read this. (You didn't close Resource Monitor yet, did you?) This is showing the Disk Response Time, or put another way, how long is the storage taking to read and write data? Of course, slower disk speed = slower performance, but what is considered good disk speed and bad?

To answer that question, I will refer to a great blog post by Scott Lowe, that you can read here...

https://www.techrepublic.com/blog/the-enterprise-cloud/use-resource-monitor-to-monitor-storage-performance/

In it, the author perfectly describes what to expect from faster and slower Disk Response Times:

"Response Time (ms). Disk response time in milliseconds. For this metric, a lower number is definitely better; in general, anything less than 10 ms is considered good performance. If you occasionally go beyond 10 ms, you should be okay, but if the system is consistently waiting more than 20 ms for response from the storage, then you may have a problem that needs attention, and it's likely that users will notice performance degradation. At 50 ms and greater, the problem is serious."

Hopefully when you checked on your computer, the Disk Response Time is below 20 milliseconds. BUT, what about those other workloads that you were thinking about earlier. What's the Disk Response Times on that busy SQL server, the CRM or BI platform, or those Windows servers that the users complain about?

If the Disk Response Times are often higher than 20 milliseconds, and you need to improve the performance, then it's choice time and there are basically two options:

           In my opinion as an IT Engineer, the most sensible option is to use storage workload reduction software like Diskeeper for physical Windows computers, or V-locity for virtualised Windows computers. These will reduce Disk Storage Times by allowing a good percentage of the data that your applications need to read, to come from a RAM cache, rather than slower disk storage. This works because RAM is much faster than the media in your disk storage. Best of all, the only thing you need to do to try it, is download a free copy of the 30 day trial. You don't even have to reboot the computer; just check and see if it is able to bring the Disk Response Times down for the workloads that you care about the most.

           If you have tried the Diskeeper or V-locity software, and you STILL need faster disk access, then, I'm afraid, it's time to start getting quotations for new hardware. It does make sense though, to take a couple of minutes to install Diskeeper or V-locity first, to see if this step can be avoided. The software solution to remove storage inefficiencies is typically a much more cost-effective solution than having to buy hardware!

Visit www.condusiv.com/try to download Diskeeper and V-locity now, for your free trial.

 

Windows is still Windows Whether in the Cloud, on Hyperconverged or All-flash

by Brian Morin 5. June 2018 04:43

Let me start by stating two facts – facts that I will substantiate if you continue to the end.

Fact #1 - Windows suffers from severe write inefficiencies that dampen overall performance. The holy grail question as to how severe is answered below.

Fact #2, Windows is still Windows whether running in the cloud, on hyperconverged systems, all-flash storage, or all three. Before you jump to the real-world examples below, let me first explain why.

No matter where you run Windows and no matter what kind of storage environment you run Windows on, Windows still penalizes optimal performance due to severe write inefficiencies in the hand-off of data to storage. Files are always broken down to be excessively smaller than they need to be. Since each piece means a dedicated I/O operation to process as a write or read, this means an enormous amount of noisy, unnecessary I/O traffic is chewing up precious IOPS, eroding throughput, and causing everything to run slower despite how many IOPS are at your disposal.

How much slower?

Now that the latest version of our I/O reduction software is being run across tens of thousands of servers and hundreds of thousands of PCs, we can empirically point out that no matter what kind of environment Windows is running on, there is always 30-40% of I/O traffic that is nothing but mere noise stealing resources and robbing optimal performance.

Yes, there are edge cases in which the inefficiency is as little as 10% but also other edge cases where the inefficiency is upwards of 70%. That being said, the median range is solidly in the 30-40% range and it has absolutely nothing to do with the backend media whether spindle, flash, hybrid, hyperconverged, cloud, or local storage.

Even if running Windows on an all-flash hyperconverged system, SAN or cloud environment with low latency and high IOPS, if the I/O profile isn’t addressed by our I/O reduction software to ensure large, clean, contiguous writes and reads, then 30-40% more IOPS will always be required for any given workload, which adds up to unnecessarily giving away 30-40% of the IOPS you paid for while slowing the completion of every job and query by the same amount.

So what’s going on here? Why is this happening and how?

First of all, the behavior of Windows when it comes to processing write and read input/output (I/O) operations is identical despite the storage backend whether local or network or media despite spindles or flash. This is because Windows only ever sees a virtual disk - the logical disk within the file system itself. The OS is abstracted from the physical layer entirely. Windows doesn’t know and doesn’t care if the underlying storage is a local disk or SSD, an array full of SSDs, hyperconverged, or cloud. In the mind of the OS, the logical disk IS the physical disk when, in fact, it’s just a reference architecture. In the case of enterprise storage, the underlying storage controllers manage where the data physically lives. However, no storage device can dictate to Windows how to write (and subsequently read) in the most efficient manner possible.

This is why many enterprise storage controllers have their own proprietary algorithms to “clean up” the mess Windows gives it by either buffering or coalescing files on a dedicated SSD or NVRAM tier or physically move pieces of the same file to line up sequentially, which does nothing for the first penalized write nor several penalized reads after as the algorithm first needs to identify a continued pattern before moving blocks. As much as storage controller optimization helps, it’s a far cry from an actual solution because it doesn’t solve the source of the larger root cause problem - even with backend storage controller optimizations, Windows will still make the underlying server to storage architecture execute many more I/O operations than are required to write and subsequently read a file, and every extra I/O required takes a measure of time in the same way that four partially loaded dump trucks will take longer to deliver the full load versus one fully loaded dump truck. It bears repeating - no storage device can dictate to Windows how to best write and read files for the healthiest I/O profile that delivers optimum performance because only Windows controls how files are written to the logical disk. And that singular action is what determines the I/O density (or lack of) from server to storage.

The reason this is occurring is because there are no APIs that exist between the Windows OS and underlying storage system whereby free space at the logical layer can be intelligently synced and consolidated with the physical layer without change block movement that would otherwise wear out SSDs and trigger copy-on-write activity that would blow up storage services like replication, thin provisioning, and more.

This means Windows has no choice but to choose the next available allocation at the logical disk layer within the file systems itself instead of choosing the BEST allocation to write and subsequently read a file.

The problem is that the next available allocation is only ever the right size on day 1 on a freshly formatted NTFS volume. But as time goes on and files are written and erased and re-written and extended and many temporary files are quickly created and erased, that means the next available space is never the right size. So, when Windows is trying to write a 1MB file but the next available allocation at the logical disk layer is 4K, it will fill that 4K, split the file, generate another I/O operation, look for the next available allocation, fill, split, and rinse and repeat until the file is fully written, and your I/O profile is cluttered with split I/Os. The result is an I/O degradation of excessively small writes and reads that penalizes performance with a “death by a thousand cuts” scenario.

It’s for this reason, over 2,500 small, midsized, and large enterprises have deployed our I/O reduction software to eliminate all that noisy I/O robbing performance by addressing the root cause problem. Since Condusiv software sits at the storage driver level, our purview is able to supply patented intelligence to the Windows OS, enabling it to choose the BEST allocation for any file instead of the next available, which is never the right size. This ensures the healthiest I/O profile possible for maximum storage performance on every write and read. Above and beyond that benefit, our DRAM read caching engine (the same engine OEM’d by 9 of the top 10 PC manufacturers), eliminates hot reads from traversing the full stack from storage by serving it straight from idle, available DRAM. Customers who add anywhere to 4GB-16GB of memory to key systems with a read bias to get more from that engine, will offload 50-80% of all reads from storage, saving even more precious storage IOPS while serving from DRAM which is 15X faster than SSD. Those who need the most performance possible or simply need to free up more storage IOPS will max our 128GB threshold and offload 90-99% of reads from storage.

Let’s look at some real-world examples from customers.

Here is VDI in AWS shared by Curt Hapner (CIO, Altenloh Brinck & Co.). 63% of read traffic is being offloaded from underlying storage and 33% of write I/O operations. He was getting sluggish VDI performance, so he bumped up memory slightly on all instances to get more power from our software and the sluggishness disappeared.

Here is an Epicor ERP with SQL backend in AWS from Altenloh Brinck & Co. 39% of reads are being eliminated along with 44% of writes to boost the performance and efficiency of their most mission critical system.

 

Here’s from one of the largest federal branches in Washington running Windows servers on an all-flash Nutanix. 45% of reads are being offloaded and 38% of write traffic.

 

Here is a spreadsheet compilation of different systems from one of the largest hospitality and event companies in Europe who run their workloads in Azure. The extraction of the dashboard data into the CSV shows not just the percentage of read and write traffic offloaded from storage but how much I/O capacity our software is handing back to their Azure instances.

 

To illustrate we use the software here at Condusiv on our own systems, this dashboard screenshot is from our own Chief Architect (Rick Cadruvi), who uses Diskeeper on his SSD-powered PC. You can see him share his own production data in the recent “live demo” webinar on V-locity 7.0 - https://youtu.be/Zn2QGxBHUzs

As you can see, 50% of reads are offloaded from his local SSD while 42% of writes operations have been saved by displacing small, fractured files with large, clean contiguous files. Not only is that extending the life of his SSD by reducing write amplification, but he has saved over 6 days of I/O time in the last month.

 

Finally, regarding all-flash SAN storage systems, the full data is in this case study with the University of Illinois who used Condusiv I/O reduction software to more than double the performance of SQL and Oracle sitting on their all-flash arrays: http://learn.condusiv.com/rs/246-QKS-770/images/CS_University-Illinois.pdf?utm_campaign=CS_UnivIll_Case_Study

For a free trial, visit http://learn.condusiv.com/Try-V-locity.html. For best results, bump up memory on key systems if you can and make sure to install the software on all the VMs on the same host. If you have more than 10 VMs, you may want to Contact Us for SE assistance in spinning up our centralized management console to push everything at once – a 20-min exercise and no reboot required.

Please visit www.condusiv.com/v-locity for more than 20 case studies on how our I/O reduction software doubled the performance of mission critical applications like MS-SQL for customers of various environments.

The Inside Story of Condusiv’s “No Reboot” Quest

by Rick Cadruvi, Chief Architect 17. April 2018 04:57

In a world of 24/7 uptime and rare reboot windows, one of our biggest challenges as a company has simply been getting our own customers upgraded to the latest version of our I/O reduction software.

In the last year, we have done dashboard review sessions with a substantial number of customers to demonstrate the power of our latest versions to hybrid and all-flash arrays, hyperconverged systems, Azure/AWS, local SSDs, and more. However, many remain undone simply because customers can’t find the time for reboot windows to upgrade to the latest versions with the most powerful engines and new benefits dashboard. This has been particularly challenging for customers with hundreds to thousands of servers.

Even though we own the trademark term, “Set It and Forget It®,” there was always one aspect that wasn’t, and that’s the fact that it required a reboot to install or upgrade.

Herein lies the problem – important components of our software sit at the storage driver level. At least to the best of our knowledge, all other software vendors who sit at that layer also require a reboot to install or upgrade. So, consider our engineering challenge to take on a project most people wouldn’t know was even solvable.

Let’s start with an explanation as to why this barrier existed. Our software contains several filter drivers that allow us to implement leading edge performance enhancing technologies.  Some of them act at the Windows File System level. Windows has long provided a Filter Manager that allows developers to create File System and Network filter drivers that can be loaded and unloaded without requiring a reboot.  You will quickly recognize that Anti-Malware and Data Backup/Recovery software tends to be the principle targets for this Filter Manager. There are also products such as data encryption that benefit from the Windows Filter Manager. And, as it turns out, we benefit because some of our filter drivers run above the File System.

However, sometimes a software product needs to be closer to the physical hardware itself. This allows a much broader view of what is going on with the actual I/O to the physical device subsystem. There are quite a few software products that need this bigger view. It turns out that we do also.  One of the reasons, is to allow our patented IntelliMemory® caching software to eliminate a huge amount of noisy I/O that creates substantial, yet preventable, bottlenecks to your application. This is I/O that your application wouldn’t even recognize as problematic to its performance, nor would you. Because we have a global view, we can eliminate a large percentage of I/Os from having to go to storage, while using very limited system resources. We also have other technologies that benefit from our telemetry disk filter that helps us see a more global picture of storage performance and what is actually causing bottlenecks. This allows us to focus our efforts on the true causes of those bottlenecks, giving our customers the greatest bang for their buck.  Because we collect excellent empirical data about what is causing the bottlenecks, we can apply very limited and targeted system resources to deliver very significant storage performance increases. Keep in mind, the limited CPU cycles we use operate at lowest priority and we only use resources that are otherwise idle, so the benefits of our engines are completely non-intrusive to overall server performance.

Why does the above matter? Well, the Microsoft Filter Manager doesn’t provide support for most driver stacks and this includes the parts of the storage driver stack below the File System. That means that our disk filter drivers couldn’t actually start providing their benefits upon initial install until after a reboot. If we add new functionality to provide even greater storage performance via a change to one of our disk filter drivers, a reboot was required after an update before the new functionality could be brought to bear.

Until now we just lived with the restrictions. We didn’t live with it because we couldn’t create a solution, but because we anticipated that the frequency of Windows updates, especially security-based updates, would start to increase the frequency of server reboot requirements and the problem would, for all intents and purposes, become manageable. Alas, our hopes and dreams in this area failed to materialize. 

We’ve been doing Windows system and especially kernel software development for decades. I just attended Plugfest 30 for file system filter driver developers.  This is a Microsoft event to ensure high-quality standards for products with filter drivers like ours. We were also at the first Plugfest nearly two decades ago. In addition, we also wrote the Windows NTFS file system component to allow safe, live file defragmentation for Windows NT dating back to the Windows NT 3.51 release.  That by itself is an interesting story, but I’ll leave that for another time.

Anyway, we finally realized that our crystal ball prediction about an increase in the frequency of Windows Server reboots due to Windows Update cycles (patch Tuesday?) was a little less clear than we had hoped. Accepting that this problem wasn’t going away, we set out to create our own Filter Manager to provide a mechanism that allowed filter drivers on stacks not supported by the Microsoft Filter Manager to be inserted and removed without the reboot requirement. This was something we’ve been considering, talked about with other software vendors in a similar situation, and even prototyped before. The time had finally come where we needed to facilitate our customers in getting the significant increased performance from our software immediately instead of waiting for reboot opportunities.

We took our decades of experience and knowledge of Windows Operating System internals and experience developing Kernel software and aimed it at giving our customers the relief from this limitation. The result is in our latest release of V-locity® 7.0, Diskeeper® 18, and SSDkeeper™ 2.0. 

We’d love to hear your stories about how this revolutionary enablement technology has made a difference for you and your organization.

Tags:

Diskeeper | V-Locity

Condusiv Smashes the I/O Performance Gap with New V-locity 7.0, Diskeeper 18, and SSDkeeper 2.0

by Brian Morin 6. April 2018 08:37

Condusiv is pleased to announce the release of V-locity® 7.0, Diskeeper® 18, and SSDkeeper 2.0 that smash the I/O Performance Gap on Windows servers and PCs as growing volumes of data continue to outpace the ability of underlying server and storage hardware to meet performance SLAs on mission critical workloads like MS-SQL.

The new 2018 editions of V-locity, Diskeeper, and SSDkeeper come with “no reboot” capabilities and enhanced reporting that offers a single pane view of all systems to show the exact benefit of I/O reduction software to each system in terms of number of noisy I/Os eliminated, percentage of read and write traffic offloaded from storage, and, most importantly, how much time is saved on each system as a result. It is also now easier than ever to quickly identify systems underperforming from a caching standpoint that could use more memory.

When a minimum of 30-40% I/O traffic from any Windows server is completely unnecessary, nothing but mere noise chewing up IOPS and throughput, it needs to be easy to see the exact levels of inefficiency on individual systems and what it means in terms of I/O reduction and “time saved” when Condusiv software is deployed to eliminate those inefficiencies. Since many customers choose to add a little more memory on key systems like MS-SQL to get even more from the software, it is now clearly evident what 50% or more reduction in I/O traffic actually means.

Our recent 4th annual I/O Performance Survey (no Condusiv customers included) found that MS-SQL performance problems are at their worst level in 4 years despite heavy investments in hardware infrastructure. 28% of mid-sized and large enterprises receive regular complaints from users regarding sluggish SQL-based applications. This is simply due to the growth of I/O outpacing the hardware stack’s ability to keep up. This is why it is more important than ever to consider I/O reduction software solutions that guarantee to solve performance issues instead of reactively throwing expensive new servers or storage at the problem.

Not only are the latest versions of V-locity, Diskeeper, and SSDkeeper easier to deploy and manage with “no reboot” capabilities, but reporting has been enhanced to enable administrators to quickly see the full value being provided to each system along with memory tuning recommendations for even more benefit.

A single pane view lists out all systems with associated workload data, memory data, and benefit data from I/O reduction software and lists systems as red, yellow, and green according to caching effectiveness to help administers quickly identify and prioritize systems that could use a little more memory to achieve a 50% or more reduction in I/O to storage.

Regarding the new “no reboot” capabilities, this is something that the engineering team has been attempting to crack for some time.  Per Rick Cadruvi, SVP, Engineering, “All storage filter drivers require a reboot, which is problematic for admins who manage software across thousands of servers. However, due to our extensive knowledge of Windows Kernel internals dating back to Windows NT 3.51, we were able to find a way to properly synchronize and handle the load/unload sequences of our driver transparently to other drivers in the storage stack so as not to require a reboot when deploying or updating Condusiv software.” 

Fore more on Condusiv’s quest for “no reboot” capabilities, see this blog by Rick Cadruvi, SVP Engineering: The Inside Story of Condusiv's No Reboot Quest 

This means that customers who are currently on V-locity 5.3 and higher or Diskeeper 15 and higher are able to upgrade to the latest version without a reboot. Customers on older versions will have to uninstall, reboot, then install the new version.

"As much as Condusiv I/O reduction software has been a real benefit to our applications running across 2,500+ Windows servers, we are happy to see a no reboot version of the software released so it is now truly "Set It & Forget It"®. My team is happy they no longer have to wrestle down a reboot window for hundreds of servers in order to update or deploy Condusiv software," said Blake W. Smith, MSME, System Director, Enterprise Infrastructure, CHRISTUS Health.

 

RecentComments

Comment RSS

Month List

Calendar

<<  December 2018  >>
MoTuWeThFrSaSu
262728293012
3456789
10111213141516
17181920212223
24252627282930
31123456

View posts in large calendar