Condusiv Technologies Blog

Blogging @Condusiv

The Condusiv blog shares insight into the issues surrounding system and application performance—and how I/O optimization software is breaking new ground in solving those issues.

The Inside Story of Condusiv’s “No Reboot” Quest

by Rick Cadruvi, Chief Architect 17. April 2018 04:57

In a world of 24/7 uptime and rare reboot windows, one of our biggest challenges as a company has simply been getting our own customers upgraded to the latest version of our I/O reduction software.

In the last year, we have done dashboard review sessions with a substantial number of customers to demonstrate the power of our latest versions on hybrid and all-flash arrays, hyperconverged systems, Azure/AWS, local SSDs, and more. However, many upgrades remain undone simply because customers can’t find reboot windows in which to move to the latest versions, with their most powerful engines and new benefits dashboard. This has been particularly challenging for customers with hundreds to thousands of servers.

Even though we own the trademark term, “Set It and Forget It®,” one aspect of our software never lived up to it: installing or upgrading required a reboot.

Herein lies the problem – important components of our software sit at the storage driver level. At least to the best of our knowledge, all other software vendors who sit at that layer also require a reboot to install or upgrade. So, consider our engineering challenge to take on a project most people wouldn’t know was even solvable.

Let’s start with an explanation of why this barrier existed. Our software contains several filter drivers that allow us to implement leading-edge performance-enhancing technologies. Some of them act at the Windows File System level. Windows has long provided a Filter Manager that allows developers to create File System and Network filter drivers that can be loaded and unloaded without requiring a reboot. You will quickly recognize that Anti-Malware and Data Backup/Recovery software tend to be the principal targets for this Filter Manager. There are also products, such as data encryption, that benefit from the Windows Filter Manager. And, as it turns out, we benefit too, because some of our filter drivers run above the File System.

However, sometimes a software product needs to be closer to the physical hardware itself. This allows a much broader view of what is going on with the actual I/O to the physical device subsystem. Quite a few software products need this bigger view, and so do we. One of the reasons is to allow our patented IntelliMemory® caching software to eliminate a huge amount of noisy I/O that creates substantial, yet preventable, bottlenecks for your application. This is I/O that neither you nor your application would even recognize as problematic to its performance. Because we have a global view, we can eliminate a large percentage of I/Os from having to go to storage, while using very limited system resources. We also have other technologies that benefit from our telemetry disk filter, which helps us see a more global picture of storage performance and what is actually causing bottlenecks. This allows us to focus our efforts on the true causes of those bottlenecks, giving our customers the greatest bang for their buck. Because we collect excellent empirical data about what is causing the bottlenecks, we can apply very limited and targeted system resources to deliver very significant storage performance increases. Keep in mind that the limited CPU cycles we use operate at the lowest priority and that we only use resources that are otherwise idle, so our engines are completely non-intrusive to overall server performance.

Why does the above matter? Well, the Microsoft Filter Manager doesn’t provide support for most driver stacks, and this includes the parts of the storage driver stack below the File System. That means that our disk filter drivers couldn’t actually start providing their benefits upon initial install until after a reboot. And if we added new functionality to provide even greater storage performance via a change to one of our disk filter drivers, a reboot was required after the update before the new functionality could be brought to bear.

Until now we just lived with the restrictions. We didn’t live with them because we couldn’t create a solution, but because we anticipated that the growing number of Windows updates, especially security-based updates, would increase the frequency of required server reboots and the problem would, for all intents and purposes, become manageable. Alas, our hopes and dreams in this area failed to materialize.

We’ve been doing Windows system and especially kernel software development for decades. I just attended Plugfest 30 for file system filter driver developers. This is a Microsoft event to ensure high-quality standards for products with filter drivers like ours. We were also at the first Plugfest nearly two decades ago. In addition, we wrote the Windows NTFS file system component that allows safe, live file defragmentation for Windows NT, dating back to the Windows NT 3.51 release. That by itself is an interesting story, but I’ll leave that for another time.

Anyway, we finally realized that our crystal ball prediction about an increase in the frequency of Windows Server reboots due to Windows Update cycles (patch Tuesday?) was a little less clear than we had hoped. Accepting that this problem wasn’t going away, we set out to create our own Filter Manager to provide a mechanism that allows filter drivers on stacks not supported by the Microsoft Filter Manager to be inserted and removed without the reboot requirement. This was something we had been considering, had talked about with other software vendors in a similar situation, and had even prototyped before. The time had finally come to help our customers get the significantly increased performance from our software immediately instead of waiting for reboot opportunities.
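For readers who want a mental picture of what inserting and removing a filter on a live stack means, here is a purely conceptual sketch in Python. It is a toy model, not our implementation and not Windows kernel code: the FilterStack class, the disk and telemetry_filter functions, and the dictionary-based requests are all hypothetical names invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

# A request is modeled as a plain dictionary (an assumption for illustration).
Request = Dict[str, object]
Handler = Callable[[Request], Request]


class FilterStack:
    """Toy model of an I/O stack whose filters can change while it runs."""

    def __init__(self, device: Handler) -> None:
        self._device = device                      # bottom of the stack
        self._filters: List[Tuple[str, Handler]] = []

    def attach(self, name: str, handler: Handler) -> None:
        # Insert a filter without tearing down or restarting the stack.
        self._filters.append((name, handler))

    def detach(self, name: str) -> None:
        # Remove a filter by name; later requests simply bypass it.
        self._filters = [(n, h) for n, h in self._filters if n != name]

    def submit(self, request: Request) -> Request:
        # Pass the request through each attached filter, then to the device.
        for _, handler in self._filters:
            request = handler(request)
        return self._device(request)


def disk(request: Request) -> Request:
    # Hypothetical stand-in for the physical device completing the I/O.
    return {**request, "completed": True}


def telemetry_filter(request: Request) -> Request:
    # Hypothetical filter that only marks the request as observed.
    return {**request, "observed": True}


stack = FilterStack(disk)
before = stack.submit({"op": "read"})      # no filter attached yet
stack.attach("telemetry", telemetry_filter)
during = stack.submit({"op": "read"})      # filter sees this request
stack.detach("telemetry")
after = stack.submit({"op": "read"})       # filter removed again
```

The point of the sketch is only that the stack keeps servicing requests before, during, and after the filter is attached. A real kernel-mode mechanism also has to deal with in-flight I/O, synchronization, and driver unload safety, none of which is modeled here.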

We took our decades of experience developing kernel software and our knowledge of Windows Operating System internals and aimed them at giving our customers relief from this limitation. The result is in our latest releases: V-locity® 7.0, Diskeeper® 18, and SSDkeeper™ 2.0.

We’d love to hear your stories about how this revolutionary enablement technology has made a difference for you and your organization.

Tags:

Diskeeper | V-Locity

Comments (7) -

4/21/2018 4:54:57 AM #

Interesting story, thanks. Any expected release dates?

Alex Greece

5/11/2018 4:49:16 AM #

Finally, an explanation I'm able to understand (mostly) and the logic behind the improvements. Thank you Condusiv!!  

Bruce McGarvey Canada

5/11/2018 5:27:11 AM #

Thanks for the great detail, Rick. We certainly look forward to the no-reboot technology. We do keep our clients' systems fully updated with the latest version of their Condusiv software. But we also patch our clients' systems at least twice per month, if not more frequently. And yes, that requires a reboot. I've been in the industry 25 years, and rebooting servers should not be a problem. Super highly available systems should be configured in cluster mode, allowing for one node to be rebooted at a time.

Scary as heck to note that most of your enterprise customers are not rebooting. If they are not rebooting, then that means they are also not installing Windows patches. Very very scary. Sounds like Equifax all over again.

Felicia United States

5/11/2018 7:07:40 AM #

I'll be certain to share this with our technology students!

Professor Jennings United States

5/13/2018 11:58:02 AM #

Hi!

Interesting Update. However, can anything be expected on the SSD caching side, read and write caching using a large dedicated cache partition, such as on an Intel 3D Xpoint P900 Optane card?

Axel Mertes Germany

5/23/2018 6:07:17 AM #

Hi Felicia,

Thank you for taking the time to comment and happy to hear your clients are so well taken care of.  

Kellie

Kellie United States

5/23/2018 6:08:06 AM #

Hi Axel,

You are correct that the SSD caching will already provide you with performance gains, but IntelliMemory will still provide further benefits as the DRAM caching will provide a faster response time than the SSD caching.

Kellie

Kellie United States
