Condusiv Technologies Blog

Blogging @Condusiv

The Condusiv blog shares insight into the issues surrounding system and application performance—and how I/O optimization software is breaking new ground in solving those issues.

The Inside Story of Condusiv’s “No Reboot” Quest

by Rick Cadruvi, Chief Architect 17. April 2018 04:57

In a world of 24/7 uptime and rare reboot windows, one of our biggest challenges as a company has simply been getting our own customers upgraded to the latest version of our I/O reduction software.

In the last year, we have done dashboard review sessions with a substantial number of customers to demonstrate the power of our latest versions on hybrid and all-flash arrays, hyperconverged systems, Azure/AWS, local SSDs, and more. However, many sessions remain undone simply because customers can’t find the time for reboot windows to upgrade to the latest versions with the most powerful engines and new benefits dashboard. This has been particularly challenging for customers with hundreds to thousands of servers.

Even though we own the trademark term, “Set It and Forget It®,” there was always one aspect that wasn’t, and that’s the fact that it required a reboot to install or upgrade.

Herein lies the problem – important components of our software sit at the storage driver level. At least to the best of our knowledge, all other software vendors who sit at that layer also require a reboot to install or upgrade. So, consider our engineering challenge to take on a project most people wouldn’t know was even solvable.

Let’s start with an explanation as to why this barrier existed. Our software contains several filter drivers that allow us to implement leading-edge performance-enhancing technologies. Some of them act at the Windows File System level. Windows has long provided a Filter Manager that allows developers to create File System and Network filter drivers that can be loaded and unloaded without requiring a reboot. You will quickly recognize that Anti-Malware and Data Backup/Recovery software tend to be the principal targets for this Filter Manager. There are also products such as data encryption that benefit from the Windows Filter Manager. And, as it turns out, we benefit because some of our filter drivers run above the File System.

However, sometimes a software product needs to be closer to the physical hardware itself. This allows a much broader view of what is going on with the actual I/O to the physical device subsystem. There are quite a few software products that need this bigger view. It turns out that we do, too. One of the reasons is to allow our patented IntelliMemory® caching software to eliminate a huge amount of noisy I/O that creates substantial, yet preventable, bottlenecks for your application. This is I/O that your application wouldn’t even recognize as problematic to its performance, nor would you. Because we have a global view, we can eliminate a large percentage of I/Os from having to go to storage, while using very limited system resources. We also have other technologies that benefit from our telemetry disk filter, which helps us see a more global picture of storage performance and what is actually causing bottlenecks. This allows us to focus our efforts on the true causes of those bottlenecks, giving our customers the greatest bang for their buck. Because we collect excellent empirical data about what is causing the bottlenecks, we can apply very limited and targeted system resources to deliver very significant storage performance increases. Keep in mind, the limited CPU cycles we use operate at lowest priority and we only use resources that are otherwise idle, so the benefits of our engines are completely non-intrusive to overall server performance.

Why does the above matter? Well, the Microsoft Filter Manager doesn’t provide support for most driver stacks, and this includes the parts of the storage driver stack below the File System. That means that our disk filter drivers couldn’t actually start providing their benefits upon initial install until after a reboot. If we added new functionality to provide even greater storage performance via a change to one of our disk filter drivers, a reboot was required after an update before the new functionality could be brought to bear.

Until now we just lived with the restriction. We didn’t live with it because we couldn’t create a solution, but because we anticipated that the frequency of Windows updates, especially security-based updates, would start to increase the frequency of server reboots and the problem would, for all intents and purposes, become manageable. Alas, our hopes and dreams in this area failed to materialize.

We’ve been doing Windows system and especially kernel software development for decades. I just attended Plugfest 30 for file system filter driver developers. This is a Microsoft event to ensure high-quality standards for products with filter drivers like ours. We were also at the first Plugfest nearly two decades ago. In addition, we wrote the Windows NTFS file system component that allows safe, live file defragmentation for Windows NT, dating back to the Windows NT 3.51 release. That by itself is an interesting story, but I’ll leave that for another time.

Anyway, we finally realized that our crystal ball prediction about an increase in the frequency of Windows Server reboots due to Windows Update cycles (Patch Tuesday?) was a little less clear than we had hoped. Accepting that this problem wasn’t going away, we set out to create our own Filter Manager to provide a mechanism that allows filter drivers on stacks not supported by the Microsoft Filter Manager to be inserted and removed without the reboot requirement. This was something we had been considering, had talked about with other software vendors in a similar situation, and had even prototyped before. The time had finally come to help our customers get the significantly increased performance from our software immediately instead of waiting for reboot opportunities.

We took our decades of experience and knowledge of Windows Operating System internals and experience developing Kernel software and aimed it at giving our customers the relief from this limitation. The result is in our latest release of V-locity® 7.0, Diskeeper® 18, and SSDkeeper™ 2.0. 

We’d love to hear your stories about how this revolutionary enablement technology has made a difference for you and your organization.

Tags:

Diskeeper | V-locity

Condusiv Addresses Concerns about the Intel CPU Security Flaw

by Rick Cadruvi, Chief Architect 8. January 2018 11:08

Since the news broke on the Intel CPU security flaw, we have fielded customer concerns about the potential impact to our software and worries of increased contention for CPU cycles if less CPU power [1] is available after the patches are issued by affected vendors.

Let us first simply state there is no overhead concern regarding Condusiv software related to software contention for fewer CPU cycles post-patch. If any user has concerns about CPU overhead from Condusiv software, then it is important that we communicate how Condusiv software is architected to use CPU resources in the first place (explained below) such that it is not an issue.

Google reported this issue to Intel, AMD and ARM on Jun 1, 2017.  The issue became public Jan 3, 2018.  The issue affects most Intel CPUs dating back to 1995.  Most OSes released a patch this week to mitigate the risk.  Also, firmware updates are expected soon.  A report about this flaw from the Google Project Zero team can be found at:

https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

Before discussing the basic vulnerabilities and any impact the security flaw or patch has or doesn’t have on V-locity®/SSDkeeper®/Diskeeper® (or how our software actually proves beneficial), let me first address the performance hits anticipated due to patches in OSes and/or firmware. While the details are being tightly held, probably to keep hackers from being able to exploit the vulnerabilities, the consensus seems to be that the fixes in the OSes (Windows, Linux, etc.) have the potential to reduce performance by 5%-30% [1]. A lot of this depends on how the system is being used. For most dedicated systems/applications the consensus appears to be that the effect will be negligible. That is likely because most of those systems already have excess compute capability, so the user just won’t experience a noticeable slowdown. Also, they aren’t doing lots of things concurrently.

The real issue comes up for servers where multiple accessors/applications are hitting the server. They will experience the greatest degradation, as they will likely have the greatest number of overlapping data accesses. The following article indicates that the biggest performance hits will happen on reads from storage and on SSDs.

https://www.pcworld.com/article/3245606/security/intel-x86-cpu-kernel-bug-faq-how-it-affects-pc-mac.html

So what about V-locity/Diskeeper/SSDkeeper?

As previously mentioned, we can state that there is no increased CPU contention or negative overhead impact from Condusiv software. Condusiv background operations run at low priority, which means only otherwise idle and available CPU cycles are used. This means that regardless of how much CPU horsepower is available (a lot or a little), Condusiv software is unobtrusive to server performance because its patented InvisiTasking® technology only uses free CPU cycles. If computing is ever completely bound by other tasks, Condusiv software sits idle so there is NO negative intrusion or impact on server resources. The same can be said about our patented DRAM caching engine (IntelliMemory®), as it only uses memory that is otherwise idle, so there is zero contention for resources.

However, if storage reads slow down due to the fix (per the PC World article), our software will certainly overcome a significant amount of the lost performance since eliminating I/O traffic in the first place is what our software is designed to do. Telemetry data across thousands of systems demonstrates our software eliminates 30-40% of noisy and completely unnecessary I/O traffic on typically configured systems [2]. In fact, those who add just a little more DRAM on important systems to better leverage the tier-0 caching strategy see a 50% or more reduction, which pushes them into the 2X faster gains and higher.

Those organizations who are concerned about loss of read performance from their SSDs due to the chip fixes and patches need only do one thing to mitigate that concern: allocate more DRAM to important systems. Our software will pick up the additional available memory to enhance your tier-0 caching layer. For every 2GB of memory added, we typically see a conservative 25% of read traffic offloaded from storage. That figure is often 35-40% and even as high as 50% depending on the workload. Since our behavioral analytics engine sits at the application layer, we are able to cache very effectively with very little cache churn. And since DRAM is 15X faster than SSD, only a small amount of capacity can drive very big gains. Simply monitor the in-product dashboard to watch your cache hit rates rise with additional capacity.
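As a rough illustration of why a modest hit rate matters, the sketch below models average read latency as a blend of cache hits and storage reads. The function name, the normalized latency unit, and the simple weighted-average model are assumptions made for illustration; only the 15X DRAM-to-SSD ratio and the hit rates come from the figures above.

```python
def effective_read_latency(hit_rate, storage_latency=1.0, dram_speedup=15.0):
    """Average read latency when a fraction of reads is served from DRAM.

    hit_rate        -- fraction of reads satisfied from cache (0.0 to 1.0)
    storage_latency -- latency of a read that goes to SSD (any unit)
    dram_speedup    -- how many times faster DRAM is than SSD (15X per the text)
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    dram_latency = storage_latency / dram_speedup
    # Weighted average: hits cost the DRAM latency, misses cost the SSD latency.
    return hit_rate * dram_latency + (1.0 - hit_rate) * storage_latency


# Hit rates quoted in the text: 25% (conservative), 40%, and 50%.
for hit_rate in (0.25, 0.40, 0.50):
    latency = effective_read_latency(hit_rate)
    print(f"hit rate {hit_rate:.0%}: average reads are {1.0 / latency:.2f}x faster")
```

Under this simplified model a 50% hit rate makes average reads a little under twice as fast. The model ignores queueing, so it says nothing about the additional gain that comes from less traffic competing for the storage itself.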

Regarding the vulnerabilities themselves, for a very long time, memory in the CPU chip itself has been a LOT faster than system memory (DRAM).  As a result, chip manufacturers have done several things to help take advantage of CPU speed increases.  For the purpose of this paper, I will be discussing the following 2 approaches that were used to improve performance:

1. Speculative execution
2. CPU memory cache

These mechanisms to improve performance opened up the security vulnerabilities being labeled “Spectre” and “Meltdown”.

Speculative execution is a technology whereby the CPU prefetches machine instructions and data from system memory (typically DRAM) for the purpose of predicting likely code execution paths.  The CPU can pre-execute various potential code paths.  This means that by the time the actual code execution path is determined, it has often already been executed.  Think of this like coming to a fork in the road as you are driving.  What if your car had already gone down both possible directions before you even made a decision as to which one you wanted to take?  By the time you decided which path to take, your car would have already been significantly further on the way regardless of which path you ultimately chose. 

Of course, in our world, we can’t be in two places at one time, so that can’t happen. However, a modern CPU chip has lots of unused execution cycles due to synchronization, fetching data from DRAM, etc. These “wait states” present an opportunity to do other things. One thing is to figure out likely code that could be executed and pre-execute it. Even if that code path wasn’t ultimately taken, all that happened is that execution cycles that would otherwise have been wasted were spent trying those paths even though they didn’t need to be tried. And modern chips can execute lots of these speculative code paths.

Worst case – No harm No foul, right?  Not quite.  Because the code, and more importantly the DRAM data needed for that code, got fetched it is in the CPU and potentially available to software.  And, the data from DRAM got fetched without checking if it was legal for this program to read it.  If the guess was correct, your system increased performance a LOT!  BUT, since memory (that may not have had legal access based on memory protection) was pre-fetched, a very clever program could take advantage of this.  Google was able to create a proof of concept for this flaw.  This is the “Spectre” case.
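To make the idea concrete, here is a toy simulation of the leak described above. It is not exploit code and does not touch real hardware: the `ToyCPU` class, its methods, and `recover_byte` are all hypothetical names, and the cache is modeled as a plain set, whereas a real attack would have to infer cache state by timing memory accesses.

```python
class ToyCPU:
    """Toy model only: real CPUs leak through cache timing, not a readable set."""

    def __init__(self, memory, allowed_addresses):
        self.memory = dict(memory)               # address -> byte value (0-255)
        self.allowed = set(allowed_addresses)    # addresses this program may read
        self.cached_lines = set()                # probe lines left warm in the cache

    def flush_cache(self):
        self.cached_lines.clear()

    def speculative_read(self, address):
        # Speculation: the value is fetched and used to touch a probe line
        # before the permission check has completed.
        value = self.memory[address]
        self.cached_lines.add(value)
        # The check then fails and the architectural result is discarded,
        # but the warmed cache line is not rolled back.
        if address not in self.allowed:
            return None
        return value

    def probe_is_fast(self, line):
        # Stands in for timing an access: a cached line would be "fast".
        return line in self.cached_lines


def recover_byte(cpu, address):
    """Infer a byte the program was never handed, from cache state alone."""
    cpu.flush_cache()
    cpu.speculative_read(address)        # returns None for a protected address
    for line in range(256):
        if cpu.probe_is_fast(line):
            return line
    return None


cpu = ToyCPU(memory={0: 7, 1: 42}, allowed_addresses={0})
print(cpu.speculative_read(1))   # None: the read itself is refused
print(recover_byte(cpu, 1))      # 42: the value leaks through the cache state
```

The point of the model is only that the refused read still leaves a trace. The one-byte-at-a-time recovery loop also hints at why real attacks are slow.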

Before you panic about getting hacked, realize that to effectively find really useful information would require extreme knowledge of the CPU chip and the data in memory you would be interested in.  Google estimates that an attack on a system with 64GB of RAM would require a software setup cycle of 10-30 minutes.  Essentially, a hacker may be able to read around 1,500 bytes per second – a very small amount of memory.  The hacker would have to attack specific applications for this to be effective. 

As the number of transistors in a chip grew dramatically, it became possible to create VERY large memory caches on the CPU itself.  Referencing memory from the CPU cache is MUCH faster than accessing the data from the main system memory (DRAM).  The “Meltdown” flaw proof of concept was able to access this data directly without requiring the software to have elevated privileges. 

Again, before getting too excited, it is important to think through what memory is in the CPU cache.  To start with, current chips typically max out around 8MB of cache on chip.  Depending on the type of cache, this is essentially actively used memory.  This is NOT just large swaths of DRAM.  Of course, the exploit fools the chip caching algorithms to think that the memory the attack wants to read is being actively used.  According to Google, it takes more than 100 CPU cycles to cause un-cached data to become cached.  And that is in CPU word size chunks – typically 8 bytes.

So what about V-locity/Diskeeper/SSDkeeper?

Our software runs such that we are no more or less vulnerable than any other application/software component.  Data in the NTFS File System Cache and in SQL Server’s cache are just as vulnerable to being read as data in our IntelliMemory cache.  The same holds true for Oracle or any other software that caches data in DRAM.  And, your typical anti-virus has to analyze file data, so it too may have data in memory that could be read from various data files. However, as the chip flaws are fixed, our I/O reduction software provides the advantage of making up for lost performance, and more.


1. https://www.theregister.co.uk/2018/01/02/intel_cpu_design_flaw/

2. http://blog.condusiv.com/post/2017/10/25/New-Dashboard-Finally-Answers-the-Big-Question.aspx

The Revolution of Our Technology

by Rick Cadruvi, Chief Architect 18. October 2017 12:38

I chose to use the word “Revolution” instead of “Evolution” because, with all due modesty, our patented technology has been more a series of leaps to stay ahead of performance-crushing bottlenecks. After all, our company purpose as stated by our Founder, Craig Jensen, is:

“The purpose of our company is to provide computer technology that enormously increases the production and income of an area.”

We have always been about improving your production. We know your systems are not about having really cool hardware but rather about maximizing your organization’s production. Our passion has been about eliminating the stops, slows and stalls to your application performance and instead, to jack up that performance and give you headroom for expansion. Now, most of you know us by our reputation for Diskeeper®. What you probably don’t know about us is our leadership in system performance software.

We’ve been at this for 35 years with a laser focus. As an example, for years hard drives were the common storage technology and they were slow and limited in size, so we invented numerous File System Optimization technologies such as Defragmentation, I-FAAST® [1] and Directory Consolidation to remove the barriers to getting at data quickly. As drive sizes grew, we added new technologies and jettisoned those that no longer gave bang for the buck. Technologies like InvisiTasking® were invented to help maximize overall system performance, while removing bottlenecks.

As SSDs began to emerge, we worked with several OEMs to take advantage of SSDs to dramatically reduce data access times as well as reducing the time it took to boot systems and resume from hibernate. We created technologies to improve SSD longevity and even worked with manufacturers on hybrid drives, providing hinting information, so their drive performance and endurance would be world class.

As storage arrays were emerging we created technologies to allow them to better utilize storage resources and pre-stage space for future use. We also created technologies targeting performance issues related to file system inefficiencies without negatively affecting storage array technologies like snapshots.

When virtualization was emerging, we could see the coming VM resource contention issues that would materialize. We used that insight to create file system optimization technologies to deal with those issues before anyone coined the phrase “I/O Blender Effect”.

We have been doing caching for a very long time [2]. We have always targeted removing the I/Os that get in your application’s path to data and satisfying data requests from cache, which delivers performance improvements of 50-300% or more. Our goal was not caching your application-specific data, but rather making sure your application could access its data much faster. That’s why our unique caching technology has been used by leading OEMs.

Our RAM-based caching solutions include dynamic memory allocation schemes to use resources that would otherwise be idle to maximize overall system performance. When you need those resources, we give them back. When they are idle, we make use of them without your having to adjust anything for the best achievable performance. “Set It and Forget It®” is our trademark for good reason.

We know that staying ahead of the problems you face now, with a clear understanding of what will limit your production in 3 to 5 years, is the best way we can realize our company purpose and help you maximize your production and thus your profitability. We take seriously having a clear vision of where your problems are now and where they will be in the future. As new hardware and software technologies roll out, we will be there removing the new barriers to your performance then, just as we do now.

1. I-FAAST stands for Intelligent File Access Acceleration Sequencing Technology, a technology designed to take advantage of different performing regions on storage to allow your hottest data to be retrieved in the fastest time.

2. If I can personally brag, I’ve created numerous caching solutions over a period of 40 years.

Overview of How We Derive Storage I/O Time Saved

by Rick Cadruvi, Chief Architect 11. January 2017 01:00

The latest versions of V-locity® (for virtual servers) and Diskeeper® (for physical servers and PCs) both contain built-in dashboards that show the exact benefit of the product to any one system or group of systems by showing how much and what percentage of read/write traffic is offloaded from storage and how much “I/O Time” that saves.

To understand the computation on “I/O Time Saved,” in its simplest form, the formula is essentially:

       Storage I/O Time Saved = Total I/Os Eliminated * Average I/O Response Time

In essence, if you take Total I/Os Eliminated from the dashboard Benefits screen and multiply it by the average latency from the I/O Performance dashboard screen, you will generally end up in the ballpark of the “I/O Time Saved.”
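The formula above can be sketched in a few lines. The function name, the millisecond unit for response time, and the sample figures are assumptions chosen for illustration; they are not values taken from the dashboard.

```python
def storage_io_time_saved_seconds(total_ios_eliminated, avg_io_response_time_ms):
    """Storage I/O Time Saved = Total I/Os Eliminated * Average I/O Response Time."""
    return total_ios_eliminated * avg_io_response_time_ms / 1000.0


# Hypothetical figures: one million I/Os eliminated at an average of 2 ms each.
saved = storage_io_time_saved_seconds(1_000_000, 2.0)
print(f"{saved:.0f} seconds of storage I/O time saved")
```

This gives the ballpark only. As the following paragraphs explain, the product accumulates the figure per period rather than from one overall average.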

I/O counts and I/O times are accumulated on a per I/O basis. Every I/O that goes to storage is timed using Windows High Performance Counters for accuracy.  That timing is from when the I/O is sent down the stack until it comes back up. In essence we time I/O response time (IORT) or latency that the application sees, not the storage device.  We also track reads and writes separately as they impact the storage “I/O Time Saved” differently.
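A minimal user-mode sketch of this kind of per-I/O timing is shown below. It is an analogy, not the product's driver code: `IoStats`, `timed_io`, and `io_stats` are hypothetical names, and Python's `time.perf_counter_ns` stands in for the high-resolution counter a kernel component would use.

```python
import time
from dataclasses import dataclass


@dataclass
class IoStats:
    """Running count and total response time for one kind of I/O."""
    count: int = 0
    total_ns: int = 0

    def record(self, elapsed_ns):
        self.count += 1
        self.total_ns += elapsed_ns

    def average_ms(self):
        return self.total_ns / self.count / 1e6 if self.count else 0.0


# Reads and writes are tracked separately, as described above.
io_stats = {"read": IoStats(), "write": IoStats()}


def timed_io(kind, operation):
    """Run one I/O and time it from issue to completion, as the caller sees it."""
    start = time.perf_counter_ns()
    try:
        return operation()
    finally:
        io_stats[kind].record(time.perf_counter_ns() - start)


# Example: time an in-memory stand-in for a read.
data = timed_io("read", lambda: b"payload")
print(io_stats["read"].count, io_stats["write"].count)
```

Because the clock starts before the request is issued and stops when it returns, any time spent waiting in a queue is included, which is what makes this the response time the application sees.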

The data is accumulated and calculated during periods of time rather than across the entire reporting period. In the long term, that period of time ends up being hourly. Very active I/O periods will have longer IORTs and therefore the amount of I/O storage time saved per I/O eliminated will likely be greater than during relatively light periods. 
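A short sketch of per-period accumulation follows. The function name, the tuple layout, and the two sample hours are hypothetical; the only idea taken from the text is that each period's eliminated I/Os are valued at that period's own response time.

```python
def io_time_saved_by_period(periods):
    """Sum time saved across periods.

    periods -- iterable of (ios_eliminated, avg_iort_ms) pairs, one per period
    Returns the total storage I/O time saved, in seconds.
    """
    return sum(count * iort_ms for count, iort_ms in periods) / 1000.0


# Hypothetical busy hour (4 ms average IORT) and quiet hour (1 ms average IORT),
# each with the same number of I/Os eliminated.
busy_hour = (50_000, 4.0)
quiet_hour = (50_000, 1.0)

print(io_time_saved_by_period([busy_hour]))              # contribution of the busy hour
print(io_time_saved_by_period([quiet_hour]))             # contribution of the quiet hour
print(io_time_saved_by_period([busy_hour, quiet_hour]))  # total across both
```

With equal counts, the busy hour contributes four times as much saved time as the quiet hour, because each eliminated I/O would have waited longer.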

If there is a high queue depth, the IORT we time will be larger than the per I/O storage IORT.  We look at the effective IORT the application would see rather than the time the underlying storage takes to process any single I/O.  After all, the user only cares about how long the application took to process an I/O he/she requested, not how long a HDD or SSD took for any single I/O when it got around to processing it.

Let’s talk for a moment about storage “I/O Time Saved” versus clock time because they are not the same and our technologies can, in some cases, save far more storage I/O time than clock time.

If all storage I/O was sequential for the entire instance of the operating system, then the maximum amount of storage “I/O Time Saved” would be the amount of time since installation, and you would expect it to be considerably less as we are unlikely to eliminate ALL I/Os. And you might expect some idle time. Of course, applications do not do pure sequential I/O.  Modern applications are almost always multi-threaded and most computer systems are running multiple applications or instances of them at the same time.  Also, other operations are happening on the system outside of the primary application.  Think of Outlook running in the background while you do some other work on your system. Outlook is constantly receiving updated data.  Windows is also processing lots of I/Os in the background just for it to be able to continue operations.  These I/Os happen in parallel to any I/Os that users may be doing with an application.

In general, there are lots of I/Os that are being processed at the same time.  You would not want to work on a computer system where only a single I/O was being processed at any one point in time as it would be VERY slow.  If the average queue depth would have been 5 without us but 2 with us, that means every time 2 I/Os go through to storage, we would have eliminated 3 I/Os.  The end result would be a storage “I/O Time Saved” of somewhere between 1.5-3x clock time, depending on how the underlying storage processed the I/Os. 
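The arithmetic in the queue-depth example can be checked with a short sketch. The function name and the returned keys are hypothetical. Reading the two results as the ends of the 1.5-3x range is an assumption of this sketch, not a statement of how the dashboard derives the figure.

```python
def queue_depth_example(depth_without, depth_with):
    """Basic ratios for an average queue depth with and without I/O reduction."""
    if depth_with <= 0 or depth_without < depth_with:
        raise ValueError("expected depth_without >= depth_with > 0")
    eliminated = depth_without - depth_with
    return {
        # I/Os no longer in flight at any moment (5 - 2 = 3 in the example).
        "eliminated_in_flight": eliminated,
        # I/Os eliminated for each I/O that still goes to storage (3 / 2).
        "eliminated_per_io_sent": eliminated / depth_with,
        # Share of the original I/O traffic that was eliminated (3 / 5).
        "fraction_eliminated": eliminated / depth_without,
    }


for name, value in queue_depth_example(5, 2).items():
    print(f"{name}: {value}")
```

One way to see why the saved time can exceed clock time is that every I/O in flight accrues response time for as long as it is outstanding, so removing three concurrent I/Os removes up to three seconds of I/O time for each second on the clock.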

Another factor that contributes to the possibility of storage “I/O Time Saved” exceeding clock time is the reduction of split I/Os. Let’s say that without our product all I/Os actually end up being split into 3 I/Os due to Windows writing files in an excessively small, fragmented manner. After installing our product, which displaces small writes with large, contiguous writes, each of those I/Os that had to be split into 3 is now completed as a single I/O. If that was the normal case, the storage “I/O Time Saved” for each I/O would be roughly 2x the actual storage I/O time due to prevention of fragmentation.
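The split I/O example works out as follows. The function name is hypothetical, and the sketch assumes each fragment I/O takes about as long as the single contiguous I/O that replaces it, which is a simplification.

```python
def split_io_time_saved_ratio(ios_before, ios_after=1):
    """Storage I/O time saved per request, as a multiple of the time still spent.

    ios_before -- I/Os needed per request when the file is fragmented
    ios_after  -- I/Os needed per request once writes are contiguous
    """
    if ios_after <= 0 or ios_before < ios_after:
        raise ValueError("expected ios_before >= ios_after > 0")
    return (ios_before - ios_after) / ios_after


# Each request used to be split into 3 I/Os and now completes as 1.
print(split_io_time_saved_ratio(3))
```

Two of every three I/Os disappear, so the time saved is about twice the time of the one I/O that still reaches storage.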
