Condusiv Technologies Blog


The Condusiv blog shares insight into the issues surrounding system and application performance—and how I/O optimization software is breaking new ground in solving those issues.

Do I Really Need V-locity on All VMs?

by Rick Cadruvi, Chief Architect 15. August 2019 04:12

V-locity® customers may wonder, “How many VMs do I need to install V-locity on for optimal results? What kind of benefits will I see with V-locity on one or two VMs versus all the VMs on a host?” 

As a refresher…

It is true that V-locity will likely provide significant benefit on that one VM. It may even be extraordinary. But loading V-locity on just one VM on a host that may be running dozens of VMs won't give you the biggest bang for your buck. V-locity includes many technologies that address storage performance issues in an extremely intelligent manner. Part of the underlying design is to learn about the specific loads your system carries and intelligently adapt to each environment presented to it. That's why we created V-locity especially for virtual environments in the first place.

As you have experienced, the beauty of V-locity is its ability to deal with the I/O Blender Effect. When there are multiple VMs on a host, or multiple hosts with VMs that use the same back-end storage system (e.g., a SAN), a "blender" effect occurs as all these VMs send I/O requests up and down the stack, and it can create huge performance bottlenecks. In fact, perhaps the most significant issue virtualized environments face is that there are MANY performance chokepoints in the ecosystem, especially in the storage subsystem. These chokepoints can rob you of 30-50% of your throughput. This is the dark side of virtualized systems.

Look at it this way.  VM “A” may have different resource requirements than VM “B” and so on.  Besides performing different tasks with different workloads, they may have different peak usage periods.  What happens when those peaks overlap?  Worse yet, what happens if several of your VMs have very similar resource requirements and workloads that constantly overlap? 

 

The answer is that the I/O Blender Effect takes over and now VM “A” is competing directly with VM “B” and VM “C” and so on.  The blender pours all those resource desires into a funnel, creating bottlenecks with unpredictable performance results.  What is predictable is that performance will suffer, and likely a LOT.

V-locity was designed from the ground up to intelligently deal with these core issues. The guiding question in front of us as it was being designed and engineered was:

Given your workload and resources, how can V-locity help you overcome the I/O Blender Effect? 

By making sure that V-locity will adapt to your specific workload and having studied what kinds of I/Os amplify the I/O Blender Effect, we were able to add intelligence to specifically go after those I/Os.  We take a global view.  We aren’t limited to a specific application or workload.  While we do have technologies that shine under certain workloads, such as transactional SQL applications, our goal is to optimize the entire ecosystem.  That’s the only way to overcome the I/O Blender Effect.

So, while we can indeed give you great gains on a single VM, V-locity truly gets to shine and show off its purpose when it can intelligently deal with the chokepoints that create the I/O Blender Effect.  That means you should add V-locity to ALL your VMs.  With our no-reboot installation and a V-locity Management Console, it’s fast and easy to cover and manage your environment.

If you have V-locity on all the VMs on your host(s), let us know how it is going! If you don’t yet, contact your account manager who can get you set up!

For an in-depth refresher, watch our 10-minute whiteboard video.

 

Overcoming the I/O Blender Effect with V-locity

by Rick Cadruvi, Chief Architect 5. August 2019 05:23

You’ve decided that you want to try out V-locity® software – kick the tires, so to speak. You’ll just load it on one of your virtual machines and see how it goes. What kind of benefit will you see?

It is true that V-locity will likely provide significant benefit on that one VM. It may even be extraordinary. But loading V-locity on just one VM on a host that may be running dozens of VMs won't give you the biggest bang for your buck. V-locity includes many technologies that address storage performance issues in an extremely intelligent manner. Part of the underlying design is to learn about the specific loads your system carries and intelligently adapt to each environment presented to it. That's why we created a product especially for virtual environments in the first place.

The beauty of V-locity is its ability to deal with something called the I/O Blender Effect. This is the dark side of virtualized systems. When there are multiple VMs on a host, or multiple hosts with VMs that use the same back-end storage system (e.g., a SAN), a "blender" effect occurs as all these VMs send I/O requests up and down the stack, and it can create huge performance bottlenecks. In fact, perhaps the most significant issue virtualized environments face is that there are MANY performance chokepoints in the ecosystem, especially in the storage subsystem. These chokepoints can rob you of 30-50% of your throughput. That's what V-locity can recover.

Look at it this way.  VM “A” may have different resource requirements than VM “B” and so on.  Besides performing different tasks with different workloads, they may have different peak usage periods.  What happens when those peaks overlap?  Worse yet, what happens if several of your VMs have very similar resource requirements and workloads that constantly overlap? 

 

The answer is that the I/O Blender Effect takes over and now VM “A” is competing directly with VM “B” and VM “C” and so on.  The blender pours all those resource desires into a funnel, creating bottlenecks with unpredictable performance results.  What is predictable is that performance will suffer, and likely a LOT.

Enter V-locity. V-locity was designed from the ground up to intelligently deal with these core issues. The guiding question in front of us as it was being designed and engineered was:

Given your workload and resources, how can V-locity help you overcome the I/O Blender Effect?

By making sure that V-locity will adapt to your specific workload and having studied what kinds of I/Os amplify the I/O Blender Effect, we were able to add intelligence to specifically go after those I/Os.  We take a global view.  We aren’t limited to a specific application or workload.  While we do have technologies that shine under certain workloads, such as transactional SQL applications, our goal is to optimize the entire ecosystem.  That’s the only way to overcome the I/O Blender Effect.

So, while we can indeed give you great gains on a single VM, V-locity truly gets to shine and show off its purpose when it can intelligently deal with the chokepoints that create the I/O Blender Effect.  That means you should add V-locity to ALL your VMs.  With our no-reboot installation and a V-locity Management Console, it’s fast and easy to cover and manage your environment.

And yes, this same I/O Blender Effect can occur in your physical environment, with multiple physical systems all accessing different LUNs on the same SAN. Our Diskeeper® software is the answer there.

Go ahead and try V-locity on the VMs that are in the most competition for resources, and you'll be amazed at the benefits. The chokepoints aren't obvious or right in front of your face, but they are real, and V-locity is the answer. After that, just add V-locity to all your VMs, then sit back and see how smart you were to so easily improve throughput across your ecosystem.

Video: Condusiv I/O Reduction Software Overview

Download a 30-day Free Trial

 

Why Faster Storage May NOT Fix It

by Rick Cadruvi, Chief Architect 20. September 2018 04:58

 

With the myriad of possible hardware solutions to storage I/O performance issues, the question people are starting to ask is something like:

         If I just buy newer, faster Storage, won’t that fix my application performance problem?

The short answer is:

         Maybe Yes (for a while), Quite Possibly No.

I know – not a satisfying answer.  For the next couple of minutes, I want to take a 10,000-foot view of just three issues that affect I/O performance to shine some technical light on the question and hopefully give you a more satisfying answer (or maybe more questions) as you look to discover IT truth.  There are other issues, but let’s spend just a moment looking at the following three:

1. Non-Application I/O Overhead

2. Data Pipelines

3. File System Overhead

These three issues by themselves can create I/O bottlenecks causing degradation to your applications of 30-50% or more.

Non-Application I/O Overhead:

One of the most commonly overlooked performance issues is that an awful lot of I/Os are NOT application generated. Maybe you can add enough DRAM and go to an NVMe direct-attached storage model and get your application data cached at an 80%+ rate. Of course, you still need to process Writes, and the NVMe probably makes that a lot faster than what you can do today. But you still need to get it to the Storage. And there are lots of I/Os generated on your system that are not directly from your application. There are also lots of application-related I/Os that are not targeted for caching – they're simply non-essential overhead I/Os to manage metadata and such. People generally don't think about the management layers of the computer and application that have to perform Storage I/O just to make sure everything can run. Those I/Os hit the data path to Storage along with the I/Os your application has to send to Storage, even if you have huge caches. They get in the way and make your application-specific I/Os stall and slow down responsiveness.
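You can get a rough feel for this on your own system by comparing system-wide disk I/O with what a single application issues. Here is a minimal sketch in Python, assuming the third-party psutil package is available; the 60-second window and the use of the current process are illustrative choices, not part of our product:

    # Rough sketch: contrast system-wide disk I/O with one process's I/O.
    # Assumes psutil is installed (pip install psutil).
    import time
    import psutil

    disk_before = psutil.disk_io_counters()
    proc = psutil.Process()              # this process; point it at your app's PID
    proc_before = proc.io_counters()

    time.sleep(60)                       # observe a minute of normal activity

    disk_after = psutil.disk_io_counters()
    proc_after = proc.io_counters()

    system_ios = (disk_after.read_count + disk_after.write_count) \
               - (disk_before.read_count + disk_before.write_count)
    app_ios = (proc_after.read_count + proc_after.write_count) \
            - (proc_before.read_count + proc_before.write_count)

    # The difference is I/O your application never asked for: metadata
    # updates, logging, scanners, OS housekeeping, and so on. (This is a
    # coarse comparison; per-process counters include cached I/O.)
    print(f"system I/Os: {system_ios}, this app's I/Os: {app_ios}")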

And let’s face it, a full Hyper-Converged, NVMe-based storage infrastructure sounds great, but there are lots of issues with that besides the enormous cost. What about data redundancy and localization? That brings us to issue #2.

Data Pipelines: 

Since your data is exploding and you’re pushing 100s of Terabytes, perhaps Petabytes and in a few cases maybe even Exabytes of data, you’re not going to get that much data on your one server box, even if you didn’t care about hardware/data failures.  

Like it or not, you have an entire infrastructure of Servers, Switches, SANs, whatever.  Somehow, all that data needs to get to and from the application and wherever it is stored.  And if you add Cloud storage into the mix, it gets worse. At some point the data pipes themselves become the limiting factor.  Even with Converged infrastructures, and software technologies that stage data for you where it is supposedly needed most, data needs to be constantly shipped along a pipe that is nowhere close to the speed of access that your new high-speed storage can handle.  Then add lots of users and applications simultaneously beating on that pipe and you can quickly start to visualize the problem.
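A quick back-of-the-envelope calculation makes the point. The numbers below are illustrative assumptions, not measurements: a local NVMe device that can stream a few gigabytes per second ends up feeding many consumers through a network link that can't keep up:

    # Sketch: the pipe, not the storage, becomes the limiting factor.
    # All figures are illustrative assumptions.
    nvme_gb_per_s = 3.0        # what an NVMe device might sustain
    link_gb_per_s = 10 / 8     # 10 Gb Ethernet is 1.25 GB/s at best
    consumers = 20             # users/applications sharing that link

    per_consumer = min(nvme_gb_per_s, link_gb_per_s) / consumers
    print(f"each consumer sees at most {per_consumer * 1000:.0f} MB/s")  # ~62 MB/s
    print(f"the storage sits idle for {1 - link_gb_per_s / nvme_gb_per_s:.0%} "
          f"of its capability")                                          # ~58%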

If this wasn’t enough, there are other factors and that takes us to issue #3.

File System Overhead:

You didn’t buy your computer to run an operating system.  You bought it to manipulate data.  Most likely, you don’t even really care about the actual application.  You care about doing some kind of work.  Most people use Microsoft Word to write documents.  I did to draft this blog.  But I didn’t really care about using Word.  I cared about writing this blog and Word was something I had, I knew how to use and was convenient for the task.  That’s your application, but manipulating the data is your real conquest.  The application is a tool to allow you to paint a beautiful picture of your data, so you can see it and accomplish your job better.

The Operating System (let’s say Windows) is one of a whole stack of tools between you, your application and your data. Operating Systems have lots of layers of software to manage the flow from your user to the data and back. Storage is a BLOB of stuff. Whether it is spinning hard drives, SSDs, SANs, cloud-based storage, or you name it, it is just a canvas where the data can be stored. One of the first strokes of the brush that will eventually allow you to create that picture you want from your data is the File System. It brings some basic order. You can see this by going into Windows File Explorer and perusing the various folders. The file system abstracts that BLOB into pieces of data in a hierarchical structure with folders, files, file types, and information about size, location, ownership, security, and so on – you get the idea. Before the painting you want to see from your data emerges, a lot of strokes need to be placed on the canvas, and a lot of those strokes come from the Operating and File Systems. They try to manage that BLOB so your Application can turn it into usable data and, eventually, that beautiful (we hope) picture you desire to draw.

Most people know there is an Operating System, and those of you reading this know that Operating Systems use File Systems to organize raw data into useful components. There are other layers as well, but let's stay focused. The reality is that there are lots of layers that have to be compensated for. Ignoring file system overhead and focusing solely on application overhead is ignoring a really big elephant in the room.

Let’s wrap this up and talk about the initial question. If I just buy newer, faster Storage, won’t that fix my application performance? I suppose if you have enough money you might think you can. You’ll still have data pipeline issues unless you have a very small amount of data, little if any data/compute redundancy requirements, and a very limited number of users. And even then, the File System overhead will still get in your way.

When SSDs were starting to come out, Condusiv® worked with several OEMs to produce software to handle obvious issues like the fact that writes were slower and re-writes were limited in number. In doing that work, one of our surprise discoveries was that once you got beyond a certain level of file system fragmentation, the File System overhead of collecting and arranging the small pieces of data made a huge impact regardless of how fast the underlying storage was. Just making sure data wasn't broken into too many pieces each time it needed to be manipulated provided truly measurable and, in some instances, incredible performance gains.
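A toy model shows why this happens even on very fast media. The per-request overhead figure below is an assumption for illustration; the point is that this cost is paid once per fragment, not once per file:

    # Sketch: file system overhead grows with the number of fragments,
    # regardless of raw device speed. Figures are illustrative assumptions.
    def read_time_ms(file_mb, fragments, dev_mb_per_s, per_io_overhead_ms):
        transfer = file_mb / dev_mb_per_s * 1000   # raw transfer time
        overhead = fragments * per_io_overhead_ms  # one request per fragment
        return transfer + overhead

    for frags in (1, 100, 10_000):
        t = read_time_ms(file_mb=100, fragments=frags,
                         dev_mb_per_s=2000,        # NVMe-class storage
                         per_io_overhead_ms=0.05)  # request setup/completion cost
        print(f"{frags:>6} fragments -> {t:.1f} ms to read 100 MB")
    # 1 fragment reads in ~50 ms; 10,000 fragments take ~550 ms on the
    # same device, purely from per-I/O overhead.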

Then there is that whole issue of I/Os that have nothing to do with your data/application. We also discovered that there was a path to finding and eliminating the I/Os that, while not obvious, made substantial differences in performance, because we could remove them from the flow, allowing the I/Os your application wants to perform to happen without the noise. Think of traffic jams. Have you ever driven in stop-and-go traffic and noticed there aren't any accidents or other distractions to account for such slowness? It's just too many vehicles on the road with you. What if you could get all the people who were just out for a drive off the road? You'd get where you want to go a LOT faster. That's what we figured out how to do. And it turns out no one else is focused on that – not the Operating System, not the File System, and certainly not your application.

And then you got swamped with more data.  Okay, so you’re in an industry where regulations forced that decision on you.  Either way, you get the point.  There was a time when 1GB was more storage than you would ever need.  Not too long ago, 1TB was the ultimate.  Now that embedded SSD on your laptop is 1TB.  Before too long, your phone will have 1TB of storage.  Mine has 128GB, but hey I’m a geek and MicroSD cards are cheap.  My point is that the explosion of data in your computing environment strains File System Architectures.  The good news is that we’ve built technologies to compensate for and fix limitations in the File System.

Let me wrap this up by giving you a 10,000-foot view of us and our software. The big picture is that we have been focused on Storage Performance for a very long time and at all layers. We've seen lots of hardware solutions that were going to fix Storage slowness. And we've seen that about the time a new generation comes along, there will be reasons it still won't fix the problem. Maybe it does today, but tomorrow you'll overtax that solution as well. As computing gets faster and storage gets denser, your needs and desires to use it will grow even faster. We are constantly looking into the crystal ball, knowing the future presents new challenges. And we know from looking in the rear-view mirror that the future doesn't solve the problem; it just means the problems are different. And that's where I get to have fun. I get to work on solving those problems before you even realize they exist. That's what turns us on. That's what we do, and we have been doing it for a long time and, with all due modesty, we're really good at it!

So yes, go ahead and buy that shiny new toy.  It will help, and your users will see improvements for a time.  But we’ll be there filling in those gaps and your users will get even greater improvements.  And that’s where we really shine.  We make you look like the true genius you are, and we love doing it.

  

 

Dashboard Analytics: 13 Metrics and Why They Matter

by Rick Cadruvi, Chief Architect 11. July 2018 09:12

 

Our latest V-locity®, Diskeeper® and SSDkeeper® products include a built-in dashboard that reports the benefits our software is providing.  There are tabs in the dashboard that allow users to view very granular data that can help them assess the impact of our software.  In the dashboard Analytics tab we display hourly data for 13 key metrics.  This document describes what those metrics are and why we chose them as key to understanding your storage performance, which directly translates to your application performance.

To start with, let’s spend a moment trying to understand why 24-hour graphs matter. The time when you and/or your users really notice bottlenecks is generally during peak usage periods. While some servers are truly at peak usage 24x7, most systems, including servers, have peak I/O periods. These almost always follow peak user activity.

Sometimes there will also be spikes in the overnight hours when you are doing backups, virus scans, large report/data maintenance jobs, etc. While these may not be your major concern, some of our customers find that these overlap their daytime production and therefore can easily be THE major source of concern. For some people, making these finish before the deluge of daytime work starts is the single biggest factor they deal with.

Regardless of what causes the peaks, it is at those peak moments when performance matters most.  When little is happening, performance rarely matters.  When a lot is happening, it is key.  The 24-hour graphs allow you to visually see the times when performance matters to you.  You can also match metrics during specific hours to see where the bottlenecks are and what technologies of ours are most effective during those hours. 

Let’s move on to the actual metrics.

 

Total I/Os Eliminated

 

Total I/Os eliminated measures the number of I/Os that would have had to go through to storage if our technologies were not eliminating them before they ever got sent to storage.  We eliminate I/Os in one of two ways.  First, via our patented IntelliMemory® technology, we satisfy I/Os from memory without the request ever going out to the storage device.  Second, several of our other technologies, such as IntelliWrite® cause the data to be stored more efficiently and densely so that when data is requested, it takes less I/Os to get the same amount of data as would otherwise be required.  The net effect is that your storage subsystems see less actual I/Os sent to them because we eliminated the need for those extra I/Os.  That allows those I/Os that do go to storage to finish faster because they aren’t waiting on the eliminated I/Os to complete.
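To make the relationship between these counters concrete, here is a minimal sketch; the variable names and figures are illustrative assumptions, not the product's actual internals:

    # Sketch: how "Total I/Os Eliminated" relates to the other counters.
    # Names and figures are illustrative assumptions.
    reads_served_from_cache = 120_000  # satisfied in memory, never sent to storage
    ios_saved_by_density = 45_000      # avoided because data is stored more densely
    ios_sent_to_storage = 300_000      # what the storage subsystem actually saw

    total_ios_eliminated = reads_served_from_cache + ios_saved_by_density
    ios_requested = ios_sent_to_storage + total_ios_eliminated

    print(f"total I/Os eliminated: {total_ios_eliminated:,}")
    print(f"share of requested I/O kept off storage: "
          f"{total_ios_eliminated / ios_requested:.0%}")  # ~35% in this example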

 

IOPS

IOPS stands for I/Os Per Second. It is the number of I/Os that you are actually requesting. During the times with the most activity, the I/Os we eliminate actually allow this number to be much higher than would be possible with just your storage subsystem. It is also a measure of the total amount of work your applications/systems are able to accomplish.

 

Data from Cache (GB)

Data from Cache tells you how much of that total throughput was satisfied directly from cache. This can be deceiving. Our caching algorithms are aimed at eliminating a lot of small, noisy I/Os that jam up the works of the storage subsystem. By not having to process those, the data freeway is wide open. Think of a freeway with accidents: even after the cars have moved to the side, the traffic slows dramatically. Our cache is like accident avoidance. It may be just a subset of the total throughput, but you process a LOT more data because you aren't waiting for those noisy but necessary I/Os that hold your applications/systems back.

Throughput (GB Total)

Throughput is the total amount of data you process and is measured in GigaBytes.  Think of this like a freight train.  The more railcars, the more total freight being shipped.  The higher the throughput, the more work your system is doing.

 

Throughput (MB/Sec)

Throughput is a measure of the total volume of data flowing to/from your storage subsystem. This metric measures throughput in MegaBytes per second, kind of like your speedometer versus your odometer.

I/O Time Saved (seconds)

The I/O Time Saved metric tells you how much time you didn't have to wait for I/Os to complete because of the physical I/Os we eliminated from going to storage. This can be extremely important during your busiest times. Because I/O requests overlap across multiple processes and threads, this time can actually be greater than elapsed clock time. And what that means to you is that the total amount of work that gets done can experience a multiplier effect, because systems and applications tend to multitask. It's like having 10 people working on sub-tasks at the same time: the project finishes much faster than if one person had to do all the tasks by themselves. By allowing pieces to be done by different people and then just plugging them all together, you get more done faster. This metric measures that effect.
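Here is a minimal sketch of why the saved time can exceed wall-clock time; the figures are made-up assumptions:

    # Sketch: with overlapping I/O, summed time saved can exceed elapsed time.
    # Figures are illustrative assumptions.
    eliminated_ios = 500_000
    avg_io_latency_ms = 2.0       # average latency of the I/Os we avoided
    concurrent_streams = 8        # threads/processes issuing I/O in parallel

    io_time_saved_s = eliminated_ios * avg_io_latency_ms / 1000
    elapsed_saved_s = io_time_saved_s / concurrent_streams

    print(f"I/O time saved (summed across streams): {io_time_saved_s:.0f} s")
    print(f"rough elapsed time saved: {elapsed_saved_s:.0f} s")
    # 1,000 seconds of waiting disappear even though only ~125 seconds of
    # wall-clock time pass, because 8 streams were waiting simultaneously.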

 

I/O Response Time

I/O Response time is sometimes referred to as Latency.  It is how long it takes for I/Os to complete.  This is generally measured in milliseconds.  The lower the number, the better the performance.

Read/Write %

Read/Write % is the percentage of Reads to Writes. If it is 75%, then 3 out of every 4 I/Os are Reads and 1 is a Write. If it were 25%, there would be 3 Writes for each Read.

 

Read I/Os Eliminated

This metric tells you how many Read I/Os we eliminated. If your Read to Write ratio is very high, this may be one of the most important metrics for you. Remember, though, that eliminating Writes means the Reads that do go to storage do NOT have to wait for those eliminated Writes to complete, so they finish faster. And of course, eliminated Reads improve overall Read performance directly.

% Read I/Os Eliminated

 

% Read I/Os Eliminated tells you what percentage of your overall Reads were eliminated from having to be processed at all by your storage subsystem.

 

Write I/Os Eliminated

This metric tells you how many Write I/Os we eliminated.  This is due to our technologies that improve the efficiency and density of data being stored by the Windows NTFS file system.

% Write I/Os Eliminated 

 

% Write I/Os Eliminated tells you what percentage of your overall Writes were eliminated from having to be processed at all by your storage subsystem.

Fragments Prevented and Eliminated

Fragments Prevented and Eliminated gives you an idea of how we cause data to be stored more efficiently and densely, allowing Windows to process the same amount of data with far fewer actual I/Os.

If you have our latest version of V-locity, Diskeeper or SSDkeeper installed, you can open the Dashboard now, select the Analytics tab, and see all of these metrics.

If you don’t have the latest version installed and you have a current maintenance agreement, login to your online account to download and install the software.

Not a customer yet and want to check out these dashboard metrics? Download a free trial at www.condusiv.com/try.

The Inside Story of Condusiv’s “No Reboot” Quest

by Rick Cadruvi, Chief Architect 17. April 2018 04:57

In a world of 24/7 uptime and rare reboot windows, one of our biggest challenges as a company has simply been getting our own customers upgraded to the latest version of our I/O reduction software.

In the last year, we have done dashboard review sessions with a substantial number of customers to demonstrate the power of our latest versions on hybrid and all-flash arrays, hyperconverged systems, Azure/AWS, local SSDs, and more. However, many upgrades remain undone simply because customers can't find the time for the reboot windows required to move to the latest versions, with the most powerful engines and the new benefits dashboard. This has been particularly challenging for customers with hundreds to thousands of servers.

Even though we own the trademark term, “Set It and Forget It®,” there was always one aspect that wasn’t, and that’s the fact that it required a reboot to install or upgrade.

Herein lies the problem – important components of our software sit at the storage driver level. To the best of our knowledge, all other software vendors whose products sit at that layer also require a reboot to install or upgrade. So consider our engineering challenge: take on a project most people wouldn't even know was solvable.

Let’s start with an explanation as to why this barrier existed. Our software contains several filter drivers that allow us to implement leading-edge performance enhancing technologies. Some of them act at the Windows File System level. Windows has long provided a Filter Manager that allows developers to create File System and Network filter drivers that can be loaded and unloaded without requiring a reboot. You will quickly recognize that Anti-Malware and Data Backup/Recovery software tend to be the principal targets for this Filter Manager. There are also products, such as data encryption, that benefit from the Windows Filter Manager. And, as it turns out, we benefit because some of our filter drivers run above the File System.

However, sometimes a software product needs to be closer to the physical hardware itself. This allows a much broader view of what is going on with the actual I/O to the physical device subsystem. There are quite a few software products that need this bigger view. It turns out that we do also. One of the reasons is to allow our patented IntelliMemory® caching software to eliminate a huge amount of noisy I/O that creates substantial, yet preventable, bottlenecks for your application. This is I/O that your application wouldn't even recognize as problematic to its performance, nor would you. Because we have a global view, we can eliminate a large percentage of I/Os from having to go to storage, while using very limited system resources. We also have other technologies that benefit from our telemetry disk filter, which helps us see a more global picture of storage performance and what is actually causing bottlenecks. This allows us to focus our efforts on the true causes of those bottlenecks, giving our customers the greatest bang for their buck. Because we collect excellent empirical data about what is causing the bottlenecks, we can apply very limited and targeted system resources to deliver very significant storage performance increases. Keep in mind, the limited CPU cycles we use operate at the lowest priority, and we only use resources that are otherwise idle, so the benefits of our engines are completely non-intrusive to overall server performance.

Why does the above matter? Well, the Microsoft Filter Manager doesn't provide support for most driver stacks, and this includes the parts of the storage driver stack below the File System. That meant our disk filter drivers couldn't actually start providing their benefits upon initial install until after a reboot. And if we added new functionality to provide even greater storage performance via a change to one of our disk filter drivers, a reboot was required after the update before the new functionality could be brought to bear.

Until now, we just lived with the restriction. We lived with it not because we couldn't create a solution, but because we anticipated that the frequency of Windows updates, especially security-based updates, would increase the frequency of required server reboots and the problem would, for all intents and purposes, become manageable. Alas, our hopes and dreams in this area failed to materialize.

We’ve been doing Windows system, and especially kernel, software development for decades. I just attended Plugfest 30 for file system filter driver developers. This is a Microsoft event to ensure high-quality standards for products with filter drivers like ours. We were also at the first Plugfest, nearly two decades ago. In addition, we wrote the NTFS file system component that allows safe, live file defragmentation on Windows, dating back to the Windows NT 3.51 release. That by itself is an interesting story, but I'll leave it for another time.

Anyway, we finally realized that our crystal-ball prediction of more frequent Windows Server reboots driven by Windows Update cycles (Patch Tuesday?) was a little less clear than we had hoped. Accepting that this problem wasn't going away, we set out to create our own Filter Manager: a mechanism that allows filter drivers on stacks not supported by the Microsoft Filter Manager to be inserted and removed without the reboot requirement. This was something we had been considering, had talked about with other software vendors in a similar situation, and had even prototyped before. The time had finally come to help our customers get the significantly increased performance from our software immediately, instead of waiting for reboot opportunities.

We took our decades of experience and knowledge of Windows Operating System internals, and our experience developing kernel software, and aimed it at giving our customers relief from this limitation. The result is in our latest releases: V-locity® 7.0, Diskeeper® 18, and SSDkeeper™ 2.0.

We’d love to hear your stories about how this revolutionary enablement technology has made a difference for you and your organization.

Tags:

Diskeeper | V-Locity
