Condusiv Technologies Blog

Condusiv Technologies Blog

Blogging @Condusiv

The Condusiv blog shares insight into the issues surrounding system and application performance—and how I/O optimization software is breaking new ground in solving those issues.

Thinking Outside the Box - How to Dramatically Improve SQL Performance, Part 1

by Howard Butler 3. April 2019 04:10

If you are reading this article, then most likely you are about to evaluate V-locity® or Diskeeper® on a SQL Server (or already have our software installed on a few servers) and have some questions about why it is a best practice recommendation to place a memory limit on SQL Servers in order to get the best performance from that server once you’ve installed one of our solutions.

To give our products a fair evaluation, there are certain best practices we recommend you follow.  Now, while it is true most servers already have enough memory and need no adjustments or additions, a select group of high I/O, high performance, or high demand servers, may need a little extra care to run at peak performance.

This article is specifically focused on those servers and the best-practice recommendations below for available memory. They are precisely targeted to those “work-horse” servers.  So, rest assured you don’t need to worry about adding tons of memory to your environment for all your other servers.

One best practice we won’t dive into here, which will be covered in a separate article, is the idea of deploying our software solutions to other servers that share the workload of the SQL Server, such as App Servers or Web Servers that the data flows through.  However, in this article we will shine the spotlight on best practices for SQL Server memory limits.

We’ve sold over 100 million licenses in over 30 years of providing Condusiv® Technologies patented software.  As a result, we take a longer term and more global view of improving performance, especially with the IntelliMemory® caching component that is part of V-locity and Diskeeper. We care about maximizing overall performance knowing that it will ultimately improve application performance.  We have a significant number of different technologies that look for I/Os that we can eliminate out of the stream to the actual storage infrastructure.  Some of them look for inefficiencies caused at the file system level.  Others take a broader look at the disk level to optimize I/O that wouldn’t normally be visible as performance robbing.  We use an analytical approach to look for I/O reduction that gives the most bang for the buck.  This has evolved over the years as technology changes.  What hasn’t changed is our global and long-term view of actual application usage of the storage subsystem and maximizing performance, especially in ways that are not obvious.

Our software solutions eliminate I/Os to the storage subsystem that the database engine is not directly concerned with and as a result we can greatly improve the speed of I/Os sent to the storage infrastructure from the database engine.  Essentially, we dramatically lessen the number of competing I/Os that slow down the transaction log writes, updates, data bucket reads, etc.  If the I/Os that must go to storage anyway aren’t waiting for I/Os from other sources, they complete faster.  And, we do all of this with an exceptionally small amount of idle, free, unused resources, which would be hard pressed for anyone to even detect through our self-learning and dynamic nature of allocating and releasing resources depending on other system needs.

It’s common knowledge that SQL Server has specialized caches for the indexes, transaction logs, etc.  At a basic level the SQL Server cache does a good job, but it is also common knowledge that it’s not very efficient.  It uses up way too much system memory, is limited in scope of what it caches, and due to the incredible size of today’s data stores and indexes it is not possible to cache everything.  In fact, you’ve likely experienced that out of the box, SQL Server will grab onto practically all the available memory allocated to a system.

It is true that if SQL Server memory usage is left uncapped, there typically wouldn’t be enough memory for Condusiv’s software to create a cache with.  Hence, why we recommend you place a maximum memory usage in SQL Server to leave enough memory for IntelliMemory cache to help offload more of the I/O traffic.  For best results, you can easily cap the amount of memory that SQL Server consumes for its own form of caching or buffering.  At the end of this article I have included a link to a Microsoft document on how to set Max Server Memory for SQL as well as a short video to walk you through the steps.

A general rule of thumb for busy SQL database servers would be to limit SQL memory usage to keep at least 16 GB of memory free.  This would allow enough room for the IntelliMemory cache to grow and really make that machine’s performance 'fly' in most cases.  If you can’t spare 16 GB, leave 8 GB.  If you can’t afford 8 GB, leave 4 GB free.  Even that is enough to make a difference.  If you are not comfortable with reducing the SQL Server memory usage, then at least place a maximum value of what it typically uses and add 4-16 GB of additional memory to the system.  

We have intentionally designed our software so that it can’t compete for system resources with anything else that is running.  This means our software should never trigger a memory starvation situation.  IntelliMemory will only use some of the free or idle memory that isn’t being used by anything else, and will dynamically scale our cache up or down, handing memory back to Windows if other processes or applications need it.

Think of our IntelliMemory caching strategy as complementary to what SQL Server caching does, but on a much broader scale.  IntelliMemory caching is designed to eliminate the type of storage I/O traffic that tends to slow the storage down the most.  While that tends to be the smaller, more random read I/O traffic, there are often times many repetitive I/Os, intermixed with larger I/Os, which wreak havoc and cause storage bandwidth issues.  Also keep in mind that I/Os satisfied from memory are 10-15 times faster than going to flash.  

So, what’s the secret sauce?  We use a very lightweight storage filter driver to gather telemetry data.  This allows the software to learn useful things like:

- What are the main applications in use on a machine?
- What type of files are being accessed and what type of storage I/O streams are being generated?
- And, at what times of the day, the week, the month, the quarter? 

IntelliMemory is aware of the 'hot blocks' of data that need to be in the memory cache, and more importantly, when they need to be there.  Since we only load data we know you’ll reference in our cache, IntelliMemory is far more efficient in terms of memory usage versus I/O performance gains.  We can also use that telemetry data to figure out how best to size the storage I/O packets to give the main application the best performance.  If the way you use that machine changes over time, we automatically adapt to those changes, without you having to reconfigure or 'tweak' any settings.


Stayed tuned for the next in the series; Thinking Outside The Box Part 2 – Test vs. Real World Workload Evaluation.

 

Main takeaways:

- Most of the servers in your environment already have enough free and available memory and will need no adjustments of any kind.
- Limit SQL memory so that there is a minimum of 8 GB free for any server with more than 40 GB of memory and a minimum of 6 GB free for any server with 32 GB of memory.  If you have the room, leave 16 GB or more memory free for IntelliMemory to use for caching.
Another best practice is to deploy our software to all Windows servers that interact with the SQL Server.  More on this in a future article.

 

 

Microsoft Document – Server Memory Server Configuration Options

https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/server-memory-server-configuration-options?view=sql-server-2017

 

Short video – Best Practices for Available Memory for V-locity or Diskeeper

https://youtu.be/vwi7BRE58Io

At around the 3:00 minute mark, capping SQL Memory is demonstrated.

A Deep Dive Into The I/O Performance Dashboard

by Howard Butler 2. August 2018 08:36

While most users are familiar with the main Diskeeper®/V-locity®/SSDkeeper™ Dashboard view which focuses on the number of I/Os eliminated and Storage I/O Time Saved, the I/O Performance Dashboard tab takes a deeper look into the performance characteristics of I/O activity.  The data shown here is similar in nature to other Windows performance monitoring utilities and provides a wealth of data on I/O traffic streams. 

By default, the information displayed is from the time the product was installed. You can easily filter this down to a different time frame by clicking on the “Since Installation” picklist and choosing a different time frame such as Last 24 Hours, Last 7 Days, Last 30 Days, Last 60 Days, Last 90 Days, or Last 180 Days.  The data displayed will automatically be updated to reflect the time frame selected.

 

The first section of the display above is labeled as “I/O Performance Metrics” and you will see values that represent Average, Minimum, and Maximum values for I/Os Per Second (IOPS), throughput measured in Megabytes per Second (MB/Sec) and application I/O Latency measured in milliseconds (msecs). Diskeeper, V-locity and SSDkeeper use the Windows high performance system counters to gather this data and it is measured down to the microsecond (1/1,000,000 second).

While most people are familiar with IOPS and throughput expressed in MB/Sec, I will give a short description just to make sure. 

IOPS is the number of I/Os completed in 1 second of time.  This is a measurement of both read and write I/O operations.  MB/Sec is a measurement that reflects the amount of data being worked on and passed through the system.  Taken together they represent speed and throughput efficiency.  One thing I want to point out is that the Latency value shown in the above report is not measured at the storage device, but instead is a much more accurate reflection of I/O response time at an application level.  This is where the rubber meets the road.  Each I/O that passes through the Windows storage driver has a start and completion time stamp.  The difference between these two values measures the real-world elapsed time for how long it takes an I/O to complete and be handed back to the application for further processing.  Measurements at the storage device do not account for network, host, and hypervisor congestion.  Therefore, our Latency value is a much more meaningful value than typical hardware counters for I/O response time or latency.  In this display, we also provide meaningful data on the percentage of I/O traffic- which are reads and which are writes.  This helps to better gauge which of our technologies (IntelliMemory® or IntelliWrite®) is likely to provide the greatest benefit.

The next section of the display measures the “Total Workload” in terms of the amount of data accessed for both reads and writes as well as any data satisfied from cache. 

 

A system which has higher workloads as compared to other systems in your environment are the ones that likely have higher I/O traffic and tend to cause more of the I/O blender effect when connected to a shared SAN storage or virtualized environment and are prime candidates for the extra I/O capacity relief that Diskeeper, V-locity and SSDkeeper provide.

Now moving into the third section of the display labeled as “Memory Usage” we see some measurements that represent the Total Memory in the system and the total amount of I/O data that has been satisfied from the IntelliMemory cache.  The purpose of our patented read caching technology is twofold.  Satisfy from cache the frequently repetitive read data requests and be aware of the small read operations that tend to cause excessive “noise” in the I/O stream to storage and satisfy them from the cache.  So, it’s not uncommon to see the “Data Satisfied from Cache” compared to the “Total Workload” to be a bit lower than other types of caching algorithms.  Storage arrays tend to do quite well when handed large sequential I/O traffic but choke when small random reads and writes are part of the mix.  Eliminating I/O traffic from going to storage is what it’s all about.  The fewer I/Os to storage, the faster and more data your applications will be able to access.

In addition, we show the average, minimum, and maximum values for free memory used by the cache.  For each of these values, the corresponding Total Free Memory in Cache for the system is shown (Total Free Memory is memory used by the cache plus memory reported by the system as free).  The memory values will be displayed in a yellow color font if the size of the cache is being severely restricted due to the current memory demands of other applications and preventing our product from providing maximum I/O benefit.  The memory values will be displayed in red if the Total Memory is less than 3GB.

Read I/O traffic, which is potentially cacheable, can receive an additional benefit by adding more DRAM for the cache and allowing the IntelliMemory caching technology to satisfy a greater amount of that read I/O traffic at the speed of DRAM (10-15 times faster than SSD), offloading it away from the slower back-end storage. This would have the effect of further reducing average storage I/O latency and saving even more storage I/O time.

Additional Note: For machines running SQL Server or Microsoft Exchange, you will likely need to cap the amount of memory that those applications can use (if you haven’t done so already), to prevent them from ‘stealing’ any additional memory that you add to those machines.

It should be noted the IntelliMemory read cache is dynamic and self-learning.  This means you do not need to pre-allocate a fixed amount of memory to the cache or run some pre-assessment tool or discovery utility to determine what should be loaded into cache.  IntelliMemory will only use memory that is otherwise, free, available, or unused memory for its cache and will always leave plenty of memory untouched (1.5GB – 4GB depending on the total system memory) and available for Windows and other applications to use.  As there is a demand for memory, IntelliMemory will release memory from it’s cache and give this memory back to Windows so there will not be a memory shortage.  There is further intelligence with the IntelliMemory caching technology to know in real time precisely what data should be in cache at any moment in time and the relative importance of the entries already in the cache.  The goal is to ensure that the data maintained in the cache results in the maximum benefit possible to reduce Read I/O traffic. 

So, there you have it.  I hope this deeper dive explanation provides better clarity to the benefit and internal workings of Diskeeper, V-locity and SSDkeeper as it relates to I/O performance and memory management.

You can download a free 30-day, fully functioning trial of our software and see the new dashboard here: www.condusiv.com/try

Everything You Need to Know about SSDs and Fragmentation in 5 Minutes

by Howard Butler 17. November 2016 05:42

When reading articles, blogs, and forums posted by well-respected (or at least well intentioned people) on the subject of fragmentation and SSDs, many make statements about how (1) SSDs don’t fragment, or (2) there’s no moving parts, so no problem, or (3) an SSD is so fast, why bother? We all know and agree SSDs shouldn’t be “defragmented” since that shortens lifespan, so is there a problem after all?

The truth of the matter is that applications running on Windows do not talk directly to the storage device.  Data is referenced as an abstracted layer of logical clusters rather than physical track/sectors or specific NAND-flash memory cells.  Before a storage unit (HDD or SSD) can be recognized by Windows, a file system must be prepared for the volume.  This takes place when the volume is formatted and in most cases is set with a 4KB cluster size.  The cluster size is the smallest unit of space that can be allocated.  Too large of a cluster size results in wasted space due to over allocation for the actual data needed.  Too small of a cluster size causes many file extents or fragments.  After formatting is complete and when a volume is first written to, most all of the free space is in just one or two very large sections.  Over the course of time as files of various sizes are written, modified, re-written, copied, and deleted, the size of individual sections of free space as seen from the NTFS logical file system point of view becomes smaller and smaller.  I have seen both HDD and SSD storage devices with over 3 million free space extents.  Since Windows lacks file size intelligence when writing a file, it never chooses the best allocation at the logical layer, only the next available – even if the next available is 4KB. That means 128K worth of data could wind up with 32 extents or fragments, each being 4KB in size. Therefore SSDs do fragment at the logical Windows NTFS file system level.  This happens not as a function of the storage media, but of the design of the file system.

Let’s examine how this impacts performance.  Each extent of a file requires its own separate I/O request. In the example above, that means 32 I/O operations for a file that could have taken a single I/O if Windows was smarter about managing free space and finding the best logical clusters instead of the next available. Since I/O takes a measurable amount of time to complete, the issue we’re talking about here related to SSDs has to do with an I/O overhead issue.

Even with no moving parts and multi-channel I/O capability, the more I/O requests needed to complete a given workload, the longer it is going to take your SSD to access the data.  This performance loss occurs on initial file creation and carries forward with each subsequent read of the same data.  But wait… the performance loss doesn’t stop there.  Once data is written to a memory cell on an SSD and later the file space is marked for deletion, it must first be erased before new data can be written to that memory cell.  This is a rather time consuming process and individual memory cells cannot be individually erased, but instead a group of adjacent memory cells (referred to as a page) are processed together.  Unfortunately, some of those memory cells may still contain valuable data and this information must first be copied to a different set of memory cells before the memory cell page (group of memory cells) can be erased and made ready to accept the new data.  This is known as Write Amplification.  This is one of the reasons why writes are so much slower than reads on an SSD.  Another unique problem associated with SSDs is that each memory cell has a limited number of times that a memory cell can be written to before that memory cell is no longer usable.  When too many memory cells are considered invalid the whole unit becomes unusable.  While TRIM, wear leveling technologies, and garbage collection routines have been developed to help with this behavior, they are not able to run in real-time and therefore are only playing catch-up instead of being focused on the kind of preventative measures that are needed the most.  In fact, these advanced technologies offered by SSD manufacturers (and within Windows) do not prevent or reverse the effects of file and free space fragmentation at the NTFS file system level.

The only way to eliminate this surplus of small, tiny writes and reads that (1) chew up performance and (2) shorten lifespan from all the wear and tear is by taking a preventative approach that makes Windows “smarter” about how it writes files and manages free space, so more payload is delivered with every I/O operation. That’s exactly why more users run Condusiv’s Diskeeper® (for physical servers and workstations) or V-locity® (for virtual servers) on systems with SSD storage. For anyone who questions how much value this approach adds to their systems, the easiest way to find out is by downloading a free 30-day trial and watch the “time saved” dashboard for yourself. Since the fastest I/O is the one you don’t have to write, Condusiv software understands exactly how much time is saved by eliminating multiple, fractured writes with fewer, larger contiguous writes. It even has an additional feature to cache reads from idle, available DRAM (15X faster than SSD), which further offloads I/O bandwidth to SSD storage. Especially for businesses with many users accessing a multitude of applications across hundreds or thousands of servers, the time savings are enormous.

 

ATTO Benchmark Results with and without Diskeeper 16 running on a 120GB Samsung SSD Pro 840. The read data caching shows a 10X improvement in read performance.

Setting the Record Straight - Windows 7 Fragmentation, SSDs, and You

by Howard Butler 21. January 2012 14:50

In today’s well connected world of electronics and instant communications I received a text from a friend asking if I had seen the recent PC World magazine (February, 2012).  He said it had some tidbit of information concerning one of my favorite subjects; system performance, defragmentation, and SSDs.  I located a copy here at the office and found the article. As I read the first line I realized the debate on the virtues of defragmentation especially on SSDs will be one that goes on indefinitely as no one really talks about the issue with supporting hard facts and numbers.  Most articles are rehashing ideas and opinions long since debunked.  They continue to surface because very few truly understand the intricacies of the Windows NTFS file system and that of the storage media, whether it is rotating magnetic hard disks or electronic solid state disks.

So let’s set the record straight… Fragmentation is exponentially more of a problem with today’s data explosion. Defragmenting once a week will still cause the user to experience slowdowns from the degradation effects and doesn’t address the issue when files are initially being written.  And yes, never do a traditional defrag on SSDs.

NTFS file and free space fragmentation happens far more frequently than you might guess.  It has the potential to happen as soon as you install the operating system.  It can happen when you install applications or system updates, access the internet, download and save photos, create e-mail, office documents, etc…  It is a normal occurrence and behavior of the computer system, but does have a negative effect on over all application and system performance.  As fragmentation happens the computer system and underlying storage is performing more work than necessary.  Each I/O request takes a measurable amount of time.  Even in SSD environments there is no such thing as an “instant” I/O request.  Any time an application requests to read or write data and that request is split into additional I/O requests it causes more work to be done.   This extra work causes a delay right at that very moment in time.  Whoever thought that defragmenting once a month or weekly was good enough, simply didn’t understand fragmentation.

Disk drives have gotten faster over the years, but so have CPUs.  In fact, the gap between the difference in speed between hard disks and CPU has actually widened.  This means that applications can get plenty of CPU cycles, but they are still starving to get the data from the storage.  What’s more, the amount of data that is being stored has increased dramatically.  Just think of all those digital photos taken and shared over the holidays.  Each photo use to be approximately 1MB in size, now they are exceeding 15MB per photo and some go way beyond that.  Video editing and rendering and storage of digital movies have also become quite popular and as a result applications are manipulating hundreds of Gigabytes of data.  With typical disk cluster sizes of 4k, a 15MB size file could potentially be fragmented into nearly 4,000 extents.  This means an extra 4,000 disk I/O requests are required to read or write the file.  No matter what type of storage, it will simply take longer to complete the operation.

Suppose I chose to do some editing of my family videos on Tuesday evening.  Even the built-in defragmentation tool in Windows 7 doesn’t do me much good because it isn’t schedule to run until Wednesday morning at 1:00am.  This also means that quite a bit of fragmentation has built up since the previous week when it last ran.  Maybe I’ll manually run it, but that can take quite a while and I’ve wasted time that I would have rather spent on my project.  Unfortunately, the Windows built-in defragmentation utility doesn’t prevent fragmentation so even after running it manually; I still will wind up with fragmentation and slow access speed of my newly created files. 

I’ve often thought about why Wednesday at 1:00am was chosen as the time to schedule defragmentation.  Why isn’t it scheduled all the time?   It is because there could be system resource conflicts that either interfere with getting the task done or the defragmentation process has difficulty throttling back under a variety of conditions.  Regardless, this wait a week to clean up fragmentation doesn’t really help me when I need it most.

As pointed out in the article, the built-in defragmenter does not have the technology advancement to properly deal with fragmentation and SSDs. The physical placement of data on an SSD doesn’t really matter like it does on regular magnetic HDDs.  With an SSD there is no rotational latency or seek time to contend with.  Many experts assume that fragmentation is no longer a problem, but the application data access speed isn’t just defined in those terms.  Each and every I/O request performed takes a measurable amount of time.  SSD’s are fast, but they are not instantaneous.  Windows NTFS file system does not behave any differently because the underlying storage is an SSD vs. HDD and therefore fragmentation still occurs.  Reducing the unnecessary I/O’s by preventing and eradicating the fragmentation reduces the number of I/O requests and as a result speeds up application data response time and improve the overall lifespan of the SSD.  In essence, this makes for more sequential I/O operations which is generally faster and outperforms random writes.

In addition, SSD’s require that old data be erased before new data is written over it, rather than just writing over the old information as with HDDs.  This doubles the wear and tear and can cause major issues with the speed performance and lifespan of the SSD.  Most SSD manufactures have very sophisticated wear-leveling technologies to help with this. The principle issue is write speed degradation due to free space fragmentation.  Small free spaces scattered across the SSD causes the NTFS file system to write a file in fragmented pieces to those small available free spaces.  This has the effect of causing more random I/O traffic that is slower than sequential operations.

I think I have clearly made my point….  The built-in defragmenter in Windows 7 is not a solution for neither the consumer/home user, nor the enterprise business user.  Data access speeds are far more critical in the business world where time is money.  In the enterprise environment there are generally many more files that are used by higher number of users that are accessing data across shared type of storage such as SAN.  Even virtual platforms benefit from the same points covered.  This opens the door and is the reason why robust solutions such as Diskeeper exist.  More data about Diskeeper and the superior technology it offers can be found at http://www.diskeeper.com.

RecentComments

Comment RSS

Month List

Calendar

<<  April 2019  >>
MoTuWeThFrSaSu
25262728293031
1234567
891011121314
15161718192021
22232425262728
293012345

View posts in large calendar