Condusiv Technologies Blog

Blogging @Condusiv

The Condusiv blog shares insight into the issues surrounding system and application performance—and how I/O optimization software is breaking new ground in solving those issues.

Condusiv Launches SSDkeeper Software that Guarantees “Faster than New” Performance for PCs and Physical Servers and Extends Longevity of SSDs

by Brian Morin 17. January 2017 09:30

The company that sold over 100 million Diskeeper® licenses for hard disk drive systems now releases SSDkeeper™ to keep solid-state drive systems running longer while performing “faster than new.”

Every Windows PC or physical server fitted with a solid-state drive (SSD) suffers from very small, fractured writes and reads, which dampen SSD performance and ultimately erode SSD longevity through write amplification. SSDkeeper’s patented software ensures large, clean contiguous writes and reads for more payload with every I/O operation, reduces the Program/Erase (P/E) cycles that shorten SSD longevity, and boosts performance even further by caching hot reads within idle, available DRAM.

Solid-state drives can only handle a finite number of writes before failing. Every write kicks off P/E cycles that shorten SSD lifespan, and write amplification multiplies those cycles beyond what the written data alone would require. By reducing the number of writes required for any given file or workload, SSDkeeper significantly boosts write speed while also reducing the number of P/E cycles that would otherwise have been executed. This enables individuals and organizations to reclaim the write speed of their SSDs while ensuring the longest life possible.

Patented Write Optimization

SSDkeeper’s patented write optimization engine (IntelliWrite®) prevents excessively small, fragmented writes and reads that rob SSDs of performance and endurance. SSDkeeper ensures large, clean contiguous writes from Windows, so maximum payload is carried with every I/O operation. Eliminating the “death by a thousand cuts” scenario of many tiny writes and reads that slow system performance also extends the lifespan of an SSD by reducing the write amplification issues that plague all SSD devices.

Patented Read Optimization

SSDkeeper electrifies Windows system performance further with an additional patented feature: dynamic memory caching (IntelliMemory®). By automatically using idle, available DRAM to serve hot reads, SSDkeeper serves data from memory, which is 12-15X faster than SSD, and further reduces wear to the SSD device. The real genius in SSDkeeper’s DRAM caching engine is that nothing has to be allocated for cache. All caching occurs automatically. SSDkeeper dynamically uses only the memory that is available at any given moment and throttles according to the need of the application, so there is never an issue of resource contention or memory starvation. If a system is ever memory constrained at any point, SSDkeeper's caching engine will back off entirely. Even so, systems with just 4GB of available DRAM commonly serve 50% of read traffic from cache. It doesn't take much available memory to have a big impact on performance.
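
To make the throttling idea concrete, here is a minimal sketch of a read cache whose capacity follows whatever memory is currently available and drops to zero when memory is tight. This is not SSDkeeper's implementation: the class name `ThrottledReadCache`, the `reserve_bytes` buffer, the 4KB block size, and the LRU eviction policy are all assumptions made for illustration, and the caller is assumed to supply the available-memory figure.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Tuple


class ThrottledReadCache:
    """Hypothetical LRU read cache sized from currently available memory."""

    def __init__(self, reserve_bytes: int, block_bytes: int = 4096) -> None:
        self.reserve_bytes = reserve_bytes  # memory always left for applications
        self.block_bytes = block_bytes      # assumed size of one cached block
        self.capacity_blocks = 0            # nothing is pre-allocated
        self._blocks: "OrderedDict[Hashable, object]" = OrderedDict()

    def __len__(self) -> int:
        return len(self._blocks)

    def resize(self, available_bytes: int) -> None:
        """Recompute capacity from available memory; shrink (or empty) if needed."""
        usable = max(0, available_bytes - self.reserve_bytes)
        self.capacity_blocks = usable // self.block_bytes
        self._evict()

    def _evict(self) -> None:
        # Drop least recently used blocks until the cache fits its capacity.
        while len(self._blocks) > self.capacity_blocks:
            self._blocks.popitem(last=False)

    def read(self, block_id: Hashable,
             read_from_disk: Callable[[Hashable], object]) -> Tuple[object, bool]:
        """Return (data, served_from_cache)."""
        if block_id in self._blocks:
            self._blocks.move_to_end(block_id)  # mark as most recently used
            return self._blocks[block_id], True
        data = read_from_disk(block_id)
        if self.capacity_blocks > 0:
            self._blocks[block_id] = data
            self._evict()
        return data, False


demo = ThrottledReadCache(reserve_bytes=8192)
demo.resize(available_bytes=8192 + 2 * 4096)      # room for two blocks
demo.read("a", lambda b: f"data-{b}")             # miss, now cached
print(demo.read("a", lambda b: f"data-{b}")[1])   # True: served from memory
demo.resize(available_bytes=0)                    # memory constrained: back off
print(len(demo))                                  # 0
```

The design choice worth noting is that capacity is an input recomputed on every `resize` call rather than a fixed setting, which is one simple way to model a cache that never has to be allocated up front.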

Enhanced Reporting

If you ever wanted to know how much Windows inefficiencies were robbing system performance, SSDkeeper tracks time saved due to elimination of small, fragmented writes and time saved from every read request that is served from DRAM instead of being served from the underlying SSD. Users can leverage SSDkeeper’s built-in dashboard to see what percentage of all write requests are reduced by sequentializing otherwise small, fractured writes and what percentage of all read requests are cached from idle, available DRAM.

SSDkeeper is a lightweight file system driver that runs invisibly in the background with near-zero intrusion on system resources. All optimizations occur automatically in real-time.

While SSDkeeper provides the same core patented functionality and features as the latest Diskeeper® 16 for hard disk drives (minus defragmentation functions, which apply to hard disk drives only), the benefit to a solid-state drive is different than to a hard disk drive. Hard disk drives do not suffer from write amplification that reduces longevity. By eliminating excessively small writes, IntelliWrite goes beyond improved write performance and extends endurance as well.

Available in Professional and Server Editions

>SSDkeeper Professional for Windows PCs with SSD drives greatly enhances the performance of corporate laptops and desktops.

>SSDkeeper Server speeds physical server system performance of the most I/O intensive applications such as MS-SQL Server by 2X to 10X depending on the amount of idle, unused memory.  

>Options include Diskeeper Administrator management console to automate network deployment and management across hundreds or thousands of PCs or servers.  

>A free 30-day software trial download is available at http://www.condusiv.com/evaluation-software/

>Now available for purchase on our online store:  http://www.condusiv.com/purchase/SSDKeeper/

 

How Can I/O Reduction Software Guarantee to Solve the Toughest Performance Problems?

by Brian Morin 14. January 2017 01:00

The #1 request I’ve been getting from customers is a white board video that succinctly explains the two silent killers of VM performance and how our I/O reduction guarantees to solve performance problems, so applications run perfectly on every Windows server.

Expensive backend storage upgrades should ONLY take place when needing more capacity – not more performance. Anytime I tell someone our I/O reduction software guarantees to solve their toughest performance problems…the very first response is invariably the same…HOW? Not only have I answered this question hundreds of times, our own customers find themselves answering this question repeatedly to other team members or new hires.

To make this easier, I’ve answered it all here in this 10-min White Board Video, or you can continue reading.

Most of us have been upgrading hardware to get more performance ever since we can remember. It’s become so ingrained that it’s oftentimes the ONLY approach we think of when needing a performance upgrade.

Many organizations don’t necessarily need a performance boost on EVERY application, but they do need it on one or two I/O intensive applications. Throwing a new all-flash array or new hybrid array at a performance problem ends up being the most expensive and disruptive way to solve it when all you have to do is the same thing thousands of our customers have done: simply try our I/O reduction software on any Windows server and watch the application run at least 50% faster and in many cases 2X-10X faster.

Most IT professionals are unaware of the fact that as great as virtualization has been for server efficiency, the one downside is how it adds complexity to the data path. On top of that, Windows doesn’t play well in a virtual environment (or any environment where it is abstracted from the physical layer). This means I/O characteristics that are a lot smaller, more fractured and more random than they need to be – the perfect trifecta for bad storage performance.

This “death by a thousand cuts” scenario means systems are processing workloads about 50% slower than they should. Condusiv’s I/O reduction software solves this problem by displacing many small writes and reads with large, clean contiguous writes and reads. As huge as that patented engine is for our customers, it’s not the only thing we’re doing to make applications run smoothly. Performance is further electrified by establishing a tier-0 caching strategy that automatically uses idle, available memory to serve hot reads. This is the same battle-tested technology that has been OEM’d by some of the largest vendors out there: Dell, Lenovo, HP, SanDisk, and Western Digital, just to name a few.

Although we might be best known for our first patented engine that solves Windows write inefficiencies to HDDs or SSDs, more and more customers are discovering just how important our patented DRAM caching engine is. If any customer can maintain even just 4GB of available memory to be used for cache, they most often see cache hit rates in the range of 50%. That means serving data out of DRAM, which is 15X faster than SSD, and it opens up even more precious bandwidth to and from storage for everything else. Other customers who really need to crank up performance are simply provisioning more memory on those systems and seeing >90% cache hit rates.
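
As a rough back-of-the-envelope illustration of what those hit rates mean for average read time, the sketch below blends DRAM and SSD service times. It assumes "15X faster" means a cached read takes 1/15 of the time of an SSD read, and it ignores everything else in the data path; the function names are hypothetical.

```python
def relative_read_time(hit_rate: float, dram_speedup: float = 15.0) -> float:
    """Average read time as a fraction of an uncached SSD read.

    Assumes a cache hit costs 1/dram_speedup of an SSD read and a miss
    costs one full SSD read.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate / dram_speedup + (1.0 - hit_rate)


def effective_speedup(hit_rate: float, dram_speedup: float = 15.0) -> float:
    """How many times faster the average read becomes at a given hit rate."""
    return 1.0 / relative_read_time(hit_rate, dram_speedup)


for rate in (0.5, 0.9):
    print(f"hit rate {rate:.0%}: average reads roughly "
          f"{effective_speedup(rate):.1f}x faster")
```

Under these assumptions a 50% hit rate makes the average read a bit under 2X faster, while a 90% hit rate gives a bit over 6X, which is why provisioning more memory for cache pays off disproportionately.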

See all this and more described in the latest Condusiv I/O Reduction White Board video that explains eeevvvveeerything you need to know about the problem, how we solve it, and the typical results that should be expected in the time it takes you to drink a cup of coffee. So go get a cup of coffee, sit back, relax, and see how we can solve your toughest performance problems – guaranteed.

 

Everything You Need to Know about SSDs and Fragmentation in 5 Minutes

by Howard Butler 17. November 2016 05:42

In articles, blogs, and forum posts by well-respected (or at least well-intentioned) people on the subject of fragmentation and SSDs, many make statements about how (1) SSDs don’t fragment, or (2) there are no moving parts, so no problem, or (3) an SSD is so fast, why bother? We all know and agree SSDs shouldn’t be “defragmented” since that shortens lifespan, so is there a problem after all?

The truth of the matter is that applications running on Windows do not talk directly to the storage device. Data is referenced as an abstracted layer of logical clusters rather than physical tracks/sectors or specific NAND-flash memory cells. Before a storage unit (HDD or SSD) can be recognized by Windows, a file system must be prepared for the volume. This takes place when the volume is formatted, and in most cases the volume is set with a 4KB cluster size. The cluster size is the smallest unit of space that can be allocated. Too large a cluster size results in wasted space due to over-allocation for the actual data needed. Too small a cluster size causes many file extents or fragments.

After formatting is complete and when a volume is first written to, almost all of the free space is in just one or two very large sections. Over the course of time, as files of various sizes are written, modified, re-written, copied, and deleted, the individual sections of free space, as seen from the NTFS logical file system point of view, become smaller and smaller. I have seen both HDD and SSD storage devices with over 3 million free space extents. Since Windows lacks file size intelligence when writing a file, it never chooses the best allocation at the logical layer, only the next available, even if the next available is 4KB. That means 128KB worth of data could wind up with 32 extents or fragments, each being 4KB in size. Therefore SSDs do fragment at the logical Windows NTFS file system level. This happens not as a function of the storage media, but of the design of the file system.
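
The 128KB example can be reproduced with a toy model of free-space allocation. This is only a sketch of the idea, not how NTFS actually allocates: the two allocator functions, the free-space layout, and the extent representation are hypothetical, chosen to contrast "next available" with a size-aware choice.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (start_cluster, length_in_clusters)

CLUSTER_KB = 4  # cluster size from the example above


def clusters_needed(file_kb: int, cluster_kb: int = CLUSTER_KB) -> int:
    """Number of clusters a file occupies (ceiling division)."""
    return -(-file_kb // cluster_kb)


def allocate_next_available(free: List[Extent], need: int) -> List[Extent]:
    """Toy 'next available' policy: consume free extents in order."""
    used: List[Extent] = []
    for start, length in free:
        if need == 0:
            break
        take = min(length, need)
        used.append((start, take))
        need -= take
    if need:
        raise ValueError("not enough free space")
    return used


def allocate_size_aware(free: List[Extent], need: int) -> List[Extent]:
    """Toy size-aware policy: smallest single free extent that fits the file."""
    candidates = [extent for extent in free if extent[1] >= need]
    if candidates:
        start, _ = min(candidates, key=lambda extent: extent[1])
        return [(start, need)]
    return allocate_next_available(free, need)  # fall back if nothing fits


# Hypothetical volume: 32 isolated 4KB holes, then one large free region.
free_space = [(i * 2, 1) for i in range(32)] + [(1000, 4096)]
need = clusters_needed(128)

print(len(allocate_next_available(free_space, need)))  # 32 extents
print(len(allocate_size_aware(free_space, need)))      # 1 extent
```

With the same free space and the same file, the only difference between 32 fragments and one contiguous extent is the policy used to pick clusters.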

Let’s examine how this impacts performance. Each extent of a file requires its own separate I/O request. In the example above, that means 32 I/O operations for a file that could have taken a single I/O if Windows were smarter about managing free space and finding the best logical clusters instead of the next available. Since every I/O takes a measurable amount of time to complete, the issue with SSDs here is one of I/O overhead.

Even with no moving parts and multi-channel I/O capability, the more I/O requests needed to complete a given workload, the longer it is going to take your SSD to access the data. This performance loss occurs on initial file creation and carries forward with each subsequent read of the same data.

But wait… the performance loss doesn’t stop there. Once data is written to a memory cell on an SSD and the file space is later marked for deletion, the cell must first be erased before new data can be written to it. This is a rather time-consuming process, and memory cells cannot be erased individually; instead, a group of adjacent memory cells (referred to as a block) is erased together. Unfortunately, some of those memory cells may still contain valuable data, and this information must first be copied to a different set of memory cells before the block can be erased and made ready to accept the new data. This is known as Write Amplification. It is one of the reasons why writes are so much slower than reads on an SSD.

Another unique problem associated with SSDs is that each memory cell can be written to only a limited number of times before it is no longer usable. When too many memory cells are considered invalid, the whole unit becomes unusable. While TRIM, wear leveling technologies, and garbage collection routines have been developed to help with this behavior, they are not able to run in real-time and therefore are only playing catch-up instead of being focused on the kind of preventative measures that are needed the most. In fact, these advanced technologies offered by SSD manufacturers (and within Windows) do not prevent or reverse the effects of file and free space fragmentation at the NTFS file system level.
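
The copy-before-erase cost described above is often summarized as a write amplification factor: physical writes divided by the writes the host actually requested. The sketch below is a simplified steady-state model, not a description of any particular SSD controller; the 64-page erase unit and the function names are assumptions for illustration.

```python
PAGES_PER_ERASE_UNIT = 64  # assumed size of the group of cells erased together


def write_amplification(host_pages: int, relocated_pages: int) -> float:
    """Physical pages written divided by pages the host asked to write."""
    if host_pages <= 0:
        raise ValueError("host_pages must be positive")
    if relocated_pages < 0:
        raise ValueError("relocated_pages cannot be negative")
    return (host_pages + relocated_pages) / host_pages


def waf_for_reclaimed_unit(valid_pages: int) -> float:
    """Write amplification when reclaiming one erase unit.

    valid_pages still hold live data and must be copied elsewhere before
    the erase; only the remaining pages become room for new host data.
    """
    freed = PAGES_PER_ERASE_UNIT - valid_pages
    if freed <= 0:
        raise ValueError("reclaiming this unit frees no space")
    return write_amplification(host_pages=freed, relocated_pages=valid_pages)


print(waf_for_reclaimed_unit(valid_pages=0))   # 1.0: nothing to copy
print(waf_for_reclaimed_unit(valid_pages=32))  # 2.0
print(waf_for_reclaimed_unit(valid_pages=48))  # 4.0: 48 copies to free 16 pages
```

In this model, the more live data is scattered across erase units, the more copying each new write drags along with it.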

The only way to eliminate this surplus of small writes and reads that (1) chew up performance and (2) shorten lifespan from all the wear and tear is by taking a preventative approach that makes Windows “smarter” about how it writes files and manages free space, so more payload is delivered with every I/O operation. That’s exactly why more users run Condusiv’s Diskeeper® (for physical servers and workstations) or V-locity® (for virtual servers) on systems with SSD storage. For anyone who questions how much value this approach adds to their systems, the easiest way to find out is to download a free 30-day trial and watch the “time saved” dashboard for yourself. Since the fastest I/O is the one you don’t have to write, Condusiv software tracks exactly how much time is saved by replacing multiple, fractured writes with fewer, larger contiguous writes. It even has an additional feature to cache reads from idle, available DRAM (15X faster than SSD), which further offloads I/O from SSD storage. Especially for businesses with many users accessing a multitude of applications across hundreds or thousands of servers, the time savings are enormous.

 

ATTO Benchmark Results with and without Diskeeper 16 running on a 120GB Samsung SSD 840 Pro. The read data caching shows a 10X improvement in read performance.

Top 5 Questions from V-locity and Diskeeper Customers

by Brian Morin 20. April 2016 05:00

After having chatted with 50+ customers over the last three months, I’ve heard the same five questions enough times to turn them into a blog entry, and a lot of them have to do with flash:

 

1. Do Condusiv products still “defrag” like in the old days of Diskeeper?

No. Although users can use Diskeeper to manually defrag if they so choose, the core engines in Diskeeper and V-locity have nothing to do with defragmentation or physical disk management. The patented IntelliWrite® engine inside Diskeeper and V-locity adds a layer of intelligence into the Windows operating system, enabling it to improve the sequential nature of I/O traffic with large contiguous writes and subsequent reads, which benefits performance on both SSDs and HDDs. Since I/O is being streamlined at the point of origin, fragmentation is proactively prevented from ever becoming an issue in the first place. Although SSDs should never be “defragged,” fragmentation prevention has enormous benefits. This means processing a single I/O to read or write a 64KB file instead of needing several I/Os. This alleviates IOPS inflation of workloads to SSDs and cuts down on the number of erase cycles required to write any given file, improving write performance and extending flash reliability.

 

2. Why is it more important to solve Windows write inefficiencies in virtual environments regardless of flash or spindles on the backend? 

Windows write inefficiencies are a problem in physical environments but an even bigger problem in virtual environments, because multiple instances of the OS sit on the same host, creating a bottleneck or choke point that all I/O must funnel through. It’s bad enough if one virtual server is being taxed by Windows write inefficiencies and sending down twice as many I/O requests as it should to process any given workload…now amplify that same problem across all the VMs on the same host and there ends up being a tsunami of unnecessary I/O overwhelming the host and underlying storage subsystem. The performance penalty of all this unnecessary I/O is further exacerbated by the “I/O Blender” that mixes and randomizes the I/O streams from all the VMs at the point of the hypervisor before sending a very random pattern out to storage, the exact type of pattern that chokes flash performance the most: random writes. V-locity’s IntelliWrite® engine writes files in a contiguous manner, which significantly reduces the amount of I/O required to write/read any given file. In addition, IntelliMemory® caches reads from available DRAM. With both engines reducing I/O to storage, the usual requirement from storage to process 1GB via 80K I/O drops to 60K I/O at a minimum, but often down to 50K I/O or 40K I/O. This is why the typical V-locity customer sees anywhere from 50-100% more throughput regardless of flash or spindles on the backend: all the optimization is occurring where I/O originates.
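
The I/O counts quoted above translate directly into average I/O size and data carried per operation. The arithmetic below is only a sanity check of those figures, assuming 1GB means 1024 × 1024 KB and "80K I/O" means 80,000 operations; the helper names are hypothetical.

```python
GB_IN_KB = 1024 * 1024  # assumption: binary gigabyte


def avg_io_size_kb(total_kb: float, io_count: int) -> float:
    """Average payload per I/O operation in KB."""
    return total_kb / io_count


def io_reduction_pct(baseline: int, reduced: int) -> float:
    """Percentage of I/O operations eliminated."""
    return 100.0 * (baseline - reduced) / baseline


def data_per_io_gain_pct(baseline: int, reduced: int) -> float:
    """Percentage more data carried by each remaining I/O operation."""
    return 100.0 * (baseline / reduced - 1.0)


BASELINE = 80_000
for reduced in (60_000, 50_000, 40_000):
    print(f"{reduced:>6} I/Os: "
          f"avg size {avg_io_size_kb(GB_IN_KB, reduced):.1f} KB "
          f"(was {avg_io_size_kb(GB_IN_KB, BASELINE):.1f} KB), "
          f"{io_reduction_pct(BASELINE, reduced):.1f}% fewer I/Os, "
          f"{data_per_io_gain_pct(BASELINE, reduced):.0f}% more data per I/O")
```

At a fixed IOPS budget, more data per I/O is what shows up as more throughput: going from 80K to 40K I/O per GB doubles the payload of each operation, while 80K to 60K yields about a third more.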

VMware’s own “vSphere Monitoring and Performance Guide” calls for “defragmentation of the file system on all guests” as its top performance best practice tip behind adding more memory. When it comes to V-locity, nothing ever has to be “defragged” since fragmentation is proactively eliminated from ever becoming a problem in the first place.

 

3. How does V-locity help with flash storage?

One of the most common misconceptions is that V-locity is the perfect complement to spindles, but not for flash. That misconception couldn’t be further from the truth. The fact is, most V-locity customers run V-locity on top of a hybrid (flash & spindles) array or all-flash array. And this is because without V-locity, the underlying storage subsystem has to process at least 35% more I/O than necessary to process any given workload.

As much as virtualization has been great for server efficiency, the one downside is the complexity introduced to the data path, resulting in I/O characteristics that are much smaller, more fractured, and more random than they need to be. This means flash storage systems are processing workloads 30-50% slower than they should because performance is suffering death by a thousand cuts from all this small, random I/O that inflates IOPS and chews up throughput. V-locity streamlines I/O to be much more efficient, so twice as much data can be carried with each I/O operation. This significantly improves flash write performance and extends flash reliability with reduced erase cycles. In addition, V-locity establishes a tier-0 caching strategy using idle, available DRAM to cache reads. As little as 3GB of available memory drives an average of 40% reduction in response time (see source). By optimizing writes and reads, V-locity drives down the amount of I/O required to process any given workload. Instead of needing 80K I/O to process a GB of data, users typically only need 50K I/O or sometimes even less.

For more on how V-locity complements hybrid storage or all-flash storage, listen to the following OnDemand Webinar I did with a flash storage vendor (Nimble) and a mutual customer who uses hybrid storage + V-locity for a best-of-breed approach for I/O performance.

 

4. Is V-locity’s DRAM caching engine starving my applications of precious memory by caching? 

No. V-locity dynamically uses what Windows sees as available and throttles back if an application requires more memory, ensuring there is never an issue of resource contention or memory starvation. V-locity even keeps a buffer so there is never a latency issue in serving back memory. ESG Labs examined the last 3,500 VMs that tested V-locity and noted a 40% average reduction in response time (see source). This technology has been battle-tested over 5 years across millions of licenses with some of the largest OEMs in the industry.

 

5. What is the difference between V-locity and Diskeeper? 

Diskeeper is for physical servers while V-locity is for virtual servers. Diskeeper is priced per OS instance while V-locity is now priced per host, meaning V-locity can be installed on any number of virtual servers on that host. Diskeeper Professional is for physical clients. The main feature difference is that, whereas Diskeeper keeps physical servers or clients running like new, V-locity accelerates applications by 50-300%. While both Diskeeper and V-locity solve Windows write inefficiencies at the point of origin where I/O is created, V-locity goes a step beyond by caching reads via idle, available DRAM for 50-300% faster application performance. Diskeeper customers who have virtualized can opt to convert their Diskeeper licenses to V-locity licenses to drive value to their virtualized infrastructure.

 

Stay tuned for the next major release of Diskeeper, coming soon, which may inherit similar functionality from V-locity.

Largest-Ever I/O Performance Study

by Brian Morin 28. January 2016 09:10

Over the last year, 2,654 IT Professionals took our industry-first I/O Performance Survey, which makes it the largest I/O performance survey of its kind. The key findings from the survey reveal an I/O performance struggle for virtualized organizations, as 77% of all respondents indicated I/O performance issues after virtualizing. The full 17-page report is available for download at http://learn.condusiv.com/2015survey.html.

Key findings in the survey include:

- More than 1/3rd of respondents (36%) are currently experiencing staff or customer complaints regarding sluggish applications running on MS SQL or Oracle

- Nearly 1/3rd of respondents (28%) are so limited by I/O bottlenecks that they have reached an "I/O ceiling" and are unable to scale their virtualized infrastructure

- To improve I/O performance since virtualizing, 51% purchased a new SAN, 8% purchased PCIe flash cards, 17% purchased server-side SSDs, 27% purchased storage-side SSDs, 16% purchased more SAS spindles, and 6% purchased a hyper-converged appliance

- In the coming year, to remediate I/O bottlenecks, 25% plan to purchase a new SAN, 8% plan to purchase a hyper-converged appliance, 10% will purchase SAS spindles, 16% will purchase server-side SSDs, 8% will purchase PCIe flash cards, 27% will purchase storage-side SSDs, and 35% will purchase nothing in the coming year

- Over 1,000 applications were named when asked to identify the top two most challenging applications to support from a systems performance standpoint. Everything in the top 10 was an application running on top of a database

- 71% agree that improving the performance of one or two applications via inexpensive I/O reduction software to avoid a forklift upgrade is either important or urgent for their environment

As much as virtualization has provided cost savings and improved efficiency at the server level, those cost savings are typically traded off for backend storage infrastructure upgrades to handle the new IOPS requirements from virtualized workloads. This is due to I/O characteristics that are much smaller, more fractured, and more random than they need to be. The added complexity that virtualization introduces to the data path via the “I/O blender” effect that randomizes I/O from disparate VMs, together with the amplification of Windows write inefficiencies at the logical disk layer, erodes the relationship between I/O and data, generating a flood of small, fractured I/O. This compounding effect between the I/O blender and Windows write inefficiencies creates “death by a thousand cuts” for system performance, producing the perfect trifecta for poor performance: small, fractured, random I/O.

Since native virtualization out of the box does nothing to solve this problem, organizations are left with little choice but to accept the loss of throughput from these inefficiencies and to overbuy and overprovision for performance from an IOPS standpoint, since they are twice as IOPS-dependent as they actually need to be…except for Condusiv customers who are using V-locity® I/O reduction software to see 50-300% faster application performance on the hardware they already have by solving this root cause problem at the VM OS-layer.

Note - Respondents from companies with employee sizes under 100 employees were excluded from the results, so results would not be skewed by the low end of the SMB market.
