Condusiv Technologies Blog

Blogging @Condusiv

The Condusiv blog shares insight into the issues surrounding system and application performance—and how I/O optimization software is breaking new ground in solving those issues.

V-locity 6.0 Solves Death by a Thousand Cuts in Virtual Environments

by Brian Morin 12. August 2015 08:04

If you haven't already heard the pre-announcement buzz on V-locity® 6.0 I/O reduction software that made a splash in the press, it's being released in a couple of weeks. To understand why it's significant, and why it's an unprecedented 3X FASTER than its predecessor, you have to understand the biggest factor dampening application performance in virtual environments: increasingly small, fractured, and random I/O. That kind of I/O profile is akin to pouring molasses on compute and storage systems; processing it makes systems work much harder than necessary for any given workload. Virtualized organizations stymied by sluggish performance in their most I/O-intensive applications suffer in large part from a problem we call "death by a thousand cuts": I/O that is smaller, more fractured, and more random than it needs to be.

Organizations tend to overlook solving the problem and instead reactively mask it with more spindles, more flash, or a forklift storage upgrade. Unfortunately, this approach wastes much of any new investment in flash, since optimal performance is still being robbed by I/O inefficiencies at the Windows OS layer and at the hypervisor layer.

V-locity® version 6 has been built from the ground up to help organizations solve their toughest application performance challenges without new hardware. This is accomplished by optimizing the I/O profile for greater throughput while also caching the smallest, most random I/O in available DRAM to reduce latency and rid the infrastructure of the kind of I/O that penalizes performance the most.

Although much is made of V-locity's patented IntelliWrite® engine, which increases I/O density and sequentializes writes, special attention went into V-locity's DRAM read caching engine (IntelliMemory®), which is 3X more efficient in version 6 thanks to changes in the behavioral analytics engine that focus on "caching effectiveness" instead of "cache hits."

Leveraging available server-side DRAM for caching is very different from leveraging a dedicated flash resource, whether PCI-e or SSD, for cache. Although DRAM isn't capacity-intensive, it is orders of magnitude faster than a PCI-e or SSD cache sitting below it, which makes it the ideal first caching tier in the infrastructure. The trick is knowing how to best use a capacity-limited but blazing-fast storage medium.

Commodity algorithms that simply look at characteristics like access frequency might work for capacity-intensive caches, but they don't work for DRAM. V-locity 6.0 determines the best use of DRAM for caching by collecting a wide range of data points (storage access, frequency, I/O priority, process priority, type of I/O, nature of I/O (sequential or random), time between I/Os), then leverages its analytics engine to identify which storage blocks will benefit the most from caching, which also reduces "cache churn," the repeated recycling of cache blocks. By prioritizing the smallest, most random I/O to be served from DRAM, V-locity keeps the most performance-robbing I/O from traversing the infrastructure. Administrators don't need to carve out precious DRAM for caching, as V-locity dynamically leverages whatever is available. With a mere 4GB of RAM per VM, we've seen gains from 50% to well over 600%, depending on the I/O profile.
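
The exact analytics are Condusiv's own, but the general idea can be shown with a toy scoring heuristic in Python. Everything here (the field names, weights, threshold-free ranking, and sample blocks) is invented purely for illustration; it is not V-locity's actual algorithm:

# Toy illustration of a cache-admission heuristic that favors small, random I/O.
# All weights and fields are invented for illustration; the real analytics engine
# is proprietary and considerably more sophisticated.

def cache_score(io):
    """Score a block's recent I/O history; higher = better DRAM-cache candidate."""
    score = 0.0
    score += io["accesses_per_min"] * 1.0                 # access frequency still matters...
    score += (64 - min(io["avg_size_kb"], 64)) * 0.5      # ...but small I/O is weighted up
    score += 10.0 if io["pattern"] == "random" else 0.0   # random I/O hurts spindles most
    score -= io["seconds_since_last_access"] * 0.1        # stale blocks age out (less churn)
    return score

candidates = [
    {"block": "A", "accesses_per_min": 40, "avg_size_kb": 4,  "pattern": "random",     "seconds_since_last_access": 5},
    {"block": "B", "accesses_per_min": 60, "avg_size_kb": 64, "pattern": "sequential", "seconds_since_last_access": 2},
]
best = max(candidates, key=cache_score)
print("cache block", best["block"])   # the small, random 4K block wins despite fewer accesses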

With V-locity 5, we examined data from 2,576 systems that tested V-locity and shared their before/after data with Condusiv servers. From that raw data, we verified that 43% of all systems experienced a greater than 50% reduction in read latency due to IntelliMemory. That's a significant number in its own right for simply using available DRAM, and we can't wait to see how much further it jumps for our customers with V-locity 6.

Internal Iometer tests reveal that the latest version of IntelliMemory in V-locity 6.0 is 3.6X faster when processing 4K blocks and 2.0X faster when processing 64K blocks.

Jim Miller, Senior Analyst, Enterprise Management Associates had this to say, "V-locity version 6.0 makes a very compelling argument for server-side DRAM caching by targeting small, random I/O - the culprit that dampens performance the most. This approach helps organizations improve business productivity by better utilizing the available DRAM they already have. However, considering the price evolution of DRAM, its speed, and proximity to the processor, some organizations may want to add additional memory for caching if they have data sets hungry for otherworldly performance gains."

Finally, one of our customers, Rich Reitenauer, Manager of Infrastructure Management and Support, Alvernia University, had this to say, "Typical IT administrators respond to application performance issues by reactively throwing more expensive server and storage hardware at them, without understanding what the real problem is. Higher education budgets can't afford that kind of brute-force approach. By trying V-locity I/O reduction software first, we were able to double the performance of our LMS app sitting on SQL, stop all complaints about performance, stop the application from timing out on students, and avoid an expensive forklift hardware upgrade."

For more on the I/O inefficiencies that V-locity solves, read Storage Switzerland's briefing on V-locity 6.0.

Inside SSDs 101

by Michael 31. December 2010 06:08

We have numerous partners and alliances in the solid state drive (SSD) space that we interact with regularly. Conversations in meetings with those allies continue to revolve around the same issue: overcoming performance bottlenecks at the storage level. In attacking this problem to get better performance for things like boot times and application load times, the industry has turned to flash memory, otherwise referred to as SSDs (we'll also be announcing a brand new SSD technology in the next few weeks).

The following may be well known to those highly knowledgeable about SSDs, but hopefully it helps others less versed in their design.

High-end SSDs have proven to yield some very impressive read speeds, well over double those of a typical SATA hard disk drive in some cases.

Here are some example published speeds from a few manufacturers/models:

Seagate 7200.12 HDD 500GB, 750GB, and 1TB family

Read/Write speeds (outer tracks, empty drive): 125MB/sec sustained, 160MB/sec peak

Intel X25-M 80GB SSD (MLC)

Sequential Access - Read: up to 250MB/s

Sequential Access - Write: up to 70MB/s

Intel X25-M 120GB SSD (MLC)

Sequential Access - Read: up to 250MB/s

Sequential Access - Write: up to 100MB/s

Intel X25-E 32GB SSD (SLC)

Sequential Access - Read: up to 250 MB/s

Sequential Access - Write: up to 170 MB/s

One of the main reasons for such fast read times is that an SSD has no "seek time" when finding and retrieving a piece of data, whereas a hard drive does. Simply put, a hard drive has to move a magnetic head, connected to an arm, over a track on a platter and then locate the requested data before it can read or write anything.

Now you have to admit, a hard drive does this quite well and very fast, considering the physics involved.

An SSD, on the other hand, reads data by sending an electrical pulse, which is much faster by comparison; going by the published numbers above, roughly double on higher-end SSDs. The lack of moving parts cuts access time down considerably.
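
A rough back-of-the-envelope comparison makes the gap concrete. The latency figures below are ballpark assumptions for a 7200 RPM drive and a typical flash read, not measurements of any particular device:

# Back-of-the-envelope random-read comparison (all latencies are ballpark assumptions).
hdd_latency_s = 0.0085 + 0.0042   # ~8.5 ms average seek + ~4.2 ms rotational delay (7200 RPM)
ssd_latency_s = 0.000075          # ~75 microseconds for a flash read, controller included

reads = 10_000                    # ten thousand random 4K reads
print(f"HDD: {reads * hdd_latency_s:.1f} s")   # ~127 s
print(f"SSD: {reads * ssd_latency_s:.1f} s")   # ~0.8 s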

Now, writing data to SSDs is a whole other story, one that leads us down a bit of a rabbit hole, so to speak, and is the main subject of this post.

SSD DNA

To start with, let’s look at what an SSD is:

Note, I have borrowed some photos and descriptions from the following site: www.popsci.com/category/tags/ssd.

First, you simply have a small piece of silicon with a whole lot of transistors on it, which look like the following:

[Image: a row of flash transistors; the charged one is marked with an "e" for electrons]

Each transistor is 1,000 times thinner than a human hair. In essence, each transistor either holds an electrical charge or it doesn't. In the case of SSDs, a charged transistor represents the value 0 and an uncharged one represents the value 1. That is the extent of it.

In the photo above, the charged transistor has a circle around the "e," which stands for the electrons that make up the charge.

Now, to read what's inside these transistors, an electrical pulse is sent to them, and from the returned signal the controller can tell which are charged and which are not. All in all, this is a pretty fast operation. Even writing to a transistor for the first time is pretty fast, since it's empty, with no charge, to begin with.

BUT… what happens when you have to write to an area of the SSD that has already been written to? Herein lies the problem, and the unfortunate "Achilles heel" of SSDs. With a hard drive, you can simply flip the bit in place and call it a day. In fact, over the long run, this is a faster operation on a hard drive than it is on an SSD. And aside from the extra step it takes to erase an SSD, it gets a lot worse. But to understand this we need to look at how data is laid out on an SSD:

The smallest unit is a single transistor, also known as a cell. Imagine a dot that is the top view of a single transistor magnified a gazillion times.

This single transistor holds a single bit of data, i.e. a 1 or a 0. The next unit size up is called a page and holds 4KB of data. To put this in perspective, there are 8,192 bits in one kilobyte, so each page contains 32,768 transistors. A page is the smallest unit that can be written on the SSD. This means that even if the data you are writing is only 1,500 bits in size, it will use up the entire 4KB page and make it unusable for other data. You only get to write to it once until it's been erased for reuse.
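
The arithmetic is easy to verify, assuming one bit per transistor (SLC):

# Page geometry for a 4KB page, assuming one bit per transistor (SLC).
PAGE_BYTES = 4 * 1024            # 4 KB page
bits_per_page = PAGE_BYTES * 8   # bits = bytes * 8
print(bits_per_page)             # -> 32768 transistors per page (one bit each)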

In fact, to update the data within this page, the data has to be copied, updated, and rewritten to a new page, leaving the old page unusable until it's been erased. The controller has to earmark it for cleanup so it can be used again later.

Now, even though this page has been earmarked for erasure, it may not get erased for some time, because it's actually part of a bigger group of data called a "block." A block is a group of pages, as illustrated below:

[Image: an erase block made up of many pages, shown as small boxes]

The number of pages that make up a block varies from one SSD model to another and can be very large (even into the megabytes). On one SSD we have tested, the block size happens to be 128KB, which works out to a group of 32 pages. This data block (32 pages) is what's called an "erase block." An SSD only erases one block at a time. So, back to our example of the page with old data in it: theoretically, that page could sit around for a while before its block is erased by the SSD. It's plausible that this could raise data security issues in some cases, but that's a subject for research and testing.
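
For that particular drive the geometry works out as follows (other models use different, often larger, block sizes):

# Erase-block geometry for the SSD mentioned above (sizes vary by model).
PAGE_KB = 4
BLOCK_KB = 128                       # erase block on the tested drive
pages_per_block = BLOCK_KB // PAGE_KB
print(pages_per_block)               # -> 32 pages erased together as one unit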

Now, when data in a page is updated, as discussed earlier, it has to be moved to another location, unbeknownst to the file system; the controller on the device internally maps the move to keep track of where the data now lives.

There is obviously overhead associated with this. So a page, for argument's sake, can be in one of three states (a rough sketch of this bookkeeping follows the list below):

Ready to be used (it has been erased, so all transistors are set to 1), as in the clear little boxes above;

Used -- as in the blue boxes above;

Dirty (containing invalid data that needs to be erased) as in the black boxes above.  
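
Here is a deliberately oversimplified sketch of that bookkeeping. Real flash translation layers are far more involved; this only shows the out-of-place update and the ready/used/dirty life cycle:

# Grossly simplified flash-translation-layer sketch: out-of-place updates and
# the ready / used / dirty page life cycle. Real controllers are far more complex.
READY, USED, DIRTY = "ready", "used", "dirty"

class TinyFTL:
    def __init__(self, num_pages):
        self.state = [READY] * num_pages   # physical page states
        self.mapping = {}                  # logical page -> physical page

    def write(self, logical):
        new = self.state.index(READY)      # pick any erased page
        old = self.mapping.get(logical)
        if old is not None:
            self.state[old] = DIRTY        # old copy is now invalid, awaiting erase
        self.state[new] = USED
        self.mapping[logical] = new        # the file system never sees the move

ftl = TinyFTL(num_pages=8)
ftl.write(0)        # first write lands on an erased page
ftl.write(0)        # "update" actually goes to a new page; the old one is dirty
print(ftl.state)    # ['dirty', 'used', 'ready', 'ready', ...]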

The overhead in handling dirty data is huge, and the process of handling it is referred to as garbage collection. Garbage collection moves good data out of areas containing old data so that the old data can be erased and the pages reclaimed for writing again.

All of this garbage collection generates extra internal writes, an effect called "write amplification."

This is the disease SSDs are plagued with, and it kills their write performance, particularly as the device fills up. It's also what shortens the device's life span.
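
Write amplification is usually expressed as a simple ratio of flash writes to host writes. The numbers below are purely illustrative:

# Write amplification factor: flash writes actually performed / writes the host asked for.
host_writes_gb = 1.0       # what the application/file system wrote
flash_writes_gb = 2.5      # what the controller really wrote after GC and remapping (illustrative)
wa = flash_writes_gb / host_writes_gb
print(f"write amplification = {wa:.1f}x")   # 2.5x: the flash wears 2.5x faster than the host "paid" for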

The following is a definition or description from Wikipedia that I think is pretty good:

“Write amplification (WA) is a phenomenon associated with Flash memory and solid-state drives (SSDs). Because Flash memory must be erased before it can be rewritten, the process to perform these operations results in moving (or rewriting) user data and metadata more than once. This multiplying effect increases the number of writes required over the life of the SSD which shortens the time it can reliably operate. The increased writes also consume bandwidth to the Flash memory which mainly reduces random write performance to the SSD. Many factors will affect the write amplification of an SSD, some can be controlled by the user and some are a direct result of the data written to and usage of the SSD.”

Now, there is an interesting point in the above description: the "increased writes" consume bandwidth to the flash memory, which could in some cases degrade read performance by getting in the way of read operations.

Now, I don't state this as fact; I'm merely speculating about whether reads are affected.

The write overhead required by the Windows NTFS file system to do just one I/O could be considered extreme from the SSD's point of view. Creating and writing one tiny piece of information, such as a Notepad doc with a single number in it, requires updating the MFT with a record of the operation, updating a directory file, and updating any other metadata files, such as journal files, that are keeping track of operations at the time. The point is, for every one write of user data, several more writes occur just to keep track of it.
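
As a purely hypothetical tally (the exact set of writes depends on the volume, caching, and what NTFS batches together), one tiny Notepad save might fan out something like this:

# Hypothetical tally of the writes behind one tiny Notepad save on NTFS.
# Illustrative only; the real set depends on the volume and what NTFS batches.
writes = {
    "user data (the actual file contents)": 1,
    "MFT record update": 1,
    "directory index update": 1,
    "$LogFile journal record": 1,
    "$UsnJrnl change-journal record": 1,   # if the change journal is enabled
}
print(f"{sum(writes.values())} writes to store 1 piece of user data")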

Current file systems were designed around hard disk drive characteristics, not SSDs. From the SSD's point of view, NTFS writes with wild abandon. This puts a lot of data-mapping and housekeeping overhead on the SSD controller that hard drives don't have to worry about much; a hard drive typically only has to remap data when it has a bad sector, for example.

NTFS may decide, for example, that a file should be split in two (split I/Os) because it believes there is no contiguous free space for the file being updated. Yet the SSD may have, during its garbage collection process, created space or remapped data clusters; the operating system doesn't know this, and vice versa.

The TRIM functionality in Windows 7 is supposed to help SSDs, but it's far from a panacea for the write amplification issue.

Different types of SSD

SLC stands for “single level cell”

MLC stands for “multi level cell”

TLC stands for "tender loving care." Okay, not really; just checking to make sure you are paying attention. It really stands for "tri level cell." Really.

SLC is faster than MLC and TLC. 

The design of an MLC cell and an SLC cell is pretty much the same. The difference is that MLC can store more than one value in a single transistor (referred to as a cell) by layering the data within the cell. Typically, two or more bits can be placed in a single cell with MLC, versus one bit with SLC.

So MLC can hold twice as much data as SLC. That's the plus side. The downside is that reading and writing a single MLC cell has to be very precise and carries a lot of logic overhead, so MLC ends up slower than SLC because of the precision required to determine which of the multiple values a single cell holds. The life span of MLC is also roughly 10x shorter than SLC. The following is a great white paper that describes the differences, and how voltages are used to read the values, very well: http://www.supertalent.com/datasheets/SLC_vs_MLC%20whitepaper.pdf
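
The relationship between bits per cell and the number of voltage levels the controller has to distinguish is simple, and it explains both the capacity gain and the precision penalty:

# Bits per cell vs. voltage levels the controller must tell apart.
for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3)]:
    levels = 2 ** bits
    print(f"{name}: {bits} bit(s)/cell -> {levels} voltage levels, {bits}x the capacity per cell")
# Twice (or three times) the data per cell, but 4 or 8 charge levels to sense
# precisely instead of 2, hence slower writes and lower endurance.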

The difference between TLC and MLC is NOT more transistors. The "L" stands for "level," referring to the voltage levels within a transistor, not to multiple layers or additional transistors. Again, the link above, along with a few other sites, lays this out fairly well.

The difference between one SSD and another is not so much the flash chip itself but the supporting controller and its design. This includes multiple data channels, on-board RAID, and other fancy and expensive solutions. The other differentiator is the software within the controller that manages all of the data mapping, data movement, and so on. Sounds like the age-old fragmentation problem to me again, just at a slightly different level.
