Condusiv Technologies Blog

Blogging @Condusiv

The Condusiv blog shares insight into the issues surrounding system and application performance—and how I/O optimization software is breaking new ground in solving those issues.

Inside SSDs 101

by Michael 31. December 2010 06:08

We have numerous partners and alliances in the solid state drive (SSD) space that we interact with regularly. Conversations in meetings with those allies continue to revolve around the same issue: overcoming performance bottlenecks at the storage level. In attacking this problem to get higher performance for things like boot times, application load times, etc., the industry has turned to flash memory, otherwise referred to as SSDs (we'll also be announcing a brand new SSD technology in the next few weeks).

The following may be well known to those highly knowledgeable in SSDs, but hopefully it helps others less versed in their design.

High-end SSDs have proven to yield some very impressive read times, well over double that of a typical SATA hard disk drive in some cases.

Here are some example published speeds from a few manufacturers/models:

Seagate 7200.12 HDD (500GB, 750GB, and 1TB family): read/write 125MB/s sustained, 160MB/s peak (outer tracks, empty drive)

Intel X25-M 80GB SSD (MLC): sequential read up to 250MB/s, sequential write up to 70MB/s

Intel X25-M 120GB SSD (MLC): sequential read up to 250MB/s, sequential write up to 100MB/s

Intel X25-E 32GB SSD (SLC): sequential read up to 250MB/s, sequential write up to 170MB/s
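
As a quick sanity check on the "over double" claim, here is a trivial comparison of the published sequential read figures above (a Python sketch, using only the vendor numbers quoted in this post):

```python
# Published sequential read speeds quoted above (MB/s)
hdd_read = 125   # Seagate 7200.12, sustained, outer tracks
ssd_read = 250   # Intel X25-M / X25-E, sequential read

print(f"SSD vs. HDD sequential read: {ssd_read / hdd_read:.1f}x")  # 2.0x
```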

One of the main reasons for such fast read times is the lack of "seek time": an SSD doesn't have to physically find and retrieve a piece of data the way a hard drive does. Simply put, a hard drive has to move a magnetic head connected to an arm over a track on a platter and then, through various means, find the data requested and read or write something.

Now you have to admit, a hard drive does this quite well and very fast, considering the physics involved.

On the other hand, an SSD sends an electrical pulse to read the data, which is much faster in comparison; roughly double on higher-end SSDs. The lack of moving parts cuts the time down considerably.

Now, writing data to SSDs is a whole other story, one that leads us down a bit of a rabbit hole, so to speak, and it is the main subject of this blog.

SSD DNA

To start with, let’s look at what an SSD is:

Note, I have borrowed some photos and descriptions from the following site: www.popsci.com/category/tags/ssd.

First, you simply have a small piece of silicon with a whole lot of transistors, which look like the following:

[Image: flash memory transistors; the charged transistor is marked with an "e" for electrons]

Each transistor is 1,000 times thinner than a human hair. In essence, each transistor either holds an electrical charge or it doesn't. In the case of SSDs, a transistor that is charged equals the value 0, and one that is not charged equals the value 1. That is the extent of it.

In the above photo the transistor that is charged has a circle around the “e” which stands for electrons representing the charge.   

Now, to read what's inside these transistors, an electrical pulse is sent to them, and by reading the signal that comes back the controller can tell which are charged and which are not. All in all, this is a pretty fast operation. Even writing to a transistor for the first time is pretty fast, as it's empty, with no charge to begin with.
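
To make that convention concrete, here is a minimal sketch (Python, purely illustrative; the function name and eight-cell row are my own invention, not any vendor's firmware) of how a row of cells translates charge into bits, with an erased, uncharged cell reading as 1 and a charged cell reading as 0:

```python
def read_cells(charged):
    """Translate a row of charge states into the bits they represent."""
    # By the convention described above: charged -> 0, no charge -> 1
    return [0 if has_charge else 1 for has_charge in charged]

# A freshly erased row holds no charge anywhere, so it reads as all 1s.
print(read_cells([False] * 8))                    # [1, 1, 1, 1, 1, 1, 1, 1]

# "Programming" the row means charging some cells, flipping those bits to 0.
print(read_cells([True, False, True, True, False, False, False, True]))
# -> [0, 1, 0, 0, 1, 1, 1, 0]
```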

BUT... what happens when you have to write to an area of the SSD that has already been written to? Herein lies the problem, and the unfortunate "Achilles heel" of SSDs. With a hard drive you can simply flip the bit in place and call it a day. In fact, over the long run, this is a faster operation on a hard drive than it is on an SSD. And aside from the extra step it takes to erase an SSD, it gets a lot worse. But in order to understand this we need to look at how data is laid out on an SSD:

The smallest unit is a single transistor, also known as a cell. Imagine the dot shown is a top view of a single transistor magnified a gazillion times.

This single transistor holds a single bit of data, i.e. a 1 or a 0. The next unit size up is called a page, which holds 4KB of data. To put this in perspective, there are 8,192 bits in one kilobyte, so each page contains 32,768 transistors. A page is the smallest unit that can be written to on the SSD. This means that even if the data you are writing is only 1,500 bits in size, it will use up the entire 4KB page and make it unusable for writing other data. You only get to write to it once until it's been erased for reuse.

In fact, to update the data within this page, the data has to be copied first, updated, and rewritten to a new page, leaving the old page unusable until it's been erased. The controller has to earmark it for cleanup so it can be used again later.
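
Here is a rough sketch of that page-level behavior (Python; the class and sizes are assumptions for illustration, not any controller's actual logic). Even a tiny write consumes a whole 4KB page, and an update has to land in a fresh page while the old one is only marked dirty:

```python
PAGE_SIZE = 4096  # bytes; the smallest writable unit in this example

class Page:
    def __init__(self):
        self.state = "erased"   # erased -> used -> dirty
        self.data = None

def write(pages, data):
    """Write data into the first erased page; a page is written only once."""
    for i, page in enumerate(pages):
        if page.state == "erased":
            page.state = "used"
            page.data = data            # even 1,500 bits still claims the full 4KB page
            return i
    raise RuntimeError("no erased pages left")

def update(pages, index, new_data):
    """No in-place update: copy the data to a new page, mark the old one dirty."""
    new_index = write(pages, new_data)
    pages[index].state = "dirty"        # unusable until its block is erased
    return new_index

pages = [Page() for _ in range(8)]
first = write(pages, b"a few hundred bytes of user data")   # occupies a full page
second = update(pages, first, b"the updated version")       # lands in a new page
print([p.state for p in pages])   # ['dirty', 'used', 'erased', 'erased', ...]
```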

Now, even though this page has been earmarked for erasure, it may not get erased for some time, because it's actually part of a bigger group of data called a "block". A block is a group of pages, as illustrated below:

[Image: pages grouped into a block]

The number of pages that make up a block may vary from one SSD model to another and can be very large (into the megabytes). On one SSD we have tested, the block size happens to be 128KB, which would be a group of 32 pages. This data block (32 pages) is actually what's called an "erase block". An SSD only erases one data "block" at a time. So, back to our example of the page with old data in it: theoretically, this page could sit around a while before its block of data is erased by the SSD. It's plausible this could raise data security issues in some cases, but that's a subject for research and testing.
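
To put those numbers in perspective, a quick back-of-the-envelope check (Python, using the 4KB page and 128KB erase block figures mentioned above):

```python
PAGE_SIZE = 4 * 1024            # 4KB per page
ERASE_BLOCK_SIZE = 128 * 1024   # 128KB erase block on the SSD we tested

print(ERASE_BLOCK_SIZE // PAGE_SIZE)   # 32 pages per erase block

# Erasing is block-granular: a single dirty page cannot be reclaimed on its
# own. All 32 pages in its block are erased together, which is why stale
# data can linger until the whole block finally gets cleaned up.
```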

Now, when data is updated in a page, as discussed earlier, it has to be moved to another location, unbeknownst to the file system, and internally mapped by the controller on the device to keep track of it, as illustrated below:

There is obviously overhead associated with this. So a page, for argument's sake, can be in one of three states:

Ready to be used (it has been erased, so all transistors are set to 1), as in the clear little boxes above;

Used, as in the blue boxes above;

Dirty (containing invalid data that needs to be erased), as in the black boxes above.

The overhead in handling dirty data is huge, and it is referred to as garbage collection. Garbage collection is the process of moving good data out of areas where old data exists, so the old data can be erased and the pages reclaimed for writing again.

Doing a lot of this garbage collection generates a lot of extra write activity, known as "write amplification".

This is the disease that SSDs are plagued with; it kills their write performance, particularly as the device fills up. It's also what shortens the lifespan of the device.
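
A very simplified sketch of that cost (Python; the counts are invented purely for illustration): to reclaim a block, the controller must first copy the block's still-valid pages elsewhere, so the flash absorbs more page writes than the host ever issued. The ratio between the two is the write amplification.

```python
def pages_moved_to_reclaim(valid_pages_in_block):
    """Extra page writes needed before one block can be erased."""
    # Every still-valid page must be copied (rewritten) somewhere else first.
    return valid_pages_in_block

host_page_writes = 1000   # pages of new data the OS actually asked to write

# Suppose garbage collection reclaimed 50 blocks, each still holding
# 20 valid pages out of 32 (made-up numbers).
gc_page_writes = sum(pages_moved_to_reclaim(20) for _ in range(50))

flash_page_writes = host_page_writes + gc_page_writes
print(f"write amplification ~ {flash_page_writes / host_page_writes:.2f}")  # ~ 2.00
```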

The following is a definition or description from Wikipedia that I think is pretty good:

“Write amplification (WA) is a phenomenon associated with Flash memory and solid-state drives (SSDs). Because Flash memory must be erased before it can be rewritten, the process to perform these operations results in moving (or rewriting) user data and metadata more than once. This multiplying effect increases the number of writes required over the life of the SSD which shortens the time it can reliably operate. The increased writes also consume bandwidth to the Flash memory which mainly reduces random write performance to the SSD. Many factors will affect the write amplification of an SSD, some can be controlled by the user and some are a direct result of the data written to and usage of the SSD.”

Now, there is a comment made in the above description that is interesting and could mean read performance gets degraded in some cases: the "increased writes" consume bandwidth to the Flash memory, potentially interrupting a read operation.

I don't state this as a fact; I'm merely postulating whether or not reads are affected.

The overhead in writes required by the Windows NTFS file system to do just one I/O could be considered extreme from the SSD's point of view. Creating and writing one tiny piece of information, such as a Notepad document with a single number in it, requires an update to the MFT (creating a record of the operation), an update to a directory file, and updates to any other metadata files, such as journal files, that are keeping track of operations at the time. The point is, for every one write of user data there are several writes occurring to keep track of it.
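
As a rough illustration of that point (the byte counts below are assumptions chosen for easy arithmetic, not measured NTFS figures, though the 1KB MFT record size is the NTFS default), even a one-byte user write can drag several metadata writes along with it, and since the SSD commits whole 4KB pages the multiplication compounds:

```python
PAGE_SIZE = 4096  # bytes per SSD page

# Hypothetical writes triggered by creating one tiny text file:
writes_bytes = {
    "user data (the file itself)": 1,       # a single character of payload
    "MFT record update": 1024,              # NTFS MFT records default to 1KB
    "directory index update": 512,          # assumed size
    "journal entries ($LogFile/USN)": 600,  # assumed size
}

# Each separate write still lands on at least one full flash page.
pages_touched = sum(-(-size // PAGE_SIZE) for size in writes_bytes.values())
print(f"pages written: {pages_touched}")                          # 4
print(f"flash bytes per user byte: {pages_touched * PAGE_SIZE}")  # 16384
```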

Current file systems were designed around hard disk drive characteristics, not SSDs. From the SSD's point of view, NTFS writes with wild abandon. This puts a lot of overhead on the SSD controller, with a lot of data mapping and housekeeping at the controller level that hard drives don't have to worry about much. A hard drive typically only has to re-map data when it has a bad sector, as an example.

The NTFS file system may, for example, decide that a file should be split in two (split I/Os) because it thinks it doesn't have contiguous free space for the file that is being updated. Yet the SSD may have, during its garbage collection process, created space or remapped data clusters; the operating system doesn't know this, and vice versa.

The TRIM functionality in Windows 7 is supposed to help SSDs, but it's far from being a panacea for the write amplification issue.

Different types of SSD

SLC stands for “single level cell”

MLC stands for “multi level cell”

TLC stands for "tender loving care". Okay, not really, just checking to make sure you are paying attention. It really stands for "tri-level cell"; really.

SLC is faster than MLC and TLC. 

The design of an MLC cell and an SLC cell is pretty much the same. The difference is that MLC is able to put more than one value in a single transistor (referred to as a cell) by layering the data within the cell. Typically two or more bits can be placed in a single cell with MLC, versus one bit with SLC.

So MLC is able to contain twice as much data as SLC. That's the plus side. The downside is that reading and writing data to a single MLC cell has to be very precise and carries a lot of overhead logic-wise, so MLC ends up slower than SLC due to the precision required to determine whether two values exist in a single cell and what those values are. Also, the life cycle of MLC is roughly 10x shorter than SLC's. The following is a great white paper that describes the differences very well, including how voltages are used to read the values, etc.: http://www.supertalent.com/datasheets/SLC_vs_MLC%20whitepaper.pdf

The difference between TLC and MLC is NOT more transistors. The "L" stands for "level", referring to the voltage level in a transistor, not multiple levels or numbers of transistors. Again, the above link, along with a few other sites, lays this out fairly well.
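
The relationship between bits per cell and the number of distinct charge levels a cell must reliably hold is simple, and a short sketch shows why MLC and TLC demand so much more precision from the same physical transistor:

```python
# Storing n bits in one cell requires distinguishing 2**n charge levels.
# More levels means tighter voltage tolerances, slower and more error-prone
# reads/writes, and fewer program/erase cycles before the cell wears out.
for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3)]:
    print(f"{name}: {bits} bit(s) per cell -> {2 ** bits} charge levels")

# SLC: 1 bit(s) per cell -> 2 charge levels
# MLC: 2 bit(s) per cell -> 4 charge levels
# TLC: 3 bit(s) per cell -> 8 charge levels
```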

The difference between one flash memory device or SSD and another is not so much the chip itself but the supporting controller and its design. This includes multiple data channels, on-board RAID, and other fancy and expensive solutions. The other differentiator is the software within the controller that manages all of the mapping and moving of data. Sounds like the age-old fragmentation problem to me again; just at a slightly different level.

New Diskeeper 2010 update (14.0.913)

by Michael 30. December 2010 10:37

 There's a new build of Diskeeper 2010 available that includes the following fixes:

  • Performance improvements in InvisiTasking related to 8+ core systems and eliminating unnecessary context switching.
  • Fixed I-FAAST measurement that would fail on some busy systems and consequently keep getting rescheduled.
  • Improved defragmentation process to avoid keeping extremely fragmented files open for long periods of time.
  • Fixed problem with occasionally incorrectly detecting the Volume Shadow Copy state.
  • Improved the SSD detection code to correctly detect certain types of SSDs.
  • Fixed an installation problem with the PATH variable longer than 1024 characters.


Diskeeper

Defragmenting IT Healthcare

by Michael 20. December 2010 05:18

Joe Marion is founder and Principal of Healthcare Integration Strategies, specializing in the integration of imaging technologies with the overall healthcare IT landscape. His blog (at Healthcare Informatics) covers challenges and opportunities specifically relevant to optimizing Healthcare IT initiatives.

Medical images are a significant percentage of the world's storage requirements and have been predicted to encompass an even greater percentage of future storage demand. In Joe's recent blog post he posed the question, "Is Defragmentation a Boon to Healthcare IT Performance?"

In his post he includes personal observations and insight into performance implications fragmentation can incur on IT as healthcare departments themselves consolidate and standardize application use:

"With departmental solutions, there very likely was less emphasis on system tools such as defragmentation applications.  Now that PACS technology is becoming more intertwined with the rest of IT, there should be greater emphasis on inclusion of these tools.  In addition, server virtualization can mean that previously independent applications are now part of a virtual server farm."

He also makes the astute observation that centralizing computing and storage magnifies bottlenecks, making a solution such as defragmentation increasingly more vital:

"The addition of disk-intensive applications such as speech recognition and imaging could potentially impact the overall performance of these applications.  As data storage requirements within healthcare grow, the problem will potentially get worse.  Think of the consequence of managing multiple 3000-slice CT studies and performing multiple 3D analyses.  As more advanced visualization applications go the client-server route, the performance of a central server doing the 3D processing could be significantly impacted."

You can read Joe's blog here.

  


Defrag | Diskeeper | IntelliWrite | V-Locity

Storage VMotion and GOS fragmentation

by Michael 3. December 2010 06:57

I had a test run here internally in order to make a point about what does, or more specifically "does not", happen when you VMotion/SVMotion a Windows Guest OS (GOS). We wanted to demonstrate that, while VMware is copying the VM to another host/storage, it does nothing about the internal fragmentation of files in Windows.

We felt this was a valuable demonstration as one of the old (1980s) ways to "fix" fragmentation was to copy off the files/backup, reformat the volume, and then copy back/restore. This offered a degree of success, but required taking the data offline in order to get rid of most of the fragmentation. On a side note, backing up/copying fragmented files takes a lot longer than it would on contiguous and ordered files.

Anyway, S/VMotion is such a cool feature because it works on live VMs. So, if the VMDK movement somehow did align/reorder files in Windows, it could be a great solution to Windows file system fragmentation! So here's how we tested...

1. Setup 2 ESX 4.1 Servers with iSCSI storage and vCenter with SVMotion capability.

2. Create a VM with Windows 7 on one of the ESX Server's storage (e.g., Storage1) with a 20 GB thin virtual disk.

3. Using an internal tool, create moderate fragmentation on the virtual disk (80k fragments, average fragments per file around 3.0, around 50% free space).

4. Install V-locity with all features (e.g. defrag, IntelliWrite, etc...) disabled. This is just so we can run a fragmentation analysis and save the reports.

5. Save the "Before SVMotion" analysis report, and then stop V-locity Windows Service (to make sure it is entirely inactive).

6. Using SVMotion, move the live VM to the other ESX Server's storage (e.g., Storage2).

7. Once the move is completed, restart the V-locity Windows Service and perform a post "After SVMotion" analysis.

8. Save this job report.

We saw what we expected, given VMotion leverages Changed Block Tracking (CBT) technology and is block based, not file based. I attached the report so you can see the side-by-side analysis data: files in Windows are not defragmented by an SVMotion. Now, that's not to say possible fragmentation of the VMDK files themselves (on VMFS datastores) was not affected, but that's a topic for another post.

Defrag software: What every little boy and girl wants under the tree...right?

by Michael 1. December 2010 06:05

Apparently some of the staff at our UK office have too much time on their hands, and filmed a fun holiday short.

 

Click on the images (above or below) to see Santa and a 6'7" elf and friends defragment X-Mas, load up the sleigh, and then celebrate with some kind of dance-type-thing (I don't know what else I would call it). 

Happy Holidays!


Diskeeper TV
