Condusiv Technologies Blog

Blogging @Condusiv

The Condusiv blog shares insight into the issues surrounding system and application performance—and how I/O optimization software is breaking new ground in solving those issues.

Do you need to defragment a Mac?

by Michael 2. February 2011 05:54

The purpose of this blog post is to provide some data about fragmentation on the Mac that I've not seen researched or published elsewhere.

Mac OS X has a defragmenter built into the file system itself. Because the relevant file system code is open source (it ships as part of Darwin), we took a look at it.

When a file is opened, it gets defragmented on the fly if all of the following conditions are met (a simplified sketch of this check follows the list):

1. The file is less than 20MB in size

2. The file has more than 7 fragments

3. The system has been up for more than 3 minutes

4. The file is a regular file

5. The file system is journaled

6. The file system is not read-only
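
For illustration, here is that same decision as a minimal Python sketch. It is not the actual HFS+ kernel code (which is written in C); the objects 'f' and 'fs' and their fields are hypothetical stand-ins for values the file system tracks internally.

    # A minimal sketch of the on-open defragmentation check described above.
    # NOT the actual HFS+ kernel code; 'f', 'fs', and their fields are
    # hypothetical stand-ins for values the file system tracks internally.
    MAX_SIZE_BYTES  = 20 * 1024 * 1024   # condition 1: under 20MB
    MIN_FRAGMENTS   = 8                  # condition 2: more than 7 fragments
    MIN_UPTIME_SECS = 3 * 60             # condition 3: up more than 3 minutes

    def defragment_on_open(f, fs, uptime_secs):
        """Return True if the file would be defragmented when opened."""
        return (f.size_bytes < MAX_SIZE_BYTES
                and f.fragment_count >= MIN_FRAGMENTS
                and uptime_secs > MIN_UPTIME_SECS
                and f.is_regular_file            # condition 4
                and fs.is_journaled              # condition 5
                and not fs.is_read_only)         # condition 6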

So what's Apple's take on the subject? An Apple technical article states this:

Do I need to optimize?

You probably won't need to optimize at all if you use Mac OS X. Here's why:

  • Hard disk capacity is generally much greater now than a few years ago. With more free space available, the file system doesn't need to fill up every "nook and cranny." Mac OS Extended formatting (HFS Plus) avoids reusing space from deleted files as much as possible, to avoid prematurely filling small areas of recently-freed space.
  • Mac OS X 10.2 and later includes delayed allocation for Mac OS X Extended-formatted volumes. This allows a number of small allocations to be combined into a single large allocation in one area of the disk.
  • Fragmentation was often caused by continually appending data to existing files, especially with resource forks. With faster hard drives and better caching, as well as the new application packaging format, many applications simply rewrite the entire file each time. Mac OS X 10.3 Panther can also automatically defragment such slow-growing files. This process is sometimes known as "Hot-File-Adaptive-Clustering."
  • Aggressive read-ahead and write-behind caching means that minor fragmentation has less effect on perceived system performance.

For these reasons, there is little benefit to defragmenting.

Note: Mac OS X systems use hundreds of thousands of small files, many of which are rarely accessed. Optimizing them can be a major effort for very little practical gain. There is also a chance that one of the files placed in the "hot band" for rapid reads during system startup might be moved during defragmentation, which would decrease performance.

If your disks are almost full, and you often modify or create large files (such as editing video, but see the Tip below if you use iMovie and Mac OS X 10.3), there's a chance the disks could be fragmented. In this case, you might benefit from defragmentation, which can be performed with some third-party disk utilities. 
 

Here is my take on that information:

While I have no problem with the lead-in, which says "probably," the reasons given are theoretical. Expressing a theory and then an opinion on that theory is fine, so long as you properly indicate it is an opinion. The problem I do have is with the last sentence before the note, "For these reasons, there is little benefit to defragmenting." Put more plainly: that is passing off theory as fact.

Theory, and therefore "reasons," needs to be substantiated by actual scientific processes that apply the theory and then either validate or invalidate it. Common examples we hear of theory-as-fact are statements like "SSDs don't have moving parts and don't need to be defragmented." Given that our primary business is large enterprise corporations, we hear a lot of theory about the need (or lack thereof) to defragment complex and expensive storage systems. In all those cases, testing proves that fragmentation (of files, free space, or both) slows computers down. The reasons sound logical, which dupes readers/listeners into believing the statements are true.

On that note, while the first three are logical, the last "reason" is most likely wrong. Block-based read-ahead caching is predicated on files being sequentially located/interleaved on the same disk "tracks". File-based read-ahead would still have to issue additional I/Os due to fragmentation. Fragmentation of data essentially breaks read-ahead efforts. Could the Mac be predicting file access and pre-loading files into memory well in advance of use? Sure. If that's the case, I could agree with the last point (i.e., "perceived system performance"), but I find it unlikely (anyone reading this is welcome to comment).

They do also qualify the reason by stating "minor fragmentation", to which I would add that minor fragmentation on Windows may not have a "perceived" impact either.

I do agree with the final statement that "you might benefit from defragmentation" when using large files, although I think "might" is too indecisive.

Where my opinion comes from:

A few years ago (spring/summer of 2009) we did a research project to understand how much fragmentation existed on Apple Macs. We wrote and sent out a fragmentation/performance analysis tool to select customers who also had Macs at their homes/businesses. We collected data from 198 volumes on 82 Macs (OS X 10.4.x and 10.5.x). Thirty of those systems had been in use for between one and two years.


While system specifics are confidential (testers provided us the data under non-disclosure agreements), we found that free space fragmentation was particularly bad in many cases (worse than on Windows). We also found an average of a little over 26,000 fragments per Mac, with an average expected performance gain from defrag of about 8%. Our research also found that the more severe cases of fragmentation, where we saw 70k/100k+ fragments, were low on available free space (substantiating that last paragraph in the Apple tech article).

This article also provides some fragmentation studies as well as performance tests. Their data also validates Apple's last paragraph and makes the "might benefit" statement a bit understated.

Your Mileage May Vary (YMMV): 

So, in summary, I would recommend defragmenting your Mac. As with Windows, the benefit from defragmenting is proportionate to the amount of fragmentation. Defrag will help. The question is, "Does defrag help enough to be worth the time and money?" The good thing is that most Mac defragmenters, just like Diskeeper for Windows, have free demo versions you can try to see if it's worth spending the money.

 

Here are some options: 

+ iDefrag (used by a former Diskeeper Corp employee who did graphic design on a Mac)

+ Drive Genius suite (a company we have spoken with in the past)

+ Stellar Drive Defrag (relatively new)

Perhaps this article raises the question (and the rumor), "Will there be a Diskeeper for Mac?", to which I would answer "unlikely, but not impossible." The reason is that we already have a very full development schedule, with opportunities in other areas that we plan to pursue.

We are keeping an "i" on it though ;-).

Is it wrong/unsafe to defrag an SSD?

by Michael 18. January 2011 08:09

Last week I received an email via the blog that I thought would be good to publish. Graham, a Diskeeper user from the UK asked: "I have been advised that it is wrong to defrag an SSD hard drive. So is it safe to run Diskeeper now that I have a 128Gb ssd in my computer?"

The popular theory that “there are no moving parts” does not lead to the conclusion that fragmentation is a non-issue. There is more behind the negative impact of fragmentation than the seek time and rotational latency of electro-mechanical drives. Most SSDs suffer from free space fragmentation due to inherent NAND flash limitations, and in more severe cases (more likely on servers) the operating system also incurs extra overhead from file fragmentation.

As always, the “proof is in the pudding.” Tests conclusively show you can regain lost performance by optimizing the file system (in Windows). We have run and published numerous tests (and here, one done with Microsoft), but so have many others in various tech forums (if you would prefer independent reviews).

In short, it is advisable to run an occasional consolidation of free space. The frequency you would want to run this depends on how active (writing and deleting files) the system using the SSD is. It also depends on the SSD. A latest gen 128GB SSD from a reputable vendor is going to be all-around better than a 16-32GB SSD from 2-3 years ago.

The HyperFast product (a $10 add-on to Diskeeper) is designed to consolidate free space when it is needed, without overdoing it. HyperFast is unique in that you never need to manually analyze, manually run, or even schedule it; it is smart enough to automatically know what to do and when. A common concern is that defragmentation can wear out an SSD. While that is unlikely unless the defragmenter is poorly written, the general premise is correct, and it is also something HyperFast takes into consideration by design.

Above pic: You can always add HyperFast at any time after your purchase of Diskeeper.

 

More reading: 

Here are a few blogs we have done on SSDs.

While a bit dated, here is one product review.

Tags:

SSD, Solid State, Flash

Flash Cache Technology: ExpressCache

by Michael 13. January 2011 04:35

ExpressCache is a new, presently shipping, OEM product that we announced at the CES trade show last week.

ExpressCache is a software storage performance solution that requires both an HDD and an SSD (a smaller, affordable SSD is all that is needed). In short, ExpressCache effectively combines the two drives, greatly increasing performance (typically by a factor of 2 or 3) by caching the most frequently used data on the SSD!

More Info:

You can read more about ExpressCache here.

Future possibilities:

There are plans to broaden the supported platforms and potentially expand the availability of the product beyond OEMs. The time frame for that is undecided, but it may be later this year. We'll certainly announce any such events on our website and on the blog, so stay tuned.

Tags:

SSD, Solid State, Flash

Diskeeper 2011 outmuscles fragmentation

by Colleen Toumayan 12. January 2011 09:29

Bruce Pechman, the "Muscleman of Technology," stopped by our booth at a CES media event. We talked a bit about the upcoming new Diskeeper release, and he was kind enough to express his enthusiasm to us in writing so we could publish it:

As a TV technology journalist, high-end computer enthusiast, hard-core gamer, and a person who is consistently buying the fastest computer money can buy, I need my computers to have super-fast performance without any bottlenecks every minute of every day.

From expensive SSD’s, to traditional rotational Hard Disk Drives—in every combination you can think of, my computers always run flawlessly and speedily thanks to an awesome software program called Diskeeper!

Right now my primary computer is the “Maingear Shift” with an Intel i980 Extreme Processor overclocked to 4.3 GHz; it’s almost $6,000, and I can’t tolerate any slowdowns from my dual Western Digital 10,000 RPM VelociRaptor hard drives.

The facts are really quite simple. All types of computers will experience disk fragmentation, and it can definitely worsen over time if nothing is done to prevent it. Disk fragmentation is the culprit behind many annoying computer symptoms such as slow application performance, long boot times, unexplained system slowdowns, crashes, etc. Diskeeper prevents these issues from cropping up. I have been religiously using Diskeeper over the years so my computers can realize the full benefits of system performance and reliability...and I can’t wait to install the new Diskeeper 2011—just set it and “forget about it”!

Whether I’m on deadline, overclocking an Intel Extreme processor, or playing Mafia II with every performance benchmark set to maximum settings, Diskeeper is a must install on every computer I own. 

Bruce Pechman, The Muscleman of Technology (www.MrBicep.com)
America’s Best-Built Technology & Fitness Television Personality
Muscleman of Technology, LLC

Bruce Pechman, The Muscleman of Technology® (Muscleman of Technology, LLC), is a Consumer Technology and Fitness Personality appearing regularly on “Good Morning San Diego” (KUSI-TV) and the KTLA Morning Show in Hollywood. Bruce’s moniker reflects his 20 years of industry technology expertise and 30 years of fitness training. He has made over 250 live TV appearances on major network television. Visit him at www.mrbicep.com.

Tags:

Defrag | Diskeeper | SSD, Solid State, Flash

Inside SSDs 101

by Michael 31. December 2010 06:08

We have numerous partners and alliances in the solid state drive (SSD) space that we interact with regularly. Conversations in meetings with those allies continue to revolve around the same issue: overcoming performance bottlenecks at the storage level. In attacking this problem to get higher performance for things like boot times, application load times, etc., the industry has turned to flash memory, otherwise referred to as SSDs (we'll also be announcing a brand new SSD technology in the next few weeks).

The following may be well known to those highly knowledgeable in SSDs, but hopefully it helps others less versed in their design.

High end SSDs have proven to yield some very impressive read times, well over double a typical SATA hard disk drive in some cases.

Here are some example published speeds from a few manufacturers/models:

Seagate 7200.12 HDD 500GB, 750GB, and 1TB family

Read/Write speeds (outer tracks, empty drive): 125MB/sec sustained, 160MB/sec peak

Intel X25-M 80GB SSD (MLC)

Sequential Access - Read: up to 250MB/s

Sequential Access - Write: up to 70MB/s

Intel X25-M 120GB SSD (MLC)

Sequential Access - Read: up to 250MB/s

Sequential Access - Write: up to 100MB/s

Intel X25-E 32GB SSD (SLC)

Sequential Access - Read: up to 250 MB/s

Sequential Access - Write: up to 170 MB/s
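
To put those published sequential rates in rough perspective, here is a quick back-of-the-envelope comparison of how long a large sequential read would take at the speeds listed above (sequential throughput only; it ignores seek time, caching, and real-world I/O patterns):

    # Time to read a 10GB file sequentially at the published rates above.
    rates_mb_per_s = {
        "Seagate 7200.12 HDD (sustained)": 125,
        "Intel X25-M / X25-E SSD (read)": 250,
    }
    file_mb = 10 * 1024   # 10GB expressed in MB
    for name, rate in rates_mb_per_s.items():
        print(f"{name}: about {file_mb / rate:.0f} seconds")
    # HDD: about 82 seconds; SSD: about 41 seconds -- roughly the 2x mentioned above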

One of the main reasons for such fast read times is that an SSD has no “seek time” to find and retrieve a piece of data, unlike a hard drive. Simply put, a hard drive has to move a magnetic head connected to an arm over a track on a platter and then, through various means, find the data requested and read or write something.

Now you have to admit, a hard drive does this quite well and very fast, considering the physics involved.

An SSD, on the other hand, reads data by sending an electrical pulse, which is much faster in comparison; give or take, on the order of double the throughput on higher-end SSDs. The lack of a moving part cuts the time down considerably.

Writing data to SSDs, however, is a whole other story, one that leads us down a bit of a rabbit hole, so to speak, and it is the main subject of this blog.

SSD DNA

To start with, let’s look at what an SSD is:

Note, I have borrowed some photos and descriptions from the following site: www.popsci.com/category/tags/ssd.

Firstly you simply have a small piece of silicon with a whole lot of transistors that look like the following:

[Photo: flash memory transistors; the charged transistor is marked with a circled "e" (from popsci.com)]

Each transistor is 1000 times thinner than a human hair. In essence, each transistor either holds an electrical charge or it doesn’t.  In the case of SSDs, a transistor that is charged equals the value 0 and the ones that are not equal a value of 1. That is the extent of it. 

In the above photo the transistor that is charged has a circle around the “e” which stands for electrons representing the charge.   

Now, to read what’s inside these transistors, an electrical pulse is sent to them, and by reading the returned signal the device can tell which are charged and which are not. All in all this is a pretty fast operation. Even writing to a transistor for the first time is pretty fast, as it’s empty with no charge to begin with.

BUT…… what happens when you have to write to an area of the SSD that has already been written to? Herein lies the problem and the unfortunate “Achilles heel” for SSDs. With a hard drive you can simply flip the bit in place and call it a day. In fact, over the long run, this is a faster operation on a hard drive than it is on an SSD. And aside from the extra step it takes to erase an SSD, it gets a lot worse. But in order to understand this we need to look at how data is laid out on an SSD:

The smallest unit is a single transistor, also known as a cell. Imagine the dot shown is a top view of a single transistor magnified a gazillion times.

This single transistor holds a single bit of data, i.e., a 1 or a 0. The next unit size up is called a page, which holds 4KB of data. To put this in perspective, there are 8,192 bits in one kilobyte, so each page contains 32,768 transistors. A page is the smallest unit that can be written to on the SSD. This means that even if the data you are writing is only 1,500 bits in size, it will use up the entire 4KB page and make it unusable for writing other data. You only get to write to it once until it has been erased for reuse.
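
A quick bit of arithmetic, using the page size and the 1,500-bit example above, makes the waste concrete:

    # Back-of-the-envelope math for the SLC page example above.
    BITS_PER_KB = 1024 * 8            # 8,192 bits per kilobyte
    PAGE_KB = 4
    page_bits = PAGE_KB * BITS_PER_KB
    print(page_bits)                  # 32,768 cells (one bit per SLC cell)

    write_bits = 1500                 # the small write from the example
    wasted_bits = page_bits - write_bits
    print(wasted_bits)                # 31,268 bits of the page stranded until it is erased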

In fact, to update the data within this page, the data would have to be copied first, updated and rewritten to a new page leaving the old page unusable till it’s been erased.  The controller has to earmark it for clean up so it can be used again later.

Now, even though this page has been earmarked for being erased, it may not get erased for some time because it’s actually part of a bigger group of data called a “block”.  A block is a group of pages as illustrated below:

[Illustration: multiple 4KB pages grouped into a single block]

The number of pages that make up a block varies from one SSD model to another and can be very large (into the megabytes on some devices). On one SSD we have tested, the block size is 128KB, which is a group of 32 pages. This data block (32 pages) is what’s called an “erase block”. An SSD only erases one data “block” at a time. So back to our example of the page holding stale data: theoretically, that page could sit around for a while before its block is erased by the SSD. It’s plausible this could, in some cases, raise data security issues, but that’s a subject for further research and testing.
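
To make the arithmetic concrete, here is the same erase-block math, plus a hypothetical (not measured) worst-ish case showing why a single stale page can force a lot of copying:

    # Erase-block arithmetic for the example SSD above.
    PAGE_KB, BLOCK_KB = 4, 128
    pages_per_block = BLOCK_KB // PAGE_KB
    print(pages_per_block)      # 32 pages per erase block

    # If only one page in an otherwise full block is stale, the controller must
    # still relocate every remaining valid page before it can erase the block
    # (a hypothetical illustration, not a measurement of any particular drive).
    valid_pages = pages_per_block - 1
    print(valid_pages)          # 31 pages copied elsewhere to reclaim 1 stale page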

Now, when data in a page is updated, as discussed earlier, it has to be moved to another location, unbeknownst to the file system, and is internally mapped by the controller on the device to keep track of it, as illustrated below:

There is obviously overhead associated with this. So a page, for argument's sake, could have three states of existence:

Ready to be used (it’s erased or been erased thus all transistors have been set to 1) as in the clear little boxes above;

Used -- as in the blue boxes above;

Dirty (containing invalid data that needs to be erased) as in the black boxes above.  

The overhead of handling dirty data is huge, and it is referred to as garbage collection. Garbage collection is the process of moving good data out of areas that contain stale data so the stale data can be erased and the pages reclaimed for writing again.

All of this garbage collection creates extra write activity known as “write amplification”.

This is the disease that SSDs are plagued with which kills their write performance, particularly as the device fills up.  It’s also what shortens the life span of the device.

The following is a definition or description from Wikipedia that I think is pretty good:

“Write amplification (WA) is a phenomenon associated with Flash memory and solid-state drives (SSDs). Because Flash memory must be erased before it can be rewritten, the process to perform these operations results in moving (or rewriting) user data and metadata more than once. This multiplying effect increases the number of writes required over the life of the SSD which shortens the time it can reliably operate. The increased writes also consume bandwidth to the Flash memory which mainly reduces random write performance to the SSD. Many factors will affect the write amplification of an SSD, some can be controlled by the user and some are a direct result of the data written to and usage of the SSD.”
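
In other words, write amplification is simply the ratio of bytes the flash actually writes to bytes the host asked it to write. A toy calculation (the figures are invented purely for illustration):

    # Write amplification = data written to flash / data written by the host.
    # The figures below are made up purely for illustration.
    host_bytes_written  = 4 * 1024        # the application writes 4KB
    flash_bytes_written = 3 * 4 * 1024    # controller also relocates valid pages and metadata
    write_amplification = flash_bytes_written / host_bytes_written
    print(write_amplification)            # 3.0 -> each host write costs three flash writes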

Now, there is an interesting comment in the above description, one that suggests read performance could in some cases also degrade: the “increased writes” consume bandwidth to the flash memory, which can interrupt read operations.

I don’t say this as fact, but am rather postulating whether or not reads are affected.

The number of writes required by the Windows NTFS file system to do just one I/O could be considered extreme from the SSD’s point of view. Creating and writing even one tiny piece of information, such as a Notepad document containing a single number, requires an update to the MFT (to create a record for the file), an update to a directory file, and updates to any other metadata files, such as journal files, that are keeping track of operations at the time. The point is, for every one write of user data there are several writes occurring to keep track of it.
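
A hypothetical tally of what that one tiny user write can trigger (the specific counts are illustrative only, not traced from NTFS):

    # Illustrative tally of the writes behind saving one tiny Notepad file.
    # Counts are hypothetical; the point is the ratio, not the exact numbers.
    writes = {
        "user data (the file contents)": 1,
        "MFT record update": 1,
        "directory index update": 1,
        "journal / log file updates": 1,
    }
    print(sum(writes.values()))   # 4 writes reach the SSD for 1 write of user data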

Current file systems were designed around hard disk drive characteristics, not SSDs. From the SSD’s point of view, NTFS writes with wild abandon. This puts a lot of data-mapping and housekeeping overhead on the SSD controller, overhead that hard drives mostly don’t have to deal with. A hard drive typically only has to remap data when, for example, it has a bad sector.

The NTFS file system, for example, may decide a file has to be split in two (split I/Os) because it believes there is no contiguous free space for the file being updated. Yet the SSD may have, during its garbage collection process, created space or remapped data clusters; the operating system doesn’t know this, and vice versa.

The current TRIM functionality is supposed to help SSDs within Windows 7 but it's far from being a panacea to the write amplification issue. 

Different types of SSD

SLC stands for “single level cell”

MLC stands for “multi level cell”

TLC stands for “tender loving care.” Okay, not really; just checking to make sure you are paying attention. It really stands for “tri-level cell.” Really.

SLC is faster than MLC and TLC. 

The design of an MLC cell and an SLC cell is pretty much the same. The difference is that MLC is able to store more than one value in a single transistor (referred to as a cell) by layering the data within the cell. Typically two or more bits can be placed in a single cell with MLC, versus one bit with SLC.

So MLC can store twice as much data as SLC. That’s the plus side. The downside is that reading and writing a single MLC cell has to be very precise and carries a lot of logic overhead, so MLC ends up slower than SLC because of the precision required to determine which of the multiple values is stored in a cell. The life cycle of MLC is also roughly 10x shorter than SLC. The following is a great white paper that describes the differences very well, including how voltages are used to read the values: http://www.supertalent.com/datasheets/SLC_vs_MLC%20whitepaper.pdf

The difference between TLC and MLC is NOT more transistors. The “L” stands for “level,” referring to the voltage levels within a transistor, not multiple layers or numbers of transistors. Again, the link above, along with a few other sites, lays this out fairly well.
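
The relationship is easy to state: a cell storing n bits must reliably distinguish 2^n voltage levels, which is exactly why MLC and TLC demand so much more precision than SLC:

    # Bits per cell vs. the number of voltage levels a cell must distinguish.
    cell_types = {"SLC": 1, "MLC": 2, "TLC": 3}   # bits stored per cell
    for name, bits in cell_types.items():
        levels = 2 ** bits
        print(f"{name}: {bits} bit(s) per cell -> {levels} distinct voltage levels")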

The difference between one flash memory device or SSD and another is not so much the chip itself but the supporting controller and its design. This includes multiple data channels, on-board RAID, and other fancy and expensive solutions. The other differentiator is the software within the controller managing all of the mapping of data, moving data around, etc. Sounds like the age-old fragmentation problem to me again; just at a slightly different level.
