From: Theodore Tso <tytso@mit.edu>
Newsgroups: fa.linux.kernel
Subject: Re: SD/MMC cards: how crappy they are?
Date: Tue, 02 Dec 2008 16:56:27 UTC
Message-ID: <fa.9LbvREnaInlJA39JcucHIPb8fhA@ifi.uio.no>

On Tue, Dec 02, 2008 at 08:30:29AM -0800, H. Peter Anvin wrote:
> > ...maybe it was because of powerfail? I'll try to run badblocks to
> > recover it...
> >
> > ...I did. Badblocks did not help, but cat /dev/zero > /dev/mmc1
> > did.. And yes, those 'temporarily bad blocks' seem very much
> > powerfail related.
> >
>
> Power failures can, indeed, do nasty things to SD/MMC cards, especially
> power rail sag in the middle of writes.

If this is your random eject out from your HP laptop problem, note
that random ejects while the card is writing can cause corruption of
the flash translation layer (FTL), which for some really crappy cards,
can permanently damage them; hopefully most of those are gone from the
market, but I wouldn't be positive about that.  The better ones will
have some kind of journalling scheme for their FTL...

Fsck does have a force rewrite option, although it's not the default.
You have to answer "n" to ignore error, and then yes to "force
rewrite".  I should perhaps change that; my worry at the time was a
transient read error tricking e2fsck into blowing away the contents of
what was actually a good sector.  Of course, that will only help
blocks which fsck actually tried reading; it won't help data blocks.

Badblocks -n will fix the problem, since it will do a non-destructive
read/write test over the entire disk.  Patches to add a
forced-rewrite mode to the standard r/o badblocks sweep (so we only
write to a sector that has a read error) would be gratefully accepted.
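
A minimal sketch of what such a forced-rewrite sweep might look like,
in Python.  This is not how badblocks is implemented; the FakeCard
class and sweep_with_forced_rewrite() are hypothetical names made up
for illustration, and the "device" is an in-memory stand-in whose bad
blocks become readable again once written, like the temporarily bad
blocks described above.  A block that cannot be read has no recoverable
contents, so the sketch assumes zero-filling it is acceptable.

```python
import errno


class FakeCard:
    """Hypothetical in-memory stand-in for a block device.

    Blocks listed in `unreadable` raise EIO on read until they are
    rewritten, mimicking 'temporarily bad blocks'.
    """

    def __init__(self, num_blocks, block_size, unreadable=()):
        self.block_size = block_size
        self.blocks = [bytes([i % 256]) * block_size for i in range(num_blocks)]
        self.unreadable = set(unreadable)

    def read_block(self, n):
        if n in self.unreadable:
            raise OSError(errno.EIO, "simulated read error")
        return self.blocks[n]

    def write_block(self, n, data):
        assert len(data) == self.block_size
        self.blocks[n] = data
        # Assumption: a successful write makes the block readable again.
        self.unreadable.discard(n)


def sweep_with_forced_rewrite(dev, num_blocks):
    """Read every block; write only to blocks whose read failed."""
    rewritten = []
    for n in range(num_blocks):
        try:
            dev.read_block(n)
        except OSError:
            # The old contents are unrecoverable here, so zero-fill.
            dev.write_block(n, bytes(dev.block_size))
            rewritten.append(n)
    return rewritten


card = FakeCard(num_blocks=8, block_size=512, unreadable={2, 5})
print(sweep_with_forced_rewrite(card, 8))  # first pass: [2, 5]
print(sweep_with_forced_rewrite(card, 8))  # second pass: []
```

Good blocks are never written, which is the point of the proposed mode:
a transient read error costs at most the one block that failed.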

						- Ted




From: "H. Peter Anvin" <hpa@zytor.com>
Newsgroups: fa.linux.kernel
Subject: Re: SD/MMC cards: how crappy they are?
Date: Tue, 02 Dec 2008 18:00:29 UTC
Message-ID: <fa.iQV0EvgK4YAYsOCMrg/dOA0TYiQ@ifi.uio.no>

Theodore Tso wrote:
>
> If this is your random eject out from your HP laptop problem, note
> that random ejects while the card is writing can cause corruption of
> the flash translation layer (FTL), which for some really crappy cards,
> can permanently damage them; hopefully most of those are gone from the
> market, but I wouldn't be positive about that.  The better ones will
> have some kind of journalling scheme for their FTL...
>

I have seen flash cards die permanently from having a partition table
they didn't like written to them.  Yes, the microcontroller on the flash
card tried to interpret the partition table, assumed to be MS-DOS style,
and would crash.

	-hpa


From: "H. Peter Anvin" <hpa@zytor.com>
Newsgroups: fa.linux.kernel
Subject: Re: SD/MMC cards: how crappy they are?
Date: Thu, 04 Dec 2008 19:04:00 UTC
Message-ID: <fa.bwRTR+vxIZ8YllQNYm5LopQEBXQ@ifi.uio.no>

Pavel Machek wrote:
>>>
>> I have seen flash cards die permanently from having a partition table
>> they didn't like written to them.  Yes, the microcontroller on the flash
>> card tried to interpret the partition table, assumed to be MS-DOS style,
>> and would crash.
>
> Aha... that explains why I killed a few flashcards by tar xzvf /dev/sdX files
> ... hopefully that's fixed in the better/bigger cards now.
>

I also had a batch of cards which would silently "correct" the partition
table for you to align the partitions to their flash erase blocks.

	-hpa


From: Theodore Tso <tytso@mit.edu>
Newsgroups: fa.linux.kernel
Subject: Re: document ext3 requirements
Date: Sun, 04 Jan 2009 22:07:02 UTC
Message-ID: <fa.BmrGL6qgJ5g6eYMC9T8TruiI9a0@ifi.uio.no>

On Sun, Jan 04, 2009 at 01:49:49PM -0600, Rob Landley wrote:
>
> Want to document the granularity issues with flash, while you're at it?
>
> An inherent problem with using flash as a normal block device is that the
> flash erase size is bigger than most filesystem sector sizes.  So when you
> request a write, it may erase and rewrite the next 64k, 128k, or even a couple
> megabytes on the really _big_ ones.
>
> If you lose power in the middle of that, ext3 won't notice that data in the
> "sectors" _after_ the one you were trying to write to got trashed.

True enough, although the newer SSD's will have this problem addressed
(although at least initially, they are **far** more costly than the
el-cheapo 32GB SD cards you can find at the checkout counter at Fry's
alongside battery-powered shavers and trashy ipod speakers).
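
To make the granularity issue concrete, here is a rough Python sketch
that works out which sectors share an erase block with a given write.
It assumes a naive device that erases and rewrites whole, aligned erase
blocks in place; real FTLs vary and may not behave this way.  The
function name sectors_at_risk() and the 512-byte sector and 128k erase
block defaults are illustrative choices, not taken from any particular
card.

```python
def sectors_at_risk(first_sector, num_sectors,
                    sector_size=512, erase_block=128 * 1024):
    """Return (lo, hi), the half-open sector range covered by the erase
    block(s) touched by a write to
    [first_sector, first_sector + num_sectors).

    Assumes erase blocks are aligned to multiples of their own size.
    """
    per_block = erase_block // sector_size
    lo = (first_sector // per_block) * per_block
    last = first_sector + num_sectors - 1
    hi = (last // per_block + 1) * per_block
    return lo, hi


# A 4k write (8 sectors) starting at sector 1000:
lo, hi = sectors_at_risk(first_sector=1000, num_sectors=8)
collateral = (hi - lo) - 8
print(lo, hi, collateral)  # 768 1024 248
```

Under these assumptions a 4k write puts 248 other sectors at risk, none
of which the filesystem was asked to modify and so none of which the
journal knows anything about.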

I will stress again, that most of this doesn't belong in
Documentation/filesystems/ext3.txt, as most of this is *not*
ext3-specific.

						- Ted


From: Theodore Tso <tytso@mit.edu>
Newsgroups: fa.linux.kernel
Subject: Re: document ext3 requirements
Date: Mon, 05 Jan 2009 20:20:05 UTC
Message-ID: <fa.377DMq2lPMyaHxadPnApFSJFoCg@ifi.uio.no>

On Mon, Jan 05, 2009 at 02:15:44PM -0500, Martin K. Petersen wrote:
>
> It works some of the time.  But in reality if you yank power halfway
> during a write operation the end result is undefined.
>
> The saving grace for normal users is that the potential corruption is
> limited to a couple of sectors.

A few years ago it was asserted to me that the internal block size for
spinning magnetic media was around 32k.  So if the hard drive doesn't
have enough of a capacitor or other energy reserve to complete its
internal read-modify-write cycle, attempts to read the 32k chunk of
disk could result in hard ECC failures that would cause the blocks in
question to all return uncorrectable read errors when they are
accessed.

Of course, if the memory goes south first, and you're in the middle of
streaming a 128k update to the filesystem's inode table, and the power
fails, and the memory starts returning garbage during the DMA
operation, you may have much bigger problems.  :-)

So it's probably more than "a couple of sectors"....

> The current suck of flash SSDs is that the erase block size amplifies
> this problem by at least one order of magnitude, often two.  I have a
> couple of SSDs here that will leave my filesystem in shambles every time
> the machine crashes.  I quickly got tired of reinstalling Fedora several
> times per week so now my main machine is back to spinning media.

The erase block size is typically 1 to 4 megabytes, from my
understanding.  So yeah, that's easily 1-2 orders of magnitude.  Worse
yet, flash's sequential streaming write speeds are much slower than
hard drive's (anywhere from a factor of 3 to 12 depending on how
cheap/trashy the flash drive happens to be), so that opens the time
window even further, by possibly as much as another order of magnitude.
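
The arithmetic behind those estimates can be checked with a few lines
of Python.  The inputs are just the figures quoted in this message (a
32k internal block for spinning media, 1 to 4 megabyte erase blocks, and
a 3x to 12x write slowdown); multiplying the size ratio by the slowdown
assumes the exposure window scales linearly with both, which is a
simplification.

```python
import math

KIB = 1024
MIB = 1024 * KIB

hdd_internal_block = 32 * KIB      # figure asserted for spinning media
erase_blocks = (1 * MIB, 4 * MIB)  # "typically 1 to 4 megabytes"
slowdown = (3, 12)                 # flash streaming writes vs. hard drive

# How much more data is in flight per internal write.
size_ratio = [eb // hdd_internal_block for eb in erase_blocks]
# Assumed: the time window grows by the slowdown factor as well.
combined = [size_ratio[0] * slowdown[0], size_ratio[1] * slowdown[1]]

for label, (lo, hi) in (("size only", size_ratio),
                        ("size x slowdown", combined)):
    print(f"{label}: {lo}x to {hi}x "
          f"(10^{math.log10(lo):.1f} to 10^{math.log10(hi):.1f})")
```

That gives 32x to 128x from the block size alone, and 96x to 1536x once
the slower writes are factored in, i.e. roughly two to three orders of
magnitude.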

I also suspect that HDD manufacturers have learned various tricks (due
to enterprise storage/database vendors leaning on them) to make the
drives appear more atomic in the face of hard drive errors.  Also, in
Pavel's case, as I recall he was using the card in a laptop where the
SD card protruded slightly from the laptop case, and it was very easy
for it to get dislodged, meaning that power failures during writes
were even more likely than you would expect with a fixed HDD or SSD
which is secured into place using screws or other more reliable
mounting hardware.

Put all of this together: given that Pavel's Really Trashy 32GB SD was
probably the full 3 orders of magnitude worse than a traditional HDD,
and he was having many more failures due to physical mounting issues,
it's not surprising that most people haven't seen problems with
traditional HDD's, even though none of this is guaranteed by the hard
drive vendors.

> The people that truly and deeply care about this type of write atomicity
> (i.e. enterprises) deploy disk arrays that will do the right thing in
> face of an error.  This involves NVRAM, mirrored caches, uninterruptible
> power supplies, etc.  Brute force if you will.

Don't forget non-cheesy mounting options so an accidental brush
against the side of the unit doesn't cause the hard drive to become
disconnected from the system and suffer a power drop.  I guess that gets
filed under "Brute force" as well.  :-)

							- Ted

P.S.  I feel obliged to point out that in my Lenovo X61s, the SD card
is flush with the laptop case when inserted, and I've never had a
problem with the SD card being prematurely ejected during operation.   :-)

