From: Theodore Ts'o <tytso@mit.edu>
Newsgroups: fa.linux.kernel
Subject: Re: soft update vs journaling?
Date: Sun, 22 Jan 2006 09:32:25 UTC
Message-ID: <fa.e8cfc8v.imc121@ifi.uio.no>
Original-Message-ID: <20060122093144.GA7127@thunk.org>

On Sun, Jan 22, 2006 at 01:42:38AM -0500, John Richard Moser wrote:
> Soft Update appears to have the advantage of not needing multiple
> writes.  There's no need for journal flushing and then disk flushing;
> you just flush the meta-data.

Not quite true; there are cases where Soft Updates will have to do
multiple writes, when a particular block containing meta-data has
multiple changes in it that have to be committed to the filesystem at
different times in order to maintain consistency; this is particularly
true when a block is part of the inode table, for example.  When this
happens, the soft update machinery has to allocate memory for a copy
of the block and then undo those changes to the copy which come from
transactions that are not yet ready to be written to disk.
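
Concretely, the undo step looks something like the following C sketch.
(This is a minimal illustration, not the actual BSD Soft Updates code;
the structure and function names here are invented.)

#include <stdlib.h>
#include <string.h>

struct dep {                   /* one recorded metadata change */
    struct dep *next;
    int         committable;   /* ordering constraints satisfied? */
    size_t      off, len;      /* region of the block it touched */
    char        old[64];       /* saved pre-change bytes (sketch:
                                  assumes small changes) */
};

struct buf {                   /* an in-memory metadata block */
    unsigned long b_blkno;
    size_t        b_size;
    char         *b_data;
    struct dep   *b_deps;
};

int disk_write(unsigned long blkno, const void *data, size_t len);

/* Write out a block whose changes belong to several transactions. */
int write_metadata_block(struct buf *bp)
{
    struct dep *d;
    char *copy;
    int err;

    copy = malloc(bp->b_size);    /* the extra allocation above */
    if (copy == NULL)
        return -1;
    memcpy(copy, bp->b_data, bp->b_size);

    /* Undo every change whose prerequisites aren't on disk yet,
     * so the image that reaches the disk is consistent by itself. */
    for (d = bp->b_deps; d != NULL; d = d->next)
        if (!d->committable)
            memcpy(copy + d->off, d->old, d->len);

    err = disk_write(bp->b_blkno, copy, bp->b_size);
    free(copy);
    return err;    /* undone changes go out in a later write */
}

The rolled-back changes are then written in a subsequent pass, once
the blocks they depend on have made it to disk; that is where the
extra writes come from.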

In general, though, it is true that Soft Updates can result in fewer
disk writes compared to filesystems that utilize traditional
journaling approaches, and this might even be noticeable if your
workload is heavily skewed towards metadata updates.  (This is mainly
true in benchmarks that are horrendously disconnected from the real
world, such as dbench.)

One major downside of Soft Updates that you haven't mentioned in your
note is the tremendous amount of complexity it adds to the
filesystem; the filesystem has to maintain a very complex state
machine, with knowledge of the ordering constraints of each change to
the filesystem and of how to "back out" parts of a change when that
becomes necessary.  (A classic example of such a constraint: a newly
initialized inode must reach the disk before the directory entry that
references it.)

Whenever you want to extend a filesystem to add some new feature, such
as online resizing, for example, it's not enough to just add that
feature; you also have to modify the black magic which is the Soft
Updates machinery.  This significantly increases the difficulty of
adding new features to a filesystem, and can act as a roadblock to
people wanting to add them.  I can't say for sure that this is why BSD
UFS doesn't have online resizing yet, and I can't conclusively blame
the lack of this feature on Soft Updates; but it is clear that adding
this and other features is much more difficult when you are dealing
with the soft update code.

> Also, soft update systems mount instantly, because there's no
> journal to play back, and the file system is always consistent.

This is only true if you don't care about recovering lost data blocks.
Fixing this requires running the equivalent of fsck on the
filesystem, and if you do, the difference in recovery time is major.
Even if you can do the fsck scan on-line, it will greatly slow down
normal operations while recovering from a system crash, whereas the
slowdown associated with doing a journal replay is far smaller in
comparison.

> Unfortunately, journaling uses a chunk of space.  Imagine a journal on a
> USB flash stick of 128M; a typical ReiserFS journal is 32 megabytes!
> Sure it could be done in 8 or 4 or so; or (in one of my file system
> designs) a static 16KiB block could reference dynamically allocated
> journal space, allowing the system to sacrifice performance and shrink
> the journal when more space is needed.  Either way, slow media like
> floppies will suffer, HARD; and flash devices will see a lot of
> write/erase all over the journal area, causing wear on that spot.

If you are using flash, use a filesystem which is optimized for flash,
such as JFFS2.  Otherwise, note that in most cases disk space is
nearly free, so allocating even 128 megs for the journal is chump
change when you're talking about a 200GB or larger hard drive.

Also note that if you have to use slow media, one of the things you
can do is use a separate (fast) device for your journal; there is no
rule which says the journal has to be on the slow device.  (With
ext3, for example, you can create an external journal device with
mke2fs -O journal_dev and point a filesystem at it with the -J
device= option.)

						- Ted


From: Theodore Ts'o <tytso@mit.edu>
Newsgroups: fa.linux.kernel
Subject: Re: soft update vs journaling?
Date: Sun, 22 Jan 2006 21:03:33 UTC
Message-ID: <fa.yDqex1Y3CtWo1P5Gv2snXHD8G2g@ifi.uio.no>
Original-Message-ID: <20060122210238.GA28980@thunk.org>

On Sun, Jan 22, 2006 at 01:54:23PM -0500, John Richard Moser wrote:
> > Whenever you want to extend a filesystem to add some new feature, such
> > as online resizing, for example, it's not enough to just add that
>
> Is online resizing ever safe?  I mean, with on-disk filesystem layout
> support I could somewhat believe it for growing; for shrinking you'd
> need a way to move files around without damaging them (possible).  I
> guess it would be.
>
> So how does this work?  Move files -> alter file system superblocks?

The online resizing support in ext3 only grows the filesystem; it
doesn't shrink it.  What is currently supported in 2.6 requires you to
reserve space in advance.  There is also a slight modification to the
ext2/3 filesystem format, only supported by Linux 2.6, which allows
you to grow the filesystem without needing to move filesystem data
structures around; the kernel patches for actually doing this new
style of online resizing aren't in mainline yet, although they have
been posted to ext2-devel for evaluation.

> A passive-active approach could passively generate a list of inodes from
> dentries as they're accessed; and actively walk the directory tree when
> the disk is idle.  Then a quick allocation check between inodes and
> whatever allocation lists or trees there are could be done.

That doesn't really help, because in order to release the unused disk
blocks, you have to walk every single inode and keep track of the
block allocation bitmaps for the entire filesystem.  If you have a
really big filesystem, it may require hundreds of megabytes of
non-swappable kernel memory.  And if you try to do this in userspace,
it becomes an unholy mess trying to keep the userspace and in-kernel
mounted filesystem data structures in sync.
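
To make the memory cost concrete, here is a sketch of such a scan in
C; the iterator functions are invented stand-ins for a real
filesystem's internals, not any actual kernel interface.

#include <stdlib.h>

unsigned long first_inode(void);
unsigned long next_inode(unsigned long ino);
unsigned long first_block(unsigned long ino);
unsigned long next_block(unsigned long ino, unsigned long blk);

static void mark(unsigned char *bitmap, unsigned long blk)
{
    bitmap[blk / 8] |= 1u << (blk % 8);
}

/* One bit per block for the whole filesystem: a 4TB filesystem with
 * 4K blocks has 2^30 blocks, so this bitmap alone is 128MB, and it
 * has to stay around for the duration of the walk. */
unsigned char *compute_in_use(unsigned long total_blocks)
{
    unsigned char *bitmap = calloc((total_blocks + 7) / 8, 1);
    unsigned long ino, blk;

    if (bitmap == NULL)
        return NULL;
    for (ino = first_inode(); ino != 0; ino = next_inode(ino))
        for (blk = first_block(ino); blk; blk = next_block(ino, blk))
            mark(bitmap, blk);     /* every block of every inode */

    /* Any block marked in-use in the on-disk bitmaps but clear here
     * was leaked by the crash and can now be freed. */
    return bitmap;
}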

						- Ted


From: Theodore Ts'o <tytso@mit.edu>
Newsgroups: fa.linux.kernel
Subject: Re: soft update vs journaling?
Date: Mon, 23 Jan 2006 07:25:28 UTC
Message-ID: <fa.6nml0MAc4bROanMGA9NIT9X9MXg@ifi.uio.no>
Original-Message-ID: <20060123072447.GA8785@thunk.org>

On Sun, Jan 22, 2006 at 05:44:08PM -0500, Kyle Moffett wrote:
> From my understanding of HFS+/HFSX, this is actually one of the
> nicer bits of that filesystem architecture.  It stores the data-
> structures on-disk using extents in such a way that you probably
> could hot-resize the disk without significant RAM overhead (both grow
> and shrink) as long as there's enough free space.

Hot-shrinking a filesystem is certainly possible for any filesystem;
the problem is how many filesystem data structures you have to walk
in order to find the owners of all of the blocks that you have to
relocate.  That generally isn't a RAM overhead problem; the issue is
that, in general, most filesystems don't have an efficient way to
answer the question, "who owns this arbitrary disk block?"  Having
extents means you have a slightly more efficient encoding system, but
it is still the case that you potentially have to check every file in
the filesystem to see whether it is the owner of one of the disk
blocks that needs to be moved when you are shrinking the filesystem.
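
In code terms, without any extra index the question takes a full
scan; the iterator functions below are again hypothetical stand-ins,
as in the sketch in my earlier message.

unsigned long first_inode(void);
unsigned long next_inode(unsigned long ino);
unsigned long first_block(unsigned long ino);
unsigned long next_block(unsigned long ino, unsigned long blk);

/* "Who owns this arbitrary disk block?" without a reverse map:
 * potentially walk every block of every file in the filesystem. */
unsigned long find_owner(unsigned long target)
{
    unsigned long ino, blk;

    for (ino = first_inode(); ino != 0; ino = next_inode(ino))
        for (blk = first_block(ino); blk; blk = next_block(ino, blk))
            if (blk == target)
                return ino;    /* found the owning inode */
    return 0;                  /* free (or metadata) block */
}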

You could of course design a filesystem which maintained a reverse map
data structure, but it would slow the filesystem down since it would
be a separate data structure that would have to be updated each time
you allocated or freed a disk block.  And the only use for such a data
structure would be to make shrinking a filesystem more efficient.
Given that this is generally not a common operation, it seems unlikely
that a filesystem designer would choose to make this particular
tradeoff.
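
As a sketch of what that tradeoff looks like (a toy in-memory table;
any real design would have to keep this on disk, which is exactly
where the extra update cost comes from):

struct rmap {
    unsigned long *owner;    /* owner[blk] = inode number, 0 = free */
    unsigned long  nblocks;
};

/* Every block allocation and free now updates a second structure. */
void rmap_set(struct rmap *rm, unsigned long blk, unsigned long ino)
{
    rm->owner[blk] = ino;    /* extra write on every allocation */
}

void rmap_clear(struct rmap *rm, unsigned long blk)
{
    rm->owner[blk] = 0;      /* extra write on every free */
}

/* ...but shrinking becomes a simple lookup instead of a full scan. */
unsigned long rmap_owner(const struct rmap *rm, unsigned long blk)
{
    return rm->owner[blk];
}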

							- Ted

