Newsgroups: fa.linux.kernel
From: "Theodore Ts'o" <tytso@mit.edu>
Subject: Re: readdir loses renamed files
Original-Message-ID: <20041025123722.GA5107@thunk.org>
Date: Mon, 25 Oct 2004 12:47:58 GMT
Message-ID: <fa.e8bnc94.hmg120@ifi.uio.no>

On Mon, Oct 25, 2004 at 04:21:57AM +0300, Timo Sirainen wrote:
> I'd have thought this had already been asked many times before, but
> google didn't show me anything.
>
> My problem is that mails in a large maildir get temporarily lost. This
> happens because readdir() never returns a file which was just rename()d
> by another process. Either new or the old name would have been fine,
> but it's not returned at all.
>
> Is there a chance this could get fixed? Every OS/filesystem I've tested
> so far has had the same problem, so I'll have to implement some extra
> locking anyway (so much for maildir being lockless), but it would be
> nice to have at least one OS where it works without the extra locking
> overhead.

In some cases the file won't just get lost; the old and new names can
both be returned.  For example, assume a simple non-tree, linked-list
implementation of a directory, such as can be found in Solaris's ufs,
BSD 4.3's FFS, Linux's ext2 and minix filesystems, and many others.
Suppose the directory is fully, tightly packed (i.e., no gaps), with
the directory entry "foo" at the beginning of the file, and readdir()
has already returned that "foo" entry when some other application
renames it to "Supercalifragilisticexpialidocious".  The new name will
not fit in the old name's directory location, so it will be placed at
the end of the directory, where it will be returned by readdir() a
second time.
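
A toy model may make the scenario above concrete.  The sketch below is
not kernel code and not how any real filesystem stores directories; the
names toy_dir, toy_add, toy_rename and toy_readdir are hypothetical and
exist only for illustration.  It assumes a directory is an array of
slots scanned in order, that each slot can hold only a name as long as
the one it was created with, and that a rename which does not fit in
place is appended at the end.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_SLOTS 16
#define TOY_NAME_LEN  64

/* Hypothetical model: one slot per directory entry, scanned linearly. */
struct toy_slot {
    int    in_use;
    size_t capacity;              /* bytes available for the name */
    char   name[TOY_NAME_LEN];
};

struct toy_dir {
    struct toy_slot slot[TOY_MAX_SLOTS];
    int nslots;
};

/* Append a tightly packed entry; returns its slot index or -1. */
int toy_add(struct toy_dir *d, const char *name)
{
    if (d->nslots >= TOY_MAX_SLOTS || strlen(name) >= TOY_NAME_LEN)
        return -1;
    struct toy_slot *s = &d->slot[d->nslots];
    s->in_use = 1;
    s->capacity = strlen(name);   /* no slack: the directory has no gaps */
    strcpy(s->name, name);
    return d->nslots++;
}

/* Rename in place if the new name fits, otherwise move it to the end. */
int toy_rename(struct toy_dir *d, const char *from, const char *to)
{
    for (int i = 0; i < d->nslots; i++) {
        struct toy_slot *s = &d->slot[i];
        if (!s->in_use || strcmp(s->name, from) != 0)
            continue;
        if (strlen(to) <= s->capacity) {
            strcpy(s->name, to);
            return i;
        }
        int idx = toy_add(d, to); /* new name lands after every old slot */
        if (idx >= 0)
            s->in_use = 0;        /* free the old slot only on success */
        return idx;
    }
    return -1;
}

/* readdir()-like cursor: next in-use name at or after *pos, or NULL. */
const char *toy_readdir(const struct toy_dir *d, int *pos)
{
    while (*pos < d->nslots) {
        const struct toy_slot *s = &d->slot[(*pos)++];
        if (s->in_use)
            return s->name;
    }
    return NULL;
}
```

With entries "foo" and "bar", a reader that has already been handed
"foo" and then continues after toy_rename() moves it will be handed the
same file again under its long name.  The opposite case, where a real
filesystem places the new name in a gap the reader has already passed,
is how the file can be missed entirely.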

This is not a bug; the POSIX specification explicitly allows this
behavior.  If a file is renamed during a readdir() session of a
directory, it is unspecified whether neither, either, or both of the
new and old filenames will be returned.

And that's because there's no good way to do this without trashing the
performance of the system, especially when most applications don't
care.  (Do you really want your entire system running significantly
slower, penalizing all other applications on your system, just because
of one stupid/badly-written application?)  To guarantee a consistent
view, the kernel would have to atomically snapshot the directory, even
one which might be several megabytes in length, and keep a copy of it
in memory for as long as the application keeps calling readdir().
Several processes could then perform a denial-of-service attack by
starting to call readdir() and then stopping.  This would end up
locking up huge amounts of non-pageable system memory, and cause the
system to come down in a hurry.

							- Ted


Newsgroups: fa.linux.kernel
From: "Theodore Ts'o" <tytso@mit.edu>
Subject: Re: readdir loses renamed files
Original-Message-ID: <20041028170642.GA8220@thunk.org>
Date: Thu, 28 Oct 2004 17:17:49 GMT
Message-ID: <fa.e5s7bp8.h6012e@ifi.uio.no>

On Thu, Oct 28, 2004 at 11:34:26AM +0200, Matthias Andree wrote:
> Please - is it really necessary that application writers are offended in
> this way? Timo is investing enormous time and effort in writing a *good*
> application, and he's effectively seeking a way to *robustly* deal with
> Maildir format mail storage. Please leave it at "readdir/getdents don't
> work the way you expect and cannot for this and that reason."
>
> Timo tries to implement a *robust* Maildir reader and has just bumped
> into the flaws of DJB's "no-locking" store.

That's true, I should also have included badly-/stupidly-designed
mailstore architectures in the list of possibilities.  :-)

Stepping back for a moment, do you really need such semantics?  After
all, when you're dealing with Maildir, even if you're not dealing with
rename(), you still have to worry about programs deleting or inserting
messages, or moving them between mailboxes, out from under you while
you are doing the readdir() scan.

Since by definition Maildir is lockless, it is expected that
applications be able to deal with such changes.  If they can't, either
the design of Maildir is busted (and I have my own opinions of DJB's
designs, which aren't worth going into here) or the application is
busted.  Take your pick.

> Just some rough thoughts:
>
> 1. the number of open file handles (including directories seen as files
>    for a moment at least) is limited per process, and I'd think the
>    number of directories open can be lower

But directory sizes are unlimited --- they could conceivably be
hundreds of megabytes, and so it's not reasonable to require the
kernel to do the snapshot using unpageable kernel memory.

> 2. versioned information might provide what the application wants
>    without locking up the system

Not given the POSIX readdir/opendir interface!

(And if we have the freedom to redesign the readdir POSIX interface,
there are plenty of other changes I'd make along the way.  Nuking
telldir and seekdir would be near the top of the list.  If you want
an example of truly brain-damaged design, just take telldir and
seekdir... please!)

> 3. the application could be offered an interface for atomic directory
>    reads that requires the application to provide sufficient memory in a
>    single contiguous buffer (making it thread-safe in the same go).

Actually, you can do this today, if you use the underlying
sys_getdents64 system call.  But the application would potentially
have to allocate a very large amount of userspace memory, enough to
hold the whole directory in one buffer.
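
A minimal, Linux-only sketch of that approach follows.  It invokes
getdents64 through syscall(2) with one caller-sized buffer and treats
the read as a single snapshot only if a second call immediately reports
end-of-directory.  The function name snapshot_count and its return
codes are hypothetical; the record layout is the one documented in the
getdents(2) manual page.  Whether one call really observes a consistent
view depends on the kernel and the filesystem, so take that as an
assumption rather than a guarantee.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by getdents64, per the getdents(2) man page. */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/*
 * Read a directory with one getdents64 call into a buffer of `bufsize`
 * bytes and count the entries other than "." and "..".
 * Returns the count, -1 on error, or -2 if the directory did not fit
 * in a single call (so the result would not be a single snapshot).
 */
long snapshot_count(const char *path, size_t bufsize)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    char *buf = malloc(bufsize);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    long count = -1;
    long n = syscall(SYS_getdents64, fd, buf, bufsize);
    if (n >= 0) {
        count = 0;
        for (long off = 0; off < n; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + off);
            if (strcmp(d->d_name, ".") != 0 && strcmp(d->d_name, "..") != 0)
                count++;
            off += d->d_reclen;   /* records are variable-length */
        }
        /* Anything other than 0 here means more entries (or an error). */
        if (syscall(SYS_getdents64, fd, buf, bufsize) != 0)
            count = -2;
    }

    free(buf);
    close(fd);
    return count;
}
```

A caller would pick bufsize from an upper bound on the directory size
and retry with a larger buffer on -2, which is exactly the "very large
amount of userspace memory" cost mentioned above.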

						- Ted
