From: torvalds@transmeta.com (Linus Torvalds)
Newsgroups: linux.dev.kernel
Subject: Re: NTFS-like streams?
Date: 12 Aug 00 20:49:34 GMT

In article <Pine.LNX.4.21.0008121047190.14835-100000@duckman.distro.conectiva>,
Rik van Riel  <riel@conectiva.com.br> wrote:
>On Sat, 12 Aug 2000, Michael Rothwell wrote:
>> Rik van Riel wrote:
>>
>> > So what we want are directories, and not file streams?
>> > Oh wait, we already have those...
>>
>> Not really. Directories aren't the same thing,
>> and don't serve the same purpose. They're _similar_,
>> but not identical.
>
>So what is The Big Difference(tm) that make file streams
>so much better than directories and so much different?

I'll talk really slowly.

HFS has resource forks.  They are not directories.  Linux cannot handle
them well.

I'm all for handling HFS resource forks. It's called "interoperability".

It's also realizing that maybe, just maybe, UNIX didn't invent every
clever idea out there.  Maybe, just maybe, resource forks are actually a
good idea.  And maybe we shouldn't just say "Oh, UNIX already has
directories, we don't need no steenking resource forks".

Put this another way: don't think about "directories vs resource forks"
at all. Instead, think about the problem of supporting something like
HFS or NTFS _well_ from Linux. How would you do it?

Suggestions welcome. What's your interface of choice for a filesystem
like HFS that _does_ have resource forks? Whether you like them or not
is completely immaterial - they exist.

And usability concerns _are_ real concerns. I'm claiming that the best
interface for such a filesystem would be

	open("file", O_RDONLY)		- opens the default fork
	open("file/Icon", O_RDONLY)	- opens the Icon fork
	open("file/Creator"...

	readdir("file")			- lists the resources that the file has

and I'm also claiming that the Linux VFS layer actually shouldn't have
any fundamental problems with something like this.

Tell me why we shouldn't do it like the above? And DON'T give any crap
about whether resource forks are useful or not, because I claim that
they exist regardless of their usefulness and that we shouldn't just put
our heads in the sand and try to hope that the issue doesn't exist.

		Linus



From: torvalds@transmeta.com (Linus Torvalds)
Newsgroups: linux.dev.kernel
Subject: Re: NTFS-like streams?
Date: 12 Aug 00 21:40:15 GMT

On Sat, 12 Aug 2000, Alexander Viro wrote:
>
> On 12 Aug 2000, Linus Torvalds wrote:
> >
> > and I'm also claiming that the Linux VFS layer actually shouldn't have
> > any fundamental problems with something like this.
>
> 	Shouldn't or doesn't? I can tell you what the current problems
> _are_.

I know there are some problems with it right now, but they should not be
design issues.

Right now the VFS layer checks for S_ISDIR() in a few places, in order to
do the O_DIRECTORY tests etc. That will need changes - but those changes
would be required anyway (ie due to the addition of the S_IFCOMPLEX mode
bit in i_mode) for handling of these kinds of files.

> 	a) in a _lot_ of places we are required to distinguish between
> directories and non-directories and yes, a lot of things in userland
> depend on that.

Right. This is the one that actually needs interface changes: something
like

	#define S_IFCOMPLEX	0x10000

in <linux/stat.h>, plus changing of tests. And teaching things like "cp"
about it.

> 	b) unlink() on such beasts. Welcome to fun. And no, it's not
> rmdir() - here we are removing non-empty object.
> 	c) rename() of normal file to such animal and vice versa.
> 	d) rename() of directory <<--->>
> 	e) propagation of chmod() results
> 	f) _if_ we do unlink() - what should happen with
> delete-upon-the-last-iput() semantics?

I think the above are non-VFS issues to a large degree, and will depend on
how the low-level filesystem handles things.

For example, I don't think a filesystem would want to allow the rename
case. The VFS layer doesn't need to know about this: the filesystem would
just return "EINVAL" or something. Same goes for "link()". And same goes
for "chmod()".

The _interesting_ thing is what the filesystem does with the "struct
inode", for example.

I suspect that a filesystem that supports resource forks will just have
the same inode for the whole thing. In effect, they would look somewhat
like hard links to a normal UNIX application. But the filesystem would
save the resource-specific information in the "dentry->d_private" field.

Alternatively, the filesystem might decide to use separate inodes. The VFS
layer probably doesn't care.

> Care to give semantics for operations in the list above?

Depending on how you want to do the implementation, you can do different
things. I suspect a lot of things just won't be allowed. For example,
doing a chmod() on a fork might do different things - ranging from "refuse
to allow any changes except on the default fork" to "having different
permissions on different resources".

We've had similar issues with the MS-DOS filesystem just because it
doesn't have some of the attributes at all. You can think of resource
forks as having the same kind of "incomplete" file semantics. This is not
a new problem in that sense.

And after all, this to a large degree depends on what the low-level
filesystem capabilities are. A low-level filesystem might easily have
different permissions for different resources. Not very likely, but not
impossible.

Imagine a filesystem that carries a "validation cookie" with every file,
for example. The whole point of such a validation cookie would be that
even the owner of the file couldn't change it - and copying the file would
not be able to copy the validation cookie.

Basically, the VFS layer cannot handle all of these issues 100% right now.
However, none of them look like fundamental problems, and they seem to
fall into the category of "we've never done that before, so we don't have
the support for it yet, but it's a SMOP".

Famous last words.

		Linus



From: torvalds@transmeta.com (Linus Torvalds)
Newsgroups: linux.dev.kernel
Subject: Re: NTFS-like streams?
Date: 13 Aug 00 00:20:09 GMT

On Sun, 13 Aug 2000, Andi Kleen wrote:
>
> There currently is a slight name space collision "(deleted)" is a valid
> filename, but you could change it to //deleted/mount/inode or some such
> to avoid that.

I'd much rather just have a very simple rule: if it doesn't start with a
slash, then it's not a "real" path.

And move "(deleted)" to the front of the name rather than the end.

(Note that the "doesn't start with a slash" rule is for more things than
deleted files: it's pipes, sockets, whatever. So that one is independent
of how we do deleted).

I dislike the "//" syntax a lot. It has potential special semantics in
POSIX, and it _is_ a valid root-based filename even without those
semantics. And some day we may take advantage of the POSIX-blessed "//"
syntax extensions.

In contrast, if you get a path that doesn't start with a '/', then you
know a priori that it cannot be a full pathname. It could obviously be a
relative one, but for something that is supposed to return the full
path that isn't an issue, so there is no possibility for confusion.

		Linus




From: torvalds@transmeta.com (Linus Torvalds)
Newsgroups: linux.dev.kernel
Subject: Re: NTFS-like streams?
Date: 13 Aug 00 00:13:02 GMT

On Sat, 12 Aug 2000, Alexander Viro wrote:
>
> You know what tar(1) will do with you for that, don't you? Same ->st_ino
> with different contents... And unlike procfs, here tar is a Reasonable
> Thing(tm).

But "tar" won't even _see_ the thing. Unless "tar" starts to know about
S_IFCOMPLEX. In which case it's a non-issue.

Remember: unix-only programs will only see the regular data side. They
won't ever see the other resource forks at all. Ergo, they cannot break.

Programs that are aware of S_IFCOMPLEX are aware of it. Ergo, they cannot
break.

In short: not a problem.

> > We've had similar issues with the MS-DOS filesystem just because it
> > doesn't have some of the attributes at all. You can think of resource
>
> So we did. I also remember the hell we had there due to the weird aliases
> mess.

Oh, name case-insensitivity is _much_ worse than forks will ever be.

> Oh, yes. Linus, I would _really_ ask you to postpone the activity in that
> direction until
> 	a) ->revalidate() interface (along with its races) is sorted out
> 	b) ->getattr() will be in place and used by VFS
> 	c) icache hashing by ->i_dev issue is sorted out, quite possibly
> along with the ->st_fstype thing

Note that no way is this going to happen until 2.5.x anyway, so don't
worry.

And it won't happen at all unless somebody starts to care more about
things like HFS than historically people have cared so far. NTFS may end
up being the thing that makes us do this. But it may be that even NTFS
isn't important enough.

		Linus



From: torvalds@transmeta.com (Linus Torvalds)
Newsgroups: linux.dev.kernel
Subject: Re: NTFS-like streams?
Date: 13 Aug 00 17:37:22 GMT

On Sat, 12 Aug 2000, Pavel Machek wrote:
> >
> > and then access the "Icon" resource in it by just doing
> >
> > 	xv ~/myfile/Icon
>
> Sorry, this is not going to work. I played with this with podfuk, and
> xv will probably stat myfile (just for fun), notice it is regular
> file, and refuse to try to open myfile/Icon.
>
> What you however can do is xv ~/myfile#utar/Icon. This actually works
> for me.

I don't think this is a strong argument. Any program that "knows" that it
is handling a POSIX filesystem and simply does part of the work itself is
always going to break on extensions. That's just unavoidable. Adding the
magic string at the end makes "xv" happy, but might easily make something
else that assumes POSIX behaviour unhappy instead (ie somebody else does
'stat("myfile#utar")' and is unhappy because it doesn't exist).

Tough. Whatever we do, complex files are going to act differently from
regular files. Even a HFS approach that looks _exactly_ like a UNIX
filesystem will confuse programs that get unhappy when the resource files
magically disappear when the non-resource file is deleted.

Also, note that we can always break things up: even in the presence of
programs that _require_ POSIX behaviour because they think they know
better than the OS (silly thing to do) you can always just do

	cp ~/myfile/Icon Icon.bmp
	xv Icon.bmp
	cp Icon.bmp ~/myfile/Icon

instead. I'm personally worried not about individual programs not being
able to take advantage of the resources, but about Linux fundamentally not
_supporting_ the notion of resources at all.

So what I want to make sure is that Linux supports the infrastructure for
people to take advantage of resource forks. The fact that not everybody is
going to be able to do so automatically is not my problem.

[ Put another way: I suspect that we won't support resource forks natively
  for another few years, and HFS etc will have their own specialized
  stuff. I don't care all that much. But at the same time I do believe
  that eventually we'll probably have to handle it. And at _that_ point I
  care about the fact that our internal design has to be robust. It
  doesn't have to make everybody happy, but it has to be clean both
  conceptually and from a pure implementation standpoint. I don't want a
  "hack that works". ]

			Linus



From: torvalds@transmeta.com (Linus Torvalds)
Newsgroups: linux.dev.kernel
Subject: Re: NTFS-like streams?
Date: 13 Aug 00 17:16:33 GMT

On Sun, 13 Aug 2000, Rogier Wolff wrote:
>
> the HFS guys made a point of making the filesystem capable of being
> tar-copied. I think that this is a useful feature.

I don't disagree. On the other hand, I have to say that I personally put
ease-of-use before tar-copyability any time.

However, there's a stronger argument against the HFS approach: it
_definitely_ will never work in the scenario that Al outlined - hard
links of complex objects.

Now, HFS doesn't actually have hard links as far as I know, so you may say
"So what?".

The "so what" is simple: are we going to have unified behaviour for
resource forks, or is every damn filesystem that has extended attributes
(whether they be named streams, binary-only EA's, ACL's, whatever) going
to do some ad-hoc name decision for _their_ particular version of their
extensions?

Me, I'd personally prefer to have a _design_. In fact, to some degree
that's actually the only thing I care about.

And the HFS approach fails the "design" criterion. It cannot handle the
NTFS case at all.

Note that some NTFS people have advocated the NTFS design: special
functions for setting and accessing the NTFS EA's. And that is _equally_
short-sighted. It misses the point entirely: I'm not interested in a
HFS-specific hack, and I'm not interested in a NTFS-specific hack.

So what I'm looking for in this discussion is an acceptable GoodDesign(tm).
Something that can (a) handle _arbitrary_ extended attributes, no matter
what particular low-level filesystem is underneath and (b) something that
is reasonably intuitive on a user level.

The HFS design fails (a) quite badly. It's just not a possible layout for
the dcache for a hard-linked complex object. Al correctly pointed that out
as an interesting case, and also had the solution for it. But that
solution implies "encapsulating" the whole complex object. Which means
that we cannot spread out the attributes in multiple places.

Personally, I think that spreading out the attributes is also not very
user-friendly, but that's a matter of taste, not a hard cold "this won't
work" kind of argument ;)

Of course, maybe people _want_ different filesystems to just look
different. Maybe a GoodDesign(tm) is not needed. It certainly hasn't been
a big issue so far.

		Linus




From: torvalds@transmeta.com (Linus Torvalds)
Newsgroups: linux.dev.kernel
Subject: Re: NTFS-like streams?
Date: 13 Aug 00 17:50:48 GMT

On Sun, 13 Aug 2000, Michael Rothwell wrote:
>
> While I'm not necessarily an "NTFS person," I feel
> compelled to point out that NTFS named streams operate
> as normal files, and are accessed via a namespace
> extension, the ":" character. BeFS, HPFS, etc. Extended
> Attributes can be built on top of NTFS-style Named
> Streams by providing accessor functions that simply
> open a stream, write a chunk of data to it, and
> close it again.

Ahh. Ok, then I confused it with the BeFS people.

Or the XFS people.

Anyway, somebody. Somebody was suggesting the utterly limited and
braindead interface of "set_extended_attribute(file, xxxx, yyy, zzzz)"
kind of approach. Which obviously does not handle the generic case.

And I agree with you: I think the only sane _design_ is one that can
handle the generic case.

"Give them rope", as Joan of Arc used to say.

That was the reason UNIX originally did everything as a "stream of bytes".
Because in the end, anything else is too limiting.

Make the extended attributes look like regular files. Then, of course, the
actual low-level implementation may not be able to do everything. Size
limitations etc are a fact of life too. Special naming conventions. All
things we've had to be able to handle since day 1.

		Linus



From: tytso@MIT.EDU (Theodore Y. Ts'o)
Newsgroups: linux.dev.kernel
Subject: Re: NTFS-like streams?
Date: 13 Aug 00 19:58:57 GMT

   Date: 	Sun, 13 Aug 2000 13:42:04 -0400
   From: Michael Rothwell <rothwell@flyingbuttmonkeys.com>

   While I'm not necessarily an "NTFS person," I feel
   compelled to point out that NTFS named streams operate
   as normal files, and are accessed via a namespace
   extension, the ":" character. BeFS, HPFS, etc. Extended
   Attributes can be built on top of NTFS-style Named
   Streams by providing accessor functions that simply
   open a stream, write a chunk of data to it, and
   close it again. NTFS is the only FS, AFAIK, that
   supports such a generic view of streams. BeFS, the Mac,
   etc. all use Extended Attributes available as name-value
   pairs only via special functions.

Trying to map Extended Attributes on top of the Named Streams
abstraction is at best problematic.

For example, suppose you have an EA:

	Creator=tytso

Now suppose you do the following:

	fd = open("~/Myfile/Creator", O_WRONLY);
	write(fd, "acox", 4);
	close(fd);

What does the EA contain now?  Is it:

	Creator=acoxo

or is it

	Creator=acox

Under Named Streams, it should be "acoxo", but trying to implement that
on a filesystem that uses Extended Attributes is very strange.

If you go in the other direction, and you write:

	fd = open("~/Myfile/Creator", O_WRONLY);
	write(fd, "The ", 4);
	write(fd, "Eric ", 5);
	write(fd, "Youngdale ", 10);
	write(fd, "Committee", 9);
	close(fd);

This is going to be awkward at best for a filesystem with Extended
Attributes, as it will need to do an EA replace operation for each
write, while it simulates the concept of a file pointer -- which
doesn't exist in the standard EA programming interface:

     int attr_{get,set}(const char *path, const char *attrname,
                        char *attrvalue, int *valuelength, int flags);

So as much as I understand the desire for one API "to rule them all
and in the darkness bind them", I really believe that Extended
Attributes are fundamentally different from Named Streams, and you
really shouldn't try to stretch EA's over the Named Streams procrustean
bed.

							- Ted



From: tytso@MIT.EDU (Theodore Y. Ts'o)
Newsgroups: linux.dev.kernel
Subject: Re: NTFS-like streams?
Date: 14 Aug 00 02:51:29 GMT

   Date: Sun, 13 Aug 2000 20:31:27 -0400
   From: Michael Rothwell <rothwell@flyingbuttmonkeys.com>

   > For example, suppose you have an EA:
   >
   >         Creator=tytso
   >
   > Now suppose you do the following:
   >
   >         fd = open("~/Myfile/Creator", O_WRONLY);
   >         write(fd, "acox", 4);
   >         close(fd);
   >
   > What does the EA contain now?  Is it:
   >
   >         Creator=acoxo
   >
   > or is it
   >
   >         Creator=acox


   NT Services for Macintosh actually implements
   all resource-fork data in one stream, and provides
   structured storage (prob. just a hash table, but I
   don't really know) to do it in. It doesn't map
   each EA to a separate stream. But, we'll play
   your game for a moment. The correct answer is
   B, "acox". It doesn't make sense to do anything
   but wipe out what is there and replace it when
   doing EAs on top of named streams, with a 1:1 EA:NS
   mapping. Leaving what is there and overwriting part of
   it isn't how EAs work, and isn't how EAs on NSes have
   to work.

So what you're saying is that the semantics of read() and write() change
depending on whether your filesystem supports Extended Attributes or
Named Streams.  Note that if "Creator" were a named stream, the answer
would be (A).

Is this enough to convince you that trying to emulate Extended
Attributes in terms of Named Streams is terminally broken?!?  API's are
more than just the function signatures; they are also about semantics.
If the semantics change in a fundamental way depending on what kind of
filesystem you're dealing with, things are really broken.

The other question this of course raises is what happens if a
filesystem has *both* named streams and extended attributes?  NTFSv5
(shipped with the W2K bug) does in fact have both.  This
should hopefully prove to you that Extended Attributes and Named Streams
are fundamentally different.

							- Ted




From: torvalds@transmeta.com (Linus Torvalds)
Newsgroups: linux.dev.kernel
Subject: Re: NTFS-like streams?
Date: 13 Aug 00 19:42:44 GMT

On Sun, 13 Aug 2000, Alan Cox wrote:

> > But "tar" won't even _see_ the thing. Unless "tar" starts to know about
> > S_IFCOMPLEX. In which case it's a non-issue.
>
> oh wonderful. So you've just broken my backup scripts. Congratulations.

Alan.

Calm down a moment, and THINK.

How hard do you think it is to make the tar-test that does

	if (S_ISDIR(st->st_mode)) {
		... traverse into directories ..
	}

instead be

	#ifdef S_ISCOMPLEX
	#define CAN_TRAVERSE(x) (S_ISCOMPLEX(x) || S_ISDIR(x))
	#else
	#define CAN_TRAVERSE(x)	(S_ISDIR(x))
	#endif

	...

	if (CAN_TRAVERSE(st->st_mode)) {
		.. traverse into directories ..
	}

and suddenly tar _can_ handle resource forks. Sure, you'll need some extra
logic to handle the complex files data too, but really, Alan.

What's the advantage, I hear you say?

The above will work on HFS. But so will the current "tar". Resource forks
and all.

The above will -also- work on NTFS. And the current setup will never do
that.

> tar is already backing up my HFS test partition, including the resource
> forks.

..and it can do so.

The thing is, right now resource forks are only exported on HFS. As far as
I know, the Linux NTFS driver doesn't even try. But people are starting to
be more and more interested in supporting NTFS in a real way, rather than
the partial support it has now. You-know-who etc.

Quite frankly, _eventually_ we'll have to bite the bullet and handle
resource forks. Maybe HFS will continue to use the current setup. Who
knows? But wouldn't it be nice to have a unified way of handling it? And
complain all you like, but the HFS way just cannot be the unified way.

There are actually problems with the current HFS hackery: one of the
problems is that because it splits things up in different directories, you
have multiple dentries pointing to the same inode. That's fine: the dentry
code has no trouble with that per se (hard links), but I suspect it causes
races on create/remove.

At the very least, I hope the virtual ".resource" directory is the same
physical inode as the directory it resides in, because otherwise the basic
"dir->i_sem" concurrency protection simply won't work.

(To me it looks like that isn't the case. Race city. Nobody probably
cares, but it's an example of the fact that HFS is actually buggy as it is
implemented right now. Exactly because the VFS layer doesn't understand
what it is that HFS is trying to do).

Do you see the problem now? Is pointing you to a real bug enough?

		Linus




From: torvalds@transmeta.com (Linus Torvalds)
Newsgroups: linux.dev.kernel
Subject: Re: NTFS-like streams?
Date: 13 Aug 00 22:07:06 GMT

On Sun, 13 Aug 2000, Alan Cox wrote:
>
> I'd very much like it to be unified. I can see that very well. It needs to
> be unified in a way I can serve it over NFS to boxes that dont make that
> assumption and create the same layout trivially on a non resource forked
> fs.

Fair enough. I do think that it is basically impossible to try to create a
non-POSIX filesystem with POSIX semantics. There's always going to be some
problem - exporting resources over NFS (that doesn't know about them) is
going to cause some quite interesting problems with caching issues on the
clients we export them to, for example.

> > At the very least, I hope the virtual ".resource" directory is the same
> > physical inode as the directory it resides in, because otherwise the basic
> > "dir->i_sem" concurrency protection simply won't work.
>
> If it has the same inode number lots of other stuff breaks so I fear it doesnt

Ehh.. The "lots of stuff breaks" is, I assume, basically again just "tar".
Nothing else really ever tends to care about inode numbers.

Note that the tradeoff of "can potentially cause filesystem corruption" vs
"well, at least 'tar' is happy with the layout" is not a trade-off where I
would have considered the happiness of 'tar' to be of all that noticeable
importance.

But apparently that's the choice the HFS filesystem has made. Shades of
Windows, in my opinion: "yeah, we know it is broken, but we preferred some
hard-to-trigger filesystem corruption to breaking a legacy program that
couldn't understand the new filesystem features".

> Im not arguing about needing to do something. I just think the solutions so
> far all have large holes in them. And no - I dont have a better one to offer 8(

We could in theory play games with "readdir()". Do things like return two
entries with the same name, and with different st_mode information. Now
THAT would be confusing, but it might fool "tar" into doing exactly the
right thing without any changes, in fact (first opening the file as a
plain file and saving that away, then opening the same file as a directory
and saving that away).

Yeah, I think POSIX would consider that a no-no ;)

		Linus



From: torvalds@transmeta.com (Linus Torvalds)
Newsgroups: linux.dev.kernel
Subject: Re: NTFS-like streams?
Date: 13 Aug 00 19:12:18 GMT

On Sun, 13 Aug 2000, Alan Cox wrote:
>
> We already do that for Apple HFS. We create a fake directory for each dir
> which is called .AppleDouble and contains the resource fork. It works pretty
> well on the whole. rename() has some surprises and a generic unix cp command
> will lose the resource fork but it works ok.

See earlier emails on why this is unusable for NTFS due to hardlink
capabilities.

> > filesystems. Sane and usable. Things like "fd_open()" make sense even
> > without resource forks - it's kind of a private extension of the notion of
> > "current working directory", after all.
>
> fd_open is interestingly dangerous for security unless carefully considered.
> But yes it should be sat down and thought through

Note that if fd_open is dangerous for security, then so is "fchdir()".
Because you can emulate fd_open() with fchdir()+regular open. Slowly, but
still.

> > Maybe in the future, if we support resource forks on other filesystems
>
> We already do. Have done since 1.3.something.

No.

We have had ad-hoc support for some special cases.

Linux hasn't supported them as a notion.

> hfs works fine. This is a debate about an existing solved non problem.

Alan, do you really mean to say that every filesystem that has resource
forks should solve the problem over and over again, and in different
manners?

It's already been made clear that the HFS solution is unacceptable for
NTFS.

In contrast, the NTFS solution _would_ be acceptable for HFS.

Think about it. Which one is the better solution?

		Linus

