Date: 	Thu, 4 Oct 2001 08:49:24 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: Security question: "Text file busy" overwriting executables but
Newsgroups: fa.linux.kernel

On 4 Oct 2001, Eric W. Biederman wrote:
>
> First what user space really wants is the MAP_COPY.  Which is
> MAP_PRIVATE with the guarantee that they don't see anyone else's changes.

Which is a completely idiotic idea, and which is only just another example
of how absolutely and stunningly _stupid_ Hurd is.

The thing with MAP_COPY is that how do you efficiently _detect_ somebody
else's changes on a page that you haven't even read in yet?
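
For context, the gap between plain MAP_PRIVATE and the proposed MAP_COPY
can be observed from user space. The following is a minimal sketch, not
kernel code: the function name private_mapping_sees_write() is made up for
illustration. It maps a file MAP_PRIVATE, changes the file through the
descriptor before the mapped page has been touched, and reports whether
the change shows through. POSIX leaves that visibility unspecified; on
Linux the untouched page usually comes straight from the page cache, so
the new contents are typically seen. MAP_COPY would have to guarantee the
old contents instead.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a write made after
 * mmap(MAP_PRIVATE) is visible through the mapping, 0 if it is not,
 * and -1 on error. The file at `path` is created or truncated.
 */
static int private_mapping_sees_write(const char *path)
{
    int result = -1;
    char *map;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, "old", 3) != 3)
        goto out;

    /* Private, read-only mapping; the page has not been faulted in yet. */
    map = mmap(NULL, 3, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    /* "Somebody else" changes the file after the mapping was made. */
    if (pwrite(fd, "new", 3, 0) == 3)
        result = (map[0] == 'n');   /* first access happens here */

    munmap(map, 3);
out:
    close(fd);
    return result;
}
```

A result of 1 means the mapping did not behave like a snapshot, which is
exactly the case MAP_COPY would have to detect and prevent.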

So you have a few choices, all bad:

 - immediately reading in everything, basically turning the mmap() into a
   read. Obviously a bad idea.

 - mark the inode as a "copy" inode, and whenever somebody writes to it,
   you not only make sure that you do copy-on-write on the page cache page
   (which, btw, is pretty much impossible - how did you intend to find all
   the other _non_COPY_ users that _want_ coherency).

   You also have to make sure that if somebody changes the page, you have
   to read in the old contents first (not normally needed for most
   changes that write over at least a full block), but you also have to
   save the old page somewhere so that the mapping can use it if it faults
   it in later. And how the hell do you do THAT? Especially as you can
   have multiple generations of inodes with different sets of "MAP_COPY"
   on different contents..

   In short, now you need filesystem versioning at a per-page level etc.

Trust me. The people who came up with MAP_COPY were stupid. Really. It's
an idiotic concept, and it's not worth implementing.

And this all for what is an administration bug in the first place.

In short: just say NO TO DRUGS, and maybe you won't end up like the Hurd
people.

		Linus


Date: 	Sat, 13 Oct 2001 10:13:17 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: Security question: "Text file busy" overwriting executables but
Newsgroups: fa.linux.kernel

On Sat, 13 Oct 2001, Jamie Lokier wrote:
>
> I can think of an efficiency-related use for MAP_COPY, and it has
> nothing to do with shared libraries:
>
>  - An editor using mmap() to read a file.

No, you're thinking the wrong way.

Trust me, MAP_COPY really _is_ stupid, and the Hurd is a piece of crap.

People who think MAP_COPY is a good idea are people who cannot think about
the implications of it, and cannot think about the alternatives.

In particular, you claim that you could use "mmap()" for "read()", and
speed up the application that way. Ok, fair enough.

Now, somebody who _isn't_ stupid (and that, of course, is me), immediately
goes "well, _duh_, why don't you speed up read() instead?".

The fact is, all the problems that "MAP_COPY" has just go away if,
instead of thinking about a mmap(), you think about doing a "read()" and
just marking the pages PAGE_COPY if they are exclusive.

In short: MAP_COPY is braindamaged, because it doesn't have enough
information at the right level to do a reasonable job of it. What people
want to use it for is really to emulate "read()" efficiently using mmap,
and _nothing_ else. That is the only reason for it ever existing, and the
fact is, that clearly shows just how _stupid_ the whole thing is.

You might as well just do a read() in the first place.

Your arguments are
 - read() implies a memcpy()
 - read() dirties pages and causes more memory pressure

but you don't actually _question_ those arguments.

I will tell you that doing a read() that _acts_ like the MAP_COPY you so
want is a LOT easier than doing MAP_COPY in the first place.

Why?

 - a read() call doesn't have any "history" - it doesn't leave (bogus)
   VM data around like MAP_COPY does. MAP_COPY says "I want these pages to
   have the contents they did _when_I_did_the_mapping_", which is a
   temporal shift that just doesn't make sense in any sane VM model, and
   which inherently implies versioning.

 - a read() can fairly easily just do the optimization

	(a) if we're reading a large area
	(b) if the offset and the destination are page-aligned
	(c) if the page is exclusive (ie no existing other owners)
		then
	just do the page move instead of the copy, and mark the page as
	PAGE_COPY

   Every other use of the page that can change it (ie a shared writable
   mapping, or a "write()" call) will now check the PAGE_COPY bit on the
   _page_, and just say "ok, I'll allocate a new page, and atomically
   switch the ones, and leave the old page untouched and remove it from
   the page cache"

   (And the swap-out logic has to turn a PAGE_COPY page into a swap-cache
   page - this is the real downside, because it implies that we will have
   to write it out to swap if we're low on memory, unlike a real mmap)
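
The conditions and the write-side check above can be modelled in a few
lines. This is a toy user-space sketch, not kernel code: struct toy_page,
the page size, the "large read" threshold and all function names are
assumptions made up for illustration, and PAGE_COPY is the hypothetical
per-page bit described in this mail, not an existing kernel flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE  4096u                  /* assumed page size */
#define TOY_LARGE_READ (8u * TOY_PAGE_SIZE)   /* assumed "large" threshold */

/* Toy stand-in for a page cache page. */
struct toy_page {
    int  other_owners;  /* owners besides the page cache itself */
    bool page_copy;     /* the hypothetical PAGE_COPY bit */
};

/* Conditions (a)-(c): may read() move the page instead of copying it? */
static bool toy_can_move(const struct toy_page *pg, uint64_t file_off,
                         uintptr_t dest, size_t len)
{
    bool large     = len >= TOY_LARGE_READ;                /* (a) */
    bool aligned   = file_off % TOY_PAGE_SIZE == 0 &&
                     dest % TOY_PAGE_SIZE == 0;            /* (b) */
    bool exclusive = pg->other_owners == 0;                /* (c) */

    return large && aligned && exclusive;
}

/*
 * Read side: returns true if the page was handed to the reader and
 * marked PAGE_COPY, false if the caller must fall back to a memcpy.
 */
static bool toy_read_page(struct toy_page *pg, uint64_t file_off,
                          uintptr_t dest, size_t len)
{
    if (!toy_can_move(pg, file_off, dest, len))
        return false;
    pg->page_copy = true;
    return true;
}

/*
 * Write side: anything that would modify the page (write(), a shared
 * writable mapping) checks the bit on the page itself. True means:
 * allocate a new page, switch it into the page cache, and leave the
 * old page untouched for the reader.
 */
static bool toy_write_needs_new_page(const struct toy_page *pg)
{
    return pg->page_copy;
}
```

All of the state lives in the page itself, so nothing has to be tracked
per inode or per mapping.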

Notice? Same as MAP_COPY, but without any global state.

And notice how this is actually conceptually much closer to what you
actually _want_ to use MAP_COPY for.

Could we implement MAP_COPY as such a read()? Yes, sure. But that's just
confusing the issue - why call it a mmap() at all, when it isn't? The day
when Hurd is so common that we want to emulate its braindamages is not
going to be in my life-time, I suspect.

		Linus



Date: 	Sat, 13 Oct 2001 12:23:47 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: Security question: "Text file busy" overwriting executables but
Newsgroups: fa.linux.kernel

On Sat, 13 Oct 2001, Jamie Lokier wrote:
>
> In fact it was proposed here on this list years ago, and I think you
> argued against it (TLB flush costs).  The costs and kernel
> infrastructure have changed and maybe the idea could be revisited now.

It's still not entirely unlikely that doing VM mappings is simply more
expensive than just doing a memcpy. The TLB invalidate is only part of the
issue - you also have the page table walk, the VM lock, and the fact that
PAGE_COPY itself ends up being overhead.

Which is why the PAGE_COPY kind of read() optimization is _probably_ only
worth it if the user asks for it directly (or automatically only for large
reads together with single-threaded applications).

The explicit flag is probably a good idea also because of usage patterns
(PAGE_COPY is a slowdown _if_ the file is actually written to or even
mapped shared).

		Linus



Date: 	Sat, 13 Oct 2001 15:19:16 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: Security question: "Text file busy" overwriting executables but
Newsgroups: fa.linux.kernel

On Sat, 13 Oct 2001, Jamie Lokier wrote:
>
> There are applications (GCC comes to mind) which are using mmap() to
> read files now because it is measurably faster than read(), for
> sufficiently large source files.
>
> I don't know where the optimal costs lie.

The gcc people tested it, and their cut-off point is at 30kB or so.
Anything smaller than that is faster to just "read()".

Now, that's a traditional mmap(), though, which has more overhead than a
"read-with-PAGE_COPY" would have. The pure mmap() approach has the actual
page fault overhead too, along with having to do "fstat()" and "munmap()".
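
A size-based choice between the two can be sketched in user space as
follows. This is only an illustration of the idea, not what gcc actually
does: struct file_buf, load_file() and free_file() are hypothetical names,
and the 30kB cut-off is simply the figure quoted above. For simplicity the
sketch calls fstat() on both paths and treats empty files and interrupted
reads as errors.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define MMAP_CUTOFF ((size_t)30 * 1024)   /* cut-off quoted above */

struct file_buf {
    void  *data;
    size_t len;
    bool   mapped;   /* true: data came from mmap(), false: from malloc() */
};

/* Load a whole file: read() below the cut-off, mmap() at or above it. */
static int load_file(const char *path, struct file_buf *out)
{
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    out->len = (size_t)st.st_size;
    out->mapped = out->len >= MMAP_CUTOFF;

    if (out->mapped) {
        out->data = mmap(NULL, out->len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (out->data == MAP_FAILED) {
            close(fd);
            return -1;
        }
    } else {
        size_t done = 0;

        out->data = malloc(out->len);
        if (out->data == NULL) {
            close(fd);
            return -1;
        }
        while (done < out->len) {
            ssize_t n = read(fd, (char *)out->data + done, out->len - done);
            if (n <= 0) {   /* error or unexpected end of file */
                free(out->data);
                close(fd);
                return -1;
            }
            done += (size_t)n;
        }
    }
    close(fd);   /* an established mapping stays valid after close() */
    return 0;
}

/* Release a buffer obtained from load_file(). */
static void free_file(struct file_buf *buf)
{
    if (buf->mapped)
        munmap(buf->data, buf->len);
    else
        free(buf->data);
}
```

The mmap() path pays for the page faults on first access and for the
munmap() in free_file(), which is the overhead referred to above.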

		Linus



From: torvalds@transmeta.com (Linus Torvalds)
Subject: Re: Security question: "Text file busy" overwriting executables but
Date: 	Sat, 13 Oct 2001 22:27:30 +0000 (UTC)
Newsgroups: fa.linux.kernel

In article <Pine.LNX.4.33.0110131219520.8900-100000@penguin.transmeta.com>,
Linus Torvalds  <torvalds@transmeta.com> wrote:
>
>The explicit flag is probably a good idea also because of usage patterns
>(PAGE_COPY is a slowdown _if_ the file is actually written to or even
>mapped shared).

Actually, I missed the obvious case: quite often when you do a "read()",
the reader itself will end up writing to the area read into.  In which
case doing the PAGE_COPY would also slow down measurably, due to the
extra overhead of the copy-on-write fault (which not just does the copy
that we tried to avoid, but will take a fault and more VM locks). 

So if we want to do this optimization, we _definitely_ want it to be
explicitly controlled by a flag, like O_DIRECT is.  There are just too
many cases where it's a pessimization, and while the user can often tell
before-hand, the kernel simply cannot. 

		Linus


Date: 	Sun, 14 Oct 2001 08:40:30 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: Security question: "Text file busy" overwriting executables but
Newsgroups: fa.linux.kernel

On 14 Oct 2001, Eric W. Biederman wrote:
>
> Hmm.  read-with-PAGE_COPY may not be any faster than read as you still
> read all of the data into memory, so you have almost the same latency.
> mmap might work better because of better overlapping of I/O and cpu
> processing.

Most of the time, you either have the IO overhead (and whether you use
read or mmap won't matter all that much, because you're IO limited), or
the thing is cached.

For gcc, it's cached 99% of the time, because most of the IO ends up being
header files (this is, of course, assuming that you're compiling a big
project, but if you're not, the big overhead is in loading _gcc_, not in
the pages it reads).

> Also read-with-PAGE_COPY has some really interesting implications for the
> page out routines.  Because anytime you start the page out you have to
> copy the page.  Not exactly when you want to increase the memory pressure.

No no. Read my thing again. On swap-out, you just move the thing to the
swap cache.

Sure, that removes it from the regular cache, and that's possibly a
performance problem. But

> And not at all suitable for shared libraries.

No. Why would you "read" shared libraries? read is read, mmap is mmap. If
you want mmap, use mmap. Don't mess it up with MAP_COPY, which is not mmap
at all.

		Linus


