Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: What's left over.
Original-Message-ID: <Pine.LNX.4.44.0211011107470.4673-100000@penguin.transmeta.com>
Date: Fri, 1 Nov 2002 19:33:56 GMT
Message-ID: <fa.od9tdnv.ika4bd@ifi.uio.no>

On Fri, 1 Nov 2002, Joel Becker wrote:
>
> 	I always liked the AIX dumper choices.  You could either dump to
> the swap area (and startup detects the dump and moves it to the
> filesystem before swapon) or provide a dedicated dump partition.  The
> latter was preferred.
> 	Either of these methods merely requires the dumper to correctly
> write to one disk partition.  This is about as simple as you are going
> to get in disk dumping.

Ehh.. That was on closed hardware that was largely designed with and for
the OS.

Alan isn't worried about the "which sector do I write" kind of thing.
That's the trivial part. Alan is worried about the fact that once you know
which sector to write, actually _doing_ so is a really hard thing. You
have bounce buffers, you have exceedingly complex drivers that work
differently in PIO and DMA modes and are more likely than not the _cause_
of a number of problems, etc.

And you have a situation where interrupts are not likely to work well
(because you crashed with various locks held), so the regular driver
simply isn't likely to work all that well.

And you have a situation where there are hundreds of different kinds of
device drivers for the disk.

In other words, the AIX situation isn't even _remotely_ comparable. A
large portion of the complexity in the PC stability space is in device
drivers. It's the thing I worry most about for 2.6.x stabilization, by
_far_.

And if you get these things wrong, you're quite likely to stomp on your
disk. Hard. You may be trying to write the swap partition, but if the
driver gets confused, you just overwrote all your important data. At which
point it doesn't matter if your filesystem is journaling or not, since you
just potentially overwrote it.

In other words: it's a huge risk to play with the disk when the system is
already known to be unstable. The disk drivers tend to be one of the main
issues even when everything else is _stable_, for chrissake!

To add insult to injury, you will not be able to actually _test_ any of
the real error paths in real life. Sure, you will be able to test forced
dumps on _your_ hardware, but while that is fine in the AIX model ("we
control the hardware, and charge the user five times what it is worth"),
again that doesn't mean _squat_ in the PC hardware space.

See?

		Linus



From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [2/3] 2.6.22-rc2: known regressions v2
Date: Fri, 25 May 2007 16:47:26 UTC
Message-ID: <fa.icCuoafaQlYppcCMXtIZ2OpK2hw@ifi.uio.no>

On Fri, 25 May 2007, Chris Newport wrote:
>
> Maybe we should take a hint from Solaris.

No. Solaris is shit. They make their decisions based on a "we control the
hardware" kind of setup.

> If the kernel crashes Solaris dumps core to swap and sets a flag.
> At the next boot this image is copied to /var/adm/crashdump where
> it is preserved for future debugging. Obviously swap needs to be
> larger than core, but this is usually the case.

(a) it's not necessarily the case at all on many systems

(b) _most_ crashes that are real BUG()'s (rather than WARN_ON()'s) leave
    the system in such a fragile state that trying to write to disk is the
    _last_ thing you should do.

    Linux does the right thing: it tries to not make bugs fatal.
    Generally, you should see an oops, and things continue. Or a
    WARN_ON(), and things continue. But you should avoid the "the machine
    is now dead" cases (there's a small sketch of the WARN_ON() vs BUG()
    distinction after this list).

(c) have you looked at the size of drivers lately? I'd argue that *most*
    bugs by far happen in something driver-related, and most of our source
    code is likely drivers.

    Writing to disk when the biggest problem is a driver to begin with
    is INSANE.
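
For what it's worth, here is a rough sketch of that WARN_ON() vs BUG()
distinction. This is not from the original mail: the structure, names and
magic value are made up purely for illustration.

#include <linux/kernel.h>
#include <linux/bug.h>
#include <linux/errno.h>

#define MY_DEV_MAGIC 0x4d59		/* hypothetical magic value */

struct my_dev {				/* hypothetical example structure */
	unsigned int magic;
};

int example_submit(struct my_dev *dev)
{
	/*
	 * Recoverable: WARN_ON() prints a warning plus a stack trace,
	 * the caller gets an error back, and the machine keeps running.
	 */
	if (WARN_ON(!dev))
		return -EINVAL;

	/*
	 * BUG_ON() triggers an oops that kills the current process.
	 * The rest of the system usually limps on, but whatever locks
	 * and state this context held may be left in the fragile shape
	 * described in (b) above.
	 */
	BUG_ON(dev->magic != MY_DEV_MAGIC);

	return 0;
}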

So the fact is, Solaris is crap, and to a large degree Solaris is crap
exactly _because_ it assumes that it runs in a "controlled environment".

Yes, in a controlled environment, dumping the whole memory image to disk
may be the right thing to do. BUT: in a controlled environment, you'll
never get the kind of usage that Linux gets. Why do you think Linux (and
Windows, for that matter) took away a lot of the market from traditional
UNIX?

Answer: the traditional UNIX hardware/control model doesn't _work_. People
want more flexibility, both on a hardware side and on a usage side. And
once you have the flexibility, the "dump everything to disk" is simply not
an option any more.

Disk dumps etc. are an option at places like Wall Street. But look at the bug
reports, and ask yourself how many of them happen at Wall Street, and how
many of them would even be _relevant_ to somebody there?

So forget about it. The whole model is totally broken. We need to make
bug-reports short and sweet, enough so that random people can
copy-and-paste them into an email or take a digital photo. Anything else
IS TOTALLY INSANE AND USELESS!

			Linus


From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [2/3] 2.6.22-rc2: known regressions v2
Date: Fri, 25 May 2007 17:22:07 UTC
Message-ID: <fa.tZhvzJYxdoE95D4RnVhrX/DFJn4@ifi.uio.no>

On Fri, 25 May 2007, Alan Cox wrote:
>
> There is an additional factor - dumps contain data which variously is -
> copyright third parties, protected by privacy laws, just personally
> private, security sensitive (eg browser history) and so on.

Yes.

I'm sure we've had one or two crashdumps over the years that have actually
clarified a bug.

But I seriously doubt it is more than a handful.

> Diskdump (and even more so netdump) are useful in the hands of a
> developer crashing their own box just like kgdb, but not in the
> normal and rational end user response of "it's broken, hit reset"

Amen, brother.

Even for developers, I suspect a _lot_ of people end up doing "ok, let's
bisect this" or using some other method to narrow it down to a specific case,
and then staring at the source code once they get to that point.

At least I hope so. Even in user space, you should generally use gdb to
get a traceback and perhaps variable information, and then go look at the
source code.
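
As an illustration only (the binary name is made up), that typically
means something like

	gdb ./myprog core
	(gdb) bt full

and then going and reading the code that shows up in the backtrace.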

Yes, dumps can (in theory) be useful for one-off issues, but I doubt many
people have ever been able to get much more out of them than they get from
a kernel "oops" message.

For developers, I can heartily recommend the firewire-based remote debug
facilities that the PowerPC people use. I've used it once or twice, and it
is fairly simple and much better than a full dump (and it works even when
the CPU is totally locked up, which is the best reason for using it).

But 99% of the time, the problem doesn't happen on a developer machine,
and even if it does, 90% of the time you really just want the traceback
and register info that you get out of an oops.

			Linus


From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [2/3] 2.6.22-rc2: known regressions v2
Date: Fri, 25 May 2007 17:52:25 UTC
Message-ID: <fa.4mEzpISndQt+mlmulymUpbJZ1Z4@ifi.uio.no>

On Fri, 25 May 2007, Andrew Morton wrote:
>
> > > There is an additional factor - dumps contain data which variously is -
> > > copyright third parties, protected by privacy laws, just personally
> > > private, security sensitive (eg browser history) and so on.
> >
> > Yes.
>
> We're uninterested in pagecache and user memory and they should be omitted
> from the image (making it enormously smaller too).

The people who would use crash-dumps (big sensitive firms) don't trust
you.

And they'd be right not to trust you. You end up having a _lot_ of
sensitive data even if you avoid user memory and page cache. The network
buffers, the dentries, and just stale data that hasn't been overwritten.

So if you end up having secure data on that machine, you should *never*
send a dump to somebody you don't trust. For the financial companies
(which are practically the only ones that would use dumps) there can even
be legal reasons why they cannot do that!

> That leaves security keys and perhaps filenames, and these could probably
> be addressed.

It leaves almost every single kernel allocation, and no, it cannot be
addressed.

How are you going to clear out the network packets that you have in
memory? They're just kmalloc'ed.

> > I'm sure we've had one or two crashdumps over the years that have actually
> > clarified a bug.
> >
> > But I seriously doubt it is more than a handful.
>
> We've had a few more than that, but all the ones I recall actually came
> from the kdump developers who were hitting other bugs and who just happened
> to know how to drive the thing.

Right, I don't dispute that some _developers_ might use dumping. I dispute
that any user would practically ever use it.

And even for developers, I suspect it's _so_ far down the list of things
you do, that it's practically zero.

> > But 99% of the time, the problem doesn't happen on a developer machine,
> > and even if it does, 90% of the time you really just want the traceback
> > and register info that you get out of an oops.
>
> Often we don't even get that: "I was in X and it didn't hit the logs".

Yes.

> You can learn a hell of a lot by really carefully picking through kernel
> memory with gdb.

.. but you can learn equally much with other methods that do *not* involve
the pain and suffering that is a kernel dump.

Setting up netconsole or the firewire tools is much easier. The firewire
thing in particular is nice, because it doesn't actually rely on the
target having to even know about it (other than enabling the "remote DMA
access" thing once on bootup).
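
As a concrete illustration (the addresses, ports and MAC here are made
up), netconsole really is just one boot/module parameter on the sick
machine plus a UDP listener somewhere else:

	netconsole=4444@10.0.0.1/eth1,9353@10.0.0.2/12:34:56:78:9a:bc

on the target, and something like

	netcat -u -l -p 9353

on the box that collects the oops output.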

If you've ever picked through a kernel dump after-the-fact, I just bet you
could have done equally well with firewire, and it would have had _zero_
impact on your kernel image. Now, contrast that with kdump, and ask
yourself: which one do you think is worth concentrating effort on?

 - kdump: lots of code and maintenance effort, doesn't work if the CPU
   locks up, requires a lot of learning to go through the dump.

 - firewire: zero code, no maintenance effort, works even if the CPU locks
   up. Still does require the same learning to go through the end result.

Which one wins? I know which one I'll push.

		Linus

