Compiler barriers (Linus Torvalds)


From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH 0/24] make atomic_read() behave consistently across all
Date: Tue, 21 Aug 2007 05:50:49 UTC
Message-ID: <fa.I8x1v8BCYtKSAvPrCIyhoodvMX0@ifi.uio.no>

On Mon, 20 Aug 2007, Chris Snook wrote:
>
> What about barrier removal?  With consistent semantics we could optimize a
> fair amount of code.  Whether or not that constitutes "premature" optimization
> is open to debate, but there's no question we could reduce our register wiping
> in some places.

Why do people think that barriers are expensive? They really aren't.
Especially the regular compiler barrier is basically zero cost. Any
reasonable compiler will just flush the stuff it holds in registers that
isn't already automatic local variables, and for regular kernel code, that
tends to basically be nothing at all.

Ie a "barrier()" is likely _cheaper_ than the code generation downside
from using "volatile".
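A minimal sketch of that comparison, assuming gcc-style inline asm. The barrier() definition is the usual empty asm with a "memory" clobber; the names threshold, threshold_v, scale_with_barrier() and scale_with_volatile() are invented for illustration and do not come from the kernel.

```c
#include <assert.h>

/* Compiler-only barrier: an empty asm statement with a "memory" clobber.
 * It emits no instructions of its own. */
#define barrier() __asm__ __volatile__("" ::: "memory")

int threshold;			/* plain global (hypothetical) */
volatile int threshold_v;	/* same kind of data, declared volatile */

/* After the single barrier the compiler has to load 'threshold' from
 * memory, but it is free to load it once and reuse the value for all
 * three uses below. */
int scale_with_barrier(int x)
{
	barrier();
	return x * threshold + threshold + threshold;
}

/* Every mention of 'threshold_v' has to be a separate load, whether or
 * not the caller needed that. */
int scale_with_volatile(int x)
{
	return x * threshold_v + threshold_v + threshold_v;
}
```

Comparing the output of "gcc -O2 -S" for the two functions is one way to see the difference; the exact instructions will depend on compiler version and architecture.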

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH 0/24] make atomic_read() behave consistently across all
Date: Tue, 21 Aug 2007 16:52:11 UTC
Message-ID: <fa.pnnfO2qJDYbEXfFExJ1tl+4zlnU@ifi.uio.no>

On Tue, 21 Aug 2007, Chris Snook wrote:
>
> Moore's law is definitely working against us here.  Register counts, pipeline
> depths, core counts, and clock multipliers are all increasing in the long run.
> At some point in the future, barrier() will be universally regarded as a
> hammer too big for most purposes.

Note that "barrier()" is purely a compiler barrier. It has zero impact on
the CPU pipeline itself, and also has zero impact on anything that gcc
knows isn't visible in memory (ie local variables that don't have their
address taken), so barrier() really is pretty cheap.
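The distinction between memory-visible data and plain locals can be sketched as follows, again assuming gcc-style inline asm. The global hits and the function count_hits() are hypothetical names used only for this illustration.

```c
#include <assert.h>

#define barrier() __asm__ __volatile__("" ::: "memory")

int hits;	/* hypothetical global: lives in memory, other code may see it */

int count_hits(int n)
{
	int i;
	int local_sum = 0;	/* address never taken: may stay in a register */

	for (i = 0; i < n; i++) {
		hits++;		/* the store has to be done before the barrier */
		barrier();	/* 'hits' has to be reloaded after this point */
		local_sum += i;	/* the barrier places no constraint on this */
	}
	return local_sum;
}
```

Only the accesses to hits are pinned down by the clobber; i and local_sum are not visible in memory, so the compiler can keep treating them as it would without the barrier.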

Now, it's possible that gcc messes up in some circumstances, and that the
memory clobber will cause gcc to also do things like flush local registers
unnecessarily to their stack slots, but quite frankly, if that happens,
it's a gcc problem, and I also have to say that I've not seen that myself.

So in a very real sense, "barrier()" will just make sure that there is a
stronger sequence point for the compiler where things are stable. In most
cases it has absolutely zero performance impact - apart from the
-intended- impact of making sure that the compiler doesn't re-order or
cache stuff around it.
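As a rough sketch of that intended impact, assuming gcc-style inline asm: the names payload, ready, publish() and wait_and_read() are made up for the example. It only constrains the compiler; ordering between CPUs would still need the real memory barriers.

```c
#include <assert.h>

#define barrier() __asm__ __volatile__("" ::: "memory")

int payload;	/* hypothetical data */
int ready;	/* hypothetical flag */

void publish(int value)
{
	payload = value;
	barrier();	/* compiler may not move either store across this */
	ready = 1;
}

int wait_and_read(void)
{
	while (!ready)
		barrier();	/* forces 'ready' to be re-read on each pass */
	barrier();		/* 'payload' is loaded after the flag test */
	return payload;
}
```

Without the barrier in the loop, the compiler would be entitled to read ready once and spin on the cached value.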

And sure, we could make it more finegrained, and also introduce a
per-variable barrier, but the fact is, people _already_ have problems with
thinking about these kinds of things, and adding new abstraction issues
with subtle semantics is the last thing we want.

So I really think you'd want to show a real example of real code that
actually gets noticeably slower or bigger.

In removing "volatile", we have shown that. It may not have made a big
difference on powerpc, but it makes a real difference on x86 - and more
importantly, it removes something whose workings people clearly don't
understand, and which they incorrectly expect to just fix bugs.

[ There are *other* barriers - the ones that actually add memory barriers
  to the CPU - that really can be quite expensive. The good news is that
  the expense is going down rather than up: both Intel and AMD are not
  only removing the need for some of them (ie "smp_rmb()" will become a
  compiler-only barrier), but we're _also_ seeing the whole "pipeline
  flush" approach go away, and be replaced by the CPU itself actually
  being better - so even the actual CPU pipeline barriers are getting
  cheaper, not more expensive. ]

For example, did anybody even _test_ how expensive "barrier()" is? Just
as a lark, I did

	#undef barrier
	#define barrier() do { } while (0)

in kernel/sched.c (which only has three of them in it, but hey, that's
more than most files), and there were _zero_ code generation downsides.
One instruction was moved (and a few line numbers changed), so it wasn't
like the assembly language was identical, but the point is, barrier()
simply doesn't have the same kinds of downsides that "volatile" has.
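The same kind of experiment can be tried on a small stand-alone file. This is only a sketch, assuming gcc: NO_BARRIER is a made-up macro name, and runnable and pick_next() are hypothetical stand-ins, not code from kernel/sched.c.

```c
#include <assert.h>

#ifdef NO_BARRIER
#define barrier() do { } while (0)	/* barrier compiled away */
#else
#define barrier() __asm__ __volatile__("" ::: "memory")
#endif

int runnable;	/* hypothetical global */

int pick_next(int prev)
{
	int next = prev + 1;	/* local, unaffected by the barrier */

	runnable = next;
	barrier();		/* with the clobber, 'runnable' is re-read below */
	return runnable + next;
}
```

Building it twice with "gcc -O2 -S", once with -DNO_BARRIER and once without, and diffing the two assembly files shows what the barrier costs for that particular function. The result depends on the compiler and the architecture, so it says nothing general about other files.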

(That may not be true on other architectures or in other source files, of
course. This *does* depend on code generation details. But anybody who
thinks that "barrier()" is fundamentally expensive is simply incorrect. It
is *fundamentally* a no-op).

		Linus
