From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH 0/5] ftrace: to kill a daemon
Date: Fri, 08 Aug 2008 19:06:04 UTC
Message-ID: <fa.OSkcdau22kgmqddE0pfq+qzCPUo@ifi.uio.no>

On Fri, 8 Aug 2008, Steven Rostedt wrote:
>
> Can a processor be preempted in a middle of nops?

Sure. If you have two nops in a row (and the kernel definition of the NOP
array does _not_ guarantee that it's a single-instruction one), you may
get a profile hit (ie any interrupt) on the second one. It's less
_likely_, but it certainly is not architecturally in any way guaranteed
that the kernel "nop[]" tables would be atomic.
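To make the hazard concrete, here is a minimal Python sketch (not kernel code) that models a 5-byte padding sequence as a list of separately decoded instructions. It reports the interior instruction boundaries: the offsets where an interrupt could leave a saved return address pointing into what would become the middle of a single 5-byte instruction if the padding were later overwritten, for example with a call. The byte sequences are assumptions modelled on older x86 `nops.h` tables and are illustrative only; `NOP5_VARIANTS` and `interior_boundaries` are hypothetical names.

```python
# Hypothetical model: each padding sequence is a list of separately
# decoded instructions, stored as byte tuples. The byte values are
# illustrative, modelled on older x86 nop tables; check your own tree.
NOP5_VARIANTS = {
    # one instruction: nopl 0x0(%rax,%rax,1)
    "p6_single": [(0x0F, 0x1F, 0x44, 0x00, 0x00)],
    # two instructions: a 3-byte and a 2-byte prefixed 0x90
    "k8_split": [(0x66, 0x66, 0x90), (0x66, 0x90)],
    # two instructions: nop, then lea 0x0(%esi,%eiz,1),%esi
    "generic_split": [(0x90,), (0x8D, 0x74, 0x26, 0x00)],
}


def interior_boundaries(instrs, base=0):
    """Return addresses strictly inside the padded region where one
    instruction ends and the next begins. An interrupt taken at such
    a point saves a return address that lands mid-region."""
    total = sum(len(ins) for ins in instrs)
    offsets = []
    pos = base
    for ins in instrs:
        pos += len(ins)
        if pos < base + total:  # skip the end of the whole region
            offsets.append(pos)
    return offsets


if __name__ == "__main__":
    for name, seq in NOP5_VARIANTS.items():
        assert sum(len(ins) for ins in seq) == 5
        print(name, interior_boundaries(seq))
```

Only the single-instruction variant returns an empty list. This model covers interrupted or preempted contexts only; it says nothing about other CPUs fetching the bytes while they are being rewritten, which needs separate handling.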

> What do nops do for a processor?

Depends on the microarchitecture. But many will squash them in the decode
stage, and generate no uops for them at all, so it's purely a decode
throughput issue and has absolutely _zero_ impact for any later CPU
stages.

> Can it skip them nicely in one shot?

See above. It needs to decode them, and the decoder itself may well have
some limitations - for example, the longer nops may not even decode in all
decoders, which is why some uarchs might prefer two short nops to one long
one, but _generally_ a nop will not make it any further than the decoder.
But separate nops do count as separate instructions, ie they will hit all
the normal decode limits (mostly three or four instructions decoded per
cycle).
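A rough back-of-envelope model of that decode cost, under deliberately simple assumptions: each nop takes one decode slot, and the decoders handle at most W instructions per cycle (three or four, as stated above). Then k separate nops consume k slots and need at least ceil(k/W) decode cycles on their own. This ignores per-decoder length limits, fetch-window alignment and uop caches, and `min_decode_cycles` is a hypothetical name, not a model of any real core.

```python
import math


def min_decode_cycles(num_instructions: int, decode_width: int = 4) -> int:
    """Lower bound on decode cycles for a run of separate instructions,
    assuming each consumes exactly one decode slot and nothing else
    limits the decoders. A toy model, not a real pipeline."""
    if decode_width <= 0:
        raise ValueError("decode_width must be positive")
    return math.ceil(num_instructions / decode_width)


# Five bytes of padding expressed three different ways
# (value = number of separate instructions, i.e. decode slots).
PADDINGS = {
    "one 5-byte nop": 1,
    "3-byte + 2-byte nop": 2,
    "five 1-byte 0x90 nops": 5,
}

for label, count in PADDINGS.items():
    for width in (3, 4):
        print(f"{label:>22}: {count} slot(s), >= "
              f"{min_decode_cycles(count, width)} cycle(s) at width {width}")
```

The slot count is the more telling number here: in real code the surrounding instructions compete for the same decode slots, so every extra nop can push a useful instruction into a later cycle.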

> I'm assuming that jmp is more expensive than the nops because otherwise
> a jmp 0 would have been used as a 5-byte nop.

Yes. A CPU core _could_ certainly do special decoding for 'jmp 0' too, but
I don't know any that do. The 'jmp' is much more likely to be special in
the front-end and the decoder, and can easily cause things like the
prefetcher to hiccup (ie it tries to start prefetching at the "new" target
address).

So I would absolutely _expect_ a 'jmp' to be noticeably more expensive
than one of the special nops that can be squashed by the decoder.
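The `jmp 0` in the question is a 5-byte near jump (opcode 0xE9) whose 32-bit displacement is zero. Because the displacement is measured from the end of the instruction, it "jumps" to the very next instruction, so it is architecturally a no-op yet still a real branch for the front-end. The minimal Python sketch below shows that rel32 arithmetic; opcode 0xE8 with the same layout is a near `call`. `encode_rel32` and `rel32_target` are hypothetical helper names, and the example address is arbitrary.

```python
import struct

JMP_REL32 = 0xE9   # x86 near jmp with a 32-bit displacement
CALL_REL32 = 0xE8  # x86 near call with a 32-bit displacement
INSN_LEN = 5       # 1 opcode byte + 4 little-endian displacement bytes


def encode_rel32(opcode: int, addr: int, target: int) -> bytes:
    """Encode a 5-byte rel32 branch located at `addr` that transfers to
    `target`. The displacement is relative to the next instruction."""
    disp = target - (addr + INSN_LEN)
    if not -(2**31) <= disp < 2**31:
        raise ValueError("target out of rel32 range")
    return bytes([opcode]) + struct.pack("<i", disp)


def rel32_target(insn: bytes, addr: int) -> int:
    """Return the branch target of a 5-byte rel32 instruction at `addr`."""
    (disp,) = struct.unpack("<i", insn[1:5])
    return addr + INSN_LEN + disp


addr = 0xFFFF_FFFF_8100_0000  # arbitrary example address
jmp0 = encode_rel32(JMP_REL32, addr, addr + INSN_LEN)
print(jmp0.hex(" "))  # e9 00 00 00 00: falls through to the next insn
```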

A nop that is squashed by the decoder will literally take absolutely no
other resources. It doesn't even need to be tracked from an instruction
completion standpoint (which _may_ end up meaning that a profiler would
actually never see a hit on the second nop, but it's quite likely to
depend on which decoder it hits etc).

			Linus
