Timer wrapping-around in C (Johannes Stezenbach; Linus Torvalds)

Index Home About Blog

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] CFS scheduler, -v8
Date: Sat, 05 May 2007 17:46:11 UTC
Message-ID: <fa.1osJe/cWYY5GNYi1E4hit+UIV3c@ifi.uio.no>

On Sat, 5 May 2007, Esben Nielsen wrote:
>
> I have been wondering why you use usigned for timers anyway. It is also like
> that in hrtimers. Why not use signed and avoid (almost) all worries about wrap
> around issues. The trick is that when all
>   a < b
> is be replaced by
>   a - b < 0
> the code will work on all 2-complement machines even if the (signed!) integers
> a and b wrap around.

No. BOTH of the above are buggy.

The C language definition doesn't allow signed integers to wrap (ie it's
undefined behaviour), so "a-b < 0" can be rewritten by the compiler as a
simple signed "a < b".

And the unsigned (or signed) "a < b" is just broken wrt any kind of
wrap-around (whether wrapping around zero or the sign bit).

So the _only_ valid way to handle timers is to
 - either not allow wrapping at all (in which case "unsigned" is better,
   since it is bigger)
 - or use wrapping explicitly, and use unsigned arithmetic (which is
   well-defined in C) and do something like "(long)(a-b) > 0".

Notice? The signed variant is basically _never_ correct.

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] CFS scheduler, -v8
Date: Sun, 06 May 2007 17:46:38 UTC
Message-ID: <fa.mukcUBZHXqDb03J9/AnFeyYkO6o@ifi.uio.no>

On Sun, 6 May 2007, Ingo Molnar wrote:
>
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> > So the _only_ valid way to handle timers is to
> >  - either not allow wrapping at all (in which case "unsigned" is better,
> >    since it is bigger)
> >  - or use wrapping explicitly, and use unsigned arithmetic (which is
> >    well-defined in C) and do something like "(long)(a-b) > 0".
>
> hm, there is a corner-case in CFS where a fix like this is necessary.
>
> CFS uses 64-bit values for almost everything, and the majority of values
> are of 'relative' nature with no danger of overflow. (They are signed
> because they are relative values that center around zero and can be
> negative or positive.)

Well, I'd like to just worry about that for a while.

You say there is "no danger of overflow", and I mostly agree that once
we're talking about 64-bit values, the overflow issue simply doesn't
exist, and furthermore the difference between 63 and 64 bits is not really
relevant, so there's no major reason to actively avoid signed entries.

So in that sense, it all sounds perfectly sane. And I'm definitely not
sure your "292 years after bootup" worry is really worth even considering.

When we're really so well off that we expect the hardware and software
stack to be stable over a hundred years, I'd start to think about issues
like that, in the meantime, to me worrying about those kinds of issues
just means that you're worrying about the wrong things.

BUT.

There's a fundamental reason relative timestamps are difficult and almost
always have overflow issues: the "long long in the future" case as an
approximation of "infinite timeout" is almost always relevant.

So rather than worry about the system staying up 292 years, I'd worry
about whether people pass in big numbers (like some MAX_S64 approximation)
as an approximation for "infinite", and once you have things like that,
the "64 bits never overflows" argument is totally bogus.

There's a damn good reason for using only *absolute* time. The whole
"signed values of relative time" may _sound_ good, but it really sucks in
subtle and horrible ways!

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] CFS scheduler, -v8
Date: Mon, 07 May 2007 16:31:11 UTC
Message-ID: <fa.77dk21H7rTR/Xgrn6p5vqly3Q6w@ifi.uio.no>

On Mon, 7 May 2007, Esben Nielsen wrote:
>
> What is (long)(a-b) ? I have tried to look it up in the C99 standeard but I
> can't find it. Maybe it is in the referred LIA-1 standeard, which I can't find
> with google.

I don't worry about non-2's-complement machines (they don't exist, and
likely won't exist in the future either).

So I worry about compilers rewriting my code.

So "(long)(a-b) < 0" (with "a" and "b" being unsigned long) is basically a
portable way of testing the high bit of the result.

> I think the best would be to use "a-b > ULONG_MAX/2" when you mean "a<b" as
> that should be completely portable.

That certainly works too, but the difference is irrelevant, since Linux is
unlikely to work on insane machines anyway (ie we do make a lot of other
assumptions about the architecture, being two's-complement is the least of
those).

So you basically shouldn't worry about hardware: everybody is pretty much
the same. You should worry about *compilers* - that's where the
differences show up.

So "(long)(a-b)" may be "implementation defined" (but since
implementations are all 2's complement, we don't care), but a signed
"(a-b)" that over/overflows is *undefined*, and that is much worse because
it means that the compiler can do some funky stuff, and _that_ is a real
practical worry.

And no, I also don't worry about porting Linux to 18-bit machines, or to
ternary CPU's.

			Linus

From: Johannes Stezenbach <js@linuxtv.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] CFS scheduler, -v8
Date: Mon, 07 May 2007 18:39:48 UTC
Message-ID: <fa.+k2erSwbHTdqnfXDd8PzDjW6n5U@ifi.uio.no>

On Mon, May 07, 2007, Linus Torvalds wrote:
> On Mon, 7 May 2007, Esben Nielsen wrote:
> >
> > What is (long)(a-b) ? I have tried to look it up in the C99 standeard but I
> > can't find it. Maybe it is in the referred LIA-1 standeard, which I can't find
> > with google.

C99 defines unsigned overflow semantics, but it doesn't say anything
about signed overflow, thus it's undefined -- and you have a hard
time finding it out.

However, I have no clue *why* it's undefined and not
implementation defined. Does someone know?

> I don't worry about non-2's-complement machines (they don't exist, and
> likely won't exist in the future either).

I think DSPs can do saturated arithmetics (clamp to min/max
values instead of wrap around). Not that it matters for Linux...

> So I worry about compilers rewriting my code.

gcc has -fwrapv and -ftrapv to change signed integer overflow
behaviour.

One baffling example where gcc rewrites code is when
conditionals depend on signed integer overflow:

$ cat xx.c
#include <assert.h>

int foo(int a)
{
	assert(a + 100 > a);
	return a;
}

int bar(int a)
{
	if (a + 100 > a)
		a += 100;
	return a;
}
$ gcc -Wall -Wextra -fomit-frame-pointer -c xx.c
$ objdump -dr xx.o

xx.o:     file format elf32-i386

Disassembly of section .text:

00000000 <foo>:
   0:   8b 44 24 04             mov    0x4(%esp),%eax
   4:   c3                      ret

00000005 <bar>:
   5:   83 44 24 04 64          addl   $0x64,0x4(%esp)
   a:   8b 44 24 04             mov    0x4(%esp),%eax
   e:   c3                      ret

The assert and the condition were just dropped
by gcc -- without any warning.

gcc-4.2 will add -fstrict-overflow and -Wstrict-overflow.
http://gcc.gnu.org/gcc-4.2/changes.html

Johannes

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] CFS scheduler, -v8
Date: Mon, 07 May 2007 18:57:52 UTC
Message-ID: <fa.Qcdex90nX406A32cFskNg/nDRK4@ifi.uio.no>

On Mon, 7 May 2007, Johannes Stezenbach wrote:
>
> One baffling example where gcc rewrites code is when
> conditionals depend on signed integer overflow:

Yes. This is one of my favourite beefs with gcc. Some of the optimization
decisions seem to make no sense.

Your example is a good one, but my private beef has been in alias
handling. Alias analysis is an important part of optimization, and there's
two kinds: the static (and exact, aka "safe") kind that you can do
regardless of any language definitions, because you *know* that you aren't
actually changing behaviour, and the additional type-based heuristics that
the C language allows.

So which ones would you expect a compiler to consider more important?

And which one do you think gcc will use?

Right. You can have static analysis that *guarantees* that two objects
alias, but if gcc determines that they have different types and thus might
not alias, it decides to use the heuristic instead of the firm knowledge,
and generate code that doesn't work.

"Because the language definition allows it".

Oh well.

		Linus

Index Home About Blog