64-bit divide (Jamie Lokier; Linus Torvalds)

Index Home About Blog

From: torvalds@transmeta.com (Linus Torvalds)
Newsgroups: linux.dev.kernel
Subject: Re: 64 bit divide/mod in 2.4.0-test5
Date: 4 Aug 00 23:24:44 GMT

In article <d3n1itxely.fsf@lxplus015.cern.ch>,
Jes Sorensen  <jes@linuxcare.com> wrote:
>
>Well you are seeing those because it's deprecated, you should use a
>shift operator instead if in any way possible.

Indeed. Basically, I _know_ that I could include libgcc, but I also do
not believe that people should get code that looks simple but takes
hundreds of cycles to complete.

The fact that 64-bit divisions do not work naturally has actually saved
us a number of times from people just not thinking about what an
expensive operation such a division is - quite often there is a trivial
shift that can do the same thing in two cycles or similar.

There's some special-case code in <asm/div64.h>, which is used mainly
for printk(), but can be used for other stuff too: a 64-by-64 divide is
usually much slower than a 64-by-32 divide which is actually the one
that most people are looking for anyway (but gcc isn't clever enough to
know this). That can be used if you _really_ think you need the divide.
I've yet to see a real reason for it outside the printk() stuff..

		Linus

From: torvalds@transmeta.com (Linus Torvalds)
Newsgroups: linux.dev.kernel
Subject: Re: 64 bit divide/mod in 2.4.0-test5
Date: 6 Aug 00 05:40:04 GMT

On Sun, 6 Aug 2000, Jamie Lokier wrote:
>
> You're aware of the algorithm for printing digits that doesn't use
> any divisions, and decided not to use it, right?

I'm aware of at least one such algorithm, yes. However, that one is
designed for a single base (or a small number of bases) and basically just
creates a nice table in memory of the powers-of-base or uses pre-computed
tables.

That one only uses simple subtraction to generate the numbers, but would
have been a much bigger change to existing code. The differences for each
base gets moderately ugly too.

Is that the one you're talking about? Basically

	i = MAX_DIGITS;
	do {
		unsigned long long thispower = power_table[--i];
		char c = '0';
		while (thispower < num) {
			num -= thispower;
			c++;
		}
		output(c);
	} while (i);

or is there something more clever that can handle arbitrary bases cleanly
(not that we actually use arbitraty bases: the basic vsprintf routines
could handle them, but we limit the bases to the normal 8, 10 and 16
anyway - and as 8 and 16 can be handled with shifting only one really
requires any real computation)?

		Linus

From: lk@tantalophile.demon.co.uk (Jamie Lokier)
Newsgroups: linux.dev.kernel
Subject: Re: 64 bit divide/mod in 2.4.0-test5
Date: 6 Aug 00 19:57:16 GMT

Linus Torvalds wrote:
> I'm aware of at least one such algorithm, yes. However, that one is
> designed for a single base (or a small number of bases) and basically just
> creates a nice table in memory of the powers-of-base or uses pre-computed
> tables.
> [...]
> Is that the one you're talking about?

Yes, except instead of precomputing, you can generate power_table each
time by doing successive multiplications with the power.  You only need
to generate as many entries as digits in the current value.  The
multiplications are trivial if it's a constant power.

Of course 10 is the only significant non-power-of-two base anyway.

I've never tried that algorithm but I would guess it's faster than long
division.

-- Jamie

From: torvalds@transmeta.com (Linus Torvalds)
Newsgroups: linux.dev.kernel
Subject: Re: 64 bit divide/mod in 2.4.0-test5
Date: 6 Aug 00 20:04:46 GMT

On Sun, 6 Aug 2000, Jamie Lokier wrote:
>
> I've never tried that algorithm but I would guess it's faster than long
> division.

Quite possible. I would not be adverse to doing this: the do_div() thing
is there mainly because it was much easier to move the existing code-base
over, and the original vsprintf used division.

If somebody wants to write the code...

		Linus

From: torvalds@transmeta.com (Linus Torvalds)
Newsgroups: linux.dev.kernel
Subject: Re: 64 bit divide/mod in 2.4.0-test5
Date: 5 Aug 00 00:51:17 GMT

In article <8mflv4$lit$1@cesium.transmeta.com>,
H. Peter Anvin <hpa@zytor.com> wrote:
>
>There is also big difference on some architectures between 64/32 = 64
>and 64/32 = 32 (the latter is a single, albeit slow, instruction on
>x86.)  I'm rather surprised gcc doesn't handle this -- it seems as an
>obvious optimization.
>
>Could we get the gcc people to look at this?

Probably not.

Doing 64/32->32 thing as a single instruction on x86 is an illegal
optimization: if your source code looks like

	unsigned long a, b;
	unsigned long long c;

	a = c / b;

then according to the C language the c/b division is actually a
64/promoted64 bit division, and then the 64-bit result casted down to 32
bits.

What's the difference? Overflow. The cast from 64 bits to 32 bits cannot
overflow, so gcc would have to basically always do the 64/32->64 bit
division, and then just silently drop the extra bits in order to get the
right end result.

Now, that 64/32->64 bit division is fairly simple on x86 (certainly
simpler than the 64/64->64 one), and that's actually what do_div() will
use on x86.  But it's two "div" instructions, not one (the first one can
often be optimized away if the upper 32 bits are usually 0, which is the
case 99%+ of the time with most regular distributions of numbers).

Note that the 64/32->64 division is a _lot_ simpler than the 64/64->64
one, so this is definitely still worth doing. But it's not a single
instruction.

		Linus

Index Home About Blog