From: Terje Mathisen <terje.mathisen@hda.hydro.com>
Newsgroups: comp.arch
Subject: Re: Floating point required exponent range?
Date: Fri, 06 Feb 2004 13:24:35 +0100
Message-ID: <c00124$4gb$1@osl016lin.hda.hydro.com>

Nick Maclaren wrote:
> In order to do a fixup, you need the following as well:
>
>     The ability to return a specific, changed result value.
>
>     All operations that depend on the result value must be suspended
>     before the interrupt is taken and restarted afterwards, using
>     the supplied value.
>
> Now, if you think about it, the aspects that are seriously hard to
> do in an out-of-order, pipelined environment are the last two.  The

Pipelining makes it harder, but OoO should simplify things a lot:

An OoO cpu must contain all the hw needed to stop, flush and restart
anything that hasn't actually been committed (written back to memory),
just so it can recover from stuff like branch predictor misses, right?

Extending this to doing the same for fp problems really doesn't seem
like a huge step to me, so why isn't it done?

Terje

--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"


From: Terje Mathisen <terje.mathisen@hda.hydro.com>
Newsgroups: comp.arch
Subject: Re: Floating point required exponent range?
Date: Fri, 06 Feb 2004 21:10:59 +0100
Message-ID: <c00scj$lcg$1@osl016lin.hda.hydro.com>

Jan C. Vorbrüggen wrote:

>>An OoO cpu must contain all the hw needed to stop, flush and restart
>>anything that hasn't actually been committed (written back to memory),
>>just so it can recover from stuff like branch predictor misses, right?
>>
>>Extending this to doing the same for fp problems really doesn't seem
>>like a huge step to me, so why isn't it done?
>
>
> Because the fp operations are long latency, and much longer than the
> non-fp ones. If you are going to support precise fp exceptions, that
> will seriously shorten your OoO instruction window, because you cannot
> commit later instructions before the oldest fp instruction commits
> or has at least progressed so far that you can decide it will not generate
> an exception. You can short-cut that point for multiply and divide, but

If an x86 with just 8 registers can maintain (or require!) 80 or more
renamed versions of those, just so that it can keep going for up to
20-40 cycles past a branch that still hasn't been finally resolved,
then handling 3-5 cycles of fp latency should be easy.

On the latest P4, every L1 cache _hit_ takes 4 cycles, which is almost
exactly the same as most fp operations.

Why should handling one of them be a 'gimme', and the other one 'way too
expensive'?

> not for add and subtract. In addition, in an OoO machine it becomes quite
> complicated to save the content of the input registers to such operations,
> as they might have already been recycled (the physical registers that
> renamed these particular versions of the architectural registers, of course).

Why?

See above, fp (except for FDIV/FSQRT) has comparable latency to _fast_
memory operations.

> But you want those values so that you can understand what was going on when
> the exception occurred. The third mode for the Alpha had the compiler assure
> that the input registers weren't reused in a sufficiently long shadow of
> the fp operations, so that exception software was guaranteed to be able to
> pick up the input values safely. (Well, at least that is how I understand
> it should work.)

That does sound doable if you don't want to include full hw rollback
capability.

Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"


From: old_systems_guy@yahoo.com (John Mashey)
Newsgroups: comp.arch
Subject: Re: Floating point required exponent range?
Date: 7 Feb 2004 19:59:42 -0800
Message-ID: <ce9d692b.0402071959.3811acb1@posting.google.com>

Terje Mathisen <terje.mathisen@hda.hydro.com> wrote in message news:<c00124$4gb$1@osl016lin.hda.hydro.com>...
> Nick Maclaren wrote:
> > In order to do a fixup, you need the following as well:
> >
> >     The ability to return a specific, changed result value.
> >
> >     All operations that depend on the result value must be suspended
> >     before the interrupt is taken and restarted afterwards, using
> >     the supplied value.
> >
> > Now, if you think about it, the aspects that are seriously hard to
> > do in an out-of-order, pipelined environment are the last two.  The
>
> Pipelining makes it harder, but OoO should simplify things a lot:
>
> An OoO cpu must contain all the hw needed to stop, flush and restart
> anything that hasn't actually been committed (written back to memory),
> just so it can recover from stuff like branch predictor misses, right?
>
> Extending this to doing the same for fp problems really doesn't seem
> like a huge step to me, so why isn't it done?

[Agreeing with Terje].

Imprecise (and other) exceptions go back at least to the 360/91 and
360/67.

Consider 3 kinds of pipelines:
(1) In-order-execute, in-order-completion: exceptions are naturally precise,
so there is no trouble, but of course, FP performance will be slow.

(2) Speculative O-O-O CPUs need not have a problem, and at least some
don't, i.e., some do support precise exceptions.  This is not that hard,
because:

(a)  When instructions are actually executed, they can never permanently
     record any change of state, but rather they record:
     - value of result register(s)
     - value of condition codes, flags, etc.
     - an exception flag & type

(b) When branches are resolved, instructions are committed/retired *in order*,
    and only then do interrupts really happen, and they have no trouble
    being precise.  After all, it is not a good idea for speculative CPUs
    to generate exceptions on instructions that only got scheduled due to
    mispredicted branches.

(3) Imprecise exceptions are really only a problem for in-order-issue,
out-of-order-completion CPUs [which is why they showed up in early Alphas,
Power chips, etc.]
Of course, there is a fairly cheap technique [used in MIPS R20x0,
R30x0, R4000] for avoiding imprecise FP exceptions.  See Craig Hansen's
US Patent 4,879,676,
"Method and apparatus for precise floating point exceptions"
[which is actually fairly clear, unlike many patents. :-)]

Briefly, when an FP op is issued, it does a quick check of the operands
and stalls the pipeline if an exception appears *possible*.
Using fewer (or more) bits of the exponents makes
this more (or less) conservative.  In practice, it is quite feasible to
keep the number of overly-conservative stalls unnoticeable.
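
[Editor's illustration, not from the patent or the original post.] The
issue-time check described above might be sketched roughly as follows for an
IEEE double-precision multiply. The function names, the `kept_bits` parameter,
and the exact comparison logic are assumptions made for this example; the real
MIPS hardware logic may well differ. The idea is only to show that looking at
the exponents alone gives a safe "might trap" answer, and that dropping
low-order exponent bits widens the set of operands that stall.

```python
import struct

EXP_BITS = 11      # exponent field width of an IEEE 754 double
BIAS = 1023        # exponent bias of an IEEE 754 double
MAX_NORMAL = 2046  # largest biased exponent of a finite normal number


def biased_exponent(x: float) -> int:
    """Return the raw 11-bit biased exponent field of a double."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return (bits >> 52) & 0x7FF


def mul_may_trap(a: float, b: float, kept_bits: int = EXP_BITS) -> bool:
    """Conservative issue-time check (hypothetical): True means 'stall'.

    Only the top `kept_bits` bits of each exponent are examined, so each
    exponent is known only to lie in an interval [lo, hi].
    """
    ea, eb = biased_exponent(a), biased_exponent(b)
    # Zeros, subnormals, infinities and NaNs: just stall (assumed policy).
    if ea in (0, 0x7FF) or eb in (0, 0x7FF):
        return True
    drop = EXP_BITS - kept_bits
    mask = (1 << drop) - 1
    lo_a, lo_b = ea & ~mask, eb & ~mask
    hi_a, hi_b = ea | mask, eb | mask
    # For normal operands the biased result exponent is ea + eb - BIAS,
    # or one more if the significand product reaches 2.0.
    lowest = lo_a + lo_b - BIAS
    highest = hi_a + hi_b - BIAS + 1
    return lowest < 1 or highest > MAX_NORMAL
```

With all 11 exponent bits, `mul_may_trap(1e150, 1e150)` lets the multiply
proceed (the product, 1e300, is representable), while with `kept_bits=4` the
same operands stall needlessly. That is the more/less conservative trade-off
in miniature.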

See Hennessy & Patterson, 3rd Edition, chapter 3.7, also "exceptions" in
Appendix H (i.e., only online at www.mkp.com/CA3).
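
[Editor's illustration, not from the original post.] Points (2)(a) and (2)(b)
above, where execution only records a result plus an exception flag and the
exception is taken at in-order retirement, can be sketched as a toy reorder
buffer. The class and method names (`ReorderBuffer`, `issue`, `complete`,
`retire`) are hypothetical, and this is a minimal model under those
assumptions, not a description of any particular CPU.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    dest: str                        # architectural destination register
    value: Optional[float] = None    # speculative result, not yet visible
    exception: Optional[str] = None  # recorded at execute, raised at retire
    done: bool = False


class ReorderBuffer:
    def __init__(self) -> None:
        self.entries: deque = deque()  # program order, oldest at the left
        self.arch_regs: dict = {}      # committed (architectural) state

    def issue(self, dest: str) -> RobEntry:
        entry = RobEntry(dest)
        self.entries.append(entry)
        return entry

    def complete(self, entry: RobEntry, value: Optional[float] = None,
                 exception: Optional[str] = None) -> None:
        # Execution may finish in any order; nothing is committed here.
        entry.value, entry.exception, entry.done = value, exception, True

    def retire(self) -> Optional[RobEntry]:
        """Commit finished entries in program order.

        Returns the first entry carrying an exception (after squashing all
        younger entries), or None if no exception reached the head.
        """
        while self.entries and self.entries[0].done:
            head = self.entries.popleft()
            if head.exception is not None:
                self.entries.clear()   # flush younger, uncommitted work
                return head
            self.arch_regs[head.dest] = head.value
        return None
```

Because a faulting entry is only acted on when it reaches the head, every
older instruction has already committed and no younger one has, which is
what makes the exception precise in this model.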
