From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: IA64 integer performance - was Re: The Forrest Curve (annual
Date: Thu, 28 Nov 2002 02:11:22 GMT
Message-ID: <20021127.211122.630@yktvmv.WATSON.IBM.COM>

In article <45022fc8.0211201907.72caccee@posting.google.com>,
 on 20 Nov 2002 19:07:27 -0800,
 iain-3@truecircuits.com (Iain McClatchie) writes:
>Michael> how many more GP registers (pointers) and FP
>Michael> registers (data) do you need to properly pipeline FP calculations ?
>
>This is a very interesting question!
>
>Michael> Let's assume a machine has four parallel FPUs. So we need
>Michael> registers for 20 parallel calculations.
>
>Okay.  Now imagine that you're blocking a matrix multiply.  For each inner
>block, you'll need to do a 4x5 matrix multiply in the register file.
>That's 20 registers for the output, 20 for each input, for 60.  So far,
>less than 64.

         This is not how to do it.  You should do 4xn by nx5 matrix
multiplies.  You need 20 registers to accumulate the inner products,
5 for the current row (in the nx5 matrix) and 1 for the current
column element (in the 4xn matrix).  That is 26 in all, which fits
in 32 registers.
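
A minimal sketch in C of the register blocking described above may make the
count concrete.  The function name kernel_4x5 and the row-major layout are
assumptions made for illustration, not anything from the original exchange,
and whether the 26 live values actually stay in registers is up to the
compiler and the target.

```c
#include <stddef.h>

/*
 * Hypothetical micro-kernel: C (4x5) += A (4xn) * B (nx5).
 * All three matrices are assumed to be stored row-major.
 * Live floating-point values in the inner loop:
 *   20 accumulators + 5 elements of the current row of B
 *   + 1 element of the current column of A = 26.
 */
void kernel_4x5(size_t n, const double *A, const double *B, double *C)
{
    double acc[4][5] = {{0.0}};          /* 20 inner-product accumulators */

    for (size_t k = 0; k < n; k++) {
        /* 5 values: row k of the nx5 matrix */
        double b0 = B[k * 5 + 0];
        double b1 = B[k * 5 + 1];
        double b2 = B[k * 5 + 2];
        double b3 = B[k * 5 + 3];
        double b4 = B[k * 5 + 4];

        for (int i = 0; i < 4; i++) {
            /* 1 value: element i of column k of the 4xn matrix */
            double a = A[(size_t)i * n + k];

            acc[i][0] += a * b0;
            acc[i][1] += a * b1;
            acc[i][2] += a * b2;
            acc[i][3] += a * b3;
            acc[i][4] += a * b4;
        }
    }

    /* Add the accumulated block into C once, after the k loop. */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 5; j++)
            C[i * 5 + j] += acc[i][j];
}
```

Each pass through the k loop is a rank-1 update of the 4x5 block: 9 loads
feed 20 multiply-adds, and neither input matrix is ever held in registers
as a whole.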
                        James B. Shearer
PS: Repeat of post which my news server appears to have lost.
