Index Home About Blog
Newsgroups: comp.arch
From: mash@mash.wpd.sgi.com (John R. Mashey)
Subject: Re: Why isn't hardware designed to run software? [A: do it in s/w]
Date: Wed, 29 Dec 1993 22:42:21 GMT

In article <rfgCIqGAu.AGv@netcom.com>, rfg@netcom.com
	(Ronald F. Guilmette) writes:

> (Note that I have NOT mentioned Cray.  One poster here mentioned how much
> money would be involved in providing 12.5% more memory on a Cray.  My only
> response is ``Frankly my dear, I don't give a damn.''  I am not likely to
> own one of *those* beaties in the near future anyway, so the whole subject
> seems about as relevant as the fact that I would suffocate if I were suddenly
> transported to the surface of the moon.  That isn't bloody likely to happen
> anytime soon either.)
>
> I think the bottom line is that the kind of feature I'm speaking of is not
> really all that different from the watchpoint registers available on x86
> machines.  These registers provide a fast, hardware-based solution for a
> problem that could (in theory) be done in software.  But as anyone who
> has ever had need of these watchpoint registers can testify, the hardware
> solution definitely has its advantages... and can be had for little cost,
> either in terms of $$$'s or in terms of effects upon overall performance.
> 
> In short, as I said originally, I think its well past time for hardware
> designers to start taking the real-life problems of software developers
> a bit more seriously.  You hardware guys may not get any more dhrystones
> or SPECmarks out of the deal, but you may win some additional converts
> amoung us software people all the same.  (And please don't forget who it
> is that *really* makes your systems attractive to end-users, ok?)

1) This request arises again and again, but *really* one can do a lot
of it [uninitialized variable-checking & subscript-range-checking,
or the related ADA constraint-checks] in software.  [I looked at this
20 years ago in detail and even then, one could do it].  WATFIV often
found bugs in "production" code, to people's great consternation. :-)

2) For some reason, people seem to assume that hardware people are unreceptive
to such thing for no good reason:
	a) We (MIPS folks, H/W + S/W) have looked into debugging/checking
	tag features for almost every chip we've ever done, and we've *never*
	been able to figure out features that were truly useful that didn't
	have severe performance hits/implementation problems of one sort
	or another, despite having among the design teams software people
	that are passionate advocates of such things.  It always came down
	to hardware folks being perfectly willing to:
		a) Tell us (sw folks) the cost of an implementation
		b) If we could tell them *what* we wanted, and that we were
		   sure that what we wanted was worth the price.
		c) And us never being able to design something that we thought
		   was really good enough to be worth it.
		d) Of course, this might just have been lack of imagination
		   on our part, but it might also be because it's hard.
		   It is possible that highly-super-scalar-issue chips may
		   do better, and we'll look at this again for post-T5 chips.
		e) Personally, I'm still convinced there is *some*
		   truly general mechanism that survives high-performance
		   implementations and helps error-checking+tagged,
		   but that we still haven't figured out the right thing,
		   despite conversations with truly world-class folks.
		   Having seen past attempts in hardware turn out not to be
		   quite right, we have to be *sure* we've got
		   something useful.
	b) People have used bad-parity bits, years ago for such purposes ...
	   but not in any modern designs that I know of, and especially not
	   in byte-addressed machines / cached machines.  In particular,
	   the load-from-cache path is *often* in the critical path,
	   and people are often loath to do anything else with it,
	   although the cost varies from design to design.
	c) While the cost of CRAY memory may not be relevant to most people,
	   DRAM still costs money, but even worse, there is a chicken and
	   egg problem due to the cost advantages of standard SIMMs, i.e.,
	   no one would want to use an extra bit/byte, until there are SIMMs
	   that size, and no one will make them until there is a market :-)
	d) Things like watch registers are *really* useful ... but sometimes,
	   software requirements make them less useful than you'd like,
	   i.e., if a debugger lets you set an arbitrary number of
	   watch points, it's hard to do that in hardware :-)  
	
3) In any case, the *ideal* thing is a compiler option that generates
the checks ... but optimizes them well; such "constraint-check elimination"
is found in some ADA compilers, for example. 
In modern optimizers, there is plenty of information around to make many
checks disappear; it would really be nice to see *someone* work hard on'
this problem, using current compilers, and see what the actual
run-time cost would be.  If I had to guess, it's maybe 5-10% for many
kinds of applications, at least for the checking, assuming that there is
not too much overhead for initializing data to "uninitialized" like
WATFIV did.  A general scheme for uninitialized-reference detection could be:

a) For each reference to each variable, maintain a bit that
indicates "known-to-be-defined", computed by:
	a) Any assignment to that variable sets "defined"
	b) Any code in which all paths to that usage are marked defined,
	   means that it is defined.
	c) Any code in which one is not sure that it is defined require
	   generating a check, and then considering it defined.
b) For the entire code, if a check is ever generated for a specific
variable, then generate prolog code to initialize it to the "uninitialized"
value, else one doesn't need this code.  On most machines, floating
point hardware supports NaNs or equivalent, so no code need ever
be generated to do checks, just (possibly) to do the initializations.

c) Start with minimalist optimization, then improve it until it gets
"good enough".

Let's try a few examples:
	C code:
	a) Assume that any function can *assume* that any values passed to it
	   are defined.  [Why is this a good idea?  because quite often the
	   caller knows something is defined, and they might even be constants;
	   why should hardware take any kind of hit whatsoever to check things
	   that are known to be constants].
	b) Any variable initialized in its definition need *never* be checked;
	   const variables (I think) need not be checked.
	c) In a lot of cases, the same optimizations that do value-propagation
	   or code-motion can help you eliminate checks:

	func1(x) {
		int y;
		if (x)
			y = x+1;
		else
			y = -1;
		if (y > 5)
			....
	Needs no undefined-variable checks whatsoever; a good optimizer likely
	knows that y has been set for sure.

	func2(xp) int *xp;
		{

		if (*xp)
			*xp = *xp + 1;
		...

	That needs *one* check, no matter how many times *xp is
	referenced, assuming xp doesn't change [and compilers already
	look hard at pointers].  I.e., thoughout the entire code body
	of func2, the compiler *knows* that the item referenced by *xp
	was checked on the first access, and either it faulted then, or
	not, but it doesn't need to be checked again.

	func3(xp, yp, zp;) int *xp,*yp, *zp;
		{
			int i;
		for (i =0; i <100; i++)
			*zp++ = *xp + *yp;
		...

	This ought to generate 2 checks:
	a) AN optimizer would like to pull the expression *xp + *yp out of
	the loop, but couldn't, due to possible aliasing of zp.
	b) But, there is always at least one *reference* to each of
	*xp and *yp, and the addresses don't change, so the optimizer could
	move the *checks* in front of the loop...

	Compared to C, FORTRAN is probably easier: plusses and minues of
	each might be:

	C					FORTRAN
	- Pointers everywhere			+ No pointers
	+ BSS initialized to zero, by definition- Memory not guaranteed
	- Pointers not guaranteed to non-alias	+ Array parameters not aliased
	- Loop terminations can be complex	+ Loops (in practice) simple
	+ const
	- moderate use of 8 & 16-bit data	+ little use, i.e., if values
	  these may be harder to check		  are 32- and 64-bit, it's
						  easier to find "uninit"
						  values
						+ Heavy use of floating point:
						  most chips have NaNs
	+ Call-by-value				- Call-by-reference 
Anyway, given the kinds of analysis that current good optimizers do,
I don't think this sort of analysis is all that bad to do; it would be nice
to see some research on this [maybe it's around, I haven't been following
it, but I do believe that it's possible to generate quite reasonable code,
and there is *nothing* about this technique that is very hardware-dependent
(except having natural NaNs or not).  

4) Subscript-out-of-range detection/optimization is more work;
maybe somebody who's done this in ADA would comment. I expect this is
easier in FORTRAN than in C, and in any case, there's no chance of doing
this very well in hardware for current machines and languages.

SUMMARY:
1) There are subtle interactions between:
	a) Instruction-set-architecture
	b) Implementation
	c) Language-design [some features cause more trouble than others]
	d) Level of copmpiler technology.
and how much checking can be done easily.

2) There is *little chance* of getting general hardware uninitialized-variable
	checking into currently-popular CPU families, and subscript-checks
	are probably even worse.  About the best that can be hoped for,
	from hardware, are a few instructions (like TRAP on MIPS & others)
	that do a check quickly, expecting to succeed, and trap otherwise),
	and maybe (if someone every can figure out the right thing), some
	hardware pattern-match-and-trap feature general enough to cover
	enough cases to make it worth having.  There is *little chance* of
	using extra bits in the memory system.

3) BUT: current optimizing compilers already keep track of much information
that could help optimize many of the checks away, and they could do it with
existing machines.  Hence, the likeliest hope of achieving this (desirable)
goal is:
	a) Encouraging more compiler research
	b) Encouraging compilers writers to do it.
	   


-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:    mash@sgi.com 
DDD:    415-390-3090	FAX: 415-967-8496
USPS:   Silicon Graphics 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94039-7311

Index Home About Blog