From: torek@elf.bsdi.com (Chris Torek)
Date: 15 Sep 97 06:16:43 GMT
Newsgroups: comp.std.c,comp.std.c++
Subject: Re: promotion - signed --> unsigned

In article <5vbsuq$rka@nfs0.sdrc.com> <larry.jones@sdrc.com> wrote:
>... In most pre-ANSI implementations, any unsigned operand of any
>size caused the result to also be unsigned.  This was later dubbed
>"unsignedness preserving rules" as opposed to the "value preserving
>rules" that ANSI ultimately adopted.  For what it's worth, that was, in
>my opinion, the single most controversial decision the ANSI committee
>made -- to this day there are those who argue that the other rules are
>better.

Indeed.  An excerpt from my article <5ml7b7$nca@solutions.solon.com>
(edited a bit for clarity and/or mistakes :-) ):

... in which the result of widening any narrow unsigned type was
`unsigned int'.  In ANSI C, it is either int or unsigned int,
depending on the implementation.  For instance:

	unsigned short s = ~(unsigned)0;
	int i; ...
	if (i < s)

In pre-ANSI C (e.g., under the old VAX PCC compiler), this always
meant the same as:

	if ((unsigned int)i < (unsigned int)USHRT_MAX)

In ANSI C, it sometimes means:

	if ((unsigned int)i < (unsigned int)USHRT_MAX)

but sometimes it means:

	if ((signed int)i < (signed int)USHRT_MAX)

Which of these occurs depends on the relative sizes of short and int
(strictly, on whether int can represent every unsigned short value).
If short is shorter than int, it means the latter; if short is the
same length as int, it means the former.

Thus:

	unsigned short s = USHRT_MAX;
	int i = -1;

	if (i < s)	/* e.g., -1 < 65535 */
		printf("ANSI C, and sizeof(int) > sizeof(short)\n");
	else		/* e.g., 0xffffU == 0xffffU or 0xffffffffU > 0xffffU */
		printf("pre-ANSI C, or sizeof(int) == sizeof(short)\n");

Either output can occur under a conforming ANSI C system.  This
is why the choice that was made for ANSI C is wrong.  The correct
choice, `unsigned preserving' semantics, does not depend on the
relative sizes of short and int, and will always do an unsigned
comparison, so that we always test UINT_MAX >= USHRT_MAX, and the
second printf fires.  But Plauger insisted on the broken `value
preserving' semantics, arguing that it more often did what most
programmers expected.  The problem with this argument is that it
only does what these programmers expect when sizeof(int) >
sizeof(short) -- when the sizes are the same, it acts like pre-ANSI
C, so programmers cannot assume a signed comparison will occur.
It would be better always to do an unsigned comparison, giving a
fixed answer, rather than an implementation-specific answer.

As the Rationale points out, however, any difference in actual
behavior occurs only rarely.  Since the Rationale is missing from
the ISO edition, I quote:

	In [most] implementations, differences between [the
	result in an unsigned-preserving system and a value-
	preserving system] only appear when these two conditions
	are both true:

	 1. An expression involving an |unsigned char| or
	    |unsigned short| produces an |int|-wide result in
	    which the sign bit is set: i.e., either a unary
	    operation on such a type, or a binary operation in
	    which the other operand is an |int| or ``narrower''
	    type.

	 2. The result of the preceding expression is used in
	    a context in which its signedness is significant:

		o  |sizeof(int) < sizeof(long)| and it is in a
		   context where it must be widened to a long
		   type, or

		o  It is the left operand of the right-shift
		   operator (in an implementation where this
		   shift is defined as arithmetic), or

		o  It is either operand of /, %, <, <=, >, or >=.

	In such circumstances a genuine ambiguity of interpretation
	arises.  The result must be dubbed /questionably signed/ ....
	Of course, /all of these ambiguities can be avoided by a
	judicious use of casts./

[emphasis theirs]

This glosses over the difficulty of obtaining the correct cast when
the type in question is obtained via a typedef from a header whose
contents are not supposed to be examined, and/or which varies from
one system to the next.  (Consider, e.g., `uid_t' from some POSIX
header.  Is it signed?  What are the signed and unsigned variants
of this type?  Do they vary from one system to the next?  [The answer
to the last is a definite "yes".])
--
In-Real-Life: Chris Torek, Berkeley Software Design Inc
El Cerrito, CA	Domain:	torek@bsdi.com	+1 510 234 3167
Antispam notice: unsolicited commercial email will be handled at my
consulting rate; pyramid-scheme mail will be forwarded to the FTC.
