From: Dennis Ritchie <dmr@bell-labs.com>
Subject: Re: History question: String literals.
Date: 02 Jun 1998
Newsgroups: comp.std.c

At the time that the C89 committee was working, writable
string literals weren't "legacy code" (Margolin) and what standard
there existed (K&R 1) was quite explicit (A.2.5) that
strings were just a way of initializing a static array.
And as Barry pointed out there were some (mktemp) routines
that used this fact.

I wasn't around for the committee's deliberations on the
point, but I suspect that the BSD utility for fiddling
the assembler code to move the initialization of strings
to text instead of data, and the realization that most
literal strings were not in fact overwritten, was more
important than some very early version of gcc.

Where I think the committee might have missed something
is in failure to find a formulation that explained
the behavior of string literals in terms of const.
That is, if "abc" is an anonymous literal of type
  const char [4]
then just about all of its properties (including the
ability to make it read-only, and even to share its storage
with other occurrences of the same literal) are nearly
explained.
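
A minimal sketch of the properties mentioned above, as they stand in
standard C, where the literal has type char[4] rather than
const char[4]. The function names are invented for illustration and
appear nowhere in the thread.

```c
#include <stddef.h>
#include <string.h>

/* "abc" is an array of four chars: three letters plus the '\0'. */
size_t literal_size(void)
{
    return sizeof "abc";        /* size of the array, not of a pointer */
}

/* Reading a literal through a pointer-to-const is always safe. */
size_t literal_length(void)
{
    const char *s = "abc";
    return strlen(s);
}

/* Whether identical literals share storage is up to the
   implementation, so this may return either 0 or 1. */
int literals_share_storage(void)
{
    const char *a = "abc";
    const char *b = "abc";
    return a == b;
}
```

Portable code should not depend on the result of the last function in
either direction.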

The problem with this was not only the relatively few
places that string literals were actually written on, but much
more important, working out feasible rules for assignments
to pointers-to-const, in particular for function's actual
arguments.  Realistically the committee knew that whatever
rules they formulated could not require a mandatory
diagnostic for every func("string") in the existing world.


So they decided to leave "..." of ordinary char array
type, but say one was required not to write over it.

This note, BTW, isn't intended to be read as a snipe
at the formulation in C89.  It is very hard to get things
both right (coherent and correct) and usable (consistent
enough, attractive enough).

	Dennis


Newsgroups: comp.std.c
From: "Douglas A. Gwyn" <gwyn@arl.mil>
Subject: Re: History question: String literals.
Date: Tue, 2 Jun 1998 19:33:57 GMT

Dennis Ritchie wrote:
> And as Barry pointed out there were some (mktemp) routines
> that used this fact.

However, a simple rewrite of such applications remedied this.
(Use an explicit non-const char array, initialized with the
string-literal syntactic form, in place of the in-line literal.)
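
The rewrite described above can be sketched as follows. Here
fill_template() is a hypothetical stand-in for a routine that, like
mktemp(), overwrites the trailing X's of its argument. It is not the
real library function, and the path is only an example.

```c
#include <string.h>

/* Hypothetical stand-in for a mktemp()-like routine: replaces the
   trailing X's of the template, in place, with the given character. */
char *fill_template(char *tmpl, char fill)
{
    size_t n = strlen(tmpl);
    while (n > 0 && tmpl[n - 1] == 'X')
        tmpl[--n] = fill;
    return tmpl;
}

/* The old style was fill_template("/tmp/fooXXXXXX", '0'), which
   writes into the literal itself: undefined behavior under C89.
   The rewrite uses a named array initialized from the literal; the
   array is an ordinary writable object, so modifying it is fine. */
const char *make_name(void)
{
    static char name[] = "/tmp/fooXXXXXX";
    return fill_template(name, '0');
}
```

The array is declared static here only so that its address can be
returned; an automatic array works equally well when the name is used
inside the same function.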

> I wasn't around for the committee's deliberations on the
> point, but I suspect that the BSD utility for fiddling
> the assembler code to move the initialization of strings
> to text instead of data, and the realization that most
> literal strings were not in fact overwritten, was more
> important than some very early version of gcc.

GCC might have served as an example but not as motivation.
Partly the desire to have string literals in ROMmable data
was to support, er, ROMming.  I vaguely recall having used
a couple of C implementations (before the X3J11 decision was
made) where string literals were either automatically pooled
or stored in a constant data program section.  Given the
existing variety of practice and the availability of an easy
work-around when the original UNIX properties were wanted,
it seemed best to not try to guarantee uniqueness and
writability of string literals.

> Where I think the committee might have missed something
> is in failure to find a formulation that explained
> the behavior of string literals in terms of const.

We explored that possibility, but as you then said...

> The problem with this was ... working out feasible rules
> for assignments to pointers-to-const, in particular for
> function's actual arguments.

We had a hard enough time finding a suitable compromise for
this!  Type qualifiers brought a lot of problems, and the
incorporation of them into C89 has some weird aspects, e.g.
the difference in meaning of const qualification at the
first level and at any other level of pointer to pointer to
... in function parameters.
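
One way to see an asymmetry of this kind, offered as an illustration
and not necessarily the exact case meant above: a char * argument may
be passed where const char * is expected, but a char ** argument may
not be passed where const char ** is expected. The function names are
invented for the sketch.

```c
#include <string.h>

/* First level: the parameter points to const char. */
size_t count_chars(const char *s)
{
    return strlen(s);
}

/* Second level: the parameter points to a pointer to const char. */
size_t count_first(const char **list)
{
    return strlen(list[0]);
}

size_t demo(void)
{
    char word[] = "abc";
    char *p = word;
    const char *cp = word;
    size_t n;

    n = count_chars(p);     /* fine: const is added at the first level */

    /* count_first(&p);        constraint violation: char ** does not
                               convert to const char **, even though
                               char * converts to const char *        */

    n += count_first(&cp);  /* fine: the types match exactly */
    return n;
}
```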


From: Dennis Ritchie <dmr@bell-labs.com>
Newsgroups: comp.std.c
Subject: Re: On the type of string literal
Date: Fri, 16 Jul 1999 07:45:47 +0100

Isaac Chen asked (with some reordering and redaction by me):

>     The type of string literal is 'array of n char', but its intended
> use is as if it's an 'array of n const char' because the result of
> modifying it is undefined.
> ...
>     Why didn't the Standard treat it as 'char *' and let those who
> need such optimization indicate so like this:
>
>         const char    *pcc = "string";    /* in ROM */
>         char    *pc = "string";  /* in R/W memory */

> If one uses all the string literal as 'const char *' and need such
> optimization but doesn't like to add all those 'const ...', a proper
> compiler switch will do. Even if no such switch exists in some
> compilers, the program would still work.

The early history is that C as described in K&R I, following its
predecessor languages, was unambiguous and explicit in describing
string literals as anonymous, static arrays of characters that
were initialized with the characters; some early routines like
mktemp() explicitly invited overwriting of the characters in
a literal string passed as an argument.

Later, it was realized that this was not necessarily a good idea
for a variety of reasons, even though it is utterly simple to
say and to describe:

 - As a general matter, it just seems pretty unclean.  In some
   sense, the appearance of "abcd" in a program looks sort of
   like a genuine constant.  Maybe one should think of
     char *p = "abcd";
     p[2] = 'X';
   as just like
     i = 1;
     ...
     i = 2;
   but somehow it feels different.

 - As a practical matter, particularly in memory-constricted environments,
   people wanted to collect string literals and put them in ROM or shared
   memory-protected storage.  If the language rules permit overwriting,
   this can't be done except by agreed-upon convention (which would have
   to be outside the language definition).

The ANSI committee that did C89 wanted (for a variety of reasons) to add
the notion of "const" as a type qualifier, basically to announce
that some objects could (if desired) be put in ROM, which would aid
a variety of optimizations and possibilities for verification.  The most
natural idea was to say that string literals, rather than having type
 static char[]
instead were
 static const char[]

The problem was that the rules about conversion during assignment
(including passing as function arguments) of pointers to const-qualified
things into not-const-qualified things meant that practically no program
in existence could avoid a mandatory diagnostic about a constraint
violation if string literals suddenly became 'const'; upwards
compatibility was needed.  It was not good if yesterday you declared a
function argument as 'char *' but today you are required to say 'const
char *' as you hand the function a string literal.

The not-completely-happy result is the current rule set, which says that
even though string literals can't (under the standard) be written
into, they don't have the const qualification attached.  Things
would be easier to describe if they did (or, for that matter, if there
were no 'const').  But that's the way things work out.
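
A small sketch of what the resulting rule set means in practice,
assuming a conforming C compiler with default options; old_length()
and the other names are hypothetical, chosen only for the example.

```c
#include <string.h>

/* Hypothetical pre-standard style function: parameter lacks const. */
size_t old_length(char *s)
{
    return strlen(s);
}

size_t pass_literal(void)
{
    /* No diagnostic is required: the literal has type char[7], which
       decays to char *, exactly what old_length() expects. */
    return old_length("string");
}

char read_only_use(void)
{
    char *p = "abcd";        /* allowed: the literal is not const */
    const char *q = "abcd";  /* lets the compiler reject writes via q */

    /* p[2] = 'X';   would compile, but the behavior is undefined    */
    /* q[2] = 'X';   would be rejected at compile time               */
    return (char)(p[2] == q[2] ? p[2] : '\0');
}
```

Declaring pointers to literals as const char *, as q is here, recovers
by convention most of what a const-qualified literal type would have
enforced.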

	Dennis
