From: henry@spsystems.net (Henry Spencer)
Newsgroups: comp.compilers
Subject: Re: Compiler Books?
Date: 1 Nov 2003 12:02:25 -0500
Message-ID: <03-11-007@comp.compilers>
Keywords: books, C

VBDis <vbdis@aol.com> wrote:
>IMO top-down parsers are easier to understand than bottom-up
>parsers. Even if both types can be used for C and Pascal, bottom-up
>parsers are commonly described and used for C, and top-down parsers
>for Pascal and other "Wirthian" languages.
>...if your language has to be somewhat compatible with C,
>you have to go the harder way.

Sorry, not so.  C was designed for top-down parsing; the first C
compiler, Dennis Ritchie's own, used recursive descent with some
flourishes.

Even ANSI C is not that hard to parse top-down.  It requires some work
and thought, not just a mechanical transcription of the standard's
grammar.  There are a few places where it helps to peek ahead one
extra token, to decide which way to go in a timely way.  And of course
you need some interaction with the symbol table to get typedefs right.
Some of this is annoying but none of it is prohibitive; I've done it.
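
[As a toy sketch of the symbol-table interaction mentioned above — in Python rather than C, with an invented miniature grammar — here is how a top-down parser can consult a typedef table to tell a declaration like "size_t * p;" from an expression statement like "a * b;". Real C needs far more machinery; this only illustrates the principle.]

```python
# Toy sketch: a parser that consults a typedef table so it can tell a
# declaration "T * p;" from an expression statement "a * b;".  The
# grammar and names are invented for illustration; real C needs much more.

import re

TYPEDEFS = {"size_t"}          # typedef names seen so far

def tokenize(src):
    # identifiers, or any single non-space character
    return re.findall(r"[A-Za-z_]\w*|\S", src)

def parse_stmt(tokens):
    """Classify one 'name * name ;' statement."""
    name, star, name2, semi = tokens
    assert star == "*" and semi == ";"
    if name in TYPEDEFS:
        return ("decl", name, name2)      # size_t * p;  -> declaration of p
    return ("expr-stmt", name, name2)     # a * b;       -> multiplication

print(parse_stmt(tokenize("size_t * p ;")))   # -> ('decl', 'size_t', 'p')
print(parse_stmt(tokenize("a * b ;")))        # -> ('expr-stmt', 'a', 'b')
```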

Incidentally, Bjarne Stroustrup is on record as saying that letting
Aho and Johnson talk him out of writing a recursive-descent parser for
the early C++ was a major mistake.  C++ is extremely difficult to
parse using pure bottom-up techniques, and needs a lot of kludges,
while doing it top-down is fairly straightforward and is common in
production compilers.
--
MOST launched 30 June; first light, 29 July; 5arcsec    | Henry Spencer
pointing, 10 Sept; first science, early Oct; all well.  | henry@spsystems.net


From: henry@spsystems.net (Henry Spencer)
Newsgroups: comp.compilers
Subject: Re: Compiler Books? Parsers?
Date: 2 Nov 2003 14:46:21 -0500
Message-ID: <03-11-014@comp.compilers>
Keywords: C, parse, tools

Mohd Hanafiah Abdullah <napi@cs.indiana.edu> wrote:
>I have never used parser generators such as yacc/bison or antlr.
>Would be nice to know people's opinion on manual implementation vs
>automation of parsers, especially from those who have done both...

My opinion:

Automation wins for readability (less clutter obscuring the syntax),
efficiency (table walking is often faster than procedure calling), and
above all, ease of experimentation (particularly because bottom-up
parsing, which is more powerful and imposes fewer constraints, lends
itself to automation but not to hand-crafting).

Manual implementation wins for ease of semantics integration (easy to
interweave with the syntax, and top-down means more context is known),
ease of adding kludges to get around difficult spots, ease of
producing good error messages (again, better knowledge of context),
and avoiding dependence on major tools that may not be available
everywhere.

(One caveat:  as I've hinted at, people often assume that hand-crafted
means top-down and automated means bottom-up, and that confuses the issues
some.  Automated top-down also exists.)
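
[A toy sketch of the "ease of semantics integration" point, with an invented expression grammar: a hand-written recursive-descent parser can evaluate as it parses, with semantic actions sitting right next to the syntax and error messages that know where they are.]

```python
# Hand-written recursive-descent parser for "+"/"*" expressions over
# integers, evaluating as it parses.  Grammar (invented for illustration):
#   expr   -> term { "+" term }
#   term   -> factor { "*" factor }
#   factor -> NUMBER | "(" expr ")"

def parse(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def take(expected=None):
        nonlocal pos
        tok = peek()
        if expected is not None and tok != expected:
            # error message with full knowledge of context
            raise SyntaxError(f"expected {expected!r}, got {tok!r} at token {pos}")
        pos += 1
        return tok
    def expr():
        v = term()
        while peek() == "+":
            take("+")
            v += term()          # semantic action, interwoven with the syntax
        return v
    def term():
        v = factor()
        while peek() == "*":
            take("*")
            v *= factor()
        return v
    def factor():
        if peek() == "(":
            take("(")
            v = expr()
            take(")")
            return v
        tok = take()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number or '(', got {tok!r}")
        return int(tok)
    v = expr()
    if peek() is not None:
        raise SyntaxError(f"trailing input: {peek()!r}")
    return v

print(parse(["2", "+", "3", "*", "4"]))               # -> 14
print(parse(["(", "2", "+", "3", ")", "*", "4"]))     # -> 20
```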
--
MOST launched 30 June; first light, 29 July; 5arcsec    | Henry Spencer
pointing, 10 Sept; first science, early Oct; all well.  | henry@spsystems.net
[Automation also wins for accuracy.  You can be quite confident that
the parser that yacc or another generator writes for you parses
exactly the grammar you gave it, but it's very easy to leave
undiagnosed holes in a hand-written RD parser. -John]



From: henry@spsystems.net (Henry Spencer)
Newsgroups: comp.compilers
Subject: Re: Compiler Books? Parsers?
Date: 8 Nov 2003 01:36:45 -0500
Message-ID: <03-11-028@comp.compilers>
Keywords: C, parse, tools

>[Automation also wins for accuracy.  You can be quite confident that
>the parser that yacc or another generator writes for you parses
>exactly the grammar you gave it, but it's very easy to leave
>undiagnosed holes in a hand-written RD parser. -John]

Yes and no and kind of.  The key problem is having to transform the
representation used in design to the one needed for implementation.
If that transformation isn't trivial, it's easy to make mistakes in
it.

But I think this problem can hit you even in the automation case.  If
the automation has serious constraints on what it will accept, or you
used an eccentric design notation, the transformation can be
non-trivial even though both are grammars.

(As a case in point, it was not until 1984 -- hmm, might have been
1983, my memory is a bit vague -- that there was an LALR(1) grammar
for C.  Even though C had been described using grammars since its
origins in the early 70s, writing a *correct* LALR(1) grammar for it
was hard.  Steve Johnson's late-70s PCC did use a yacc parser, but
cheated a lot.  In particular, PCC refused to accept a number of
unusual syntactic forms that were officially legal C.  This was not
something you could discover by a quick comparison of the yacc grammar
to the C Reference Manual grammar.)
--
MOST launched 30 June; first light, 29 July; 5arcsec    | Henry Spencer
pointing, 10 Sept; first science, early Oct; all well.  | henry@spsystems.net


From: henry@spsystems.net (Henry Spencer)
Newsgroups: comp.compilers
Subject: Re: GCC for BCPL
Date: 11 Dec 2004 12:24:31 -0500
Message-ID: <04-12-043@comp.compilers>
Keywords: parse, question, GCC

Tom Crick  <tc@cs.bath.ac.uk> wrote:
>...a re-evaluation of using Bison to create the parser. The
>original BCPL parsers were recursive-descent and certain language
>features (particularly the optional semicolon problem) seem to point
>towards their use now...

As a (non-BCPL) data point on this, in "The Design and Evolution of C++",
Bjarne Stroustrup comments (p. 68):

  In 1982 when I first planned Cfront [the preprocessor from C++ to C],
  I wanted to use a recursive descent parser because I had experience
  writing and maintaining such a beast, because I liked such parsers'
  ability to produce good error messages, and because I liked the idea
  of having the full power of a general-purpose programming language
  available when decisions had to be made in the parser.  However, being
  a conscientious young computer scientist I asked the experts.  Al Aho
  and Steve Johnson were in the Computer Science Research Center and
  they, primarily Steve, convinced me that writing a parser by hand was
  most old-fashioned, would be an inefficient use of my time, would
  almost certainly result in a hard-to-understand and hard-to-maintain
  parser, and would be prone to unsystematic and therefore unreliable
  error recovery.  The right way was to use an LALR(1) parser generator,
  so I used Al and Steve's YACC.

  For most projects, it would have been the right choice.  For almost
  every project writing an experimental language from scratch, it would
  have been the right choice.  For most people, it would have been the
  right choice.  In retrospect, for me and C++ it was a bad mistake.

He was dealing with much the same situation: C was originally designed
for recursive descent, and fitting it into LALR(1) wasn't easy.  PCC
had a yacc parser for C, but in fact it wasn't right: it handled some
of the more obscure cases, later significant in C++, incorrectly.  An
LALR(1) grammar for C eventually appeared as part of the ANSI C work,
too late.  There have been repeated LALR(1) problems as C++ has
evolved.
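
[One classic example of the kind of obscure case meant here, sketched with invented names: "(a)(b)" is a cast of b to type a if a names a type, but a call of function a with argument b otherwise. The grammar alone cannot decide; only the symbol table can, which is exactly what makes a pure LALR(1) treatment painful.]

```python
# "(a)(b)": cast or call?  The token stream is identical either way;
# the decision hangs entirely on whether the parenthesized name is a
# typedef name.  Toy sketch with invented names.

TYPEDEFS = {"T"}

def classify(tokens):
    # tokens: "(" name ")" "(" name ")" ";"  (toy shape)
    assert tokens[0] == "(" and tokens[2] == ")"
    inner = tokens[1]
    return "cast" if inner in TYPEDEFS else "call"

print(classify(["(", "T", ")", "(", "b", ")", ";"]))   # -> cast
print(classify(["(", "f", ")", "(", "b", ")", ";"]))   # -> call
```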
--
"Think outside the box -- the box isn't our friend."    |   Henry Spencer
                                -- George Herbert       | henry@spsystems.net



From: henry@spsystems.net (Henry Spencer)
Newsgroups: comp.compilers
Subject: Re: EBNF
Date: 11 Dec 2004 12:24:57 -0500
Message-ID: <04-12-044@comp.compilers>
Keywords: syntax, design

Martin Bravenboer  <martin@cs.uu.nl> wrote:
>...This separation is inspired by performance concerns: by
>separating the input in tokens, the number of elements the parser has to
>consider is reduced.

It is also a very nice "separation of concerns", dividing the problem into
two steps and thus often making it easier to solve.

I have found a scanner/parser separation very useful even in problems
normally thought too simple to require it.  Being able to work at two
separate levels often results in cleaner, more readable code and
simpler solutions to awkward problems.
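
[A small invented example of the separation paying off in a "too simple" problem: reading "key = value" lines. The scanner absorbs whitespace and quoting; the parser then works on clean tokens and never has to think about either.]

```python
# Scanner/parser separation for a trivial "key = value" line format.
# shlex (standard library) does the scanning, including quoting, so the
# parsing step sees only clean tokens.

import shlex

def scan(line):
    return shlex.split(line)          # handles quotes and escapes

def parse(line):
    tokens = scan(line)
    if len(tokens) != 3 or tokens[1] != "=":
        raise ValueError(f"expected 'key = value', got {tokens!r}")
    return tokens[0], tokens[2]

print(parse('name = "Henry Spencer"'))   # -> ('name', 'Henry Spencer')
```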

>...The problem of using a separate
>scanner is that the context of the token in the input cannot be considered.

It is straightforward, in principle, for the parser to inform the scanner
about context.  In practice, this is easy to implement with a recursive-
descent parser but rather harder with a parser generator, especially if
the parser generator uses bottom-up parsing (which typically knows less
about the context).
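
[A toy sketch of the parser informing the scanner about context, with an invented mini-language: the scanner classifies "then" as a keyword only while the parser says it is inside an if-statement. With recursive descent this is just a flag; a bottom-up parser has no such obvious place to set it.]

```python
# The parser sets a context flag on the scanner; the scanner's
# classification of "then" depends on that flag.  Invented language.

class Scanner:
    def __init__(self, words):
        self.words = list(words)
        self.in_if = False            # context flag, set by the parser
    def next(self):
        w = self.words.pop(0)
        if self.in_if and w == "then":
            return ("KEYWORD", w)
        return ("IDENT", w)

def parse_if(scanner):
    scanner.in_if = True              # parser tells scanner the context
    cond = scanner.next()             # condition: an identifier
    kw = scanner.next()               # "then", classified as a keyword here
    scanner.in_if = False
    body = scanner.next()             # "then" would be a plain IDENT now
    return cond, kw, body

s = Scanner(["ready", "then", "go"])
print(parse_if(s))
# -> (('IDENT', 'ready'), ('KEYWORD', 'then'), ('IDENT', 'go'))
```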
--
"Think outside the box -- the box isn't our friend."    |   Henry Spencer
                                -- George Herbert       | henry@spsystems.net

