General guidelines and resources

Phil
« March 27, 2017, 12:53:54 PM »
Not many of us will have call to write an entire language's compiler from scratch, but most of us have been called upon to process configuration files, etc., written in a somewhat "high level" language. Such languages should be clean and understandable to humans, so something as cluttered and rigidly machine-oriented as XML is undesirable. This board (subsection of the forum) is oriented towards discussion of methodologies for constructing such translators, interpreters, and compilers. Ad hoc freehand design of such a system is rarely a good idea: you really should have some method behind your madness, or your code is likely to be very brittle, difficult to extend or reuse, and generally buggy.

The intent is to discuss a range of language translators, whether you're building utilities to translate old COBOL code to C# (good luck!), a reader (and possibly writer) for configuration files, an interpreter for some purpose, or an out-and-out compiler. It can be a standalone program or a module to drop into other code (in which case the implementation language is already chosen for you). It can be a full translator to low level code (assembly or machine language), a Just-In-Time bytecode compiler, or something else. Whatever you need is fair game, whether it's for your work or just for expanding your skill set.

Some resources to get us started (please suggest more!):

  • Aho & Ullman Principles of Compiler Design, or its successor, Aho, Sethi, & Ullman Compilers: Principles, Techniques, and Tools. These are the famed "Dragon" books that most other authors refer to.
  • Lewis, Rosenkrantz, & Stearns Compiler Design Theory.
  • Holub Compiler Design in C. This is not well structured, being more of an ad hoc attack on the problem, but it is useful for bits and pieces showing how to actually do things in building a C compiler. Be sure to go through the rather lengthy online errata list at some point. There is an online copy (including the errata list).
  • Pyster Compiler Design and Construction. This implements a Pascal-like language in itself, taking a very informal approach. Fun fact: the output is IBM 360 Assembly Language.
  • Niklaus Wirth Compiler Construction. A brief (131 page) introductory CS course featuring a subset of the Oberon language.
  • Pratt Programming Languages: Design and Implementation. While not going deeply into compilation, it is a good survey and overview of different approaches to languages.
  • Waite & Goos Compiler Construction. This book is lighter on parsing theory, but goes more heavily into the details of error reporting and fixup, optimization, and code generation than do most other books.

  • lex and yacc: these are the Ur translation system, originally from Bell Labs for the Unix operating system, and oriented towards C-like languages. lex generates the lexical analyzer, which tokenizes the input stream and feeds the tokens to the syntactical analyzer generated by yacc (Yet Another Compiler Compiler). Despite the flippant name, yacc has been a workhorse in the field (even if it wasn't necessarily the first).
  • flex and bison: the free (open source) versions of lex and yacc, somewhat more modern and updated in design. "flex" stands for "fast lexical analyzer generator", and "bison" is just wordplay on "yacc" (which sounds like "yak", the Asian domestic animal). They have been widely used in all sorts of free software compiler projects.
  • ANTLR: Terence Parr's ANother Tool for Language Recognition, using input much like lex/yacc or flex/bison, but combined in one input file and allowing some context-directed parsing (and thus more flexibility in the language design). It also lets you write input specifications (e.g., a comma-separated list of expressions) in a more natural manner, closer to the "railroad track" syntax diagrams (a.k.a. syntax charts or bead diagrams) some of you may have worked with. It has many associated tools, and is Java-based, although it can generate parsers in a number of other target languages.
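
To make the lexer/parser split described above a bit more concrete, here is a minimal sketch in Python. It is not output from lex, yacc, or ANTLR: the tokenizer is a plain regular-expression loop standing in for a generated lexer, and the parser is hand-written recursive descent rather than the table-driven parser yacc would produce. The names (TOKEN_SPEC, tokenize, Parser) and the tiny grammar, a comma-separated list of integer expressions, are assumptions made up for illustration.

```python
import re

# Token patterns, tried in order (roughly the job a lex/flex spec does).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("OP", r"[+\-*/(),]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, value) pairs, ending with an EOF token."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":  # whitespace is matched but not passed on
            yield m.lastgroup, m.group()
        pos = m.end()
    yield "EOF", ""


class Parser:
    """Recursive descent parser/evaluator for this illustrative grammar:
         list   : expr (',' expr)*
         expr   : term (('+' | '-') term)*
         term   : factor (('*' | '/') factor)*
         factor : NUMBER | '(' expr ')'
    """

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.kind, self.value = next(self.tokens)

    def advance(self):
        self.kind, self.value = next(self.tokens)

    def expect(self, value):
        if self.value != value:
            raise SyntaxError(f"expected {value!r}, got {self.value!r}")
        self.advance()

    def parse_list(self):
        results = [self.expr()]
        while self.value == ",":
            self.advance()
            results.append(self.expr())
        if self.kind != "EOF":
            raise SyntaxError(f"unexpected {self.value!r}")
        return results

    def expr(self):
        total = self.term()
        while self.value in ("+", "-"):
            op = self.value
            self.advance()
            rhs = self.term()
            total = total + rhs if op == "+" else total - rhs
        return total

    def term(self):
        total = self.factor()
        while self.value in ("*", "/"):
            op = self.value
            self.advance()
            rhs = self.factor()
            total = total * rhs if op == "*" else total / rhs
        return total

    def factor(self):
        if self.kind == "NUMBER":
            n = int(self.value)
            self.advance()
            return n
        self.expect("(")
        n = self.expr()
        self.expect(")")
        return n


print(Parser("1 + 2*3, (4 - 1) * 2").parse_list())
```

Each grammar rule maps to one method, which is about as far from "freehand" as you can get without a generator tool. With one of the tools above you would write only the token patterns and grammar rules and let the tool produce the equivalent of both pieces.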

Please try to keep any algorithms you give in a more or less language-independent style, unless the subject matter is for a specific implementation language. You may love writing in Python, but you want your work to be accessible to someone who doesn't know Python.

Finally, discussion is open for implementing not only "traditional" languages that you find compilers and interpreters for, like C or FORTRAN (with expressions inside control structures), but also "inside-out" languages such as HTML, where keywords and other control structures are embedded inside running text content. Rules still apply (especially for tag nesting), but the structure is often more one of a lot of disconnected little pieces. See the JavaScript Document Object Model.
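
As a rough illustration of the tag nesting rule, here is a small Python sketch that scans running text for tags and checks with a stack that they close in the right order. It is nowhere near a real HTML parser: the function name check_nesting, the regular expression, and the short list of void elements are simplifying assumptions for this example only, and the recovery after a mismatch is deliberately naive.

```python
import re

# Matches <name ...>, </name>, and <name ... />; everything else is plain text.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*?(/?)>")

# A few elements that never take a closing tag (illustrative subset only).
VOID = {"br", "hr", "img", "input", "meta", "link"}


def check_nesting(text):
    """Return a list of error messages; an empty list means tags nest properly."""
    stack = []   # names of the tags currently open, innermost last
    errors = []
    for m in TAG_RE.finditer(text):
        closing, name, self_closing = m.group(1), m.group(2).lower(), m.group(3)
        if self_closing or name in VOID:
            continue                      # nothing to open or close
        if not closing:
            stack.append(name)            # opening tag: remember it
        elif stack and stack[-1] == name:
            stack.pop()                   # closes the innermost open tag
        else:
            errors.append(f"unexpected </{name}> at offset {m.start()}")
    errors.extend(f"unclosed <{name}>" for name in reversed(stack))
    return errors


print(check_nesting("<p>Hello <b>world</b><br></p>"))
print(check_nesting("<p><b>oops</p></b>"))
```

Note that the text between the tags is skipped entirely. That is the "inside-out" part: the language lives in little islands within the content, rather than the content living inside the language.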
« Last Edit: July 28, 2019, 05:15:09 PM by Phil »