
Philosophy and approach of Compiler Design books

Phil (Global Moderator)
Philosophy and approach of Compiler Design books
« March 27, 2017, 01:33:24 PM »
One thing that's always bugged me about language (and thus, compiler) design is that the books seem to spend 80–90% of their time on detailed mathematical analysis of expression parsing. They leave everything else as exercises for the reader: most of the lexical work, input file handling (especially the ins and outs of embedding include files), error detection/reporting and fixup (to continue compilation, not to make a runnable program), source listings (at what point in file inclusion and macro expansion?), macros and other preprocessing, optimization, and code generation. These are practical matters that need to be dealt with in making a useful application. Some guidance, suggestions, and examples for how to do these things would be appreciated.
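Even a sketch of the "practical" side would help. Take include files: one minimal way to handle nesting is a stack of open sources, each carrying its own file name and line number so that error messages and listings can still point at the right place. The Python below is only an illustration of that idea; the Source/Reader names and the '#include' syntax are made up for the example, not taken from any particular book or language.

class Source:
    def __init__(self, path):
        self.path = path
        self.file = open(path)
        self.line_no = 0          # current line within this one file

    def next_line(self):
        line = self.file.readline()
        if line:
            self.line_no += 1
        return line


class Reader:
    """Delivers lines from a main file plus any nested includes."""

    def __init__(self, main_path):
        self.stack = [Source(main_path)]   # innermost include on top

    def next_line(self):
        while self.stack:
            src = self.stack[-1]
            line = src.next_line()
            if not line:                   # end of this file: resume the outer one
                src.file.close()
                self.stack.pop()
                continue
            if line.startswith('#include '):
                name = line.split(maxsplit=1)[1].strip().strip('"')
                self.stack.append(Source(name))
                continue
            return src.path, src.line_no, line   # coordinates travel with the line
        return None

Nothing deep, but it is exactly the kind of thing the books wave away.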

Yes, there has to be a certain amount of rigor to the basic operations of lexical and syntactical analysis, or the whole thing falls apart, but authors tend to get so deeply into the mathematics of it (they must love that stuff) that the subject matter gets buried under discussion of alphabets, expressions, productions, set theory, and whatnot. Do you even need to rigidly separate lexical analysis (tokenizing) from syntactical analysis (parsing)? In some cases, feedback from parsing would be useful for determining what a token is and where the input should be split up. Can syntactical analysis of an expression be simply handled with operator precedence and associativity and a couple of stacks? Can control structures (basically, the stuff that wraps around expressions) be simply handled with recursive descent? Are there real world (useful) languages that couldn't be handled this way (perhaps Lisp-like ones)?
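For what it's worth, the two-stacks idea really is that small. Here is a bare-bones sketch of Dijkstra's shunting-yard algorithm in Python, driven by nothing but a precedence table and an associativity set; the toy operator table and the choice to emit postfix are mine, just to keep the example short.

PREC  = {'+': 1, '-': 1, '*': 2, '/': 2, '^': 3}
RIGHT = {'^'}                     # right-associative operators

def to_rpn(tokens):
    out, ops = [], []             # output queue and operator stack
    for tok in tokens:
        if tok in PREC:
            while (ops and ops[-1] in PREC and
                   (PREC[ops[-1]] > PREC[tok] or
                    (PREC[ops[-1]] == PREC[tok] and tok not in RIGHT))):
                out.append(ops.pop())
            ops.append(tok)
        elif tok == '(':
            ops.append(tok)
        elif tok == ')':
            while ops[-1] != '(':
                out.append(ops.pop())
            ops.pop()             # discard the '('
        else:
            out.append(tok)       # operand
    while ops:
        out.append(ops.pop())
    return out

# to_rpn('a + b * ( c - d ) ^ e'.split())
# -> ['a', 'b', 'c', 'd', '-', 'e', '^', '*', '+']

Wrap recursive-descent routines for statements and declarations around something like that and you have most of a front end, at least for conventional languages.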

Various languages have their own peculiarities. For example, FORTRAN needs some preprocessing to squeeze out whitespace (except within literal constants) while gluing together continuation lines and eliminating comment lines. When doing this, how do you best keep track of the original lines and positions of keywords so that error reporting makes sense? Would it be better not to tokenize the input at all, or to define keywords with \s* allowed between each letter, and instead rely on constant feedback from the syntax analysis to determine what's coming up next? C has an explicit and fairly well-documented preprocessor pass, but for other languages, preprocessing (macro expansion/replacement, file inclusion, etc.) is often vaguely described, if at all. PHP was originally never formally specified, and as a result its syntax has had a lot of irregularities. At least FORTRAN and COBOL have the excuse that they were set in stone long before modern language design theory was codified!
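On the FORTRAN question, one low-tech answer is to keep a side table while squeezing: for every character that survives, record which original line and column it came from. The sketch below (Python again) simplifies the fixed-form rules, ignores literal constants, and invents its own function name, so treat it as an illustration rather than a recipe.

def squeeze(lines):
    text, origin = [], []                  # surviving chars and their source coords
    for line_no, line in enumerate(lines, start=1):
        if line[:1] in ('C', 'c', '*'):    # fixed-form comment line: drop it entirely
            continue
        # a non-blank, non-zero character in column 6 marks a continuation line;
        # its statement text just flows into the same stream as the line above
        start = 6 if len(line) > 5 and line[5] not in ' 0' else 0
        for col in range(start, min(len(line), 72)):
            ch = line[col]
            if ch.isspace():
                continue                   # squeeze out blanks (literal constants ignored here)
            text.append(ch)
            origin.append((line_no, col + 1))
    return ''.join(text), origin

# squeezed, origin = squeeze(open('prog.f').read().splitlines())
# origin[k] says which (line, column) the k-th surviving character came from,
# so an error at offset k in the squeezed text can still be reported sensibly.

That side table costs almost nothing and keeps error messages honest, but I've never seen a book spend a page on it.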

At the other extreme, some books are so oriented toward "hands-on" practical results that they go over the background theory very lightly and plunge right into the needed code. That approach leaves the reader appreciative of not being buried in math, but wondering whether they really have a grasp of the overall picture. I have yet to find a book that leaves me fully satisfied with the author's approach.
« Last Edit: April 12, 2017, 08:20:46 AM by Phil »

Phil (Global Moderator)
Re: Philosophy and approach of Compiler Design books
« Reply #1: April 13, 2018, 05:25:47 PM »
This is an interesting discussion from a while back that I came across: https://groups.google.com/forum/#!topic/comp.compilers/OecVjAVv32E

I can sympathize with the thread-starter's request for a "cookbook" of compiler writing without all the heavy theory and math. The discussion didn't go on for very long, but it did include a lot of literature reviews and some debate about whether you can write anything useful with a cookbook approach (i.e., without being reasonably well-versed in compiler and language theory).

It got me to thinking: if you take all the chapters devoted to state machines and such in the "Dragon" books, etc., and stand way back and squint, aren't they just using regular-expression matching to identify keywords and other tokens? It seems those chapters could be replaced by "Read up on REs, and then come back here and we'll talk about using REs to tokenize the input."
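To make that concrete, here is roughly what that one replacement paragraph boils down to in Python: a single combined regular expression with named groups doing the job that the state-machine chapters construct by hand. The token names and the toy lexical grammar are mine, not from the Dragon book.

import re

TOKEN_RE = re.compile(r"""
    (?P<NUMBER> \d+(?:\.\d+)? ) |
    (?P<IDENT>  [A-Za-z_]\w*  ) |
    (?P<OP>     [+\-*/^()=]   ) |
    (?P<SKIP>   \s+           ) |
    (?P<ERROR>  .             )
""", re.VERBOSE)

def tokenize(text):
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup
        if kind == 'SKIP':
            continue                      # whitespace carries no token
        if kind == 'ERROR':
            raise SyntaxError(f'unexpected character {m.group()!r} at offset {m.start()}')
        yield kind, m.group()

# list(tokenize('x = 3.5 * (y + 42)'))
# -> [('IDENT', 'x'), ('OP', '='), ('NUMBER', '3.5'), ('OP', '*'), ('OP', '('),
#     ('IDENT', 'y'), ('OP', '+'), ('NUMBER', '42'), ('OP', ')')]

The theory chapters earn their keep when you need to prove properties of the recognizer or generate it mechanically, but for getting a working tokenizer, this is about all there is to it.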