One thing that has always bugged me about books on language (and thus compiler) design is that they seem to spend 80–90% of their time on detailed mathematical analysis of expression parsing. They leave everything else as exercises for the reader: most of the lexical work; input file handling (especially the ins and outs of embedding include files); error detection, reporting, and recovery (enough to continue compilation, not to produce a runnable program); source listings (produced at what point in file inclusion and macro expansion?); macros and other preprocessing; optimization; and code generation. These are practical matters that have to be dealt with to make a useful application, and some guidance, suggestions, and examples for handling them would be appreciated.
Yes, there has to be a certain amount of rigor to the basic operations of lexical and syntactic analysis, or the whole thing falls apart, but authors tend to get so deep into the mathematics of it (they must love that stuff) that the subject matter gets buried under discussion of alphabets, expressions, productions, set theory, and whatnot. Do you even need to rigidly separate lexical analysis (tokenizing) from syntactic analysis (parsing)? In some cases, feedback from parsing would be useful for deciding what a token is and where the input should be split up. Can syntactic analysis of an expression be handled simply with operator precedence, associativity, and a couple of stacks? Can control structures (basically, the stuff that wraps around expressions) be handled with plain recursive descent? Are there real-world (useful) languages that couldn't be handled this way (perhaps Lisp-like ones)?
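To make the "couple of stacks" question concrete, here is a minimal sketch in Python of the kind of thing I have in mind: an operand stack and an operator stack, driven by nothing but precedence and associativity tables. The token format, the tables, and the tuple-tree output are all invented for illustration; this is not the method of any particular book.

    # Expression parsing with two stacks, driven only by operator
    # precedence and associativity.  Tokens are plain strings.
    PRECEDENCE = {'+': 1, '-': 1, '*': 2, '/': 2, '^': 3}
    RIGHT_ASSOC = {'^'}

    def parse_expression(tokens):
        """Turn a token list like ['2', '+', '3', '*', '4'] into a
        nested tuple tree like ('+', '2', ('*', '3', '4'))."""
        operands = []   # stack of partial parse trees
        operators = []  # stack of pending operators and '('

        def reduce_top():
            # Pop one operator and two operands, push the combined tree.
            op = operators.pop()
            right = operands.pop()
            left = operands.pop()
            operands.append((op, left, right))

        for tok in tokens:
            if tok in PRECEDENCE:
                # Reduce anything on the stack that binds at least as
                # tightly as the incoming operator (respecting
                # right-associativity for things like '^').
                while (operators and operators[-1] in PRECEDENCE and
                       (PRECEDENCE[operators[-1]] > PRECEDENCE[tok] or
                        (PRECEDENCE[operators[-1]] == PRECEDENCE[tok]
                         and tok not in RIGHT_ASSOC))):
                    reduce_top()
                operators.append(tok)
            elif tok == '(':
                operators.append(tok)
            elif tok == ')':
                while operators and operators[-1] != '(':
                    reduce_top()
                operators.pop()          # discard the '('
            else:
                operands.append(tok)     # a literal or identifier
        while operators:
            reduce_top()
        return operands[0]

    print(parse_expression(['2', '+', '3', '*', '4']))
    # ('+', '2', ('*', '3', '4'))

The statement level (if, while, and so on) could then be layered on top with plain recursive descent, each grammar rule becoming a function that consumes tokens and calls this expression parser wherever an expression is expected.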
Various languages have their own peculiarities. For example, FORTRAN needs some preprocessing to squeeze out whitespace (except within literal constants) while gluing together continuation lines and eliminating comment lines. When doing this, how do you best keep track of the original lines and positions of keywords so that error reporting makes sense? Would it be better not to tokenize the input up front at all (or even to define keywords with \s* between each letter), and instead rely on constant feedback from the syntax analysis to determine what is coming next? C has an explicit and fairly well-documented preprocessor pass, but for other languages, preprocessing (macro expansion/replacement, file inclusion, etc.) is often vaguely described, if at all. PHP originally had no formal description at all, and its syntax has accumulated a lot of irregularities as a result. At least FORTRAN and COBOL have the excuse that they were set in stone long before modern language design theory was codified!
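On the line-tracking question, the simplest scheme I can think of is to record, for every character of each joined logical line, the original (line, column) it came from, and carry that mapping alongside the text so later error messages can point back at the real source. A rough Python sketch, using a made-up leading-'&' continuation rule rather than real fixed-form FORTRAN column rules:

    def join_lines(lines):
        """lines: a list of raw source lines.  A line whose first
        non-blank character is '&' continues the previous logical line
        (an invented rule, not real FORTRAN).  Returns a list of
        (text, positions) pairs, where positions[i] is the
        (original_line, original_column) of text[i]."""
        logical = []
        for lineno, raw in enumerate(lines, start=1):
            stripped = raw.rstrip('\n')
            if stripped.lstrip().startswith('&') and logical:
                # Continuation: splice in everything after the '&',
                # remembering where each character came from.
                amp = stripped.index('&')
                text, positions = logical[-1]
                for col in range(amp + 1, len(stripped)):
                    text += stripped[col]
                    positions.append((lineno, col + 1))
                logical[-1] = (text, positions)
            else:
                text, positions = '', []
                for col, ch in enumerate(stripped, start=1):
                    text += ch
                    positions.append((lineno, col))
                logical.append((text, positions))
        return logical

    src = ["total = a +\n",
           "  & b * c\n"]
    text, positions = join_lines(src)[0]
    print(text)                      # total = a + b * c
    print(text[14], positions[14])   # the '*' maps back to line 2, column 7

Squeezing out whitespace (outside literals) can be handled the same way: drop the character from the text but keep the positions list in step, so every surviving character still knows its original home.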
At the other extreme, some books are so oriented toward "hands-on" practical results that they skim the background theory and plunge straight into the needed code. This approach can leave the reader, however grateful not to be buried in math, wondering whether they really have a grasp of the overall picture. I have yet to find a book that leaves me fully satisfied with the author's approach.