| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by swift 3226 days ago

This is true ideally, but there are several factors which make it not always true in practice.

(1) Often your grammar admits inputs which are not in your language. Less often, it forbids some inputs which are actually valid. This is usually because an accurate grammar would be extremely complex to write and inefficient to parse using a generic algorithm. Languages which combine a context-free grammar with parsing decisions based on semantic information fall into this category (think C++); the "true" grammar would usually be context sensitive and quite hard to work with.

(2) You are sometimes compiling a layered language. Again, consider C++. The program you're compiling is generally actually written in the C preprocessor language! Conceptually, it's translated to C++, and then the C++ code is compiled in a separate phase. In practice, that layering will make it difficult to produce good error messages, and a formal grammar for the combined language would be a mess to say the least.

(3) The structure of the grammar may just be awkward for producing good error messages, and refactoring it to eliminate the problem may not be possible, or it may only be possible by making problem (1) even worse.

In general, the decision of what goes in the lexer, what goes in the grammar, what is handled in semantic actions, and what is dealt with at a later semantic layer is always a trade off. Each layer has different characteristics in terms of computational power, performance, static analyzability, programming flexibility, composability, modularity, and maintainability. It doesn't make sense to insist that you solve all problems at one particular layer. You look at the problem you need to solve and decide how it'll be partitioned between these different layers to maximally benefit from their strengths and minimize their weaknesses.

That's engineering.