Hacker News new | ask | show | jobs
by austincheney 3217 days ago
The boundaries between lexers and parsers differ by the parser in question.

Grammar and syntax are also different, particularly with regards to XML based languages. Syntax are the rules which define the language while grammars are the conventions that define the context in which artifacts in a language instance are interpreted. Whether a language requires terminating semicolons or curly braces are syntax rules. Whether there is an object schema or namespace concern is a grammar issue.

Parsers commonly produce abstract syntax trees (AST), but can produce output in a variety of formats. I prefer parse tables personally. To say that a parser must produce as AST is rather short-sided and inexperienced.

While parsers typically rely upon lexers and compilers typically rely upon parsers there is no law proclaiming computation must occur in that flow. Lexers, parsers, and compilers are all separate steps that can act independently provided a sufficient configuration.

1 comments

> The boundaries between lexers and parsers differ by the parser in question.

Indeed, the division between lexer and parser is somewhat optional, since scannerless parsers can be defined to operate directly on a stream of basic symbols from the language's alphabet, but the usual division reflects notions from automata theory:

- A lexer (a.k.a., lexical analyzer, tokenizer, scanner) is founded, in principle if not in actuality, on a finite state automaton to group input symbols from an alphabet into basic meaningful units ("lexemes" or "tokens") and to classify those units according to their significance (identifiers, keywords, literals, delimiters, etc.). In a natural language context, it can be thought of as producing words from a sequence of letters.

- A parser is founded, in principle if not in actuality, on (usually and at least) a finite state pushdown automaton, which is basically a finite state automaton equipped with a stack which can be manipulated by the transition function. The goal of a parser is to convert a stream of input symbols into one (or sometimes more) derivations (a.k.a. parse trees, [concrete] syntax trees, parses) which represent the structure of the input in terms of a grammar. In a natural language context, it can be thought of as producing sentence diagrams from a sequence of words (or letters in the case of scannerless parsing).

> Grammar and syntax are also different,

No, grammar and syntax are the same. A grammar consists of an array of productions, which are rules that define how symbols in the language may be combined to form valid sentences in the language. Usually, "grammar" refers to a formalism that is sorta like a big regular expression, except that it's not limited to the regular languages (context-free grammars being most common, with extra-parser hacks to support context-sensitive languages if needed), but a grammar is really an abstract mathematical object and need not be formally manifest.

> Syntax are the rules which define the language while grammars are the conventions that define the context in which artifacts in a language instance are interpreted.

Indeed, syntax can be thought of as the rules which define a language (that is, can be thought of as a grammar). However, what you describe as "the conventions that define the context in which artifacts in a language instance are interpreted" is properly called semantics, which are rules for ascribing meaning to phrases in the language.

> Parsers commonly produce abstract syntax trees (AST), but can produce output in a variety of formats. I prefer parse tables personally. To say that a parser must produce as AST is rather short-sided and inexperienced.

This is true; indeed a parser can directly produce any sort of output: a single truth value that indicates whether the input is part of the language that it recognizes, a translation into another language, or even "no" output in the case of an interpreter. However, I'm confused by your usage of "parse tables" here. The term usually refers to the tables that are used to drive a parsing algorithm (i.e., an input to the parsing algorithm) rather than the output of a parser. Would you elaborate for me?

> Lexers, parsers, and compilers are all separate steps that can act independently provided a sufficient configuration.

Indeed, a parser does not necessarily require a lexer, nor must a parser's output be compiled. However, though your statement is technically true, I can't even imagine what a compiler without a parser might look like!