by haberman 3555 days ago
If you think about it, TeX is just a compiler, right? It operates on trees, and translates some input to some output. I'd love to see the equivalent of LLVM for TeX: a modern, modular implementation that implements known best practices and is easy to integrate into other environments.

Maybe this is something like what LuaTeX is trying to do? It's hard to tell from their web page.

No, TeX is not a compiler in the common sense of the word (unless you say it compiles to PDF).

TeX's input grammar can change during the TeX run, so I believe it is impossible to make an equivalent of LLVM (unless I misunderstand the concept of LLVM).

> No, TeX is not a compiler in the common sense of the word (unless you say it compiles to PDF).

That is exactly what I am saying.

> TeX's input grammar can change during the TeX run, so I believe it is impossible to make an equivalent of LLVM

The important part is not the grammar, but the internal representation. I don't know enough about TeX to know much about its internal representation post-parsing.

> I don't know enough about TeX to know much about its internal representation post-parsing.

IIRC it's a stream of tokens. A token can be a character, a built-in command, a macro, etc. While this token stream is processed, macros are "expanded", i.e. replaced with their definitions (recursively). It is possible to control this expansion process using built-in commands.
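
To make that concrete, here is a rough Python sketch of recursive expansion over a token stream. It is only a toy under stated assumptions: the macro table, the token spelling, and the `expand` function are made up for illustration, macros take no arguments, and the `\noexpand` handling is a loose stand-in for the idea of a built-in that controls expansion, not TeX's real semantics.

```python
from collections import deque

# Hypothetical macro table: macro name -> list of replacement tokens.
MACROS = {
    "\\hello": ["Hello,", "\\name", "!"],
    "\\name": ["TeX"],
}

def expand(tokens, macros):
    """Expand macros in a token stream until only non-macro tokens remain."""
    pending = deque(tokens)
    output = []
    while pending:
        token = pending.popleft()
        if token == "\\noexpand" and pending:
            # Toy built-in: pass the next token through without expanding it.
            output.append(pending.popleft())
        elif token in macros:
            # Push the definition back onto the front of the input so that
            # any macros inside it get expanded in turn (the recursion).
            pending.extendleft(reversed(macros[token]))
        else:
            output.append(token)
    return output

print(expand(["\\hello", "Bye"], MACROS))
print(expand(["\\noexpand", "\\name"], MACROS))
```

Pushing the replacement back onto the input, rather than expanding it in place, is what lets a definition contain further macros. A self-referential macro would loop forever in this sketch.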

The problem with this is that TeX cannot be described using a context-free grammar (CFG). Knuth has discussed this in the past. So, building a compiler for TeX is almost impossible.

> The problem with this is that TeX cannot be described using a context-free grammar (CFG). Knuth has discussed this in the past. So, building a compiler for TeX is almost impossible.

Neither can C++[0], yet there still are C++ compilers out there. It does raise the bar significantly, though.

[0] http://stackoverflow.com/a/14589567

Not entirely what you're describing, but pandoc goes a long way towards being a sort of LLVM for text documents. In order to do all the format conversions, it transforms inputs into a tree-based internal representation, and then translates that into the output format.

Unfortunately it doesn't have a (pure) TeX reader yet, but that could be implemented relatively easily.
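
For anyone unfamiliar with the reader/tree/writer design described above, a minimal sketch in Python might look like the following. To be clear, this is not pandoc's API or its actual AST: the `Header` and `Para` node types, the toy Markdown-ish reader, and both writers are hypothetical names invented here, and the "tree" is deliberately flat (just a list of block nodes) with no escaping of special characters.

```python
from dataclasses import dataclass

# Hypothetical intermediate representation: a list of block nodes.
@dataclass
class Header:
    level: int
    text: str

@dataclass
class Para:
    text: str

def read_markdownish(source):
    """Toy reader: '#'-prefixed lines become headers, other non-blank lines paragraphs."""
    doc = []
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            marks = len(line) - len(line.lstrip("#"))
            doc.append(Header(marks, line[marks:].strip()))
        else:
            doc.append(Para(line))
    return doc

def write_html(doc):
    """Toy writer: render the node list as HTML (no escaping)."""
    parts = []
    for node in doc:
        if isinstance(node, Header):
            parts.append(f"<h{node.level}>{node.text}</h{node.level}>")
        else:
            parts.append(f"<p>{node.text}</p>")
    return "\n".join(parts)

def write_latex(doc):
    """Toy writer: render the same node list as LaTeX (no escaping)."""
    names = {1: "section", 2: "subsection"}
    parts = []
    for node in doc:
        if isinstance(node, Header):
            parts.append(f"\\{names.get(node.level, 'paragraph')}{{{node.text}}}")
        else:
            parts.append(node.text)
    return "\n\n".join(parts)

doc = read_markdownish("# Title\n\nSome text.")
print(write_html(doc))
print(write_latex(doc))
```

The point of the shared representation is that N readers and M writers give N×M conversions for N+M pieces of code, which is the same argument usually made for a compiler IR.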

If it could be implemented easily, chances are it would have been by now. One big issue is that TeX doesn't run in traditional compiler-like layers (lex, parse, etc.). In TeX, the meaning of the next token (lexer level) can be changed by something happening in the guts of the engine in response to the previous token. So, just as compiling LISP requires an ability to interpret LISP, compiling TeX into some sort of tree structure would require implementing a big chunk of the TeX engine itself in the process.

Well, yes and no. You are absolutely right that a complete implementation of TeX would be difficult, but you could read a subset of the language that is big enough to be useful, including simple macro definitions and commonly used commands, which is exactly what pandoc's LaTeX reader already does.
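
On the earlier point about the engine changing what the next token means, here is a small Python sketch of why the input can't simply be tokenized up front. Everything in it is an assumption made for illustration: the `Lexer` class, the `run` loop, and the `!` command are invented, and the mutable set of "active" characters is only a loose analogue of TeX's category codes.

```python
class Lexer:
    """Toy lexer whose classification of characters can change mid-run."""

    def __init__(self, source):
        self.source = source
        self.pos = 0
        self.active = set()  # characters currently lexed as commands

    def next_token(self):
        if self.pos >= len(self.source):
            return None
        ch = self.source[self.pos]
        self.pos += 1
        kind = "command" if ch in self.active else "char"
        return (kind, ch)

def run(source):
    """Toy engine: pulls one token at a time and may reconfigure the lexer."""
    lexer = Lexer(source)
    lexer.active.add("!")  # hypothetical built-in: "!x" makes x a command
    out = []
    while True:
        token = lexer.next_token()
        if token is None:
            break
        kind, ch = token
        if kind == "command" and ch == "!":
            # Executing this command changes how later input is lexed.
            target = lexer.next_token()
            if target is not None:
                lexer.active.add(target[1])
        elif kind == "command":
            out.append(f"<{ch}>")  # placeholder for "run command ch"
        else:
            out.append(ch)
    return "".join(out)

print(run("a~b!~a~b"))
```

The same `~` is an ordinary character before the `!~` and a command after it, so the lexer and the engine have to run interleaved. A reader for a useful subset, as suggested above, would presumably get by with fixing this table in advance and ignoring input that changes it.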