by graphene 3544 days ago
Not entirely what you're describing, but pandoc goes a long way towards being a sort of LLVM for text documents. To do all its format conversions, it parses the input into a tree-based internal representation and then translates that tree into the output format.
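
To make the read-into-a-tree-then-write idea concrete, here is a minimal sketch in Python. It is not pandoc's code or API: the node types (Header, Para) and the functions read_markdown, write_html and write_latex are hypothetical names made up for illustration, and the "Markdown" it reads is only headings and plain paragraphs.

```python
from dataclasses import dataclass

# Hypothetical mini-AST for illustration; these are not pandoc's types.
@dataclass
class Header:
    level: int
    text: str

@dataclass
class Para:
    text: str

def read_markdown(src):
    """Reader: parse a tiny Markdown subset into a list of block nodes."""
    blocks = []
    for chunk in src.strip().split("\n\n"):
        chunk = " ".join(chunk.split())  # collapse internal whitespace
        if chunk.startswith("#"):
            level = len(chunk) - len(chunk.lstrip("#"))
            blocks.append(Header(level, chunk[level:].strip()))
        else:
            blocks.append(Para(chunk))
    return blocks

def write_html(blocks):
    """Writer: render the tree as HTML (no escaping, to keep it short)."""
    out = []
    for b in blocks:
        if isinstance(b, Header):
            out.append(f"<h{b.level}>{b.text}</h{b.level}>")
        else:
            out.append(f"<p>{b.text}</p>")
    return "\n".join(out)

def write_latex(blocks):
    """Writer: render the same tree as LaTeX."""
    names = {1: "section", 2: "subsection"}
    out = []
    for b in blocks:
        if isinstance(b, Header):
            name = names.get(b.level, "paragraph")
            out.append("\\" + name + "{" + b.text + "}")
        else:
            out.append(b.text)
    return "\n\n".join(out)

ast = read_markdown("# Title\n\nHello world.")
print(write_html(ast))
print(write_latex(ast))
```

The point of the design is that a new input format only needs a reader and a new output format only needs a writer; neither has to know about the other.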

Unfortunately it doesn't have a (pure) TeX reader yet, but that could be implemented relatively easily.

1 comments

If it could be implemented easily, chances are it would have been by now. One big issue is that TeX doesn't run in traditional compiler-like layers (lex, parse, etc.). In TeX, the meaning of the next token (at the lexer level) can be changed by something happening in the guts of the engine in response to the previous token. So, just as compiling LISP requires an ability to interpret LISP, compiling TeX into some sort of tree structure would require implementing a big chunk of the TeX engine itself in the process.
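
A toy illustration of that feedback loop, in Python rather than real TeX. The command name makeletter is hypothetical (loosely modeled on what LaTeX's \makeatletter does for @), and the tokenizer is far simpler than TeX's, but it shows why the lexer cannot run ahead of the engine: executing one command changes how the following characters are split into tokens.

```python
import string

def tokenize(src):
    """Toy lexer whose rules are changed by the commands it executes."""
    letters = set(string.ascii_letters)  # characters allowed in command names
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            # Read a command name using the *current* set of letters.
            j = i + 1
            while j < len(src) and src[j] in letters:
                j += 1
            name = src[i + 1:j]
            i = j
            if name == "makeletter" and i < len(src):
                # "Execute" the hypothetical command right away: the next
                # character counts as a letter from this point onwards.
                letters.add(src[i])
                i += 1
            else:
                tokens.append(("cs", name))
        else:
            tokens.append(("char", ch))
            i += 1
    return tokens

print(tokenize(r"\foo@bar"))              # @ ends the command name
print(tokenize(r"\makeletter@\foo@bar"))  # @ is now part of the name
```

The same characters, \foo@bar, come out as five tokens in the first call and as a single command in the second, purely because of something executed earlier in the input.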
Well, yes and no. You are absolutely right that a complete implementation of TeX would be difficult. But you could read a subset of the language that is big enough to be useful, including simple macro definitions and commonly used commands, which is exactly what pandoc's LaTeX reader already does.
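
As a rough sketch of what "a useful subset" could look like, here is a Python function that handles only argument-less \newcommand definitions with brace-free bodies and substitutes them in a single pass. This is an assumption-laden toy, not how pandoc's LaTeX reader is implemented: the function name expand_simple_macros is made up, and it ignores arguments, nested braces, recursive expansion, and TeX's rule about swallowing spaces after a command name.

```python
import re

# Matches \newcommand{\name}{body} where body contains no braces.
DEFINITION = re.compile(r"\\newcommand\{\\([A-Za-z]+)\}\{([^{}]*)\}")
COMMAND = re.compile(r"\\([A-Za-z]+)")

def expand_simple_macros(src):
    """Collect simple macro definitions, then substitute their uses once."""
    macros = {}

    def record(match):
        macros[match.group(1)] = match.group(2)
        return ""  # drop the definition from the output

    body = DEFINITION.sub(record, src)

    def substitute(match):
        # Unknown commands are left untouched for a later stage to handle.
        return macros.get(match.group(1), match.group(0))

    return COMMAND.sub(substitute, body)

print(expand_simple_macros(r"\newcommand{\tool}{pandoc}Use \tool, not \TeX."))
```

Anything outside the subset, such as the \TeX command here, passes through unchanged, which is the trade-off being argued for: partial coverage that degrades gracefully instead of a full engine.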