by abdullahkhalids 978 days ago
There are plenty of these markup languages. The reason none of them really challenges tex/latex in its own space is that they don't aim to do what tex/latex does.

Latex is "typographically-complete". Markdown and friends are explicitly not. HTML+CSS is. But what latex has is a syntax reasonable enough that a human can write it by hand, unlike HTML+CSS. Moreover, the syntax, though clunky [1], is designed, as much as possible, not to interfere with the content that the human is writing.

For instance, Latex uses curly brackets {} for macro arguments because they are the least-used brackets in content. So when you are reading latex source, you know that () and [] are content, and only {} are ambiguous [2]. Nota uses a mix of all three bracket types for its syntax, causing additional pain for the person reading or writing the source.

The replacement for TeX/latex is never going to be a simpler language. It is going to be a language just as complex as latex. But it can definitely be cleaned up and sped up compared to latex. IMHO, somebody should rewrite TeX from scratch, improve its syntax, but otherwise keep it largely unchanged. Basically, any plain latex source using some of the popular packages should continue to compile and give the same output. That is the only reasonable way out.

[1] A typographically-complete language will never have a non-clunky syntax.

[2] Escaped brackets \{1,2,3 \} are literal curly brackets. Personally, I only use them for mathematical sets and have defined a macro \set, so in my documents {} are 99% not ambiguous.


> what latex has is a reasonable enough syntax that a human can write it by hand, unlike HTML+CSS. Moreover, the syntax, though clunky [1] is designed, as much as possible, to not interfere with the content that the human is writing

I could not disagree more. LaTeX syntax is not 'clunky', it's a mess, and has intentionally been engineered right from the start to be clever rather than consistent. And it's not the syntax only, the obvious mess that is LaTeX's surface goes right on, right to the heart ("the guts" as TeXnicians prefer to say) of the machinery, where no concern is dealt with separately, and anything can influence and break everything else.

Hell, you don't even get a semblance of sane text (string) processing or decent numerical computation. Yes, you can do it, in the way you could use a toothbrush or wet wipes to paint your house.

> Latex is "typographically-complete"

Yes, as long as one is ready to ignore the fact that quite a few simple things are quite difficult to achieve in LaTeX, e.g. keeping lines the same height and in register instead of letting them jump around whenever a superscript is encountered.

> The replacement for TeX/latex is never going to be a simpler language. It is going to be a language just as complex as latex.

The complexity of LaTeX is only partly due to the complexities of typesetting. It is complex because of an endless litany of bad design choices. HTML+CSS+JS gets a lot of flak for being too complex, but they pale in comparison. For example[1]:

In order to use numerical codepoints to write 東京, you can write any of:

    ^^^^6771 ^^^^4eac   
    \char"6771 \char"4EAC   
The space between the entities is used to signal the end of the codepoint number, so to write 東 京 with a space you must use a trick such as one of these:

    \char"6771{} \char"4EAC   
    \char"6771\ \char"4EAC 
In this system, ^^5c represents the backslash. But unlike in reasonable systems (and TeX is not one of them), using the numerical reference doesn't deactivate the backslash's special role as a command indicator.

Compare this to XML / HTML `&#x6771;&#x4EAC;`, which is a much more reasonable syntax, no harder to write, and uses an explicit end-of-reference marker (`;`) instead of the 'clever' space, which is highly problematic.
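To make the space-termination rule concrete, here is a toy Python model of how `\char"XXXX` is read. It is a sketch, not TeX's real tokenizer, and `scan_tex_char` is a hypothetical name: TeX collects hex digits (0-9 and uppercase A-F) after the double quote and then swallows one optional space, so a space meant for the output has to be protected with `{}` or a control space `\ `. An XML numeric character reference such as `&#x6771;` instead ends at an explicit `;`, which Python's standard `html.unescape` decodes.

```python
import html

HEX = "0123456789ABCDEF"  # after a double quote TeX accepts only uppercase A-F

def scan_tex_char(src: str) -> str:
    """Toy model of expanding \\char"XXXX sequences (hypothetical, not a TeX parser)."""
    out = []
    i = 0
    while i < len(src):
        if src.startswith('\\char"', i):
            i += len('\\char"')
            start = i
            while i < len(src) and src[i] in HEX:
                i += 1
            if start == i:
                raise ValueError("missing hex digits after \\char\"")
            out.append(chr(int(src[start:i], 16)))
            # TeX swallows exactly one optional space that ends a number
            if i < len(src) and src[i] == " ":
                i += 1
        elif src.startswith("{}", i):
            i += 2  # empty group: typesets nothing, but stops the number scan
        elif src.startswith("\\ ", i):
            out.append(" ")  # control space: always produces a space
            i += 2
        else:
            out.append(src[i])
            i += 1
    return "".join(out)

print(scan_tex_char(r'\char"6771 \char"4EAC'))    # 東京 (the space was eaten)
print(scan_tex_char(r'\char"6771{} \char"4EAC'))  # 東 京
print(scan_tex_char(r'\char"6771\ \char"4EAC'))   # 東 京
print(html.unescape("&#x6771; &#x4EAC;"))         # 東 京 (';' ends each reference)
```

In the XML form the space between the two references survives untouched, because the terminator is a dedicated character rather than one that can also appear in the output.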

[1]: https://agiletribe.wordpress.com/2015/04/07/adding-unicode-c...

> In order to use numerical codepoints to write 東京, you can write any of:

There’s a simpler way:

  \usepackage[utf8x]{inputenc}
  
  東京
Or better still, use XeLaTeX.

But that's not the point. The point is that (1) sometimes you don't want the literal codepoint but a numerical reference in your source code; a use case for this would be a numerical reference to U+3000 instead of a literal ideographic space, which might be useful to prevent it from being accidentally elided at the end of a line.

(2) irrespective of whether you want to use numerical references or not, the example shows that the authors of (La)TeX apparently are unable to use sane syntax for their stuff. It's just a very bad idea to terminate your variable-length commands with a space when a space in the output could possibly follow. Same with identifiers: only letters are allowed, no underscores, no digits. You then get names like `\fooBarBazVI` instead of `\foo_bar_baz_6`, which many would prefer. These are all trifles, to be sure, but they're legion, so you get software that seemingly takes Death By a Thousand Papercuts as a positive design maxim.
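The identifier restriction follows from how TeX reads a control sequence: after a backslash, a control word is the longest run of letters (by default only a-z and A-Z have the letter category code), and spaces after it are skipped; any other single character forms a one-character control symbol. A toy Python model of that rule, assuming default category codes and ignoring catcode changes (`tokenize` is a hypothetical name):

```python
import string

LETTERS = set(string.ascii_letters)  # default letter characters in plain TeX/LaTeX

def tokenize(src: str) -> list[str]:
    """Toy model of how TeX splits control sequences (hypothetical, not a TeX parser)."""
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch == "\\" and i + 1 < len(src):
            j = i + 1
            if src[j] in LETTERS:
                while j < len(src) and src[j] in LETTERS:
                    j += 1
                tokens.append(src[i:j])  # control word: backslash + run of letters
                while j < len(src) and src[j] == " ":
                    j += 1  # spaces after a control word are skipped
            else:
                j += 1
                tokens.append(src[i:j])  # control symbol: backslash + one non-letter
            i = j
        else:
            tokens.append(ch)
            i += 1
    return tokens

print(tokenize(r"\fooBarBazVI"))        # ['\\fooBarBazVI']
print(tokenize(r"\foo_bar_baz_6")[:3])  # ['\\foo', '_', 'b']
```

So `\foo_bar_baz_6` is read as the command `\foo` followed by a stray `_` and ordinary characters, which is why camel case and Roman numerals end up in LaTeX macro names.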

LaTeX definitely has many messy parts that need to be cleaned up. Native support for Unicode characters and bidi text (which XeTeX partly implements) is mandatory in a new LaTeX.

The TeX engine will obviously need to be rewritten completely from scratch for the reasons you suggest.