Hacker News
by estebank 2166 days ago
I think you hit the nail right on the head, but I would say that whitespace is only one source of signal. Lately I've become convinced that languages need redundancy in their grammar to help with error recovery. Having a more flexible syntax actually hurts usability, because code that is valid but semantically incorrect can't be caught early enough, which makes it much harder to give friendly errors. rustc uses whitespace for error recovery in several places, although the most prevalent is checking whether two different elements are on different lines, like when detecting a `:` where a statement-ending `;` was actually meant[1].

[1]: https://play.rust-lang.org/?version=stable&mode=debug&editio...
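To make the "different lines" heuristic concrete, here is a toy sketch in Python. It is not rustc's implementation: the whitespace-splitting tokenizer and the names `Token`, `tokenize` and `suspicious_colons` are all made up for illustration. It only shows the idea that a `:` followed by a token on a later line is a good hint that `;` was meant.

```python
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    line: int  # 1-based source line the token appeared on


def tokenize(source: str) -> list[Token]:
    """Hypothetical tokenizer: split on whitespace, remember the line."""
    tokens = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for text in line.split():
            tokens.append(Token(text, line_no))
    return tokens


def suspicious_colons(tokens: list[Token]) -> list[int]:
    """Return the lines where a ':' is the last token on its line.

    A ':' normally expects something (a type, say) right after it, so a
    line break at that point suggests a mistyped ';'.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok.text != ":":
            continue
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if nxt is None or nxt.line > tok.line:
            hits.append(tok.line)
    return hits


# Line 1 ends in ':' (probably a typo), line 2 uses ':' legitimately.
source = "let x = 5 :\nlet y : i32 = 6 ;\n"
print(suspicious_colons(tokenize(source)))  # [1]
```

The grammar alone can't tell the two colons apart at the point where they are read; the line break is the redundant signal that makes a friendlier suggestion possible.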

2 comments

> languages need redundancy in their grammar

IMO extra redundancy makes Greek much easier to read than Latin. The upfront learning is a bit more effort, but then reading it is much easier. Unlike Latin it has articles (which are declined and agree with the nouns/adjectives), and the endings on verbs & nouns have way less ambiguity. I can certainly see how the same would apply to programming languages.

Somewhat related is my belief that text-based languages have long since outlived their usefulness. With text-based languages the compiler creates an abstract syntax tree and a bunch of metadata and then throws it all away afterwards.

Oh sure you can save an object file if and ONLY if it didn't change and it's cached locally.

In a perfect system, if you edit a file and save it, 90% of the compiler's work would already be done and saved as part of the file format. Topical to this thread: the parser would usually have access to the previous, unbroken version of the file.

In my experience parsing is pretty robust and fast already, to the point where optimizing that stage of the compiler isn't worth it. Regarding the discrepancy between the AST and the source code, I think it is worth noting that although there's a relationship between the syntax and the semantics, there need not be a 1:1 mapping between them. That allows different features to look similar (because they are pedagogically similar, like impl Trait in Rust in argument position and return position) or dissimilar. I could see leveraging ASTs for more accurate diffing algorithms and fancy tools, but none of these would require that the AST be the source of truth.
If you have ABI stability, you don't need to do any of that stuff to code that hasn't changed either.

The fastest way to compile code is to not constantly recompile it from scratch.

I mean it sounds like you have an issue with batch compilers more than "text based languages." If you didn't toss out the AST you'd have to serialize it (now you're compiling to two targets at once), and when you want to recompile you need to parse the serialized AST along with the source code for 90% of the same information.
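A rough sketch of that trade-off, using Python's own `ast` module as a stand-in compiler front end. The `AstCache` class is hypothetical and only keeps things in memory; a real tool would also have to persist, version and invalidate the serialized form, which is the "second target" cost mentioned above.

```python
import ast
import hashlib


class AstCache:
    """Keep a serialized AST per distinct source text (hypothetical helper)."""

    def __init__(self):
        self.entries = {}  # content hash -> serialized AST (a string)
        self.parses = 0    # how many times we actually had to parse

    def serialized_ast(self, source: str) -> str:
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self.entries:
            self.parses += 1
            # ast.dump gives a textual form of the tree; this is the extra
            # artifact that now has to be stored alongside the source.
            self.entries[key] = ast.dump(ast.parse(source))
        return self.entries[key]


cache = AstCache()
cache.serialized_ast("x = 1\n")
cache.serialized_ast("x = 1\n")  # unchanged text: served from the cache
cache.serialized_ast("x = 2\n")  # edited text: parsed again from scratch
print(cache.parses)  # 2
```

Note that the cache still needs the source text to compute the key, so the text stays the source of truth and the stored AST is just a derived artifact.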
Yes, modern query-oriented compilers (such as TypeScript and C# Roslyn) keep the AST in memory and co-operate closely with the editor via the language server protocol.

I recently saw an example of a language that aims to be AST-first rather than notation-first: it serialises to XML! http://mbeddr.com/

Nothing modern about it, lisp has been doing it for fifty years now. S-exprs also serialize to xml quite nicely ;)
In a query-oriented compiler, you can give it a source location and ask for type information, or ask where the definition of the symbol at that location is, and the compiler will do just enough work to give you the answer, using laziness and memoization to make it reasonably efficient. As far as I know, LISP systems don't work that way: they parse and evaluate (and maybe compile) as soon as an expression is entered, without the kind of laziness you see in a query-oriented compiler.
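Here is a minimal sketch of the "just enough work, memoized" idea in Python. It is not how TypeScript or Roslyn are implemented: the `QueryCompiler` class, its query names and the toy `name = value` language are all invented for illustration, and there is no cache invalidation when a file changes.

```python
class QueryCompiler:
    """Toy query-based front end for files made of `name = value` lines."""

    def __init__(self, sources: dict[str, str]):
        self.sources = sources  # file name -> source text
        self.cache = {}         # query key -> memoized result
        self.parse_count = 0    # how many files were actually parsed

    def _memo(self, key, compute):
        # Run a query at most once per distinct key.
        if key not in self.cache:
            self.cache[key] = compute()
        return self.cache[key]

    def definitions(self, file: str) -> dict:
        """Query: map each name in `file` to (line number, value text)."""
        def compute():
            self.parse_count += 1
            defs = {}
            lines = self.sources[file].splitlines()
            for line_no, line in enumerate(lines, start=1):
                name, sep, value = line.partition("=")
                if sep:
                    defs[name.strip()] = (line_no, value.strip())
            return defs
        return self._memo(("definitions", file), compute)

    def type_of(self, file: str, name: str) -> str:
        """Query: a crude 'type' for `name`, built on `definitions`."""
        def compute():
            _, value = self.definitions(file)[name]
            return "int" if value.lstrip("-").isdigit() else "str"
        return self._memo(("type_of", file, name), compute)

    def definition_line(self, file: str, name: str) -> int:
        """Query: the line on which `name` is defined."""
        return self.definitions(file)[name][0]


db = QueryCompiler({"a.toy": "x = 1\ny = hello", "b.toy": "z = 2"})
print(db.type_of("a.toy", "y"))          # str
print(db.definition_line("a.toy", "x"))  # 1
print(db.parse_count)                    # 1
```

Both questions about `a.toy` share one parse, and `b.toy` is never parsed at all because nobody asked about it. An eager system would have processed both files up front.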

This recent link was pretty good, I think https://ollef.github.io/blog/posts/query-based-compilers.htm... and this discussion with Anders Hejlsberg https://youtu.be/wSdV1M7n4gQ