Hacker News new | ask | show | jobs
by ecshafer 1045 days ago
Can you write an equivalent piece of code that shows why Rust wins here with more familiarity and leveraging the ecosystem? With compilers I don't think there is a huge amount of using the ecosystem. I think what TFA does is a good case study in trying to be objective: write it both ways and compare.
2 comments

I'm just saying a compiler is a program, and while the more important things are heavy algorithm related, supporting libraries like those for error handling, etc. all still matter and add up. No problem if you disagree - just my perspective. This isn't so much an objective thing as it is a personal opinion.
Personally, it’s not so much that I disagree. It’s more that Ocaml has a top notch ecosystem for compiler writing. That’s probably by far its strongest point with library like Menhir having few equivalent in different ecosystem.

Not that there is anything wrong with Rust if you are ready to pay the price of having so much low level things to deal with.

Can you explain what Menhir does better than other parser frameworks? For what it's worth, I'm not a huge fan of parser frameworks. I tend to prefer hand written parsers, either via a combinator library or fully manually.

Rust has a pretty darn good ecosystem too btw. chumsky for parsing, rowan for syntax trees, salsa for incremental computation, miette for errors, inkwell/walrus for code generation.

Wow those library recommendations are fantastic, especially Chumsky. Thanks for sharing.
Yeah chumsky is fantastic, albeit with some rough compile errors and some unsafe. Still my favorite parser library by far and I’ve used a bunch by now.
It can generate elegant and efficient parsers for LR(1) grammars.

> I tend to prefer hand written parsers, either via a combinator library or fully manually.

That’s common with people used to languages which provide poor parser generators.

What makes them more elegant than the average parser? How’s the error recovery? Can you parse into high fidelity syntax trees efficiently?

I don’t know of many production compilers that use parser generators

You use the generated parser as a platform for experimentation.

If you know the language, and you have a bunch of users, and you are writing a parser for it, by all means, write a parser by hand and give it the best error recovery that you can muster. If you are developing a language and want to do a bunch of experiments, it pays dividends to use a parser generator. And then there is the whole space of DSLs and mini-languages you encounter, where beautiful error messages are a nice-to-have, but you would rather ship a generated parser and move on to more important work.

It’s easy to focus on compilers from the perspective of familiar languages like writing compilers for Rust or for OCaml, but you may end up writing a compiler that gets used by a much smaller number of people, for smaller tasks.

I don’t think so - I’ve written parsers using flex/lex yacc/bison, antlr, a bunch of functional combinator libraries and maybe others I’ve forgotten but now would never consider anything except hand-written recursive descent with an embedded Pratt parser for expressions and precedence.

Simple to write, debug, recover from errors, provide decent error messages, unit test, integrate into build systems, IDEs etc.

I also believe that nearly all the popular compilers these days do something similar - gcc was rewritten a few years ago in this fashion because of the technical benefits I’ve listed above.

I mean, it’s also common with people who value great feedback in their tools.
> That’s common with people used to languages which provide poor parser generators.

The vast majority of real languages have hand-rolled parsers and there is no real reason they shouldn't.

>That’s common with people used to languages which provide poor parser generators.

lmao

There was nothing objective about the article. The moment I saw seemingly random new lines in the rust for no reason, I knew there was going to be a “line counts!!!!” Sentence. When there was, I stopped reading because it was apparent that all matter of objectivity is completely missing.

I don’t even like rust.

It would be nice if somebody could offer what they consider to be an idiomatic Rust solution to this very routine problem in compilation if they believe the author is being intentionally deceiving. The Rust and OCaml code from the article looked decent to me. m
I don't think anyone was claiming the author was being intentionally deceiving, just that they're not very good at writing idiomatic Rust.

The first thing I'd suggest, as someone who's done their fair share of Rust compiler dev, is to stop using `Box` and `Ref` everywhere. Terms should not own their child terms; the idiomatic way to handle this is to use an arena and typed IDs, and pass a reference to the arena when you need access to the underlying data from the ID.

IIRC The Rust compiler just uses Cell<&’a> for child terms, which is a somewhat lighter weight solution that still allows mutation.

Edit: Either I remembered wrong or it's changed. Now it uses some kind of custom wrapper around Box: https://github.com/rust-lang/rust/blob/master/compiler/rustc... So actually, the Rust compiler itself uses owned children in its AST!

So the solution to writing idiomatic code in the language whose essential feature is the ownership semantics and borrow checker is to not use the ownership semantics and borrow checker?
This is sort of a flippant response, but I'll answer it seriously.

The premise of this question is incorrect: you can't not use the borrow checker in Rust. Instead, you are satisfying the borrow checker by ensuring at the point of use (deref of the ID) that there is valid data by providing a provably-live reference to the arena in which it was allocated.

In another language like C/C++, you'd just use a raw pointer, which is ergonomically-nice, but there's no guarantee that it actually points to a valid object. Rust forces you toward a solution that is guaranteed to be safe, and logically makes sense when you think about it -- nodes in a graph should not own other nodes in the graph; instead, they should be owned by the graph itself, and other nodes should simply hold references to each other. This is how you tend to implement safe and efficient graphs in C++ anyways, the only difference being that in Rust your references are indices, whereas in C++ your references are raw pointers.

But compilers deal usually with trees not general graphs.
> The Rust and OCaml code from the article looked decent to me

> RefCell<u32>

I'm not sure.

For the uninitiated, this is a completely harmless choice. But it suggests a potential lack of understanding of some pretty fundamental parts of Rust.
Although I think the implementation of the gensym function is broken, the article does explain that it wasn’t possible to use &mut u32 because multiple references to the value were required. It would be more idiomatic to use Cell rather than RefCell, but there’s nothing really wrong with using RefCell as far as I can see.
I like Rust but this isn't one of it's strong suits. The code in the article is what Rustfmt outputs (playground link below), so it's at least syntactically idiomatic.

https://play.rust-lang.org/?version=stable&mode=debug&editio...