Hacker News new | ask | show | jobs
by mjburgess 679 days ago
I guess I wasn't very clear. I didnt mean to say, as a historical matter, were irrelevant.

I meant to say the idea of a parser generator is a solution to a problem that that real world langs don't really have. When writing a programming language, your issue isnt how much time the parser is going to take to write, or how complex it's going to be. The parser is a relatively trivial part of the problem.

Due to language designers often being taught to develop langauges in this fashion, many have relied on these tools. But the modern view of compliers as "programming langauge UIs" and the focus on DX, i'd argue its actively pathological to use a parser generator.

Much academic work has, til recently, focused on these areas -- whereas today, the bulk of the difficulty is in understanding SSA/LLVM/ARM/Basic Optimizatiosn/etc. details which are "boring, circumstantial" etc. and not really research projects. I was just pointing this out since a lot of people, myself included, go down the EBNF parser-generator rabbit hole and think inventing a langauge is some formal exercise -- when the reality is the opposite: it's standard programming-engineering work.

3 comments

oh, i agree that parsing is not the hard part of writing a compiler, and that compilers classes overemphasize it

but no language starts out as a 'real world lang'; every language is initially a toy language, and only becomes a 'real world lang' in the unlikely case that it turns out to be useful. and parser generators are very useful for creating toy languages. that's why virtually every real world lang you've ever used was implemented first using a parser generator, even if the parser you're using for it now is handwritten

having a formally defined grammar is also very helpful for dx features like syntax highlighting and automated refactoring

The one thing parser generators do is that they ensure that the language you implement actually matches the grammar you wrote. That’s still an important assurance to have.
they also tell you where the grammar is ambiguous, which can be useful
I've created many programming languages. All the ones I finished and were useful did not have grammars that "I wrote".
do you mean https://github.com/mjburgess/Lyssa and https://github.com/mjburgess/Quazar? it's true that i can't find a grammar in either of them (https://github.com/mjburgess/Lyssa/blob/master/src/impl.py#L... seems more forthy than anything else) but (while they are very much the sort of things that i like, thank you for sharing) they also seem somewhat less like 'real-world programming languages' than things like awk, ocaml, or our other example in this thread, gcc c and objective-c until gcc replaced the bison parser with a handwritten one in 02006. that last compiler was the compiler nextstep was built on, which got steve jobs back in control of apple, to replace macos with nextstep. seems pretty real-world to me

maybe you're talking about stuff you haven't released?

Indeed, that's a defunct profile where everything should be private anyway. The reops there are 13/14 years old: these were experiments with using RPython to create languages, I'd guess when I was ~20. The point of those was to profile RPython. I have created real front-ends and compiler backends in C for non-trivial langugaes.

I will soon likely create a probabilistic programming language and compiler.

Are you saying your programming languages don’t have a defined grammar?
The parser defines the grammar. This is quite common in mainstream languages -- iirc, only after some years did python get a formal description of a grammar.
> The parser defines the grammar.

But how can you have assurance which grammar it defines, or that it even defines a well-defined grammar?

I’m well aware that some languages don’t bother defining a proper grammar, or define it without having a mechanism to ensure their implementation matches it, but lacking that assurance is exactly the drawback of not using a parser generator.

i was puzzled about that too and got even more puzzled when i looked at the languages i could find that he's implemented
This still isn't quite correct. Back in the day parsing was a much larger portion of the complexity of a compiler: performance was much more of a concern, as was memory usage. Parser generators were an attempt at helping with that, by allowing programmers to produce more efficient (e.g. table-driven) parsers than what they could have otherwise written by hand. They only really went out of fashion because A) computers got faster and bigger faster than programs got longer, so parsing became less and less of a percentage of total utilization, and B) you can get much better error messages out of recursive-descent parsers.