Hacker News
by PaulHoule 1520 days ago
It is no fun writing external DSLs when you're stuck with your grandfather's parser generator.

If we had grammars that were really extensible (add "unless(X)" as an equivalent of "if(!X)" to Java's grammar with a few lines of code), composable (embed a SQL statement in a Java program with a few lines of code, given the SQL and Java grammars), and reversible (turn the AST back into source code), writing external DSLs would be a lot more fun.
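To make two of those wishes concrete, here is a minimal sketch in Java. It is not any real parser generator's API, and every class and method name is hypothetical. An `Unless` extension node is defined by desugaring it into the core `if (!X)` form, and a printer turns the tree back into source, which is the "reversible" part.

```java
// Hypothetical mini-AST for a Java-like statement language; not a real library's API.
public class UnlessDesugar {
    interface Expr {}
    record Var(String name) implements Expr {}
    record Not(Expr operand) implements Expr {}

    interface Stmt {}
    record Call(String method) implements Stmt {}
    record If(Expr cond, Stmt body) implements Stmt {}
    // The "extension": unless (X) S, which the core language does not have.
    record Unless(Expr cond, Stmt body) implements Stmt {}

    // Rewrite extension nodes into core nodes: unless (X) S  ==>  if (!X) S
    static Stmt desugar(Stmt s) {
        if (s instanceof Unless u) return new If(new Not(u.cond()), desugar(u.body()));
        if (s instanceof If i) return new If(i.cond(), desugar(i.body()));
        return s;
    }

    // Reversibility: print an expression back to source text.
    static String print(Expr e) {
        if (e instanceof Var v) return v.name();
        if (e instanceof Not n) return "!" + print(n.operand());
        throw new IllegalArgumentException("unknown expression: " + e);
    }

    // Reversibility: print a statement back to source text.
    static String print(Stmt s) {
        if (s instanceof Call c) return c.method() + "();";
        if (s instanceof If i) return "if (" + print(i.cond()) + ") " + print(i.body());
        if (s instanceof Unless u) return "unless (" + print(u.cond()) + ") " + print(u.body());
        throw new IllegalArgumentException("unknown statement: " + s);
    }

    public static void main(String[] args) {
        Stmt ext = new Unless(new Var("done"), new Call("retry"));
        System.out.println(print(ext));          // unless (done) retry();
        System.out.println(print(desugar(ext))); // if (!done) retry();
    }
}
```

The extension costs one record and one rewrite rule; the hard part a real tool has to solve is adding the matching *syntax* without editing the host grammar.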

Every so often a revolution gets promised, like PEG parsers for Python, but then people start worrying about parser speed again, and it turns out less like Heinlein's The Moon Is a Harsh Mistress and more like Haldeman's Worlds (as Pete Townshend put it, "We won't get fooled again.")

If parsers were easier to use people might use them 10x as often as they do today.


>It is no fun writing external DSLs when you're stuck with your grandfather's parser generator.

Agreed. I've tried various approaches to building external DSLs, from fully hand-written parsers to language workbenches like Xtext [0] and Spoofax [1]. I always end up back at hand-written, often because of error handling. Getting meaningfully helpful error handling out of parser generators always seems hard.
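Part of why hand-written parsers win here is that each parsing function knows exactly what it expected at the point of failure. A minimal recursive-descent sketch, assuming a toy grammar of integer sums (`expr := number ('+' number)*`); the class name and message format are made up for illustration:

```java
// Hypothetical toy grammar: expr := number ('+' number)*
public class SumParser {
    static class ParseError extends RuntimeException {
        ParseError(String msg) { super(msg); }
    }

    private final String src;
    private int pos = 0;

    SumParser(String src) { this.src = src; }

    static int parse(String src) {
        SumParser p = new SumParser(src);
        int value = p.expr();
        p.skipSpaces();
        if (p.pos < src.length()) throw p.error("'+' or end of input");
        return value;
    }

    private int expr() {
        int total = number();
        skipSpaces();
        while (pos < src.length() && src.charAt(pos) == '+') {
            pos++; // consume '+'
            total += number();
            skipSpaces();
        }
        return total;
    }

    private int number() {
        skipSpaces();
        int start = pos;
        while (pos < src.length() && src.charAt(pos) >= '0' && src.charAt(pos) <= '9') pos++;
        if (start == pos) throw error("a number");
        return Integer.parseInt(src.substring(start, pos));
    }

    private void skipSpaces() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
    }

    // The caller knows what it wanted here, so the message can say so precisely.
    private ParseError error(String expected) {
        String found = pos < src.length() ? "'" + src.charAt(pos) + "'" : "end of input";
        return new ParseError("column " + (pos + 1) + ": expected " + expected + " but found " + found);
    }

    public static void main(String[] args) {
        System.out.println(parse("1 + 2 + 39")); // 42
        try {
            parse("1 + + 2");
        } catch (ParseError e) {
            System.out.println(e.getMessage()); // column 5: expected a number but found '+'
        }
    }
}
```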

>If we had grammars that were really extensible

Grammar composition is hard. Canonical BNF is top down, meaning alternate clauses are complete and closed in most parser generators. One notable exception is SDF3 [2] (part of Spoofax). It doesn't require alternate clauses to be grouped, and so it can support composable languages. It's the most flexible parser generator I've used, and the syntax is pretty nice too.
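A toy illustration of why "alternatives don't have to be grouped" matters. This is a hypothetical sketch, not SDF3: each nonterminal keeps an open, growable list of alternatives, so a separate glue module can add a production to Java's `Expr` without editing the Java module. The recognizer is naive backtracking (exponential in the worst case, and it assumes no left recursion); the module contents are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical: a grammar whose nonterminals have *open* lists of alternatives.
public class OpenGrammar {
    record Sym(String text, boolean terminal) {}

    static Sym t(String s) { return new Sym(s, true); }
    static Sym n(String s) { return new Sym(s, false); }

    private final Map<String, List<List<Sym>>> rules = new HashMap<>();

    // Adding an alternative never requires touching the existing ones.
    OpenGrammar add(String lhs, Sym... rhs) {
        rules.computeIfAbsent(lhs, k -> new ArrayList<>()).add(List.of(rhs));
        return this;
    }

    // All positions where `sym` can end when it starts at `pos`.
    private Set<Integer> ends(Sym sym, List<String> toks, int pos) {
        if (sym.terminal()) {
            boolean match = pos < toks.size() && toks.get(pos).equals(sym.text());
            return match ? Set.of(pos + 1) : Set.of();
        }
        Set<Integer> result = new HashSet<>();
        for (List<Sym> alt : rules.getOrDefault(sym.text(), List.of())) {
            Set<Integer> frontier = Set.of(pos);
            for (Sym s : alt) {
                Set<Integer> next = new HashSet<>();
                for (int p : frontier) next.addAll(ends(s, toks, p));
                frontier = next;
            }
            result.addAll(frontier);
        }
        return result;
    }

    boolean accepts(String start, String input) {
        List<String> toks = List.of(input.trim().split("\\s+"));
        return ends(n(start), toks, 0).contains(toks.size());
    }

    // "Java" module: Stmt := Id "=" Expr ";"   Expr := Id   Id := "x" | "t"
    static OpenGrammar java(OpenGrammar g) {
        return g.add("Stmt", n("Id"), t("="), n("Expr"), t(";"))
                .add("Expr", n("Id"))
                .add("Id", t("x")).add("Id", t("t"));
    }

    // "SQL" module: Query := "SELECT" "*" "FROM" Id  (reuses Id for brevity)
    static OpenGrammar sql(OpenGrammar g) {
        return g.add("Query", t("SELECT"), t("*"), t("FROM"), n("Id"));
    }

    // Glue module: one extra alternative for the existing Expr nonterminal.
    static OpenGrammar embedSql(OpenGrammar g) {
        return g.add("Expr", t("sql"), t("{"), n("Query"), t("}"));
    }

    public static void main(String[] args) {
        String src = "x = sql { SELECT * FROM t } ;";
        System.out.println(java(new OpenGrammar()).accepts("Stmt", src));                  // false
        System.out.println(embedSql(sql(java(new OpenGrammar()))).accepts("Stmt", src)); // true
    }
}
```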

Fun fact: SDF3 and Spoofax come from Eelco Visser's group at TU Delft, the same group that originated Scope Graphs, the basis for GitHub's Stack Graphs [3].

[0] http://www.eclipse.org/Xtext/
[1] https://www.spoofax.dev/
[2] https://eelcovisser.org/publications/2020/AmorimV20.pdf
[3] https://news.ycombinator.com/item?id=29500602

> Grammar composition is hard.

Composing grammars may not be possible, but it's easy to define a new grammar that provides the composition of two other grammars, plus boilerplate that tells you whether you're in a "grammar A" context or a "grammar B" context. You just need tokens that are valid in neither grammar A nor grammar B.

No statement in such a grammar would be valid in either subgrammar, because of the boilerplate, but they would all be trivially reducible to valid statements in the appropriate subgrammar.
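A minimal sketch of that reduction step, assuming the boilerplate is a pair of delimiter characters that appear in neither grammar (guillemets here, chosen only for illustration). The splitter tags each run of text as grammar A or grammar B so it can be handed to the matching sub-parser. It deliberately ignores string literals and nesting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter: host text is grammar A, delimited text is grammar B.
public class SegmentSplitter {
    static final char OPEN = '\u00AB';  // LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    static final char CLOSE = '\u00BB'; // RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK

    enum Grammar { A, B }

    record Segment(Grammar grammar, String text) {}

    static List<Segment> split(String src) {
        List<Segment> out = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        Grammar current = Grammar.A;
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == OPEN || c == CLOSE) {
                // OPEN is only legal in an A context, CLOSE only in a B context.
                Grammar expected = (c == OPEN) ? Grammar.A : Grammar.B;
                if (current != expected) {
                    throw new IllegalArgumentException("unbalanced delimiter at index " + i);
                }
                if (buf.length() > 0) out.add(new Segment(current, buf.toString()));
                buf.setLength(0);
                current = (c == OPEN) ? Grammar.B : Grammar.A;
            } else {
                buf.append(c);
            }
        }
        if (current != Grammar.A) throw new IllegalArgumentException("unclosed embedded segment");
        if (buf.length() > 0) out.add(new Segment(current, buf.toString()));
        return out;
    }

    public static void main(String[] args) {
        String src = "Statement s = \u00AB SELECT COUNT(*) FROM that \u00BB;";
        for (Segment seg : split(src)) {
            System.out.println(seg.grammar() + ": [" + seg.text() + "]");
        }
        // A: [Statement s = ]
        // B: [ SELECT COUNT(*) FROM that ]
        // A: [;]
    }
}
```

Because the delimiters can never be valid in either sub-grammar, the split is unambiguous and each segment is an ordinary input for an unmodified parser.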

It drives people nuts, but I say there are plenty of characters in Unicode (https://unicode-table.com/en/sets/quotation-marks/) if you need a new kind of quote, so you could write

   Integer index = 77;
   Statement s = « SELECT COUNT(*) FROM that WHERE x = @index »;
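One way the quoted SQL could reduce to plain Java, assuming `@name` marks a reference to a host-language variable: rewrite each `@name` into a JDBC `?` placeholder and remember the names in order, so generated code can bind them through `PreparedStatement` setters. This is a hypothetical desugaring, not an existing tool, and it ignores `@` inside SQL string literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical lowering for the body of a quoted SQL statement.
public class HostVariables {
    record Lowered(String sql, List<String> params) {}

    private static final Pattern HOST_VAR = Pattern.compile("@([A-Za-z_][A-Za-z0-9_]*)");

    static Lowered lower(String embeddedSql) {
        List<String> params = new ArrayList<>();
        Matcher m = HOST_VAR.matcher(embeddedSql);
        StringBuilder sql = new StringBuilder();
        while (m.find()) {
            params.add(m.group(1));          // remember the Java variable name
            m.appendReplacement(sql, "?");   // and replace it with a JDBC placeholder
        }
        m.appendTail(sql);
        return new Lowered(sql.toString().trim(), List.copyOf(params));
    }

    public static void main(String[] args) {
        Lowered l = lower(" SELECT COUNT(*) FROM that WHERE x = @index ");
        System.out.println(l.sql());    // SELECT COUNT(*) FROM that WHERE x = ?
        System.out.println(l.params()); // [index]
        // Generated Java could then do, e.g.:
        //   PreparedStatement ps = conn.prepareStatement(l.sql());
        //   ps.setInt(1, index);
    }
}
```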

People ask me, "How do you type those on the keyboard?" and I say, "I don't type them on the keyboard, I cut and paste them." I use non-ASCII code points all the time, and mostly I don't even need to paste them because autocomplete works with them. There's no reason you can't write

   out.println(√(41.0))
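For what it's worth, today's Java already accepts many non-ASCII identifiers, but only characters the JDK classifies as identifier characters. A quick check: a Greek letter like π (U+03C0) can start an identifier, while √ (U+221A) is a math symbol and cannot, so `√(41.0)` would need a grammar extension or a preprocessing step rather than just a method with that name.

```java
// Checks which code points Java allows at the start of an identifier.
public class UnicodeIdentifiers {
    static boolean canStartIdentifier(int codePoint) {
        return Character.isJavaIdentifierStart(codePoint);
    }

    public static void main(String[] args) {
        int pi = 0x03C0;   // GREEK SMALL LETTER PI, a letter
        int sqrt = 0x221A; // SQUARE ROOT, a math symbol
        System.out.println(canStartIdentifier(pi));   // true
        System.out.println(canStartIdentifier(sqrt)); // false
        System.out.println(Character.getType(sqrt) == Character.MATH_SYMBOL); // true
    }
}
```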

I suggest learning to use the Compose key. For instance, with no clipboard use at all, it lets you type «this».

Compose+<+< to open, then Compose+>+> to close.

Also see “curly” quotes: Compose+<+" and Compose+>+". This illustrates the power of composing: I had forgotten the shortcut because I rarely use it, but I was able to guess it in three tries.

Raku has an extensible grammar supporting all the things you mention.
Have you tried writing a DSL in Kotlin?
Internal or external?

I think Java is just fine for internal DSLs; see, for example,

https://www.jooq.org/

jOOQ embeds a Turing-complete programming language because it supports procedural SQL.
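To show what "internal DSL" means in plain Java, here is a minimal fluent-builder sketch. The classes and methods are hypothetical and far simpler than jOOQ's actual API; the idea it shares with that style of library is that each step type only exposes the methods that may legally come next, so javac rather than a parser enforces the grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical internal DSL; the "grammar" lives in the method signatures.
public class MiniSql {
    // After select(...) the only legal next step is from(...).
    static final class SelectStep {
        private final String column;
        SelectStep(String column) { this.column = column; }
        FromStep from(String table) { return new FromStep(column, table, List.of()); }
    }

    // After from(...) you may add where(...) clauses or render the query.
    static final class FromStep {
        private final String column;
        private final String table;
        private final List<String> conditions;

        FromStep(String column, String table, List<String> conditions) {
            this.column = column;
            this.table = table;
            this.conditions = conditions;
        }

        FromStep where(String condition) {
            List<String> next = new ArrayList<>(conditions);
            next.add(condition);
            return new FromStep(column, table, List.copyOf(next)); // immutable steps
        }

        String toSql() {
            String sql = "SELECT " + column + " FROM " + table;
            return conditions.isEmpty() ? sql : sql + " WHERE " + String.join(" AND ", conditions);
        }
    }

    static SelectStep select(String column) { return new SelectStep(column); }

    public static void main(String[] args) {
        System.out.println(select("COUNT(*)").from("that").where("x = 77").toSql());
        // SELECT COUNT(*) FROM that WHERE x = 77
        // select("*").where("x = 1") would not compile: SelectStep has no where().
    }
}
```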

I was also hacking on this project

https://github.com/paulhoule/ferocity/

which was about making Java homoiconic. Namely, in ferocity you can write

   Expression<String> literal = of("Hello World");
   Expression<byte[]> expression = getBytes(literal);
and then

   expression.asSource();  // "\"Hello World\".getBytes()"
   expression.evaluate();  // {72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100}
The evaluation uses the primitive interpreter strategy of evaluating all of the function's arguments and then calling the function, ...
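A minimal re-creation of the idea, assuming only the `Expression<T>` interface with `asSource()` and `evaluate()` from the example. This is not ferocity's actual code; the `Literal` and `Call` classes are made up. `evaluate()` follows the strategy described: evaluate the argument first, then call the function on it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical sketch of the Expression<T> idea; not ferocity's code.
public class MiniFerocity {
    interface Expression<T> {
        String asSource();
        T evaluate();
    }

    // A String literal; asSource() emits a Java string literal (no escaping, for brevity).
    record Literal(String value) implements Expression<String> {
        public String asSource() { return "\"" + value + "\""; }
        public String evaluate() { return value; }
    }

    // A no-arg method call on a receiver expression, with its implementation attached.
    record Call<A, R>(Expression<A> receiver, String method, Function<A, R> impl) implements Expression<R> {
        public String asSource() { return receiver.asSource() + "." + method + "()"; }
        // Evaluate the argument first, then apply the function to the result.
        public R evaluate() { return impl.apply(receiver.evaluate()); }
    }

    static Expression<String> of(String s) { return new Literal(s); }

    // UTF-8 here; the emitted getBytes() uses the platform charset, which agrees for ASCII.
    static Expression<byte[]> getBytes(Expression<String> s) {
        return new Call<String, byte[]>(s, "getBytes", str -> str.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Expression<String> literal = of("Hello World");
        Expression<byte[]> expression = getBytes(literal);
        System.out.println(expression.asSource()); // "Hello World".getBytes()
        System.out.println(Arrays.toString(expression.evaluate()));
        // [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100]
    }
}
```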

I got far enough on that project to discover a number of things. For instance, the expression language picks up extensions over the real Java language pretty naturally: if you have a quote function like LISP's quote, you can write programs in the extended language that process syntactic macros.
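A hypothetical sketch of how quote and macros could look in an expression tree like this (again not ferocity's API). `quote(e)` is modeled as an expression whose value is the unevaluated tree `e`, i.e. code as data, and a "macro" is just an ordinary Java function from expressions to expressions, here a toy constant folder for integer addition.

```java
// Hypothetical: quoting yields code as data; macros are tree-to-tree functions.
public class QuoteMacros {
    interface Expr<T> {
        String asSource();
        T evaluate();
    }

    record IntLit(int value) implements Expr<Integer> {
        public String asSource() { return Integer.toString(value); }
        public Integer evaluate() { return value; }
    }

    record Add(Expr<Integer> left, Expr<Integer> right) implements Expr<Integer> {
        public String asSource() { return "(" + left.asSource() + " + " + right.asSource() + ")"; }
        public Integer evaluate() { return left.evaluate() + right.evaluate(); }
    }

    // quote(e).evaluate() returns e itself without evaluating it.
    record Quote<T>(Expr<T> body) implements Expr<Expr<T>> {
        public String asSource() { return "quote(" + body.asSource() + ")"; }
        public Expr<T> evaluate() { return body; }
    }

    // A toy macro: fold additions of literals into a single literal.
    static Expr<Integer> fold(Expr<Integer> e) {
        if (e instanceof Add a) {
            Expr<Integer> l = fold(a.left());
            Expr<Integer> r = fold(a.right());
            if (l instanceof IntLit x && r instanceof IntLit y) return new IntLit(x.value() + y.value());
            return new Add(l, r);
        }
        return e;
    }

    public static void main(String[] args) {
        Expr<Integer> sum = new Add(new IntLit(1), new Add(new IntLit(2), new IntLit(39)));
        Expr<Expr<Integer>> quoted = new Quote<>(sum);
        Expr<Integer> code = quoted.evaluate();    // the tree itself, not 42
        System.out.println(code.asSource());       // (1 + (2 + 39))
        System.out.println(fold(code).asSource()); // 42
    }
}
```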

I convinced myself that the idea is sound and started bootstrapping it: first a partial implementation (ferocity0) with a code generator that makes stubs out of the standard library, plus a persistent collections library because that is pretty helpful for building the DSL; then to write ferocity1 in ferocity0 wherever that could eliminate boilerplate (like the 8 primitive types).
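The 8-primitive-types boilerplate is the kind of thing a small generator handles well. A minimal sketch, assuming the goal is one literal-factory overload per primitive type; the `Expression` and `Literal` names in the emitted text are hypothetical, not ferocity's.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical generator: emits one overload per primitive type
// instead of writing eight near-identical methods by hand.
public class PrimitiveStubs {
    static final Map<Class<?>, Class<?>> WRAPPERS = new LinkedHashMap<>();
    static {
        WRAPPERS.put(boolean.class, Boolean.class);
        WRAPPERS.put(byte.class, Byte.class);
        WRAPPERS.put(char.class, Character.class);
        WRAPPERS.put(short.class, Short.class);
        WRAPPERS.put(int.class, Integer.class);
        WRAPPERS.put(long.class, Long.class);
        WRAPPERS.put(float.class, Float.class);
        WRAPPERS.put(double.class, Double.class);
    }

    static String generate() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Class<?>, Class<?>> e : WRAPPERS.entrySet()) {
            String prim = e.getKey().getName();          // e.g. "int"
            String boxed = e.getValue().getSimpleName(); // e.g. "Integer"
            out.append("static Expression<").append(boxed).append("> of(")
               .append(prim).append(" v) { return new Literal<>(v); }\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(generate());
        // first line: static Expression<Boolean> of(boolean v) { return new Literal<>(v); }
    }
}
```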