Hacker News new | ask | show | jobs
by radford-neal 104 days ago
Many years ago, I created an editor operating on syntax trees that I think is more "hard-core" than this - that is, only tree-oriented operations are done. There is no parsing of text, since entering plain text, rather than a tree, is impossible. Hence, there can be no syntactically invalid programs.

The challenge is getting this to be a useable way of entering programs. I think I made progress on this, but the feasibility varies with the programming language.

I can't run it any more, since the display hardware it assumed is no longer available, but you can read about it at https://ucalgary.scholaris.ca/items/da8b823b-c344-4ffb-aa37-...

6 comments

> The challenge is getting this to be a useable way of entering programs.

Well exactly.

When the path between Program A and Program B can only be valid programs, you are going to end up with either a much longer, less intuitive path, or deleting everything and starting again. It can also be quite possible to invent structures which are valid but have no valid path to creating them.

Well, I think most common transformations work reasonably well. One usually doesn't want to do things completely contrary to the AST, such as convert "while a<b" to "whilea = b".

And it's certainly possible to create any valid AST in the editor I describe. The set of valid trees is extended to those with "holes" in places, which one fills in when entering a program, and it's always possible to do this.

The challenge is one of finding an intuitive user interface, not whether it's possible at all. One issue is that infix notation is unnatural for entering trees (prefix is more natural).

no syntax error editing seems like https://scratch.mit.edu/
> It can also be quite possible to invent structures which are valid but have no valid path to creating them.

I'm curious if you have an example of such a structure?

Pedantically: if, for every valid tree, there exists a bidirectional path to the empty root node, there's always at least one path between all given pairs of valid trees ... albeit one that no developer would ever take.

You are quite right to call this out - and quite right in general. With AST verification alone your statement holds strongly.

I was remembering a professor I had who was very into formal verification and had an IDE that would only accept valid programs - however his validity checks were more than just tree based, they involved passing a type check. Which now you've called me on it, is quite definitely beyond valid AST editing.

The base case: if you have a structure with a mandated cyclical reference that doesn't accept a second (e.g. nil) type, you cannot construct it without creating an intermediate invalid program. This doesn't tend to show up if the cyclical reference is all the same type (A1 -> A2 -> A1) because you can often cheat and self reference, but it does if it is different types (A1 -> B1 -> A1). You can't construct an A without a B, which cannot be constructed without an A.

But that's still not a problem you cry! And quite right, you can edit the type itself to remove and re-add the reference.

That is until the type is built into the language, or a library.

OP clarified, but I understood that as a valid tree without a path to empty node. I don't have an example, but with grammar weird enough it sounds plausible. Especially if some form of type checking is involved, as OP mentioned.
One idea would be for it to work like code completion. Once you start writing a structure the rest is auto-suggested so it does not break the tree.
Pantograph[0] seems to be a more recent attempt to implement the same idea. It is still not a general editor but generalizing it to ranges of tree selections looks promising

[0]: https://pantographeditor.github.io/Pantograph/

I actually used a language and editor like this in a previous large company.

It was an experimental language that they wanted to replace their current application level language and was built ontop of JetBrains MPS https://www.jetbrains.com/mps/ which has that feature among others.

My personal opinion after working with that lang is all of this is that its theoretically interesting. But a dead end in practice along with visual coding etc.

The universality, simplicity, and legibility of text as interface for both humans and machines is too hard to beat. I think LLMs is just the most recent example of this.

Things that don't work in practice:

- You need a special editor, its heavy and slow, you lose all the ecosystem around them.

- You can't just cat or inspect the raw file, you can't see it in terminal at all.

- You need a new version control system, review system, and people need to learn it.

- You can't use any existing tools for working with code, you are essentially starting from scratch, and lose all the benefits of all the work everyone else is doing around tooling, dev saas etc.

- humans don't think in trees, they don't think in syntactically correct programs. Its actually absurdly frustrating writing a program in only syntactically correct edits. 99% of the time you are coding, your work is probably syntactically and semantically incorrect.

The tooling that is able to either make the right tradeoff around strictness or allow the users to make the that tradeoff is what ends up being used in practice. A good example of this typescript with gradual typing, python with type annotations, etc.

I think these editors just fall in a bit too far right on the strictness scale.

These are good points, but are mostly pragmatic, resulting from the rest of the system not being designed around a tree representation. Of course, that can be decisive in practice...

A more fundamental issue (which I mention in the document) is that there is a discordance between the 2D textual display and the underlying tree. I think this becomes more apparent when input is not only keystrokes, but also mouse positioning.

Of course, these disadvantages do not necessarily outweigh the advantages of editing in terms of trees!

What was the issue with MPS? MPS also looks like "custom widgets in normal text-based java" i.e. mixed paradigms and not "ground up ast-based editing".

I'm not sure if I can agree with your other points. I can't think of the last time I cat'd or maniuplated rust/java/go outside a heavy and slow editor. Many languages have more or less "one" editor with all the features that most people use, and often use it for change review too. And it seems weird to characterize needing new tools as a "dead end"... all languages need new tools and won't have them at the start.

Plenty of people use go, rust, and java which are not gradually typed.

Nothing wrong with MPS. I think it was just the language itself was design around ast editing, and projectional editor.

I was really excited about the concept and read some papers on it at the time. Then I ran a real world application of it, and writing code in it was just a bad experience.

I think I didn't explain the gradual typing properly. I mean the broader concept of moving developers from one way of doing things to another.

When the vast majority of people and tools are working a certain way. Introducing a whole new thing that requires them to work a whole different way is a hard sell.

So having a way for people to move gradually from point A to point B at their own pace and pick up benefits along the way is really nice.

One example of this is Typescript with gradual typing. Swapping form js to typescript is fairly easy. You can still use your old tools and over time you can start to adapt new features into your code, and over time everyone gets used to this new stricter/better approach.

I think ast based code like this essentially requires you to throw out everything all at once, not just editor, but literally everyone's workflow today and rebuild from the ground up.

I just don't think it will ever be more than a niche thing unless it can loosen up on strictness and play nice with existing tooling.

You laid out the problems nicely, but these are all fixable problems.
If you want to relive it then simh¹ with mame² might be an option. There appears to be some support for VT11³, along with docs⁴ to use as a starting point.

No, I'm not claiming to have read all one hundred pages already. However, from what I have read I'd love to see a functional demo.

¹ https://simh.trailing-edge.com/

² https://www.mamedev.org/

³ https://github.com/simh/simh/blob/master/PDP11/pdp11_vt.c

https://wiki.mamedev.org/index.php/MAME_and_SIMH

Now you're out of the realm of Ki, but what you're talking about is still being worked on in the modern era, by me! I'm building BABLR which is a modern follow-up to your idea, built on top of a powerful, generic system of parsers which has gaps/holes but no error recovery, so that it works with trees which may be incomplete but which must not be invalid.

The hard part is that we need to be able to talk about these structures. Even just here on this forum we need to be able to communicate precisely about them. I often use · as a typesetting symbol so that I can easily write and read expressions like 2 + · which you would read as "two plus gap". The · symbol is only for typesetting as I say thought because you it's not safe to assume that any one character is reserved for our use in every programming language. Instead we wrap the parts that aren't syntactic in quotes and we use <//> as the symbol for a gap so that it looks more like this:

  <*> "2 + " <//> </>
The * there is a flag on the node, it means this node a leaf of the tree -- a token.

We can parse 2 + · into a proper tree now:

  <BinaryExpression>
    left: <*Number "2" />
    #: " "
    operator: <* "+" />
    #: " "
    right: <//>
  </>
And yes, BABLR can really parse this. If we've piqued your curiosity, our Discord server is currently the hub of our community and we'd love to see you. https://discord.gg/NfMNyYN6cX
Wow! I stumbled onto your paper a while ago when I was looking into structural editing and thinking of a masters in CS, exploring editor interfaces / feedback loops.

(Maybe your paper is famous and it’s not wild that I read it, but it was wild to see after so many years)

I never took that path, spent time in tech industry confused why people didn’t seem interested in structural editing and better editing tools.

Out of curiosity, how do you think LLMs and genai affect the value of structural editors and similar tooling?

Part of me wants to stay disciplined— of course it’s valuable to work efficiently and work on the AST and with a repl. The other part of me gets paid to work on essentially a punch card system (building dev, ship to prod and see what happens)