Hacker News new | ask | show | jobs
by ubolonton_ 845 days ago
What do you mean by “parse of that tree to get useful structures out”? Can you provide some concrete examples?
2 comments

Yeah suppose you write a simple config language like:

  let a = 12;
  let b = a + 5;
  ...

Tree-Sitter will give you a tree like

   Node(type="file", range=..., children=[
     Node(name="let_item", range=... children=[
       Node(name="identifier", range=...)
       Node(name="expression", range=..., children=[
         Node(name="integer_literal", range=...)
   ...
Whereas Nom/Chumsky will give you:

    struct File {
      let_items: Vec<LetItem>,
      ..
    };
    struct LetItem {
      name: String,
      expression: Expression,
    };
    ...
Essentially Tree-Sitter's output is untyped, and ad-hoc, whereas Nom/Chumksy's is fully validated and statically typed.

In some cases Tree-Sitter's output is totally fine (e.g. for syntax highlighting, or rough code intelligence). But if you're going to want to do stuff with the data like actually process/compile it, or provide 100% accurate code intelligence then I think Nom/Chumksy make more sense.

The downsides of Nom/Chunksy are: pretty advanced Rust with lots of generics (error messages can be quite something!), and keeping track of source code spans (where did the `LetItem` come from) can be a bit of a pain, whereas Tree-Sitter does that automatically.

Ok, understood. I was confused by the phrase "parse of that tree".

Tree-sitter's output is closer to being "dynamic" than "untyped", though.

It's not too hard to build a layer on top of tree-sitter (out of the core lib) to generate statically typed APIs. I haven't felt the need for that yet, but it may be worth exploring.

> actually process/compile it

At work, I built a custom embedded DSL, using tree-sitter for parsing. It has worked well enough so far. The dynamically-typed nature of tree-sitter actually made it easier to port the DSL to multiple runtimes.

> provide 100% accurate code intelligence

Totally agree that tree-sitter cannot be used for this, if we are aiming for 100%.

Not the person you’re asking, but basically anything that needs to happen after the initial parsing stage. So you convert your raw text into an AST, but there’s usually some processing you need to do after that.

Maybe you need to optimize the data, maybe you need to do some error checking. Lots of code is syntactically valid but not semantically valid, and usually those semantic errors will persist into the AST (in my limited experience).