| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by IshKebab 890 days ago

Yeah suppose you write a simple config language like:

  let a = 12;
  let b = a + 5;
  ...

Tree-Sitter will give you a tree like

   Node(type="file", range=..., children=[
     Node(name="let_item", range=... children=[
       Node(name="identifier", range=...)
       Node(name="expression", range=..., children=[
         Node(name="integer_literal", range=...)
   ...

Whereas Nom/Chumsky will give you:

    struct File {
      let_items: Vec<LetItem>,
      ..
    };
    struct LetItem {
      name: String,
      expression: Expression,
    };
    ...

Essentially Tree-Sitter's output is untyped, and ad-hoc, whereas Nom/Chumksy's is fully validated and statically typed.

In some cases Tree-Sitter's output is totally fine (e.g. for syntax highlighting, or rough code intelligence). But if you're going to want to do stuff with the data like actually process/compile it, or provide 100% accurate code intelligence then I think Nom/Chumksy make more sense.

The downsides of Nom/Chunksy are: pretty advanced Rust with lots of generics (error messages can be quite something!), and keeping track of source code spans (where did the `LetItem` come from) can be a bit of a pain, whereas Tree-Sitter does that automatically.

1 comments

ubolonton_ 890 days ago

Ok, understood. I was confused by the phrase "parse of that tree".

Tree-sitter's output is closer to being "dynamic" than "untyped", though.

It's not too hard to build a layer on top of tree-sitter (out of the core lib) to generate statically typed APIs. I haven't felt the need for that yet, but it may be worth exploring.

> actually process/compile it

At work, I built a custom embedded DSL, using tree-sitter for parsing. It has worked well enough so far. The dynamically-typed nature of tree-sitter actually made it easier to port the DSL to multiple runtimes.

> provide 100% accurate code intelligence

Totally agree that tree-sitter cannot be used for this, if we are aiming for 100%.