Yeah, suppose you write a simple config language like:

  let a = 12;
  let b = a + 5;
  ...

Tree-Sitter will give you a tree like:

  Node(type="file", range=..., children=[
    Node(type="let_item", range=..., children=[
      Node(type="identifier", range=...),
      Node(type="expression", range=..., children=[
        Node(type="integer_literal", range=...)
        ...

Whereas Nom/Chumsky will give you:

  struct File {
      let_items: Vec<LetItem>,
      // ...
  }
  struct LetItem {
      name: String,
      expression: Expression,
  }
  ...

Essentially, Tree-Sitter's output is untyped and ad hoc, whereas Nom/Chumsky's is fully validated and statically typed.

In some cases Tree-Sitter's output is totally fine (e.g. for syntax highlighting, or rough code intelligence). But if you're going to do something with the data, like actually process/compile it, or provide 100% accurate code intelligence, then I think Nom/Chumsky make more sense.

The downsides of Nom/Chumsky are that they need pretty advanced Rust with lots of generics (error messages can be quite something!), and keeping track of source code spans (where did the `LetItem` come from?) can be a bit of a pain, whereas Tree-Sitter does that automatically.
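
To make the span-tracking pain concrete: one common workaround is to wrap each AST field in a small span-carrying type and fill it in by hand. Here is a rough std-only sketch. `Span`, `Spanned`, `take_span` and `parse_let` are names made up for illustration (they are not Nom or Chumsky APIs), and the hand-written parser only understands `let <ident> = <integer>;`.

```rust
/// Byte range in the source text (hypothetical helper, not from any library).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// A parsed value together with the source range it came from.
#[derive(Debug, Clone, PartialEq)]
pub struct Spanned<T> {
    pub value: T,
    pub span: Span,
}

#[derive(Debug, Clone, PartialEq)]
pub struct LetItem {
    pub name: Spanned<String>,
    pub expression: Spanned<i64>, // integer literals only in this sketch
}

/// Advances `pos` past any ASCII whitespace.
fn skip_ws(src: &str, pos: &mut usize) {
    while src[*pos..].starts_with(|c: char| c.is_ascii_whitespace()) {
        *pos += 1;
    }
}

/// Consumes `token` at `pos`, or fails without moving.
fn eat(src: &str, pos: &mut usize, token: &str) -> Option<()> {
    if src[*pos..].starts_with(token) {
        *pos += token.len();
        Some(())
    } else {
        None
    }
}

/// Consumes one or more characters matching `pred` and records their span.
fn take_span(src: &str, pos: &mut usize, pred: fn(char) -> bool) -> Option<Spanned<String>> {
    let start = *pos;
    let len: usize = src[start..]
        .chars()
        .take_while(|c| pred(*c))
        .map(char::len_utf8)
        .sum();
    if len == 0 {
        return None;
    }
    *pos += len;
    Some(Spanned {
        value: src[start..*pos].to_string(),
        span: Span { start, end: *pos },
    })
}

/// Parses `let <ident> = <integer>;`, threading byte offsets through by hand.
pub fn parse_let(src: &str) -> Option<Spanned<LetItem>> {
    let mut pos = 0;
    skip_ws(src, &mut pos);
    let start = pos;
    eat(src, &mut pos, "let")?;
    // Require at least one whitespace character after the keyword.
    let after_let = pos;
    skip_ws(src, &mut pos);
    if pos == after_let {
        return None;
    }
    let name = take_span(src, &mut pos, |c: char| c.is_ascii_alphabetic() || c == '_')?;
    skip_ws(src, &mut pos);
    eat(src, &mut pos, "=")?;
    skip_ws(src, &mut pos);
    let digits = take_span(src, &mut pos, |c: char| c.is_ascii_digit())?;
    skip_ws(src, &mut pos);
    eat(src, &mut pos, ";")?;
    let expression = Spanned {
        value: digits.value.parse().ok()?,
        span: digits.span,
    };
    Some(Spanned {
        value: LetItem { name, expression },
        span: Span { start, end: pos },
    })
}
```

Every helper has to carry `pos` around and every struct field grows a `Spanned<...>` wrapper, which is the bookkeeping Tree-Sitter does for you.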

Tree-sitter's output is closer to being "dynamic" than "untyped", though.
It's not too hard to build a layer on top of tree-sitter (outside the core library) that generates statically typed APIs. I haven't felt the need for that yet, but it may be worth exploring.
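
As a rough idea of what such a layer could look like, here is a minimal sketch of typed "views" over a dynamic tree. The `Node` struct is a hypothetical stand-in so the example is std-only; it is not tree-sitter's actual node type, and `LetItem`, `cast` and `let_items` are likewise made-up names. A real layer would presumably generate wrappers like these from the grammar rather than write them by hand.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a dynamically typed syntax node.
#[derive(Debug, Clone)]
pub struct Node {
    pub kind: &'static str,
    pub range: Range<usize>,
    pub children: Vec<Node>,
}

/// Typed view over a node whose kind is "let_item".
#[derive(Debug, Clone, Copy)]
pub struct LetItem<'a>(&'a Node);

impl<'a> LetItem<'a> {
    /// Returns `None` unless the node has the expected kind.
    pub fn cast(node: &'a Node) -> Option<Self> {
        (node.kind == "let_item").then(|| LetItem(node))
    }

    /// The first `identifier` child, if present.
    pub fn name(&self) -> Option<&'a Node> {
        self.0.children.iter().find(|c| c.kind == "identifier")
    }

    /// The first `expression` child, if present.
    pub fn expression(&self) -> Option<&'a Node> {
        self.0.children.iter().find(|c| c.kind == "expression")
    }

    /// The source range is simply read off the underlying node.
    pub fn range(&self) -> Range<usize> {
        self.0.range.clone()
    }
}

/// All direct `let_item` children of `file`, as typed views.
pub fn let_items(file: &Node) -> impl Iterator<Item = LetItem<'_>> {
    file.children.iter().filter_map(LetItem::cast)
}
```

The accessors still return `Option`, so this only checks node kinds at the boundary; it does not give the "fully validated" guarantee of a hand-built AST.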
> actually process/compile it
At work, I built a custom embedded DSL, using tree-sitter for parsing. It has worked well enough so far. The dynamically-typed nature of tree-sitter actually made it easier to port the DSL to multiple runtimes.
> provide 100% accurate code intelligence
Totally agree that tree-sitter cannot be used for this, if we are aiming for 100%.