Hacker News new | ask | show | jobs
by steinroe 918 days ago
hey, author here. Thanks for posting it!

A bit of background: a few months ago we announced a Postgres language server[0]. A language server adds features like syntax error diagnostic and autocomplete to your editor (vscode, neovim, etc). We have iterated a lot on the parser over the past few months and want to share an update today.

the parser is a core piece of any language server that constructs syntax trees from the raw input string. Usually first an untyped concrete syntax tree (cst) that represents the syntactic structure of the input, and subsequently a typed abstract syntax tree (ast) containing the meaning of the source.

In our implementation, we leverage the actual Postgres parser to-do the heavy lifting. However, the parser is designed to parse executable SQL — not to provide language intelligence. For example, it does not handle incomplete inputs, and outputs just the ast, not the cst. To use it for a language server we had to work around these limitations as good as possible.

While we leverage procedural macros in rust to generate a lot of the repetitive parser code, there remains a portion that requires a bit of manual work. But the groundwork is completed, and we can finally start working on the data model and the actual server next. Our aim is to bring this to a usable state as swiftly as possible.

Huge shout-out to pg_analyze for creating and maintaining libpg_query[1], without which this project would not be possible!

[0] https://news.ycombinator.com/item?id=37020610

[1] https://github.com/pganalyze/libpg_query

3 comments

Thank you for doing this!

Every year I get frustrated with a PostgreSQL formatter, look out into the webs for hope, and begrudgingly return to my sub-par editing experience.

Please take your time to do a good job, and thank you, again!

Have you considered using something like a treesitter grammar? It could solve the editor specific uses cases like highlighting and even linting as it creates asts that are more amenable for a language server implementation
At Splitgraph we compiled tree-sitter to wasm for Postgres autocomplete. Indeed, one major reason we opted for it is that it has error handling so it can work with incomplete syntax (which is what your code is 90% of the time you're editing it).

https://www.splitgraph.com/blog/parsing-pgsql-with-tree-sitt...

thanks for the link, very interesting read! and you are right, libpg_query has its limitations.

the idea is to first implement the parser with libpg_query and work around its limitations as good as possible. Since the scan api also returns all tokens for invalid sql, the language server will then have basic features and syntax error diagnostics for invalid statements, and advanced features for valid ones. once the server itself is done, we want to go back to the parser and replace the libpg_query-based parser with a more resilient alternative statement by statement. ultimately, the libpg_query-based parser should just be the fallback.

that being said, very excited that there is so much development in postgres dx.

Yes, I think the optimal solution here is a combination of tree-sitter for real time (as-you-type) with a fallback to libpg_query. I mean it's technically the other way around, since libpg_query is preferred when it parses correctly. But yeah I think you inevitably need a combination. Pretty sure TS does similar things in VSCode.
Have you investigated generating your parser from postgres' gram.y instead of basically basing it on the bison output?
that's a very interesting idea!

for now, our goal is to take the "easy" route with libpg_query and build a language server that provides basic lsp features for invalid sql, and advanced lsp features for valid sql as fast as possible. we then want to go back to the parser and replace the libpg_query-based approach with a more resilient alternative. as of now, the plan is to implement a handwritten recursive-descent statement by statement. will definitely do research to what extend we could leverage gram.y there, especially to potentially fast-track it.

If some annotations or such would make that easier, it might be possible to do that upstream. Postgres currently has hand-generated code for tab-completion in psql and that's uh, not great (it has gotten large enough that msvc has trouble compiling the file).