Hacker News
by rgovostes 2004 days ago
You could take this idea further by fully parsing queries according to the grammar of the SQL language, rather than using simple pattern matching.

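In fact, it is a result of theoretical computer science that you _cannot_ correctly parse languages like SQL, HTML, Python, etc. with regular expressions: these languages allow arbitrarily deep nesting (subqueries, nested tags, parenthesized expressions), which regular expressions in the formal sense cannot match. Any non-trivial attempt will have cases where it misunderstands the code.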
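To make the failure mode concrete, here is a minimal sketch of one way pattern matching can misread a query. It is an illustration, not a proof of the theoretical result, and the `KEYWORDS` list and `naive_upper` function are hypothetical names made up for this example rather than anything from the tool under discussion.

```python
import re

# Small illustrative subset of SQL keywords (an assumption for this sketch).
KEYWORDS = ("select", "from", "where")

def naive_upper(sql: str) -> str:
    """Uppercase keywords with a plain regex, ignoring all context."""
    pattern = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: m.group(0).upper(), sql)

query = "select name from t where note = 'select from menu'"
result = naive_upper(query)
print(result)
# The text inside the string literal is data, not SQL, yet it gets rewritten:
# SELECT name FROM t WHERE note = 'SELECT FROM menu'
```

The regex has no notion of "inside a string literal", so it changes the value being compared and therefore the meaning of the query.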

So you would want to find a SQL grammar (an outdated example in [1]) and a module[2] that can use this to parse queries into a data structure to which you can apply transformations (e.g., changing case of keyword tokens) and then write back out as a string.
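The pipeline described above (parse into a data structure, transform, write back out) can be sketched roughly as follows. This is only the tokenizing stage, assuming a tiny keyword subset and a hand-written token pattern; a real implementation would feed these tokens to a parser generated from a full SQL grammar. `TOKEN_RE`, `tokenize`, and `upper_keywords` are hypothetical names for this example.

```python
import re

# Illustrative subset of keywords (an assumption, not a full SQL list).
KEYWORDS = {"select", "from", "where", "and", "or"}

# One alternative per token kind; the first alternative that matches wins.
TOKEN_RE = re.compile(r"""
    (?P<string>'(?:[^']|'')*')          # single-quoted literal, '' escapes a quote
  | (?P<comment>--[^\n]*)               # line comment up to end of line
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*)    # keyword or identifier
  | (?P<other>.)                        # any other single character
""", re.VERBOSE | re.DOTALL)

def tokenize(sql: str) -> list[tuple[str, str]]:
    """Split a query into (kind, text) pairs that cover every character."""
    return [(m.lastgroup, m.group(0)) for m in TOKEN_RE.finditer(sql)]

def upper_keywords(sql: str) -> str:
    """Uppercase keyword tokens only, then write the tokens back as a string."""
    out = []
    for kind, text in tokenize(sql):
        if kind == "word" and text.lower() in KEYWORDS:
            text = text.upper()
        out.append(text)
    return "".join(out)

query = "select name from t where note = 'select from menu' -- where it ends"
print(upper_keywords(query))
```

Because string literals and comments become their own tokens, the transformation leaves them untouched. Tokenizing is still a regular-expression-sized job; it is the nesting above the token level (subqueries, parenthesized expressions) that calls for a grammar.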

SQLite's documentation has some nice diagrams[3] to get an idea for how it parses a query string. The table of links at the top lets you dive into, e.g., all the optional parts of a SELECT statement.

1: https://ronsavage.github.io/SQL/sql-92.bnf.html

2: https://tomassetti.me/parsing-in-python/

3: https://sqlite.org/lang.html

1 comment

Thank you for the feedback; it will be useful. I had this in mind, and I believe it would be even more powerful, but I started with this minimalistic approach. It might grow into something like this later.