Show HN: Readsql – convert SQL to most human readable format (github.com)
43 points by azisk1 1995 days ago
9 comments

Not sure I get the point of this? Perhaps I’m being dense? The hard part about SQL isn’t picking out the keywords in the select/insert statement, it’s understanding the DB relations, its structure and constraints.

If you want to make SQL easier to understand, then take a look at a 15-year-old stored procedure that’s been hacked at by a dozen devs, is over 1,000 lines long, has sketchy rollback error protection, uses two CTEs, a pivot, no temp tables, and does some XML shenanigans in the middle (I’m being hyperbolic, obviously).

This feels like trying to lose weight by trimming your toenails: you’re technically lighter, but not so as it matters.

I understand the confusion. This is not some kind of magic that will help us solve all SQL issues; that's way too hard. This is a development tool that helps write and read SQL code. It simply saves development time by automating the easy stuff, leaving the hard and important stuff for us to deal with. It differs from other formatters in that it lints SQL code inside Python strings. For some context, I am a data engineer and I write a lot of SQL code inside Python code (in the future it should support other programming languages as well).

However, the tool might suggest improvements on SQL code in the future
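
For readers wondering what "SQL inside Python strings" involves, here is a minimal sketch of one way to locate such strings. It is not taken from the readsql code base: the `find_sql_strings` helper, the `SQL_START` heuristic, and the sample source are all hypothetical, and the "starts with a statement keyword" test is only an assumption about what counts as SQL.

```python
import ast
import re

# Hypothetical heuristic: a string "looks like SQL" if it starts with a
# common statement keyword. A real tool would need something more careful.
SQL_START = re.compile(
    r"^\s*(select|insert|update|delete|with|create)\b", re.IGNORECASE
)


def find_sql_strings(python_source):
    """Return (line_number, text) for each string literal that looks like SQL."""
    found = []
    for node in ast.walk(ast.parse(python_source)):
        # ast.Constant covers plain string literals (Python 3.8+).
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if SQL_START.match(node.value):
                found.append((node.lineno, node.value))
    return sorted(found)


# Sample Python source, made up for illustration.
EXAMPLE_SOURCE = '''
greeting = "hello world"
query = "select id, name from users where id = 1"
cursor_sql = """
    insert into logs (msg) values ('select nothing')
"""
'''

if __name__ == "__main__":
    for lineno, text in find_sql_strings(EXAMPLE_SOURCE):
        print(lineno, text.strip())
```

Working on the syntax tree rather than on raw text means ordinary strings such as `greeting` are skipped, and the SQL text can be handed to a formatter without worrying about Python's own quoting.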

OK, that puts it in context a little bit. If it’s making your life easier, then it must be useful to people in the same position and role. I’ve only ever written SQL in stored procs and one-off queries, so I relied on Redgate’s SQL Prompt tool (you might want to take a look at what that does for some inspiration).
Thank you, I will definitely look to Redgate for some inspiration. I see that it also costs some money.
Leaving aside the modest achievements of the code so far, the missing prerequisite to "automatically" formatting SQL is... just formatting SQL.

Because SQL (broadly) was designed to be "human" readable in the first place, it's grammatically very inconsistent and has a lot of keywords, many more than other languages in use today such as C.

I've yet to find a pattern of indentation, brackets etc. that satisfies my OCD.

Coming up with an example of a nicely formatted SQL statement is not difficult, but try turning that into consistent 'rules' and you immediately find counterexamples using other parts of SQL.

I make an effort to indent on most keywords, using a Python-like structure to indent related lines to the same level, one more than the keyword operating on them all, with the heuristic that if I can comment one or more lines to debug, I have a readable query.
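
One possible reading of that heuristic, as a sketch only: each keyword sits on its own line, the lines it operates on are indented one level deeper, and any single line can be commented out with `--` while the query stays valid. The `orders` table, its rows, and the `run_example` helper are made up for illustration, and the layout is an interpretation of the description above rather than the commenter's actual style.

```python
import sqlite3

# Keywords on their own lines, operands one level deeper. The commented-out
# condition shows the "can I comment a line to debug?" test.
QUERY = """
SELECT
    name,
    total
FROM
    orders
WHERE
    total > 10
    -- AND name <> 'bob'
ORDER BY
    total DESC
"""


def run_example():
    """Run QUERY against a small in-memory table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (name TEXT, total INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 5), ("bob", 20), ("carol", 30)],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_example())  # [('carol', 30), ('bob', 20)]
```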

I tend to classify SQL statements into two kinds, those that when wrapped in a calling function fit in one screen, and those other longer ones that I'm inclined to write in an imperative language.

Edit: For the author of the repository, the list of reserved words gets longer and more complex when you support different implementations of SQL, and regex may be insufficient once you consider such parsing questions as whether the keyword is within quotation marks or part of a user-defined name.

https://www.drupal.org/docs/develop/coding-standards/list-of...

https://github.com/AzisK/readsql/blob/master/readsql/regexes...
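
To make the quotation-mark concern concrete, here is a small sketch. The keyword list is a tiny illustrative subset and neither function is taken from the readsql source; the names `naive_upper` and `quote_aware_upper` are hypothetical. The first function upper-cases every keyword-shaped word, the second skips anything inside single or double quotes.

```python
import re

KEYWORDS = ("select", "from", "where", "and", "order", "by")  # illustrative subset

# Naive approach: any whole word that matches a keyword is upper-cased.
NAIVE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def naive_upper(sql):
    return NAIVE.sub(lambda m: m.group(0).upper(), sql)


# Quote-aware approach: try to match a quoted chunk first and leave it alone.
SINGLE = r"'(?:[^']|'')*'"   # string literal, '' is an escaped quote
DOUBLE = r'"(?:[^"]|"")*"'   # quoted identifier
QUOTE_AWARE = re.compile(
    "(" + SINGLE + "|" + DOUBLE + r")|\b(" + "|".join(KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def quote_aware_upper(sql):
    def repl(m):
        if m.group(1) is not None:
            return m.group(1)        # quoted text is returned untouched
        return m.group(2).upper()    # bare keyword
    return QUOTE_AWARE.sub(repl, sql)


SQL = "select \"from\", note from t where note = 'order by hand'"

if __name__ == "__main__":
    print(naive_upper(SQL))        # changes the identifier and the string literal
    print(quote_aware_upper(SQL))  # leaves both quoted parts as written
```

The second version still knows nothing about comments, dialect-specific quoting such as backticks or square brackets, or keywords that are only reserved in some contexts, which is where plain pattern matching starts to strain.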

Thank you for the feedback and for the link. This is something I also do for SQL code. Initially I was making this to be used as a pre-commit hook for SQL code inside Python for our team, probably inspired by the black Python formatter. I just made the MVP and will propose to our team to use it after the holidays. If regexes turn out not to be enough, we still have the power of Python to lend a hand for more complex puzzles.
You could take this idea further by fully parsing queries according to the grammar of the SQL language, rather than using simple pattern matching.

In fact it is a result of theoretical computer science that you _cannot_ correctly parse languages like SQL, HTML, Python, etc. with regular expressions: Any attempt to do anything non-trivial will have cases where it misunderstands the code.

So you would want to find a SQL grammar (an outdated example in [1]) and a module[2] that can use this to parse queries into a data structure to which you can apply transformations (e.g., changing case of keyword tokens) and then write back out as a string.

SQLite's documentation has some nice diagrams[3] to get an idea for how it parses a query string. The table of links at the top lets you dive into, e.g., all the optional parts of a SELECT statement.

1: https://ronsavage.github.io/SQL/sql-92.bnf.html

2: https://tomassetti.me/parsing-in-python/

3: https://sqlite.org/lang.html
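
The "parse into a data structure, transform, write back out" pipeline can be sketched at a small scale as follows. This covers only the tokenizing stage, not a grammar-based parser of the kind linked above, and the keyword set, token names, and helper functions are all assumptions made for illustration.

```python
import re

KEYWORDS = {"select", "from", "where", "and", "or", "as", "join", "on"}  # illustrative subset

# Each alternative is one token kind; "other" catches any leftover character,
# so joining the token texts always reproduces the input exactly.
TOKEN_RE = re.compile(
    r"""
    (?P<comment>--[^\n]*)
  | (?P<string>'(?:[^']|'')*')
  | (?P<quoted>"(?:[^"]|"")*")
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<space>\s+)
  | (?P<other>.)
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(sql):
    """Split SQL text into (kind, text) pairs."""
    tokens = []
    for m in TOKEN_RE.finditer(sql):
        kind, text = m.lastgroup, m.group(0)
        if kind == "word" and text.lower() in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, text))
    return tokens


def upper_keywords(tokens):
    """Transformation step: change the case of keyword tokens only."""
    return [(k, t.upper() if k == "keyword" else t) for k, t in tokens]


def render(tokens):
    """Write the token list back out as a string."""
    return "".join(t for _, t in tokens)


SQL = "select name as \"from\" from users -- where the rows live\nwhere note = 'a and b'"

if __name__ == "__main__":
    print(render(upper_keywords(tokenize(SQL))))
```

Because every transformation works on typed tokens, comments, string literals, and quoted identifiers pass through unchanged, and further steps (line breaks, indentation) can be added as extra passes over the same list. Anything that depends on nesting, such as subqueries, would need the grammar-based parsing the comment describes.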

Thank you for the feedback, it will be useful. This is something I had in mind and I believe this would be even more powerful but I started with this minimalistic approach. It might grow into something like this later
Was expecting something like DESCRIBE but for less technical audience.

As a heavy SQL user, I don't see much benefit in this as of yet. It is very misleading in saying it is the "most human readable" format, when the example shows the "format" to be identical to the original. Just upper-casing keywords doesn't make it any more readable, to be honest.

Anyway, a good attempt. I hope you're not offended by critical feedback and that it is useful for some ideas to improve the tool.

Thank you for the feedback. The example on GitHub has code highlighting. I wonder if I should remove that and whether that would make the difference clearer. I would be interested to know how this tool could help you with your SQL usage.
Every IDE I use and have used for the last 15 years has included an option to format the code the way someone else believes it to be best viewed. I have never found these to be the way I prefer SQL to be formatted for the best readability.

A difficult but incredibly useful idea would be to learn the developer’s style and then format code (theirs or others’) to fit that model.

Ah, I was hoping it would do multi-line and indentation formatting. I find that helps with readability a lot in SQL.
That is coming soon!
Does it do anything but uppercase keywords? Is it going to add line breaks or indentation at some point?
Adding line breaks before FROM, JOIN, WHERE, GROUP BY, ORDER BY, LIMIT/OFFSET (and usually before some of the ANDs) is the first thing I do when analyzing some SQL. Capitalization is rarely needed after that, though it is useful for longer-term readability.

I also normalize by removing optional/redundant keyword noise, I'm looking at you INNER/OUTER.
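
The two habits described above (line breaks before the main clauses, and dropping the optional INNER/OUTER) could be approximated like this. It is a rough sketch with hypothetical helper names, it does not decide which ANDs deserve a break, and being plain pattern matching it would also rewrite keywords that appear inside string literals or comments.

```python
import re


def drop_optional_keywords(sql):
    """INNER JOIN means JOIN; LEFT/RIGHT/FULL OUTER JOIN means LEFT/RIGHT/FULL JOIN."""
    sql = re.sub(r"\bINNER\s+JOIN\b", "JOIN", sql, flags=re.IGNORECASE)
    sql = re.sub(
        r"\b(LEFT|RIGHT|FULL)\s+OUTER\s+JOIN\b", r"\1 JOIN", sql, flags=re.IGNORECASE
    )
    return sql


# Whitespace followed by a clause keyword; OFFSET is left on the LIMIT line.
BREAK_BEFORE = re.compile(
    r"\s+((?:(?:LEFT|RIGHT|FULL|CROSS)\s+)?JOIN|FROM|WHERE|GROUP\s+BY|ORDER\s+BY|LIMIT)\b",
    re.IGNORECASE,
)


def break_clauses(sql):
    """Replace the whitespace before each clause keyword with a newline."""
    return BREAK_BEFORE.sub(lambda m: "\n" + m.group(1), sql.strip())


SQL = (
    "SELECT u.name, COUNT(*) FROM users u "
    "INNER JOIN orders o ON o.user_id = u.id "
    "LEFT OUTER JOIN refunds r ON r.order_id = o.id "
    "WHERE o.total > 10 GROUP BY u.name ORDER BY u.name LIMIT 5 OFFSET 10"
)

if __name__ == "__main__":
    print(break_clauses(drop_optional_keywords(SQL)))
```

With the sample query, each of FROM, JOIN, LEFT JOIN, WHERE, GROUP BY, ORDER BY and LIMIT ends up at the start of its own line.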

Thanks, that's definitely coming in the near future
For now it only upper-cases but I would like it to "prettify" the code as well. So it is an upcoming feature. Hopefully it will even suggest SQL code improvements in the future. Contributions are welcome
I don't really get it. Is upper case more human readable than lower case?
Me neither. When copying uppercase SQL I always lowercase everything. But I can understand that some people are more used to that format, even though I find it obsolete.

In any case, it is objectively not more human writeable.

Not for everybody probably but I find it more readable, especially when the queries get big. Some people rely on code highlighting but this usually gets messed up when writing SQL code inside Python strings
By distinguishing key words from table names and column names, uppercasing can improve readability, since SQL has such a flexible/variable grammar.

If we wrote:

    select(col1, col2, col3, ..., from=table_name, limit=5)
or something like that, it would be obvious which parts are intrinsic and which are variable without knowing anything about select, but for SQL you need to know all the "sentence patterns".
It’s actually been proved to be the opposite... which is probably why a lot of legalese that you aren’t meant to read, but that they’re required to provide, is written in small-print upper case.

This has to do with the outline of letters in uppercase being indistinct (if you trace an outline - especially with serif fonts - you’ll largely just get a block) so you need to spend more time per-letter to distinguish the characters whereas with lowercase, the “fitted box” shape is shared between fewer letters: https://www.sciencedirect.com/science/article/pii/S004269890...

I am not claiming that upper-case is more readable in general. I know the practice of writing SQL keywords in upper-case and I find it more readable. From https://stackoverflow.com/a/608201/7714279 "You can easily separate the keywords from table and column names, etc." However, it's a choice. Some people rely on code highlighting, but this gets messy if we write SQL code inside Python code. This is especially where this library comes to lend a hand.
While I agree that uppercase is hard to read, turns out there is a good reason for caps in legal docs:

"Under US law, disclaimers must be 'conspicuous'" https://law.stackexchange.com/a/18210

I don’t think size 8 in light grey in the footer is conspicuous, uppercase or otherwise.
It's a SQL beautifier.
Is it a joke?