Show HN: Visualize SQL Queries (animatesql.com)
259 points by bvm-101 1546 days ago
My co-worker and I were debugging a SQL issue; having not seen SQL in two years, I embarrassed myself by confusing union vs. join. After this episode, I tried refreshing my SQL memory, but there are few websites that animate SQL for you. Most of them just have a series of images to help you visualize. There are a few tools that are quite good and robust (especially for large/complex use cases) but require installation and are too complex for my simple purpose.

So I just created a small tool to help visualise SQL. Most of the animations are just my understanding of how SQL works. Would love to know what you think. Do you also visualise some of the queries like that in your head? Any feedback would be gold. Btw, you can also edit queries and see different results (but it's a bit limited).

Have fun ;)

13 comments

We have started looking at something along the same axis of "improving understanding of your queries". Our product has nearly 10k SQL queries that need to be managed for each logical installation.

By converting a SQL query into an AST, you can start applying business logic to the actual syntax of the query. Put another way, you can query your queries. You can also run reports across all SQL to determine things like "show me everything in the product which references this table & column", or "Which queries reference a specific magic string constant?". More advanced reports can be made too, such as "Which queries join tables A, B & C together?"

We haven't taken it to the next step yet, but hypothetically we can go from AST back into SQL and start doing some super crazy shit like patching hand-written queries programmatically. Once something is in AST form, you are basically working with playdoh that another tool like LINQ (and a bit of recursion) can trivially cut through.
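
A minimal sketch of the "query your queries" idea, in Python. The node classes (`Table`, `Join`, `Select`), the helper names and the two sample queries are all made up for illustration; a real SQL parser would emit a much richer tree, and this is not the commenter's actual tooling.

```python
from dataclasses import dataclass

# Hypothetical, hand-built AST node types; a real SQL parser would
# produce a far more detailed tree than this.
@dataclass
class Table:
    name: str

@dataclass
class Join:
    left: object   # Table or Join
    right: object  # Table or Join
    on: str

@dataclass
class Select:
    columns: list
    source: object  # Table or Join
    where: str = ""

def tables_in(node):
    """Recursively collect the table names referenced under an AST node."""
    if isinstance(node, Table):
        return {node.name}
    if isinstance(node, Join):
        return tables_in(node.left) | tables_in(node.right)
    if isinstance(node, Select):
        return tables_in(node.source)
    return set()

def queries_joining(queries, wanted):
    """Names of the queries whose FROM clause touches every table in `wanted`."""
    return [name for name, ast in queries.items() if set(wanted) <= tables_in(ast)]

# Two made-up queries, already "parsed" by hand for illustration.
queries = {
    "order_report": Select(
        columns=["a.id", "c.total"],
        source=Join(
            Join(Table("A"), Table("B"), "a.id = b.aid"),
            Table("C"),
            "b.id = c.bid",
        ),
    ),
    "simple_lookup": Select(columns=["id"], source=Table("A"), where="id = 1"),
}

# "Which queries join tables A, B & C together?"
print(queries_joining(queries, ["A", "B", "C"]))
```

The same recursive walk could be pointed at columns or string constants instead of tables; only the node types it inspects would change.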

  > but hypothetically we can go from AST back into SQL and start doing some super crazy shit like patching hand-written queries programmatically.
And you've stumbled into rewrite-rules and the realms of query-planners/optimizers!

If you enjoy this stuff, it's really fun to learn about.

I’m not a DBA, but is there any IP around what OP and others are attempting to do? Surely anything that makes your life easier would have been patented by Oracle or the likes to eke every dollar out of the market?
The animated step by step is new to me, but any DB worth using is going to include tools that explain the execution of any query you give to the DB. It's known as a "Query Plan". And, despite the name, is not limited to queries. Query plans aren't friendly to the uninitiated, but give you far more technical information you need to actually tune the query and ensure things like "am I actually using the indices I have on this table?".

I'm not a DBA, but I have over 20 years experience using RDBMSes of Oracle, DB2, SQL Server, Sybase, etc.

Just out of curiosity, where do you get those queries from? By which I mean: are they static templates that you get from source code, or are they dynamic and gathered from logs (at the risk that some rare queries will be left out)?
Believe it or not, I did not know there were tools to parse SQL into an AST before this project. I was trying to parse SQL manually XD.
I'm not sure a procedural visualization is appropriate for a declarative language like SQL. It can be misleading both on underlying concepts (sets and relations) and actual query execution.

I'd rather see a visualization in terms of tuples, sets and operations on them leading to the result set.

I guess this could be a teaching tool if you focus on just selecting keywords from the dropdown, but in general most SQL engines' EXPLAIN features are leagues more useful than this.
Idea was to supplement the many websites that do explain SQL this way. But yes, I agree with that statement.
Did you ever see http://revj.sourceforge.net/ ?
Even more powerful (though an academic prototype): https://queryvis.com/
Neat tool too, thanks!
very tough to comprehend..
Can't you also do this with Microsoft Access or LibreOffice Base? I think it's called QBE, or Query By Example.
Really cool for joins. It would be nice to be able to adjust the speed after you already started an animation, and maybe play/pause. I also expected to be able to type into the duration box, maybe make it just a label rather than an input that can get focus.
Thank you for building this! I now understand the LEFT JOIN story shared a couple days ago.
Bug report: The buttons used to change the animation duration do nothing, probably because the buttons have the `disabled` attribute applied.
You can only change them before an animation runs. If you are in the middle of an animation, hit Stop/Reset and then adjust.
LIMIT not working. Try:

SELECT whiteRating, blackRating FROM ChessGames WHERE gameWinner='black' LIMIT 2 OFFSET 8;

You're the best :)! Let me fix that.
Very nice! I'd be happy to see more complex queries in the future. Do you plan any updates?
Initially, the idea was to not have keywords selection at all. Wanted to allow for typing any query and just visualise. But it was a bit too complex for the timeline. Was looking to plan this out shortly, but there are so many good suggestions here, I feel might need to overhaul a few other parts too.
Promising. Unusable on mobile since the examples span multiple pages
Overall, it seems good. I'll use it while studying SQL.
This may help you build a mental map of what happens. But this is all table scanning and doesn't include indexes, which organize data in a tree. When you do SQL, you should consider whether the table will contain thousands or millions of rows. If millions, you have to think about how that JOIN or WHERE will take indexes into account.
In a way, it is even deceptive, because naive programmers might be led to think that repeated(!) table scanning is what real database engines do.
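
To make the scan-versus-index difference concrete, here is a rough Python sketch. The data is made up, and a sorted list searched with `bisect` stands in for the tree; a real engine would use a B-tree, so this shows only the lookup idea, not how any particular database does it.

```python
import bisect

# Made-up table: 100,000 rows of (id, rating), stored in arbitrary "heap" order.
rows = [(i, (i * 37) % 3000) for i in range(100_000)]

def scan_lookup(rows, rating):
    """Table scan: inspect every row, counting how many were touched."""
    touched = 0
    hits = []
    for row in rows:
        touched += 1
        if row[1] == rating:
            hits.append(row)
    return hits, touched

# Stand-in for a secondary index: (rating, row position) pairs kept sorted.
# This is an assumption for illustration; real engines use a B-tree here.
index = sorted((rating, pos) for pos, (_, rating) in enumerate(rows))

def index_lookup(rows, index, rating):
    """Binary-search the sorted index, then fetch only the matching rows."""
    lo = bisect.bisect_left(index, (rating, -1))
    hi = bisect.bisect_left(index, (rating + 1, -1))
    return [rows[pos] for _, pos in index[lo:hi]]

scan_hits, touched = scan_lookup(rows, 1234)
index_hits = index_lookup(rows, index, 1234)

# The scan touches every row; the index lookup touches only the matches
# plus a logarithmic number of index entries.
print(touched, len(scan_hits), len(index_hits))
```
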

It's actually very rare that this happens. For example, SQL Server will create a temporary index on the fly if one is missing. It can create B-Tree, Hash, and Bitmap indexes which might be unexpected for some people because only B-Tree indexes can be created "permanently".

So in some ways database engines do even more than just use statically defined indexes.

> It's actually very rare that this happens

So I have had a little consultancy gig for a few decades now where I spend a few days a month optimising bad software for performance (it is what I like; I don’t do anything else but ‘make shit faster’). I can tell you that in the past 10 years 99% of the optimisations I did were fixing MySQL queries and indexes that table scan. I had projects where literally over 50% of the queries run were table scans. The result, as you know but apparently is not very common knowledge, is that these sites and apps grind to a halt (after incurring bizarre bills on AWS RDS; I moved many apps from $100k/month bills to $10/month) when even a little traffic comes in.

Or; table scans should be rare but are not.

Edit; removed ‘time’ as that was not a good way of expressing this

How do you build your intuition about creating queries in a way to avoid this?

Is it a matter of having a conceptual model of relational algebra and the way the different db engines work, or is it more an accumulation of heuristics over time for what probably will cause problems, and an iterative process of using EXPLAIN, adjusting the query and seeing what happens?

I am old :) But experience is a big thing; I can sniff by just skimming the table definitions where probably something is very wrong. In uni in the 90s I studied both relational theory and formal methods and I had to spend a lot of time figuring out and fixing complexity; if you take a university level book on big O complexity and work through it, you will have a good feeling what software can and cannot do and in what way. That has not really changed; we have more efficient and more cache, we have improved algorithms, but things that cannot be looked up in O(1) are still dangerous and possibly can incur enormous IO even with only a few million records. Naive developers see that things are blazingly fast locally on their laptops and that’s it. I have met, especially in the last few years (In my bubble this is getting worse, quite fast), quite a lot of lead devs that actually do not know what an index is for and so I see entire dbs without any or only on the id field. I know people (for some reasons) do not like ORMs that create tables and indexes, but it would prevent many rookie mistakes if they did.
I can’t remember the last time I came across a non-CotS database schema that has secondary indexes in a significant number. Like more than half a dozen for a hundred tables or more.

I’ve never seen a database use “advanced” features like clustered columnstore or even just page compression.

I just have an email in my inbox from this morning from a small vendor that “doesn’t recommend” columnstore for a database containing 10 TB of numeric metrics in one table.

That would compress to a few gigabytes and query times would go from minutes to milliseconds.

But they “don’t support it”.

Which I now translate to: “we haven’t even flipped through the manual and when we googled it in a panic we didn’t understand it.”

This is how your data is being managed at huge enterprises and government agencies around the world.

First realize that SQL is not a procedural language; you are only describing the result set. The data store will then create an execution plan, which is the actual code that gets run. Learn to read the explain output in your data store of choice (very few SWEs do this). If you have access to a database administrator in your company, befriend and learn from them. Read about how databases store and retrieve data: from SQL to data pages. Measure measure measure. Learn about different types of indexes and their trade-offs. Remember that “it depends” is the answer to almost every db question and that you should be thinking through all your code's interactions. That is the path to mastery of dbs.
IME just run the DB, pick up and look at the query plans for the most common/time consuming, then add indexes. That's 80% of it. So...

> Is it a matter of having a conceptual model of relational algebra and the way the different db engines work

...no....

> or is it more an accumulation of heuristics over time for what probably will cause problems

...no....

> an iterative process of using EXPLAIN, adjusting the query and seeing what happens?

...that's more like it!

Once you understand the underlying data structures, all the magic goes away. As it should.
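
For anyone who wants to try that loop without a server, here is a small sketch using Python's built-in `sqlite3` module. SQLite is swapped in purely for convenience (the thread is mostly about MySQL and SQL Server, whose plan output looks different); the table and column names are borrowed from the LIMIT bug report above, while the schema, the data and the index name `idx_winner` are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ChessGames ("
    "id INTEGER PRIMARY KEY, whiteRating INT, blackRating INT, gameWinner TEXT)"
)
# Made-up rows, just so the table is not empty.
conn.executemany(
    "INSERT INTO ChessGames (whiteRating, blackRating, gameWinner) VALUES (?, ?, ?)",
    [(1500 + i % 500, 1400 + i % 700, "black" if i % 3 == 0 else "white")
     for i in range(1000)],
)

query = "SELECT whiteRating, blackRating FROM ChessGames WHERE gameWinner = 'black'"

def plan(conn, sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Step 1: look at the plan. Without an index this is a full table scan.
before = plan(conn, query)

# Step 2: add an index on the filtered column (hypothetical name).
conn.execute("CREATE INDEX idx_winner ON ChessGames (gameWinner)")

# Step 3: look at the plan again and check that the index is picked up.
after = plan(conn, query)

print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions, so it is safer to look for the words SCAN and the index name than to match the whole string.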

Yeah, that works. My process is very different, but your advice is better as it hangs more on actionable tactics than learning intuition.

> Once you understand the underlying data structures, all the magic goes away. As it should.

Once you actually understand them, I feel you don’t need explain in most cases; you will simply ‘see’ why certain queries or definitions or structures are bad.

Would love to read about some examples / strategies in a blog post some day!
> For example, SQL Server will create a temporary index on the fly if one is missing

err, I may be misunderstanding but can you explain why you feel this? I have never (IME) seen MSSQL do this and it wouldn't make sense because constructing an index needs a table scan plus a lot of work on top. Just doing a hash join is simply the better option.

I mean it would be nice at times but there are traps to this which is why (again AFAICS) it's not done and would be unsafe to do without a lot of info about resources and future expected queries which the query planner just doesn't have.

Happy to be set right on this.

It won't create an index you can see in the GUI or query via "sys.indexes" or anything like that.

It's a temporary object, much like a temporary table that exists only in the scope of the query.

As another comment mentioned, this is what a hash-join does internally: it builds a temporary "hash index" of one input, and then uses it to look up rows while scanning through the other input.

If you look at the query plans in SSMS, you'll occasionally see bitmap indexes as well.

The equivalent of a standard B-Tree index that you would create permanently is the "Index Spool" operator. You'll also see "Table Spool", which is basically a temporary heap.

The example in the original article was the equivalent of this loop:

    foreach( a in table_a ) {
        foreach( b in table_b ) {
            if ( a.id == b.aid ) ...
        }
    }
That's hideously inefficient. Most databases will automatically do something like:

    var a_hash = new Hashtable( a.row_count )
    foreach( a in table_a ) {
        a_hash.add( a.id, a )
    }

    foreach( b in table_b ) {
        if ( a_hash.lookup( b.aid )) ...
    }
The clever part in all of this is that you can do this two ways: build a hashtable of "a" and lookup "b" rows in it, OR build a table of "b" and lookup "a" rows in it. They're equivalent, but the performance can be wildly different.
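
A runnable Python sketch of the two loops above, including both choices of build side. The sample tables and the helper names are made up, and this is only the textbook shape of a hash join, not how any particular engine implements it.

```python
def nested_loop_join(table_a, table_b):
    """Compare every row of A with every row of B: O(len(A) * len(B))."""
    return [(a, b) for a in table_a for b in table_b if a["id"] == b["aid"]]

def hash_join(build, probe, build_key, probe_key):
    """Build a hash table on one input, then probe it with the other."""
    buckets = {}
    for row in build:
        buckets.setdefault(row[build_key], []).append(row)
    # Each result pair is (row from the build side, row from the probe side).
    return [(match, row) for row in probe for match in buckets.get(row[probe_key], [])]

# Made-up data: a small table_a and a larger table_b that references it.
table_a = [{"id": i} for i in range(5)]
table_b = [{"aid": i % 7, "n": i} for i in range(50)]

expected = nested_loop_join(table_a, table_b)

# Option 1: build on A (small), probe with B.
build_a = hash_join(table_a, table_b, "id", "aid")
# Option 2: build on B (large), probe with A; pairs come back as (b, a), so swap.
build_b = [(a, b) for b, a in hash_join(table_b, table_a, "aid", "id")]

def canon(pairs):
    """Order-independent form of a join result, for comparison."""
    return sorted((a["id"], b["n"]) for a, b in pairs)

# All three strategies produce the same rows; only the cost differs.
print(canon(build_a) == canon(expected) == canon(build_b))
```

Building on the smaller input keeps the hash table small, which is the kind of decision a planner makes from table statistics.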

RDBMS query planners have the job of figuring out which to pick. Even if you think you can outperform the database by writing code like the above in Java or C# or whatever, you won't write out every combination and have the statistics available to choose. The database engine can and does.

SQL Server can do both steps in parallel across all CPU cores, which is a topic of several PhD-level research papers. For example, hash tables can have performance issues if the same key occurs too often (e.g. many NULL columns). Balancing this across multiple cores is... complicated.

> Just doing a hash join is simply the better option.

Maybe he meant just that - hash join has to populate hashtable before joining.

Regarding the B-Tree index, perhaps he was thinking of automatic index management - https://docs.microsoft.com/en-us/sql/relational-databases/au... - but it must be enabled explicitly. And I don't feel like it is "on-the-fly"; rather, it runs in the background.

I'm going to be that guy and just say that I wince at the use of emoji instead of words. It's not very accessible and I surely hope nobody actually uses emoji word replacements professionally.
I don't like it either. I got good news for you and bad news. The good news is that emoji usage seems higher in open source/side-projects than in professional environments. The bad news is that yes, some people, even professionally, do use emojis instead of words. It is horrible, but I'm afraid it's only gonna get more popular as the population of young people become... Not young and joins the professional workforce.
You mean stands beside the professional workforce.
Are you claiming that "young" people can't join the professional workforce? Are they joining some other, "unfessional" workforce? Is there an age limit at which you're Old(tm) and then are able to be a true professional?
Think of emojis as Egyptian hieroglyphs - we've been there, done that.

On the other hand, emoji are far more international - everyone knows what a thumbsup means ....

> everyone knows what a thumbsup means ....

In some places, a thumbs up means "Up yours!" Not very friendly at all.

Mea Culpa - not my intention :)

I guess the victory/peace sign would also be a bad example (and that amongst English speakers).

I’m sure the 3 developers from Iran/Iraq who don’t know English and would be offended by that are fine.
It does have an air of "yeah fuck you buddy" sometimes to get a thumbs up
<thumbs_up/>
I hope at least one emoji is universal...<3
Are you under the impression that screen readers can't read... emoji?

You can call them unprofessional, that's like, your opinion, man, but "not accessible"? Admit it, you made that up because you don't like them and you hoped it'll stick.

some emojis are fine like this one -> :-)
That's an emoticon ;)