Hacker News new | ask | show | jobs
by johnvaluk 1903 days ago
Overall an enjoyable read, but as someone who includes SQL queries in code, I disagree with two points:

I despise table aliases and usually remove them from queries. To me, they add a level of abstraction that obscures the purpose of the query. They're usually meaningless strings generated automatically by the tools used by data analysts who rarely inspect the underlying SQL for readability. I fully agree that you should reference columns explicitly with the table name, which I think is the real point they're trying to make in the article.

While it's true that sorting is expensive, the downstream benefits can be huge. The ability to easily diff sorted result sets helps with troubleshooting and can also save significant storage space whenever the results are archived.

5 comments

Table aliases make queries much less verbose than using the full table name, and more readable as a result. Aliases are unavoidable when you're joining the table more than once. Not using qualified identifiers is just asking for trouble.

Short aliases - I tend to use the first letter of each word in the table name - work best, IMO.

I tend to use two character because you might join similar named tables or the same table multiple times.
I'm the opposite. I like the short aliases, esp. when there are many tables. Short aliases can all be the same length, and therefore align better for better readability, and they don't pollute the visibility as overly-verbose table names do.

In code, I like the length of the variable name to be proportional to the size of the scope. Small scope -- short variable names.

To each their own, but in the case of ETL/ELT, you would just be asking for pain not using aliases.
Even there someone needs to read them eventually than just the person who wrote it. Single letter aliases are just evil. In some ways it’s the same as doing: String x = “Hello”
> Even there someone needs to read them eventually than just the person who wrote it.

That’s not an argument against table aliases, its an argument against unclear table aliases.

Single letter table aliases are better than just using unqualified column names, both of which are worse than table aliases guided by the same naming rules you’d use for semantically-meaningful identifiers in regular program code.

In the case of ETL you should only be referencing those tables a few times because you are integrating them into friendly analytic models. In that case you probably have a lot of columns to wrangle and complex transformation logic. In those cases I prefer to use no alias at all to avoid the scrolling around to get context, even when table names are very long.
Aliases are pretty much mandatory if you need to join the same table twice; otherwise you have no way to disambiguate column references. I understand the case of auto-gen’ed queries. However, in page long queries that I’ve written by hand, short aliases (typically the acronym of the table name, sometimes with a one-letter suffix if the table is joined multiple times) make the whole query flow like a nice story.
I got descriptive table names like WhsTransactionGoodsItems and WhsTransactionGoodsItemPackages.

I feel it would be rather noisy to have to specify such table names in front of the 15+ column references in a query, compared to using aliases.

Then again I've never had to diff the result sets, so I guess our usage is quite different.

My editor has tab completion, it's not like you have to type every character. I prefer it for readability but it's definitely debatable.
Readability is weird heh, I'm exactly the opposite. For me long lines become a blur, so better to have a short alias.
I know what you mean. I think this is really a matter of readability vs comprehension. I want all the context on one line so I can understand it, even if that does technically make it harder to read that line.

This is especially a problem in huge queries.