Hacker News new | ask | show | jobs
by Winsaucerer 821 days ago
I feel like a foreigner in another land when I read your comment and others like it. For as long as I can remember using SQL, I can't remember ever finding it more difficult or backwards than anything else I use.

That difference might go some way towards explaining why I prefer a much more database heavy/thick approach to writing apps than my peers.

4 comments

I agree. I never even thought about "select what you want first" as a problem until someone else pointed out.

Programmers seem far too sensitive about wanting everything to work one way. SQL is a very powerful DSL. It has its quirks but nothing that ever enraged me. I don't really care that it doesn't work like some other stuff I use, I just accept that I'm learning the language of a particular domain. This doesn't mean that I don't think there is always room for improvement. Of course I think FROM first would be a little nicer, but so much nicer that I think its worth changing a whole battle-tested standard? Not at all. The pain is so minimal I don't even feel it.

> I never even thought about "select what you want first" as a problem until someone else pointed out.

I thought it was a problem as soon as IDEs had good SQL autocomplete. I got so used to depending on just being able to "tab my way through" autocompleting in other languages (e.g. if you do <objectVariable>.<propertyName>, it's obvious the set of property names can be narrowed down based on the type of the variable), that it immediately becomes apparent that doing select first sucks, because autocomplete has no good information until you get to the from clause. A lot of times with a good SQL editor like Datagrip I just do "SELECT * FROM foo" first, and then go back and edit the select columns because it can now autocomplete them quickly.

I now notice this in other places too, like I hate how in JavaScript you do `import { foo } from "moduleBar"`. I'd much rather do `from "moduleBar" import { foo }`.

Hey, that's fair! I'm not a big autocomplete user so I never thought of this, but it's a good argument.

> I now notice this in other places too, like I hate how in JavaScript you do `import { foo } from "moduleBar"`. I'd much rather do `from "moduleBar" import { foo }`.

Personally I prefer languages that don't make you import at all ;)

The actual ISO standard falls well short of being useful/sufficient to anyone who isn't an incumbent player. It's effectively a moat and therefore a direct impediment to competition from teams who have novel technical ideas but don't have access to significant capital - building a SQL implementation is a long, expensive journey. This is why many startups resort to building Postgres extensions, or using Calcite or DataFusion.

If SQL weren't so (needlessly) complex we would see much more competition across the database space.

> If SQL weren't so (needlessly) complex we would see much more competition across the database space.

I think there is more competition across the database space now than back when the SQL spec was less complex (say, in 1989 with SQL-89).

Also, much complexity in the spec comes from complex features; I really like grouping sets and window functions, and sure, that adds complexity; but it does allow users to express certain concepts that allow the database to more efficiently process data than sending everything to the user and letting the user solve the computations.

LINQ runs with FROM being first. Definitely trivial difference but a bit easier.
Ya I use Ecto which is the defacto Elixir SQL abstraction. It's heavily inspired by LINQ though only works with SQL. In any event, it also starts with FROM and I always end up writing my selects last. I've just never felt particularly annoyed writing them first in SQL (and I've written a LOT of raw SQL) I'd just do it without thinking about it. Never thought of it as a big deal.

The big problem with SQL AFAIC is its poor (really complete lack of) composability.

> I feel like a foreigner in another land when I read your comment and others like it. For as long as I can remember using SQL, I can't remember ever finding it more difficult or backwards than anything else I use

Learn linq or query/list comprehensions and then you'll easily see why SQL is backwards.

I've been using Django almost as long as I've been using SQL and I prefer the SQL ordering more: it matches the rest of the code, making it faster/easier to read. As a crude example:

  SELECT results FROM source WHERE criteria

  results = source(criteria)
It's rare to see someone want to change assignments in code to be like:

  source(criteria) -> results
Where I see it as the same thing: the SELECT columns, like the variable assignment, are the interaction point with the following lines of code.

And yes, CTE ordering does annoy me because of this. Putting it in the middle is pretty much the worst order.

> Where I see it as the same thing: the SELECT columns, like the variable assignment, are the interaction point with the following lines of code.

Indeed, which is why source(criteria) -> results makes more sense: the results definition is right next to the code that's going to be using that definition. If you put the results definition first as with SQL, then you have to scroll up to find the context (although perhaps Python's indentation sensitivity is the tripping point in this case). Not even mentioning the fact that the SQL way completely destroys any chance of code completion.

I'm going to boldly state that the SQL way is literally objectively wrong, in that there is no world in which SQL's choice is superior for general querying.

Then why are you advocating for it?

> or query/list comprehensions

List comprehensions are column first.

Right, here's the nuance: list comprehensions are intended to be concise one-liners, so having the results definition far off to the right defeats the principle I was outlining. Most SQL queries are not like this, they are almost always multiline of the form:

    select x, y, z
    from Foo
    where a or b
Here the opposite is the case: selection-first moves the return definition far from the subsequent code that uses it.

So if you're going to support list comprehensions, a monadic do-style notation which lets you chain them and again places select last:

https://wiki.haskell.org/List_comprehension#List_monad

Your first example should be more like

  return source(criteria).results
In your SQL, `results` isn't the variable you're assigning to, it's the column you're reading from source.
I learned SQL before I learned set theory. While learning set theory I remember thinking "oh this notation is just SQL backwards." Afterwards I began to find SQL much harder because I realized there are so many ways to mathematically ask for the same data, but SQL servers will computationally arrive at the end differently and with very different performance. This is a minor deal if you're just doing small transactions on the database, because if you are dealing with pages of 100 objects it's trivial to hit good-enough performance benchmarks, even with a few joins.

I was first introduced to the issue of needing hyper optimized SQL in ETL type tasks, dealing with very large relational databases. The company switched to non-relational database shortly after I left, and it was the first time I professional witness someone make the switch and agreed that it was obviously required for them. We were dealing with very large batch operations every night, and our fortune 500 customers expected to have the newest data and to be able to do Business Intelligence operations on the data every morning. After acquiring bigger and bigger customers, and collecting longer and longer histories of data, our DBA team had exhausted every trick to get maximum performance from SQL. I was writing BI sql scripts against this large pool of SQL data to white-glove some high value customers, and constantly had to ask people for help optimizing the sql. I did this for a year at the beginning of my career, before deciding to move cities for better opportunities.

Lately, I've began seeing the requirements of high performance SQL again with the wave of microservice architectures. The internal dependency chain, even of what would have been a mid size monolith project a decade ago, can be huge. If your upstream sets a KBI of a response time, it's likely you'll get asked to reduce your response time if your microservice takes up more than a few percentage points of the total end to end time. Often, if you are using relational SQL with an ORM you can find performance increases in your slowest queries by hand writing the SQL. Many ORMs have a really good library for generating sql queries they expose to users, but almost all ORMs will allow you to write a direct sql query or call a stored procedure. The trick to getting performance gains is to capture the SQL your ORM is generating and show it to the best sql expert that will agree to help you. If they can write better SQL than the ORM generated than incorporate it into your app and have the SQL expert and a security expert on the PR. You might also need to do a SQL migration to modify indexes.

So in summary, I think your experiences with SQL depends heavily on your mathematical background and your professional experience. It's important to look at SQL as computational steps to reach your required data and not simply as a way to describe the data you would like the SQL server to give you.

Was this before BigQuery/Presto/Trino? To me it seems like those technologies would have been a good fit.

They don't really work with indexes but instead regular files stored in partitions (where date is typically one of them).

This means that they only have to worry about the data (e.g. dates) that you are actually querying. And they scale up to the number of CPUs that particular calculation needs. They rarely choke on big query sizes. And big tables are not really an issue as long as you query only the partitions you need.

Those technologies were brand new at the time, the discussions about the problem started in 2013. The company (I had zero input) choose a more established vendor with an older product. Given the time and institutional customers that were trusting us with their data, I suspect any cloud based offerings were a nonstarter, and open source felt like a liability.

Of course with 20/20 hindsight that decision is easy to criticize. I suspect their primary concerns were to minimize risk and costs while meeting our customer's requirements. Even today, making a brand new Google product or Facebook backed open source project a hard dependency would be too much risk for an established business.

> I can't remember ever finding it more difficult or backwards than anything else I use."

This is the major problem. SQL looks like is not "difficult". You don't see (as a user) all their MASSIVE, HUGE, problems.

That is why:

- People barely do more than basic SQL

- People can't imagine SQL can be used for more than that, which leads to:

- Doing a lot of hacky, complex, unnecessary stuff on app code (despite the RDBMS being capable of it)

- Trying to layer something "better" in the forms of ORM

- Refusing to use advanced stuff like views, stored procedures, custom types, and the like

- Using of using advanced stuff like views, stored procedures, custom types, and the like, but wrongly

- Thinking that SQL means RDBMS

- So when the RDBMS fails, it is because the RDBMS is inferior. But in fact, is SQL that have failed (you bet the internals of the RDBMS are far more powerful than any NoSql engine, unfortunately, they are buried forever because SQL is a bad programming interface for the true potential of the engine!)

- So dropping SQL/RDBMS for something better, like JS (seriously?)

- And they are happier with their "cloud scale" NoSQL that rarely performs better, needs major, massive hacks for queries, or reimplements, poorly, ACID again, is more prone to data issues, etc.

And this is not even starting. If you think "is bad to make a full app, all their code, in relational model" that is how much brain damage SQL has caused.

---

I can count with my fingers the number of semi-proper DBs/SQL usage on my niche (ERPs) and that is mostly mine! (For example: I use dates for dates, not strings, like many of my peers!) and that is taking into account that I actually learned what the heck is that "relational" thingy after +20 years of professional use.

Go figure!

P.D: And then go to my code and see "what the heck, I could have done this in some few lines of SQL" and "what the heck, if only SQL were well designed I could do this dozen lines of SQL in 3!"