Hacker News new | ask | show | jobs
by tibiapejagala 1607 days ago
The problem with sql is what happens when you fall off the SELECT FROM JOIN WHERE GROUP BY HAVING ORDER BY LIMIT cliff. The simple stuff in sql reads like English, but for that case ORM would generate a pretty efficient query anyway. The complex stuff in sql looks terrible in my experience and ORM bail out quickly. Once you can’t get the result with a simple SELECT then sql stops being declarative. Instead of writing what you want to get, you write something like a postmodern poem while having a stroke, just to convince postgres benevolent spirits to give you something almost right. Complex UPDATEs and DELETEs with joins are even worse.

Also lack of syntax sugar doesn’t help. SELECT list could support something like “t1.* EXCEPT col1, col2”. Maybe JOIN ON foreign key would be nice. IS DISTINCT FROM for sane null comparisons looks terrible. Aliases for reusing complicated statements are really limited. Upsert syntax is painful. Window functions are so powerful that I can’t really complain about them though.

We use a lot of sql for business logic, but some code I have to reread from zero every time I need it. Maybe we modeled our data wrong or there is some inherent complexity you can’t avoid, but I mostly blame sql the language. Unfortunately I have no idea how it could be improved.

Anyway I think the sql cliff is real. Once you take a step outside the happy path prepare for a headache. For me sql definitely is in some local maxima, after all I use it every day at work.

3 comments

The biggest thing is... SQL is not reusable, period.

Why don't we have SQL libraries?

I know that data models are kind of special snowflakes, but some models pop up over and over and over again and code reuse is always 0 with SQL.

To give you an example of a common problem, SLAs or the like for teams with regular business hours.

A team has to respond to a request within N hours. To calculate that I need to take into account 8 business hours per day, excluding weekends, excluding holidays (ideally localized holidays), etc.

It's a nightmare with SQL. It's precisely the kind of thing you want in a library.

Plus, obviously, standard SQL doesn't have a way to share and distribute any libraries, even if they were made. It's pre-C in terms of stuff like that.

And the core issue is, sql is just strings. It only fails at runtime and not a compile time. There is no compile-time strong/static typing, you only find out its broken when you run it. So it makes it really difficult to re-use. The other problem is, you have to specify table and column names in queries (duh, how else), but that means the entire persistence model is hard coded and need manual tweaking and so on. In an ideal universe, each field would be generic, each object generic and each collection of objects generic and all functions being generic. You'd end up with one huge flat structure of fields that are linked to each other to form more complex objects. But that not efficient in current computers nor do we have enough memory to store all information like that. And you might end up back at square one where everything is just strings.
> The other problem is, you have to specify table and column names in queries (duh, how else), but that means the entire persistence model is hard coded and need manual tweaking and so on. In an ideal universe, each field would be generic, each object generic and each collection of objects generic and all functions being generic.

Yeah, define the database in a database. And then define the database for the definition of the database in a database.

If you'd like to deliver a product that does something, you've got to stop adding abstraction layers at some point.

The thing is, for sure we could define some use cases where it's possible to take that general case and make it specific. That's why I gave a concrete example.

I refuse to believe that absolutely every data storage and access in this world is unique.

Everybody believe that it's unique, and that's a different story.

OS vendors also thought their hardware was special and magical a long time ago and yet POSIX was invented and suddenly they were all more or less commoditized.

I feel that we're in the teen years of data storage/data access technologies. And SQL is sort of like dental braces.

A long long time ago I sent in a patch to Hibernate to check the validity of all declared HQL queries.

There's no reason why it can't be checked, you just need to have your schema declaration.

Name clashes are everywhere. That's why you have namespaces/packages. In SQL they are called schemas. People don't really use them these days.

The tweaking is one of the great things. If you're hardcoding queries (not sql), you're actually defining the order of operations etc. A query analyzer will use statistics, and you can hint how queries have to be executed, depending on the shape of your data.

Your code is also just strings, until you compile it. Actually, these days until you run it. Your arguments would make more sense in the 90s where people would actually compile code.

You ideal universe is actually "The Inner Platform Effect". Better let pgsql be the data platform ;-)

> Your code is also just strings, until you compile it. Actually, these days until you run it. Your arguments would make more sense in the 90s where people would actually compile code.

His arguments made sense in the 90s, and amusingly, post 2015.

Your argument made sense in the 00s and before 2015.

Swift is compiled (Apple platforms; Objective-C has always been compiled).

Rust is compiled (multiple platforms).

Typescript is compiled (so web/Javascript).

Kotlin is compiled (Android; Java has always been compiled).

C/C++ were always compiled (POSIX; Windows).

C# was always compiled (Windows; POSIX).

Almost every modern language is compiled and if it's not, it's getting a very solid static analysis step that for sure you want to have and run (PHP got types a while back, Python is getting them, Ruby is getting them).

The other problem is, you have to specify table and column names in queries (duh, how else), but that means the entire persistence model is hard coded and need manual tweaking and so on.

This resonates with me If I understand you correctly, with RDBMS/SQL, the structural decisions you make in the database to represent your data "poison" your application making it difficult to change over time.

The data model is the code reuse.

You can model business hours and SLAs with relationships. Join on time and team.

  SELECT support_request.request_time + team.response_sla AS respond_by_time
  FROM support_request 
  JOIN team_sla 
    ON support_request.assigned_team = team_sla.team_id
   AND DATE_PART('day', support_request.request_time) = team_sla.day_of_week
   AND DATE_PART('hour', support_request.request_time) BETWEEN team_sla.start_hour AND team_sla.end_hour;
It's quite impressive how you've missed the core of my message.

Now bundle up your proposal, put it up on Github, license it as MIT, and publish it available on sqlpm.org (SQL Package Manager) so that I can re-use it.

What's that you say? I can't? There's no sqlpm.org? Not even a postgresqlpm.org?

Where's the SQL ecosystem?

Oh, wait, there isn't any because SQL is not really reusable. It's <<all>> one-off scripts, like back in the Dark Ages of software development.

I always install https://github.com/awslabs/amazon-redshift-utils/tree/master... on my Redshift clusters.

What you are describing as code reuse exists for databases, but they are called applications and generally utilize a general purpose programming language. It doesn’t make sense to have a SLA data model library because every use case is different. It’s a database, not procedural code.

You must have not seen enterprise apps.

There's a reason monstrosities like SAP exists, they're practically what you describe.

If stuff like SAP is the future of Line Of Business (LOB) apps, instead of having a rich Open Source ecosystem of data storage and data access libraries, then we've lost.

We're locked in the trunk.

SQL is not modular (the root cause of why it’s hard to reuse) because it’s declarative. If the underlying data models stay relatively stable, you can reuse code with certain assumptions. There’s a trade off here for sure.
No that's not why it's not modular. If that was the case, then other relational algebra-based query languages would suffer the same problem. They do not.
SQL can be somewhat reusable through views. Also in your code you can make reusable functions that contain SQL queries. You won't get the best performance reusing the queries this way, but you can put them into a transaction and build up more complex flows from reusable pieces that way
Postgis and many other libraries exist. They are widely used.
Is Postgis implemented in SQL?
That's how I remembered it from back in early days. There was some pl/sql you had to execute, and you'd have some extensions. I don't think it's like that anymore
No man you are right. There is nothing arcane here: sql is just an ugly and unintuitive language. I think sql works the opposite ways normal brains work so unless you are doing a simple query, you have to be able to parse and simulate what sql will do in your head. After doing functions/MV's with CTE's that are 100+ lines per query daily, you kinda get used to but also not. I've concluded that although sql is awesome, it is also full of warts.

What helps me is to code all of it in lower case and use something like that Datagrip with a good theme. That way you get something that is readible, colour coded and has autocomplete (very good with joins). It's the only way I've managed to keep my sanity as my experience with it grew. Bad data models doesn't reallllly impact it that much, sql is still sql even with a clean model.

I've built mini database engines in the past because of my frustrations with sql but I still use and prefer an actual rdms as opposed to trying to reinvent the wheel. There are so many features we take for granted it's not even funny. Try building your own production-ready storage system and you would quickly appreciate how deep the rabbit hole really goes.

I have used those features that you say "Would be nice to have." I didn't realize they weren't ubiquitous. I agree they are excellent.