Hacker News new | ask | show | jobs
by xyzzy_plugh 1025 days ago
Dropping down to SQL to write a really complex query is, in my professional experience, always a poor use of time. It's far simpler to just write the dumb for-loops over your data, if you can access it.

Of course not all engineers can operate with SQL as efficiently as code -- that's the whole point. Otherwise why would we be writing code? Learning SQL intimately doesn't change that fact.

7 comments

> Dropping down to SQL to write a really complex query is, in my professional experience, always a poor use of time.

We’re not talking about Assembly here, “dropping down” to SQL is something that anyone should be expected to do as soon as you’re grabbing or modifying any data from a database in any scenario where performance or integrity matters. The errors you can see in situations like this are extremely complex and databases literally exist to solve them for us.

Also, if we just completely disregard the performance for a second and focus on data security instead, how do you ensure sensitive data isn’t passed to the wrong party if you don’t care about what queries are being sent?

I mean, it doesn’t matter if it’s not “in the end” displayed to an end user in the application you’re writing, or if its not stored in the intermediary node where your code is running, that data is now unnecessarily on the wire in a situation where it never should have been in the first place. If you end up mixing one customers data with another’s and sending all of it in such a way that it could even theoretically be accessed by a third party, that’s a lawsuit waiting to happen regardless of whether it was “displayed” or “forwarded” or not.

Imagine if you sniffed the packets going to some logistics app you use on your phone and you saw meta-data for all packages in your zip code in the response, or if some widget showing you your carbon footprint actually was based on a response containing the carbon footprint of every customer in the database. Even if it’s just [user_id,co2] it’s still completely unacceptable.

Never mind scenarios where you’re modifying, adding or deleting data, those are even worse and no explanation should be necessary for why.

Obviously it greatly depends on what you're doing. If you're using a relational database as a glorified key-value store for offline or batch processing of a few hundred megabytes of data, sure. Hell, just serialize and unserialize a JSON document on every run if it's small and infrequent enough ¯\_(ツ)_/¯

If you've got a successful data hungry web service with a reasonably normalized schema and moderately complex access patterns though, you're not going to be looping over the whole thing on every page load.

SQL is definitely worth learning. Recently found that processing a 350kb json is equally fast by sending it to Postgres for processing, compared to using some dedicated Java libraries: https://ako.github.io/blog/2023/08/25/json-transformations.h...

This opens some interesting options if you want to join the result with data from your database.

Way way way slower though. I just added something to our app that took 600ms for the naive ‘search and loop’ version (and kept getting slower the more items you needed, completely unscaleable) vs the 30ms for the ‘real SQL query’ version. Guess which version got actually committed.
Did your for loop solution include concurrent access by multiple clients? I highly doubt "engineers can not operate with SQL as efficiently as code" can implement anything even remotely as robust as what SQL DBMS offer even for basic use cases. Are you mutating data? What will happen if the system crashes in the middle of the mutation? How are you handling concurrent writes and reads?
It's unclear whether you mean that it's simpler to make a query and iterate over the rows to massage the result in your application or to make a query and then iterate over the returned rows to make more single-row queries. (Or perhaps some secret third thing I'm not considering.)

I'll admit I'm a little curious about what exactly you mean here.

SQL is code.