Hacker News new | ask | show | jobs
by hnarn 1027 days ago
> not all engineers can operate with SQL as efficiently as with code

I don’t mean for this to sound insulting but I honestly do not think this is an acceptable take to have as a developer.

Not knowing SQL is like refusing to learn any language that has classes in it, simply because you don’t like it.

I’ve heard stories of huge corporations failing product launches because some code was written to SELECT * from a database and filtering it in-app instead of doing the queries correctly, and what’s so fun with these types of issues is that they usually don’t appear until weeks later when the table has grown to a size where it becomes a problem.

When you’re saying that you’d rather find the data in-app than in-database, you’re putting the work on an inferior party in the transaction simply because you can’t be bothered.

The code will never* find the correct data faster than the database.

* there may be exceptions, but they’re far enough between to still say “never”.

2 comments

Dropping down to SQL to write a really complex query is, in my professional experience, always a poor use of time. It's far simpler to just write the dumb for-loops over your data, if you can access it.

Of course not all engineers can operate with SQL as efficiently as code -- that's the whole point. Otherwise why would we be writing code? Learning SQL intimately doesn't change that fact.

> Dropping down to SQL to write a really complex query is, in my professional experience, always a poor use of time.

We’re not talking about Assembly here, “dropping down” to SQL is something that anyone should be expected to do as soon as you’re grabbing or modifying any data from a database in any scenario where performance or integrity matters. The errors you can see in situations like this are extremely complex and databases literally exist to solve them for us.

Also, if we just completely disregard the performance for a second and focus on data security instead, how do you ensure sensitive data isn’t passed to the wrong party if you don’t care about what queries are being sent?

I mean, it doesn’t matter if it’s not “in the end” displayed to an end user in the application you’re writing, or if its not stored in the intermediary node where your code is running, that data is now unnecessarily on the wire in a situation where it never should have been in the first place. If you end up mixing one customers data with another’s and sending all of it in such a way that it could even theoretically be accessed by a third party, that’s a lawsuit waiting to happen regardless of whether it was “displayed” or “forwarded” or not.

Imagine if you sniffed the packets going to some logistics app you use on your phone and you saw meta-data for all packages in your zip code in the response, or if some widget showing you your carbon footprint actually was based on a response containing the carbon footprint of every customer in the database. Even if it’s just [user_id,co2] it’s still completely unacceptable.

Never mind scenarios where you’re modifying, adding or deleting data, those are even worse and no explanation should be necessary for why.

Obviously it greatly depends on what you're doing. If you're using a relational database as a glorified key-value store for offline or batch processing of a few hundred megabytes of data, sure. Hell, just serialize and unserialize a JSON document on every run if it's small and infrequent enough ¯\_(ツ)_/¯

If you've got a successful data hungry web service with a reasonably normalized schema and moderately complex access patterns though, you're not going to be looping over the whole thing on every page load.

SQL is definitely worth learning. Recently found that processing a 350kb json is equally fast by sending it to Postgres for processing, compared to using some dedicated Java libraries: https://ako.github.io/blog/2023/08/25/json-transformations.h...

This opens some interesting options if you want to join the result with data from your database.

Way way way slower though. I just added something to our app that took 600ms for the naive ‘search and loop’ version (and kept getting slower the more items you needed, completely unscaleable) vs the 30ms for the ‘real SQL query’ version. Guess which version got actually committed.
Did your for loop solution include concurrent access by multiple clients? I highly doubt "engineers can not operate with SQL as efficiently as code" can implement anything even remotely as robust as what SQL DBMS offer even for basic use cases. Are you mutating data? What will happen if the system crashes in the middle of the mutation? How are you handling concurrent writes and reads?
It's unclear whether you mean that it's simpler to make a query and iterate over the rows to massage the result in your application or to make a query and then iterate over the returned rows to make more single-row queries. (Or perhaps some secret third thing I'm not considering.)

I'll admit I'm a little curious about what exactly you mean here.

SQL is code.
There’s a difference between writing olap and oltp sql queries. Hell, in the industry we even have a dedicated role for people who, among other things, write olap queries: data analysts. I’m assuming here that we are talking about writing complex analytical queries.