Hacker News new | ask | show | jobs
by jason2323 1277 days ago
> It's quite another when experienced seniors ban the use of SQL features because it's not "modern" or there is an architectural principle to ban "business logic" in SQL.

While I agree with the idea of pushing heavy compute to where the data resides, I wholeheartedly disagree with the statement quoted above. Concentrating business logic in SQL makes it effectively untestable. Its not easy to "compartmentalize" SQL code such that each individual piece is testable on its own. Often, this lets major issues go unnoticed until the SQL query executes in production on some unexpected input data and people are woken up at 3AM. Adding on top of this the fact that SQL exceptions are a PITA to debug on a normal day, and it probably isn't any easier at 3AM

5 comments

Are you talking about stored procedures? SQL itself is super easy to test: insert data into tables, run query, compare output. It's also easy to test queries on production data since every db has a repl. Queries are also mostly composed of read-only, referentially transparent parts so it's super easy to take snippets and test/run them in isolation. For complex updates you join the table you want to update to a read-only query with all the logic to calculate the new values.

SQL is probably the easiest language there is to write well tested, composable, easy to debug code.

> Queries are also mostly composed of read-only, referentially transparent parts so it's super easy to take snippets and test/run them in isolation

True enough, but now you need to ensure that the snippets copied out into tests are in sync with the in-line versions embedded inside your 6 screen long sql query.

In mssql you have "inline tabled valued functions" which you can declare and reuse, and they are inlined and optimization happens across them. We use sqlcode [1] to get around weaknesses in deploying stored functions.

Big drawback though is that functions can only take scalar arguments, not table arguments.

Another method is to just have a bunch of CTE clauses, and append something different to the end of the CTE depending on which parts of it to use (i.e. some string assembling required).

[1] https://github.com/vippsas/sqlcode

Easy to debug and write well-tested - mostly yes. Composable is tough though (make sure all your table aliases are unique, that there are no unambiguous column names, that all your ANDs are in place and don't forget that 1=1 to make it easier to uncomment parts. You already need a somewhat sophisticated query builder just to compose multiple JOINs cleanly. Recently I needed to get a query builder-ish thing going with arbitrary number of conditions which should be done as JOINs and the most I could muster to keep it at least a little manageable was a "pkey IN({literal_subquery})". It does compose as in "yes you can do it" but I wouldn't say it composes very conveniently.
Excuse me, but can you show me where in the SQL language documentation examples I can find how info about its unit test harness framework, teedeedee best practices, mocking, shimming, and dependency injection? And that's just the bare minimum but I start with those as the first thing to learn in any new language.
At my first real programming job, I found an error in the SQL functions that the company had written. After pointing it out, the CEO bet me that I couldn't fix it. Apparently their best programmers had tried and failed.

I did end up fixing it, but it took me a couple weeks.

Beyond being untestable, it's also hard to version control and roll back if there's a problem. We had some procedures around SQL updates that were kind of a pain because of that.

Regarding SQL being "untestable"...

I have put SQL DDL+DML+stored procedures in version control, create/run stored procedure (TDD) unit/integration tests on mock data against other stored proceedures, had pass-fail testing/deployment in my CICD tool right alongside native app code, and done rollback, all using Liquibase change sets (+git+Jenkins).

Using Liquibase .sql scripts for version control isn't hard. Testing is always more-work but it's doable.

I don't completely disagree with you on rollback though as hard, at least full pure rollback-from-anything. Having built tooling to do it once with Liquibase I found the effort to guarantee rollback in all circumstances took more effort than it was worth. A lot of DDL and code artifacts and statements like TRUNCATE are not transaction safe and not easy to systematically rollback. Liquibase did let you specify a rollback SQL command for every SQL command you execute so you could make it work if you had the time, but writing+testing a rollback SQL command for every SQL command you execute wasn't worth it and is indeed materially more effort than just rolling back to earlier .war/.jar/.py/docker/etc files. (The latter are easier in part because they are stateless of course.)

In any case, something like Liquibase can get you a long ways if you have the testing mindset. (Basically it lets you execute a series of SQL changesets and you can have preconditions and postconditions for each changeset that cause an abort or rollback.)

If you mean 5+-page SQLs are untestable, I don't disagree. Like any code you need to break it up once it gets past a certain size.
> Like any code you need to break it up once it gets past a certain size.

And SQL doesn't give you the tools to do that. There's no easy or natural way to split a statement into smaller parts.

SQL doesn’t give you tools to break up code?? Err, I don’t agree there. I can think of four options just off the top of my head.

Simplest is to use WITH clauses (common table expressions/CTEs). They can help readability a lot and add a degree of composability within a query.

Second, you can split one query into several, each of which creates a temporary (or intermediate physical) table.

Third, you can define intermediate meaningful bits of logic as views to encapsulate/hide the logic from parents. For performance you materialize them.

Fourth, you can create stored procedures which return a result set and like any procedural or functional language chain or nest them.

These techniques are available in most databases. More mature databases support forms of recursion for CTEs or stored procedures or dynamic SQL.

As with most programming, proper decomposition and naming helps a fair bit.

> Beyond being untestable, it's also hard to version control and roll back if there's a problem.

Recently, I've learned that SQLServer supports synonyms. So you version functions / procedures (like MySP_1, MySP_2, etc...) and establish a synonym MySP -> MySP_1. Then you test MySP_2 and when ready, change the synonym to point to MySP_2. Of course, all code uses just the synonym.

We use sqlcode as a more general solution to this:

https://github.com/vippsas/sqlcode

I'm not sure where this idea comes from. There are unit testing frameworks for T-SQL and pl/sql. Stored procedure code can be version controlled like any other code.
We care A LOT about testing everything, always.

We use SQL container [edit: new, freshly cloned DB for a test function in less than a second] and find it to be no problem in practice to write integration tests for our pieces of Go code that calls the SQL queries with some business logic in them.

(No, we don't do a sprawling mess of stored procedures calling each other. We just try to not move data over to backends ubless we really need to)

If you really need complex flow, making connection scoped temp tables for temporary results in SQL and having the composability/orchestration through backend functions calling each other and passing the SQL connection between them is doable.

Yes you cannot unit test every small line of SQL, but since SQL is much higher level that isn't really needed. Test the functional behaviour / inputs/outputs and you are fine..

It really isn't different for writing for a GPU in a sense.

> Concentrating business logic in SQL makes it effectively untestable. Its not easy to "compartmentalize" SQL code such that each individual piece is testable on its own.

What are you talking about? SQL is just as easy to compartmentalize and test as anything else.

Each query statement belongs in a function -- there, compartmentalized. Now set up a table state, run the function, and compare with new table state. The test either passes or fails.

Also no idea why you'd think SQL is a PITA to debug. It's a relatively compact and straightforward language once you learn it, and queries are self-contained. It's generally much easier to debug a query than it is to debug something happening somewhere across 10,000 LOC across 400 functions.

It's totally possible to validate data before pushing to a table, and have normal tests around sql massaging it. Saying this as somebody looking at 1000s of transformations done by my teams.

Admittedly, It took a while for data engineers in the industry to accept these practises though.

PS but stored procedures and udss are evil, yes