Hacker News new | ask | show | jobs
by Everlag 2171 days ago
I am curious how many people mock their SQL queries and how they decide to mock vs not mock at that level.

From the perspective of the services I'm knee deep in at the moment, SQL feels like the wrong layer to mock at. Instead, when we do want to mock, we usually pass around mocked out data access objects. That way we work at the application layer of `GetN` rather than `SELECT a, b, c FROM N`.

It feels like a lot of the value of the tests we write that do execute sql is a result of them actually running it. Granted, it is slower than mocking, requires synchronous rather than parallel tests, and we need to wipe the impacted tables for every test. However, the value we get is that we know that a query actually has the intended semantic.

6 comments

In unit tests, I like to only test a single method in isolation, with all of it’s dependencies mocked. So in a domain layer method, that depends on some infrastructure layer code that itself does the DB access, I mock the DB access layer. And the DB access layer already hides the SQL, so you’re mocking “normal” objects/methods, not SQL.

However, when unit testing the DB access code itself, I like to connect to a real-but-embedded database (for example with JVM projects, these guys have some great embedded DB libs: https://github.com/flapdoodle-oss), with a different DB per test class (different test classes run concurrently), but no cleanup between test cases (which run sequentially). This lets me test the SQL queries themselves.

Integration tests are a different story, they run against actual deployed versions of services, with real everything (including databases), nothing is mocked.

Typically there is no issue with running parallel tests against a real database. I just wrap every test case in a transaction and turn the transactions into subtransactions. This way you can just rollback at the end of each test.
We typically have multiple options in our tests - some SQL logic is testable in sqlite itself, while others (crap like arrays and datediffs) we try to run on an actual postgres instance. Snowflake has been the most annoying, since it doesn't support transactions and is also relatively slow anyways.
I agree with you. SQL needs to be integration tests. In this era of Docker spinning up a DB should be easy.
Yep, this is the pattern I usually use. It's useful to put tests that hit a real DB behind a build flag, for situations where a DB might not be present. Also since I prefer to bake migrations into my application, those get thoroughly tested as well.
We require that CI builds are self-contained. So if you need a database, run it. Others mentioned Docker. Specifically there is Testcontainers. If you need cross-service integration, those don't belong in your CI builds. (So the tech/setup can also be a bit different.)
I may be naive, but do people with slow DB tests put the DB in a tmpfs to improve performance?
We do this all the time. Our tests run against SQLite (I use Mariadb in prod).

Each parallel testrun gets its own path in /dev/shm/xx so everything is in ram as well.

If you're using SQLite, why the indirection of an in-memory filesystem rather than SQLite's in-memory database feature?
Not the person you're asking this, but N connections to sqlite :memory: backend results in N separate database instances.

This can be worked around in some scenarios, but not always & reliably. Most importantly, you can't reach the same in-memory database from multiple processes.

https://www.sqlite.org/inmemorydb.html

I have used libeatmydata to help with this, but you just made me realize that in my current project I could just dump the test db to RAM. Thanks for the tip!
For integration testing I'm sure people do all sorts of "interesting" things. For unit tests in an automated CI build, it is often more desirable to have a smaller subset of data, which presumably is less slow, since tests there also focus on more isolated workflows.