Hacker News new | ask | show | jobs
by latch 1672 days ago
> For some time proponents of unit testing have asserted that unit tests should not touch a real database

Is that still a belief in some circles? I feel like the shift away from this started like 15 years ago (largely because of RoR in my mind).

Anyways, this essentially launches a pg instance with a postgresql.conf that is aimed for speed (at the risk of possible data loss/corruption). DO NOT DO THIS IN PRODUCTION, but I just bake the following in local/test/CI:

     fsync = off
     synchronous_commit = off

Some other things I've seen / done in the name of testing with PG:

- Use unlogged tables. Maybe it's a bit faster, never really seemed to make much a different.

- Drop all foreign keys. This has significant non-performance impact. On the downside, it materially changes the environment that your tests are running vs your system: test could now pass for otherwise invalid code. On the plus side, it makes setting up fake data _a lot_ easier.

- Run tests with a pool_size of 1. This catches cases where you start a transaction, but forget to use the transaction object, e.g.:

    db.begin_transaction(tx ->
       tx.query("update x")
       db.query("delete y") // should have been tx.query
    )
- A robust Factory library (NIH or OSS) is pretty probably the most important thing you can do

- If you can't pre-seed all the test data, try to write _all_ your tests without ever having to truncate/delete tables between tests. This (a) avoids the slow delete/truncate and (b) lets you run tests in parallel. This means using random keys to avoid duplicates (obviously uuid is the simple answer here) and not selecting all data from a table when verifying/asserting (since who knows what's going to be in there).

4 comments

For real _unit_ tests? I would argue it's still a good distinction.

We use an embedded postgres in our DB tests, and we call those 'Integration Tests' and run them separately than the pure unit tests. While still tremendously valuable, they do take a bit longer to run, and currently aren't written to allow parallel tests running.

We've had a typical habit of writing most tests that hit the DB. Since applying a bit more discipline to remove the DB requirement, we've found it's a more pleasant experience with the quicker feedback loop in place.

Your last point is a good one - tests that can run in isolation and are agnostic to the presence of other data (and ideally clean up after themselves) tend to be handy.

Aside from testing framework concurrency limitations, why wouldn't you allow parallel tests?

If your running system speaks to the DB in parallel and handles pre-existing data, why wouldn't you want your tests to do the same?

That's a good point.

I'm using one local db in a docker container. And then all db-related integration tests are running in parallel (which is the default in Rust via "cargo test") on this local db.

Having much more confidence in my application/server if hundreds of (integration) tests are successfully accessing db in parallel.

Similarly I've called tests with a PG DB in a docker container an integration test cause they take longer to run.

What's really nice if you run the migrations once to setup the tests then subsequent tests can be fast. Each test run within a nested transaction and rolled back at the end. This ensures that each test get a clean DB to work off of with incurring the cost of running all the migrations.

Also if you have to create new databases for each test (because, say, the data model will not allow reusing it, or you want to run tests concurrently and not all the tests are transactional) creating one as cached template then initialising test-local dbs with `createdb -T <cache>` is extremely effective.

It’s also stupendously easy if you’re using pytest (though the concurrency caching is not there as IIRC xdist can’t reuse session fixtures):

* create a session fixture to initialise and yield the cache db (and clean it up afterwards) * create a regular test fixture which copies (using `createdb`) the template to a new database, and hands that off to the test

Then tests which need the db just have to request the second fixture.

> write _all_ your tests without ever having to truncate/delete tables between tests

This is exactly what we do, and it works really well. We essentially spin up a new tenant for each test suite. Forcing all tests to pass regardless of what other unrelated data is in the db is a great way to ensure cross-tenant data isolation.

The other underrated benefit is that you don't need a separate test database for local development - local dev instance and tests can use the same db. This means you can work on db changes in one place and see their impact on both tests and the running application.

> try to write _all_ your tests without ever having to truncate/delete tables between tests.

I've had really good success with running tests like this the way django does it, where you run your entire test within a migration, and then you just rollback the migration at the end of the test.

I assume that means running everything within a transaction?

Does that mean you can't use regular transactions in code, since Postgres doesn't support 'true' nested transactions? Or does the Django ORM automatically convert those 'inner' transactions into Postgres SAVEPOINTS?

Yes exactly. And yeah Django automatically maps transactions inside transactions to savepoints.