Hacker News new | ask | show | jobs
by harrisonjackson 2256 days ago
For a long time, we ran our Django test suite against an in-memory SQLite DB. It was super fast, which encouraged more tests to be written and a CI/CD process that allowed everyone to confidently ship code often.

Our production database is postgres.

We kept bumping against things we wanted to do in the application code that worked well with postgres, but would fail in sqlite. We limited our development so that we could keep running the tests. We knew that we could run them against a postgres db, but the development time to rewrite our CI test runner to spin up a fresh database was not worth it.

Recently we migrated from github + Jenkins to gitlab + our own gitlab-runners in AWS. During that switch, we prioritized testing against a Postgres DB and got it all running in a containerized way that spins up at fresh DB for every test run. The tests are slower but the runners scale horizontally so we don't mind queueing up as many merge requests as we need to and deployments through Gitlab environments are a big improvement over our Jenkins deployment job, so we still ship as often as we want.

Now our biggest testing issue is keeping fixtures up to date.

4 comments

It isn't clear if you're already doing this, but Postgres has a 'create database from template' feature. You can initialize your template database with migrations once for a whole test suite, then clone a new database at the start of each test.

It's quite fast. I run almost all of my tests this way.

I agree, forcing your app into the lowest common denominator of portable SQL is crippling. JSONB columns in particular are extremely useful in Postgres.

Nice! I did not know Postgres had that. We'll probably do something similar next time we prioritize improvements to the test runner.

Right now we are pulling a postgres:10 container down from our ECR on every run :laughcry: so definitely some low hanging fruit around that.

I think we will rebuild the postgres container up to our most recent migrations in prod branch then bake that container onto the gitlab runner AMI daily. Then the test runner can just start that and apply any migrations in the merge request and proceed with the tests.

It obviously makes sense that creating a new PG instance will be slower than creating a sqlite database. There also is some inherent speed difference for simple queries just by virtue of sqlite not needing to context switch to a separate process and to marshal the query/results across process boundaries.

But if the difference is more substantial than those factors would suggest, I'd be interested to see if we can do something about it from the PG side.

if you keep using sqlite for tests it forces Your database logic to be universal. You could confidently switch to other db like mysql any time.
It also means that you have to stick to the lowest common denominator, which is approximately equivalent to the state of the art as of a quarter century ago.

Every benefit has a cost. The benefit doesn't always justify the cost.

edit: To add to that - I've seen more than a couple major database engine migrations in my day, so it's not to say that that isn't a concern. But none of them has ever been from one SQL RDBMS to another. More common is migrating among different classes of database. MySQL to Mongo, Oracle to BigTable, Couch to Cassandra, something like that. MS Access to MS SQL Server a couple times, but even those are different enough that it was never going to be as simple as changing the connection string and having a carefree life.

The speculative future proofing that you do almost never manages to work for the future you end up actually living.

This also means that you have to stick to the lowest common denominator between all databases.
This was specifically why we moved away from sqlite. We wanted to take advantage of features postgres offers that are not universal.
Can't you use transactions to set up and tear down fixtures? That should be a lot faster than spinning up a new database every time.
Yes, that is how the test suite handles it. The CI uses a fresh db for each run of the test job but within that job, it uses 1 instance of the database.

Locally we have a flag to keep the test db alive between runs which speeds the tests up and can help with debugging.

The slowest part of the test run in the CI is building the application container and pulling down the postgres container. I'm sure there are improvements to how we are handling this but it isn't enough of an issue to prioritize it now.

Our issue with fixtures has more to do with changing application code and not having a great way to generate/regenerate the fixtures from live data. We've tried a few different libraries to do this but haven't found any that we love.