Hacker News new | ask | show | jobs
by baskethead 1633 days ago
You must have this backwards.

Truncating a table is extremely fast. Rolling back a transaction is very slow. If you're not seeing this then there's something wrong with your setup.

4 comments

(Not grandparent commenter) I think you're usually right but I doubt it makes a difference at the scale of 2-5 objects created in a test case. The big game changers IME are in-memory dbs (SQLite) or parallel execution of tests.

This idea of "transaction rollback in test teardown because performance" has a life of its own. The recommended base class for django unit tests (informally recommended, via code comments, not actual docs) uses transaction rollbacks instead of table truncation [0].

On top of this, I think, db truncation gets mixed up with table truncation sometimes too. For example, from OP:

> The time taken to clean the database is usually proportional to the number of tables

... only if you're truncating the whole db and re-initializing the schema, no?

And people sometimes actually do clear the whole db between tests! One unfortunate reason being functionally necessary data migrations that are mixed up with schema-producing migrations, meaning truncating tables doesn't take you back to "zero".

[0]: https://docs.djangoproject.com/en/2.2/_modules/django/test/t...

> ... only if you're truncating the whole db and re-initializing the schema, no?

Nope. In PostgreSQL the cost of truncating tables is proportional to the number of tables while doing a rollback is constant time (and a low constant at that, less than a commit for example).

In other databases like MySQL I believe truncating data is still proportional to the number of tables while rollback is proportional to the amount of data inserted or updated. So which is cheaper depends on your application.

This is definitely not true in PostgreSQL. In PostgreSQL rolling back a transaction just requires writing a couple of bytes of data while truncating requires taking a lock on every table you want to truncate and then for every table probably write more data than the rollback required and then you need to do the commit which is also more expensive than a rollback.
Not with PG. A couple weeks ago I was working on a project that used truncation cleaning, the test suite took 3m40s. I switched it to cleaning with transactions and the the test suite ran in 5.8s.

Truncation cleaning is extremely slow, not only because the cleaning is slower but because you actually have to commit everything your test code does.

I used to effectively do this in postgres with rsync on a known fixture snapshot of the data files. It would usually take under two seconds to reset the state and restart the servers, which was easily fast enough to do effective TDD.

I had a few other ideas to speed it up, also.

There's also creating a template database that exists at a known good state, and using that database template to CREATE DATABASE from.

https://www.postgresql.org/docs/current/manage-ag-templatedb...