by commandlinefan 1633 days ago
> Truncating the DB between every test is indeed horrifically slow

Using a database at all in unit tests is horrifically slow - one of the (many) reasons you shouldn’t.

6 comments

They’re for integration tests, not unit tests. Purists often treat the distinction as significant, but I only use it to describe how many complex layers are being stacked, since both usually run under “unit test frameworks” for reporting and assertion purposes. I view mocking as usually an anti-pattern. Careful DI usually gets you far enough and is easier to work with. You want the code under test to resemble what happens in production as closely as possible. The more “extra” you have, the more time you waste maintaining the test infrastructure itself, which is generally negative value (your users don’t care that a feature is late because you were refactoring the codebase to make each function easier to test in isolation).

Empty databases should generally start quickly unless there’s some distributed consensus that’s happening (and even then, it’s all on a local machine…). You also don’t even need to tear it down all the way - just drop all tables.
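To illustrate the "just drop all tables" idea, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in so it runs anywhere; the helper name `drop_all_tables` and the example tables are made up for illustration, and a PostgreSQL version would query its own catalog instead of `sqlite_master`.

```python
import sqlite3


def drop_all_tables(conn: sqlite3.Connection) -> None:
    """Drop every user table so the next test starts from an empty schema."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (name,) in rows:
        # Identifiers cannot be bound as parameters, so quote them by hand.
        quoted = '"{}"'.format(name.replace('"', '""'))
        conn.execute("DROP TABLE IF EXISTS " + quoted)
    conn.commit()


# Hypothetical schema, only here to have something to drop.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")

drop_all_tables(conn)

remaining = conn.execute(
    "SELECT count(*) FROM sqlite_master WHERE type = 'table'"
).fetchone()[0]
print("tables left:", remaining)
```

The database process (or here, the connection) stays up the whole time; only the schema is reset, which is the point of not tearing everything down.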

Ultimately, what matters is that the entire application, database queries and all, works. I think calling out to the DB in tests is important for ensuring the entire app works.

And tests hitting the database can be fast: https://www.brandur.org/nanoglyphs/029-path-of-madness

This is true. I think, however, that this is not relevant to unit tests. If you choose not to do unit tests, because they're not valuable in your software compared to automating end to end tests, that's fair enough, but on the topic of unit tests, talking to a db isn't really a thing.

Transactions / savepoints and parallelism make a huge difference. I have an app using Ecto and PostgreSQL, and running its ~550 tests takes under 5 seconds. Almost all of them hit the DB many times. The DB is empty and each test starts from a blank slate, inserting any fixture it needs.
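For anyone who hasn't seen the transaction-per-test pattern, a rough sketch follows. This is not the Ecto sandbox, just the same idea using Python's unittest and sqlite3 as assumed stand-ins: open a transaction before each test and roll it back afterwards, so every test sees an empty table without any truncation. The table and test names are invented.

```python
import sqlite3
import unittest

# One shared connection. isolation_level=None puts sqlite3 in autocommit
# mode, so the BEGIN/ROLLBACK statements below are issued explicitly.
CONN = sqlite3.connect(":memory:", isolation_level=None)
CONN.execute("CREATE TABLE users (email TEXT PRIMARY KEY)")


def user_count() -> int:
    return CONN.execute("SELECT count(*) FROM users").fetchone()[0]


class UserTests(unittest.TestCase):
    def setUp(self):
        # Everything the test writes happens inside this transaction.
        CONN.execute("BEGIN")

    def tearDown(self):
        # Discard the test's writes instead of truncating tables.
        CONN.execute("ROLLBACK")

    def test_insert_one(self):
        CONN.execute("INSERT INTO users VALUES ('a@example.com')")
        self.assertEqual(user_count(), 1)

    def test_same_email_again(self):
        # Would raise IntegrityError if the previous test's row had survived.
        CONN.execute("INSERT INTO users VALUES ('a@example.com')")
        self.assertEqual(user_count(), 1)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(UserTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Both tests insert the same primary key on purpose: they only pass if the rollback really restores the blank slate.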

An important trick when doing this is to respect unique constraints in fixtures. For instance, if you have a users table with an email column as primary key, make the user fixture/factory generate a unique email each time ("user-1@example.com", "user-2@example.com", ...). Then you don't get slowdowns or deadlocks when many tests run in parallel.
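A sketch of such a fixture, in Python with sqlite3 rather than Ecto (both are assumptions for the sake of a runnable example, as is the name `user_fixture`). The counter is per process; with several test processes you would presumably need something process-specific in the address as well.

```python
import itertools
import sqlite3

# Monotonic counter shared by every call to the fixture in this process.
_user_seq = itertools.count(1)


def user_fixture(conn: sqlite3.Connection) -> str:
    """Insert a user with a never-before-used email and return that email."""
    email = "user-{}@example.com".format(next(_user_seq))
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    return email


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY)")

emails = [user_fixture(conn) for _ in range(3)]
print(emails)
```

Because no two fixtures ever ask for the same key, concurrent transactions never have to wait on each other's uncommitted unique-index entries.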

One supposes horrifically slow might be a bit subjective.

I notice that, in a VM on my laptop, establishing the initial connection to Postgres seems to take 2-3 ms, and running a trivial query takes 300-1000 µs.
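If you want to take the same kind of measurement yourself, something like the following works as a starting point. It times an in-memory sqlite3 connection only because that ships with Python; the numbers it prints say nothing about Postgres, and you would swap in your own driver's connect call to compare. The helper `time_us` is made up for this sketch.

```python
import sqlite3
import time


def time_us(fn):
    """Run fn once and return (result, elapsed microseconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed_us = (time.perf_counter() - start) * 1e6
    return result, elapsed_us


# Time the initial connection, then a trivial query on that connection.
conn, connect_us = time_us(lambda: sqlite3.connect(":memory:"))
row, query_us = time_us(lambda: conn.execute("SELECT 1").fetchone())

print("connect: {:.0f} us, trivial query: {:.0f} us".format(connect_us, query_us))
```

A single sample is noisy, so for anything beyond a sanity check you would want to repeat each measurement and look at the distribution.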

I routinely involve the database in unit tests. It is certainly slower, but my primary concern is the correct behavior of production code, which uses real databases.

If testing using the db is slowing you down, that means the test has discovered slow code, and worked, not that you should get rid of the test.

It depends on what is under test. If you're testing a model file that is highly coupled to the database, and whose entire purpose is more or less to function as an interface to the DB, tests need to include the DB almost by necessity. The alternative is to mock so much out that you're essentially testing your mocked code more than the unit under test.

What is the purpose of automated testing? Is it to ensure the code works correctly or is it to "run fast"?

To be able to say to your bosses that you have 100% code coverage.

No, I agree. Hitting the database is slower but not that much slower (at least if you use PostgreSQL and do rollback after every test). And since the goal is correctness I think that this performance hit is small enough to be worth taking.

> to ensure the code works correctly

It's to ensure the code works correctly and to indicate where the problem is whenever it doesn't. Querying a live database during unit tests fails on both counts. It doesn't tell you whether or not the code works correctly - it tells you either that the code didn't work correctly or that the database wasn't available at the time the test ran.

Well, both things are problems that are nice to know about, so you can fix them. Certainly better than not knowing about either problem.