by notjack 3298 days ago
Some property testing tools, like Python's Hypothesis[1], let you give explicit example values for a property in addition to a general set of generated values, so you get some deterministic tests alongside the random ones. Hypothesis also saves the falsifying values it has found in a database and reads them back the next time you run it.

[1] http://hypothesis.works/


I'm curious how the database works with, say, external CI systems like Travis, and on multi-developer projects. Is it (or can it sensibly be) committed to the repository, or otherwise persisted with or near the code, so that it transfers across machines and everyone gets the same testing environment?

Of course, with randomised testing, there's an inherent non-reproducibility, so maybe this isn't as unfortunate as it sounds?

For CI systems like Travis, people add it to the cached directories, and it's shared between runs. I know Travis, Circle and AppVeyor all have some way to cache data between runs – nominally for dependencies, but .hypothesis works too.

According to our docs (http://hypothesis.readthedocs.io/en/latest/database.html?hig...), you can check the examples DB into a VCS and it handles merges, deletes, etc. I don't know anybody who actually does this, and I've never looked at the code for handling the examples database, so I have no idea how (well) this works.
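For anyone who wants to try either approach (CI cache or VCS), here is a minimal sketch of pointing a test at an explicit database directory, assuming a reasonably recent Hypothesis where `settings(database=...)` and `DirectoryBasedExampleDatabase` are available. The directory and the test itself are made up for illustration; a real project would use a path that the CI cache or the repository already tracks rather than a temp directory.

```python
import tempfile
from pathlib import Path

from hypothesis import given, settings, strategies as st
from hypothesis.database import DirectoryBasedExampleDatabase

# Hypothetical location, chosen only so the sketch runs anywhere.
# In practice this would be something like a cached or committed directory.
db_dir = Path(tempfile.mkdtemp()) / "hypothesis-examples"
shared_db = DirectoryBasedExampleDatabase(str(db_dir))


@settings(database=shared_db)
@given(st.integers())
def test_negation_is_its_own_inverse(n):
    # Any falsifying example found here would be saved under db_dir
    # and replayed on the next run that uses the same directory.
    assert -(-n) == n
```

Whether committing that directory works well in practice is exactly the open question above; this only shows where the knob is.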

If tests do throw up a particularly interesting and unusual example, we recommend explicitly adding it to the tests with an `@example` decorator, which causes us to retest that value every time. Easier to find on a code read, and won't be lost if the database goes away.
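A small sketch of what that looks like. The run-length encoder is a hypothetical function under test, and the pinned strings are invented stand-ins for "interesting" inputs; only `given`, `example`, and `strategies` come from Hypothesis.

```python
from hypothesis import example, given, strategies as st


def encode(text):
    """Run-length encode a string (hypothetical function under test)."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def decode(runs):
    """Invert encode()."""
    return "".join(ch * count for ch, count in runs)


@given(st.text())
@example("")        # pinned inputs: tried on every run,
@example("aaabbb")  # whether or not the database still exists
def test_round_trip(text):
    assert decode(encode(text)) == text
```

The pinned values live next to the test, so they show up in code review and survive a wiped cache.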

(Disclaimer: I'm a Hypothesis maintainer)

I think the default storage format Hypothesis uses is a flat file with a diff-friendly format so it's easy for developers to check it into source control, and it's easy for patches to update the database without exploding the git repo size due to giant binary diffs. SQLite might also be an option, but I'm not up to date on the details. As a neat side effect of the diff-friendly format, it's easy to review new falsifying inputs added to the database in pull requests.

> a diff-friendly format ... it's easy for patches to update the database without exploding the git repo size due to giant binary diffs

Interesting - I understood that Git stores whole files, not diffs, so I'm surprised this is a significant feature.

I'm pretty sure git stores diffs, not just whole files every time.
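Both views have something to them, as far as I understand it. Conceptually, each version of a file is a whole "blob" object named by a hash of its full content, and a loose object on disk is that content zlib-compressed. Packfiles (written by `git gc` and during transfers) can then store similar objects as deltas against each other. A rough sketch of the loose-object side, using only the standard library; the file contents are invented, and this does not model packfiles at all:

```python
import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    """Object id git assigns to a file's content: SHA-1 of a header plus the bytes."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def loose_object_bytes(content: bytes) -> bytes:
    """What a loose blob object holds on disk: the zlib-compressed header and content."""
    header = b"blob %d\x00" % len(content)
    return zlib.compress(header + content)


# Two hypothetical versions of a text file, differing by one appended line.
v1 = b"example one\n"
v2 = b"example one\nexample two\n"

# Each version is its own complete object with its own id.
print(git_blob_id(v1))
print(git_blob_id(v2))
```

So a one-line change does create a new whole object at first; the space saving comes later, when packing finds the two versions similar enough to delta-compress.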