by MakeUsersWant 1251 days ago
I'd love property-based testing in big data cluster computing applications done in Databricks/Spark. However, (Py)Spark is very high-latency, even in local mode, so Hypothesis would have to generate a list (DataFrame) of thousands of test cases to be evaluated in parallel. That is where I got stuck. Has anybody ever done this successfully?
1 comment

There are lots of dials that can be tweaked, e.g. timeouts, the number of tests, etc.

For example, I've used Hypothesis to test some browser automation, which uses the ChromeController package to launch a Chrom(ium) browser to take screenshots and print to PDF. The tests do things like:

- Generate random HTML

- Write it to a temp file

- Launch Chromium, set its window width+height, and navigate to that file:// address

- Take a PNG screenshot

- Use a PNG library to assert we've got a valid PNG, of the given width + height
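The last step above can be sketched without a browser. This is a minimal, standard-library-only stand-in for the "PNG library" check, not the code from the actual tests: `png_dimensions` and `make_png` are hypothetical helper names, and the check only validates the PNG signature and the IHDR header rather than decoding the whole image.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) from a PNG's IHDR chunk, or raise ValueError.

    Hypothetical helper: checks the signature and the first chunk only.
    """
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    if len(data) < 33:
        raise ValueError("truncated before end of IHDR chunk")
    # Chunk layout: 4-byte length, 4-byte type, payload, 4-byte CRC.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a 13-byte IHDR")
    ihdr = data[16:29]
    (crc,) = struct.unpack(">I", data[29:33])
    if zlib.crc32(chunk_type + ihdr) != crc:
        raise ValueError("IHDR checksum mismatch")
    width, height = struct.unpack(">II", ihdr[:8])
    return width, height


def _chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Serialise one PNG chunk, including its CRC."""
    crc = zlib.crc32(chunk_type + payload)
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


def make_png(width: int, height: int) -> bytes:
    """Build a small all-black 8-bit greyscale PNG, standing in for a screenshot."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is a filter byte (0 = none) followed by one byte per pixel.
    raw = (b"\x00" + b"\x00" * width) * height
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )
```

In a real test, the bytes passed to `png_dimensions` would come from the browser screenshot, and the assertion would compare the result with the window width and height that Hypothesis generated.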

There are similar tests for print-to-PDF (checking that it's a valid PDF with at least one page), etc.

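The only fiddling I had to do was put `deadline=1000` in the `@settings` decorators. This raises Hypothesis's per-example time limit to 1000 ms (the default is 200 ms), so slow examples aren't reported as failures for exceeding the deadline. As far as I could tell, it also ran the tests fewer times, so the whole run stayed within a reasonable time frame.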
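Roughly, the decorator looks like the sketch below. The test body is a made-up placeholder (a `time.sleep` standing in for launching a browser), and `max_examples=5` and `database=None` are my own additions to keep the sketch quick and free of side effects. Only `deadline=1000` comes from the setup described above.

```python
import time

from hypothesis import given, settings, strategies as st

# Records each example the test body receives, so the run count can be inspected.
runs = []


@settings(
    deadline=1000,    # per-example time limit in milliseconds
    max_examples=5,   # assumption: explicitly cap the number of generated examples
    database=None,    # assumption: don't persist examples to disk for this sketch
)
@given(st.integers(min_value=1, max_value=2000))
def test_slow_operation_is_stable(width):
    # Placeholder for a slow step such as launching a browser. At 0.25 s it
    # exceeds the default 200 ms deadline but fits within the raised one.
    time.sleep(0.25)
    runs.append(width)
    assert 1 <= width <= 2000
```

Setting `max_examples` explicitly is one way to bound the total run time instead of relying on whatever Hypothesis does by default; `deadline=None` is also accepted if per-example timing should not be checked at all.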

These sorts of tests are good for sanity-checking that we're plugging things together in the right way, but I wouldn't rely on them running enough times to catch things like arithmetic edge cases.