by MakeUsersWant
1251 days ago
I'd love property-based testing in big data cluster computing applications done in Databricks/Spark. However, (Py)Spark is very high-latency, even in local mode, so Hypothesis would have to generate a list (DataFrame) of thousands of test cases to be evaluated in parallel. That is where I got stuck. Has anybody ever done this successfully?
For example, I've used Hypothesis to test some browser automation that uses the ChromeController package to launch a Chrom(ium) browser to take screenshots and print-to-PDF. The tests do things like:
- Generate random HTML
- Write it to a temp file
- Launch Chromium, set its window width+height, and navigate to that file:// address
- Take a PNG screenshot
- Use a PNG library to assert we've got a valid PNG, of the given width + height
There are similar tests for print-to-PDF (checking that it's a valid PDF with at least one page), etc.
The only fiddling I had to do was put `deadline=1000` in the `@settings` decorators. This raises the per-example time limit to 1000 ms, so Hypothesis doesn't give up on a slow example too early; it automatically runs the tests fewer times, so it stays within a reasonable time frame.
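To make the shape of such a test concrete, here is a minimal sketch rather than the real suite. `fake_screenshot` is a hypothetical stand-in for the launch-Chromium-and-screenshot step (it ignores the page and fabricates a PNG of the requested size), `png_size` is a hand-rolled header check instead of a PNG library, and the 1 to 200 pixel bounds and `max_examples=25` are my own assumptions to keep the run short.

```python
import struct
import tempfile
import zlib
from pathlib import Path

from hypothesis import given, settings, strategies as st

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind, payload):
    """Frame one PNG chunk: length, type, payload, CRC-32 over type + payload."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def fake_screenshot(html_path, width, height):
    """Hypothetical stand-in for the browser step: ignores the page and
    returns an all-black 8-bit greyscale PNG of the requested size."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    scanlines = (b"\x00" + b"\x00" * width) * height  # filter byte + pixels per row
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(scanlines)) + _chunk(b"IEND", b""))


def png_size(data):
    """Return (width, height) from the IHDR chunk, or raise ValueError.

    Only the signature and the first chunk type are checked; a real test
    would hand the bytes to a PNG library for full validation.
    """
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG")
    return struct.unpack(">II", data[16:24])


# deadline=1000 allows each example up to 1000 ms; max_examples is an assumption.
@settings(deadline=1000, max_examples=25)
@given(
    html=st.text(),
    width=st.integers(min_value=1, max_value=200),
    height=st.integers(min_value=1, max_value=200),
)
def test_screenshot_has_requested_size(html, width, height):
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "page.html"
        page.write_text(html, encoding="utf-8")  # "random HTML" written to a temp file
        png = fake_screenshot(page, width, height)  # real test: drive the browser here
    assert png_size(png) == (width, height)
```

In a real suite the body of `fake_screenshot` would be replaced by the calls that drive the browser, and the rest of the test would stay the same.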
These sorts of tests are good for sanity-checking that we're plugging things together in the right way; but I wouldn't rely on them checking enough times to e.g. catch arithmetic edge-cases, etc.