by urthor
1240 days ago
fwiw, you can share a Spark session between unit tests. You can even keep a Spark session alive throughout the day so your tests run against a hot session. Straight TDD with Spark is perfectly fine if you know what you're doing. I'm not saying it's easy or that there's an easy guide somewhere, but it's possible. If you're using PySpark via the API, it's likely an incredibly important part of your process.
Our CI/CD platform and its owners get unhappy if we spawn an ad hoc Spark session for testing purposes.
There is also a general expectation that unit tests are self-contained and portable, so you can run them on macOS, Linux, and ARM machines without much effort.
Another point was that we needed to make the mocking and test setup easy, because data scientists and ML modellers are the most important personas who, ideally, should be writing these tests.
So mocking the data source behind an abstraction layer and passing in pandas DataFrames worked reasonably well for our use case.
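
For anyone wondering what that can look like, here is a minimal sketch of the idea rather than our actual code. The names `DataSource`, `InMemorySource`, `revenue_per_customer`, and the `orders` table with its columns are all made up for illustration. The assumption is that the logic under test only talks to a small interface that returns pandas DataFrames, so the test can hand it in-memory data and never start Spark.

```python
from typing import Protocol

import pandas as pd


class DataSource(Protocol):
    """Hypothetical abstraction over wherever the data actually lives."""

    def load(self, table: str) -> pd.DataFrame: ...


class InMemorySource:
    """Test double: serves hand-built pandas DataFrames, no Spark needed."""

    def __init__(self, tables: dict[str, pd.DataFrame]) -> None:
        self._tables = tables

    def load(self, table: str) -> pd.DataFrame:
        # Return a copy so one test cannot mutate another test's data.
        return self._tables[table].copy()


def revenue_per_customer(source: DataSource) -> pd.DataFrame:
    """Illustrative logic under test: total order amount per customer."""
    orders = source.load("orders")
    return (
        orders.groupby("customer_id", as_index=False)["amount"]
        .sum()
        .sort_values("customer_id")
        .reset_index(drop=True)
    )


def test_revenue_per_customer() -> None:
    source = InMemorySource(
        {
            "orders": pd.DataFrame(
                {"customer_id": [1, 1, 2], "amount": [10.0, 5.0, 7.0]}
            )
        }
    )
    result = revenue_per_customer(source)
    assert result["customer_id"].tolist() == [1, 2]
    assert result["amount"].tolist() == [15.0, 7.0]
```

In a setup like this, the production implementation of `DataSource` would be the only place that knows about Spark. The tests depend on pandas alone, which is what keeps them portable and cheap to run in CI.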