A discussion with Katharine Jarmul (kjam) about some of the challenges of testing in data science.
Some of the topics we discuss:
* experimentation vs testing
* testing pipelines and pipeline changes
* automating data validation
* property based testing
* schema validation and detecting schema changes
* using unit test techniques to test data pipeline stages
* testing nodes and transitions in DAGs
* testing expected and unexpected data
* missing data and non-signals
* corrupting a dataset with noise
* fuzz testing for both data pipelines and web APIs
* datafuzz
* hypothesis
* testing internal interfaces
* documenting and sharing domain expertise to build good reasonableness checks
* intermediary data and stages
* neural networks
* speaking at conferences