Testing in Data Science with Katharine Jarmul (podcast) (testandcode.com)
1 points by variedthoughts 3123 days ago
1 comments

A discussion with Katharine Jarmul (kjam) about some of the challenges of testing in data science.

Some of the topics we discuss:

* experimentation vs testing
* testing pipelines and pipeline changes
* automating data validation
* property based testing
* schema validation and detecting schema changes
* using unit test techniques to test data pipeline stages
* testing nodes and transitions in DAGs
* testing expected and unexpected data
* missing data and non-signals
* corrupting a dataset with noise
* fuzz testing for both data pipelines and web APIs
* datafuzz
* hypothesis
* testing internal interfaces
* documenting and sharing domain expertise to build good reasonableness
* intermediary data and stages
* neural networks
* speaking at conferences
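
To make a few of these topics concrete (schema validation, unit-testing a pipeline stage, and corrupting data with noise), here is a minimal sketch using only the Python standard library. It is not taken from the episode and does not use datafuzz or hypothesis. The schema, the `clean_stage` function, and the noise strategies are all hypothetical names invented for illustration.

```python
import random

# Hypothetical schema for illustration: column name -> expected Python type.
EXPECTED_SCHEMA = {"user_id": int, "amount": float}


def schema_errors(row, schema=EXPECTED_SCHEMA):
    """Return a list of human-readable schema problems for one row."""
    errors = []
    for name in sorted(schema.keys() - row.keys()):
        errors.append(f"missing column: {name}")
    for name in sorted(row.keys() - schema.keys()):
        errors.append(f"unexpected column: {name}")
    for name in sorted(schema.keys() & row.keys()):
        if not isinstance(row[name], schema[name]):
            errors.append(f"wrong type for {name}: {type(row[name]).__name__}")
    return errors


def clean_stage(rows):
    """Hypothetical pipeline stage: keep only rows that match the schema."""
    return [row for row in rows if not schema_errors(row)]


def corrupt(row, rng):
    """Return a copy of row with one randomly chosen kind of noise applied."""
    noisy = dict(row)
    kind = rng.choice(["drop", "retype", "extra"])
    if kind == "drop":
        del noisy[rng.choice(sorted(noisy))]   # simulate missing data
    elif kind == "retype":
        noisy[rng.choice(sorted(noisy))] = "not-a-number"  # simulate a type change
    else:
        noisy["surprise"] = 1                  # simulate an added column
    return noisy


def check_stage_rejects_noise(trials=200, seed=0):
    """Property-style check: no corrupted row survives clean_stage."""
    rng = random.Random(seed)
    good = {"user_id": 1, "amount": 9.5}
    for _ in range(trials):
        assert clean_stage([good, corrupt(good, rng)]) == [good]
    return True
```

Calling `check_stage_rejects_noise()` runs the stage against many randomly corrupted rows with a fixed seed, which is the same idea a property-based or fuzzing library would automate with richer input generation and shrinking.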