by mattexx 3740 days ago
The best-case outcome of many efforts of data scientists is an artifact meant for a machine consumer, not a human one.

I posit this outcome is absolutely necessary for any data science project to be worthwhile in any organization.

In the case where the project produces a report and goes no further, you still need the data and code for reproducibility, one of the main principles of the scientific method [1].

In the case where the project gets handed off to engineers to re-implement, reproducibility is even more critical, since the engineers' best effort to reproduce the code will almost certainly not be successful the first time, and you will need to validate many versions of the production model. Doing this by hand even once is wasteful; doing so many times is tragically so.
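
To make the validation point concrete, here is a rough sketch of checking a re-implementation against the original automatically instead of by hand. Everything named here is hypothetical: `reference_model` stands in for the data scientist's original code, `reimplemented_v1` and `reimplemented_v2` for successive engineering attempts, and the tolerances are arbitrary choices, not a recommendation.

```python
import math


def validate_against_reference(reference, candidate, inputs,
                               rel_tol=1e-6, abs_tol=1e-9):
    """Return (input, expected, actual) for every input where the
    candidate's output differs from the reference beyond the tolerances."""
    mismatches = []
    for x in inputs:
        expected = reference(x)
        actual = candidate(x)
        if not math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((x, expected, actual))
    return mismatches


# Hypothetical stand-in for the original research model.
def reference_model(x):
    return 2.0 * x + 1.0


# Hypothetical first re-implementation: the intercept was dropped.
def reimplemented_v1(x):
    return 2.0 * x


# Hypothetical second re-implementation: different code, same results.
def reimplemented_v2(x):
    return x + x + 1.0


# Saved inputs, kept alongside the original code so the check can be rerun.
inputs = [0.0, 0.5, 1.0, 10.0]

for name, model in [("v1", reimplemented_v1), ("v2", reimplemented_v2)]:
    bad = validate_against_reference(reference_model, model, inputs)
    print(name, "OK" if not bad else f"{len(bad)} mismatches")
```

Once the reference code and its inputs are kept around, rerunning this for each new production version costs nothing, which is the whole argument for reproducibility here.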

In the case where the data scientists can produce a service worthy of production use, kudos!! But understand the caveat that in truly big data or big compute flows, this outcome remains highly unlikely.

[1] https://en.wikipedia.org/wiki/Reproducibility