| I've worked as a data scientist for quite awhile now in IC, lead and manager roles and the biggest thing I've found is that data scientists cannot be allowed to live exclusively in notebooks. Notebooks are essential for the EDA and early prototyping stages but all data scientists should be enough "software engineer" to get their code out of their notebook and into a reusable library/package of tools shared with engineering. The best teams I've worked on the hand off between DS and engineering is not a notebook, it's a pull request, with code review from engineers. Data scientists must put their models in a standard format in a library used by engineering, they must create their own unit tests, and be subject to the same code review that engineer would. This last step is important: my experience is that many data scientists, especially coming from academic research, are scared of writing real code. However after a few rounds of getting helpful feedback from engineers they quickly realize how to write code much better. This process is also essential because if you are shipping models to production, you will encounter bugs that require a data scientist to fix that an engineer cannot solve alone. If the data scientists aren't familiar with the model part of the code base this process is a nightmare, as you have to ask them to dust of questionable notebooks from months or years ago. There are lots of the process of shipping a model to production that data scientists don't need to worry about, but they absolutely should be working as engineers at the final stage of the hand off. |
Browsing code, underlying library imports and associated code, type hinting, error checking, etc., are so vastly superior in something like Pycharm that it is really hard to see why one would give it all up to work in a Notebook unless they never matured their skillsets to see the benefits afforded by a more powerful IDE? I think notebooks can have their place and are certainly great for documenting things with a mix of Markdown, LaTeX and code, as well as for tutorials that someone else can directly execute. And some of the interactive widgets can also make for nice demos when needed.
Notebooks also make for poor habits often times and as you mentioned, having data scientists and ML engineers write code as modules or commit them via pull-requests helps them grow into being better software engineers which in my experience is almost a necessity to be a good and effective data scientist and ML engineer.
And lastly, version controlling notebooks is such a nightmare. Nor is it conducive to code reviews.