by taeric 3120 days ago
And yet, "more data in spreadsheets" is a truism. The spreadsheet model is empirically one of the most successful ones we have.

Worse, so-called "notebooks" are neat, but they are just as bad for engineering best practices as spreadsheets are.

Not to say that you shouldn't be able to grow from freeform practices. You should. Just don't mistake them for being somehow superior to spreadsheets.

1 comments

You can very easily move functions out of a notebook and into a proper module that can be included in a program or production system. The autoreload extension in Python Jupyter notebooks makes this extra nice. It is also easy and convenient to write py.test-style tests inside a notebook: just call them manually inside the block. Then, when you move the code out to a module, you have tests that your test runner will pick up automatically.
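A minimal sketch of that workflow, assuming a hypothetical helper called `normalize` that is being developed in a notebook (the function and test names are illustrative, not from any real project). The commented-out lines are the IPython autoreload magics, which only work inside IPython or Jupyter:

```python
# In a Jupyter/IPython session you would typically enable autoreload first:
#   %load_ext autoreload
#   %autoreload 2
# These are IPython magics, so they are left as comments here.


def normalize(values):
    """Scale values so they sum to 1.0 (hypothetical example helper)."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize values that sum to zero")
    return [v / total for v in values]


# py.test-style tests: plain functions whose names start with "test_".
def test_normalize_sums_to_one():
    result = normalize([1, 1, 2])
    assert result == [0.25, 0.25, 0.5]
    assert abs(sum(result) - 1.0) < 1e-9


def test_normalize_rejects_zero_total():
    try:
        normalize([0, 0])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")


# Inside the notebook, run the tests by calling them by hand.
test_normalize_sums_to_one()
test_normalize_rejects_zero_total()
```

Once the function and its tests are moved into a module whose file name matches pytest's default discovery pattern (for example a hypothetical `test_normalize.py`), the manual calls at the bottom can be dropped and pytest should collect the `test_` functions on its own.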
You can. Most don't. And moving something from a free-form environment to a tested continuous delivery one is not automatic.

Luckily, the performance constraints of most environments are such that Python is not an automatic deal breaker nowadays. That said, correctness proofs for code are usually different from correctness of machine learning algorithms, such that mixing them seems to just fool both sets of practitioners.

My point is mostly that Jupyter is far from as bad as spreadsheets (at least in commonly used implementations).

Correctness of machine learning is a complicated problem generally.

Meh. With some of the practices I've already seen, at least spreadsheets always show the data. You send me a spreadsheet, I can verify most of it. You send me a notebook, I can typically only audit what you did.