Hacker News
by pquki4 731 days ago
For a script, you run it from start to end.

In a notebook or REPL environment, you can create any number of intermediate steps, rerun a step with minor modifications, check whether the results are better, and repeat. In Jupyter notebooks specifically, you can also visualize data and add Markdown inline, both of which are very useful.

You won't understand it unless you are already familiar with the workflow.

1 comments

Sometimes I need to run my code in small pieces for testing and evaluation, and I do this with multiple smaller scripts. Data can be saved to files, which accomplishes nearly the same thing as a notebook without needing the notebook environment.
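As a rough illustration of the multiple-scripts approach, here is a minimal sketch in which two stages hand data to each other through a file. The stage functions, the file name, and the use of `pickle` are all assumptions made for the example, not anything the workflow above prescribes; in practice each stage would live in its own script.

```python
import pickle
import tempfile
from pathlib import Path


def save_state(obj, path):
    """Write an intermediate result to disk so a later script can pick it up."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_state(path):
    """Read back an intermediate result written by an earlier script."""
    with open(path, "rb") as f:
        return pickle.load(f)


def stage_clean(raw):
    # Hypothetical first script: drop missing values.
    return [x for x in raw if x is not None]


def stage_summarize(cleaned):
    # Hypothetical second script: compute a simple summary.
    return {"count": len(cleaned), "mean": sum(cleaned) / len(cleaned)}


# Temporary directory standing in for wherever the scripts share files.
workdir = Path(tempfile.mkdtemp())
cleaned_path = workdir / "cleaned.pkl"

# "Script 1": compute once and checkpoint the result.
save_state(stage_clean([1, None, 2, 3, None, 6]), cleaned_path)

# "Script 2": can be edited and rerun without redoing script 1.
summary = stage_summarize(load_state(cleaned_path))
print(summary)
```

The second stage can be changed and rerun as often as needed, at the cost of one load from disk each time.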
Sure, if you don't mind the overhead (both development and processing time) of loading/saving state to disk (about 10-30 minutes for a lot of my data). In notebooks you don't have to think about it since it's just the objects in memory (and indeed they don't make it easy to think about it, which is a reasonable criticism, but I don't currently know of a framework or system that gives you the advantages of both).
Whenever I work with large datasets, I use a small subset of the overall data for testing while I build the pipeline. This avoids long run times and allows quick iteration while I get things set up to run against the full dataset.
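A minimal sketch of that subsetting idea, using only the standard library. The helper names, the subset sizes, and the toy `pipeline` function are hypothetical; a real pipeline would read from files or a database rather than a `range`.

```python
import random
from itertools import islice


def head_subset(records, n):
    """Take the first n records from any iterable without reading the rest."""
    return list(islice(records, n))


def random_subset(records, n, seed=0):
    """Take a reproducible random sample of up to n records from a sequence."""
    rng = random.Random(seed)
    return rng.sample(records, min(n, len(records)))


def pipeline(records):
    # Hypothetical stand-in for the real processing steps.
    return sum(r * r for r in records)


full_data = range(1_000_000)

# Develop against a small slice so each iteration finishes quickly.
dev_data = head_subset(full_data, 1_000)
dev_result = pipeline(dev_data)

# A seeded random sample is an alternative when the first rows are not representative.
sample = random_subset(list(range(10_000)), 100)
```

Once the pipeline behaves on the subset, the same `pipeline` call can be pointed at the full dataset.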