Hacker News
by comment_guy 2334 days ago
I don't get why anyone who knows how to use an IDE would ever use a notebook; the coding experience is garbage in comparison. I understand they started as a way to get STEM kids coding quickly, but now they are like a standard in data analysis and data science, with those people needing experienced devs to translate the notebook into production code. This just drives the silo walls up higher.

Doing data science in an IDE would be terrible. With a notebook, you get the chance to load the data, view it, clean it where needed, view it again, analyze it, model it and do anything else you need to it. An IDE means that you can't use the previous output to guide your next operation in a direct fashion like you can with a notebook.
> With a notebook, you get the chance to load the data, view it, clean it where needed, view it again, analyze it, model it and do anything else you need to it.

In a good data-oriented IDE like RStudio you get to do all of those things, and you write code that can be saved as plain text and version controlled well under git, which you can't do well with Jupyter.

R folks have to be the best indicator in this case because they have access to a good IDE and they have good support for Jupyter. Their use is overwhelmingly plain text files in RStudio, plus a small portion of R Markdown notebooks, and pretty much no one uses R in Jupyter.

Yes! RStudio is the one thing I miss most when doing data science in Python.

Notebooks give me some of the interactivity but the experience degrades significantly.

The Spyder IDE seems like an okayish replacement, but some of the libraries I use expect an HTML display (as in a notebook) to give you full functionality, which is not yet available in Spyder.

Have you tried Orange? It has scripting capabilities.
No, but looking at some screenshots and descriptions, it seems to take me further from the code, which is not what I am looking for.

RStudio gives you the experience of a classical IDE plus easy data exploration, which I have found productive from the exploratory stages (where I need to see my data and the effect of my code) to the clean-up phase (where I refactor my file).

As a counterpoint, plenty of R folks are pretty happy doing all of that in RStudio.
It is interesting to see this discussion about notebooks while I'm thinking about all the RStudio users who do all their work inside the IDE and are pretty happy. Notebooks seem like such an inferior tool to me. I'm also extremely biased.
Also, lots of Emacs users use org-mode as an awesome notebook.
I'm an R folk, and I'm even happy doing all of that in Emacs!
That kind of depends on your process. In many cases pdb (or the debugging interface in your IDE of choice) works just fine for that. It's certainly not "terrible".

After the exploration and preprocessing stage I personally don't see much benefit in the notebook model. Training/evaluation and any meaningful visualization take forever anyway, which means I need to cache and persist intermediate results. With that, it doesn't really matter all that much whether I work on it in vim and pdb, an IDE, or Jupyter.

Maybe thinking about the data and what you're trying to do before coding might be an idea as well.
'Thinking about the data' most often requires looking at the data from hundreds of different angles, quickly investigating its properties and statistics, maybe plotting or fitting a few things, checking some hypotheses etc (all of the above code you will most likely throw out after the initial stage).

Same with the results - once you've coded something (perhaps outside of a notebook environment) and obtained results, verifying that they are what you expect is much more efficient to do in a notebook.

Maybe you use a notebook I'm completely unfamiliar with, but my experience is that they allow you to write code, run it, and save the results in cells. My IDE does all of that except the saving of partial results, but this can be done easily by just dumping your precomputed data to disk if you can't recompute it easily. In either case, an IDE gives you an actual debugger, plus IntelliJ has great data visualization plugins, a database viewer, great autocompletion, and integrates with your VCS, etc. What do you do when you need an actual debugger, or need to profile your code? What about documentation for the function you are calling? In my IDE this is a popup; in every notebook I've used, this is a Google search.
I use both PyCharm and JupyterLab on a daily basis, typically dealing with multi-GB datasets.
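For the "dump your precomputed data to disk" part, here is a minimal sketch of what that could look like outside a notebook. It assumes plain `pickle` from the standard library; the helper name `cached`, the `slow_step` function and the file name are made up for illustration and don't come from anyone's actual setup.

```python
import pickle
import tempfile
from pathlib import Path


def cached(path, compute):
    """Return the pickled result at `path`, computing and saving it on a miss."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def slow_step():
    # Hypothetical stand-in for an expensive load or preprocessing step.
    calls.append(1)
    return [x * x for x in range(5)]


# A throwaway directory keeps the example self-contained.
cache_file = Path(tempfile.mkdtemp()) / "squares.pkl"

first = cached(cache_file, slow_step)   # computes and writes the file
second = cached(cache_file, slow_step)  # reads the file, skips slow_step
```

The trade-off is the usual one: the cache goes stale if the code behind `slow_step` changes, so you have to delete the file (or put a version in its name) after refactoring.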

If I'm writing a library, adding new features to one, or writing tests, I'll use PyCharm, sure thing. Otherwise the notebook is a quicker way to sketch prototypes and always have a kernel with preloaded datasets and pre-imported stuff ready at hand. I don't want to wait 10 minutes just to load the data every time I want to check if my new function works well on it at big scale. That's one of the most important bits.

PyCharm is a clear winner at actually writing code that you won't throw in the bin 10 min later, and once you know what to write.

Debugging? I don't remember ever using PyCharm's debugger despite the fact it exists... either pudb or python-devtools or something else. I'd just write tests and things start working in the process. And btw you have the pdb debugger (some weak version of it) in Jupyter if you really need it. Docstrings? Press Shift+Tab in the notebook. Or keep PyCharm open on the side so you can cmd-b. Profiling? Never a PyCharm builtin, maybe something like flamegraph, but an external tool anyway.
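On profiling and docstrings, a small sketch using only the standard library (`cProfile`, `pstats`, `inspect`), which runs the same way from a notebook cell, a script or an IDE. The function `slow_sum` is a hypothetical stand-in for real work, not something from this thread.

```python
import cProfile
import inspect
import io
import pstats


def slow_sum(n):
    """Sum the squares below n (hypothetical stand-in for real work)."""
    return sum(i * i for i in range(n))


# Profile a single call and collect the statistics.
profiler = cProfile.Profile()
profiler.enable()
total = slow_sum(100_000)
profiler.disable()

# Render the five most expensive entries, by cumulative time, into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)

# The docstring is also available programmatically, without any editor popup.
doc = inspect.getdoc(slow_sum)
print(doc)
```

A flame graph tool gives a nicer picture, but this is often enough to see which call dominates.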

> I don't get why anyone who knows how to use an IDE would ever use a notebook,

The Python IDEs for data science are mostly garbage. If you have any recommendations, I'm all ears, because I really don't like notebooks but still keep switching between Jupyter and VS Code depending on what I'm working on.

I use IntelliJ for all my work, data or normal dev stuff, and it works great (all of it is Python). Maybe there is just a workflow issue here where people are used to saving their data as they go in cells. I just write my algorithms all the way through, get a subset of data to debug against, then use the debugger to help me see what mistakes I made. I always run my code all the way through and only stop at the step I'm debugging. I like this better than saving the data from previous computations because I tend to refactor a lot and would need to rerun most of the notebook anyway. Also, rerunning it all the way through a lot makes me notice slow spots more than if I only ran that area a few times and saved the results. For me, this has the effect that those areas get more attention and my code is closer to production grade than if I had used a notebook workflow. My two cents, but give IntelliJ a try if you want a good Python IDE.
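To make the "subset of data to debug against" workflow concrete, here is a rough sketch. It assumes a fixed random seed so the same rows come back on every run; `debug_subset` and `pipeline` are hypothetical names for illustration, and the real pipeline would be whatever you are actually writing.

```python
import random


def debug_subset(rows, k, seed=0):
    """Return a reproducible random sample of at most k rows."""
    rng = random.Random(seed)  # fixed seed: same subset on every run
    rows = list(rows)
    return rng.sample(rows, min(k, len(rows)))


def pipeline(rows):
    # Hypothetical two-step pipeline: drop missing values, then scale.
    cleaned = [r for r in rows if r is not None]
    return [r * 2 for r in cleaned]


# Fake "full" dataset with some missing values.
full_data = [None if i % 7 == 0 else i for i in range(1000)]

subset = debug_subset(full_data, k=20)
result = pipeline(subset)  # run end to end; set a breakpoint on the step you care about
```

Because the subset is small and deterministic, rerunning the whole thing after each refactor stays cheap, and a failure reproduces on the same rows every time.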
> I use IntelliJ for all my work, data or normal dev stuff, and it works great (all of it is Python).

Can it show me inline plots and allow me to embed rendered formulae written in LaTeX, images and video in between lines?

This is the reason people like notebooks.

I have found PyCharm to offer a good trade-off between data exploration and productionizing your code. It has the best Python debugger that I've used. You can also run Jupyter notebooks in PyCharm when that makes sense for you.