| HN Mirror



	by extr 1879 days ago
	Yeah, I find that the out-of-order execution issue is a common complaint from people who have a software development mindset, but for data analysis/science it is basically the only sensible way to work. The "load data" command might be one line but take 3 minutes to run, while a huge chunk of code that plots the data might take 1 second, and I might want to tweak it 50 different ways before settling on something that I like or that delivers insight. Producing a standalone script that develops the same insight you get from "playing" with the data is an afterthought in some cases.
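	To make the trade-off above concrete, here is a minimal sketch of one way to get the "load once, tweak many times" workflow in a plain script: the slow load is cached to disk so that re-running the script only repeats the cheap part. Everything here is an assumption for illustration: `slow_load`, `load_data_cached`, and the cache file name are hypothetical, and the sleep stands in for the 3-minute load.

```python
import pickle
import time
from pathlib import Path

# Hypothetical cache location; any writable path would do.
DEFAULT_CACHE = Path("loaded_data.pkl")


def slow_load():
    """Stand-in for an expensive load step (hypothetical)."""
    time.sleep(0.1)  # pretend this is the 3-minute part
    return {"x": [1, 2, 3], "y": [2, 4, 6]}


def load_data_cached(cache_path=DEFAULT_CACHE):
    """Return cached data if present, otherwise load it and write the cache."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)
    data = slow_load()
    with cache_path.open("wb") as f:
        pickle.dump(data, f)
    return data
```

	With something like this, the plotting code can be edited and re-run as often as needed, and only the first run pays for the load. The cache has to be deleted by hand if the underlying data changes, which is its own source of stale-state bugs.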

2 comments

disgruntledphd2 1879 days ago

As long as you're aware of the dangers, it's fine. Personally I try to do the modelling offline, separately from the analysis, to avoid this issue, and set eval to no in org for those cases where I've built the model inline with the analysis.

Unfortunately, it generally takes a couple of terrible situations before people learn the problems with this.


hervature 1879 days ago

I agree that data analysis needs a tool to persist data while iterating over certain functions. But in this vein, said tool should aim to prevent the user from having to run the load_data() function more than once, not encourage re-running it by allowing someone to permanently manipulate the output of load_data().


disgruntledphd2 1879 days ago

This is an option in many tools, but it doesn't tend to work that well in practice.

I do agree that this is the ideal, though. (As an example, if Pluto is always reactive, then this workflow becomes much more difficult, because when you change a downstream datapoint, the model will be re-run.)
