Hacker News
by snidane 902 days ago
Out-of-core computations. While your Python and R scripts will choke after reading a few hundred megs, my compiled binary CLI will keep streaming through many such files with memory usage sitting somewhere near zero.

That’s just the effect of streaming IO vs. reading the whole file into memory at once. It has nothing to do with the language you use; it comes down to how you process the data.

I keep multiple little Python scripts around to do things like sum lists of numbers (think extracting a column with awk, then calculating a sum). Whether the script is compiled or interpreted really doesn’t matter. What matters is using the right algorithm for the job. R and Python data science libraries like to read all of the data at once into a single data structure. That’s the anti-pattern to avoid if at all possible.

(But they are very handy for small datasets or for complex calculations that require the entire dataset in memory.)
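
For illustration, here is a minimal sketch of the kind of streaming sum script described above. It assumes one number per line on stdin; the function name stream_sum and the file name sum.py are made up for this example. Only the running total is held in memory, so the input can be far larger than RAM.

```python
import sys
from typing import Iterable


def stream_sum(lines: Iterable[str]) -> float:
    """Sum one number per line, keeping only the running total in memory."""
    total = 0.0
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            total += float(line)
    return total


if __name__ == "__main__":
    # Iterating over sys.stdin yields one line at a time instead of
    # loading the whole input, so memory use stays flat.
    print(stream_sum(sys.stdin))
```

It could be used at the end of a pipeline, e.g. awk '{print $3}' data.txt | python sum.py (the column number and data.txt are placeholders). The read-everything version would be sum(float(x) for x in sys.stdin.read().split()), which gives the same answer but needs the whole input in memory first.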