by consz 4223 days ago
Totally agreed. I do model analysis on data sets with 200k-5M rows and anywhere from 500 to 20k columns. I originally started doing my work in R, but about two years ago Python started improving rapidly for heavy data analysis, and at the moment I'd say it's a clear winner.
1 comment

For that kind of data or larger, I would avoid R and Python and either write my own algorithms or try something built for heavier-duty analysis, such as Mahout or Spark. R and Python are still limited to a single box and constrained by its memory.
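
To put rough numbers on "memory constrained", here is a back-of-envelope sketch. It assumes the data is held as a dense matrix of 8-byte floats, which is my assumption and not something the parent states (sparse or lower-precision data would be much smaller). `dense_size_gb` is just a hypothetical helper name for illustration.

```python
def dense_size_gb(rows: int, cols: int, bytes_per_value: int = 8) -> float:
    """Size in GB (10**9 bytes) of a dense rows x cols matrix.

    Assumes every cell is stored, at bytes_per_value bytes each
    (8 bytes = one float64). Ignores any index or object overhead.
    """
    return rows * cols * bytes_per_value / 1e9


# Low and high ends of the range quoted in the parent comment.
small = dense_size_gb(200_000, 500)
large = dense_size_gb(5_000_000, 20_000)

print(f"low end:  {small:.1f} GB")
print(f"high end: {large:.1f} GB")
```

Under those assumptions the low end fits comfortably in RAM on an ordinary machine, while the high end comes out to hundreds of GB before any working copies are made, which is where a single box starts to hurt.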