| A more accurate title would be "fits in RAM on a single machine". Maybe some bonus categories: 0. Spreadsheet is all you need. 1. Python script is good enough. 2. Java/Scala is way to go. 3. Need to manage memory (GC doesn't cut it), some custom organization. 4. Actually needs a cluster. |
I HATE it when people use spreadsheets for anything besides simple math.
http://lemire.me/blog/archives/2014/05/23/you-shouldnt-use-a...
TL;DR: your work is not reproducible, and we can't see what you did to get your numbers. There are a million examples of why this is bad.
Also
> 1. Python script is good enough
You mean Python with pandas and numpy?
I use R, which is also a great choice.
> 2. Java/Scala is way to go.
For you, maybe, but the vast majority of data scientists don't use either, and that choice is not universal. Julia looks like a great newcomer. Again, I mainly use R.
3 & 4 are good points.