by anacleto 4104 days ago
I used Excel to do some basic data analysis tasks to see whether it is a reasonable alternative to using a statistical package for the same tasks.

Excel is a poor choice for statistical analysis beyond textbook examples, the simplest descriptive statistics, or for more than a very few columns. The problems I encountered that led to this conclusion are in four general areas:

[1] Missing values are handled inconsistently, and sometimes incorrectly.

[2] Data organization differs according to analysis, forcing you to reorganize your data in many ways if you want to do many different analyses.

[3] Many analyses can only be done on one column at a time, making it inconvenient to do the same analysis on many columns.

[4] Output is poorly organized, sometimes inadequately labeled, and there is no record of how an analysis was accomplished.
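
To make the contrast concrete, here is a minimal sketch of what points [1], [3] and [4] look like in a scripted workflow. It uses only the Python standard library; the column names, the data, and the summarize helper are made up for illustration, and the choice to drop missing values (rather than impute them) is an assumption stated in the code.

```python
import statistics


def summarize(columns):
    """Apply the same descriptive statistics to every column.

    Missing values are represented as None and are dropped explicitly,
    so the rule is written down in the script instead of being implicit.
    """
    summary = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        summary[name] = {
            "n": len(present),                      # values actually used
            "missing": len(values) - len(present),  # values dropped
            "mean": statistics.mean(present),
            "stdev": statistics.stdev(present),     # sample standard deviation
        }
    return summary


# Hypothetical data: three columns, two of them with a missing entry.
data = {
    "height": [1.0, 2.0, 3.0, None],
    "weight": [10.0, None, 30.0, 50.0],
    "age": [2.0, 4.0, 6.0, 8.0],
}

result = summarize(data)
for name, stats in result.items():
    # Every output row is labeled with its column and its missing count.
    print(name, stats)
```

The script itself is the record of how the analysis was done: the same function runs on every column, and each result says how many values were missing.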

'Here’s an example of how the numerical inaccuracies in Excel can get you into trouble.'

http://pages.stern.nyu.edu/~jsimonof/classes/1305/pdf/excelr...
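
I can't reproduce the handout's example here, so the following is only a generic sketch of one well-known way a variance calculation can go numerically wrong: the one-pass "sum of squares minus n times the squared mean" formula loses almost all of its significant digits when the values are large relative to their spread. The data and function names are made up, and this is not a claim about what any particular Excel version computes.

```python
def variance_one_pass(xs):
    """Textbook shortcut: (sum(x^2) - n * mean^2) / (n - 1).

    Subtracts two huge, nearly equal numbers, so rounding error dominates.
    """
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean * mean) / (n - 1)


def variance_two_pass(xs):
    """Compute the mean first, then sum squared deviations from it."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Hypothetical data: large values with a tiny spread. The true sample variance is 1.
xs = [1_000_000_001.0, 1_000_000_002.0, 1_000_000_003.0]

naive = variance_one_pass(xs)
stable = variance_two_pass(xs)
print("one-pass:", naive)
print("two-pass:", stable)
```

The two-pass version returns the correct variance of 1, while the one-pass version cannot: the squares are on the order of 1e18, where double-precision numbers are spaced more than a hundred apart, so the small true difference is lost in rounding.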

As someone previously mentioned here, RStudio is great. I would also add the R Project builds for Linux distributions.

Excel is a wonderful tool for many things. Statistics is not among them.

P.S. It has happened a couple of times that articles submitted to the 'Journal of Econometrics' were rejected (even by college students) for errors due to Excel.