by thousandautumns
3118 days ago
pandas is absolutely terrible compared to dplyr, data.table, or even base R for data manipulation. And while you would have been right about Python being better for machine learning a couple of years ago, these days basically every popular machine learning library in Python (TensorFlow, Keras, etc.) has an API in R. I also don't know why you are separating "traditional statistics", "predictive analytics", and "data analysis". They are often the exact same thing. In fact, it makes me wonder how much experience you have with statistics if you are under the impression that it is somehow different from data analysis "or any other variant thereof". You are right on exactly one count: Python is superior for putting data analytics into production. And that isn't an insignificant advantage. A lot of data science today involves packaging an analysis into some larger program or product, and Python is absolutely better suited to that task. But in virtually every other case (including lots of machine learning problems), R is either as good as or greatly superior to Python.
I'm using "traditional statistics" as shorthand for statistical inference: make distributional assumptions, test them, estimate the effect of X on y, and put a 95% confidence interval around it. That sort of stuff.
It's the stuff that absolutely does not matter if you're assessing the overall effectiveness of a classifier, and certainly isn't needed in a lot of data analysis tasks where all you need are variations of counts and percentages.
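To make the distinction concrete, here is a rough sketch in plain Python. Everything in it is made up for illustration: the data is simulated, the helper names (`slope_with_ci`, `percentages`, `accuracy`) are hypothetical, and the interval uses a normal approximation rather than Student's t, which is only reasonable for a large sample.

```python
import math
import random
from collections import Counter
from statistics import NormalDist, mean


def slope_with_ci(x, y, level=0.95):
    """OLS slope of y on x with a normal-approximation confidence interval."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    # Assumption: n is large enough that a z quantile stands in for Student's t.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope, (slope - z * se, slope + z * se)


def percentages(labels):
    """Counts turned into percentages; no distributional assumptions involved."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: 100 * v / total for k, v in counts.items()}


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Inference: simulated data where y = 2x + Gaussian noise.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(500)]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]
slope, (lo, hi) = slope_with_ci(x, y)
print(f"slope = {slope:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")

# Descriptive analysis and classifier assessment: just counting.
print(percentages(["a", "b", "a", "a"]))     # {'a': 75.0, 'b': 25.0}
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

The first function leans on assumptions about the noise to say something about uncertainty; the other two are arithmetic on counts.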
For the record, my academic background is maths and statistics. I've picked up any software development experience on the job.