by clearspandex
4814 days ago
Jonathan here, Co-Founder of Zipfian Academy. While it is easy to get up and running in R for simple analysis, it is a complex language that takes years to master. I recommend learning Python for the aspiring data scientist because of its breadth of applicability. R is probably better for statistical analysis than Python (every language has a specific domain where it shines), but across the entire range of tasks a data scientist must perform, I feel that Python provides the best aggregate utility.

Also, as the comments below highlight, actual statistical analysis is but a small part of the data pipeline. Python has great facilities for interacting with data stores and sources, in addition to being a powerful tool for cleaning and munging data.

>When I think about, in my data programming related work, I'd say about 5% is doing analysis or executing statistical routines. And 95% of my time is spent on finding, cleaning, and properly normalizing data.

I hope the post doesn't downplay the importance of R for statistical analysis; it is a mature language with a great community surrounding it. The toolset of a data scientist is probably one of the most heterogeneous out there and requires learning and using many different abstractions.

For such a new (and hard to define) subject, I think dialogue is crucial to constructively advance the field. I would love to hear suggestions from the HN community about how to train the next generation of data scientists, what aspiring data scientists want to learn (or find difficult to learn), and how we can build a great data community.
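As a rough illustration of the finding, cleaning, and normalizing work described above (not code from the original post), here is a minimal sketch using only Python's standard library. The CSV contents, column names, and the helper functions `clean_rows` and `load_and_summarize` are hypothetical. The sketch trims whitespace, normalizes casing, treats blank fields as missing, drops duplicates, and then loads the rows into an in-memory SQLite table for a simple aggregate.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: inconsistent casing, stray whitespace,
# a missing value, and a row that duplicates another once normalized.
RAW_CSV = """name,city,revenue
 Alice ,new york,1200
bob,New York ,
Alice,NEW YORK,1200
Carol,boston,950
"""


def clean_rows(text):
    """Parse CSV text and return normalized, de-duplicated tuples."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        name = row["name"].strip().title()
        city = row["city"].strip().title()
        revenue_text = row["revenue"].strip()
        # Treat a blank revenue field as missing (None) rather than zero.
        revenue = float(revenue_text) if revenue_text else None
        key = (name, city, revenue)
        if key in seen:  # drop exact duplicates after normalization
            continue
        seen.add(key)
        cleaned.append(key)
    return cleaned


def load_and_summarize(rows):
    """Load cleaned rows into an in-memory SQLite table and total by city."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (name TEXT, city TEXT, revenue REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    # SQL SUM skips NULLs, so the missing revenue does not distort the total.
    result = conn.execute(
        "SELECT city, SUM(revenue) FROM sales GROUP BY city ORDER BY city"
    ).fetchall()
    conn.close()
    return result


rows = clean_rows(RAW_CSV)
print(rows)
print(load_and_summarize(rows))
```

Storing a blank field as `None` (SQL `NULL`) rather than `0` is a deliberate choice here: it keeps "unknown" distinct from "zero" so that later aggregates are not silently skewed.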