Hacker News
by ktpsns 3048 days ago
I make the claim that you can go very far in the SciPy ecosystem without ever touching R.

It is worth understanding the concepts of numpy and pandas. Furthermore, try out IPython/Jupyter, especially for rapid publishing (people run their blogs on Jupyter notebooks).
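To make "the concepts" a bit more concrete, here is a minimal sketch of what I'd count as the core ideas: whole-array operations and broadcasting in numpy, and labelled split-apply-combine in pandas. The data and variable names are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# numpy: operate on whole arrays at once instead of looping in Python
x = np.array([1.0, 2.0, 3.0, 4.0])
squared = x ** 2           # elementwise square
total = squared.sum()      # reduction over the whole array

# broadcasting: a (3, 1) column combined with a (4,) row yields a (3, 4) grid
grid = np.arange(3).reshape(3, 1) * 10 + np.arange(4)

# pandas: labelled columns plus split-apply-combine
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],   # illustrative labels
    "value": [1.0, 2.0, 3.0, 4.0],   # illustrative measurements
})
means = df.groupby("group")["value"].mean()  # one mean per group label
```

If these three patterns feel natural, most of the rest of the ecosystem reads as variations on them.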

I think which libraries matter depends very much on where you focus. Machine learning? Natural language processing? Visualization? Something in economics? Fundamental sciences? For instance, I never need NLTK in theoretical astrophysics ;-) Instead, I need powerful GPU-based visualization, which is still very old school with VTK and VisIt/Amira/ParaView (also very much pythonic).

3 comments

I disagree, even though Python is the language I do most of my development in. But it probably depends on which problems we imagine a data scientist solving.

If you're doing a lot of work with matrices or model fitting in production, then Python seems fine. However, a lot of data scientists I see are more like scrappy data analysis / visualization types, who are churning out small dashboards. In that case R's tidyverse and Shiny are just incredibly fast to develop with.

I second that R is nice to have, but not needed. I’ve been doing science in Python for a decade without ever needing R.

For powerful GPU viz, have you considered vispy? Four authors of four independent Python science visualization libs got together to build it.

Agreed. I would drop R; Python has you mostly covered now. Julia is also worth learning.

I wouldn't recommend dropping R at all.

Very few enterprise data science teams are 100% Python (in fact, none that I've heard of). R is still very heavily used (in fact, in every data science team I've worked in, it has been the dominant technology).

There is a reason Microsoft purchased Revolution Analytics.

R, Python and Julia are all Turing-complete languages, so of course you can drop any two and get by with just the third.

The real selection happens when you consider what's available in the open source world. What code do you not have to write? Which high-quality libraries are available, and which ones would you have to write yourself?

On this topic, R has a vast advantage over Python in some domains, such as bioinformatics, while Python definitely shines when it comes to deep learning (and using for loops).

You can't just claim that one shouldn't look at R because you personally know one language better than the other, quite likely because R isn't used as much in your domain.

I do prefer the deep learning, NLP and production serving story in Python, but you will have to pry dplyr+ggplot from my cold dead hands for quick analysis and charting. Not to mention that pandas' API is a clusterfuck compared to R's native data frames.
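For anyone comparing the two styles, here is a rough sketch of how a typical dplyr filter / group_by / summarise pipeline might look in pandas, using method chaining and named aggregation. The data frame and column names (`species`, `mass`, `mean_mass`, `n`) are invented for illustration, and this is only one of several ways to write it in pandas, which is arguably part of the complaint.

```python
import pandas as pd

# Illustrative data; not taken from any real dataset
df = pd.DataFrame({
    "species": ["x", "x", "y", "y", "y"],
    "mass": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Roughly the dplyr pipeline:
#   df %>% filter(mass > 1) %>% group_by(species) %>%
#     summarise(mean_mass = mean(mass), n = n())
summary = (
    df[df["mass"] > 1]                      # filter rows
    .groupby("species", as_index=False)     # keep the group key as a column
    .agg(mean_mass=("mass", "mean"),        # named aggregation: new_col=(source_col, func)
         n=("mass", "size"))
)
```

The result has one row per species with the columns `species`, `mean_mass` and `n`.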