by vegabook 4223 days ago
Python's pip is pretty good, though not quite as polished as CRAN. I have had few problems running complex code from third-party sources, though one always has to be aware of the Python 2 vs. 3 "problem" (which is diminishing now that most things are available on 3). If you get pip up and running on a new Python installation, you can avoid Anaconda/Canopy if you want a clean installation, and I have installed fairly complex Python setups in multiple locations without too much trouble. Let's be fair, R can also be tough if it calls a lot of third-party libraries. Just try to get rJava working properly, for example, if the local R and Java installations are not both 32-bit or both 64-bit. It can be a complete mess to disentangle this sort of stuff in R. Or try running code that uses Cairo on a Mac. My experience is that Python's poor package management reputation is not really deserved anymore. Python's virtualenv also lets you hermetically seal away an entire Python environment, including its libraries, so that it will not conflict with other Python environments that might have different versions of the interpreter and/or libraries. I am not aware of anything this robust in R.
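
To give a rough idea of what sealing away an environment looks like, here is a minimal sketch. It assumes Python 3.3+ and uses the standard library's venv module rather than the third-party virtualenv tool itself, though the idea is the same. The helper name create_env and the directory name demo-env are made up for illustration.

```python
import os
import tempfile
import venv


def create_env(path):
    """Create an isolated Python environment at `path` (hypothetical helper).

    with_pip=False keeps the sketch quick and offline; pass with_pip=True
    if you want pip bootstrapped into the new environment.
    """
    builder = venv.EnvBuilder(with_pip=False, clear=True)
    builder.create(path)
    # Every environment made by venv gets a pyvenv.cfg marker file at its root.
    return os.path.join(path, "pyvenv.cfg")


# Build the environment in a throwaway temp directory (illustrative location).
env_dir = os.path.join(tempfile.mkdtemp(), "demo-env")
cfg_path = create_env(env_dir)
print("environment created:", os.path.isfile(cfg_path))
```

Packages installed from inside that environment go into its own directory tree, so two projects can pin different library versions without stepping on each other.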

Reproducible computing? The ipython notebook is awesome, though I am not sure if there is anything as good as knitr if your workflow is LaTeX oriented.

R "hands" will usually find Python a backward step when it comes to vectorized data manipulation, but it's a forward leap if your data becomes too big or if you have to step out of the comfy environment of exploratory analysis into any form of (even trivial) production settings.

And no you definitely do not need HDF5 to effectively use Pandas.

3 comments

The closest equivalent to virtualenv for R is packrat: http://rstudio.github.io/packrat/. It doesn't (yet) support different R versions for different projects, but that's on the roadmap.
Yeah packrat is great! It is a really important package which has greatly increased my willingness to use R in production.
Ok that's good to know. Sure, R breaks inexplicably sometimes due to dependencies, no doubt about that.

virtualenv sounds useful. Is it used much when python code is published in a paper?

About HDF5: I was just making the point that the Pandas docs recommend I install Anaconda to get Pandas, thus also installing HDF5. I am sure there are other ways, but the way the documentation is phrased suggests that these other ways are overly difficult.

I'm just learning Python to do some data and graph analysis experiments. Should I go with Python 2 or 3?
You are strongly encouraged by the Python powers that be to move to 3, and I have only in the past few months begun to agree with them, because some serious standard libraries like asyncio are now only available on 3. It's (finally) the future. However, a big caveat is that if you're learning Python, most of the sample code you will find on the web will be 2-based and will not work well under 3. It's not so much the print statement, but range() works subtly differently too now (it returns a lazy range object, not a list - too subtle for beginners to properly understand in my view), and unicode strings can break older code too. Just be aware of these things and move to 3 is my (51/49) advice, but this is a controversial point and others will have differing points of view.
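
For anyone who wants to see the differences rather than take my word for it, here is a small sketch to run under Python 3. The variable names are just for illustration. Strictly speaking, range() in 3 gives back a lazy range object rather than a list: it can be indexed and iterated more than once, but it has to be wrapped in list() if old 2-style code expects a real list.

```python
# range() in Python 3 is a lazy sequence, not a list.
r = range(5)
print(type(r).__name__)  # range

# Code written for Python 2 that expects a list needs an explicit conversion.
as_list = list(r)

# Unlike a one-shot iterator, a range can be indexed and iterated repeatedly.
third = r[2]
first_pass = [x for x in r]
second_pass = [x for x in r]

# Text and bytes are separate types in Python 3; str is always unicode.
text = "na\u00efve"          # 5 characters
data = text.encode("utf-8")  # 6 bytes, because the accented i takes two
round_trip = data.decode("utf-8")

# print is a function in Python 3, so the parentheses are required.
print(as_list, third, len(text), len(data))
```

Old code that mixes byte strings and text without an explicit encode/decode step is where most of the unicode breakage comes from.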