HN Mirror

by a_bonobo 864 days ago

Yeah, that's how it ended up for me: I churn through the large datasets in Python for speed, but then I usually switch over to R for the summary data, because R's biology-specific ecosystem is just way bigger than Python's.

R/Bioconductor has packages for human genome-specific analyses, so it's easy to download gene positions etc. There are packages for read simulation, amplicon sequence variant detection, gene distance simulations, any kind of RNAseq analysis you can think of... none of these packages exist in Python. If you reran it all in Python you'd save 10 minutes or even hours of running time, but you'd lose days or months re-implementing analyses that already exist as R packages (plus those R packages often call C++ code anyway).

Plus, ggplot2 is miles ahead of any plotting library in Python (to me :) ).
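For anyone curious what that Python-then-R handoff can look like, here is a minimal sketch using only the standard library. The column names (`gene`, `sample`, `count`), the toy input, and the helper functions are all hypothetical; the real pipeline described above isn't shown in the thread, so treat this as one possible shape, not the actual workflow.

```python
import csv
import io
from collections import defaultdict


def summarize_counts(rows):
    """Stream over gene/sample/count records and total the counts per (gene, sample)."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["gene"], row["sample"])] += int(row["count"])
    return totals


def write_summary(totals, handle):
    """Write the summary as a tidy CSV that R can load with read.csv()."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["gene", "sample", "total_count"])
    for (gene, sample), total in sorted(totals.items()):
        writer.writerow([gene, sample, total])


# Hypothetical toy input standing in for a large per-read file on disk.
raw = io.StringIO(
    "gene,sample,count\n"
    "geneA,s1,3\n"
    "geneB,s1,5\n"
    "geneA,s1,4\n"
    "geneA,s2,1\n"
)

# csv.DictReader yields one row at a time, so the full input never sits in memory.
totals = summarize_counts(csv.DictReader(raw))

out = io.StringIO()
write_summary(totals, out)
summary_csv = out.getvalue()
```

In practice you'd write to a real file instead of a `StringIO`, then pick it up on the R side with `read.csv()` and carry on with Bioconductor and ggplot2 from there.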

1 comment

eyegor 864 days ago

To your last point, have you tried plotnine? It's meant to be ggplot2 for Python.

https://github.com/has2k1/plotnine

a_bonobo 864 days ago

I've looked at it, yes! But there are heaps of ggplot-based libraries that I also like to use, things like cowplot, ggsignif, ggtree etc. It would be a cat-and-mouse game for plotnine to keep up with the ggplot-based ecosystem!
