Hacker News
by rch 4606 days ago
First, I should note that my needs are fairly specific, and not typical of the rest of the NGS world. The datasets are essentially the same, though.

The rate at which we are acquiring new data has been accelerating, but each of our Illumina datasets is only 30GB or so. The total accumulated data is still just a few TB. The real imperative for using MR is more about the processing of that data. Integrating HMMER, for instance, into Postgres wouldn't be impossible, but I don't know of anything that's available now.

Edit: An FDW for PostgreSQL around HMMER just made my to-do list.

So is it fair to say it is an "ease of use" use case?
Is that the same as an 'impossible to do otherwise' case?

Edit: I should say 'currently impossible' since, as I noted, I can imagine being able to build SQL queries around PSSM comparisons and the like. I just can't build a system to last 5+ years around something that might become available at some point.

Since I can't reply directly: agreed :)

That comment was based on your "into Postgres wouldn't be impossible" phrase.

No, fair enough if it's the only way you can get things to work. I still see lots of people jumping on the "big data" bandwagon with very moderate-sized data.