by mason55 3489 days ago
And the idea here is that you have a large parallelizable problem but you don't need the Hadoop ecosystem? To put it another way, why not just use EMR?

Yes, no Hadoop.

Quite a lot of life science work is done with stand-alone, domain-specific programs that read and write flat files.

This. As someone who works on bioinformatics workflows, one of the more difficult aspects of my job is explaining to other software folks why we do what we do. Scheduling is a solved problem; the issue is that you're dealing with a ton of command-line tools that expect POSIX filesystems and often do no parallelization of their own.
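To make that concrete, here is a rough sketch of the pattern: one single-threaded command-line tool, many flat files, and the parallelism bolted on from outside by running one process per file. Everything named here is an assumption for illustration: `STAND_IN_TOOL` is a placeholder that just upper-cases its input, and `run_tool` and `fan_out` are hypothetical helpers, not part of any real pipeline or of AWS Batch.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical stand-in for a domain-specific tool: reads a flat file on
# stdin and writes a flat file on stdout. Swap in the real command line.
STAND_IN_TOOL = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def run_tool(cmd, in_path, out_path):
    """Run one single-threaded tool process: in_path in, out_path out."""
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        # check=True raises CalledProcessError if the tool exits non-zero.
        subprocess.run(cmd, stdin=src, stdout=dst, check=True)
    return out_path


def fan_out(cmd, inputs, out_dir, workers=4):
    """Run the same tool over many files, one process per input file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    jobs = [(Path(p), out_dir / (Path(p).name + ".out")) for p in inputs]
    # Threads only wait on child processes here, so the GIL is not a limit.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_tool, cmd, i, o) for i, o in jobs]
        # Results come back in input order; a failed job re-raises here.
        return [f.result() for f in futures]
```

Calling `fan_out(STAND_IN_TOOL, list_of_input_paths, "results")` writes one `.out` file per input. This only works on a single machine with a shared local filesystem; spreading the same per-file jobs across many machines is the part a batch service would take over.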
Also every legacy mainframe COBOL batch system on the planet.
Will AWS Batch support the running of COBOL batch programs?