| HN Mirror

by tetron 2404 days ago

I don't know if I can change your mind, or if anyone else is reading this thread, but CWL was designed to solve a particular set of problems. If you don't have those problems, you might not need it, but that doesn't mean those problems don't exist.

> the project doesn’t even provide you a dispatcher component but instead tells everyone to take a spec and write their own.

Close...

CWL is supported by SaaS vendors, FOSS projects, and various HPC schedulers, all of which have their own incompatible data management and dispatch/scheduling systems. If you want to write an analysis that runs on more than one of these platforms, you need some abstraction for it. CWL is one such abstraction.

This matters because you may have developed a research pipeline that integrates a bunch of different tools written in different languages, and you want to run it on somebody else's data. You need to run it on their infrastructure, because copying 12 terabytes of HIPAA-restricted data from their LSF cluster to your Google Cloud instance isn't an option.

"Just use bash" is what people who adopt CWL are trying to get away from. It is nearly impossible to write portable parallel / distributed analysis in bash, and the result is brittle scripts with more coordination code than code that actually does scientific work. Because CWL is declarative, the CWL engine handles all the coordination, scheduling and data staging for your particular infrastructure.

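To make the "declarative" point concrete, here is a toy sketch in Python. It is not CWL and not how any real CWL engine works; the `WORKFLOW` layout, the step names, and `run_workflow` are all made up for illustration. The description only says what each step needs, and the "engine" works out the order:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical declarative description: each step lists the steps it needs
# and a function to run. Nothing here says in what order the steps execute.
WORKFLOW = {
    "report": {
        "needs": ["count_lines"],
        "run": lambda count_lines: f"{count_lines} lines",
    },
    "count_lines": {
        "needs": ["fetch"],
        "run": lambda fetch: len(fetch.splitlines()),
    },
    "fetch": {
        "needs": [],
        "run": lambda: "a\nb\nc",
    },
}


def run_workflow(workflow):
    """Toy 'engine': derive the execution order from declared dependencies."""
    # Map each step to the set of steps that must finish before it.
    graph = {name: set(step["needs"]) for name, step in workflow.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        step = workflow[name]
        # Pass each dependency's result as a keyword argument.
        kwargs = {dep: results[dep] for dep in step["needs"]}
        results[name] = step["run"](**kwargs)
    return results


if __name__ == "__main__":
    print(run_workflow(WORKFLOW)["report"])
```

The steps are deliberately listed out of order. A real engine would also decide where each step runs and how data gets staged there, which is exactly the coordination code you don't want to hand-write in bash.
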
You may not have any of these needs, but suggesting that we're just bored developers creating castles in the sky is really unhelpful.