by kzuberi
1304 days ago
I also found the quality and proliferation of data pipeline tools baffling. Putting them together is somehow always more painful than it seems like it ought to be. At one point we wrote an internal tool (I think lots of organizations do this: the hundreds of existing tools somehow don't fit, so you invent #101), and while it was tremendously satisfying to get batch jobs churning away on thousands of CPUs, that kind of data infrastructure needs to be standardized. I think some companies are working on this; e.g. I saw a presentation about Arvados/Curii that seemed interesting (but I haven't used it, so I'm not sure). Maybe CWL will turn out to be the way forward here?