The DDA pipeline goes like this: ThermoRawFileParser -> Comet -> [a bunch of OpenMS tools] -> Percolator -> [custom quantification stuff]. The data is mostly derived from chemoproteomics experiments where you have isotopically labeled control and compound-treated samples that are enriched with some probe. As a result, we work a lot with ratios and have to differentiate between two scenarios: the compound completely blocks probe labeling/enrichment, or the signal is just stochastically missing due to the nature of DDA. For TMT it's pretty similar. I'm working on DIA as well, though it turns out there are still quite a few challenges there for our particular use case.
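To make the "blocked vs. stochastically missing" distinction concrete, here's a rough Python sketch of the kind of rule I mean. It's illustrative only and not our actual quantification code: the function name, the `min_control_hits` threshold, and the `ratio_cap` value are all made up for the example.

```python
from typing import Optional, Sequence, Tuple

Intensity = Optional[float]  # None means the peptide was not quantified in that replicate


def classify_ratio(
    control: Sequence[Intensity],
    treated: Sequence[Intensity],
    min_control_hits: int = 2,   # hypothetical threshold
    ratio_cap: float = 20.0,     # hypothetical cap for "fully blocked" ratios
) -> Tuple[str, Optional[float]]:
    """Return (label, control/treated ratio) for one peptide across replicates."""
    c = [x for x in control if x is not None and x > 0]
    t = [x for x in treated if x is not None and x > 0]

    if c and t:
        # Both channels observed: report a capped ratio of mean intensities.
        ratio = (sum(c) / len(c)) / (sum(t) / len(t))
        return ("quantified", min(ratio, ratio_cap))

    if len(c) >= min_control_hits and not t:
        # Control seen reproducibly, treated never seen: consistent with the
        # compound blocking probe labeling, so assign the capped ratio.
        return ("likely_blocked", ratio_cap)

    # Too little evidence to tell blocking from DDA sampling dropout.
    return ("undetermined", None)
```

The idea is just that a missing treated signal only counts as evidence of blocking when the control signal is reproducibly there; otherwise the peptide stays unresolved rather than getting a ratio.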

To answer your broader question about the general need for some structured pipeline or workflow orchestration: that comes down to volume of data (we do screens as well as one-off studies) and a desire to reduce human involvement as much as possible. So the goal is to have raw files be immediately picked up, processed, and loaded into an internal application where the results can be queried and interesting data can be highlighted. During my PhD, this was also a goal of mine (and I have at least two GitHub repos where I got close), but it was definitely less of a priority since actually doing experiments and downstream analysis was the limiting factor.

PS: if you want to talk off-HN, I should be your latest stargazer