Hacker News
by yawnxyz 638 days ago
I’ve been a bioinformatics engineer for a while.

Publishing is narrative building, and it means wading through lots and lots of experimental data. Most of that data isn’t super interesting.

Say you record everything that’s said around you for a few weeks. Lots of audio.

Then you set out to tell a story about how those weeks have changed you. You find quotes and ideas that shape your narrative. And let’s say you have a good one! And you publish it, and people love your story. You have a few polished sound bites. They support your story.

What about all the other audio? Some would say that’s valuable data, others would say it’s noise. It would also potentially take 10x the work to clean and publish, and without a narrative tied around it, you would have a hard time making it useful. Most of it would be literally noise.

(Unless you’re Diddy and store all the tapes and then one day the FBI comes knocking. Unfortunately the NIH and FDA don’t have their own federal enforcement arm.)

1 comment

You don't have to post the noise. If you are publishing a paper, you already have a solid experiment in place. What is necessary is a way to reproduce that research, and the final dataset used is an important piece of the puzzle. Of course, if a changelog of the data exists, that would be useful too, if only to check whether the authors modified the data to cherry-pick the results they are publishing.