by 883771773929 2822 days ago
I'd really like to know if there's a GNU-like project for the human genome.

I do understand that the 'value' in DNA testing is largely the ability to follow the social graph of inheritance along with phenotypic metadata. That is pretty much directly at odds with privacy, even for the strictest of 'anonymizing' setups. The greatest context-dependent 'utility', whether that's curing a truly unpleasant genetic disease or persecuting an ostracized person or group, comes from identifying and understanding highly anomalous genes, phenotypes, or linkages of interest, and those come down to damning specificities.

There is one case I can think of where I _might_ be willing to participate in the "Human Genome Revolution", and at first thought it seems useful both to those who submit their samples and to humanity as a whole: a 'simple' counter of every allele across all DNA samples uploaded. However, I would appreciate feedback on which gotchas here will be abused, because I'm positive I have not thought such a system through at all and the Monkey's Paw [0] will grant my wish. :(

It would be a project that requires three components, listed in order of increasing importance:

1. A cheap commodity DNA sequencer that can export its data in a free format.

2. A network protocol for uploading DNA alleles to a swarm of peers that archive and distribute the total counts.

3. A probabilistic anonymizing data structure that combines something like a generalized crypto accumulator [1] and a zero-knowledge negotiation process between the network and a client for incrementing the common counters above a fairly large threshold, such as several thousand or maybe a million submissions. Perhaps it uses a modification of something like the approximate counting algorithm [2] in a game setting.
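
To make the counting part of component 3 concrete, here is a minimal sketch of the approximate counting algorithm [2] (a Morris counter) with a release threshold on top. It covers only the counting; it does nothing about the accumulator or the zero-knowledge negotiation. The names MorrisCounter and publishable_counts, the allele labels, and the reading of the threshold as "only publish counts at or above it" are all assumptions made for illustration.

```python
import random


class MorrisCounter:
    """Approximate counter: stores only an exponent, not the true count."""

    def __init__(self, rng=None):
        self.exponent = 0
        self.rng = rng or random.Random()

    def increment(self):
        # Bump the exponent with probability 2**-exponent, so the stored
        # value grows roughly like log2 of the number of submissions.
        if self.rng.random() < 2.0 ** -self.exponent:
            self.exponent += 1

    def estimate(self):
        # Unbiased estimate of the number of increments seen so far.
        return 2 ** self.exponent - 1


def publishable_counts(counters, threshold):
    """Hypothetical release rule: expose only estimates >= threshold."""
    result = {}
    for allele, counter in counters.items():
        value = counter.estimate()
        if value >= threshold:
            result[allele] = value
    return result


# Placeholder allele labels, not real identifiers.
rng = random.Random(0)
counters = {"common": MorrisCounter(rng), "rare": MorrisCounter(rng)}
for _ in range(5000):
    counters["common"].increment()
counters["rare"].increment()

# The rare allele stays hidden; the common one is released as a coarse estimate.
released = publishable_counts(counters, threshold=100)
```

Because the estimate is always one less than a power of two, a single submission barely moves a large counter. That coarseness might help with anonymity, but on its own it is not a privacy guarantee.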

[0] https://en.wikipedia.org/wiki/Monkey%27s_paw

[1] https://en.wikipedia.org/wiki/Accumulator_(cryptography)

[2] https://en.wikipedia.org/wiki/Approximate_counting_algorithm