Hacker News
by dekhn 925 days ago
I'm familiar with security (I keep a copy of Applied Cryptography on my shelf for "fun reading") and tech, here's a copy of my whole genome: https://my.pgp-hms.org/profile/hu80855C Note it's a full human genome, far more data than a 23&Me report. You can download the data yourself and try to find risk factors (at the time, the genetic counsellors were surprised to find that I had no credible genetic risk factors).

Please let me know in technical terms, combined with rational argument, why what I did was unwise. Presume I already know all the common arguments, evaluated them using my background knowledge (which includes a PhD in biology, extensive experience in human genome analysis, and years of launching products in tech).

I've been asking people to come up with coherent arguments for genome secrecy (given the technical knowledge we have of privacy, both in tech and medicine) and nobody has managed to come up with anything that I hadn't heard before, typically variations on "well, gattaca, and maybe something else we can't predict, or insurance, or something something".

10 comments

1) You can be subject to discrimination based on your ethnicity, race, or health-related factors. That's especially a problem when the data leaks at scale, as in 23andme's case, because that motivates the development of easy-to-search databases sold in hacking forums. The data you presented here would be harder to find, but that's not the case with mass leaks.

2) It's a risk for anything that's DNA-based. For example, your data can be used to create false evidence for crimes unrelated to you. You don't even need to be a target for that. You can just be an entry in a list of available DNA profiles. I'm not sure how much DNA can be manufactured based on full genome data, but with CRISPR and everything I don't think we're too far away either. You can even experience that accidentally because the data is out there and mistakes happen.

3) You can't be famous. If you're famous, you'd be the target of an endless torrent of news based on your DNA bits. You'd be stigmatized left and right.

4) You can't change your DNA, so when it's leaked, you can't mitigate future risks that don't exist today. For example, DNA-based biometrics, or genome simulation to a point where they can create an accurate lookalike of you. That they're not risks today doesn't mean they won't be tomorrow.

There are also additional risks depending on the country you're living in. You might be living in a country that protects your rights and privacy, but that's not the case everywhere.

You forgot an important one: Your ancestors, descendants, siblings, and cousins share much of the same DNA but did not consent to its release. All of the above risks apply to them as well. I'd be most concerned about insurance companies using genetic family history to deny coverage.
I'm not too worried about it because it's never a 100% overlap. Even my brother and I share only ~50% DNA. It gets way sparser for more distant relatives.

About insurance companies, they're legally forbidden to use such data.

>legally forbidden to use such data.

Great training set to check the results of other factors, then use those to infer.

Moreover "legally forbidden" means jack faeces unless you can point to people who had convictions recorded and went to jail. Otherwise we're merely discussing business conditions & expenses.

I mean, of course but that’s applicable to all regulations, isn’t it? Yes, they can be violated, but what else do we have?
If you keep things secret they can't be used in a regulation breach by people who don't know those things.

We have /that/.

Theft is illegal and you lock your house, and that regulation is a serious one. The idea we have nothing but regulation is absurd in the extreme.

> Even my brother and I share only ~50% DNA.

This is completely false. Any two random humans have more than 99% overlap by virtue of being the same species. It's even higher for brothers. We also share around 90% DNA with cats, dogs and elephants.

https://www.amacad.org/publication/unequal-nature-geneticist...

> I'm not too worried about it because it's never a 100% overlap.

This doesn't make sense. If they were equal, you'd be the same person except for environmental differences. Many applications don't need equal DNAs. E.g.

https://youtube.com/watch?v=KT18KJouHWg

> About insurance companies, they're legally forbidden to use such data.

This is a very weak argument. There's a long history of companies doing illegal things, and even if it's illegal today it doesn't mean it'll be illegal tomorrow.

I think it was clear that @sedatk was referring to the 1% that separates him from other human beings, not the 99% that separates him from trees.
Yes, I thought it was clear. I certainly wasn't referring to the risk of incrimination of chimpanzees.
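
The ~50% and >99% figures quoted above measure different things: the first is the expected fraction of the genome two relatives inherit identical-by-descent from recent common ancestors, the second is raw sequence identity across the species. A minimal sketch of the first number, assuming the standard coefficient-of-relationship formula (sum of (1/2)^n over each path of n meioses through a common ancestor) and no inbreeding; the function and table names are made up for illustration:

```python
from fractions import Fraction

def relatedness(paths):
    """Coefficient of relationship: sum of (1/2)**n over every path of
    n meioses connecting two people through a common ancestor."""
    return sum(Fraction(1, 2) ** n for n in paths)

# One entry per connecting path (hypothetical table, assumes no inbreeding).
RELATIONSHIPS = {
    "parent/child": [1],        # one direct path, one meiosis
    "full siblings": [2, 2],    # via mother and via father
    "half siblings": [2],       # via the single shared parent
    "first cousins": [4, 4],    # via each shared grandparent
    "second cousins": [6, 6],   # via each shared great-grandparent
}

def expected_sharing():
    """Expected identical-by-descent fraction for each relationship."""
    return {name: relatedness(paths) for name, paths in RELATIONSHIPS.items()}

if __name__ == "__main__":
    for name, r in expected_sharing().items():
        print(f"{name}: {float(r):.4f}")
```

These are expectations, not exact values for any given pair; actual sharing between siblings varies around one half. It does back up the "way sparser for more distant relatives" remark: the expected fraction drops by a factor of four with each additional degree of cousinship.
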
For one thing, this leaks a portion of the genome of your relatives, which is a clear breach of their privacy. Whether you personally deem it sensitive or not, genetic data is meant to remain confidential.
I don't believe making my genome available, which contains similarity to my relatives, is a breach of their privacy.

I think part of my point is that DNA, by its nature, simply cannot remain confidential, and that thinking we can keep it that way is just going to lead to inevitable disappointment.

First, some people extend your argument from DNA to everything and say "I believe that privacy in the modern world is unrealistic"; that doesn't make the argument applicable to the rest of us.

Second, whether DNA can or cannot remain confidential is yet to be seen, but feasibility is certainly orthogonal to whether it ought to be, which is the point at hand.

Third, whether you believe it's a breach of privacy to leak part of your relatives' DNA is beside the point. It's their decision to make, since it's their personal data and deemed confidential under most privacy frameworks, and therefore a breach.

To your first point: Yes, I generally extend my argument to more or less everything in the modern world. Put your garbage out on the street: reporters can rifle through it looking for evidence.

To your second point: we already know DNA can't remain confidential (there is no practical mechanism by which even a wealthy person could avoid a sufficiently motivated adversary who wanted to expose their DNA). That's just a fact, we should adjust our understanding based on that fact.

Most important: sharing my genomic information with the world is not a breach of any privacy framework I'm aware of and subject to (US laws). Do you have a specific framework or country in mind?

> genetic counsellors were surprised to find that I had no credible genetic risk factors

So let's assume you committed to publishing your genome in advance regardless of result. Sounds like you spun the barrel and dry snapped to demonstrate that russian roulette is safe for everybody.

Tell us about how differing views on this to yours would influence opinion about your products you've launched in tech given your extensive experience in human genome analysis. Not at all?

This may not be a case of being unable to understand something one's paycheck depends on not understanding at all, but we can't know that yet.

One non-theoretical risk is that you or a relative leaves DNA on the scene of a crime you didn't commit (or?), and this makes you a suspect. This is also assuming a real identity is tied to the DNA.
That's not the same risk because 23andme also has name, address, email.

One risk if you have PII+genome is that a technically sophisticated entity can determine if you've physically been in a location. Also with an extensive PII+genome database they could find your family, for example for blackmail purposes.

Another risk is that a health insurance provider could deny you based on potential health issues they find in your genome.

Technically, even without PII an adversary could determine that you have been in a physical place, they just wouldn't know what to call you.
Yes, but technically sophisticated entities can also use methods that require less effort.

https://xkcd.com/538/

That's your defense? You asked for actual risks and when shown real, plausible ones recede into XKCD quotes. Clearly just a spoiler.
What real, actual risks which I didn't already know about have been shown in this thread?

The point is that while you can use DNA to identify people in most cases, sufficiently motivated adversaries have more effective, cheaper, lower-technology approaches that they will use first.

Like with many things, the issue is the aggregation of data on many individuals (a database), and easy accessibility of your individual data on request (discoverability and processing).

Me shouting my sensitive private details in a crowded bar is entirely different from putting them on my webpage. There's even a difference between writing them down on a napkin or shouting them out.

Forget about it dude, the other guy's just trolling and hiding his criteria for 'real threat' so he can act like nobody's good enough.

I guess it's representative of the demographics here. Nobody capable of conducting themselves honestly.

>well, gattaca, and maybe something else we can't predict, or insurance, or something something

Sure, if you don't believe in any of the potential negative scenarios, anything goes. You could also post your full name, SSN, DOB, address, etc. here if you are secure in the knowledge that no harm could ever come of it.

I think what they're saying is that name (probably not), SSN (almost definitely), DOB (maybe?) and address (probably) have known, confirmed risks. There are current ways that bad actors can abuse that information.

Genome is still pretty theoretical, except getting caught for committing crimes.

I just checked, and using my True Name (https://en.wikipedia.org/wiki/True_Names) I can easily find my DOB, prior addresses and phone numbers, and using that information, it's likely I could make a reasonable guess for the SSN.
> it's likely I could make a reasonable guess for the SSN.

It is? I mean then why are we bothering to protect anything, this shit is all super available for any given person.

SSNs are fairly predictable: if you know region of birth and DOB you can get awfully close, for a wide range of the population.

https://www.pnas.org/doi/10.1073/pnas.0904891106

Konerding's 12th law, amended: "There is no bit of pseudonymized data which cannot be de-anonymized by a sufficiently motivated MIT grad student" (not entirely joking; see https://archive.nytimes.com/bits.blogs.nytimes.com/2015/01/2...)

I think we already know for sure that posting a combination of full name, SSN, DOB, and address is a reliable way to provide scammers with the necessary information to commit fraud.
The question is, what are the potential negative scenarios.
Fully agree with you here. I can understand why people argue "We must do everything possible so that no human being ever finds out anything medical-related about another human being, ever".

But that is a value judgement, and I believe it is one that comes at a great cost to society. I wouldn't be surprised if >50% of the cost of medical care is directly or indirectly due to this attitude, and that medical progress has been slowed immensely for the same reason.

If we could make medical data more open, it would greatly benefit the vast majority of people. OF COURSE it is true that some smaller number of other people/patients are helped by the existing medical secrecy system. I fully admit this is a trade-off, where we have to decide what values are more important.

(source: Am medical doctor)

This is disgusting. You want people knowing the maladies they got treated, and how?

There's the old saying of knowledge being power. If you want this information about people being spread, then you're advocating having power over these people over that information.

It takes very little imagination to see how humans would misuse this data.

it's a tradeoff

I'm disgusting for "people having power over other people", you're disgusting for the graveyard of dead people due to the status quo system.

Why do you think people are entitled to have genome data on you? The morality is flipped. Privacy is recognized as a core, natural right. The onus is on others to justify wanting your biological data. Trusting others is a moral and character weakness, because you have no guarantees as to how that data will be used. Or, more specifically, what new ways to analyze and take advantage of that data will emerge.

I think actuaries will care an awful lot about this data and could use it to negatively influence your risk factor, and thus insurance premiums.

I think if your prior includes "trusting others is a moral and character weakness" then I don't think it's useful for us to discuss this topic further.

As for actuaries, in the US, the GINA law prevents health insurance companies from using this data. I think legal protection is much more important than attempting to hide my DNA.

> I think if your prior includes "trusting others is a moral and character weakness" then I don't think it's useful for us to discuss this topic further.

I agree, if you can't justify trust with reason then it's hard to trust your argument that relies on trust. Trust can be broken, and your stance doesn't address that concern.

While I hold privacy in high regard, your standpoint on trust is pretty extreme.

With your own "trust can be broken", you could conclude that you should distrust "with reason" (hey, it was broken) — basically, flipping it is an equally sound stance.

As a rule, I trust people, keep private stuff not easily aggregated (e.g. I might talk some stuff over at lunch, but will not email it to the person so they have it on record), and I am quick to distrust people once they fail me. Legal protections do matter, because they discourage misuse of unintentionally shared data.

The law could change, allowing the usage of your data without your consent.
Where is it stated exactly that privacy is a core, natural right? Not in the Constitution, though the 4th suggests it. It’s not part of the natural order, I don’t think (most stuff is out in the open). I’m not saying I think privacy is bad or people deserve to have their info out in the open, I just don’t understand why people feel such a right to it, or where governance — natural or man-made — dictates it.
They could also use it to positively influence my risk factor.
I'm gonna start making clones of you.
I'm fine with that, but merely having my genome sequence doesn't enable you to do that.
Wasn't your original argument that they could easily get your genetic material (to figure out the genome from) anyway?

Would a bunch of your cells be sufficient at some point in the near future? (I know progress is being made to turn any cell into a reproductive cell, but that's still not exactly the same thing, but it's on that exact path)

You still might not mind a bunch of your clones though, so I don't think that's much of an argument.

Generally, being pseudo-anonymous is what allows open and free discussion (but lots of vitriol too).

While genetic information is not yet understood well enough by the masses to be abused in stereotyping, rejecting and, indeed, "cancelling", there is a huge potential to do so. This especially holds true for gender, racial, national differentiation, genetic disease potential and health profiling, all accessible through a full genome (even if some of the indicators are not with 100% confidence). Lots of this can also be used to start linking genome data to an actual person (helped with data from other contexts), which is where it starts to become risky according to known risk profiles.

Unsurprisingly, someone who is likely a white male (I could have checked using your genome too, but loading up your profile above confirms that) with "no credible genetic risk factors" is a lot less concerned about opening up their genome to the public: you are unlikely to get discriminated against. With that said, even you can get potentially ignored for your privilege: even I just engaged in that — somewhat discounting a part of your experience/claim because you are a white male. Part of that is also education: your extensive experience in the field allows you to make an educated choice. Many can't attain that much knowledge before they decide whether to share their genome or not.

This opens up a question similar to that entire face recognition fiasco: how will the unprivileged be affected when it is mostly the privileged whose data is used to train models and do research?

So the question is how do we ensure enough anonymity to make everyone happy to contribute to the world's knowledge, while reducing the chances of linking data back to actual people? I know nebula.org is doing something of the sort (though mostly just guaranteeing that they will remove the data at your request, and not share it without your permission), but we could have one genome produce a bunch of part-genomes, still allowing causation/correlation research, but none of them having the full picture.

That would disable some of the groundwork research (is there a correlation/causation only visible in the full genome or larger part of it?), so it's a tricky balance to find.
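
The part-genome idea above can be sketched roughly as follows. This is only an illustration under loud assumptions: variants are represented as hypothetical `(chromosome, position, genotype)` tuples, each variant is assigned to a shard at random, and each shard gets a fresh random identifier so that shards from the same person carry no shared label. The `split_genome` name is invented here, and this is not how nebula.org or any other service is known to work.

```python
import secrets
from collections import defaultdict

def split_genome(variants, n_parts):
    """Scatter variants across up to n_parts disjoint shards.

    variants: iterable of (chromosome, position, genotype) tuples
              (a hypothetical, simplified representation).
    Returns a dict mapping a fresh random shard ID to its variants.
    """
    rng = secrets.SystemRandom()
    shards = defaultdict(list)
    for variant in variants:
        # Each variant lands in exactly one shard, chosen at random.
        shards[rng.randrange(n_parts)].append(variant)
    # Replace the shard index with a random ID that carries no
    # information about the person or about sibling shards.
    return {secrets.token_hex(8): shard for shard in shards.values()}

if __name__ == "__main__":
    demo = [("chr1", 100, "A/G"), ("chr2", 200, "C/C"), ("chrX", 300, "T/A")]
    for shard_id, shard in split_genome(demo, 2).items():
        print(shard_id, shard)
```

As the comment says, any association that only shows up across variants in different shards is lost. The sketch also does not address whether shards could be statistically re-linked to each other or to a person, so it should not be read as an anonymity guarantee.
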

And finally, I always like to make this choice a bit personal: how would you feel about your child being linked to a criminal case due to your genome being publicly available?