Hacker News
by blopker 2593 days ago
If anyone is interested in playing with a full 23andMe raw data file (VCF), I have mine on GitHub: https://github.com/blopker/DNA PRs welcome!

If you're also interested in working on this stuff, shoot me an email ;) blopker@23andme.com

Keep in mind that companies like 23andMe and Ancestry typically provide only a small fraction of your genome (the parts we currently consider most important, which is a moving target). If you want your whole genome, you'll need to go with something like Dante Labs (~200-300 USD during sales).
For academic use, UK Biobank, 1000 Genomes, and other resources offer variant data for large groups of people, my wife among them.
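To put "a small fraction" in rough numbers, assume a consumer genotyping chip reads on the order of 600,000 sites (an assumption; array sizes vary by vendor and chip version) out of a human genome of roughly 3.1 billion base pairs:

```latex
\frac{6 \times 10^{5}\ \text{sites}}{3.1 \times 10^{9}\ \text{bp}} \approx 1.9 \times 10^{-4} \approx 0.02\%
```

The sites are chosen to capture common variation, so that tiny fraction still carries a lot of information, but every other position goes unread.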

You may (at your own risk) take a look at

https://opensnp.org/

Very cool, thanks for the resource. I see I can upload my own data there too. Probably a more useful place for it than GitHub!
I'm not an expert, but AFAIK VCF isn't "raw" data. Raw data would be the output of a sequencer (FASTQ), which would be several gigabytes. I recently processed raw data from sequencing a tiny virus (~20k base pairs) and it was around 13 GB. Human genome sequence data would probably be tens if not hundreds of GB.
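As a back-of-the-envelope check on that last estimate, here is a minimal Python sketch that sizes an uncompressed FASTQ file for a human whole genome. Every parameter is an assumption for illustration, not a figure from this thread: a ~3.1 Gbp genome, 30x coverage, 150 bp reads, and a 50-character read ID line. Each FASTQ record is four lines: an `@` header, the bases, a `+` separator, and one quality character per base.

```python
# Rough, uncompressed FASTQ size estimate for a human whole-genome run.
# All parameters are illustrative assumptions, not figures from the thread.

GENOME_SIZE_BP = 3_100_000_000  # approximate human genome length
COVERAGE = 30                   # assumed mean sequencing depth ("30x")
READ_LENGTH = 150               # assumed short-read length in bases
HEADER_LENGTH = 50              # assumed length of the "@read-id" line, including "@"


def fastq_record_bytes(read_length: int, header_length: int) -> int:
    """Bytes for one 4-line FASTQ record, counting newline characters."""
    header = header_length + 1  # "@..." line plus newline
    sequence = read_length + 1  # one character per base plus newline
    separator = 1 + 1           # "+" line plus newline
    quality = read_length + 1   # one quality character per base plus newline
    return header + sequence + separator + quality


def estimate_fastq_gb(genome_bp: int, coverage: int, read_length: int, header_length: int) -> float:
    """Total bases sequenced, split into reads, times bytes per record, in GB (1e9 bytes)."""
    total_bases = genome_bp * coverage
    n_reads = total_bases // read_length
    return n_reads * fastq_record_bytes(read_length, header_length) / 1e9


if __name__ == "__main__":
    gb = estimate_fastq_gb(GENOME_SIZE_BP, COVERAGE, READ_LENGTH, HEADER_LENGTH)
    print(f"~{gb:.0f} GB uncompressed")
```

With these assumptions it prints `~220 GB uncompressed`, in line with "hundreds of GB"; in practice such files are usually gzip-compressed, which shrinks them considerably.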
23andMe does genotyping, not sequencing.
Ah, okay. I didn't know that. Yes, that makes sense now that I think about it. They won't be able to do full sequencing at their current price.
Sorry, but that's not a VCF. It's a TSV of genotypes. Here's the spec for VCFs: https://samtools.github.io/hts-specs/VCFv4.3.pdf
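To make the distinction concrete, here is a minimal Python sketch, assuming the 23andMe export layout of `#` comment lines followed by tab-separated rsid, chromosome, position, and genotype columns. The sample data, rsids, and positions are made up, and the function names are hypothetical. The TSV lists genotype letters directly, while a VCF data line stores the REF and ALT alleles plus a GT field of allele indices (`0` = REF, `1` = first ALT), so turning a genotype TSV into a proper VCF requires knowing the reference allele at each position.

```python
# Sketch contrasting a 23andMe-style genotype TSV with a VCF data line.
# All data below is hypothetical; rsids and positions are placeholders.

RAW_23ANDME = (
    "# illustrative header comment\n"
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t100\tAG\n"
    "rs0000002\t1\t200\tCC\n"
)

# One VCF data line: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
VCF_LINE = "1\t100\trs0000001\tA\tG\t.\tPASS\t.\tGT\t0/1"


def parse_23andme(text):
    """Return one dict per genotype row, skipping blank and '#' comment lines."""
    rows = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split("\t")
        rows.append({"rsid": rsid, "chrom": chrom, "pos": int(pos), "genotype": genotype})
    return rows


def vcf_genotype_letters(line):
    """Translate the first sample's GT indices back into allele letters.

    Returns None for missing calls ('.'). Only the first sample column is read.
    """
    fields = line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    alleles = [ref] + alt.split(",")
    indices = sample["GT"].replace("|", "/").split("/")  # '|' = phased, '/' = unphased
    if "." in indices:
        return None
    return "".join(alleles[int(i)] for i in indices)


if __name__ == "__main__":
    for row in parse_23andme(RAW_23ANDME):
        print(row)
    print(vcf_genotype_letters(VCF_LINE))  # same AG genotype, stored as 0/1 against REF=A
```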
How far are companies like 23andMe from whole-genome sequencing? That's kind of what I'm waiting for. Can you still get valuable data from genotyping?
You can get valuable data from genotyping. SNPs account for the bulk of the variation between you and me.

As for your first question, it depends on your definition of "companies like 23andMe". There are numerous companies that'll sequence a whole genome for you, but I don't know if any of them provide the kind of writeup that 23andMe does. 23andMe did at one time offer an exome product, but discontinued it a while back.

The largest hurdle is cost. Whole genomes, and even exomes, are significantly more expensive than a SNP chip. Since most would-be users don't know enough to care, it doesn't make much economic sense to offer those to the masses at the moment.

Actually, about half of variation is private (not common), and commercial services only look for common SNPs. So you will have some unique variants that would show up in a whole genome but not in a SNP test.
True. I was trying to say that your average layperson is unlikely to know enough about the difference for it to be a big deal.
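The reason a SNP test misses private variants is that a genotyping array only reports the sites it was designed to probe. A toy Python sketch of that logic follows, with a made-up three-site panel, made-up sequencing calls, and a hypothetical helper `split_by_panel`; real arrays probe far more sites, and some do include rarer variants.

```python
# Toy illustration: a genotyping chip only reports sites on its fixed panel,
# so variants elsewhere (often rare or private ones) are never observed.
# All sites and variants here are hypothetical.

CHIP_PANEL = {("1", 100), ("1", 200), ("2", 50)}  # (chromosome, position) sites probed

# (chromosome, position, ref, alt) calls a whole-genome run might produce
WGS_VARIANTS = [
    ("1", 100, "A", "G"),  # on the panel
    ("1", 150, "C", "T"),  # off the panel, e.g. a private variant
    ("2", 50, "G", "A"),   # on the panel
    ("3", 75, "T", "C"),   # off the panel
]


def split_by_panel(variants, panel):
    """Partition variant calls into those a chip with this panel could see and those it would miss."""
    seen, missed = [], []
    for variant in variants:
        chrom, pos = variant[0], variant[1]
        if (chrom, pos) in panel:
            seen.append(variant)
        else:
            missed.append(variant)
    return seen, missed


if __name__ == "__main__":
    seen, missed = split_by_panel(WGS_VARIANTS, CHIP_PANEL)
    print(f"visible to chip: {len(seen)}, only found by WGS: {len(missed)}")
```

Here it prints `visible to chip: 2, only found by WGS: 2`: a whole genome would report all four calls, while the chip could only ever see the two on its panel.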
Veritas Genetics offers full-genome sequencing for 1,000 USD. I purchased their kit for 200 USD in a limited-time offer. I sent it in in early February; unfortunately, there is a backlog that is delaying my results until late this summer.
I hope you got all of your kids and relatives to sign off on that, if you have any.