by penciltwirler 1298 days ago
Nice, but I feel like really the only thing you need to know as an eng is DNA -> RNA -> Protein. Sometimes RNA -> DNA via reverse transcriptase. Everything else is just normal Python scripting.
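For reference, the DNA -> RNA -> Protein flow can be sketched in a few lines of Python. This is only a toy illustration, assuming a coding-strand DNA string, the standard genetic code, and a reading frame that starts at the first base. The names `transcribe` and `translate` are made up for this example and do not come from any library.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(coding_dna):
    """Return the mRNA matching a coding-strand DNA sequence (T -> U)."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna):
    """Translate mRNA codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical 9-base sequence: start codon, one alanine codon, stop codon.
peptide = translate(transcribe("ATGGCCTAA"))
```

Here `peptide` ends up as `"MA"`. The sketch ignores strand orientation, finding the start codon, splicing, and ambiguous bases.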
7 comments

Oh no. That's a major flaw that kills projects: to run a valid statistical test you need to understand the underlying reality of the data. Otherwise you just run tests until you find “something”.

How do you handle one genomic variant affecting dozens of different RNA transcripts and isoforms? How do you handle tissue-specific expression? LD haplotype blocks? Frequency across populations and reference choice? Sample handling affecting read depth? Mixed direction of effects in phenotype-genotype? The critical (and, IMO, beautiful) feature of bioinfo is that it requires understanding that your dataset can rarely be considered clean and as simple as _observation name_ and _observation value_. To succeed, you usually need to know a lot about the observation metadata that is not collected in the dataset. Hopefully in the future it will be better curated and less esoteric.

...no. There is more to genomics than Python scripting. This is a wildly incorrect assumption.

A new generation of bioinformaticians and computational biologists is using Rust, Go, and the web to create, share and deliver.

Check out nextclade.org

I’m a biochemist + software engineer, and while I understand where you’re coming from, IMO that’s a very harmful/self-sabotaging attitude.

As soon as you start touching science, everything is important.

That’s what I thought too until I learned about:

- the DNA that doesn’t code for proteins but makes up the vast majority of human DNA

- the intron regions of genes that are transcribed into RNA but then spliced out of the RNA and not translated into protein, and are 5x larger than the coding parts

Those two things alone are absolutely critical to understand to interpret a genome sequence. Of course there is much more.
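To make the intron point concrete, here is a toy sketch of splicing. It assumes the exon coordinates are already known and given as 0-based half-open `(start, end)` pairs. The sequence, the coordinates, and the function names `splice` and `intron_fraction` are all hypothetical and only for illustration.

```python
def splice(pre_mrna, exons):
    """Join exon slices in order, dropping everything between them (introns)."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))

def intron_fraction(pre_mrna, exons):
    """Fraction of the transcript that is removed by splicing."""
    exonic_length = sum(end - start for start, end in exons)
    return 1 - exonic_length / len(pre_mrna)

# Hypothetical transcript: exon 1, then an intron (GU...AG), then exon 2.
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "UAA"
exons = [(0, 6), (18, 21)]

mature_mrna = splice(pre_mrna, exons)
removed = intron_fraction(pre_mrna, exons)
```

In this made-up example, 12 of the 21 bases are intronic, so more than half of the transcript never reaches the protein. In real genes you also don't get to assume a single set of exon coordinates, since alternative splicing produces multiple isoforms.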

You do know that there are things like epigenetics, DNA repair (using specialized proteins), RNAi, post-translational modifications, metabolites (just to name a few)?
Sooner or later you'll have to learn all the other stuff in the linked page: file formats used only in genomics, structural variants, NGS, evolution, regulation, polygenics, etc.
Who knew complex large polyploid genome assembly (e.g. sugarcane) was just a matter of Python scripting?