by PeterisP 3069 days ago
Yeah, in some way it feels that there's essentially a huge software (reverse) engineering project.

We have the technical ability to read all the code in our DNA, understand what small parts of it do (e.g. making a particular protein), and model some of the small scale behavior.

And we've got a very, very, very large codebase of mishmash undocumented legacy homegrown code that sort of does what we want but in an unstable and occasionally buggy manner. And we've got a strong wish to fix some bugs (i.e. genetic diseases) and possibly add some features (e.g. longer quality lifespan, increased capabilities). So we'd like to reverse-engineer this system.

The good part is that we only have to do it once and we can cooperate on it; the bad part is that the system is really complex and (more importantly) horribly interdependent; it actually implements pretty much all the practices that we know make code unmaintainable.

Anyway. My hypothesis is that research on advanced methodologies and tools for analyzing and understanding large quantities of tangled (and possibly intentionally obfuscated) computer code, including techniques and algorithms for computer-aided (machine learning?) understanding and reverse engineering of large codebases, seems likely to eventually have practical applications in biotech.

Yes, contemporary code behavior is quite far from protein interaction. That's ok - we're quite far from starting to properly reverse-engineer (in this context) biotech as well; with every decade, code (and its analysis) will become more complex and biotech more understood, eventually meeting. And when designing tools for analysis of very complicated systems, the tools will anyway have to be adapted not to the systems but to the analyzer, to the limitations of what structures the human researchers can understand and "keep in their head" and what needs to be automatically summarized/structured by tools.

1 comments

> The good part is that we only have to do it once and we can cooperate on it;

I'm not a biology person, but I think everyone except identical twins has different DNA, which makes the problem so much harder: "doing it once" only solves one person's problem when every variable (DNA) can potentially interact with every other. That's an n^n problem where n = 3,000,000,000 base pairs, which is an insane number; granted, it almost certainly has some defined structure which reduces the potential differences, but that will still be a huge number. Also, you have the whole nature versus nurture problem, which makes biology even harder.
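
For a sense of scale, here is a rough back-of-the-envelope sketch in Python. It assumes n is roughly 3,000,000,000 as stated above, and the function names are made up for illustration. It compares two readings of "everything can interact with everything": counting only unordered pairs, n(n-1)/2, versus the n^n figure, which is far too large to write out, so only its number of decimal digits is computed.

```python
import math

# Rough human genome size taken from the comment above (an approximation).
GENOME_BASE_PAIRS = 3_000_000_000


def pairwise_interactions(n: int) -> int:
    """Number of unordered pairs among n positions: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def decimal_digits_of_n_to_the_n(n: int) -> int:
    """Number of decimal digits in n**n, computed via logarithms
    so the huge number itself is never constructed."""
    return math.floor(n * math.log10(n)) + 1


if __name__ == "__main__":
    pairs = pairwise_interactions(GENOME_BASE_PAIRS)
    digits = decimal_digits_of_n_to_the_n(GENOME_BASE_PAIRS)
    print(f"unordered pairs: {pairs:.3e}")
    print(f"n^n has about {digits:.3e} decimal digits")
```

Even the pairs-only reading gives about 4.5 * 10^18 combinations, and n^n has tens of billions of digits, so either way brute force is out and any real progress has to exploit structure.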

Even identical twins are not all that identical. Reading and understanding the DNA is one thing; reading and understanding the regulatory network that controls which genes are expressed under what circumstances is at least as difficult.
I'm an identical twin, and my brother and I have 99.99% the same DNA. But after conception, epigenetic factors come into play. There's also the fact that we don't live the same lives. For instance, I like doing math for fun; my brother doesn't really care for math at all.