by knight-of-lambd 3069 days ago
> Turn off just 1 gene for a disease, and 1). it won’t do anything because you didn’t turn off the other 199, and 2). oh wow that gene was actually used for something else and now you’ve lost the ability to form eyeballs / are born without anything in your eye sockets.

Certainly not on the same scale, but this resonates with my experiences with legacy code.

I think this similarity is more than superficial. Energetic systems evolve over time to become tangled, correlated messes unless some other force counteracts this tendency (i.e., refactoring). I wonder if DNA has analogous mechanisms.


Yeah, in some ways it feels like this is essentially a huge software (reverse-)engineering project.

We have the technical ability to read all the code in our DNA, understand what small parts of it do (e.g. making a particular protein), and model some of the small scale behavior.

And we've got a very, very, very large codebase of mishmash undocumented legacy homegrown code that sort of does what we want but in an unstable and occasionally buggy manner. And we've got a strong wish to fix some bugs (i.e. genetic diseases) and possibly add some features (e.g. longer quality lifespan, increased capabilities). So we'd like to reverse-engineer this system.

The good part is that we only have to do it once and we can cooperate on it; the bad part is that the system is really complex and (more importantly) horribly interdependent. It actually implements pretty much all the practices that we know make code unmaintainable.

Anyway. The hypothesis I'm trying to make is that research on advanced methodologies and tools for analyzing and understanding large quantities of tangled (and possibly intentionally obfuscated) computer code, including techniques and algorithms for computer-aided (machine learning?) understanding and reverse engineering of large codebases, seems likely to eventually have practical applications in biotech.

Yes, contemporary code behavior is quite far from protein interaction. That's OK: we're also quite far from starting to properly reverse-engineer biotech (in this sense). With every decade, code (and its analysis) will become more complex and biotech better understood, so the two will eventually meet. And when designing tools for analyzing very complicated systems, the tools will have to be adapted not to the systems but to the analyst: to the limits of what structures human researchers can understand and "keep in their head", and to what needs to be automatically summarized and structured by the tools.

> The good part is that we only have to do it once and we can cooperate on it;

I'm not a biology person, but I think everyone except identical twins has different DNA, which makes the problem much harder: "doing it once" only solves one person's problem when every variable (each position in the DNA) can potentially interact with every other. With n = 3,000,000,000 base pairs, even just the pairwise interactions number about n^2/2, an insane number, and higher-order interactions are far worse. Granted, the genome almost certainly has some defined structure that reduces the possibilities, but the number will still be huge. Also, you have the whole nature-versus-nurture problem, which makes biology even harder.
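A quick back-of-the-envelope check of the pairwise count, as a minimal sketch. It assumes roughly 3,000,000,000 base pairs (the comment's figure) and counts only unordered pairs of positions, ignoring both the structure that would shrink the number and the higher-order interactions that would grow it:

```python
import math

# Assumption: ~3 billion base pairs in a human genome (the comment's rough figure).
N_BASE_PAIRS = 3_000_000_000

# Number of distinct unordered pairs of positions: n * (n - 1) / 2.
pairwise = math.comb(N_BASE_PAIRS, 2)

print(f"{pairwise:,}")           # 4,499,999,998,500,000,000
print(f"{float(pairwise):.1e}")  # 4.5e+18
```

Counting triples or larger groups of positions grows much faster still.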

Even identical twins are not all that identical. Reading and understanding the DNA is one thing; reading and understanding the regulatory network that controls which genes are expressed under what circumstances is at least as difficult.

I'm an identical twin, and my brother and I share 99.99% of our DNA. But after conception, epigenetic factors start to come into play. Or even just the fact that we don't live the same lives: for instance, I like doing math for fun, while my brother doesn't really care for math at all.

> I think this similarity is more than superficial. Energetic systems evolve over time to become tangled, correlated messes, without some other force counteracting this tendency (ie. refactoring).

Well... yes. That's entropy. Any isolated system tends toward disorder.

> I wonder if DNA has analogous mechanisms.

DNA works differently: there's an advantage to reusing code, and that naturally leads to spaghetti. It's not about minimizing energy, though; it's just that you're more likely to get working code by adding onto existing code than by adding a whole new section (unless you transclude it from bacteria/viruses).

There are also abstractions in nature, a non-spaghetti form of code reuse that arises naturally in code when the same thing is reused several times. For example, cells form the "atoms" of a body: they are very similar, but specialized, and many internal mechanisms are the same. Metabolism is reused, as are DNA itself and transcription. It's not much, but we may find other abstractions, currently hidden, if they have been reused many times.

There are also MASSIVE spans within the human genome that are copies and copies and copies and copies.

Yes, both are true.

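The "copies and copies" point maps onto something code-analysis tooling already does: duplicate detection. A minimal sketch, assuming a plain A/C/G/T string and exact matches only (real genomic repeat finding has to tolerate mutations and is far more involved); `find_repeated_kmers` and the toy sequence are hypothetical:

```python
from collections import defaultdict


def find_repeated_kmers(seq: str, k: int) -> dict[str, list[int]]:
    """Return every length-k substring that occurs more than once,
    mapped to the 0-based start positions where it occurs."""
    positions: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    # Keep only the k-mers that are duplicated somewhere in the sequence.
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


# Made-up toy sequence: "GATTACA" appears three times.
sequence = "GATTACACCGATTACATTGATTACA"
print(find_repeated_kmers(sequence, 7))  # {'GATTACA': [0, 9, 18]}
```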
There are fundamental limits on understanding computation: for some systems, the only way to predict what they will do is to run them.
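A commonly cited illustration of this kind of limit is the Collatz iteration: no closed-form shortcut is known for how many steps a starting number takes to reach 1, so in practice you find out by running it. A minimal sketch; `collatz_steps` is a hypothetical helper:

```python
def collatz_steps(n: int) -> int:
    """Count iterations of n -> n/2 (n even) or 3n+1 (n odd) until n reaches 1.
    Whether this loop terminates for every n >= 1 is the unproven
    Collatz conjecture; it has been checked for all small inputs."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


# Neighbouring inputs behave very differently.
for start in (26, 27, 28):
    print(start, collatz_steps(start))
```

Starting from 26 takes 10 steps, 27 takes 111, and 28 takes 18.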

Is the unmaintainability of legacy code at all related to this? Is the impenetrability of DNA at all related to this?

Though this is a solved problem for our field, if you're using a statically typed language.

You can't right-click on a gene and "find all references".
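For contrast, here is roughly what "find all references" does for code, as a minimal sketch using Python's standard `ast` module. Python is dynamically typed, so this only matches identifiers by name; a statically typed language lets an IDE resolve each reference precisely. `find_references` and the sample source are hypothetical:

```python
import ast


def find_references(source: str, name: str) -> list[int]:
    """Return the sorted line numbers where an identifier called `name`
    is read or assigned. Purely name-based: no scope or type resolution."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and node.id == name
    )


sample = """\
dosage = 2
def express(gene):
    return gene * dosage
print(express(dosage))
"""

print(find_references(sample, "dosage"))  # [1, 3, 4]
```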