by throwanem 1838 days ago
Software changes over spans of minutes to decades; genomes change over spans of millions of years. Software is written; genomes are not. The complexity of software is constrained by programmers' ability to comprehend it; the complexity of genomes is not. The environment in which software functions is determined by humans; the environment in which genomes function is not.

Those are trivial surface level differences relative to the central idea of encoding, storing, replicating, editing digital information, which interfaces with other digital and analog systems.
Not that there's much point to saying so, since you appear to be here for no other reason than to assert that my argument is false because you would prefer it be so, but here's another: software is digital; genomes are not.
FWIW all of these differences still feel extremely surface level. I'm no expert, but so far I'm aware of everything you've said about how they differ. I'm kinda hoping for more, given the strong assertion you made that one cannot relate the two without being fundamentally ignorant of either topic.

I also think it's somewhat ironic that you're accusing them of only being here to say "you're wrong" but that's what you've done in this thread? I only bring this up because I think we're all after the same thing here - to understand an incredibly interesting topic.

I suspect most of us are really here to learn and discuss. You seem like you have a background in the area, I'm sure we would all benefit from learning about the differences.

If it's the case that the similarity is that DNA and code both encode information, and the differences lie in how they do so, it's hard to see why you think they can't be related at all. You've been relating the two.

If I've given the impression that the difference is merely a question of varying encodings, then I have to agree my arguments have thus far been lacking.

The idea that a genome as expressed in nucleic acid is purely, and only, an informational medium is fundamentally in error. It does encode information in the sequence of base pairs, this is true. But it is also a physical structure in its own right, and properties of that structure incidental to the encoded information appear, from recent work, to play at least as important a role in the process of transcription as the sequence itself.

There are, for example, some sequences which will cause a ribosome to transcribe the surrounding genes differently or with varying frequency, due to the physical interaction between the molecules involved. (I recently discussed this here in the context of recent research on causes of eye color; it should not be too far back in my comment history.) We also see, for example, that both viral and eukaryotic DNA can be and often are transcribed in ways that produce different proteins from the same sequence, again as a result of physical constraints affecting the interaction with the ribosome. This is one reason why "junk DNA" is a bit of a misnomer, and why we more recently see the term fall out of use in favor of "noncoding DNA" - these regions carry no information in their own right, but nonetheless can strongly affect the outcome of transcription because transcription is not only an informatic process. This isn't true of software; there is no general case in which two programs varying only in nonsyntactic ways will be evaluated differently under otherwise identical conditions - we create programming languages as we do in part to ensure that won't happen, and it's also part of the reason why we use transistors instead of vacuum tubes or relays: in order to engineer that kind of variance as much as we can out of existence. What is therefore an accidental property in software is an essential one in gene expression, and cannot be overlooked without reaching an inaccurate conception of how the latter process works.
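To make the software side of that contrast concrete: in Python, for instance, the parser discards comments, spacing and redundant parentheses, so two sources that differ only in those ways yield the same abstract syntax tree and behave identically. A minimal sketch using the standard `ast` module; the two `area` programs are hypothetical examples, not anything from the thread.

```python
import ast

# Two hypothetical programs differing only in comments, spacing,
# blank lines and redundant parentheses ("nonsyntactic" variation).
PROGRAM_A = "def area(w, h):\n    return w * h\n"
PROGRAM_B = (
    "# compute a rectangle's area\n"
    "def area( w , h ):\n"
    "\n"
    "    return (w*h)   # parenthesised, extra spaces\n"
)


def same_syntax_tree(src_a: str, src_b: str) -> bool:
    """Compare the syntax trees the parser builds (positions are not included)."""
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))


def run_area(src: str, w: int, h: int) -> int:
    """Execute a source that defines area() in a fresh namespace and call it."""
    namespace: dict = {}
    exec(src, namespace)
    return namespace["area"](w, h)


print(same_syntax_tree(PROGRAM_A, PROGRAM_B))                 # True
print(run_area(PROGRAM_A, 3, 4), run_area(PROGRAM_B, 3, 4))   # 12 12
```

Changing the operator itself (say `w + h`) does change the tree, which is the kind of difference a language is designed to make the only one that matters.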

That's just one example, and it's true that processes like these can be modeled in software to variously imperfect degrees of fidelity and that information-theoretical models can be useful in understanding some aspects of how they work. But that's not the same thing as them working similarly enough that understanding one very well suffices to reason about the other. I definitely can see how it's easy to assume otherwise! It's an assumption I shared, before my own yearlong exposure to the field at a sufficient level of detail to start to understand what I hadn't understood about it before, and considerable reading and study thereafter.

Unfortunately, I was there to provide engineering support to people doing that work, not to do it myself, and the knowledge I've derived from that experience apparently does not extend so far as producing a concise and positive statement of the fundamental difference between the two fields of study. I spent considerably more time teaching informaticists how to program, formally and otherwise, than I spent learning about bioinformatics. That leaves me able to recommend little beyond seeking out similar experience of your own, which I do recommend if the depth of your interest suffices, although I also have to say that working in academia as a nonacademic has very little else to recommend it.

I know there are some folks on HN with formal knowledge and training greatly exceeding my own, and some of whom have probably also had experience teaching the basics in an accessible way. Perhaps one of them might give a more useful answer here than I've been able to.

>some sequences which will cause a ribosome to transcribe the surrounding genes differently

Not to be a negative nancy here, but if we're being precise, ribosomes do not transcribe. They translate.

Under the fairly reductive central dogma of biology: DNA -> RNA (transcription), then RNA -> protein (translation).

Transcription and translation are separate mechanics that don't occur in the same area of the cell, and both use very different complexes to mediate the rates of each in different physical environments.
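Purely as a sequence mapping, the two steps can be sketched separately: transcription rewrites DNA as RNA, and translation reads that RNA three bases at a time into amino acids. A minimal sketch, assuming a deliberately tiny subset of the standard codon table; it models only the information flow, not the separate molecular machinery that carries out each step.

```python
# Tiny subset of the standard genetic code (RNA codon -> one-letter amino acid).
# A real translator needs all 64 codons.
CODON_TABLE = {
    "AUG": "M",                          # methionine; also the usual start codon
    "UUU": "F", "UUC": "F",              # phenylalanine
    "GGC": "G",                          # glycine
    "AAA": "K",                          # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_strand: str) -> str:
    """DNA -> RNA: the mRNA matches the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """RNA -> protein: read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE.get(mrna[i:i + 3], "?")  # "?" = codon missing from this toy table
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("ccATGTTTGGCAAATAAgg")
print(mrna)             # CCAUGUUUGGCAAAUAAGG
print(translate(mrna))  # MFGK
```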

I don't disagree with any of the substantive points being made, but I think the proper terminology only adds to your argument so I found it strange that it was left out.

It's one of the drawbacks of being an autodidact; I pretty much always have to check to be sure I'm not confusing these two similar terms, and I didn't stop to check this time. Thanks for the correction.
Thanks, this was much more interesting to read, and educational for someone with a software background, which I think kind of goes to show that discussing analogs is actually a reasonable way to approach the unknown :)
Again I agree with you, because I had a similar experience. But, again, my conclusion is different than yours.

You write that we should not talk about biochemistry as computation, as far as I understand. Instead I'd say that we have not studied enough how nature does computation without programmers or even human friendly semantics.

It is still computation, involving space and physics: too complex to simulate efficiently (for now), but not large enough for the emergent behaviour to become simple, as it does for a gas.

Ribosomes don't transcribe genes.
Genomes are absolutely digital. GATC is no different from 1 and 0. It's just using a different base (pun intended).
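Taken at the level of the sequence, each nucleotide is one of four symbols, so it carries exactly two bits and a sequence packs losslessly into binary. A minimal sketch; the particular base-to-bits mapping is an arbitrary assumption (any one-to-one mapping works), and it captures only the base sequence.

```python
# Assumed mapping: any bijection between the four bases and 2-bit codes works.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {v: k for k, v in BASE_TO_BITS.items()}


def pack(seq: str) -> int:
    """Pack a DNA string into one integer, two bits per base (first base = highest bits)."""
    value = 0
    for base in seq.upper():
        value = (value << 2) | BASE_TO_BITS[base]  # KeyError for non-ACGT symbols
    return value


def unpack(value: int, length: int) -> str:
    """Inverse of pack(); length is stored separately because leading A's are 00 bits."""
    bases = []
    for shift in range(2 * (length - 1), -1, -2):
        bases.append(BITS_TO_BASE[(value >> shift) & 0b11])
    return "".join(bases)


packed = pack("GATC")
print(format(packed, "08b"))  # 10001101
print(unpack(packed, 4))      # GATC
```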

Files on disks have end of file markers, just like the start and stop sequences in DNA. Operating systems have cron jobs (themselves digital) that control when other programs execute.
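The marker analogy can be made concrete: a program can locate candidate protein-coding regions much as a parser finds delimiters, by scanning for a start codon (ATG) and the next in-frame stop codon (TAA, TAG or TGA). A deliberately simplified open-reading-frame scan, assuming a single strand read in its three forward frames; real gene prediction is considerably more involved.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str) -> list[tuple[int, int]]:
    """Return sorted (start, end) pairs of open reading frames on one strand.

    Each ORF runs from an ATG through the first in-frame stop codon; end is
    exclusive and includes the stop. Only the three forward frames are scanned.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                    # "start marker" found in this frame
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3))  # "end marker" closes the region
                start = None
    return sorted(orfs)


print(find_orfs("CCATGGGCTAACATGTTTTGAC"))  # [(2, 11), (12, 21)]
```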

You mean "DNA sequences are digital" in that base pairs map to a sequence of enumerations.

However, genomes aren't digital. They're 3D structures with a ton of attributes that are not trivially representable digitally.

In the same way that software code is digital but the hard drives that hold it are not?
Genomes are much more than just their sequence. Their spatial organisation, their methylation, their folding, their packing etc. have no equivalents in a filesystem.
You're talking about a digital<->analog interface. Take a digitally encoded audio file, read it out and turn it into sound waves using a digital analog converter, play it out on physical speakers, record it back with a microphone, use that information to control a robotic arm with a magnet that will swipe over the physical medium... etc. They are absolutely analogous.
> software is digital; genomes are not

False by definition: Digital data is "information represented as a string of discrete symbols each of which can take only one of a finite number of values"

https://en.wikipedia.org/wiki/Digital_data

I agree with you here but I get to a happy conclusion. The (self- or culturally imposed) constraint on computation to be semantically meaningful for humans does not apply for genomes. But this is already useful, because it means we at least have a hint about where to dig more in programming.

There is Theory of Computation and there is Theory of Programming. Your arguments apply to TOP but not to TOC.

https://pron.github.io/posts/what-we-talk-about-when-we-talk...

This all seems like minor differences.

Plenty of software is neither written nor comprehensible, I can assure you of that.

Like, I don't think you're necessarily wrong, but pointing out the literal differences between the two topics doesn't explain to me why the analogy is wrong, and therefore doesn't support your argument.

It's like saying "I'm nothing like my mother; I don't even have long hair"

I think the environment is the confounding factor rather than programmer working life-span.

An OS is just so much simpler than dynamically constrained energetic replicators in an always and everywhere collapsing wave function.