TL;DR: Dwarf Fortress fan learns about DNA from popular science articles, forms comprehensively superficial theory of life, the universe and everything, based on analogy to computer code and his estimation that nothing could be more complex than Dwarf Fortress.
I've never really thought a quote from an Adam Sandler movie was one to be repeated, but "Mr. Madison, what you’ve just said is one of the most insanely idiotic things I have ever heard. At no point in your rambling, incoherent response were you even close to anything that could be considered a rational thought. Everyone in this room is now dumber for having listened to it. I award you no points, and may God have mercy on your soul." Bonus: it does double duty summarizing the US presidential debates.
That's a bit harsh. The analogy made is not an uncommon one, but that doesn't mean there isn't some merit to the idea. I appreciate that people are at least trying to learn new things and form their own ideas.
While I agree it's a bit harsh, some of these extrapolations had me saying, "Oh really?" to myself.
> Biology is one of the last fields of science adopting the tech revolution by switching from analog to digital analysis. DNA has only recently been discovered. DNA sequencing is in its earliest stages.
> There is not enough space to fit “software” hereditary behavioral definitions into DNA. If Dwarf Fortress comes close to encoding basic behavioral patterns requiring 10 megabytes of data, we must look for a chunk of genetic data of that size.
> Good programmers try to restrict their functions to no more than 6 parameters when writing code. Thus, a driving system with learning ability depends on over 40,000 programmatic functions. Those functions operate with only a dozen sensors, wheels, and brakes. What if you had to write self-learning software for controlling all the muscle groups in an ant organism? Clearly it would require much more data space than 790mb for a human or 117mb for an ant.
> The “software” that runs the ant must include basic instincts, sensory recognition patterns, social interactions, spatial awareness, navigation routines, some learning ability, and threat estimation in its ancestral memory in addition to all the hardware schematics. Each muscle group must work in tandem with the senses. How much data would that amount of code require? An easy way to estimate it is to simulate those behaviors on your computer. Having some familiarity with multiple programming languages, I would guess that 117mb is totally insufficient for all that.
> If complex data compression is shown to play a major role in the life of an organism, my argument could be falsified. At the same time, the field of biology would be revolutionized.
> One might object that a negative proposition of the form “x does not explain y” is empirically indefensible. The form of my argument, however, follows another pattern: “there is not enough observable x in y.” Such statements are empirically demonstrable and empirically falsifiable. For example, the statement “there are fewer than 10 goats in this wood” is empirical. Just search the wood to confirm it. By analogy, we should expect at least as many discoveries of genes controlling behaviors as we have for protein generation and regulatory genes. But we do not.
The analogy is horribly flawed, especially the bio compiler metaphor. In silico automatons are primitive in complexity when compared to processes in simple organisms.
Disclaimer: Have seen both the IT/CS and biotech side of things.
I agree that my post was a bit snarky—what can I say, I hadn't had my coffee yet. If the author is a high school kid, he has my apologies for the snark, and encouragement to dig much deeper into the subject matter as well as what is logically and rhetorically sound, and take another pass at his theory.
That said, there's an epidemic of know-it-all-ness in this industry, and this article doesn't pass even the most cursory of reviews of Real Information™. I agree that the analogy is commonplace, but the reason it's an analogy and not a model is that it breaks down rapidly under scrutiny.
I mean, I thought it was an interesting read, although I immediately recognized it was not a scholarly publication in any way (even with his "formal theory")
Kind of strange that there's no mention of actual behavioral genomics research here. Aside from the quip that:
> Scientists have found some genetic code that contributes to hereditary behaviors, but the bulk of it is unlikely to ever be found. Why? Because it plainly does not fit.
This is plainly not true. We've found lots of evidence for the genomic basis of hereditary behaviors. See Genome-wide Complex Trait Analysis for example - plenty of statistical research is out there.
I did look up recent findings and they were meager. I followed your advice and looked up "Complex Trait Analysis." It is also quite meager. Were you able to find something close to 10mb? If you did, link please. I would expect that category of genes to get a technical term alongside "regulatory" and "encoding" genes the moment enough of them surfaced.
Right! It's obvious he's lacking some (relatively) recent information. And he vastly overestimates our ability to do things like simulate biochemical interactions.
Is he arguing that Dwarf Fortress has codes for souls? Because I feel like his argument and example say exactly the opposite.
What's vastly different between software instructions in Dwarf Fortress vs DNA is what actually computes the code. There is a negligible amount of complexity in man-made computers vs what goes on in a cell. There are layers upon layers of complexity that go from genotype to phenotype.
I am a bioinformatician with a genetics background. I've always had this abstract overview of DNA as a highly compressed dataset and geneticists are essentially trying to figure out the encoding.
Hey, I am glad that you are an expert and your criticism is of this nature! With my limited knowledge, I see exactly what you mean and I agree with you. There is another difference, however, between the two computes that I think you have omitted in your comment. The cell's inner complexity shares the same information space as the organism's "computes," unlike a computer CPU (and the operating system) which is defined and built by a file entirely outside of the 10mb limit I've set. The cell must fit all of it right next to each other.
> simulating complex behavior requires at least 10mb of data
The author has not shown that that much is required. At most, the author has shown that it has been done with that much. Seeing as having only thousands of states is apparently enough to make a Turing machine's behavior independent of ZFC (http://www.scottaaronson.com/blog/?p=2725), I really doubt this "10 megabyte" lower bound.
OP's mind would probably be absolutely blown by the demo scene if he thinks 10MB of data is impressive. Also, the NN example is interesting because no one in deep learning thinks that you actually need all those parameters: most NNs can be shrunk by 90%+ with some simple tweaks like quantizing neural weights and pruning them (speculation is that you need to train large parameterized NNs to make the path through high-dimensional space to the optimal program smooth enough for gradient descent to travel).
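For anyone who hasn't seen what "quantizing and pruning" means mechanically, here is a minimal sketch in plain Python. It only shows the mechanics on random stand-in weights; it says nothing about how much accuracy a real network keeps. The function names, the 10% keep fraction, and the 8-bit width are illustrative assumptions, not anything from the NVIDIA system or a particular library.

```python
import random

def prune_by_magnitude(weights, keep_fraction):
    """Zero out all but the largest-magnitude weights (hypothetical helper)."""
    k = max(1, int(len(weights) * keep_fraction))
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def quantize_8bit(weights):
    """Map each weight to an integer in [-127, 127] using one shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]

# Stand-in "network": 10,000 random weights, not a trained model.
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

pruned = prune_by_magnitude(weights, keep_fraction=0.10)
codes, scale = quantize_8bit(pruned)
restored = dequantize(codes, scale)
nonzero = sum(1 for c in codes if c != 0)
```

After this, only about a tenth of the entries are nonzero and each survivor fits in one byte instead of a 32-bit float, which is the kind of shrinkage the comment is pointing at.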
> Even when behavior is mostly learned rather than hard-coded, the ability to learn itself must be hard-coded. Such code occupies hundreds of megabytes of data. Consider the first generation of software for self-driving cars. NVIDIA’s deep-learning drivers for self driving cars utilize “27 million connections and 250 thousand parameters.” Good programmers try to restrict their functions to no more than 6 parameters when writing code. Thus, a driving system with learning ability depends on over 40,000 programmatic functions.
Obviously the author has completely misunderstood what is meant by "parameters" in this context. The parameters here are those of a neural network and thus they are a result of, not a precondition for, the learning process. Instead, he should strictly use the implementation of the neural network for comparison, whose uncompressed human-readable size is probably going to be almost small enough to fit into his size constraints.
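To make the distinction concrete: the article's "over 40,000 functions" comes from dividing 250,000 by 6 (about 41,667), which treats learned weights as if they were function arguments someone wrote by hand. A rough sketch of why that doesn't follow, using a toy linear learner (normalized least-mean-squares, chosen here only for brevity; it is not what NVIDIA uses, and `train_linear` is a made-up name):

```python
import random

def train_linear(n_params, steps=3000, seed=0):
    """Learn n_params weights from examples; the code does not grow with n_params."""
    rng = random.Random(seed)
    # Hidden "true" weights the learner has to discover from data.
    target = [rng.uniform(-1, 1) for _ in range(n_params)]
    weights = [0.0] * n_params
    for _ in range(steps):
        x = [rng.uniform(-1, 1) for _ in range(n_params)]
        predicted = sum(w * xi for w, xi in zip(weights, x))
        expected = sum(t * xi for t, xi in zip(target, x))
        norm = sum(xi * xi for xi in x) + 1e-12
        step = (predicted - expected) / norm
        weights = [w - step * xi for w, xi in zip(weights, x)]
    return weights, target

small_weights, small_target = train_linear(6)
large_weights, large_target = train_linear(100)
```

The same dozen lines produce 6 parameters or 100; the count is a number passed in, and the values come out of the training loop. The thing to compare against a genome budget is the size of the learner, not the size of what it learned.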
And that is assuming there is not a sufficiently good algorithm for learning animal behaviors that is much simpler than CNNs.
Legit criticism. I actually tried to get a clarification on what those words mean exactly. I read most of the CNN doc and went ahead and wrote that, because I felt that it was the best meaning in context. I could be wrong about that. However, the CNN image itself must be over 1GB. So if those parameters are simply a matrix fed to the learning algorithm, the algo itself still ends up huge.
> And that is assuming there is not a sufficiently good algorithm for learning animal behaviors that is much simpler
Right, but assuming something is not is the default unless there is evidence that something is.
I think the author vastly overestimates the amount of information required to encode complex behaviours.
In general, the minimum number of bits/base-pairs required to encode some behaviour (i.e. the length of the shortest program which exhibits that behaviour) is uncomputable, see https://en.wikipedia.org/wiki/Kolmogorov_complexity
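What you can compute is an upper bound: any compressor that reproduces the data exactly gives you a description at least as long as the shortest one. A small sketch with zlib (the helper name is made up for illustration):

```python
import zlib

def description_length_upper_bound(data: bytes) -> int:
    """Bytes needed to describe `data` via zlib, or verbatim if that is shorter.

    This is only an upper bound on the shortest description; the true
    minimum (the Kolmogorov complexity) may be far smaller and cannot be
    computed in general.
    """
    return min(len(data), len(zlib.compress(data, 9)))

regular = b"ab" * 5000  # 10,000 bytes with an obviously short description
bound = description_length_upper_bound(regular)
```

So a measured file size, whether a 10MB binary or a genome, only tells you that the behaviour can be encoded in that much, never that it needs that much.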
From an empirical point of view, evolution can't plan ahead and doesn't care about modularity (or at least, such things are higher-level effects which aren't directly selected for). It progresses via massively parallel Monte Carlo search, rather than "improving" a single individual. This is very different to most software development practices, and would tend to result in much more compact (genetic or computer) code.
The results of our software development practices are incredibly bloated compared to products of evolution. 10MB for an artifact like Dwarf Fortress is huge; we would likely make very significant gains if we compressed it using a superoptimiser (as long as we ignore petty issues like the heat-death of the Universe!).
The advantage our practices have over evolution by natural selection is that they're much faster. Our software is intelligently designed by programmers, who can think ahead about what features or changes might be useful and make specific, targeted changes to the code to bring them about. The redundancy, modularity, structure, separation-of-concerns, etc. which we strive for in our software facilitates this process.
Wikipedia says Dwarf Fortress has been around since 2002. Even if evolution were 'guided' towards such a thing, we would never expect it to come up with such a program in mere decades (although it can certainly make measurable optimisations, and small-but-important changes such as drug resistance).
I don't for a moment believe that Dwarf Fortress contains a true 10MB of irreducibly complex behavioural code. Rather this is simply an indication that our computer languages and compression systems are much less efficient than what nature uses for DNA.
> I don't for a moment believe that Dwarf Fortress contains a true 10MB of irreducibly complex behavioural code. Rather this is simply an indication that our computer languages and compression systems are much less efficient than what nature uses for DNA.
You can test an information stream for entropy and you can also determine whether there is any entropy left in a compressed information stream.
So while I think we should certainly respect the accumulated wisdom and refinement that trillions of cell divisions have stored up over the eons, it's not obvious that "nature" has better compression than we do.
> You can test an information stream for entropy and you can also determine whether there is any entropy left in a compressed information stream.
I don't think that measures the thing that needs to be measured here. Kolmogorov complexity is famously uncomputable; the output of a pseudo-random number generator will be high-entropy but "really" only contains a tiny amount of information (the initial seed).
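A quick way to see the gap between measured entropy and actual information content, sketched with Python's standard library (the seed, the stream length, and the helper names are arbitrary choices for illustration):

```python
import math
import random
import zlib
from collections import Counter

def byte_entropy_bits(data: bytes) -> float:
    """Empirical Shannon entropy of the byte histogram, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def prng_stream(seed: int, n: int) -> bytes:
    """n bytes from Python's Mersenne Twister, fully determined by `seed`."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))

stream = prng_stream(seed=42, n=100_000)
entropy = byte_entropy_bits(stream)              # close to the 8-bit maximum
compressed_size = len(zlib.compress(stream, 9))  # zlib finds nothing to squeeze
```

The stream looks maximally dense to both the entropy estimate and the compressor, yet the whole thing can be regenerated from one small integer plus the generator's code. An entropy test on a genome or a binary has the same blind spot.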
And more to the point, DNA certainly contains more than 10MB of entropy; the claim is that most of this DNA is junk. Fair enough, but I expect the 10MB of DF code is also mostly junk: irrelevant details that the programmers were unnecessarily required to specify or the compiler picked out, rather than actually expressing the details of behaviour.
I don't think that is very surprising - after all, nature literally had billions of years optimizing the ___ out of DNA and how it encodes (among other things) behavior.
> Biology is one of the last fields of science adopting the tech revolution by switching from analog to digital analysis.
Barricelli was running experiments about digital organisms back in the early 50's right before the identification of the structure of DNA. Quite literally sharing time with atomic/hydrogen bomb calculations. I find it a bit tough to press onwards when sweeping statements like the above are made as a build up to the argument.
I don't believe it's a fair comparison... modern CPUs can run, by definition, a pretty basic range of operations, and every intent has to be encoded in an extremely complex sequence of commands. Given a certain CPU architecture and a certain algorithm to implement, someone might be able to properly infer the expected size of a compiled binary. But, by pushing the concept to the absurd, one day we might build a super-sophisticated CPU that runs code at such a high level of abstraction that the Dwarf Fortress game is encoded in a single binary digit, whose sole meaning is "Play Dwarf Fortress". Between the 10MB binary running in modern CPUs and the single 1-bit binary running in our hypothetical "super-CPU" stand all the possible devices that have existed and that will ever be invented... How can you place a living cell in this scale? Unless you can do it, any discussion about binary size is simply pointless...
The "super-CPU" that you have described has to come from the same DNA. So if you take bits away from behavior to make "super-CPU," the definition of the "super-CPU" has consumed the data that used to encode behavior, leaving us with the same problem.
If I were going to engage in the same sort of wide-eyed speculation the author of the piece is happy to deploy then I would say he's looking at the wrong level. The DNA contains instructions for growing a brain; the structure of the brain encodes for the hereditary behaviors. Of course, I have no idea what I'm talking about.
Dwarf Fortress defies a silly strawman of materialism, more like. I think it's safe to say that most materialists wouldn't claim that all human behavior comes from our genes.