Hacker News
by ewjordan 5784 days ago
Perhaps the disconnect here is that Kurzweil, operating from an information theory perspective, is neglecting the possibility that the biological environment in which a brain grows effectively adds a ton of "data" to the system

Please realize that, even though pretty much everyone on HN is repeating this argument (the "data gets added to the system" argument), it is an extraordinary claim, and it should require correspondingly extraordinary evidence before we consider it.

I'm going to justify this in excruciating detail, because the claim has now come up so many times.

But first, let's nail down the context, because if we can't agree on that then we really shouldn't even be discussing the topic (and I suspect the whole problem here is that Myers thinks they're arguing about something other than what Kurzweil is actually claiming) - we're discussing the amount of information that we would need to construct an effective intelligent algorithm. Not one particular algorithm, but any effective intelligent algorithm.

Here goes, a pseudo-mathematical breakdown of why this "data gets added" argument is so hideously wrong:

There's an entire infinite universe of "possible intelligence algorithms" (for the moment, we won't define this too precisely, but we'll hand-wave and say that this universe consists of all algorithms that take the right inputs and provide the right outputs, whatever those are), most of which are utterly useless, and are certainly not intelligent. Let's call this universe U0.

Step one: let's cut U0 down to a finite practical size, eliminating ridiculous algorithms that we could never expect to implement. We can do this in a million ways, it doesn't really matter; for now, let's just say that we're cutting it down to algorithms that have possible physical realizations using the resources on our planet. That's still a huge number of algorithms. Call this U1.

Step two: Let's now trim U0 in a different way, picking out only the algorithms that we consider actually intelligent, however you want to define "intelligent". Name this (still infinite) set Z.

Step three: Take the intersection of Z (intelligent algorithms) with U1 (practical algorithms), call this set P. P is all the practical algorithms that qualify as "intelligent".

Now let Prob(I) = (size of P / size of U1), the probability that a randomly selected practical algorithm will be intelligent. This is an extremely small probability, but it's finite and non-zero (human intelligence suffices to prove that it's non-zero).

Step four: Now we slice up U1 in a different way, and create a set D_N: the set of all algorithms that can be specified by growing a human from a string of DNA of length N (and that ultimately run within the space constraints).

Step five: Set P(D_N) = intersection of D_N and P, all intelligent algorithms satisfying the space constraints that can be grown from a DNA string of length N.

Ok, that's a lot of sets, but it's okay, we don't need most of them. One last calculation:

Prob(D_N) = (size of P(D_N) / size of D_N), the probability that a randomly selected practical DNA-created algorithm will be intelligent.
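The real sets are far too large to enumerate (and "intelligent" isn't something we can test mechanically), but a deliberately tiny toy version may make the two ratios concrete. This is only a sketch under made-up assumptions: "algorithms" are 12-bit integers, is_intelligent() is an arbitrary hypothetical stand-in for membership in Z, and grow() is an arbitrary hypothetical stand-in for growing an organism from a DNA string.

```python
# Toy, finite model of the sets described above. All names are hypothetical:
# "algorithms" are 12-bit integers, is_intelligent() stands in for membership
# in Z, and grow() stands in for growing an organism from a DNA string.

ALGO_BITS = 12
U1 = range(2 ** ALGO_BITS)  # every "practical algorithm" in the toy universe


def is_intelligent(algo: int) -> bool:
    """Arbitrary stand-in for "this algorithm is in Z"."""
    return algo % 97 == 42


def grow(dna: int) -> int:
    """Arbitrary, deterministic stand-in for the DNA -> algorithm process."""
    return (dna * 2654435761) % (2 ** ALGO_BITS)


def fraction_intelligent(algos) -> float:
    """size of (algos intersected with P) / size of algos."""
    algos = set(algos)
    return sum(1 for a in algos if is_intelligent(a)) / len(algos)


N = 8  # toy DNA length in bits
D_N = {grow(dna) for dna in range(2 ** N)}  # algorithms reachable from length-N DNA

prob_I = fraction_intelligent(U1)    # Prob(I)   = |P| / |U1|
prob_DN = fraction_intelligent(D_N)  # Prob(D_N) = |P(D_N)| / |D_N|
print(f"Prob(I) = {prob_I:.4f}, Prob(D_N) = {prob_DN:.4f}")
```

In this toy, the "data gets added" claim would amount to prob_DN coming out far larger than prob_I; for an arbitrary grow() there's no particular reason to expect that.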

No more set-fu, I promise. We've boiled it down to two probabilities, Prob(I), and Prob(D_N). These probabilities are proxies for the information content needed to pick an intelligent algorithm out of the corresponding sets of algorithms.

The "information gets added" claim has a very simple mathematical expression:

Prob(D_N) > Prob(I) when N = the length of human DNA

i.e. a randomly selected DNA-created algorithm with DNA length N has a greater probability of qualifying as intelligent than a randomly selected algorithm in general. And not just a little bit greater - you're saying that the fact that it's implemented via DNA makes the probability much higher, corresponding to the data difference you're claiming with the statement 'far, far more than 50MB of "data" here'.
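Reading the probabilities as information content (the usual convention of -log2 of a probability, which the argument implies but doesn't spell out), the "extra data" claim translates into a required probability ratio. A minimal sketch; the helper names are hypothetical, and 1 MB is taken as 10**6 bytes to match the "50 million bytes" figure:

```python
import math


def info_bits(p: float) -> float:
    """Bits needed to single out a qualifying item when a fraction p of items qualify."""
    return -math.log2(p)


def bits_added(prob_dn: float, prob_i: float) -> float:
    """Bits the DNA construction 'adds': info_bits(Prob(I)) - info_bits(Prob(D_N))."""
    return info_bits(prob_i) - info_bits(prob_dn)


def log2_ratio_needed(extra_bytes: float) -> float:
    """log2(Prob(D_N) / Prob(I)) required for the construction to add extra_bytes of data."""
    return 8 * extra_bytes


# A construction that made intelligence 1024x more likely would add only 10 bits...
print(bits_added(prob_dn=2.0 ** -10, prob_i=2.0 ** -20))
# ...while adding 50 million more bytes needs Prob(D_N) / Prob(I) = 2 ** 400,000,000.
print(log2_ratio_needed(50e6))
```

That last number is the scale of "much higher" the argument would need to be true.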

Perhaps now you see the trouble: in order for me to consider the "data gets added" argument plausible, I need to hear an argument that suggests that a random construction based on DNA is far more likely to lead to an intelligent algorithm than a random construction in general.

Myers has not offered an argument in this direction. Neither has anyone else. Until someone does, the odds are overwhelmingly in Kurzweil's favor; statistically speaking, Myers is flat out wrong.

So I put the question to everyone: what's so special about the DNA construction process that makes it so much more likely to create intelligence than any other construction process we might conceive of?


I think there's an argument that DNA is more likely to lead to life (not intelligence) than an arbitrary coding scheme; that is, that with DNA it's easier to create life than you'd expect for a vanilla encoding scheme.

This assumes that during the billions of years on Earth before DNA, lots of different chemicals came together, and that if any self-replicating one formed, it would grow and still be around. The exceptions would be if it wasn't very good at replicating and died out, if DNA-based life attacked it or outgrew it, crowding and starving it, or if bad luck wiped it out (though it could arise again).

The fact that DNA did survive shows that DNA is indeed specially suited to encoding life. One might even try to estimate how specialized it is by estimating how improbable it is, based on how long it took a planet's worth of experiments to arrive at it.

The details of how it's special could be in terms of protein folding, e.g. that you can specify some really cool and useful folds, crucial for life, in surprisingly short DNA sequences. It's as if the search space of encoding schemes was scoured for schemes that in effect included a handy collection of library functions.

But this lovely (I think) argument doesn't apply to intelligence at all; nor even to mammals, or indeed animals - just for basic life. Once life existed, all the extra features were just hacked on.

But first, let's nail down the context, because if we can't agree on that then we really shouldn't even be discussing the topic (and I suspect the whole problem here is that Myers thinks they're arguing about something other than what Kurzweil is actually claiming) - we're discussing the amount of information that we would need to construct an effective intelligent algorithm.

If that's the question, then sure, I agree with you. In fact, I imagine something intelligent could be encoded with much less data.

My point was about the information content required for the human brain itself, and that seems to be what Kurzweil is talking about, at least:

The amount of information in the genome (after lossless compression, which is feasible because of the massive redundancy in the genome) is about 50 million bytes (down from 800 million bytes in the uncompressed genome). It is true that the information in the genome goes through a complex route to create a brain, but the information in the genome constrains the amount of information in the brain prior to the brain’s interaction with its environment.
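The uncompressed figure in that quote is easy to sanity-check, assuming roughly 3.2 billion base pairs in the haploid human genome (a round figure not given in the quote) and 2 bits per base. The 50-million-byte compressed figure is Kurzweil's and isn't verified here; the sketch only shows the compression ratio it implies:

```python
# Back-of-envelope check of the genome figures quoted above.
# Assumption (not from the quote): ~3.2 billion base pairs in the haploid genome.
BASE_PAIRS = 3.2e9
BITS_PER_BASE = 2          # four possible bases (A, C, G, T) -> log2(4) = 2 bits

uncompressed_bytes = BASE_PAIRS * BITS_PER_BASE / 8
compressed_bytes = 50e6    # Kurzweil's quoted figure after lossless compression

print(f"uncompressed: {uncompressed_bytes / 1e6:.0f} million bytes")  # prints 800
print(f"implied compression ratio: {uncompressed_bytes / compressed_bytes:.0f}x")  # prints 16x
```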

You're absolutely right - looking at that quote (and looking through the original one that Myers responded to), he's overstated by a good amount what we can conclude from DNA length (though Myers's argument doesn't disprove the upper bound by any means; it merely points out that Kurzweil's "proof" doesn't hold). The problematic phrase, which makes this argument pretty ambiguous, is "simulate the brain": neither party has really pinned down what they mean by that, so it's hard to know what would qualify. In retrospect, I think I cut Kurzweil a little too much slack when deciding what he meant, especially in light of his other writings on the topic...

It's a shame, because his argument is fully defensible if it's stated correctly and applied to the general problem of AI instead of to Kurzweil's pet theory that full brain simulation is the One True Way.

Protein folding.

The amount of computation and program expressibility involved in folding a protein is far, far, far more than you would expect from just counting the DNA that made that protein.

Yes, "fold proteins" is a member of a distinguished class of algorithms that DNA is exceptionally well suited to handle.

Similarly, "simulate shallow wave dynamics" is in the class of algorithms that a water tank is pretty near optimal for, and the "find local minima" problem is handled with remarkable ease by gravity and a rolling ball; we'd be hard pressed to write computer programs that solve any of these problems using fewer bits than we can get away with when we let the real world solve them instead.
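For contrast with the rolling ball, here is roughly what "find local minima" looks like when it has to be written out as a program: a minimal heavy-ball (momentum) gradient descent. The landscape f(x) = x**4 - 3*x**2 + x and all parameter values are hypothetical, chosen only so that there are two local minima:

```python
def f(x: float) -> float:
    """Hypothetical landscape with two local minima."""
    return x**4 - 3 * x**2 + x


def grad_f(x: float) -> float:
    """Derivative of f: the slope that gravity 'computes' for the ball for free."""
    return 4 * x**3 - 6 * x + 1


def roll_ball(x0: float, step: float = 0.01, friction: float = 0.9,
              iters: int = 10_000) -> float:
    """Heavy-ball gradient descent: velocity decays by `friction` and is pushed downhill."""
    x, v = x0, 0.0
    for _ in range(iters):
        v = friction * v - step * grad_f(x)
        x += v
    return x


x_min = roll_ball(1.5)
print(f"settled at x = {x_min:.4f}, f(x) = {f(x_min):.4f}")
```

Every line of that is information the program has to carry explicitly; the physical ball gets the equivalent from gravity and friction.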

But most algorithms are not made that easy by any of these computational substrates; in fact, vanishingly few of them are, and an algorithm is only likely to be "easy" relative to vanishingly few systems.

What would let us assume that "Do intelligence" is a member of the subset of problems that are made easy by the detailed workings of biology? Because the a priori probability that it falls into that class is just about zero...

From "counting the DNA", you would expect that the family of hundred-amino-acid-long peptides, which are encoded in strings of 300 bases, would have at most 4³⁰⁰ = 2⁶⁰⁰ possible three-dimensional conformations, or rather probability distributions over conformations, since many peptides have multiple stable conformations.

However, those 2⁶⁰⁰ sequences of bases are immediately reduced to 20¹⁰⁰ ≈ 2⁴³² possible sequences of amino acids (ignoring the start and stop codons, which are presumed to lie just before and just after the 300-base sequence in question).
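The counting here can be checked directly with Python's arbitrary-precision integers. Like the comment, this sketch treats each of the 100 codons as coding for one of 20 amino acids (stop codons ignored):

```python
import math

BASES = 300
CODON_LEN = 3
AMINO_ACIDS = 20

dna_strings = 4 ** BASES                        # distinct 300-base sequences
peptides = AMINO_ACIDS ** (BASES // CODON_LEN)  # distinct 100-residue peptides

# Folding maps each peptide to a single probability distribution over
# conformations, so there can be at most `peptides` distinct distributions.
print(dna_strings == 2 ** 600)                         # prints True
print(f"log2(peptides) = {math.log2(peptides):.1f}")   # prints 432.2
```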

Are you suggesting that these 2⁴³² different peptides somehow express many more than the 2⁶⁰⁰ different three-dimensional structures, or rather, probability distributions over them? Because that seems like a highly implausible claim, on the face of it.

Or are you going to answer, "Fucking arithmetic. How does it work?"