Hacker News
by ainch 84 days ago
The human genome contains around 1.5GB of information and DeepSeek v3 weighs in at around 800GB, so it's a bit apples-to-oranges. As you say, what's been evolved over hundreds of millions of years is the learning apparatus and architecture, but we largely learn online from there (with some built-in behaviours like reflexes). It's a testament to the robustness of our brains that the overwhelming majority of humans learn pretty effectively. I suspect LLM training runs are substantially more volatile (as well as suffering from the obvious data efficiency issues).
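For anyone who wants to check the two figures above, here is a rough back-of-envelope sketch. The constants are assumptions rather than anything stated in the thread: roughly 3.1 billion base pairs per copy of the genome, 2 bits per base, two copies per cell, and the published total of about 671 billion parameters for DeepSeek v3. The function names are purely illustrative.

```python
# Back-of-envelope sizes. All constants are approximate assumptions.
BASE_PAIRS_PER_COPY = 3.1e9   # approx. base pairs in one copy of the human genome
BITS_PER_BASE = 2             # four bases (A, C, G, T) fit in 2 bits
DEEPSEEK_V3_PARAMS = 671e9    # published total parameter count (assumed here)
GB = 1e9                      # decimal gigabyte


def genome_bytes(copies: int = 2) -> float:
    """Raw, uncompressed size of the genome sequence in bytes."""
    return BASE_PAIRS_PER_COPY * BITS_PER_BASE * copies / 8


def weights_bytes(params: float, bytes_per_param: float) -> float:
    """Size of a model's weights at a given storage precision."""
    return params * bytes_per_param


if __name__ == "__main__":
    print(f"genome, one copy:   {genome_bytes(1) / GB:.2f} GB")
    print(f"genome, two copies: {genome_bytes(2) / GB:.2f} GB")
    for label, width in (("8-bit", 1), ("16-bit", 2)):
        size = weights_bytes(DEEPSEEK_V3_PARAMS, width)
        print(f"weights at {label}: {size / GB:.0f} GB")
```

Under these assumptions the two-copy genome lands near the 1.5GB quoted above, and the weights come out somewhere between roughly 670GB and 1.3TB depending on precision. Which precision or packaging the 800GB figure refers to is not something this sketch can settle.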

If you'd like an unsolicited recommendation, 'A Brief History of Intelligence' by Max Bennett is a good, accessible book on this topic. It explicitly draws parallels between the brain's evolution and modern AI.

4 comments

The comparison is weird, as we don't think with the genome. There are roughly 100 billion neurons with ~100 trillion connections in an adult human brain. I don't know how many bytes of source code DeepSeek has, but I don't think it helps in determining the amount of reasoning it can do.

> The comparison is weird as we don't think with the genome

The genome determines how your brain learns, so yeah, we do. We don't solve short, easy tasks via learning, no, but longer tasks that involve learning involve our DNA.

> longer tasks that involve learning involve our DNA

Longer tasks that involve learning also involve caloric consumption and respiration; that doesn't mean we think with the sun and the air.

No. Not in any way, no.

Learning is a physical process in which neurons form new connections to one another. It has nothing to do with DNA.

This is like measuring LLM performance based on CPU microcode size. Complete nonsense.

Learning also happens on the species level. The species "learns" (through natural selection) which genes produce brain structures that lead to survival and reproduction.

The human genome isn't its own thing; the genome as a static sequence is really just an abstraction. What actually functions as the heritable unit includes epigenetic marks, non-coding RNA regulation, 3D chromatin structure, and mitochondrial DNA. In the real biological world there are very few sharp edges. Systems bleed into one another, and trying to define something like 'the number of bits in the human genome' is very difficult, but it's undoubtedly way bigger than you posit here.

> The human genome contains around 1.5GB of information and DeepSeek v3 weighs in at around 800GB, so it's a bit apples-to-oranges.

The apples-to-apples comparison is comparing the human genome to the code behind a particular LLM. The genome defines the structure that learns and thinks, just like the code for the LLM.

And that same information contained in an LLM is a compression of how many terabytes of training data? Maybe in the future there will be models an order of magnitude smaller and still better performing.

What I'm saying is you can't judge the data in the genome by purely counting the bytes of data.