by categoricalrift 101 days ago
How about the very last "Kept Improvement" in the plot? It's titled "random seed 42 -> 137". I do think this project is quite conceptually interesting, but the model literally choosing a different random seed to achieve lower loss feels pretty far removed from the flowery sci-fi writing at the top of the readme.
3 comments

So the interesting part about this one is that when I had the model write up the results for that session:

https://github.com/karpathy/autoresearch/discussions/32

Look at its comment about this "improvement":

""" Surprising non-results:

- Changing random seed from 42→137 improved by 0.0004. Seed 7 was worse. Make of that what you will. """

So the model knows! After the fact, it recognizes that this was a weird thing to do. I think it's silly that the model even tried this and ran it, but some part of it also knows that it was wrong. That suggests this is fixable in prompt.md.

It shows that both Karpathy and the LLM have good taste in random seeds: the answer to life, the universe and everything, and ~1/(the fine structure constant)

The 42 -> 137 also jumped out at me. On the face of it, the associated improvement sure does sound like overfitting to the eval set.
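
For anyone who wants to sanity-check a claim like this, here is a rough sketch of how one might compare a 0.0004 gain against seed-to-seed noise. Everything in it is an assumption for illustration: `simulated_val_loss` is a hypothetical stand-in for a real training run, and the noise level of 0.0005 is made up, not measured from the autoresearch repo.

```python
import random
import statistics


def simulated_val_loss(seed, true_loss=1.0, noise_std=0.0005):
    """Hypothetical stand-in for one training run.

    The model and data are identical across calls; only the seed changes,
    so any difference in the returned loss is pure noise by construction.
    """
    rng = random.Random(seed)
    return true_loss + rng.gauss(0.0, noise_std)


def seed_noise_floor(seeds):
    """Return (mean, sample std, best) of the loss across the given seeds."""
    losses = [simulated_val_loss(s) for s in seeds]
    return statistics.mean(losses), statistics.stdev(losses), min(losses)


mean_loss, std_loss, best_loss = seed_noise_floor(range(20))

# The gain quoted in the write-up for the 42 -> 137 change.
claimed_gain = 0.0004

# Crude rule of thumb (an assumption, not a formal test): treat a gain
# smaller than two standard deviations of seed noise as indistinguishable
# from noise.
is_within_noise = claimed_gain < 2 * std_loss

print(f"mean={mean_loss:.5f} std={std_loss:.5f} best={best_loss:.5f}")
print(f"best seed beats the mean by {mean_loss - best_loss:.5f}")
print(f"claimed gain within seed noise: {is_within_noise}")
```

The point of the sketch is only that picking the best of several seeds always produces an apparent "improvement" over the average, even when nothing about the model changed. To say anything about the real project you would have to replace the simulated function with actual runs and measure the spread.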