by simonw
312 days ago
If the models are memorizing and regurgitating from their training data, how come every model I've tried this with produces entirely different code? Presumably this is because "the network only needs to interpolate between them". That's what I want it to do! I tried the space invaders thing on a 4GB Qwen model today and it managed to produce a grid of aliens that advanced one step... and then dropped off the page entirely.
|
When you temperature-sample the same model twice you also get "different" code, so diversity alone is not evidence of new reasoning. What matters is functional novelty under controlled transformations (renamed variables, a resized canvas, obfuscated asset file names, etc.). On such metamorphic rewrites, models that appear brilliant on canonical prompts suddenly collapse, which is a hallmark of shallow pattern matching.
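To make "controlled transformation" concrete, here is a minimal sketch of one such rewrite: consistent identifier renaming that leaves behaviour unchanged. The `step_aliens` function, the `MAPPING` table, and the helper names are all hypothetical and only illustrate the idea; a real evaluation would hand both the canonical and the rewritten task to the model and compare pass rates, which this snippet does not do.

```python
import ast

class Renamer(ast.NodeTransformer):
    """Rename variables, arguments and function names according to a mapping."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_FunctionDef(self, node):
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)  # also rename inside arguments and body
        return node

def metamorphic_rename(source, mapping):
    """Return source code with identifiers renamed; semantics are unchanged."""
    tree = Renamer(mapping).visit(ast.parse(source))
    return ast.unparse(tree)  # requires Python 3.9+

# Hypothetical "canonical" snippet: aliens reverse direction at the edge.
ORIGINAL = '''
def step_aliens(xs, dx, width):
    if any(x + dx < 0 or x + dx >= width for x in xs):
        return [x for x in xs], -dx
    return [x + dx for x in xs], dx
'''

# Hypothetical renaming that strips the familiar surface vocabulary.
MAPPING = {"step_aliens": "f0", "xs": "a", "dx": "b", "width": "c", "x": "d"}

def run(source, func_name, *args):
    """Execute source in a fresh namespace and call the named function."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name](*args)

variant = metamorphic_rename(ORIGINAL, MAPPING)
same_behaviour = (
    run(ORIGINAL, "step_aliens", [0, 5, 9], 1, 10)
    == run(variant, "f0", [0, 5, 9], 1, 10)
)
```

The two versions are the same task by construction, so any gap in a model's success rate between them would have to come from surface form rather than from the semantics.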
The paper I mentioned in my previous comment shows SOTA coding LLMs scoring 70%+ on SWE-bench Verified yet dropping 10–47% when the very same issues are paraphrased or drawn from unseen repos, even though the task semantics are identical. That is classic memorisation, just fuzzier than a CRC match.
As to Qwen, even at 4 bits per weight, a 4B model retains ≈ 2.1 GB of entropy, which is enough to memorise tens of thousands of full game loops. The reason it garbled the alien movement logic is probably that its limited capacity forced lossy compression, so the behaviour you saw is typical of partially recalled code patterns whose edge cases were truncated during training. That's still interpolation over memorised fragments, just with fewer fragments to blend. And this is something that has actually been shown (https://arxiv.org/abs/2406.15720v1): controlled fact memorisation studies and extraction attacks up through 70B params show a monotone curve, so basically each extra order of magnitude adds noticeably more verbatim or near-verbatim recall. So a 20B model succeeds where a 4B one fails because the former crossed the "capacity per training token" threshold for that exemplar. Nothing magical there.
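As a rough check on the size figure, the raw storage of a quantized model is just parameters times bits per weight. The sketch below assumes round parameter counts (exactly 4e9 and 20e9) and ignores quantization metadata such as per-block scale factors, so it gives 2.0 GB for the 4B case, slightly under the ≈ 2.1 GB quoted; the gap presumably comes from exactly those ignored details.

```python
def quantized_size_gb(n_params, bits_per_weight):
    """Raw weight storage in decimal gigabytes, ignoring quantization metadata."""
    return n_params * bits_per_weight / 8 / 1e9

# Assumed round parameter counts, both at 4 bits per weight.
size_4b = quantized_size_gb(4e9, 4)
size_20b = quantized_size_gb(20e9, 4)
```

This is only an upper bound on stored bits; how much of that capacity ends up holding recallable training text is an empirical question the arithmetic cannot answer.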
Don't get me wrong, I'm not arguing against interpolation per se; generalising between held-out exemplars is precisely what we want. The problem is that most public "just write space invaders" demos never verify that the endpoints were truly unseen. Until they do, a perfect clone is compatible with nothing deeper than glorified fuzzy lookup.