
I can appreciate that the model is likely still highly capable with a good harness. Still, I think this is more in line with ideas from, say, speedrunning (or hell, even reinforcement learning), where you want to prove something profound is possible, and to do so before others do, you need to accumulate a series of "tricks" (refining exploits/hacking rewards) to achieve the goal. But if you use too many tricks, you're no longer proving something as profound as originally claimed. In speedrunning this tends to splinter into multiple categories.

Basically, the game Gemini completed falls into an inferior category of experiment (however minuscule the difference).

I get it, though. People demanded these types of changes in the CPP Twitch chat, because the pain of watching the model fail in slow motion is simply too much.