Hacker News
by maxrmk 84 days ago
I don't think this is a correct explanation of how things work these days. RL has really changed things.

Models based on RL are still just remixers as defined above, but their distribution can cover things that are unknown to humans, because those things are present in the synthetic training data but absent from the corpus of human knowledge. AlphaGo's move 37 is an example. It appears creative and new to outside observers, and it is creative and new, but not because the model is figuring out something new on the spot. It is because similar new things appeared in the synthetic training data used to train the model, and the model is summoning those patterns at inference time.

> the model is summoning those patterns at inference time.

You can make that claim about anything: "The human isn't being creative when they write a novel, they're just summoning patterns at typing time".

AlphaGo taught itself that move, then recalled it later. That's the bar for human creativity and you're holding AlphaGo to a higher standard without realizing it.

I can't really make that claim about human cognition, because I don't have enough understanding of how human cognition works. But even if I could, why is that relevant? It's still helpful, from both a pedagogical and scientific perspective, to specify precisely why there is seeming novelty in AI outputs. If we understand why, then we can maximize the amount of novelty that AI can produce.

AlphaGo didn't teach itself that move. The verifier taught AlphaGo that move. AlphaGo then recalled the same features during inference when faced with similar inputs.

> AlphaGo didn't teach itself that move. The verifier taught AlphaGo that move.

No. AlphaGo developed a heuristic by playing itself repeatedly, and that heuristic then recognized the quality of that move in the moment.

Heuristics are the core of intelligence when it comes to discovering novelty, but they are accessible to LLMs in principle.

> The verifier taught AlphaGo that move

Ok so it sounds like you want to give the rules of Go credit for that move, lol.

It feels like you're purposefully ignoring the logical points OP gives, and you just really, really want to anthropomorphize AlphaGo and make us appreciate how smart it (should I say he/she?) is ... while no one is even criticising the model's capabilities, only analyzing them.

Can you back that up with some logic for me?

I don't really play Go but I play chess, and it seems to me that most of what humans consider creativity in GM-level play comes not in prep (studying opening lines/training) but in novel lines in real games (at inference time?). But that creativity absolutely comes from recalling patterns, which is exactly what OP criticizes as not creative(?!)

I guess I'm just having trouble finding a way to move the goalpost away from artificial creativity that doesn't also move it away from human creativity?

No. AlphaGo does search, and does so imperfectly. It does come up with creative new patterns not seen before.

How do you know that? We don't have access to the logs to know anything about its training, and it's impossible for it to have trained on every potential position in Go.