| HN Mirror


by Drakim 1225 days ago

I recall reading that when training AlphaZero they would start by pitting it against itself, playing millions of games in a few days. That worked great because there is an external metric (who wins the chess game) that is an objectively good measure to train towards.

But if you let an AI's approval be the metric, things turn a lot more fuzzy and subjective. The goal is no longer "to write a good answer without error" but "to write an answer that is approved by the AI". Those are very different goals, and the longer you keep training this way, the bigger the divergence gets, until eventually the AI is just answering complete garbage nonsense that precisely hits certain sweet spots in the grading AI.
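A toy sketch of that divergence, with everything made up for illustration: an "answer" has 10 slots, each holding either a correct statement or flattering filler, and the hypothetical `grader_score` happens to like filler more than correctness. Hill-climbing on the grader's approval then drives the true quality to zero.

```python
SLOTS = 10  # assumed fixed answer length for this toy example


def true_quality(filler):
    """What the human actually wants: the number of correct statements."""
    return SLOTS - filler


def grader_score(filler):
    """Hypothetical flawed grader: 1 point per correct statement, 2 per filler."""
    return (SLOTS - filler) + 2 * filler


def hill_climb(steps):
    """Greedily move to whichever neighbouring answer the grader prefers."""
    filler = 0
    history = []
    for _ in range(steps):
        candidates = [f for f in (filler - 1, filler, filler + 1) if 0 <= f <= SLOTS]
        filler = max(candidates, key=grader_score)
        history.append((filler, grader_score(filler), true_quality(filler)))
    return history


for filler, approval, quality in hill_climb(12):
    print(f"filler={filler:2d}  grader approval={approval:2d}  true quality={quality:2d}")
```

Approval climbs on every step while the true quality falls, which is the gap between "approved by the AI" and "a good answer" in miniature.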

This divergence of the target vs the actual human goal is a pretty interesting problem in AI safety research. I love the example where an AI trained to stay alive as long as possible in Tetris realized that pausing the game was the best strategy.

3 comments

aqme28 1225 days ago

You’re describing a GAN basically.

But yeah, you’re going to need an objective metric or human input, otherwise the system is going to diverge in strange ways.


newswasboring 1225 days ago

I honestly think I might do this experiment, just to see what comes out. I know it will be utter garbage, but it will probably be interesting utter garbage.


callesgg 1225 days ago

Please do :)

The correction prompt is very important: it will definitely determine the outcome of the process. A bad correction prompt will obviously lead to a garbage result.

Training in steps with different prompts might be of value. The first step might be to fix contradictions, then factual errors if those are an issue. This is an idea that I got when viewing the output of LLaMA, which often contains contradictions (e.g. an example I have seen is "Peter is a boy and he is part of the Gama sorority"). Asking it to fix those types of issues should be a good first step.

But I suspect that this type of training would need to be mixed with original training data. Otherwise the restructuring in the model caused by the new training would most likely garble the rest of the model.
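Mixing old data back in is usually called rehearsal or replay. A minimal sketch of what such a batch builder could look like, assuming a hypothetical `mix_batch` helper and an arbitrary 25% share of new examples (the right ratio would have to be found by experiment):

```python
import random


def mix_batch(original, corrections, batch_size, new_fraction, rng):
    """Build one batch that blends new correction examples with original data.

    new_fraction is the share of the batch drawn from the corrections;
    the rest is sampled (with replacement) from the original training data.
    """
    n_new = int(batch_size * new_fraction)
    n_old = batch_size - n_new
    batch = rng.choices(corrections, k=n_new) + rng.choices(original, k=n_old)
    rng.shuffle(batch)  # avoid always presenting the new examples first
    return batch


# Placeholder data standing in for real training examples.
original = [f"orig-{i}" for i in range(100)]
corrections = [f"corr-{i}" for i in range(20)]

batch = mix_batch(original, corrections, batch_size=8, new_fraction=0.25,
                  rng=random.Random(0))
print(batch)
```

Each batch then carries two corrected examples and six original ones, so the model keeps seeing the distribution it was first trained on.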


Dwedit 1225 days ago

That wasn't an AI, that was a "Make the numbers go up" (lexicographic ordering) system with TAS rewinding for short-term brute-forcing.
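A rough sketch of why pausing wins under that kind of objective, not the actual system: the counter values and input names below are invented. Python tuples compare lexicographically, so picking the "best" outcome is just a `max` over the simulated results.

```python
def pick_input(outcomes):
    """Return the input whose resulting counters are lexicographically greatest."""
    return max(outcomes, key=lambda name: outcomes[name])


# Hypothetical (level, score, lines) counters after trying each input from a
# saved state and rewinding. The values are made up for illustration.
outcomes = {
    "place_piece": (0, 0, 0),  # stack overflows, game over, counters reset
    "pause": (3, 1200, 57),    # nothing changes, counters keep their values
}

print(pick_input(outcomes))
```

Once every real move makes the numbers go down, the only input that does not is the one that freezes them.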


MattPalmer1086 1225 days ago

Interesting, but the core point remains true. The algorithm optimises for something which may not entirely coincide with the creator's intentions.
