by K0balt 1042 days ago
Yeah, it makes some sense that you could use a more intense introspection to train weaker ones… I wonder what the human analogue for that looks like.

Maybe working up a proof and then quizzing yourself on it?

As long as we get >N supervision and the difference is larger than the model's regression, it seems that could work. But there seems to be a definite limit: the N-n1 difference will only stay above the improvement delta up to a point.

1 comment

The model would learn from feedback, not just regurgitate the training set, as long as it is part of a system that can generate this feedback. AlphaGo Zero had self-play for feedback. Robots can check whether a task executed successfully. Even chatting with us generates feedback for the model.