by santadays
239 days ago
It seems like there is a bunch of research and working implementations that allow efficient fine-tuning of models. Additionally, there are ways to tune a model toward outcomes rather than training examples. Right now the state of the world with LLMs is that they try to predict a script in which they are a happy assistant, as guided by their alignment phase. I'm not sure what happens when they start getting trained in simulations to be goal-oriented, i.e. their token generation is based not on what they think should come next but on what should come next in order to accomplish a goal. I'm not sure how far away that is, but it is worrying.
It's been some time since LLMs were purely stochastic average-token predictors; their later RL fine-tuning stages make them quite goal-directed, and this is what has produced some big leaps in verifiable domains like math and programming. It doesn't work that well in non-verifiable domains, though, since verifiability is what gives us the reward function.
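To make the "verifiability gives us the reward function" point concrete, here is a rough sketch of what such a reward could look like for math and programming. The names `math_reward` and `code_reward` are made up for illustration and don't come from any particular RL library; real pipelines would sandbox the code execution and parse answers far more carefully.

```python
from fractions import Fraction


def math_reward(model_answer: str, reference: str) -> float:
    """Hypothetical checker: 1.0 if both strings parse to the same number, else 0.0."""
    try:
        same = Fraction(model_answer.strip()) == Fraction(reference.strip())
    except (ValueError, ZeroDivisionError):
        # Unparseable or malformed answers simply earn no reward.
        return 0.0
    return 1.0 if same else 0.0


def code_reward(candidate_source: str, func_name: str, cases) -> float:
    """Hypothetical checker: fraction of (args, expected) test cases that pass.

    Assumption: candidate_source is trusted enough to exec here. A real
    setup would run it in an isolated sandbox with time limits.
    """
    if not cases:
        return 0.0
    namespace = {}
    try:
        exec(candidate_source, namespace)
        func = namespace[func_name]
    except Exception:
        # Code that does not even define the function earns nothing.
        return 0.0
    passed = 0
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            # A crashing test case counts as a failure.
            pass
    return passed / len(cases)
```

Both functions return a number without any human judgment, which is what makes them usable as a training signal. For something like "was this essay persuasive?" there is no equivalent checker to write, which is the gap described above.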