by sudosysgen
946 days ago
Absolutely, maximizing conditional probabilities is easily modeled as a Markov decision process, which is why you can use RL to train Transformers so well (hence RLHF; I've also been experimenting with RL-based training for Transformers in other applications, and it's promising!). Using a Transformer as the policy in RL, choosing tokens to maximize the overall sequence likelihood when all you have is an estimate of the immediate conditional likelihood, is something I imagine many people have experimented with, but I can see it being tricky enough that OpenAI were the only ones to pull it off.
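To make the MDP framing concrete, here is a minimal sketch under loud assumptions: the two-token vocabulary, the `COND` table and all of its probabilities are made up for illustration, and this is not how RLHF or any real training setup is implemented. The state is the token prefix, the action is the next token, and the reward is the log of the conditional probability, so the return of an episode is the log-likelihood of the whole sequence. It also shows why greedily maximizing the immediate conditional is not the same as maximizing the overall likelihood.

```python
import math
from itertools import product

VOCAB = ("a", "b")   # hypothetical two-token vocabulary
HORIZON = 2          # fixed episode length, chosen only to keep the toy small

# Hypothetical conditional model p(token | prefix); the numbers are invented.
COND = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"a": 0.55, "b": 0.45},
    ("b",): {"a": 0.9, "b": 0.1},
}

def reward(state, action):
    # Immediate reward: log conditional probability of the chosen token.
    return math.log(COND[state][action])

def step(state, action):
    # Deterministic transition: append the chosen token to the prefix.
    return state + (action,)

def rollout(policy):
    # Run one episode and return (final sequence, summed log-probability).
    state, total = (), 0.0
    for _ in range(HORIZON):
        action = policy(state)
        total += reward(state, action)
        state = step(state, action)
    return state, total

def greedy_policy(state):
    # Picks the token with the highest immediate conditional probability.
    return max(VOCAB, key=lambda tok: COND[state][tok])

def sequence_return(seq):
    # Return of a full sequence: sum of per-step rewards along its prefixes.
    return sum(reward(seq[:i], seq[i]) for i in range(len(seq)))

def best_sequence():
    # Exhaustive search over all sequences; only feasible for a toy this small.
    return max(product(VOCAB, repeat=HORIZON), key=sequence_return)

greedy_seq, greedy_return = rollout(greedy_policy)
optimal_seq = best_sequence()
print("greedy :", greedy_seq, round(math.exp(greedy_return), 2))
print("optimal:", optimal_seq, round(math.exp(sequence_return(optimal_seq)), 2))
```

With these invented numbers the greedy rollout commits to "a" first and ends up with a less likely sequence than the one exhaustive search finds. A learned policy is one way to close that gap when exhaustive search is out of reach, which is the tricky part.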