by thanhdotr
682 days ago
Well, as Yann LeCun said: "Adversarial training, RLHF, and input-space contrastive methods have limited performance.
Why?
Because input spaces are BIG.
There are just too many ways to be wrong" [1].

One way to solve the problem is to project onto a latent space and then try to discriminate/predict the best action down there. There's much less feature correlation down in latent space than in your observation space [2].

[1]: https://x.com/ylecun/status/1803696298068971992
[2]: https://openreview.net/pdf?id=BZ5a1r-kVsf
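
To make the correlation point concrete, here is a toy sketch. It is not the method from [2]: plain PCA (computed by power iteration) stands in for a learned encoder, and the mixing matrix, noise level, and every name in it are made up for illustration. Six observed features are driven by two hidden factors, so the observed features are heavily correlated, while the two projected latent coordinates are not.

```python
import random

random.seed(0)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def covariance(rows):
    """Sample covariance matrix of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def max_abs_corr(rows):
    """Largest absolute correlation between two different features."""
    cov = covariance(rows)
    d = len(cov)
    return max(abs(cov[i][j]) / (cov[i][i] * cov[j][j]) ** 0.5
               for i in range(d) for j in range(d) if i != j)

def top_components(cov, k, iters=200):
    """Top-k eigenvectors of cov via power iteration with deflation."""
    d = len(cov)
    cov = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(d)]  # fixed, arbitrary start vector
        for _ in range(iters):
            w = [dot(row, v) for row in cov]
            norm = dot(w, w) ** 0.5
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in cov])
        comps.append(v)
        # Deflate: remove the component just found before looking for the next.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps

# Toy world (assumption): 2 hidden factors drive 6 observed features.
MIXING = [[1, 0], [1, 0.5], [1, -0.5], [0.5, 1], [-0.5, 1], [0, 1]]

observations = []
for _ in range(500):
    z = [random.gauss(0, 3), random.gauss(0, 1)]
    observations.append([dot(row, z) + random.gauss(0, 0.1) for row in MIXING])

# "Encoder" here is just the top-2 PCA directions, a stand-in for a learned one.
encoder = top_components(covariance(observations), k=2)
latents = [[dot(x, comp) for comp in encoder] for x in observations]

obs_corr = max_abs_corr(observations)
latent_corr = max_abs_corr(latents)
print(f"max |corr| in observation space: {obs_corr:.3f}")
print(f"max |corr| in latent space:      {latent_corr:.3g}")
```

A learned nonlinear encoder won't decorrelate things as cleanly as PCA does by construction, but the shape of the argument is the same: a predictor working on 2 roughly independent coordinates has far fewer ways to be wrong than one working on 6 entangled ones.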