by lossolo 1253 days ago
I think the problem is time. To replicate the OpenAI RLHF architecture, they need their own high-quality dataset, which takes time to create. Without details about the hyperparameters, RL architecture, and omitted steps, they need to test a lot of things, which takes time and money. This requires more resources than SD needed to replicate DALL-E 2, and that took months and was an easier task.
2 comments

There is Open Assistant by LAION (WiP): https://github.com/LAION-AI/Open-Assistant
Check out Constitutional AI from Anthropic. They automated "RLHF" by simply writing a few rules.
Thanks, I just skimmed the paper. I think the "they automated RLHF" statement is maybe too strong here: there is still a manual process, but it seems like they optimized away a lot of the manual labeling work.