Hacker News
by blueorange8 1185 days ago
I don't believe they spent 7 months just "making GPT-4 safer". I think they spent a very long time doing reinforcement learning from human feedback to make it better, and now hope to speed that process up next time by using GPT-4 itself as the source of feedback.
1 comment

AFAIK the paper mentioned that RLHF mostly decreased its capabilities. It seems more likely to me that simply training for longer was one of the main reasons for the increased capabilities.