by throwaway4aday 931 days ago
Sounds like tuning the reward space is a kind of labelling and ranking problem. If I'm not mistaken, those are two things GPT-4 is pretty good at. You wouldn't necessarily even need to pre-label every possible action, since GPT-4 could label them in real time.
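
To make that concrete, here's a rough sketch of turning a ranking into rewards on the fly. Everything in it is an assumption for illustration: `Ranker`, `rewards_from_ranking`, and `toy_ranker` are made-up names, and the toy ranker just stands in for whatever call you'd make to a model like GPT-4. No real API is shown.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a ranker takes a state description plus candidate
# actions and returns those actions ordered best-first. In practice this
# would wrap a call to a language model; here it is only a type alias.
Ranker = Callable[[str, Sequence[str]], List[str]]


def rewards_from_ranking(
    state: str, actions: Sequence[str], ranker: Ranker
) -> Dict[str, float]:
    """Map a best-first ranking to evenly spaced rewards in [0, 1]."""
    ranked = ranker(state, actions)
    # Guard against a ranker that drops or invents actions.
    if sorted(ranked) != sorted(actions):
        raise ValueError("ranker must return a permutation of the actions")
    n = len(ranked)
    if n == 1:
        return {ranked[0]: 1.0}
    # Best action gets 1.0, worst gets 0.0, the rest are linearly spaced.
    return {action: 1.0 - i / (n - 1) for i, action in enumerate(ranked)}


def toy_ranker(state: str, actions: Sequence[str]) -> List[str]:
    # Stand-in for the model (assumption): prefers shorter action strings.
    return sorted(actions, key=len)


rewards = rewards_from_ranking(
    "agent at door", ["jump", "go", "wait here"], toy_ranker
)
```

The point is that only the actions actually available in the current state get ranked, so nothing has to be labelled ahead of time. Whether a model's rankings are consistent enough to train on is a separate question this doesn't answer.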