by throwaway4aday
931 days ago
Sounds like the process of tuning the reward space is a type of labelling and ranking problem. If I'm not mistaken, those are two things that GPT-4 is pretty good at. You wouldn't even necessarily need to pre-label every possible action, since GPT-4 could do it in real time.