| HN Mirror



	by throwaway4aday 978 days ago
	Sounds like the process of tuning the reward space is a type of labelling and ranking problem. If I'm not mistaken, those are two things that GPT-4 is pretty good at. You wouldn't even necessarily need to pre-label every possible action, since GPT-4 could do it in real time.
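
A rough sketch of what that could look like, assuming the model is wrapped as a ranking function that gets the current state plus the candidate actions and returns them ordered best-first. Everything named here (`Ranker`, `score_actions`, `stub_ranker`) is made up for illustration; the stub stands in for a real model call and just prefers shorter action descriptions so the example runs offline.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: given a state description and candidate actions,
# return the candidate indices ordered best-first. A real version would
# prompt a language model; that call is not shown here.
Ranker = Callable[[str, Sequence[str]], List[int]]


def rewards_from_ranking(order: Sequence[int]) -> List[float]:
    """Map a best-first ordering of indices to rewards spread evenly over [0, 1]."""
    n = len(order)
    if n == 1:
        return [1.0]
    rewards = [0.0] * n
    for rank, idx in enumerate(order):
        rewards[idx] = 1.0 - rank / (n - 1)
    return rewards


def score_actions(state: str, actions: Sequence[str], ranker: Ranker) -> List[float]:
    """Rank only the actions available right now, so nothing is pre-labelled."""
    order = ranker(state, actions)
    if sorted(order) != list(range(len(actions))):
        raise ValueError("ranker must return a permutation of the candidate indices")
    return rewards_from_ranking(order)


def stub_ranker(state: str, actions: Sequence[str]) -> List[int]:
    # Placeholder for the model: shorter descriptions rank higher.
    return sorted(range(len(actions)), key=lambda i: len(actions[i]))


if __name__ == "__main__":
    candidates = ["jump", "go", "wait here"]
    print(score_actions("agent at a ledge", candidates, stub_ranker))
```

The returned list lines up with the candidates by position, so the top-ranked action gets 1.0 and the bottom-ranked one gets 0.0. Whether a linear spread like that is the right reward shape is a separate question.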