| HN Mirror



	by ainch 4 days ago
	I'd spent 6 hours solving a gnarly RL problem (mathematically solving the divergence of off-policy TD-Lambda for any value of lambda or behaviour policy). As a punt I gave it to o3 (remember, LLMs were 'bad at maths'), and after 15 minutes it came back with the answer that had taken me hours.