	by yorwba 14 days ago
	There's no training code because the author is using an external service for that: https://docs.primeintellect.ai/hosted-training/getting-start... The reward function is at https://github.com/HarleyCoops/Math-To-Manim/blob/d1c412d22a... The environment is iterative LLM prompting. The idea is apparently that a model that is bad at fixing its own mistakes might get better at it if you train it on this task with reinforcement learning.
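
To make "iterative LLM prompting as an environment" concrete, here is a rough sketch of what such a loop could look like. Everything in it is an assumption for illustration: `check_code`, `run_episode`, `stub_model`, and the 1/(turn+1) reward are hypothetical names and choices, not the linked repo's reward function or the hosted service's API. The checker only tests whether the code compiles; a real setup would presumably render the Manim scene instead.

```python
from typing import Callable, List, Tuple


def check_code(code: str) -> Tuple[bool, str]:
    """Hypothetical checker: report whether the candidate code compiles."""
    try:
        compile(code, "<candidate>", "exec")
        return True, ""
    except SyntaxError as exc:
        return False, f"SyntaxError: {exc.msg} (line {exc.lineno})"


def run_episode(
    model: Callable[[str], str], task: str, max_turns: int = 3
) -> Tuple[float, List[str]]:
    """Prompt the model repeatedly, feeding each error back into the next prompt.

    Returns (reward, attempts). The reward is an assumed shaping:
    1/(turn+1) when an attempt passes, so earlier fixes score higher,
    and 0.0 if no attempt passes within max_turns.
    """
    prompt = task
    attempts: List[str] = []
    for turn in range(max_turns):
        code = model(prompt)
        attempts.append(code)
        ok, error = check_code(code)
        if ok:
            return 1.0 / (turn + 1), attempts
        # Build the follow-up prompt from the failed attempt and its error.
        prompt = (
            f"{task}\n\nPrevious attempt:\n{code}\n\n"
            f"Error:\n{error}\n\nFix the code."
        )
    return 0.0, attempts


def stub_model(prompt: str) -> str:
    """Stand-in for an LLM: broken on the first try, fixed once it sees an error."""
    if "Error:" in prompt:
        return "x = (1 + 2)\n"
    return "x = (1 + 2\n"


reward, attempts = run_episode(stub_model, "Assign 3 to x")
print(reward, len(attempts))  # prints: 0.5 2
```

An RL trainer would then use the per-episode reward to update the model, so that it learns to produce a passing fix in fewer turns.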