| HN Mirror

by turtles3 746 days ago

As a random thought, this seems to be about the same order of magnitude of compute as Karpathy's recent GPT-2 work:

https://github.com/karpathy/llm.c/discussions/677

You could take the final checkpoint from that page, run it for some additional steps, and see if it improves. You could always publish the final checkpoint and training curves; someone might find them useful.