by turtles3
698 days ago
|
|
As a random thought, this seems to be about the same order of magnitude of compute as Karpathy's recent GPT-2 work: https://github.com/karpathy/llm.c/discussions/677 You could take the final checkpoint from that page, run it for some additional steps, and see if it improves. You could always publish the final checkpoint and training curves; someone might find them useful. |
|