by karpathy
754 days ago
That is 100% my intention and hope, and I think we are very close to deleting all of that. Right now on master, I am already using Python only for the tokenization preprocessing. In principle the requirements for llm.c should be extremely minimal. I think this is a few days of work, and it is high on my mind. The biggest problem right now is finding a place that can host the 135GB of tokens for FineWeb100B. I will probably use S3 or something similar. Related, see:
https://github.com/karpathy/llm.c/issues/482