HN Mirror

by karpathy 754 days ago

That is 100% my intention and hope, and I think we are very close to deleting all of that. Right now on master, I am already only using Python for the tokenization preprocessing. In principle the requirements for llm.c should be extremely minimal. I think this is a few days of work that is high on my mind.

Biggest problem right now is finding a place that can host the 135GB of tokens for FineWeb100B. Will probably use S3 or something.

1 comment

metadat 754 days ago

Could this be a good case for a torrent?
