by ryanackley 658 days ago
This is for LLMs, which deal mainly with text. An entire book can be stored in about 0.42 MB according to https://www.quora.com/How-many-megabytes-are-in-a-book.

424 terabytes of text is over a billion books' worth of data. The Common Crawl website even says "Over 250 billion pages spanning 17 years." That's an impressive amount of information.
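
For anyone who wants to check the arithmetic, here is a rough sketch. It assumes decimal units (1 TB = 1,000,000 MB) and takes the 0.42 MB per book figure at face value; the function name is just something made up for illustration.

```python
# Back-of-the-envelope check of the "over a billion books" estimate.
# Assumption: decimal units, so 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000
MB_PER_BOOK = 0.42   # per-book size from the Quora link
CORPUS_TB = 424      # corpus size quoted in this comment


def books_in_corpus(corpus_tb: float, mb_per_book: float) -> float:
    """Return how many books of the given size fit in the corpus."""
    return corpus_tb * MB_PER_TB / mb_per_book


books = books_in_corpus(CORPUS_TB, MB_PER_BOOK)
print(f"{books:,.0f} books")  # roughly 1.01 billion
```

With binary units (1 TiB = 1,048,576 MiB) the count comes out slightly higher, so "over a billion" holds either way.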

1 comments

LLMs can deal with more than text. What's impressive today is nothing tomorrow.
The technology that allows an LLM to "see" images and video is completely different though. It's not what is being trained on Common Crawl.
Not really. Embeddings are embeddings. Check out LLaVA.