Hacker News
by MilStdJunkie 1100 days ago
Data, data, data, data. The 1990s didn't have Wikipedia, YouTube, megapixel cameras every which where, every single adult human hooked up to a sensor package 24 hours a day, and who knows what else. I know as a 1990s guy I would never have imagined the amount of data we would eventually all throw up into the ether even ten years later, to say nothing of today. Without that corpus...
3 comments

And none of those examples except Wikipedia were used to train the various LLMs. I wonder how much better multi-modal models are going to get if they start incorporating the 24/7 sensor data from billions of people.
I can't wait for the time when someone trains a multimodal LLM on all of YouTube
"Don't forget to like and subscribe"

On a side note, a long time ago I saw someone who made a bot trained on a selected sample of chats between people on the internet - and the tool swore a lot.

Encyclopædia Britannica existed. I came to the USA in the late 90s and my school had the CD set.
Wikipedia is ~100 times bigger than the Encyclopædia Britannica

https://en.m.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia

Nice link! I never saw that page before. This quote surprised me:

    it should be noted that the amount of text added to Wikipedia articles every year has been constant since 2006, at roughly 1 gigabyte of (compressed) text added per year.
Yes, Wikipedia is surprisingly small. You can fit the whole thing on an iPad and access all of it without internet. Plenty of rabbit holes to fill even the longest airplane flight.
That's honestly much smaller than I expected.
GPUs, don't forget the GPUs! Compute was too slow for the task at hand.