by koochi10 1094 days ago
I disagree with this slightly. Modern computer chips follow a general-purpose architecture, not a special-purpose one. The reason is that building a computer chip is expensive and difficult. Similarly, building any useful language model requires tons of compute power and very smart ML researchers. Most of the smaller open source ones are just trained on GPT output.

By "cramming all of the web" into a model, what is really going on is that the hidden layers of the network are getting better at understanding language and logic. Imagine trying to teach science to a kid who doesn't know how to read by giving them only science textbooks. Chances are they won't get very far.

Building little specialist models doesn't really work either. It's like trying to train a parrot to do science: sure, it can repeat some of the phrases you give it, but at the end of the day it's not really making any new connections for you.


Since when did the world decide to shed the grammatically correct "computing power" for this weird tech bro "compute power" phrase?
Definitions are a little fuzzy, but "compute" is often used to distinguish processing (as in CPU) from memory and disk, and a GPU is a very specialized kind of compute power. There's really no "grammatically correct" here; these are different senses of the word. "Computing power" doesn't have quite the same sense of specifically referring to a CPU or GPU that "compute power" does.
Since we adopted English as the standard language, complete with its penchant for butchering words by shortening them for the sake of convenience and speed.

I wonder what people said about "bus" back in the day, especially those who knew Latin.