by koochi10
1094 days ago
I disagree with this slightly. Modern computer chips follow a general-purpose architecture, not a special-purpose one, because building a computer chip is expensive and difficult. Similarly, building any useful language model requires tons of compute power and very smart ML researchers. Most of the smaller open source ones are just trained on GPT output. By "cramming all of the web" into a model, what is really going on is that the hidden layers of the network are getting better at understanding language and logic. Imagine trying to teach science to a kid who doesn't know how to read by giving them only science textbooks. Chances are they won't get very far. Building little specialist models doesn't really work either. It's like trying to train a parrot to do science: sure, it can repeat some of the phrases you give it, but at the end of the day it's not really making any new connections for you.