by sqrt17
2173 days ago
Here's a thing: incorrect assumptions built into a model are more harmful than a model that assumes too little structure. If you model the vocal tract and the actually exciting things are the transient noises that occur when we produce consonants, at best you do lots of work with not much to show for it, and at worst you're actively limiting your model. That's the basis for the "every time we fired a linguist, recognition rates improved" line from 90s speech recognition. On the other end of the spectrum, data and compute ARE limited, and for some tasks we're at a point where the model eats up all of humanity's written works and a couple million dollars in compute. Further progress has to come from elsewhere, because even large companies won't spend billions of dollars on compute and humanity will not suddenly write ten times more blog articles.

And the nice thing about these large models is that you can reuse them, with a little fine-tuning, for all sorts of other tasks. So the industry and any hacker can benefit from these uber-models without having to retrain from scratch. That is, if the models even fit on the available hardware; otherwise they have to make do with slightly lower performance.