by kang
47 days ago
The answer should be obvious: it's both. Zurada was one of our AI textbooks, and it makes it visual that, all the way from a simple classifier to a large language model, we are mathematically creating a shape (one that the signal interacts with). More parameters mean the shape can curve in more ways, and more data means the curve gets higher definition. They reach something with data, treating the neural network as a black box, that could be derived mathematically using the information we already know.

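A toy sketch of the "parameters bend the shape, data sharpens it" picture. This is not a neural network and not taken from Zurada: it assumes a piecewise-constant fit on [0, 1) as a stand-in, with the number of bins playing the role of parameter count and a sine wave as an arbitrary target. All names here (`target`, `fit_bins`, `capacity_sweep`, `data_sweep`) are made up for illustration.

```python
import math
import random


def target(x):
    """The 'true shape' to recover (an arbitrary choice for this sketch)."""
    return math.sin(2 * math.pi * x)


def fit_bins(xs, ys, n_bins):
    """Fit one constant per bin on [0, 1); n_bins stands in for parameter count."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    # An empty bin falls back to 0.0; the grids used below never leave one empty.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def rms_error(model, n_eval=4096):
    """Root-mean-square distance between the fitted shape and the true one."""
    n_bins = len(model)
    total = 0.0
    for i in range(n_eval):
        x = (i + 0.5) / n_eval
        pred = model[min(int(x * n_bins), n_bins - 1)]
        total += (pred - target(x)) ** 2
    return math.sqrt(total / n_eval)


def sample(n, noise, rng):
    """Draw n evenly spaced points with Gaussian noise added to the target."""
    xs = [(i + 0.5) / n for i in range(n)]
    ys = [target(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def capacity_sweep():
    """More parameters (2, 8, 32 bins) on plenty of clean data."""
    rng = random.Random(0)
    xs, ys = sample(4096, 0.0, rng)
    return [rms_error(fit_bins(xs, ys, k)) for k in (2, 8, 32)]


def data_sweep():
    """Fixed parameters (16 bins), few versus many noisy samples."""
    rng = random.Random(0)
    errors = []
    for n in (32, 4800):
        xs, ys = sample(n, 0.3, rng)
        errors.append(rms_error(fit_bins(xs, ys, 16)))
    return errors
```

Under these assumptions, `capacity_sweep()` should return errors that shrink as the bin count grows (the shape can bend in more places), and `data_sweep()` should return a smaller error for the larger sample (the same shape, estimated at higher definition).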
However: the labs releasing these high-intelligence-density models are getting them by first training much larger models and then distilling down. So the most interesting question to me is, how can we accelerate learning in small networks to avoid the necessity of training huge teacher networks?
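
For readers unfamiliar with "distilling down", here is a minimal sketch of the commonly described soft-target distillation loss: the student is trained to match the teacher's temperature-softened output distribution. This is an assumption about the general technique, not a claim about what any particular lab does; the function names and the default temperature are made up for illustration.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    # The T^2 factor is the usual convention for keeping gradient scale
    # comparable across temperatures; it is a convention, not a requirement.
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits and grows as their softened distributions diverge, which is why the approach needs a trained teacher in the first place.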