by college_physics 1217 days ago
Why are gazillions of parameters needed in the first place? From an information perspective, it feels like there might be some fundamentally inefficient use of parametric freedom: a brute-force approach to combinatorial explosion, so to speak. Are there any research efforts that look into how to reduce model complexity (without substantially sacrificing performance, obviously)?

One way to think about it is that the model needs to encode essentially the entirety of human knowledge. If you can do that with just 175B parameters, then it looks quite efficient to me. GPT-3 is about 400 GB in size, which would even fit on some modern iPhones! Another metric to consider: there are about 100 trillion connections in the human brain. If you roughly equate a brain connection to a model parameter, then GPT-3 would be only 0.175% the size of the human brain.
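The numbers above can be sanity-checked with a quick back-of-envelope sketch. It assumes 2 bytes per parameter for half precision (fp16) and 4 bytes for fp32; the comment doesn't say which precision its ~400 GB figure refers to. The 100 trillion connection count is the rough estimate quoted in the comment.

```python
# Back-of-envelope check of the GPT-3 figures quoted in the comment.
# Assumption (not from the comment): 2 bytes/parameter in fp16, 4 in fp32.

GPT3_PARAMS = 175e9          # 175 billion parameters
BRAIN_CONNECTIONS = 100e12   # ~100 trillion connections (rough estimate)

def storage_gb(params: float, bytes_per_param: int) -> float:
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

def percent_of_brain(params: float) -> float:
    """Parameter count as a percentage of the brain's connection count."""
    return params / BRAIN_CONNECTIONS * 100

if __name__ == "__main__":
    print(f"fp16: {storage_gb(GPT3_PARAMS, 2):.0f} GB")   # fp16: 350 GB
    print(f"fp32: {storage_gb(GPT3_PARAMS, 4):.0f} GB")   # fp32: 700 GB
    print(f"{percent_of_brain(GPT3_PARAMS):.3f}% of brain connections")  # 0.175% ...
```

So roughly 350 GB of raw weights at half precision, which is in the same ballpark as the ~400 GB mentioned, and the 0.175% ratio checks out.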
A model parameter is not the same as a "fact". Facts can multiply uncontrollably, but the logical relationships between facts that (at least we as humans) care about are much more economical. It feels like this approach is missing some key abstractions that might help reduce redundancy in the encoding. But it's just a hunch. I need to dig deeper to understand, at least conceptually, why this dimensional explosion happens.
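One concrete, well-understood form of the redundancy hinted at here is low rank: an m×n weight matrix has m·n entries, but if it is the product of an m×r and an r×n matrix, then r·(m+n) numbers determine it completely. The toy sketch below only illustrates that counting argument; the matrix sizes and helper names are hypothetical, and whether trained GPT-style weights actually have such structure is not something this thread establishes.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def random_matrix(rows, cols, seed):
    """A rows x cols matrix of uniform random values in [-1, 1]."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def param_counts(m, n, r):
    """(entries in a dense m x n matrix, entries in factors U (m x r) and V (r x n))."""
    return m * n, r * (m + n)

if __name__ == "__main__":
    m, n, r = 64, 48, 4            # hypothetical sizes, chosen for illustration
    U = random_matrix(m, r, seed=0)
    V = random_matrix(r, n, seed=1)
    W = matmul(U, V)               # a full 64 x 48 matrix whose rank is at most 4
    dense, factored = param_counts(m, n, r)
    print(f"dense: {dense} numbers, factored: {factored} numbers")
    # dense: 3072 numbers, factored: 448 numbers
```

Every entry of `W` is recoverable from `U` and `V`, so storing the factors loses nothing here. For a real trained matrix, the analogous move is a truncated SVD, which generally trades some accuracy for the reduction in parameters.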
Given a large enough model, architecture becomes less and less relevant, since any specialized architecture can be discovered by the larger model automatically.

The only benefit of a specialized architecture is minimizing resource usage.