by rnosov 1214 days ago
One way to think about it is that the model essentially needs to encode the entirety of human knowledge. If you can do that with just 175B parameters, it looks quite efficient to me. GPT-3 is about 400 GB in size, which would even fit on some modern iPhones! Another metric to consider is that there are about 100 trillion connections in the human brain. If you roughly equate a brain connection to a model parameter, then GPT-3 would be only 0.175% the size of the human brain.
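For anyone who wants to check the arithmetic, here is a rough back-of-envelope sketch. The parameter and connection counts come from the comment; the bytes-per-parameter values (2 for 16-bit, 4 for 32-bit weights) are my assumption, since the comment does not say which precision the 400 GB figure refers to.

```python
# Back-of-envelope check of the figures quoted in the comment.
PARAMS = 175e9      # GPT-3 parameter count, as stated in the comment
SYNAPSES = 100e12   # rough number of brain connections, as stated in the comment


def model_size_gb(params: float, bytes_per_param: int) -> float:
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def params_vs_brain_percent(params: float, synapses: float) -> float:
    """Parameters as a percentage of brain connections (one parameter per connection)."""
    return params / synapses * 100


# Assumed precisions, not given in the comment: 2 bytes (fp16) and 4 bytes (fp32).
fp16_gb = model_size_gb(PARAMS, 2)
fp32_gb = model_size_gb(PARAMS, 4)
brain_pct = params_vs_brain_percent(PARAMS, SYNAPSES)

print(f"16-bit weights: {fp16_gb:.0f} GB")
print(f"32-bit weights: {fp32_gb:.0f} GB")
print(f"Share of brain connections: {brain_pct:.3f}%")
```

Under these assumptions the raw weights come to roughly 350 GB at 16-bit and 700 GB at 32-bit, so "about 400 GB" is in the right range for 16-bit storage, and the 0.175% ratio follows directly from 175 billion divided by 100 trillion.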
1 comment

A model parameter is not the same as a "fact". Facts can multiply uncontrollably, but the logical relationships between facts that we care about (at least as humans) are much more economical. It feels like this approach is missing some key abstractions that might help reduce redundancy in the encoding. But it's just a hunch. I need to dig deeper to understand, at least conceptually, why this dimensional explosion happens.