by advael
726 days ago
The formalism that data-driven machine learning leans on is empirical tuning of stochastic search to drive function approximation, and despite what Silicon Valley would have you believe, most of the significant advances have come from creating useful meta-structures for modeling certain kinds of problems (e.g. convolution for efficiently processing transformations that care about local structure across dimensions of data, or QKV attention for keeping throughlines of non-local correspondences intact through a long sequence). Neural networks as a flavor of empirical function approximation happened to scale well, and then a bunch of people who saw how much this scale improved the models' capabilities, but couldn't be bothered to understand the structural component, concluded that scale somehow magically gets you to every unsolved problem being solved. It's also convenient for business types: if you buy this premise, any unicorn they want to promise is just a matter of throwing obscene amounts of resources at the problem (through their company, of course).
I think the general idea of dynamic structures that are versatile in their ability to approximate functional models is at least a solid hypothesis for how some biological intelligence works at some level. (The "fluid/crystallized" intelligence distinction some psychology uses may be informative here: a strong world model probably informs a lot of quick acquisition of relationships, but most intelligent systems clearly possess strong feedback mechanisms for capturing new models.) That said, I definitely agree that a focus on how best to throw a ton of scale at these models doesn't seem like a fruitful path for actionably learning how to build or analyze intelligent systems in the way we usually think about them, nor is it, well, sustainable.
Moore's law appeals to business people because buying more computronium feels like a predictable input-output relationship to put capital into, but even if we're just talking about raw computation speed, advances in algorithms tend to dwarf advances in computing power in the long run. I think the same will hold true for AGI.