by anuragvohraec 1247 days ago
Isn't it curve fitting at the end of the day? Multi-parameter curve fitting? Why do people say they don't know how it works? Yeah, I get that the cocktail is fairly complex after training it on a very large dataset (almost all possible logical scenarios). But saying we do not know how it works seems like just adding mysticism to it, which attracts "clicks" but is not an honest description.
2 comments

> Why do people say they don't know how it works?

"Curve fitting" is the objective; the function encoded in the weights is the solution, and that solution is not actually well understood. See work from Anthropic[1] and Google[2] that explores this.

As an analogy, consider applying the same argument to the AlphaGo value function. It's "just" fitting a bunch of curves to the statistics of millions of self-played games. However, to effectively capture those statistics, the network needed to develop a bunch of heuristics. Needless to say, these heuristics are not understood (else we'd already know the principles needed to play at AlphaGo's level), and they are not just exhaustive lists of statistical trends but more like strategies[3].

Recent work[4] strongly suggests that "grokking" (a striking but not unnatural[5] form of generalization) involves networks transitioning from memorized statistics/solutions to a general solution. The curve fitting perspective would totally miss all this for a comfortable but misleading story: "the objective is curve fitting so it's just interpolating data points".

[1] https://transformer-circuits.pub

[2] https://arxiv.org/abs/2212.07677

[3] https://www.pnas.org/doi/10.1073/pnas.2206625119

[4] https://arxiv.org/abs/2301.05217

[5] https://arxiv.org/abs/2210.01117

Would "it's curve fitting by building an internal representation to better describe all the curves seen so far" be a better layperson-ish analogy in your opinion?

Depending on how the model is set up, we'd say 'set of basis functions', 'language', 'strategy'.

Even assuming that curve fitting is actually in any way a meaningful description, let’s say I give you that. It tells us absolutely nothing about the mechanisms evolved in the neural pathways, how it encodes memory of the current game distinctly from memory of past games, the reasons for the strengths or weaknesses of one trained instance against another, or ways we could optimise the architecture to better complement the way it functions. It doesn’t help us engineer the system, or reason about its possible limitations or failure modes. In other words, it doesn’t tell us anything actually useful about it.
So how do mysticism and unprovable-and-likely-false analogies-to-life help the engineering?
What are you even talking about?