| HN Mirror



	by HarHarVeryFunny 865 days ago
	Sure, but their title seems poorly chosen and doesn't match what they are claiming in the article itself, which includes understanding how GPT-2 makes its predictions. How does GPT-2 learn, for example, that copying a word from way back in the context helps it to minimize the prediction error? How does it even manage to copy a word from the context to the output? We know that it is minimizing prediction errors, and that it learned to do so via gradient descent, but HOW is it doing it? (We've discovered a few answers, but it's still a research area.)
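
To make the "copy a word from the context" question concrete, here is a minimal sketch of one way a single attention step could do it. This is a hand-built toy, not GPT-2's actual learned circuit: the one-hot embeddings, the `sharpness` scale, and the function name `induction_copy` are all assumptions made for illustration. Each key encodes the token *before* a position and each value encodes the token *at* that position, so a query for the current token lands on whatever followed its earlier occurrence.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(i, n):
    return [1.0 if j == i else 0.0 for j in range(n)]


def induction_copy(tokens, vocab, sharpness=10.0):
    """Toy attention step: guess the next token by copying whatever
    followed the previous occurrence of the current (last) token.

    Hypothetical illustration only; weights are hand-set, not learned.
    """
    n = len(vocab)
    ids = [vocab.index(t) for t in tokens]
    query = one_hot(ids[-1], n)  # "what am I looking at right now?"
    # For each position i >= 1: key = token at i-1, value = token at i.
    keys = [one_hot(ids[i - 1], n) for i in range(1, len(ids))]
    values = [one_hot(ids[i], n) for i in range(1, len(ids))]
    scores = [sharpness * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    # Attention output is the weighted sum of the value vectors.
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(n)]
    best = max(range(n), key=out.__getitem__)
    return vocab[best], weights


vocab = ["Dursley", "Mr", "the", "was"]
prediction, weights = induction_copy(["Mr", "Dursley", "was", "the", "Mr"], vocab)
print(prediction)
```

The open research question in the comment above is how gradient descent arrives at weights that behave like this on its own; the sketch only shows that the mechanism is expressible with attention.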


cfgauss2718 865 days ago

I haven’t read the manuscript yet, and am not sure that I will. However, I don’t agree with the question. Gradient descent and the properties of the loss function are the “how”. It seems like you want to know how some properties of the data are manifested in the network itself during/after training (what these properties are doesn’t seem to be something that people know they are looking for). Maybe that’s what the authors are interested in as well. If I could bet money in Vegas on the answer to that question, my bet would be that, in most cases, the structures we can probe in the network and correlate with aspects of the problem or task that we (as humans) can recognize will very likely boil down to approximations of fundamental and eminently useful quantities: say, approximate singular value decompositions of regions in the data manifold, or approximate eigenfunctions, etc. I could see how these kinds of empirical investigations are interesting, but what would their impact be? Another guess: these investigations may lead to insights that help engineers design better architectures or incrementally improve training methods. But I think that’s about it - this type of research strikes me as engineering and application.


HarHarVeryFunny 865 days ago

Outside of pure interest in how these LLMs work, the utility/impact of understanding them would be the ability to control them: how to remove capabilities you don't want them to have (safety), or perhaps even add capabilities, or just steer their behavior in some desirable way.

Pretty much everything about NNs is engineering - it's basically an empirical technology, not one that we have much theoretical understanding of outside of the very basics.


xanderlewis 865 days ago

> Pretty much everything about NNs is engineering - it's basically an empirical technology, not one that we have much theoretical understanding of outside of the very basics.

This pretty much answers the question some have asked: “why are the world’s preeminent mathematicians not working on AI if AGI will solve everything eventually anyway?”.

At least for now, the skills required to make progress in AI (machine learning as it largely is now) are those of an engineer rather than a mathematician.
