by HarHarVeryFunny
818 days ago
Sure, but their title seems poorly chosen and doesn't match what they claim in the article itself, which includes understanding how GPT-2 makes its predictions. How does GPT-2 learn, for example, that copying a word from way back in the context helps it minimize the prediction error? How does it even manage to copy a word from the context to the output? We know that it is minimizing prediction error, and that it learned to do so via gradient descent, but HOW is it doing it? (We've discovered a few answers, but it's still a research area.)