by gmueckl
530 days ago
Intuitively, I wouldn't expect a wrong answer to show up that easily if the network were overfitted to that particular input token sequence. The question, as I understand it, is whether the network learned enough of a simulacrum of the concept of weight to answer similar questions correctly.