by codesushi42
2414 days ago
This. Exactly this. No sophisticated tokenization. No interesting architecture using attention. And the author is completely clueless about overfitting... and even cross-entropy loss. He could have gotten better results just using a bag-of-words approach. But this ends up on the front page anyway. Welcome to HN.