|
|
|
|
|
by wizzwizz4
323 days ago
|
|
Transformer models are excellent at translation, but next-token prediction is not the correct architecture for it. You want something more like seq2seq. Next token prediction cares more about local consistency (i.e., going off on a tangent with a self-consistent but totally fabricated "translation") than faithfulness. |
|