by joaogui1
760 days ago
I would say the two big problems are:

1. Latency, which gets worse when the model has to generate more output tokens sequentially.

2. Very roughly, these models turn tokens into an "average meaning" in the embedding layer, followed by attention layers that combine those meanings and feed-forward layers that match the current combination of meanings to some kind of learned archetype or prototype. When you move from word parts to characters, all of that becomes more confusing (what's the average meaning of "a"?), so I don't think there are good enough techniques for learning character-based models yet.
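
To make the latency point a bit more concrete, here is a rough sketch that counts how many tokens the same sentence needs under word-part versus character tokenization. The vocabulary `TOY_VOCAB` and the greedy longest-match splitting are assumptions made up for illustration; real tokenizers learn their vocabularies from data and split differently.

```python
# Hypothetical toy vocabulary of word parts, invented for this example.
TOY_VOCAB = {"character", "token", "ization", "is", "slow", "er", " "}


def subword_tokenize(text, vocab):
    """Greedy longest-match split, falling back to single characters."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest piece starting at position i first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary piece matched, so emit one character.
            tokens.append(text[i])
            i += 1
    return tokens


def char_tokenize(text):
    """Character-level tokenization: one token per character."""
    return list(text)


text = "character tokenization is slower"
subword = subword_tokenize(text, TOY_VOCAB)
chars = char_tokenize(text)

# Each generated token costs one sequential decoding step, so the ratio
# below is a crude proxy for how much longer generation would take.
print("word-part tokens:", len(subword))
print("character tokens:", len(chars))
print("ratio:", len(chars) / len(subword))
```

The second point shows up here too: a piece like "token" carries a fairly specific meaning on its own, while the character tokens are mostly letters like "a" or "e" that mean almost nothing in isolation.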