by frabcus
932 days ago
I see this slightly the other way round: the difficulties caused by tokenisation are why it is good at segmentation. Tokenisation makes words break and jump around, even more so with the typos in the vast amounts of training data. Also, regarding backtracking... it sees all the input at once, so I'm not sure why it would need to backtrack?