by matt4711
770 days ago
A paper [1] we wrote in 2015 (cited by the authors) uses more sophisticated data structures (compressed suffix trees) and Kneser–Ney smoothing to get the same "unlimited" context. I imagine that, with better smoothing and the same larger corpus sizes the authors use, this approach could improve on some of the results they report. Back then neural LMs were just beginning to emerge, and we briefly experimented with using one of these unlimited N-gram models for pre-training, but never got any results.

[1] https://aclanthology.org/D15-1288.pdf