by entilzha
551 days ago
(Author here) Good description! Maybe what the parent got mixed up on is that an alternate way to view this is as trying to chunk bytes so that each chunk has roughly similar information. E.g., we initially tried a bunch of patching schemes, such as keeping a running total of entropy until the total exceeds a threshold, but ended up finding that simple things worked better. I'll see if we can add more information about the small CNN in the next update to the arXiv paper.
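For readers following along, here is a rough sketch of what the "running total of entropy" scheme mentioned above might look like. It is not the code from the paper: the function name `patch_by_cumulative_entropy`, the `threshold` parameter, and the assumption that per-byte entropies are already available from some small byte-level model are all illustrative.

```python
from typing import List, Sequence


def patch_by_cumulative_entropy(
    entropies: Sequence[float], threshold: float
) -> List[List[int]]:
    """Group byte positions into patches of roughly similar total entropy.

    `entropies[i]` is assumed to be a model's predictive entropy for byte i
    (hypothetical input; how it is computed is outside this sketch).
    A patch is closed as soon as its running total exceeds `threshold`.
    """
    patches: List[List[int]] = []
    current: List[int] = []
    running = 0.0
    for i, h in enumerate(entropies):
        current.append(i)
        running += h
        if running > threshold:
            # The patch has accumulated enough information; close it.
            patches.append(current)
            current = []
            running = 0.0
    if current:
        # Leftover bytes that never reached the threshold form a final patch.
        patches.append(current)
    return patches


example = patch_by_cumulative_entropy([0.5, 0.5, 0.5, 2.0, 0.1, 0.1], threshold=1.0)
```

A single high-entropy byte ends up in its own patch, while runs of predictable bytes get grouped together, which is the "roughly similar information per chunk" view.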
https://aclanthology.org/Y03-1017/ https://aclanthology.org/I05-1009/ https://aclanthology.org/P06-2056/
These use exactly the same approach: segmenting a word when the entropy goes up compared to the previous byte.
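A minimal sketch of that rule, assuming the per-byte entropies come from some byte-level model (the function name `split_on_entropy_rise` and the example entropy values are made up for illustration, not taken from the linked papers):

```python
from typing import List, Sequence


def split_on_entropy_rise(data: bytes, entropies: Sequence[float]) -> List[bytes]:
    """Start a new segment wherever entropy rises relative to the previous byte.

    `entropies[i]` is assumed to be the model's predictive entropy at byte i
    (hypothetical input; any byte-level language model could supply it).
    """
    if len(data) != len(entropies):
        raise ValueError("data and entropies must have the same length")
    if not data:
        return []
    segments: List[bytes] = []
    start = 0
    for i in range(1, len(data)):
        if entropies[i] > entropies[i - 1]:
            # Entropy went up: treat byte i as the start of a new segment.
            segments.append(data[start:i])
            start = i
    segments.append(data[start:])
    return segments


# Made-up entropies: uncertainty drops inside each word and jumps at the next one.
segments = split_on_entropy_rise(b"thecat", [3.0, 1.0, 0.5, 2.5, 1.2, 0.4])
```

The intuition is that entropy tends to fall as a word is spelled out and jump at the start of the next one, so a rise marks a plausible boundary.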