by egorfine
1175 days ago
Hey, it seems that UTF-8 support is broken on the page. A test phrase could be something like "Жизнь прекрасна и удивительна" ("Life is beautiful and amazing" in Russian). I'm assuming it's the implementation on the page that is broken, not the actual tokenizer. The reason: Russian works perfectly in GPT-3, which I guess wouldn't be the case with a tokenization like the one presented on the page.
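A small illustration of why a display bug is plausible. GPT-2 and GPT-3 use byte-level BPE, so a single token can hold only part of a multi-byte UTF-8 character (each Cyrillic letter is 2 bytes). If a page decodes each token's bytes on its own instead of joining the bytes first, the text comes out garbled even though the tokenizer itself round-trips it fine. This is a sketch under assumptions: the split point below is hypothetical, not taken from the real GPT-3 vocabulary, and it only demonstrates the decoding effect, not what the page actually does.

```python
# Cyrillic letters take 2 bytes each in UTF-8, so a byte-level tokenizer
# may place a token boundary in the middle of a character.
phrase = "Жизнь прекрасна и удивительна"
data = phrase.encode("utf-8")

# Hypothetical split: a token boundary after byte 1, i.e. inside the
# first character "Ж" (UTF-8 bytes D0 96).
pieces = [data[:1], data[1:]]

# Decoding each piece separately (a plausible display bug) garbles the text:
# each broken half of "Ж" becomes U+FFFD, the replacement character.
per_piece = "".join(p.decode("utf-8", errors="replace") for p in pieces)

# Concatenating the bytes first and then decoding restores the original.
joined = b"".join(pieces).decode("utf-8")

print(len(phrase), len(data))  # characters vs. bytes
print(per_piece)
print(joined == phrase)
```

The phrase is 29 characters but 55 bytes, which is why naive per-token decoding is so easy to get wrong for non-Latin scripts.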