by xodn348
97 days ago
Really interesting approach. Attacking token efficiency at the encoding level is more fundamental than what I did. Even without retraining BPE from scratch, starting with YUTF-8 and measuring how existing tokenizers handle it would already be a worthwhile experiment. Hope you find the time to build it, good luck!