by kevingadd
1162 days ago
If you familiarize yourself with ideographic or ideographic-adjacent languages like Japanese or Chinese, you will probably notice that they are far more efficient than English per character. Yet those languages pay a tokenization tax too, thanks in no small part to the decisions of the largely Western Unicode committees to favor Western character sets: the UTF-8 encoding favors ASCII tremendously.
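To make the UTF-8 point concrete, here is a rough sketch that counts encoded bytes per character. The helper name `utf8_bytes_per_char` and the sample strings are my own picks for illustration. It only measures raw UTF-8 size, not what any particular tokenizer actually does with those bytes.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per Unicode code point (hypothetical helper)."""
    return len(text.encode("utf-8")) / len(text)


# Sample strings chosen for illustration only.
samples = {
    "english": "hello",        # ASCII: 1 byte per character
    "japanese": "こんにちは",   # hiragana: 3 bytes per character
    "chinese": "你好",          # common CJK ideographs: 3 bytes per character
}

ratios = {name: utf8_bytes_per_char(text) for name, text in samples.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} bytes/char")
```

So anything that starts from UTF-8 bytes sees roughly three times as many bytes per character for this Japanese and Chinese text as for ASCII, before any merging happens.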
But when it comes to Chinese...something weird is going on.