| HN Mirror


	by viscanti 1161 days ago
	> GPT-4's tokenizer is already far more efficient though still weighted to English.

	Right. It's a general question. Should they be allowed to make the kinds of tokenization optimizations they can, when those optimizations are a function of how much data they can use, even if that means some languages get more optimization than others? Or should users of the languages that could be optimized effectively pay a tax out of some sense of fairness?