


	by arjie 46 days ago
	Character density and token efficiency are different things. The latter is data-specific and, therefore, tokenizer-specific. For example, take GPT-5's tokenizer, o200k_base, and run Mandarin text and its English translation through it: some of the time en will beat zh. I just tested with news articles and Wikipedia. After all, `def func():` is only 3 tokens on o200k_base.
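
	A rough way to repeat that comparison yourself, as a sketch rather than the exact test described above. It assumes the third-party `tiktoken` package for `o200k_base`; the helper names (`chars_per_token`, `compare`, `o200k_encoder`, `demo`) and the sample sentence pair are made up for illustration and are not the news or Wikipedia text from the comment.

```python
from typing import Callable, Sequence

Encoder = Callable[[str], Sequence]


def chars_per_token(text: str, encode: Encoder) -> float:
    """Characters per token for `text` under the given encoder (0.0 if no tokens)."""
    n_tokens = len(encode(text))
    return len(text) / n_tokens if n_tokens else 0.0


def compare(en: str, zh: str, encode: Encoder) -> dict:
    """Count characters and tokens for an English text and its Chinese counterpart."""
    en_tokens = len(encode(en))
    zh_tokens = len(encode(zh))
    if en_tokens < zh_tokens:
        fewer = "en"
    elif zh_tokens < en_tokens:
        fewer = "zh"
    else:
        fewer = "tie"
    return {
        "en_chars": len(en),
        "zh_chars": len(zh),
        "en_tokens": en_tokens,
        "zh_tokens": zh_tokens,
        "fewer_tokens": fewer,
    }


def o200k_encoder() -> Encoder:
    """Return the o200k_base encode function (assumes `pip install tiktoken`)."""
    import tiktoken  # imported lazily so the helpers above work without it

    return tiktoken.get_encoding("o200k_base").encode


def demo() -> None:
    """Print counts for one placeholder sentence pair and for the code snippet."""
    encode = o200k_encoder()
    # Placeholder pair for illustration only; swap in real article text.
    print(compare("The weather is nice today.", "今天天气很好。", encode))
    print(len(encode("def func():")))
```

	Calling `demo()` prints the counts for the placeholder pair. The point is that `zh_chars` is usually smaller than `en_chars`, while which side has fewer tokens depends on the text and the tokenizer, so it has to be measured per corpus.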