| HN Mirror

	by ethan_smith 352 days ago
	Typically, multilingual capabilities consume 20-30% of model parameters in small LLMs, primarily in token embeddings and early transformer layers. Monolingual variants of similar models often perform better on English benchmarks with the same parameter count.
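
	A back-of-the-envelope sketch of the embedding part of that claim, for anyone who wants to plug in their own numbers. Everything here is an assumption for illustration: the function name `embedding_share`, the two configs, and the simplified per-layer count (four attention projections plus a plain two-matrix MLP, ignoring biases, norms, and gated MLP variants). It only shows how vocabulary size drives the embedding share in a small model; it does not reproduce the 20-30% figure and says nothing about the early transformer layers.

```python
def embedding_share(vocab_size, d_model, n_layers, d_ff, tied=True):
    """Rough fraction of parameters that sit in the token embeddings.

    Simplified count (an assumption, not any specific model's layout):
    - embeddings: vocab_size * d_model, doubled if input and output
      embeddings are not tied
    - per layer: 4 * d_model^2 for attention (Q, K, V, O projections)
      plus 2 * d_model * d_ff for a plain two-matrix MLP
    """
    emb = vocab_size * d_model * (1 if tied else 2)
    per_layer = 4 * d_model * d_model + 2 * d_model * d_ff
    total = emb + n_layers * per_layer
    return emb / total


# Hypothetical small-model shape; only the vocabulary size differs.
shape = dict(d_model=1024, n_layers=24, d_ff=4096)

english_share = embedding_share(vocab_size=32_000, **shape)
multilingual_share = embedding_share(vocab_size=250_000, **shape)

print(f"English-sized vocab:      {english_share:.1%} of parameters in embeddings")
print(f"Multilingual-sized vocab: {multilingual_share:.1%} of parameters in embeddings")
```

	The transformer stack is identical in both runs, so the gap comes entirely from the larger vocabulary. With a fixed parameter budget, those extra embedding rows are parameters a monolingual model could spend on depth or width instead.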