HN Mirror



	by wrs 167 days ago
	Now that we know code is a killer app for LLMs, why would we keep tokenizing code as if it were human language? I would expect that someone is already reworking their tokenizer to densify common code patterns for upcoming training runs (and to make the resulting tokens more semantically aligned with the code).
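
	To make "densify" concrete, here is a toy sketch in Python. It is not how any real tokenizer works: `generic_tokenize` is a crude stand-in for a prose-oriented splitter, and `CODE_PATTERNS` is a hypothetical, hand-picked list of idioms promoted to single tokens (a real tokenizer would learn its merges from a corpus). It only shows the effect being hoped for: the same snippet, fewer tokens.

```python
import re

# Hypothetical vocabulary additions: whole code idioms treated as single tokens.
CODE_PATTERNS = ["for i in range(", "return ", "def ", "    "]

# Crude prose-style split: runs of word characters, or one whitespace
# character, or one punctuation character. Every character matches one branch.
_FALLBACK = re.compile(r"\w+|\s|[^\w\s]")


def generic_tokenize(text):
    """Stand-in for a tokenizer that knows nothing about code."""
    return _FALLBACK.findall(text)


def code_aware_tokenize(text, patterns=CODE_PATTERNS):
    """Greedy longest-match on known idioms, else fall back to the generic split."""
    ordered = sorted(patterns, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        idiom = next((p for p in ordered if text.startswith(p, i)), None)
        if idiom is not None:
            tokens.append(idiom)
            i += len(idiom)
        else:
            piece = _FALLBACK.match(text, i).group()
            tokens.append(piece)
            i += len(piece)
    return tokens


if __name__ == "__main__":
    snippet = "def f(n):\n    for i in range(n):\n        return i\n"
    print(len(generic_tokenize(snippet)), "generic tokens")
    print(len(code_aware_tokenize(snippet)), "code-aware tokens")
```

	Both functions are lossless (joining the tokens gives back the input), so the only difference is how many tokens the model would have to read and emit for the same code.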