HN Mirror

	by ben_w 1191 days ago
	My wild guess is that if it could get things done by tokenising like that all the time, they wouldn't also need word-like tokens. Whether that is an inference-time performance issue, a training-time performance issue, a model-size issue, or just total nonsense, I wouldn't know.