| HN Mirror



	by cma 152 days ago
	The TPU implementation used approximate top-k instead of the exact top-k used on NVIDIA. That difference alone wouldn't matter much, and there was also a bug in it, but using approximate rather than exact from the beginning was still a cost-saving choice: exact top-k wasn't efficient on TPUs, which they were routing to under load. So there was a bit of a model difference under load, even aside from the bug.
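To make the exact vs. approximate distinction concrete, here is a minimal sketch in plain Python. It assumes a bucketed approximation (keep the best entry per bucket, then take the top k of those winners); that is one common way to approximate top-k cheaply, not a claim about what was actually deployed. The function names, the `num_buckets` parameter, and the sample logits are all hypothetical.

```python
import heapq


def exact_top_k(logits, k):
    """Return the indices of the k largest values, highest first."""
    return heapq.nlargest(k, range(len(logits)), key=lambda i: logits[i])


def approx_top_k(logits, k, num_buckets):
    """Hypothetical bucketed approximation of top-k.

    Indices are split into strided buckets, only the best index of each
    bucket survives, and the top k is taken among those winners. If two
    of the true top-k share a bucket, one of them is dropped.
    """
    winners = []
    for b in range(num_buckets):
        bucket = range(b, len(logits), num_buckets)
        if len(bucket) > 0:  # skip empty buckets when num_buckets > len(logits)
            winners.append(max(bucket, key=lambda i: logits[i]))
    return heapq.nlargest(k, winners, key=lambda i: logits[i])


# Made-up logits: indices 1 and 3 hold the two largest values,
# and both fall in the same bucket when num_buckets=2.
logits = [0.1, 2.0, 0.3, 1.9, 0.2, 0.05, 0.4, 0.6]

exact = exact_top_k(logits, 2)
approx = approx_top_k(logits, 2, num_buckets=2)
print(exact, approx)
```

With these made-up logits the exact version returns indices 1 and 3, while the two-bucket approximation returns 1 and 6, because 1 and 3 collide in the same bucket. More buckets means fewer collisions and results closer to exact, at higher cost, which is the trade-off described above.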

1 comment

To the extent this is an accurate characterization (somewhat, I think), they considered the quality difference a bug and fixed it!