Hacker News
by cma 152 days ago
The TPU implementation used approximate top-k instead of the exact top-k used on Nvidia hardware. That alone wouldn't have mattered much, and there was also a bug in it, but approximate top-k was chosen from the start as a cost saving, because exact top-k wasn't efficient on the TPUs they were routing to under load. So the model behaved a bit differently under load, even aside from the bug.
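For anyone unfamiliar with the distinction, here is a rough Python sketch of one common way to approximate top-k: split the scores into bins, keep only the best entry from each bin, then take the top k of those winners. This is an illustration only, not the actual TPU kernel; the function names, bin count, and scores are all made up for the example.

```python
import heapq


def exact_top_k(scores, k):
    """Return the indices of the k largest scores, highest first."""
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


def approx_top_k(scores, k, num_bins):
    """Binned approximation: keep each bin's best entry, then take the top k.

    Cheaper than a full selection, but if two true top-k entries land in
    the same bin, only one of them can survive.
    """
    bin_size = -(-len(scores) // num_bins)  # ceiling division
    winners = []
    for start in range(0, len(scores), bin_size):
        stop = min(start + bin_size, len(scores))
        winners.append(max(range(start, stop), key=scores.__getitem__))
    return heapq.nlargest(k, winners, key=scores.__getitem__)


# Hypothetical token scores; the two best candidates share the first bin.
scores = [9.0, 8.0, 1.0, 2.0, 7.0, 3.0, 0.5, 4.0]
print(exact_top_k(scores, 3))               # indices 0, 1, 4
print(approx_top_k(scores, 3, num_bins=4))  # indices 0, 4, 7: index 1 is lost
```

With more bins the approximation gets closer to exact (one bin per element recovers it fully), which is the cost versus recall trade-off in a nutshell.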
1 comment

To the extent this is an accurate characterization (somewhat, I think), they considered the quality difference a bug and fixed it!