by Translationaut 1098 days ago
Those minified models are still as large as or larger than the original "Attention Is All You Need" transformer.