Hacker News
by PeterStuer 919 days ago
They have post-processed the models specifically to reduce size and latency, and have published several papers on this.

Their optimized models are downloaded from Dropbox rather than from HF (Hugging Face). I have no idea why.