by Havoc 974 days ago
Why quantize something that is already very small (270 MB)?

Just speculating here, but smaller models are a good fit for serverless compute such as cloud functions, which also benefits from lighter computation. Don't forget that some people are dealing with hundreds of millions of documents. At that scale, a 4x speedup may be worth a small hit to accuracy.
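
To put rough numbers on that trade-off, here is a back-of-envelope sketch. Only the 4x factor comes from the comment above; the document count and the baseline throughput are made-up assumptions, so plug in your own.

```python
def processing_hours(num_docs: int, docs_per_second: float) -> float:
    """Wall-clock hours to process num_docs at a steady throughput."""
    return num_docs / docs_per_second / 3600


NUM_DOCS = 300_000_000         # assumption: one value for "hundreds of millions"
BASELINE_DOCS_PER_SEC = 1_000  # assumption: hypothetical unquantized throughput
SPEEDUP = 4                    # the 4x figure mentioned in the comment

baseline_hours = processing_hours(NUM_DOCS, BASELINE_DOCS_PER_SEC)
quantized_hours = processing_hours(NUM_DOCS, BASELINE_DOCS_PER_SEC * SPEEDUP)

print(f"baseline:  {baseline_hours:.1f} h")
print(f"quantized: {quantized_hours:.1f} h")
print(f"saved:     {baseline_hours - quantized_hours:.1f} h")
```

With these assumed inputs the job drops from roughly 83 hours to roughly 21. Whether that is worth the accuracy loss depends on the workload.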