by willy_k 315 days ago
Yes, they do. If the model size / VRAM requirement for a given performance target keeps shrinking, as it has been, then it gets cheaper to run a model at X level of performance.
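
To put rough numbers on the VRAM side, here is a back-of-the-envelope sketch. It assumes weight memory is simply parameter count times bytes per parameter, and it ignores KV cache, activations, and runtime overhead, so real requirements will be higher. The function name and the model sizes are hypothetical picks for illustration, not benchmarks of any particular model.

```python
def weight_memory_gib(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Assumption: memory = parameter count * bytes per parameter.
    KV cache, activations, and framework overhead are not included.
    """
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical model sizes (billions of parameters) and weight precisions.
    for params_billion in (70, 13, 7):
        for bits in (16, 8, 4):
            gib = weight_memory_gib(params_billion, bits)
            print(f"{params_billion}B @ {bits}-bit: ~{gib:.1f} GiB")
```

If a smaller model hits the same performance target, it fits on a cheaper card, which is the whole argument.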