by hedora 11 days ago
My current rule of thumb is that 1GB of memory gets you 1B parameters with a big context. (Qwen 32B fits in 32GB with 200K+ token contexts.)

That’s with heavy compression of the weights and the context, of course.
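Rough back-of-the-envelope version of that arithmetic, as a sketch only. The layer count, KV head count, and head dimension below are illustrative assumptions for a 32B-class model with grouped-query attention, and the 4-bit settings are one guess at what "heavy compression" means; none of these numbers come from a measured run, and runtime overhead (activations, buffers) is ignored.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Memory for the weights, in decimal GB."""
    return params_billions * bits_per_weight / 8


def kv_cache_gb(tokens: int, layers: int, kv_heads: int,
                head_dim: int, bits_per_value: float) -> float:
    """Memory for the KV cache, in decimal GB.

    Each token stores one key and one value vector per layer,
    hence the factor of 2.
    """
    values_per_token = 2 * layers * kv_heads * head_dim
    return tokens * values_per_token * bits_per_value / 8 / 1e9


# Assumed (hypothetical) shape for a 32B model: 64 layers, 8 KV heads, head_dim 128.
weights = weights_gb(32, bits_per_weight=4)              # about 16 GB
cache = kv_cache_gb(200_000, layers=64, kv_heads=8,
                    head_dim=128, bits_per_value=4)      # about 13.1 GB
total = weights + cache                                  # about 29.1 GB

print(f"weights {weights:.1f} GB + cache {cache:.1f} GB = {total:.1f} GB")
```

Under those assumptions the total lands a bit under 32GB. Bump the cache to 8 bits per value and the cache alone doubles to about 26GB, which no longer fits, so the context compression matters as much as the weight compression here.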

I haven’t gone through model evaluation + shoehorning at 128GiB yet.