by zamalek 473 days ago
39 GB if you use an FP8-quantized model.[1] Remember that your OS might be using some of that itself.
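
For a rough sanity check on numbers like this, the back-of-envelope arithmetic is parameters times bytes per parameter. This is only a sketch of the weights-only estimate: I'm not claiming it's what the linked calculator does, it ignores KV cache and runtime overhead, and the 39-billion parameter count and the 48 GB / 6 GB figures are made-up inputs for illustration.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


def usable_ram_gb(total_ram_gb: float, os_reserved_gb: float) -> float:
    """RAM left for the model after the OS and other processes take their share."""
    return total_ram_gb - os_reserved_gb


if __name__ == "__main__":
    n_params = 39e9  # hypothetical parameter count, not taken from the thread
    for bits in (16, 8, 4):
        print(f"{bits}-bit: {weight_memory_gb(n_params, bits):.1f} GB")

    # Hypothetical machine: 48 GB total, 6 GB assumed to be in use by the OS.
    budget = usable_ram_gb(48, 6)
    print("fits at 8-bit:", weight_memory_gb(n_params, 8) <= budget)
```

Halving the bits per parameter halves the weight footprint, which is why the quantization level matters so much for whether a model fits.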

As far as I recall, Ollama/llama.cpp recently added a feature to page in parameters, so you'll be able to go arbitrarily large soon enough (at a performance cost). Obviously, more in RAM = more speed = more better.
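
To illustrate the general idea of paging parameters in from disk, here is a minimal sketch using memory mapping from the Python standard library. I'm not saying this is the specific Ollama/llama.cpp feature mentioned above. The file layout (a flat array of little-endian float32 values) and the helper names `write_params` and `read_param` are made up for the example.

```python
import mmap
import struct

FLOAT_SIZE = 4  # bytes per float32 in this hypothetical flat file layout


def write_params(path: str, values: list[float]) -> None:
    """Write values to disk as a flat array of little-endian float32."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}f", *values))


def read_param(path: str, index: int) -> float:
    """Read one parameter through a memory map.

    The OS only loads the pages that are actually touched, so the whole
    file does not need to fit in RAM at once.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            offset = index * FLOAT_SIZE
            return struct.unpack("<f", mm[offset:offset + FLOAT_SIZE])[0]
```

The tradeoff is the one mentioned above: anything not already resident has to come off disk on first access, which is much slower than reading it from RAM.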

[1]: https://token-calculator.net/llm-memory-calculator