by zamalek 473 days ago

39 GB if you use an FP8-quantized model.[1] Remember that your OS might be using some of that memory itself.
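For a rough sanity check of numbers like that, here is a minimal back-of-the-envelope sketch. It only counts the weights (parameters times bytes per parameter), so a calculator that also accounts for KV cache or activations may report more. The 30B parameter count and the 48 GB / 8 GB figures are made-up placeholders for illustration, not the model or machine discussed in this thread.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


def fits_in_memory(num_params: float, bits_per_param: float,
                   total_gb: float, reserved_gb: float) -> bool:
    """True if the weights fit after reserving reserved_gb for the OS and other apps."""
    return weight_memory_gb(num_params, bits_per_param) <= total_gb - reserved_gb


# Hypothetical values, chosen only to show the arithmetic.
HYPOTHETICAL_PARAMS = 30e9   # assumed parameter count
TOTAL_GB = 48                # assumed total memory
RESERVED_GB = 8              # assumed OS / background usage

for name, bits in [("fp16", 16), ("fp8", 8), ("4-bit", 4)]:
    gb = weight_memory_gb(HYPOTHETICAL_PARAMS, bits)
    ok = fits_in_memory(HYPOTHETICAL_PARAMS, bits, TOTAL_GB, RESERVED_GB)
    print(f"{name}: ~{gb:.0f} GB of weights, fits: {ok}")
```

Halving the bits per parameter halves the weight footprint, which is why the quantization level matters so much for whether a model fits.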

As far as I recall, Ollama/llama.cpp recently added a feature to page in parameters, so you'll be able to go arbitrarily large soon enough (at a performance cost). Obviously more in RAM = more speed = more better.

[1]: https://token-calculator.net/llm-memory-calculator