by zamalek
473 days ago
39 GB if you use an fp8-quantized model.[1] Remember that your OS will be using some of that RAM itself. As far as I recall, Ollama/llama.cpp recently added a feature to page in parameters, so you'll be able to go arbitrarily large soon enough (at a performance cost). Obviously, the more of the model that stays in RAM, the faster it runs.

[1]: https://token-calculator.net/llm-memory-calculator
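For anyone who wants to sanity-check numbers like this by hand, here is a rough back-of-the-envelope sketch. It only counts the weights (parameters times bytes per parameter) and ignores KV cache and runtime overhead, so it will come out lower than what a full calculator reports. The function names, the 32B parameter count, and the RAM/OS figures are all made-up placeholders, not values from the calculator linked above.

```python
def weights_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead.
    """
    return n_params * bits_per_param / 8 / 1e9


def fits_in_ram(required_gb: float, total_ram_gb: float, os_reserved_gb: float) -> bool:
    """True if the requirement fits in RAM after setting aside room for the OS."""
    return required_gb <= total_ram_gb - os_reserved_gb


if __name__ == "__main__":
    # Hypothetical example: a 32B-parameter model on a 48 GB machine,
    # with 6 GB assumed to be taken by the OS and other programs.
    n_params = 32e9
    for bits in (16, 8, 4):
        size = weights_gb(n_params, bits)
        verdict = "fits" if fits_in_ram(size, 48, 6) else "does not fit"
        print(f"{bits}-bit: ~{size:.0f} GB of weights, {verdict}")
```

At 8 bits per parameter the weights take about one byte each, which is why fp8 roughly halves the footprint compared to fp16.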