by wkat4242
779 days ago
Nope. You can run the model fine, but if you actually want to take advantage of the big context window, memory usage grows enormously. The 256k version already requires 64GB, so for this one I'd guess 256GB?

Source: https://ollama.com/library/dolphin-llama3:256k

> Note: using a 256k context window requires at least 64GB of memory.

If I run the 256k model with simple typed prompts, it behaves the same as the normal version, but I have to be careful how much I put into it since I only have 24GB on my GPU. There doesn't seem to be any drawback to running the 256k version with small contexts though, which is pretty nice.

The only catch is that it gets stuck when it runs out of memory (it just keeps spinning with the GPU pegged at 100%). With the regular model that doesn't happen, because it just gets amnesia and only remembers the last part of the context.
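For a rough feel of why memory balloons with context, here is a back-of-the-envelope sketch of the KV cache size. The defaults are assumptions, not anything taken from the Ollama page: I'm assuming a Llama 3 8B style architecture (32 layers, 8 KV heads, head dimension 128) and an unquantized 16-bit cache. `kv_cache_bytes` is just a hypothetical helper name, and the numbers ignore the model weights and any runtime overhead.

```python
def kv_cache_bytes(n_tokens, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Estimate KV cache size in bytes for a given context length.

    Defaults are ASSUMED Llama 3 8B style values with a 16-bit cache;
    a quantized cache or a different model will give different numbers.
    """
    # Each token stores one key and one value vector (hence the factor 2)
    # per layer and per KV head.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return n_tokens * per_token


if __name__ == "__main__":
    for ctx in (8 * 1024, 64 * 1024, 256 * 1024):
        gib = kv_cache_bytes(ctx) / 1024**3
        print(f"{ctx:>7} tokens -> {gib:.1f} GiB KV cache")
```

Under those assumptions the cache costs about 128 KiB per token, so a completely full 256k window would need around 32 GiB for the cache alone, before counting the weights. The cost is linear in the tokens actually in the window, which would also fit with short prompts on the 256k version being cheap.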