by leach
306 days ago
I'm a little confused about how these models fit into VRAM. I have 32 GB of system RAM and 16 GB of VRAM. I can fit the 20b model entirely in VRAM, but then I can't increase the context window past 8k tokens or so; trying to max out the context size runs out of VRAM. Can't it use my system RAM as a backup, though? Yet I see other people with fewer resources, like 10 GB of VRAM and 32 GB of system RAM, fitting the 120b model onto their hardware. Perhaps it's because ROCm isn't really supported by Ollama for the RDNA4 architecture yet? I believe I'm currently running on Vulkan, and it seems to use my CPU more than my GPU at the moment. Maybe I should just ask it all this. I'm not complaining too much, because it's still amazing that I can run these models. I just like pushing the hardware to its limit.