20/33 layers offloaded to GPU, 4K context. Uses 25GB system RAM and all 24GB VRAM. 5-7 tokens per second.
Thanks for making me feel better about investing in tht motherboard + CPU + RAM upgrade and deferring the GPU upgrade.
I am not sure local models have any future other than POC/research. Depends on the cost of course.
Thanks for making me feel better about investing in tht motherboard + CPU + RAM upgrade and deferring the GPU upgrade.