by tarruda
876 days ago
> You would need multiple GPUs with shared memory if you wanted to offload the higher precision models to VRAM.

Or just a powerful Apple Silicon machine? I've tried Dolphin Mixtral 4-bit on a MacBook M3 with 36 GB of RAM, and inference is super fast.
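For a rough sense of why a 4-bit Mixtral fits in 36 GB while the higher-precision versions don't, here's a back-of-the-envelope sketch. It assumes roughly 46.7B total parameters for Mixtral 8x7B and counts only the weights (no KV cache, no runtime overhead). The function name is made up for illustration, and real 4-bit quantization formats usually store a bit more than 4 bits per weight, so treat the numbers as a lower bound.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


# Assumption: approximate total parameter count for Mixtral 8x7B.
MIXTRAL_8X7B_PARAMS = 46.7e9

if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_memory_gib(MIXTRAL_8X7B_PARAMS, bits)
        print(f"{bits}-bit weights: ~{size:.1f} GiB")
```

Under those assumptions only the 4-bit weights come in under 36 GB, which matches what I saw. The OS and other apps still need some of that unified memory, so the real headroom is smaller than the raw number suggests.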