by brucethemoose2
1085 days ago
I mean... it depends? Are you just trying to host a llama server? Matching VRAM across cards doesn't necessarily matter; get the most VRAM you can afford on a single card. Splitting across more than 2 cards doesn't work well at the moment. A non-Nvidia card is a problem for certain backends (like exLLaMA), but should be fine for llama.cpp in the near future. AFAIK most backends are not pipelined: the load jumps sequentially from one GPU to the next.
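To put a rough number on the "not pipelined" point, here is a toy back-of-the-envelope model, not taken from any real backend. It assumes a single request, layers split into contiguous chunks across GPUs, and zero transfer cost between cards. The function name and the per-layer timings are made up for illustration.

```python
def sequential_split_utilization(layer_times_ms, num_gpus):
    """Fraction of one forward pass that each GPU spends busy.

    Toy model (assumption): layers are split into contiguous chunks, and
    the chunks run strictly one after another, so only one GPU is
    working at any moment.
    """
    n = len(layer_times_ms)
    base, extra = divmod(n, num_gpus)

    busy_ms = []
    start = 0
    for gpu in range(num_gpus):
        # Hand out layers as evenly as possible, in order.
        size = base + (1 if gpu < extra else 0)
        busy_ms.append(sum(layer_times_ms[start:start + size]))
        start += size

    # Without pipelining, wall time is just the sum of all chunks.
    total_ms = sum(busy_ms)
    return [b / total_ms for b in busy_ms]


if __name__ == "__main__":
    layers = [1.0] * 32  # hypothetical: 32 layers, 1 ms each
    for gpus in (1, 2, 4):
        print(gpus, sequential_split_utilization(layers, gpus))
```

Under those assumptions each card sits idle for roughly (N-1)/N of the pass, so extra cards mostly buy you VRAM capacity, not speed.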