by ForOldHack 1085 days ago
I have an 8GB card, and I am considering two more 8GB cards, or should I get a single 16GB? The 8GB card was donated, and we need some pipelining... I also have 10~15 2GB Quadro cards... apparently useless.
1 comment

I mean... It depends?

You are just trying to host a llama server?

Matching the VRAM across cards doesn't necessarily matter; get the most you can afford on a single card. Splitting across more than 2 cards doesn't work well at the moment.

Getting a non-Nvidia card is a problem for certain backends (like exLLaMA), but it should be fine for llama.cpp in the near future.

AFAIK most backends are not pipelined: the load jumps sequentially from one GPU to the next, so only one card is busy at any given moment.
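
To make the "jumps sequentially" point concrete, here is a rough toy model, not code from any real backend. It assumes a single request, equal compute time per layer, and zero transfer cost between cards; the function names (split_layers, sequential_schedule, utilization) are made up for illustration.

```python
# Toy model of a non-pipelined layer split (hypothetical helper names).
# Assumptions: one request in flight, every layer costs the same time,
# and moving activations between GPUs is free.

def split_layers(num_layers, num_gpus):
    """Give each GPU a contiguous block of layers, as evenly as possible."""
    base, extra = divmod(num_layers, num_gpus)
    return [base + (1 if i < extra else 0) for i in range(num_gpus)]

def sequential_schedule(num_layers, num_gpus, time_per_layer=1.0):
    """Return (start, end) busy intervals per GPU for one forward pass.

    GPU i only starts once GPU i-1 has finished its block, so the
    intervals never overlap.
    """
    schedule = []
    start = 0.0
    for count in split_layers(num_layers, num_gpus):
        end = start + count * time_per_layer
        schedule.append((start, end))
        start = end
    return schedule

def utilization(schedule):
    """Fraction of the total pass time during which each GPU is busy."""
    total = schedule[-1][1]
    return [(end - start) / total for start, end in schedule]

if __name__ == "__main__":
    for gpus in (1, 2, 3):
        sched = sequential_schedule(32, gpus)
        print(gpus, "GPU(s):", [round(u, 2) for u in utilization(sched)])
```

In this model the total time for a pass stays the same no matter how many cards you add, and each card sits idle for roughly (N-1)/N of it. Under these assumptions, extra cards buy you VRAM capacity, not speed.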