Hacker News
by talldayo 623 days ago
According to this PR, you should be able to use multi-GPU machines with stock llama.cpp: https://github.com/ggerganov/llama.cpp/pull/1703
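
For anyone who wants to try it, here is a rough sketch of how the command line might be put together. It assumes the `--tensor-split` and `--n-gpu-layers` options from llama.cpp's CLI, and that the split values are treated as relative proportions per GPU. The binary path `./main`, the model filename, and the VRAM figures are placeholders I made up, and the helper names are hypothetical, so check `--help` on your own build before relying on any of it.

```python
import shlex


def tensor_split(vram_gib):
    """Return a comma-separated proportion string, one entry per GPU.

    Assumption: llama.cpp treats the values as relative weights, so passing
    each card's VRAM in GiB splits the model roughly by memory size.
    """
    if not vram_gib or any(v <= 0 for v in vram_gib):
        raise ValueError("need at least one GPU with positive VRAM")
    return ",".join(f"{v:g}" for v in vram_gib)


def build_args(binary, model_path, vram_gib, n_gpu_layers=99):
    """Build an argument list for a multi-GPU run (hypothetical helper)."""
    return [
        binary,
        "-m", model_path,
        "--n-gpu-layers", str(n_gpu_layers),  # offload as many layers as fit
        "--tensor-split", tensor_split(vram_gib),
    ]


if __name__ == "__main__":
    # Placeholder values: one 24 GiB card and one 8 GiB card.
    args = build_args("./main", "model.gguf", [24, 8])
    print(shlex.join(args))
```

Building the list instead of a single string keeps it safe to hand to `subprocess.run` without shell quoting problems.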