Hacker News
by
valvar
1113 days ago
That much good RAM in itself isn't super expensive. So does the rest of the hardware have to be particularly powerful?
2 comments
esperent
1113 days ago
It has to be GPU RAM from my understanding, unless you're happy to wait several minutes/hours for each response.
GaggiX
1113 days ago
From what I can find online, 4-bit quantized LLaMA-65B can run at 1 token/s on a Ryzen 7 3700X (using llama.cpp).
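
As a rough sanity check on that figure, here is a back-of-envelope sketch. It assumes generation is limited by memory bandwidth (every weight is read from RAM once per token), exactly 4 bits per weight with no overhead, and dual-channel DDR4-3200 at its theoretical peak. None of these numbers come from the thread or from a measurement; real quantization formats also store some extra scale data, so the true size would be somewhat larger and the ceiling somewhat lower.

```python
# Back-of-envelope estimate only. Every constant below is an assumption,
# not a measurement and not something llama.cpp reports.

def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Size of the model weights in bytes, ignoring any format overhead."""
    return n_params * bits_per_weight / 8


def max_tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on tokens/s if each token requires one full pass over the weights."""
    return bandwidth_bytes_per_s / model_bytes


N_PARAMS = 65e9          # assumed parameter count for the 65B model
BITS_PER_WEIGHT = 4      # assumed: exactly 4 bits, no scale/offset overhead
# Assumed memory setup: 2 channels x 8 bytes per transfer x 3200 MT/s (theoretical peak).
BANDWIDTH = 2 * 8 * 3200e6

model = weight_bytes(N_PARAMS, BITS_PER_WEIGHT)
ceiling = max_tokens_per_second(model, BANDWIDTH)

print(f"weights: {model / 1e9:.1f} GB, ceiling: {ceiling:.2f} tokens/s")
# prints "weights: 32.5 GB, ceiling: 1.58 tokens/s"
```

Under these assumptions the weights alone need about 32.5 GB of RAM and the theoretical ceiling is around 1.6 tokens/s, so a reported 1 token/s on a desktop CPU is at least plausible.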