by ddren 1193 days ago
They recently merged support for x86. I get 230 ms/token on the 13B model on an 8-core 9900K under WSL2.

What's your RAM usage for this?
I've got the (4-bit quantized) 65B param model running at somewhat acceptable speed on an i9-7900. It uses around 55GB of RAM.
The (quantized) 13B model is 7.6 GB on disk and the program uses around 8 GB to run. It runs without hitting the swap with just 9 GB assigned to WSL2.
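
For anyone wanting to sanity-check these numbers, here is a rough back-of-the-envelope sketch in Python. It assumes a block layout of 32 four-bit weights plus one 32-bit scale per block (about 5 bits per weight); that layout is an assumption for illustration, not something confirmed in this thread, and the function names are made up. It only estimates weight storage, so actual RAM use will be higher once context buffers and other overhead are included.

```python
def quantized_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for n_params quantized weights."""
    return n_params * bits_per_weight / 8 / 2**30


def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to throughput."""
    return 1000.0 / ms_per_token


# Assumed layout (hypothetical): 32 weights x 4 bits + one 32-bit scale per block.
BITS_PER_WEIGHT = (32 * 4 + 32) / 32  # 5.0 bits per weight

if __name__ == "__main__":
    for name, n_params in [("13B", 13e9), ("65B", 65e9)]:
        size = quantized_size_gib(n_params, BITS_PER_WEIGHT)
        print(f"{name}: ~{size:.1f} GiB of weights")
    print(f"230 ms/token is ~{tokens_per_second(230):.2f} tokens/s")
```

Under that assumption the 13B estimate lands close to the 7.6 GB on-disk figure above. The 65B estimate comes out well below the roughly 55 GB of RAM reported, so the weights alone do not account for that number.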