


	by ddren 1193 days ago
	They recently merged support for x86. I get 230 ms/token on the 13B model on an 8-core 9900K under WSL2.

1 comment

qumpis 1193 days ago

What's your RAM usage for this?

JoeMattie 1193 days ago

I've got the (4-bit quantized) 65B param model running at somewhat acceptable speed on an i9-7900. It uses around 55GB of RAM.

ddren 1193 days ago

The (quantized) 13B model is 7.6 GB on disk and the program uses around 8 GB to run. It runs without hitting the swap with just 9 GB assigned to WSL2.
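
As a rough sanity check on the figures in this thread, here is a minimal sketch that converts the reported latency into throughput and the on-disk size into bits per weight. It assumes "GB" means 10^9 bytes and that "13B" means exactly 13e9 parameters; both are assumptions, and the helper names are made up for illustration.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second."""
    return 1000.0 / ms_per_token


def bits_per_weight(file_bytes: float, n_params: float) -> float:
    """Average storage cost per parameter, in bits, for a model file."""
    return file_bytes * 8 / n_params


if __name__ == "__main__":
    # 230 ms/token is the figure reported for the 13B model.
    print(f"{tokens_per_second(230):.2f} tokens/s")
    # 7.6 GB on disk for 13B parameters (assumed 7.6e9 bytes, 13e9 weights).
    print(f"{bits_per_weight(7.6e9, 13e9):.2f} bits/weight")
```

Under those assumptions this works out to a bit over 4 tokens per second and somewhat more than 4 bits per weight, which would be consistent with 4-bit quantization plus some per-block overhead.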
