| HN Mirror



	by brokensegue 848 days ago
	I'm just looking for ballpark figures, maybe on a common AWS instance type.

2 comments

notum 848 days ago

Not sure if this is of any value to you, but a Ryzen 7 generates 2 tokens per second with the 7B-Instruct model.

The model itself is very unimpressive and I see no reason to play with it over the worst alternative from Hugging Face. I can only imagine this was released for some bizarre compliance reasons.


brokensegue 848 days ago

the metrics suggest it's much better than that


janwas 847 days ago

For the 7B IT model and a short factual query I see 5.3 tokens per second (tps) on a 5-year-old Skylake Gold 6154 CPU @ 3.00GHz with 16 threads. Expect a slight increase as we improve scalability.

FYI using the NUQ (4.5-bit) quantization improves throughput by about 1.4x.
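For a rough sense of what these figures mean in practice, here is a back-of-the-envelope sketch. It assumes the 1.4x NUQ speedup applies directly to the 5.3 tps baseline above, which the comment does not state explicitly, and the 256-token reply length and the helper names are purely illustrative.

```python
# Back-of-the-envelope throughput estimate from the figures quoted above.
# Assumption: the ~1.4x NUQ speedup applies on top of the 5.3 tps baseline.

def scaled_tps(baseline_tps: float, speedup: float) -> float:
    """Tokens per second after applying a multiplicative speedup."""
    return baseline_tps * speedup

def seconds_to_generate(num_tokens: int, tps: float) -> float:
    """Wall-clock seconds to generate num_tokens at a steady tps rate."""
    return num_tokens / tps

baseline_tps = 5.3   # reported: 7B IT, Skylake Gold 6154, 16 threads
nuq_speedup = 1.4    # reported: approximate gain from NUQ (4.5-bit)
reply_tokens = 256   # hypothetical reply length, for illustration only

estimated_tps = scaled_tps(baseline_tps, nuq_speedup)
baseline_seconds = seconds_to_generate(reply_tokens, baseline_tps)
nuq_seconds = seconds_to_generate(reply_tokens, estimated_tps)

print(f"Estimated NUQ throughput: {estimated_tps:.1f} tps")
print(f"{reply_tokens} tokens at baseline: {baseline_seconds:.0f} s")
print(f"{reply_tokens} tokens with NUQ: {nuq_seconds:.0f} s")
```

Under that assumption the estimate comes out to roughly 7.4 tps. Real throughput will also depend on prompt length, thread count, and memory bandwidth.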
