HN Mirror



	by p12tic 479 days ago
	For all intents and purposes, cache may as well not exist when the working set is 17B or 109B parameters. So it's still better that fewer parameters are activated for each token. 17B parameters run ~6x faster than 109B parameters simply because less data needs to be loaded from RAM.
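
A rough sketch of the arithmetic behind the ~6x figure, assuming decoding is purely memory-bandwidth bound and every active parameter is read from RAM once per token. The 400 GB/s bandwidth and 4-bit weights are illustrative assumptions, not measurements; the ratio does not depend on either.

```python
# Back-of-the-envelope estimate of bandwidth-bound decode speed.
# Assumption: each active parameter is read from RAM exactly once per token,
# and compute time is negligible next to memory traffic.

def tokens_per_second(active_params: float, bits_per_param: float,
                      bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s when limited only by RAM bandwidth."""
    bytes_per_token = active_params * bits_per_param / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

BANDWIDTH_GB_S = 400.0  # hypothetical memory bandwidth, for illustration only
BITS_PER_PARAM = 4      # assumed quantization

sparse = tokens_per_second(17e9, BITS_PER_PARAM, BANDWIDTH_GB_S)   # 17B active
dense = tokens_per_second(109e9, BITS_PER_PARAM, BANDWIDTH_GB_S)   # all 109B read
speedup = sparse / dense  # equals 109 / 17, independent of bandwidth and bit width

print(f"17B active: {sparse:.1f} tok/s, 109B active: {dense:.1f} tok/s, "
      f"ratio: {speedup:.2f}x")
```

The bandwidth and bit width cancel out in the ratio, so under these assumptions the speedup is just 109/17, a bit over 6x.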


TOMDM 479 days ago

Yes, loaded from RAM versus loaded to RAM is the big distinction here.

It will still be slow if portions of the model need to be read from disk to memory each pass, but only having to execute portions of the model for each token is a huge speed improvement.


mlyle 479 days ago

It doesn't take too expensive a MacBook to fit 109B 4-bit parameters in RAM.
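
For a sense of scale, here is a minimal sketch of the weight footprint, assuming exactly 4 bits per parameter and ignoring KV cache, activations, quantization metadata, and OS overhead. The 64 GiB budget is the RAM size raised elsewhere in this thread; the function name is made up for illustration.

```python
# Approximate RAM needed just to hold the weights.
# Assumption: exactly 4 bits per parameter, no quantization scales or
# other metadata, and nothing else resident in memory.

def weight_footprint_gib(params: float, bits_per_param: float) -> float:
    """Size of the raw weights in GiB (2**30 bytes)."""
    return params * bits_per_param / 8 / 2**30

RAM_BUDGET_GIB = 64  # RAM size discussed in the thread

footprint = weight_footprint_gib(109e9, 4)
fits = footprint < RAM_BUDGET_GIB

print(f"109B params at 4 bits: {footprint:.1f} GiB, "
      f"fits in {RAM_BUDGET_GIB} GiB: {fits}")
```

That works out to roughly 51 GiB of weights, which leaves some headroom in 64 GiB, though real usage will be higher once the KV cache and the rest of the system are counted.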


utopcell 479 days ago

Is a 64 GiB RAM MacBook really that expensive, especially compared with NVIDIA GPUs?


mlyle 479 days ago

That's why I said it's not too expensive.


utopcell 479 days ago

Apologies, I misread your comment.
