| HN Mirror



	by conwayanderson 307 days ago
	Only 2B active, also. Very fast.

2 comments

lawlessone 307 days ago

So it can run on a phone, then?

Seems like it could be somewhat useful for people with poor eyesight or blindness.


conwayanderson 307 days ago

In terms of size, yes, but I think it needs some work to get the model into the right format.

A couple of people got it running on a Raspberry Pi, though.


apwell23 306 days ago

Sorry, what does it mean for only 2B to be active?


simonw 306 days ago

My understanding is that, while all 8B are loaded into memory, for each token inference step only 2B are selected and used - so tokens are produced faster because there is less computation needed.

Hoping someone will correct me if that's not the right mental model!
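
Here's a rough sketch of that mental model in Python, assuming a mixture-of-experts layout with top-k routing (the thread doesn't confirm how the model actually does it). The 8 experts / 2 active split is a toy stand-in for the 8B / 2B ratio, and every name here (`experts`, `router`, `moe_forward`) is made up for illustration:

```python
import math
import random

NUM_EXPERTS = 8  # hypothetical: every expert stays loaded in memory
TOP_K = 2        # hypothetical: experts actually computed per token
DIM = 4          # toy hidden size

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# All expert weights are resident (the "all 8B loaded" part).
experts = [rand_matrix(DIM, DIM) for _ in range(NUM_EXPERTS)]
# The router holds one scoring vector per expert.
router = rand_matrix(NUM_EXPERTS, DIM)

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def moe_forward(token):
    """Run only the TOP_K highest-scoring experts for this token."""
    scores = matvec(router, token)  # one score per expert
    chosen = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]

    # Softmax over the chosen experts' scores gives the mixing weights.
    top = max(scores[i] for i in chosen)
    exps = [math.exp(scores[i] - top) for i in chosen]
    weights = [e / sum(exps) for e in exps]

    # Only the chosen experts do any work; the other six are skipped entirely.
    out = [0.0] * DIM
    for weight, idx in zip(weights, chosen):
        expert_out = matvec(experts[idx], token)
        out = [o + weight * y for o, y in zip(out, expert_out)]
    return out, chosen

output, used = moe_forward([1.0, 0.5, -0.5, 2.0])
print(f"experts in memory: {len(experts)}, experts used for this token: {len(used)}")
```

Memory cost scales with all the experts, but per-token compute scales only with the ones the router picks, which would be why generation feels fast. A different token can pick a different pair, so nothing can be dropped from memory.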
