Hacker News
by conwayanderson 260 days ago
Only 2B active too, so it's very fast

Can run it on a phone then?

Seems like it could be somewhat useful for people with poor eyesight or blindness

In terms of size, yes, but I think it needs some work to get the model into the right format

A couple of people got it running on a Raspberry Pi, though

Sorry, what does it mean for only 2B to be active?
My understanding is that while all 8B parameters are loaded into memory, only 2B of them are selected and used at each token inference step, so tokens are produced faster because less computation is needed.

Hoping someone will correct me if that's not the right mental model!
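
For what it's worth, here is a toy sketch of that mental model, assuming the model is a mixture-of-experts style network with top-k routing (the thread does not confirm the actual architecture). Every name and size here (`moe_forward`, 8 experts, 2 picked per token, 4-dimensional vectors) is made up for illustration. All experts sit in memory, but each token only runs through the ones the router picks:

```python
import math
import random

def matvec(w, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def moe_forward(x, router, experts, top_k):
    # The router produces one score per expert for this token.
    scores = matvec(router, x)
    # Keep only the top_k highest-scoring experts; the rest are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over the chosen scores gives the mixing weights.
    peak = max(scores[i] for i in chosen)
    gates = [math.exp(scores[i] - peak) for i in chosen]
    norm = sum(gates)
    out = [0.0] * len(experts[0])
    for i, g in zip(chosen, gates):
        y = matvec(experts[i], x)  # compute happens only for chosen experts
        out = [o + (g / norm) * yi for o, yi in zip(out, y)]
    return out, chosen

# Toy sizes (assumptions, not the real model's configuration).
rng = random.Random(0)
DIM, NUM_EXPERTS, TOP_K = 4, 8, 2
router = random_matrix(NUM_EXPERTS, DIM, rng)
experts = [random_matrix(DIM, DIM, rng) for _ in range(NUM_EXPERTS)]  # all held in memory

token = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
out, chosen = moe_forward(token, router, experts, TOP_K)

total_params = NUM_EXPERTS * DIM * DIM
active_params = len(chosen) * DIM * DIM
print(f"experts used: {sorted(chosen)}; "
      f"{active_params} of {total_params} expert weights touched for this token")
```

In this toy, a quarter of the expert weights are touched per token. A real model also has weights that every token uses (attention layers, for example), so the active fraction would not simply be top_k divided by the number of experts.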