by rgbrgb
998 days ago
fwiw I get more like 35-40 tokens/sec on my M1 MacBook with a 7B model. That's way faster than I can read or skim. If we can figure out how to focus the expertise in small models, I don't see why they wouldn't be viable for those of us who don't want to share all of our convos with big tech.
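
Rough back-of-envelope for the "faster than I can read" comparison. This is only a sketch: the 0.75 words-per-token ratio and the 250 words/min reading speed are assumed rule-of-thumb values, not measurements, and `generation_wpm` is just a hypothetical helper name.

```python
# Convert generation speed (tokens/sec) into words per minute.
# ASSUMPTIONS (not from any measurement): ~0.75 English words per token,
# and ~250 words/min as a typical reading speed.
WORDS_PER_TOKEN = 0.75
READING_WPM = 250


def generation_wpm(tokens_per_sec: float,
                   words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Approximate words per minute produced at a given token rate."""
    return tokens_per_sec * words_per_token * 60


if __name__ == "__main__":
    for tps in (35, 40):
        wpm = generation_wpm(tps)
        ratio = wpm / READING_WPM
        print(f"{tps} tok/s ~ {wpm:.0f} words/min, about {ratio:.1f}x reading speed")
```

Under those assumptions, 35-40 tokens/sec comes out several times faster than reading speed; change the two constants if your own numbers differ.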