by Ritewut
3 days ago
Tokens per second. In practical use, the quality gap between an 8B model and something like a 16B one is smaller than you might think, and the 8B is a lot faster and more interactive. There are still certain tasks where it is useful to farm the work out to the larger model.

Creating conversation titles and parsing HTML/JSON don't benefit from 27B models.
The B70 can run both models comfortably side-by-side so it makes better use of time and resources.
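
A rough sketch of what that split could look like, assuming both models are loaded at once and each request is tagged with a task label. The model names and task labels here are placeholders I made up for illustration, not anything from a real setup:

```python
# Hypothetical task router: model names and task labels are illustrative only.
SMALL_MODEL = "small-8b"    # placeholder for the fast, interactive model
LARGE_MODEL = "large-27b"   # placeholder for the slower, larger model

# Tasks that reportedly gain little from a larger model.
LIGHT_TASKS = {"conversation_title", "parse_html", "parse_json"}


def pick_model(task: str) -> str:
    """Send lightweight tasks to the small model, everything else to the large one."""
    return SMALL_MODEL if task in LIGHT_TASKS else LARGE_MODEL
```

Something like `pick_model("conversation_title")` stays on the small model, while any unlisted task falls through to the large one. Defaulting to the large model is just one choice; you could equally default to the small one and escalate only on a known list of hard tasks.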