|
|
|
|
|
by cypress66
1020 days ago
|
|
1.1B with 3T tokens will never be comparable to 7B with 2T tokens. And I'm not sure what you mean by inference latency being infeasible. Most people using thsss models at home don't even bother with the 7B and go straight to 13B because it's easy to run too and much smarter. And any cloud gpu can run 13B. |
|