by trifurcate
1176 days ago
In my experience, the smaller models are almost completely worthless as-is. 65B is the only decent one: I'd put it just behind gpt-3.5-turbo (obviously it's not instruction-tuned, but I mean the coherence of the core language model). Understandably, people aren't really paying attention or devoting many resources to the largest one. 30B shows promise for specific tasks with fine-tuning, but 7B and 13B are just toys.