I've heard a number of people say earlier that the quantization and default sampling parameters are way out of whack. Honestly, even running a model of that size at all is the big achievement here, and getting the accuracy to actually reach the benchmark is the big next step now, I believe. <3 :'))))
It does not seem fine.
It is incomprehensible and doesn’t match the results I’ve seen from 7B through 65B.
It is true that RLHF could improve it, and perhaps then optimization this severe will seem fine.