by plipt
184 days ago
Thanks. Was it being closed weight obvious to you from the article? I'm trying to understand why I was confused. I had not seen the "Flash" designation before. Also, can a 30B model beat a semi-recent 235B with just some additional training?
For the evals, it's probably just trained on a lot more benchmark-adjacent datasets than the 235B model was. A similar thing happened with another model today: https://x.com/NousResearch/status/1998536543565127968 (a 30B model trained specifically to do well in maths gets near-SOTA scores).