by nabakin
494 days ago
I'm not so sure it's impressive even for mathematical tasks. When ChatGPT came out, there was a flood of fine-tuned LLMs claiming ChatGPT-level performance at a fraction of the size. Every single time this happened, it was misleading. These LLMs were able to score higher than ChatGPT because they took a narrow set of benchmarks and fine-tuned for those benchmarks. It's not difficult to cheaply fine-tune an LLM on a few benchmarks and beat a SOTA generalist LLM on them. Comparing a generalist LLM to a specialist LLM is like comparing apples to oranges. What you want is to compare specialist LLMs to other specialist LLMs. It would have been much more interesting and valuable if that had been done here. Instead, we get a misleading, clickbait headline and no comparisons to math-specialized LLMs, which certainly should have been made.