by ldjkfkdsjnv
1022 days ago
I don't trust any benchmarks for any LLM that's not coming from FB, Google, OpenAI, Anthropic, or Microsoft. These models are so dynamic that simple benchmark numbers never tell the whole story of a model's quality. Take, for instance, a recent post by Anyscale claiming that their fine-tuning of Llama 2 was competitive with OpenAI's model. In reality, their fine-tuned model is basically worthless and was competitive only along a single metric on a very narrow, commoditized task. Posting these metrics is a great way to get clicks, though.
I have a feeling that the more robust models might be the ones that don’t perform best on benchmarks right away.