by 3d27
843 days ago
This is great. I'm also building an LLM evaluation framework that integrates all of these benchmarks in one place, so anyone can benchmark new models on their local setup in under 10 lines of code. Hope someone finds it useful: https://github.com/confident-ai/deepeval