by nisten
486 days ago
This looks nice and flashy for an investor presentation, but practically I just need the thing to work off of an API, or, if it's all local, to at least have vLLM support so it doesn't take 10 hours to run a bench. The extra-long documentation and abstractions are, for me personally, exactly what I DON'T want in a benchmarking repo. I.e., what transformers version is this, will it support TGI v3, will it automatically remove thinking traces with a flag in the code or the run command, will it run the latest models that need a custom transformers version, etc. And if it's not a locally runnable product, it should at least have a publicly accessible leaderboard to submit OSS models to or something. Just my opinion. I don't like it. It looks like way too much docs and code slop for what should just be a 3-line command.
We actually designed it to work easily off any API. You just create a wrapper around your API and you're good to go. We take care of the async/concurrent handling for the benchmarking, so evaluation speed is really only limited by the rate limit of your LLM API.
This link shows what a wrapper looks like: https://docs.confident-ai.com/guides/guides-using-custom-llm...
And once you have your model wrapper set up, you can use any benchmark we provide.
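
To give a rough idea of the wrapper-plus-concurrency pattern described above, here is a minimal standalone sketch. It does not reproduce the library's actual interface (the linked guide covers that). The names `EchoAPIWrapper`, `run_benchmark`, and `max_concurrency` are hypothetical, and the "API call" is a stub that sleeps instead of hitting a network endpoint.

```python
import asyncio


class EchoAPIWrapper:
    """Hypothetical stand-in for a wrapper around an LLM API (no network calls)."""

    def __init__(self, latency_s: float = 0.01):
        # Simulated per-request latency; a real wrapper would hold an API client here.
        self.latency_s = latency_s

    def get_model_name(self) -> str:
        return "echo-api-stub"

    async def a_generate(self, prompt: str) -> str:
        # Assumption: replace this body with an async call to your own API.
        await asyncio.sleep(self.latency_s)
        return f"echo: {prompt}"

    def generate(self, prompt: str) -> str:
        # Synchronous convenience path; must not be called from a running event loop.
        return asyncio.run(self.a_generate(prompt))


async def run_benchmark(model, prompts, max_concurrency: int = 8):
    """Send all prompts concurrently, capped at max_concurrency in-flight requests."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(prompt: str) -> str:
        async with semaphore:
            return await model.a_generate(prompt)

    # gather returns results in the same order as the input prompts.
    return await asyncio.gather(*(one(p) for p in prompts))


if __name__ == "__main__":
    outputs = asyncio.run(
        run_benchmark(EchoAPIWrapper(), ["2+2?", "Capital of France?"])
    )
    print(outputs)
```

In a setup like this, `max_concurrency` is the knob you would tune to stay under your provider's rate limit, which is why throughput ends up bounded by the API rather than by the harness.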