by Kostchei
216 days ago
We have 20+ services in prod that use LLMs, so I have 50k or more data points per service per day to evaluate. The question is: do people actually evaluate properly? And how do you do an apples-to-apples evaluation of such squishy services?