by sirlapogkahn
226 days ago
We’ve tried geval, but it hasn’t been very useful in practice. If we run the same input through the same model and the same geval setup 10 times, we get significantly different results from run to run, so you can’t really draw any conclusions from them.
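For anyone who wants to check this on their own setup, here is a rough sketch of the repeat-and-compare test described above. It assumes the evaluator can be wrapped as a function that takes a text and returns a numeric score; `judge`, `score_spread`, and `fake_judge` are hypothetical names for illustration, not part of any geval library.

```python
import random
import statistics
from typing import Callable, Dict


def score_spread(judge: Callable[[str], float], text: str, runs: int = 10) -> Dict[str, float]:
    """Call the same judge on the same input `runs` times and summarize the scores."""
    scores = [judge(text) for _ in range(runs)]
    return {
        "mean": statistics.mean(scores),
        # The sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(scores) if runs > 1 else 0.0,
        "min": min(scores),
        "max": max(scores),
    }


if __name__ == "__main__":
    # Hypothetical stand-in for a real LLM-based evaluator: it just returns
    # a noisy number. Replace it with a call to your actual evaluator.
    rng = random.Random(0)

    def fake_judge(text: str) -> float:
        return rng.uniform(1.0, 5.0)

    print(score_spread(fake_judge, "same input every time"))
```

If the gap between `min` and `max` is about as large as the differences you are trying to measure between outputs, a single run tells you very little.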