by deepsquirrelnet
151 days ago
This is just evaluation, not “benchmarking”. If you haven’t set up evaluation on something you’re putting into production, then what are you even doing? Stop prompt engineering, put down the crayons. Statistical model outputs need to be evaluated.