|
|
|
|
|
by XCSme
85 days ago
|
|
Oh, I didn't think about this, that's a good idea. I also feel generally model performance changes over time (usually it gets worse). The problem with doing this is cost. Constsntly testing a lot of models on a large dataset can get really costly. |
|
I was thinking that tokens spent in such case could also be an interesting measure, but some agent can do small useful refactoring. Although prompt could specify to do the minimal change required to achieve the goal.