by bisonbear
108 days ago
I assume you're referencing coding agents. I don't think people are evaluating them. If they are, they're likely using one of:
- AI to evaluate itself (e.g. asking Claude to test out its own skill)
- a custom-built platform (I see interest in this space)

I've actually been thinking about this problem a lot and am working on a custom eval runner for your codebase. What would your use case be for this?
|
I like to play with knowledge-base-powered chatbots, but what's most useful to me (and probably my primary use case) is coding agents, since I use CC every day. I recently heard about MiniMax M2.5, which is apparently a pretty good coding agent (they say it's comparable to Opus 4.6), but I haven't tried it yet. Plus, it'd take a lot of time to figure out whether it's better or not.
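
For what it's worth, the "is it better or not" question could be roughed out as a tiny eval runner: run each agent over the same tasks and compare pass rates. This is only a sketch under my own assumptions. `Task`, `pass_rate`, `compare`, and the two stand-in agents are hypothetical names, not any real product's API, and a real runner would call the actual agents and sandbox whatever code they generate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical: an "agent" is anything that maps a prompt to generated source code.
Agent = Callable[[str], str]


@dataclass
class Task:
    name: str
    prompt: str
    check: Callable[[str], bool]  # True if the agent's output is acceptable


def pass_rate(agent: Agent, tasks: List[Task]) -> float:
    """Fraction of tasks whose check passes; a crash counts as a failure."""
    if not tasks:
        return 0.0
    passed = 0
    for task in tasks:
        try:
            ok = task.check(agent(task.prompt))
        except Exception:
            ok = False
        passed += int(ok)
    return passed / len(tasks)


def compare(agents: Dict[str, Agent], tasks: List[Task]) -> Dict[str, float]:
    """Run every agent on the same task list and report each pass rate."""
    return {name: pass_rate(agent, tasks) for name, agent in agents.items()}


# Stand-in agents so the sketch runs without any model or network access.
def agent_a(prompt: str) -> str:
    return "def add(a, b):\n    return a + b"


def agent_b(prompt: str) -> str:
    return "def add(a, b):\n    return a - b"


def check_add(source: str) -> bool:
    # exec() is acceptable only for this toy example; sandbox real agent output.
    namespace: dict = {}
    exec(source, namespace)
    return namespace["add"](2, 3) == 5


TASKS = [Task("add", "Write add(a, b) that returns the sum.", check_add)]
scores = compare({"agent_a": agent_a, "agent_b": agent_b}, TASKS)
print(scores)
```

The time sink is writing tasks and checks that reflect your own codebase. The runner itself stays small.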