by bisonbear 108 days ago
Assuming you're referring to coding agents: I don't think people are. If they are, they're likely using one of:

- AI to evaluate itself (e.g. asking Claude to test out its own skill)
- a custom-built platform (I see interest in this space)

I've actually been thinking about this problem a lot and am working on a custom eval runner for your codebase. What would your use case be for this?

1 comment

I'd love to hear more about what you're working on (if you're open to sharing!).

I like to play with knowledge-base-powered chatbots, but what's most useful to me (and probably my primary use case) is coding agents, since I use CC every day. I recently heard about MiniMax M2.5, which is apparently a pretty good coding agent (they say it's comparable to Opus 4.6), but I haven't tried it yet, and it would take a lot of time to figure out whether it's actually better.