by richardblythman 313 days ago
If coding agents are the new entry point to your library, how sure are you that they’re using it well?

I asked this question to about 50 library maintainers and dev tool builders, and the majority didn't really know.

Existing code generation benchmarks focus mainly on self-contained code snippets and compare models, not agents. Almost none focus on library-specific generation.

So we built a simple app to test how well coding agents interact with libraries:
• Takes your library’s docs
• Automatically extracts usage examples
• Tasks AI agents (like Claude Code) with generating those examples from scratch
• Logs mistakes and analyzes performance
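
As a rough illustration only (not the app's actual code), the four steps above could be sketched in Python like this. Everything here is an assumption made for the example: the docs are Markdown with fenced Python blocks, the first line of each example is a comment describing it, `agent` is a hypothetical callable standing in for a coding agent, and "mistakes" are limited to syntax errors.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

FENCE = "`" * 3  # Markdown code-fence marker, built at runtime


@dataclass
class Result:
    task: str      # what the agent was asked to do
    passed: bool   # True if the generated code at least parses
    error: str     # logged mistake, empty when passed


def extract_examples(docs: str) -> List[str]:
    """Return the body of every non-empty fenced Python block in the docs."""
    pattern = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)
    return [body.strip() for body in pattern.findall(docs) if body.strip()]


def task_from_example(example: str) -> str:
    """Assumption: the first line of each example is a comment describing it."""
    return example.splitlines()[0].lstrip("# ").strip()


def audit(docs: str, agent: Callable[[str], str]) -> List[Result]:
    """Ask the agent to regenerate each example and log syntax-level failures."""
    results = []
    for example in extract_examples(docs):
        task = task_from_example(example)
        generated = agent(task)  # hypothetical agent call; returns source code
        try:
            compile(generated, "<agent>", "exec")  # parse only, never executed
            results.append(Result(task, True, ""))
        except SyntaxError as exc:
            results.append(Result(task, False, str(exc)))
    return results


def pass_rate(results: List[Result]) -> float:
    """Fraction of tasks whose generated code parsed."""
    return sum(r.passed for r in results) / len(results) if results else 0.0
```

A real harness would presumably go further than a parse check, for example by running the generated code against the library and comparing its behavior with the documented example.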

We’re testing libraries now, but it’s early days. If you're interested, input your library, see what breaks, spot patterns, and share the results below.

We plan to expand to more coding agents, more library-specific tasks, and new metrics. Let us know what we should prioritize next.

6 comments

> If coding agents are the new entry point to your library, how sure are you that they’re using it well?

> I asked this question to about 50 library maintainers and dev tool builders, and the majority didn't really know.

Why should they even bother to answer such a loaded and hypothetical question?

I'm paraphrasing. The questions I asked dev tool builders were more neutral.
If making dev tooling is selling shovels to the miners, then this is like selling sheet metal to the shovel makers.
Yeah. Feels like a data mining operation for training data.

I could be wrong.

Note that this comment is not hijacking. The author of this comment is also the author of the post.
That's the more likely assumption. Accounts with only self-promotion spam activity have become more of a rule here than an exception.
IMO a tool like this doesn’t make sense until the hallucination problem is fixed.
Let’s meet and see if it might make sense for us to team up. We’re working on this from the agent/library-specific-task side, and we might be better than ChatGPT at marketing your product :)
Why do we need to log in?
We send out an email when the tests are finished (it takes about 30 minutes).
That makes you sound like you are dodging the question.
I mean that we wanted an email address to send the results to when they finish.

Based on the comments here, I do think we should let users run the audit first (and provide an email address only if they want us to follow up with results later).