Hacker News (HN Mirror)

by richardblythman 313 days ago

If coding agents are the new entry point to your library, how sure are you that they’re using it well?

I asked this question to about 50 library maintainers and dev tool builders, and the majority didn't really know.

Existing code generation benchmarks focus mainly on self-contained code snippets and compare models, not agents. Almost none focus on library-specific generation.

So we built a simple app to test how well coding agents interact with libraries:

• Takes your library’s docs
• Automatically extracts usage examples
• Tasks AI agents (like Claude Code) with generating those examples from scratch
• Logs mistakes and analyzes performance
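
To make the steps above concrete, here is a minimal sketch of such a loop, not the app's actual implementation. Everything in it is an assumption for illustration: the docs are treated as Markdown, the line before each fenced block is treated as the task description, the agent is any callable that maps a task string to code (`fake_agent` is a stand-in, not a real agent API), and "mistakes" are approximated with a plain text-similarity score from `difflib`.

```python
import difflib
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Built indirectly so this snippet can itself sit inside a fenced block.
FENCE = "`" * 3
# Hypothetical convention: a one-line description, then a fenced example.
TASK_PATTERN = re.compile(
    r"([^\n]+)\n+" + FENCE + r"\w*\n(.*?)" + FENCE, re.DOTALL
)


@dataclass
class Result:
    task: str
    reference: str
    generated: str
    similarity: float
    passed: bool


def extract_tasks(docs: str) -> List[Tuple[str, str]]:
    """Return (description, reference code) pairs found in the docs text."""
    return [(d.strip(), c.strip()) for d, c in TASK_PATTERN.findall(docs)]


def audit(
    docs: str, agent: Callable[[str], str], threshold: float = 0.8
) -> List[Result]:
    """Ask the agent to write each example from scratch and score the attempt."""
    results = []
    for task, reference in extract_tasks(docs):
        generated = agent(task).strip()
        # Crude proxy for correctness: character-level similarity to the docs.
        score = difflib.SequenceMatcher(None, reference, generated).ratio()
        results.append(Result(task, reference, generated, score, score >= threshold))
    return results


if __name__ == "__main__":
    docs = (
        "Add two numbers:\n\n" + FENCE + "python\nadd(1, 2)\n" + FENCE + "\n\n"
        "Greet a user:\n\n" + FENCE + "python\ngreet('Ada')\n" + FENCE + "\n"
    )

    def fake_agent(task: str) -> str:
        # Stand-in for a real coding agent; it always gives the same answer.
        return "add(1, 2)"

    for r in audit(docs, fake_agent):
        status = "ok" if r.passed else "MISTAKE"
        print(f"{status}: {r.task} (similarity {r.similarity:.2f})")
```

Text similarity is only a placeholder here. A real audit would more likely run the generated code or check which library calls it makes, since two correct solutions can look quite different.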

We’re testing libraries now, but it’s early days. If you're interested, input your library, see what breaks, spot patterns, and share the results below.

We plan to expand to more coding agents, more library-specific tasks, and new metrics. Let us know what we should prioritize next.

6 comments

bdhcuidbebe 313 days ago

> If coding agents are the new entry point to your library, how sure are you that they’re using it well?

> I asked this question to about 50 library maintainers and dev tool builders, and the majority didn't really know.

Why should they even bother to answer such a loaded and hypothetical question?

richardblythman 313 days ago

I'm paraphrasing. The questions I asked dev tool builders were more neutral.

justonceokay 313 days ago

If making dev tooling is selling shovels to the miners, then this is like selling sheet metal to the shovel makers.

grim_io 313 days ago

Yeah. Feels like a data mining operation for training data.

I could be wrong.

dotancohen 313 days ago

Note that this comment is not hijacking. The author of this comment is also the author of the post.

add-sub-mul-div 313 days ago

That's the more likely assumption. Accounts with only self-promotion spam activity have become more of a rule here than an exception.

mxkopy 313 days ago

IMO a tool like this doesn’t make sense until the hallucination problem is fixed

weitendorf 313 days ago

Let’s meet and see if it might make sense for us to team up. We’re working on this from the agent/library-specific-task side, and we might be better than ChatGPT at marketing your product :)

spankalee 313 days ago

Why do we need to log in?

richardblythman 313 days ago

We send out an email when the tests are finished (takes about 30 minutes).

grim_io 313 days ago

That makes you sound like you are dodging the question.

richardblythman 313 days ago

I mean that we wanted an email address to send the results to when they finish.

Based on the comments here, I do think we should let users run the audit first (and provide an email address only if they want us to follow up with results later).
