| HN Mirror



	by Arech 114 days ago
	That's what I thought too. Given their task formulation (they basically said "check these binaries with these tools at your disposal" and that's it!), their results are already super impressive. With proper guidance and professional oversight, it's a tremendous force multiplier.


selridge 114 days ago

We are in this super weird space where the comparable tasks are one-shot, e.g. "make me a to-do app" or "check these binaries", but any real work is multi-turn and dynamically structured.

But when we're trying to share results, "a talented engineer sat with the thread and wrote tests/docs/harnesses to guide the model" is less impressive than "we asked it and it figured it out," even though the former is how real work will happen.

It creates this perverse scenario (which is no one's fault!) where we talk about one-shot performance but one-shot performance is useful in exactly 0 interesting cases.

NitpickLawyer 114 days ago

Something I've found useful is to let the model "just figure it out" for the first part (usually discovery, library testing, new CLI testing, repo understanding, etc.) and then distill the results into "learnings" that I can place in agents.md or relevant skills. So you get the speed of "just prompt it" and the repeatability that comes from it having already worked in this area. You also get more insight into which tasks work today, and at what effort level.

Sometimes it feels not unlike spending 4 hours automating a 10-minute task that I thought I'd need forever but ended up using only once in the past 5 months. But sometimes I unlock something that saves a huge amount of time and can be reused in many steps of other projects.