| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by EastLondonCoder 137 days ago

This matches my experience, especially "don’t draw the owl" and the harness-engineering idea.

The failure mode I kept hitting wasn’t just "it makes mistakes", it was drift: it can stay locally plausible while slowly walking away from the real constraints of the repo. The output still sounds confident, so you don’t notice until you run into reality (tests, runtime behaviour, perf, ops, UX).

What ended up working for me was treating chat as where I shape the plan (tradeoffs, invariants, failure modes) and treating the agent as something that does narrow, reviewable diffs against that plan. The human job stays very boring: run it, verify it, and decide what’s actually acceptable. That separation is what made it click for me.

Once I got that loop stable, it stopped being a toy and started being a lever. I’ve shipped real features this way across a few projects (a git like tool for heavy media projects, a ticketing/payment flow with real users, a local-first genealogy tool, and a small CMS/publishing pipeline). The common thread is the same: small diffs, fast verification, and continuously tightening the harness so the agent can’t drift unnoticed.

5 comments

protocolture 137 days ago

>The failure mode I kept hitting wasn’t just "it makes mistakes", it was drift: it can stay locally plausible while slowly walking away from the real constraints of the repo. The output still sounds confident, so you don’t notice until you run into reality (tests, runtime behaviour, perf, ops, UX).

Yeah I would get patterns where, initial prototypes were promising, then we developed something that was 90% close to design goals, and then as we try to push in the last 10%, drift would start breaking down, or even just forgetting, the 90%.

So I would start getting to 90% and basically starting a new project with that as the baseline to add to.

link

ricardobeat 137 days ago

No harm meant, but your writing is very reminiscent of an LLM. It is great actually, there is just something about it - "it wasn't.. it was", "it stopped being.. and started". Claude and ChatGPT seem to love these juxtapositions. The triplets on every other sentence. I think you are a couple em-dashes away from being accused of being a bot.

These patterns seem to be picking up speed in the general population; makes the human race seem quite easily hackable.

link

pixl97 137 days ago

>makes the human race seem quite easily hackable.

If the human race were not hackable then society would not exist, we'd be the unchanging crocodiles of the last few hundred million years.

Have you ever found yourself speaking a meme? Had a catchy toon repeating in your head? Started spouting nation state level propaganda? Found yourself in crowd trying to burn a witch at the stake?

Hacking the flow of human thought isn't that hard, especially across populations. Hacking any one particular humans thoughts is harder unless you have a lot of information on them.

link

direwolf20 137 days ago

How do I hack the human population to give me money, and simultaneously, hack law enforcement to not arrest me?

link

WA 137 days ago

> How do I hack the human population to give me money

Make something popular or become famous.

> hack law enforcement to not arrest me

Don't become famous with illegal stuff.

The hack is that we live in a society that makes people think they need a lot of money and at the same time allows individuals to accumulate obscene amounts of wealth and influence and many people being ok with that.

link

paulryanrogers 136 days ago

> Don't become famous with illegal stuff.

Is this still a constraint? It would seem that money beyond a certain point allows one to wash ones reputation.

link

WA 136 days ago

For sure. But becoming famous for robbing the bank is not the way.

link

bdangubic 137 days ago

This is the most common answer from people that are rocking and rolling with AI tools but I cannot help but wonder how is this different from how we should have built software all along. I know I have been (after 10+ years…)

link

EastLondonCoder 137 days ago

I think you are right, the secret is that there is no secret. The projects I have been involved with thats been most successful was using these techniques. I also think experience helps because you develop a sense that very quickly knows if the model wants to go in a wonky direction and how a good spec looks like.

With where the models are right now you still need a human in the loop to make sure you end up with code you (and your organisation) actually understands. The bottle neck has gone from writing code to reading code.

link

sksisksbbs 137 days ago

> The bottle neck has gone from writing code to reading code.

This has always been the bottleneck. Reviewing code is much harder and gets worse results than writing it, which is why reviewing AI code is not very efficient. The time required to understand code far outstrips the time to type it.

Most devs don’t do thorough reviews. Check the variable names seem ok, make sure there’s no obvious typos, ask for a comment and call it good. For a trusted teammate this is actually ok and why they’re so valuable! For an AI, it’s a slot machine and trusting it is equivalent to letting your coworkers/users do your job so you can personally move faster.

link

miyuru 137 days ago

This is what I experienced as well.

these are some ticks I use now.

1. Write a generic prompts about the project and software versions and keep it in the folder. (I think this getting pushed as SKIILS.md now)

2. In the prompt add instructions to add comments on changes, since our main job is to validate and fix any issues, it makes it easier.

3. Find the best model for the specific workflow. For example, these days I find that Gemini Pro is good for HTML UI stuff, while Claude Sonnet is good for python code. (This is why subagents are getting popluar)

link

apitman 137 days ago

Would love to hear more about your geneology app.

link