by perkovsky 29 days ago
I like the “claim-driven” framing.

For stateful systems, tests named after setup details often get weakened over time. Tests named after the claim they are trying to falsify are harder to water down.

The part I’d be most interested in is how well this works for business invariants like idempotent posting, no lost acknowledgements and recovery after partial failure.

2 comments

I think all these scripts degrade when they're context-based rather than actual guardrails. What we need is siloed protocols, something like an SSH protocol, that keep the harness producing work through the protocol rather than through a bunch of loosely connected bash scripts. Plus, the harness needs to live outside the environment, so it's never something you have to install on a remote system, whether that's a container, a VM, or an SSH host. We shouldn't base everything on running bash without a secure tunnel into the location of interest.

The failure mode of these tools is self-destructive in many cases.

Idempotency is what bites me most in practice — I've been driving these against an unreleased database I work on. The main trap is using the op_id as the idempotency key rather than a business key the client reuses on retry. When they're the same thing, the checker is trivially true and the test passes without testing anything.
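
To make the trap concrete, here is a minimal sketch of a duplicate checker run under both keys. The `Applied` record, its field names, and the sample history are all made up for illustration; they are not from the skill or from any particular database.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Applied:
    """One effect observed in the final state (hypothetical record shape)."""
    op_id: str         # unique per attempt, assigned by the harness
    business_key: str  # reused by the client when it retries
    amount: int


def duplicates(applied, key):
    """Return the keys that show up more than once under the given key function."""
    counts = Counter(key(a) for a in applied)
    return sorted(k for k, n in counts.items() if n > 1)


# Assumed scenario: the client retried payment "pay-42" and the system applied it twice.
applied = [
    Applied("op-1", "pay-42", 100),
    Applied("op-2", "pay-42", 100),  # retry: new op_id, same business key
    Applied("op-3", "pay-43", 50),
]

# Keyed by op_id, every entry is unique by construction, so nothing is ever flagged.
by_op_id = duplicates(applied, key=lambda a: a.op_id)

# Keyed by the business key, the double-applied retry is visible.
by_business_key = duplicates(applied, key=lambda a: a.business_key)

print(by_op_id, by_business_key)
```

The first result is empty no matter what the system did, which is the "passes without testing anything" case.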

No-lost-ack is conceptually the same shape with a simpler property (every acked write shows up at the end), but it breaks the same way most checkers break — if the recorder treats timeouts as success or failure instead of "unknown," real lost writes silently disappear.
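
A rough sketch of what keeping "unknown" as its own outcome could look like. The `Outcome` enum, the `Rejected` exception, and the history format are assumptions for illustration, not the skill's actual recorder.

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"            # server acknowledged the write
    FAIL = "fail"        # server definitely rejected it
    UNKNOWN = "unknown"  # timeout or similar: the write may or may not have landed


class Rejected(Exception):
    """Hypothetical error a client raises on a definite, explicit rejection."""


def record(write):
    """Run one write and classify it; anything short of a clear answer is UNKNOWN."""
    try:
        write()
    except Rejected:
        return Outcome.FAIL
    except Exception:
        return Outcome.UNKNOWN
    return Outcome.OK


def check_no_lost_acks(history, final_state):
    """history is a list of (value, Outcome) pairs; final_state is the values read at the end."""
    acked = {v for v, o in history if o is Outcome.OK}
    maybe = {v for v, o in history if o is Outcome.UNKNOWN}
    final = set(final_state)
    return {
        "lost": sorted(acked - final),                # acked but missing: a real violation
        "unexpected": sorted(final - acked - maybe),  # present with no possible source
        "recovered": sorted(final & maybe),           # unknown writes that did land
    }


history = [(1, Outcome.OK), (2, Outcome.UNKNOWN), (3, Outcome.OK), (4, Outcome.FAIL)]
result = check_no_lost_acks(history, final_state=[1, 2])
print(result)
```

Here write 3 is reported as lost, and write 2 is allowed to be present or absent without affecting the verdict.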

Recovery after partial failure is where the AI-agent angle gets shaky, honestly. Quiescence is the hard part. Agents will declare a system "recovered" while compaction is still running in the background. The skill forces a three-part check (no in-flight ops, no pending background work, replicas converged) before the invariant runs. How reliably that holds up against a specific SUT, I'm still figuring out.
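
For reference, the three-part gate could be sketched roughly like this. The `Status` fields and the `probe` callback are hypothetical stand-ins for whatever the SUT actually exposes, and comparing replica digests is just one assumed way to define "converged".

```python
import time
from dataclasses import dataclass


@dataclass
class Status:
    """Hypothetical snapshot a probe might collect from the system under test."""
    in_flight_ops: int
    pending_background_jobs: int  # e.g. compaction, repair
    replica_digests: list         # one state digest per replica


def is_quiescent(s):
    """All three conditions must hold at once before the invariant is allowed to run."""
    no_in_flight = s.in_flight_ops == 0
    no_background = s.pending_background_jobs == 0
    converged = len(set(s.replica_digests)) <= 1
    return no_in_flight and no_background and converged


def wait_for_quiescence(probe, timeout_s=30.0, interval_s=0.5,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll probe() until the system is quiescent; return False if the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        if is_quiescent(probe()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Usage with canned snapshots: compaction pending, then replicas diverged, then settled.
snapshots = iter([
    Status(0, 1, ["a", "a"]),
    Status(0, 0, ["a", "b"]),
    Status(0, 0, ["a", "a"]),
])
settled = wait_for_quiescence(lambda: next(snapshots), sleep=lambda _: None)
print(settled)
```

Returning False on timeout, rather than running the invariant anyway, keeps "not yet recovered" distinct from "recovered and wrong".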