
by joegaebel 93 days ago

In my view, Spec-Driven systems are doomed to fail. There's nothing that couples the English-language specs you've written with the actual code and behaviour of the system - unless your agent is being insanely diligent and constantly checking whether the entire system aligns with your specs.

This has been solved already - automated testing. Automated tests encode the behaviour of the system into executables which actually tell you whether your system aligns or not.

Better to encode the behaviour of your system into real, executable, scalable specs (aka automated tests), otherwise your app's behaviour is going to spiral out of control after the Nth AI generated feature.
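As a minimal sketch of what an "executable spec" could look like, assume a hypothetical `apply_discount` function and the English rule "orders of 100.00 or more get 10% off". The function, the rule, and the test names are illustrative assumptions, not taken from the linked article or repo.

```python
import unittest


def apply_discount(total_cents: int) -> int:
    """Hypothetical rule: orders of 100.00 or more get 10% off."""
    if total_cents >= 10_000:
        return total_cents * 90 // 100
    return total_cents


class DiscountSpec(unittest.TestCase):
    """Each test pins down one sentence of the English spec."""

    def test_orders_below_threshold_pay_full_price(self):
        self.assertEqual(apply_discount(9_999), 9_999)

    def test_order_exactly_at_threshold_is_discounted(self):
        # "100 or more" is easy to misread as "more than 100";
        # the assertion removes that ambiguity.
        self.assertEqual(apply_discount(10_000), 9_000)

    def test_orders_above_threshold_are_discounted(self):
        self.assertEqual(apply_discount(20_000), 18_000)
```

Saved to a file, this can be run with `python -m unittest <file>`; if a later change flips `>=` to `>`, the second test fails instead of the drift going unnoticed.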

The way to ensure this actually scales with the firepower that LLMs have for writing implementation is to ensure the agent follows a workflow where it knows how to test, writes the tests first, and uses mutation testing to check that the tests actually reflect the behaviour of the system.

I've scoped this out here [1] and here [2].

[1] https://www.joegaebel.com/articles/principled-agentic-softwa...

[2] https://github.com/JoeGaebel/outside-in-tdd-starter

5 comments

oakpond 93 days ago

Sort of agreed. Natural language specs don't scale. They can't be used to accurately model and verify the behavior of complex systems. But they can be used as a guide to create formal language specs that can be used for that purpose. As long as the formal spec is considered to be the ground truth, I think it can scale. But yeah, that means some kind of code will be required.. :)

j45 93 days ago

Things like GitHub's Spec Kit seem to have a fair amount of usage.

The idea behind treating specs as code is that you can effectively rebuild in the future with newer models. Test requirements could be defined upfront in the specs too, no?

oakpond 92 days ago

I think natural language leaves too much room for ambiguities. If you treat it as code I expect you will run into frequent bugs and unintended side effects of LLM-authored changes as your software evolves. So I'm skeptical about this approach.

A formal language helps in this regard because it makes visible the inconsistencies that are hidden in the specifications.

Coding is difficult sometimes because it turns out the problem you are trying to solve is more difficult than expected (not because it's difficult to code).

j45 92 days ago

Sounds like this perspective is theoretical.

Been building for a long time, and more specifically overseeing building in detail, which transfers interestingly to overseeing LLMs.

Just like with coworkers, providing the right amount of context (not too much, or too little) for the request to succeed is critical.

I shared similar views, but I have seen firsthand (using it in production myself) that specs, when written well for LLMs, can make AI-driven development work. If something doesn't work out, you don't fix the code, you adjust the spec. Highly recommend watching doers on YouTube who are sharing their screens.

Discovering that a problem is more difficult than expected lets you take more shots at it, more quickly, for example by adjusting the spec and running again. We are used to just plowing ahead to make the code right, instead of improving/clarifying the ask/spec.

oakpond 92 days ago

In my experience, when you sell expensive complex systems, customers are very worried about any differences in system behavior as a result of software updates.

When you implement a new feature with these tools, how do you convince yourself that existing system behavior remains unchanged?

When you have the code in front of you, at least you can reason about the full system behavior before and after because code is unambiguous like that.

With spec driven development, the LLM can rewrite anything as long as it meets the spec. That's a problem if your customer relies on behavior that's written down ambiguously (or omitted entirely).

So, I think this is only going to work if you write specs with mathematical precision.. at which point you probably want to write them using a mathematical language.
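One mechanical way to approach the "does existing behavior remain unchanged?" question is a characterization (golden-master) check: snapshot what the current code returns, then diff a rewrite against the snapshot. The sketch below is only an illustration under assumed names; `shipping_v1`, `shipping_v2`, and the rule "parcels up to 1 kg cost 5.00, heavier ones 9.00" are hypothetical, and the zero-weight case is deliberately left out of that written rule.

```python
from typing import Callable, Dict, Iterable, Tuple

ShippingFn = Callable[[int], int]


def shipping_v1(weight_g: int) -> int:
    """Hypothetical current code: free at 0 g, which the spec never mentions."""
    if weight_g == 0:
        return 0
    return 500 if weight_g <= 1_000 else 900


def shipping_v2(weight_g: int) -> int:
    """Hypothetical rewrite that satisfies only the written spec."""
    return 500 if weight_g <= 1_000 else 900


def record_golden(fn: ShippingFn, inputs: Iterable[int]) -> Dict[int, int]:
    """Snapshot what the current code returns for each input."""
    return {x: fn(x) for x in inputs}


def behaviour_diff(
    golden: Dict[int, int], candidate: ShippingFn
) -> Dict[int, Tuple[int, int]]:
    """Map each input whose output changed to (old, new)."""
    diff: Dict[int, Tuple[int, int]] = {}
    for x, expected in golden.items():
        actual = candidate(x)
        if actual != expected:
            diff[x] = (expected, actual)
    return diff


golden = record_golden(shipping_v1, [0, 1, 1_000, 1_001])
print(behaviour_diff(golden, shipping_v2))  # {0: (0, 500)}
```

Both versions meet the written rule, yet the diff surfaces the omitted zero-weight behaviour. The check is only as good as the inputs chosen for the snapshot.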

j45 92 days ago

Appreciate learning from your perspective.

I've built, integrated and sold expensive complex systems. They want it working, connected, and reliable. Lots of paths there.

Have you built with LLMs? I'm asking because I would refer to things from having something working on a complex code base.

Specifications, or inputs, are in a way a new kind of code. The added focus on documentation, before and after, is a bonus too, and also helps with alignment.

Code styles/formats/philosophies can be documented and followed.

The human process of what to look into, in what way, and for which areas of the code base can also be trained and remembered. There are ways to achieve and maintain precision without 100% mathematical precision, because there are only so many ways to solve a problem or a step, and the mechanisms for deciding between them can also be defined, either in general or specifically.

internet_points 93 days ago

See also recent post "A sufficiently detailed spec is code" which tried and failed to reproduce openai's spec results: https://hn.algolia.com/?q=https%3A%2F%2Fhaskellforall.com%2F...

zby 93 days ago

Spec Driven Development is a curious term - it suggests it is a kind of, or at least in the tradition of, Test Driven Development but it goes in the opposite direction!

sveme 93 days ago

Don't understand this - you can go spec -> test -> implementation and establish the test loop. Bit like the V-model of old, actually.

joegaebel 93 days ago

In my view, the problems with specs are:

1. Specs are subject to bit-rot, there's no impetus to update them as behaviour changes - unless your agent workflow explicitly enforces a thorough review and update of the specs, and unless your agent is diligent with following it. Lots of trust required on your LLM here.

2. There's no way to systematically determine if the behaviour of your system matches the specs. Imagine a reasonable sized codebase - if there's a spec document for every feature, you're looking at quite a collection of specs. How many tokens need be burnt to ensure that these specs are always up to date as new features come in and behaviour changes?

3. Specs are written in English. They're ambiguous - they can absolutely serve the planning and design phases, but this ambiguity prevents meaningful behaviour assertions about the system as it grows.

Contrast that with tests:

1. They are executable and have the precision of code. They don't just describe behaviour of the system, they validate that the system follows that behaviour, without ambiguity.

2. They scale - it's completely reasonable to have extensive codebases have all (if not most) of their behaviour covered by tests.

3. Updating is enforceable - assuming you're using a CI pipeline, when tests break, they must be updated in order to continue.

4. You can systematically determine if the tests fully describe the behaviour (i.e. is all the behaviour tested) via mutation testing. This will tell you with absolute certainty if code is tested or not - do the tests fully describe the system's behaviour.
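The mutation-testing point can be illustrated with a hand-rolled sketch. Real tools generate mutants automatically; here the mutants are written by hand, and every name (`original`, `MUTANTS`, `weak_suite`, `strong_suite`) is a hypothetical stand-in rather than any tool's API.

```python
from typing import Callable, Dict, List

PriceFn = Callable[[int], int]


def original(total_cents: int) -> int:
    """Code under test: 10% off at 100.00 or more."""
    return total_cents * 90 // 100 if total_cents >= 10_000 else total_cents


# Hand-written mutants, each a small deliberate bug in `original`.
MUTANTS: Dict[str, PriceFn] = {
    ">= becomes >": lambda t: t * 90 // 100 if t > 10_000 else t,
    "90 becomes 91": lambda t: t * 91 // 100 if t >= 10_000 else t,
    "discount removed": lambda t: t,
}


def weak_suite(fn: PriceFn) -> bool:
    """Returns True when every check passes; never probes the boundary."""
    return fn(5_000) == 5_000 and fn(20_000) == 18_000


def strong_suite(fn: PriceFn) -> bool:
    """Same checks plus the exact threshold value."""
    return weak_suite(fn) and fn(10_000) == 9_000


def surviving_mutants(suite: Callable[[PriceFn], bool]) -> List[str]:
    """A mutant survives if the suite still passes when run against it."""
    assert suite(original), "suite must pass on the unmutated code"
    return [name for name, mutant in MUTANTS.items() if suite(mutant)]


print(surviving_mutants(weak_suite))    # ['>= becomes >']
print(surviving_mutants(strong_suite))  # []
```

A surviving mutant points at behaviour no test pins down. The result is relative to the mutants that were tried, so an empty list means "none of these bugs slipped through", not a proof that every behaviour is covered.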

That being said, I think it's very valuable to start with a planning stage, even to provide a spec, such that the correct behaviour gets encoded into tests, and then instantiated by the implementation. But in my view, specs are best used within the design stage, and if left in the codebase, treated only as historical info for what went into the development of the feature. Attempting to use them as the source of truth for the behaviour of the system is fraught.

And I guess finally, I think that insofar as any framework uses the specs as the source of truth for behaviour, they're going to run into alignment problems since maintaining specs doesn't scale.

zby 93 days ago

SDD is about flowing the design choices from the spec into the rest of the system. TDD was for making sure that the inevitable changes you make to the system later don't break your earlier assumptions, or at least for warning you that those assumptions need to change. Personally I don't buy TDD: it might be useful sometimes, but it is kind of extreme. In general, though, agile methodologies were a reaction to the waterfall model of system development.

anthonyrstevens 93 days ago

This is just one way to use TDD. I personally get the most value from TDD as a design approach. I iteratively decompose the project into stubbed, testable components as I start the project, and implement only when I have to in order to get my tests to pass. At each stage I'm asking myself questions like "who needs to call who? with what data? What does it expect back as a return value?" etc.
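A rough sketch of that design loop, with made-up names: the test is written first against a stubbed collaborator, which forces answers to "who calls whom, with what data, and what comes back". `RateSource`, `FixedRates`, and `convert` are hypothetical and only illustrate the shape.

```python
from typing import Dict, Protocol


class RateSource(Protocol):
    """Collaborator interface discovered while writing the test."""

    def rate(self, currency: str) -> float: ...


class FixedRates:
    """Stub standing in for a real rate service during design."""

    def __init__(self, rates: Dict[str, float]) -> None:
        self._rates = rates

    def rate(self, currency: str) -> float:
        return self._rates[currency]


def convert(amount: float, currency: str, source: RateSource) -> float:
    """Filled in only once the test below demanded it."""
    return round(amount * source.rate(currency), 2)


def test_convert_uses_rate_from_source() -> None:
    # Answers the design questions: convert() calls source.rate()
    # with a currency code and expects a float multiplier back.
    assert convert(10.0, "EUR", FixedRates({"EUR": 0.5})) == 5.0


test_convert_uses_rate_from_source()
```

The real rate service can be written later against the same `RateSource` interface without touching `convert` or its test.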

j45 93 days ago

Specs seem to be more about alignment and clarity, which increase the odds of code that works and of tests that pass.

locknitpicker 93 days ago

> This has been solved already - automated testing.

This is specious reasoning. Automated tests are already the output of these specs, and specs cover way more than what you cover with code.

Framing tests as the feedback that drives design is also a baffling opinion. Without specialized prompts such as specs, your LLM agent of choice ends up either ignoring tests altogether or even changing them to fit its own baseless assumptions.

I mean, who hasn't stumbled upon the infamous "the rest of your tests go here" output in automated tests?

polytely 93 days ago

> Automated tests are already the output of these specs, and specs cover way more than what you cover with code.

ok but how are you sure that the AI is correctly turning the spec into tests. if it makes a mistake there and then builds the code in accordance with the mistaken test, you only get the illusion of a correct implementation

locknitpicker 93 days ago

> ok but how are you sure that the AI is correctly turning the spec into tests.

You use the specs to generate the tests, and you review the changes.

mattmanser 93 days ago

I've seen a few comments recently that start with:

> This is specious reasoning

It's an insulting phrase and from now on I'm immediately down voting it when I see it.

nelox 93 days ago

On the face of it, it is insulting, until you dig a little deeper

locknitpicker 93 days ago

> It's an insulting phrase (...)

I'm sorry you feel like that. How would you phrase an observation where you find the rationale for an assertion to not be substantiated and supported beyond surface level?