Hacker News
by brendanmc6 54 days ago
Author here, if you don't want to read all that, I'll post one excerpt that I think sums it up nicely:

> My point is, the spec must live somewhere, even if you don’t write it down. The spec is what you want the software to be. It often exists only in your head or in conversations. You and your team and your business will always care what the spec says, and that’s never going to change. So you’re better off writing it down now! And I think that a plain old list of acceptance criteria is a good place to start. (That’s really all that `feature.yaml` is.)

16 comments

The traditional name for this spec is ‘source code’ — a canonical source of truth for the behaviour of a system that is as human-readable as we know how to make it, that will be processed by automated tools into a less-readable derived artefact for a computer to execute.

Checking the compiled artefact into the codebase without checking in its source code has always been a risky move!

A specification, whether formal or less formal, is very different from the source code.
But it is also always less specific than source code, even if the attempt is to dictate the latter as close as possible.
I agree with you. I think the replies are misunderstanding the basis for code and specs and making semantic distinctions. Code is specs, just in a different syntax for machines to understand. This is a pillar of the discipline of requirement specifications that Uncle Bob talks about in Clean Agile.
Technology evolves and traditions change. What persists is the role, not the filename and its extension. Weddings are still weddings even after things went from painted portraits to film cameras to camcorders to smartphones to livestreams. Same with birthdays. Cards became phone calls, Facebook wall posts, group chats, shared albums, or generated videos (Sora, RIP).

The tradition of having a deck of punch cards evolved to having assembly, to Pascal, Fortran, C, BASIC. The important part is a human-auditable directive, not an opaque, generated artifact as the thing that matters.

Weddings have evolved and adapted: photography, film cameras, Polaroids, camcorders, digital cameras, smartphones, social media, Zoom/virtual attendees. Same with birthdays: handwritten cards, then phone calls, e-cards, Facebook wall posts, video calls, shared photo albums and Sora (RIP) videos.

> The important part is a human-auditable directive, not an opaque, generated artifact as the thing that matters.

Your arguments create a false dichotomy. You look at it from a consumer perspective, while coding and its artifacts are usually done by suppliers. If you change camcorder to TV advertisement, the requirements shift. The human-auditable directive and the outcome both matter. Coca Cola probably has very high standards for their IP (the directive) and doesn't care about the outcome (AI slop ads). The result is disgruntled consumers.

If you don't care about the "opaque" generated artifact, then you are Coca Cola.

As far as I understand it I'm on your side in this argument — I think ‘code’ continues to matter so long as the LLM-to-execution pipeline remains a leaky abstraction — but I don't think the analogy is correct. The ads are the resultant behaviour of the software, e.g. the UX, not the code.
>The traditional name for this spec is ‘source code’

Specs are the end goal, not how the software looks at a moment in time.

Specs also evolve over time. There's no ‘end goal’ because requirements are always changing.

Specs are traditionally more forward-looking only because, by removing a lot of the implementation details that are required to write code, the specification can be written to be much broader in scope than code in an equivalent time period. But periodically we invent software that lets us automatically fill in more details of the software that now don't need to be specified by humans, and a level of specification that was previously ‘spec’ turns into ‘code’.

A spec isn't code. There's a C language specification and many implementations. There are a handful of browsers each implementing HTML, JS, and CSS specs in their own way.
And given a C description of a program, a C runtime can implement that program in various different ways — interpreted vs compiled, explicit memory management vs garbage collection, different pointer sizes and memory layouts, parallelism at various points or not. It's turtles all the way down :) It just becomes ‘code’ at the point where a computer can execute it (in one way or another) without further human intervention.
The source code is not the specification, the source code is an implementation of the specification. The specification tells you what happens, the source code tells you how it happens. Ideally you also have some additional documentation for the why.
As any four-year-old can tell you, ‘why’ is infinitely recursive. ‘What’ from the perspective of level n is ‘how’ looking down from level n+1 and ‘why’ looking up from level n-k.
That usually does not matter in practice because you quickly reach a level of sufficient understanding.

We usually use UUIDs for this type of object but we have to send those objects to the legacy system XYZ, which only supports IDs with up to sixteen characters and is case insensitive, so we generate sixteen character random alphanumeric strings with uppercase letters which provides 82 bits of entropy.

Could you go deeper? Sure. Why do we have to send those objects to XYZ? Why does the legacy system still exist? Why does it not support UUIDs? Why is there no secondary key specifically for that system? Why are we using UUIDs?

But most likely you do not have to spell all those out. The point of a why is to explain why something is not what one would expect; you explain on top of some common knowledge. Everyone involved might know what XYZ does and why some objects have to get sent there. If not, that is probably written down elsewhere. Why is the system using UUIDs? Maybe written down in the design for the persistence layer.
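To make the arithmetic in the ID example above concrete, here is a minimal Python sketch. The function names and the exact alphabet (A-Z plus 0-9, 36 symbols) are assumptions for illustration; the example only says "sixteen character random alphanumeric strings with uppercase letters".

```python
import math
import secrets
import string

# Assumed alphabet: 26 uppercase letters + 10 digits = 36 symbols.
ALPHABET = string.ascii_uppercase + string.digits
ID_LENGTH = 16  # length limit imposed by the legacy system in the example


def legacy_safe_id() -> str:
    """Return a random 16-character uppercase alphanumeric ID."""
    return "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of a uniformly random string of the given length."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    print(legacy_safe_id())
    print(f"{entropy_bits(len(ALPHABET), ID_LENGTH):.1f} bits")  # 82.7 bits
```

16 × log2(36) ≈ 82.7, which is where the "82 bits" figure comes from. The comment in the ID generation code would carry the "why" (XYZ's sixteen-character, case-insensitive limit); the numbers themselves are the "what".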

Sure, I'm not suggesting we need to go into infinite regress for every explanation! I'm saying that you should bear in mind that you _are_ in the middle of an infinite stack, and what is a ‘how’, a ‘what’ or a ‘why’ is just a function of your current position in it relative to the thing you're talking about. In the ID generation code you might want to explain why you're using this weird format here instead of a more standard format (because it needs to be passed to legacy system XYZ). But if you go up a step or two to where the ID is passed to XYZ in code, that ‘why’ has become a ‘what’ — the calling code acts as a ‘specification’ for the behaviour of that ID generation code.
I independently converged on something similar. I use two to three specification docs for my C++ work: a firmware manual (describes features and interfaces), an implementation plan (order of implementation, mechanisms where specified; new features go in here) and a product manual (user story, external effects). I start with a user story, build an implementation plan, write the code, write the firmware manual, then check the 3 documents + code for consistency and coherence. I change either the code or the documentation to reflect a coherent unified truth. (The implementation plan gradually becomes as-built.) I also have the code comprehensively commented so that it is difficult to misinterpret. “Correct, coherent, consistent, commented.”

We iterate feature by feature through this process, and occasionally circle back on the original product manual to identify drift.

After the original documentation is drafted, I have the agent write up placeholder files and define all of the interfaces we expect to need (we will end up adding a lot later, but that’s ok). Every file should reflect a clear separation of concerns and can only be reached into through its defined interface; all else is private. I end up with more individual files than I would by hand, but by constraining scope at file granularity, and defining an inviolate interface per file, I avoid the LLM tendency to take shortcuts that create unmaintainable code.

I also open each new context with an onboarding process that briefly describes the logos and the ethos of the project, why the agent should be deeply invested in the success of the project, as well as learnings.md which the agent writes as it comes across notable gotchas or strong preferences of mine.

Needless to say, I use one million context, and it’s a token fire… but the results are solid and my productivity is 5-10x.

I wrote something similar recently about how agent-generated code lacks the institutional memory that human-written code has. There's nobody to ask why a decision was made (1).

“Specsmaxxing” is basically the right response to this. When you can't rely on authorial memory, you have to put the intent somewhere durable. Specs become the source of truth by default if we continue down the road of AI generated code.

1: https://ossature.dev/blog/ai-generated-code-has-no-author/

I've been attaching to my commit messages a Git Trailer [1] of the Session UUID from the Claude Code conversation that created that commit.

It allows Claude to look back into the session where a change was made and see the decisions made, tradeoffs discussed and other history not captured by code, tests.

[1] https://git-scm.com/docs/git-interpret-trailers
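For readers who have not used trailers: they are `Key: value` lines in the last paragraph of a commit message. The Python sketch below only illustrates the format by appending and reading back a session trailer. The key name `Claude-Session-Id` is a made-up placeholder (the comment above does not say which key is used), and the parsing is a deliberate simplification of git's real rules; in practice `git interpret-trailers` does this job.

```python
import re

# Hypothetical trailer key; the actual key used is not given in the comment.
TRAILER_KEY = "Claude-Session-Id"
TRAILER_RE = re.compile(r"^([A-Za-z0-9-]+): (.+)$")


def add_trailer(message: str, key: str, value: str) -> str:
    """Append 'Key: value' to a commit message as a trailer."""
    lines = message.rstrip("\n").split("\n")
    # Collect the final paragraph (everything after the last blank line).
    last_para = []
    for line in reversed(lines):
        if line == "":
            break
        last_para.append(line)
    # Extend an existing trailer block; otherwise start one after a blank line.
    has_trailers = len(last_para) < len(lines) and all(
        TRAILER_RE.match(entry) for entry in last_para
    )
    separator = "\n" if has_trailers else "\n\n"
    return "\n".join(lines) + separator + f"{key}: {value}\n"


def parse_trailers(message: str) -> dict[str, str]:
    """Read trailers from the final paragraph of a commit message."""
    paragraphs = message.strip().split("\n\n")
    if len(paragraphs) < 2:
        return {}  # a lone subject line has no trailer block
    trailers = {}
    for line in paragraphs[-1].split("\n"):
        match = TRAILER_RE.match(line)
        if match is None:
            return {}  # last paragraph is ordinary prose, not trailers
        trailers[match.group(1)] = match.group(2)
    return trailers


if __name__ == "__main__":
    msg = add_trailer(
        "Fix ID length for XYZ\n\nTruncate to sixteen chars.",
        TRAILER_KEY,
        "1234-abcd",  # placeholder session id
    )
    print(msg)
    print(parse_trailers(msg))
```

A lookup tool would then map the parsed session ID to a stored transcript; how that storage is laid out is outside what this sketch assumes.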

A few questions:

- Does Claude leverage the trailers automatically, or is usage initiated by you?

- How often are you using the trailer lookups?

- Any idea how this relates to token usage? If you're frequently busting cache on old sessions, it might be cheaper to read a local doc.

> Does Claude leverage the trailers automatically, or is usage initiated by you?

The trailers hint is in my global CLAUDE.md, so when I'm debugging and say something like "didn't we already discuss this in a previous session?" it knows what to look for.

I also have a manually invoked `/search-session-transcripts` that I can use to natural-language inspect previous sessions by day, project, session id etc. Claude often uses this skill to narrow down on parts of the conversation that are relevant to the current query.

> How often are you using the trailer lookups?

Mondays are usually the day I need to refer to previous sessions from the week before. Trailer lookups are also good for continuing buildout of adjacent features. They've also been excellent in incident post-mortems where the PR text and commit message aren't enough to gauge the "how" of decisions that led to issues.

> Any idea how this relates to token usage?

I tested this. Session-transcripts are append-only so `/clear` and `/compact` don't clear out old messages, they stay stable and accessible. I also don't clean out my `~/.claude/sessions` ever so there's a lot in there, but the info is valuable and cheap.

Nice, thanks for sharing.
I had a similar experience refactoring a large codebase. The only thing that made it possible was that each commit message had a JIRA ticket number tying it to a requirement or task. I could find the people behind the business logic and ask them about it.
the recursive-mode workflow has full traceability, including why decisions were made, what the original requirement was, what the previous state was, etc. https://recursive-mode.dev/introduction
You have rediscovered the job of Software Analyst, which until the early 90's was a thing. Then that all got upended and we ended up with a mix between product owners, project managers and developers / devops but I think that that ignores the fact that Analyst is a different set of skills.
There is a lot of room to reevaluate the lessons of software development pre-web in the context of the current environment.

Like, if waterfall of a project can be done in 2 weeks, is it agile now?

> Like, if waterfall of a project can be done in 2 weeks, is it agile now?

Sure. The thing is, the waterfall guys would tell you it's impossible to do it in 2 weeks because you need to have written down everything first. "Thousands of pages" was the term they used.

Agile guys would point you to the Agile manifesto which would lead you to "working software over comprehensive documentation" and "individuals and interactions over processes and tools".

A 2 week period to go from initial spec to product in a user's hands to capture feedback and make changes from there is much closer to agile than to waterfall. In fact it's more or less exactly some older versions of Scrum (which didn't permit deviating from the planned sprint user stories midway through the sprint, instead changes influenced the subsequent sprint).

The DoD's 2167 standard from the late '80s mentions the following documentation that should be produced as part of the development process (section 6.2 and Appendix D):

- System/Segment Specification

- Software Development Plan

- Software Configuration Management Plan

- Software Quality Evaluation Plan

- Software Requirements Specification

- Interface Requirements Specification

- Software Standards and Procedures Manual

- Software Top Level Design Document

- Software Detailed Design Document

- Interface Design Document

- Data Base Design Document

- Software Product Specification

- Version Description Document

- Software Test Plan

- Software Test Description

- Software Test Procedure

- Software Test Report

- Computer System Operator's Manual

- Software User's Manual

- Computer System Diagnostic Manual

- Software Programmer's Manual

- Firmware Support Manual

- Operational Concept Document

- Computer Resources Integrated Support Document

- Configuration Management Plan

- Engineering Change Proposal

- Specification Change Notice

This is a particular artifact of the government system process. These are contracted pieces of work that Company A would deliver, Company B would administer, and Company C would be contracted out for additional work. Further, all specifications were created ahead of time because changes would cost extra. (Anyone who has done government contracting can talk to the shenanigans involved with it - I have not lived in this world for a long time.)

That said, we still do ad-hoc versions of many of these. For example, a system/segment specification today is an OpenAPI document between microservices. Most larger SaaS companies have the equivalent of a Software Configuration Management plan - Who can change terraform or a GHA, what are the standards that they conform to (linter, peer review standards).

> This is a particular artifact of the government system process.

Yes, a government process meant to implement the waterfall approach.

If you look at Dr. Royce's paper which originated the concept, he was very explicit that it required upwards of thousands of pages of documentation to be written up front, if you were doing it "right".

By the time the required documentation had all been written, there should be essentially nothing left to do but to actually type out the punch cards as specified and turn them into a system of compiled programs.

Now, this appealed to government because it put documentation in place that was felt to be more viable for contracting processes, but ever since Dr. Brooks chaired a 1987 Defense Science Board study on the issues already facing the DoD trying to implement waterfall methods, they've been trying to restructure their software acquisition methods to pursue better outcomes rather than more concretely defined outputs.

Of course it's still a tremendous challenge for them even now, and it remains common to see defense acquisition projects that will say "Agile" to the right people even as they prescribe a full waterfall-style 'system engineering V' approach behind the scenes.

The ad-hoc approach common in the commercial space is usually more appropriate, believe it or not. Process gets added when it is helpful, but not before.

at one point or another in my career (gov contracting) I had to write or co-write or review every one of these. and without fail, within 6-12 months they would be stale/inaccurate/obsolete/… the truth is, even on projects where sufficient time is allocated to write these, there is never (literally) time allocated to keep them up-to-date
That doesn't do justice to either waterfall or agile.
Oh certainly - I'm conflating the adjective of agile with the manifesto of agile. I've been on projects with multi-hundred page design docs and multi-week UATs. And nobody wants to go back to PRINCE2 for example.

The point I was trying to make is we should be diving back into the older methodologies and accumulated wisdom and re-evaluate some of the older dead ends with new context.

Or referring more to the process of building the specs, requirements engineer(ing). Imho agile became a way of hand waving most of this critical process and responsibility, in place of a new inefficient and ill-defined process.

Yes, you don’t know the nuances of all specs upfront and revision will be necessary. Turning the ship with arbitrary degrees of freedom outside of bullet points on a roadmap is not an efficient way to resolve that for many projects.

I came up as a software requirements analyst before the weird transition between business analyst to product owner to product manager to technical product manager. But living in requirements for 15+ years really gave me a leg up on these “let’s go back to requirements!” efforts.
It always amazes me how bad the software world is at keeping lessons learned as learned, especially when compared to say engineering. It's as if every 20 years or so we throw away the books and reinvent it all from first principles, hopefully this time with fewer mistakes overall, but usually we end up both finding new ones and redoing some old ones.
It's because software engineering, which deals with bits, evolves dramatically faster than other engineering disciplines, which deal with the physical world.
It's almost like "move fast and break things" isn't such a good mantra.
It’s almost as if we need to write some unit tests. For our profession.
This ultimately converges on what source code is though.

The most common form of what you'd call a "spec" is the acceptance criteria on a work ticket, which is an accretive spec i.e. a description of desired change -- "given what already exists, change it as follows". I.e. if you somehow layered and summarized and condensed all tickets that have been made since product started, you'd have your "spec".

But it's the devs who were doing that condensing via understanding each desired spec addition vs reality of existing codebase.

So the gap between what people are currently calling "specs" and what the code was already doing is not big and will not stay big, but for the fact you're effectively adding another (quasi) compile step underneath - and in this case it's a non-deterministic one.
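The "layer and condense all tickets" idea above can be pictured as a fold over the ticket history. This is a rough Python sketch under loud assumptions: the `Ticket` shape, the criterion keys and the `PROJ-n` IDs are all invented for illustration, and real tickets are prose rather than clean key/value deltas, which is exactly the condensing work the comment says developers do.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ticket:
    """One change request: set or retire a single acceptance criterion."""
    ticket_id: str
    criterion: str        # hypothetical stable key, e.g. "login.lockout"
    text: Optional[str]   # None means the criterion was removed


def condense(tickets: list[Ticket]) -> dict[str, tuple[str, str]]:
    """Replay tickets in order; return criterion -> (current text, last ticket)."""
    spec: dict[str, tuple[str, str]] = {}
    for ticket in tickets:
        if ticket.text is None:
            spec.pop(ticket.criterion, None)  # criterion retired
        else:
            spec[ticket.criterion] = (ticket.text, ticket.ticket_id)
    return spec


# Invented history: later tickets override or remove earlier criteria.
HISTORY = [
    Ticket("PROJ-1", "login.lockout", "Lock the account after 5 failed attempts"),
    Ticket("PROJ-2", "login.remember_me", "Offer a 'remember me' checkbox"),
    Ticket("PROJ-7", "login.lockout", "Lock the account after 3 failed attempts"),
    Ticket("PROJ-9", "login.remember_me", None),
]

if __name__ == "__main__":
    for key, (text, source) in condense(HISTORY).items():
        print(f"{key}: {text} (last changed by {source})")
```

The condensed dictionary is the "current state" view; the ticket list is the change log it was derived from.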

What's the difference between this and Jira? Your specs already live somewhere: it's where you defined them. That's why it's nice to put the Jira ticket number in your code / commit, so you can refer back to the spec when something breaks.
A specification isn't a series of change requests! Using Jira as your source of truth is no different to just recording all your prompts. There's nothing you can easily review to spot contradictions or how things interact with one another.

I've been doing "specmaxxing" for a few months now. Unlike the author I don't use YAML, I use a mix of Markdown and Gherkin. If you haven't encountered Gherkin before, it's not new and you might know it under the name Cucumber or BDD.

https://cucumber.io/docs/

Gherkin is basically a structured form of English that can be fed into a unit testing framework to match against methods.

The nice thing about writing acceptance criteria this way is that they become executable and analyzable. You write some Gherkin and then ask the model to make the tests execute and pass. Now in a good IDE (IntelliJ has good support) you can run the acceptance criteria to ensure they pass, navigate from any specific acceptance criteria to the code which tests it (and from there to the code that implements it), you can generate reports, integrate it into CI and so on.

And when writing out acceptance tests that are quite similar, the IDE will help you with features like auto-complete. But if you need something that isn't implemented in the test-side code yet, no big deal. Just write it anyway and the model will write the mapping code.
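As a rough illustration of the "structured English matched against methods" idea, here is a toy step matcher in plain Python. It is not Cucumber's API: the `step` decorator, `run_scenario` and the cart scenario are all invented for this sketch, and a real Gherkin runner handles far more (backgrounds, tables, tags, reporting).

```python
import re

STEPS = []  # (compiled pattern, function) pairs


def step(pattern: str):
    """Register a function as the implementation of a step phrase."""
    def register(func):
        STEPS.append((re.compile(pattern + r"$"), func))
        return func
    return register


@step(r"a cart with (\d+) items?")
def cart_with(ctx, count):
    ctx["cart"] = int(count)


@step(r"the user removes (\d+) items?")
def remove_items(ctx, count):
    ctx["cart"] -= int(count)


@step(r"the cart holds (\d+) items?")
def cart_holds(ctx, count):
    assert ctx["cart"] == int(count), f"expected {count}, got {ctx['cart']}"


def run_scenario(text: str) -> dict:
    """Execute each Given/When/Then/And line against the registered steps."""
    ctx = {}
    for raw in text.strip().splitlines():
        keyword, _, phrase = raw.strip().partition(" ")
        if keyword not in {"Given", "When", "Then", "And"}:
            continue  # skip 'Scenario:' headers and blank lines
        for pattern, func in STEPS:
            match = pattern.match(phrase)
            if match:
                func(ctx, *match.groups())
                break
        else:
            raise LookupError(f"no step matches: {phrase!r}")
    return ctx


SCENARIO = """
Scenario: Removing items from a cart
  Given a cart with 3 items
  When the user removes 1 item
  Then the cart holds 2 items
"""

if __name__ == "__main__":
    print(run_scenario(SCENARIO))  # {'cart': 2}
```

The `LookupError` branch corresponds to the "not implemented in the test-side code yet" case: the scenario can be written first, and the missing mapping function added afterwards.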

There's a variant of Gherkin specifically designed for writing UI tests for web apps that also looks quite interesting. And because it's an old ecosystem there's lots of tooling around it.

Another thing I've found works well is asking the models to review every spec simultaneously and find contradictions. I've built myself a tool that does this and highlights the problems as errors in IntelliJ, like compiler errors. So I can click a button in the toolbar and then navigate between paragraphs that contradict each other. It's like a word processor but for writing specs.

Once you're doing spec driven development, you don't need to write prompts anymore. Every prompt can just be "Update the code and tests to match the changes to the specs."

I agree, Cucumber works really well with LLMs.

> I use a mix of Markdown and Gherkin

Gherkin also has a Markdown based syntax that is not well known:

https://github.com/cucumber/gherkin/blob/main/MARKDOWN_WITH_...

I prefer that to the 'verbose' original syntax. MDG also renders nicely in code forges.

The problem with gherkin is that it was a badly designed language.

The general idea of "readable specification language" was an inspired one but it failed on execution - it has gnarly syntax, no typing and bad abstractions.

This results in poor tests which are hard to maintain and diverge between being either too repetitive to be useful or too vague to be useful.

The ecosystem is big but it's built on crumbling foundations which is why when most people used it most of them got frustrated and gave up on it.

Annoyingly there's a certain amount of gaslighting around it too ("it didnt work for you coz you werent using it correctly") which is eleven different kinds of wrong.

I solved this five months ago with recursive-mode: recursive-mode.dev/introduction
Thanks! Looks great, bookmarked.
Curious to hear how it works for you.
Jira is only a set of changes though. What happens on a long (10+ year) and complex (10+ developer) project with many changes and revisions? Eventually you need an explicit specification that itself has a "current state", and a change log. Theoretically you could generate this from Jira, but in my experience it eventually became a mess on any larger project that didn't have explicit and maintained written requirements.
Jira has current state and a change log. The proposal here is "use yaml instead of jira." Same damn thing, same damn mess.
What about when you migrate away from Jira, or when there’s a Cloudflare outage?
1) export 2) backup
> will always care what the spec says, and that’s never going to change

Did I miss something or is everyone back in the 1970s, working in waterfall processes now?

All through the agile era I wrote detailed specs for projects and then followed an agile process. The most successful parts of every project were the ones that we were able to spec best even when they diverged significantly from the original spec.

You don't plan to follow the plan. You plan in order to understand the whole problem space. Obviously no plan survives contact with reality.

> You plan in order to understand the whole problem space.

I like to do spikes to understand problem spaces before planning. The planning is then usually effortless and just to get in sync with stakeholders.

But in that regard AI coding is really backwards. We don't necessarily need hard separation of planning and coding, but we need a deliberate separation of experimental/explorative coding and the code that is supposed to make it into prod. AI coding does all that in the same place, I don't even want to know how hard it is to "fix" AI code that started on behalf of a completely wrong premise. AIs certainly don't have a good measure when to refactor something completely messed up.

> We don't necessarily need hard separation of planning and coding, but we need a deliberate separation of experimental/explorative coding and the code that is supposed to make it into prod. AI coding does all that in the same place ...

This is a very good point. The AI speedup some PMs fantasize about is skipping planning and instead generating code directly from end user discussions, POC-ing our way into shipping.

Agree!

Another point of view is that LLMs perform, to an extent, on the same level as outsourcing does. This interface requires a bit more contract mass than doing everything within a single team.

"Plans are worthless, but planning is essential."
Pointless nit, but replace "essential" with "invaluable" for a play with words.
I like wordplay too, but in this case it'd have risked muddling the lone point I was making. Doubling down on tangents, since you like wordplay I bet you'd enjoy https://tiledwords.com/ which has been posted on HN a couple times.
We never left waterfall in the end. Working with and for dozens, collaborating with probably a hundred software companies in different scales, every single one said:

We do agile

Guess what? Every single one of them was doing waterfall.

Their agile included preplanning and pre-specifying the full spec and each task, before the project kicked off. We'd have meetings where we'd drill down into tasks, folks would write them down so detailed that there would be no other way than doing that. Agile would be claimed, but the start date, end date, end spec and number of developers was always concrete.

Sometimes, the end date was too late, so a panic would ensue. Most of the time, the date was too late because developers had "unknowns" which then had to be "drilled down and specced so they wouldnt be unknowns". Sometimes, nearly 50% of the workweek was spent on meetings.

A few times, a project was running late - so to make sure we are _really_ doing it agile, we'd have morning standups, evening standups, weekly plannings, retrospectives, and backlog refinement. It would waste the time, and the "unknowns" aka "tickets to refine" were again, as always, dependent upon the PM/PO/CEO's wishes, which wouldn't get crystallized until it was _really last minute_.

One customer wanted us to do a 2 year agile plan on building their product. We had gigantic calls with 20+ people in them, out of which at least half had some kind of "Agile SCRUM Level 3 Black belt Jirajitsu" certificates.

To them, Agile was just a thing you say before you plan things. Agile was just an excuse to deal with project being late by pinning it on Agile. Agile was just a cop out of "PM didn't know what to do here so he didnt write anything down". Agile was a "we are modern and cool" sticker for a company.

And unfortunately, to most of them, agile was just a thing you say for the job, as their minds worked in waterfall mode, their obligations worked in waterfall mode, companies worked in waterfall mode, and if they failed their obligation to the waterfall, their job would go down one.

So while we were doing the Agile ceremonies, prancing around with our Scrum master hats, using the right words to fit into the Agile™ worldview - we were doing waterfall all along.

And after 15 years, I'm not even sure - did agile really ever exist?

Continuous integration and demos to stakeholders (devs, designers, product managers etc) every 2 weeks - these practices are now engrained :-) It's frequent to then do corrections after these demos, and that really helps ensure the product manager is getting what their customers need.

Easy to forget waterfall in 1970s / 80s really meant teams working on their own for months and then realizing there is no way to assemble the whole product from the parts. Or that the industry has moved on and the product is obsolete.

Agile as "devs can do what they want" never really existed ;-) Managers always have to plan / T-Shirt size resources (time, devs) to some degree. For stuff that's really hard to break into tasks, the magic word is "the plan is to do a POC first".

Coming from someone who also doesn't like teams being asked to break their unknowns into 30 known tasks. It's a compromise... I agree with all your points on how Agile is abused / misunderstood. Yet I believe in the progress from continuous integration and regular demos to stakeholders as a sign we did change something....

Most companies don't do that much of a regular demo to customers anyways - turns out most customers aren't even interested for the first 30-50% of the project, then they become mildly amused, until the final 80% - that's when they start getting incredibly interested and opinionated.

> Agile as "devs can do what they want" never really existed ;-)

No real agile ever really exists in the end :)

But it's not devs not doing "what they want" that bothers me - it's the absurdly over-planned project estimates and timelines, with every detail of the project being specced out, not a lot of margin room for errors, invoking the name of "agile principles" as a way to deal with exactly things the PM's don't want to deal with in that moment.

I'd be fine with some degree of planning ahead, or starting with prototypes/PoC's, but such a huge part of the industry just chunks it into "same boat but we'll put agile stickers on the holes", and there is a whole industry of ceremonies around it, that it breaks the "core principles" of agile.

What a beautiful irony we have built :)

Specs weren’t the problem with waterfall. The difficulty in changing them to match reality was.

The waterfall process I experienced went like this:

- Product folks created requirements

- architects produced detailed specs

- project managers created tickets based on specs

- lengthy estimation ensued.

- Then finally developers proceeded with implementation.

- QA tested it.

Each step above involved lengthy review with like 5-10 people. If the devs found an issue with the spec or, god forbid, the requirement, it triggered a massive cascade of work for everyone above. Things needed to be reviewed again, customers may need to get contacted, …etc.

I think we can learn from that and optimize for change. Specs as living documents close to the code should be less cumbersome. But, just like anything else large corporations will probably fumble this like they did with “agile” (SAFe I am looking at you).

This is a long way to say specs aren’t bad. Specs that are difficult to change are though.

Sort of, but the downside of waterfall was you build the wrong thing and waste a shitload of time rewriting it.

When rewriting the entire codebase is very quick and cheap, why bother iterating on small components?

> When rewriting the entire codebase is very quick and cheap, why bother iterating on small components?

We are nowhere near this scenario tbh. Token cost is very high and is currently heavily subsidized by VC money to gain market share. Also this realistically only applies to small projects, small codebases and mostly greenfield ones. No way you can rewrite the whole codebase quickly and cheaply in any mid-sized+ project.

But even assuming token cost plummets, any non-trivial piece of software that is valuable enough to generate income for the company is also big, complex and interconnected enough that it cannot be rewritten quickly even by AI, for business reasons as well. If a piece of code works, is stable and is tested, then rewriting it will always bring a high degree of risk and uncertainty that in a lot of business-critical applications is just not worth it. A stable system can stay untouched for years besides minor dependency updates.

waterfall is not the sole purveyor of written docs

distributed teams do well when proposals, decisions, etc, are written down, and can be easily found and referenced

it doesn't mean docs are frozen in time and can't be patched like code

I read that as "the business caring about what the spec says will never change" rather than "the spec will never change".
waterfall doesn’t mean writing down decisions
Nice! Your spec-maxxing is very resonant. I've been working with explicit requirements: elicit them from conversation with me or by introspecting another piece of software; one-shot from them; and keep them up-to-date as I do the "old man shouts at Claude" iterations after whatever one-shotting came up with.

Unlike you, I wish for the LLM to do as much of the work as possible -- but "as possible" is doing a lot of work in that sentence. I'm still trying to get clear on exactly where I am needed and where Opus and iterations will get there eventually.

It has really challenged me to get clearer on what a requirement is vs a constraint (e.g., "you don't get to reinvent the database schema, we're building part of a larger system"). And I still battle with when and how to specify UI behaviours: so much UI is implicit, and it seems quite daunting to have to specify so much to get it working. I have new respect for whoever wrote the undoubtedly bajillion tests for Flutter and other UI toolkits.

Forgot to add: I get several benefits from doing this.

1. Specifications that live outside the code. We have a lot of code for which "what should this do?" is a subjective answer, because "what was this written to do?" is either oral legend or lost in time. As future Claude sessions add new features, this is how Claude can remember what was intentional in the existing code and what were accidents of implementation. And they're useful for documenters, support, etc.

2. Specifications that stay up to date as code is written. No spec survives first contact with the enemy (implementation in the real world). "Huh, there are TWO statuses for Missing orders, but we wrote this assuming just one. How do we display them? Which are we setting or is it configurable?" etc. Implementer finds things the specifier got wrong about reality, things the specifier missed that need to be specified/decided, and testing finds what they both missed.

I have a colleague working on saving architecture decisions, and his description of it feels like a higher-abstraction version of my saving and maintaining requirements.

Specifications don't tell you what to do; they say what the end state should be. In between, you need a codebase analysis step and an implementation plan.

My recursive-mode workflow handles all of that and more and gives you full traceability: https://recursive-mode.dev/introduction

I do (1) the same but (2) differently. In my workflow, (2) are AI-generated specs using human-written (1) as the input. It's an intermediate stage between (1) and the codebase, allowing for a gradual token expansion from 30k to 250k to the final code, which is 2-3M. The benefit I've found with this approach is that it gives the AI a way to iterate on the details of the whole system in one context window, whereas fitting the whole codebase into one prompt is impossible. The code is then nothing more than a style transfer from (2).
Let's cut through the noise - what did you build with this very elaborate process and how much ARR is it generating?
Asking the real questions. I would also really like to know how much value AIs are bringing in terms of ARR or MRR.
Jfc
So what I'm building is a GitHub clone with epics/issues/kanban + specs/requirements/standards + CI/testing/coverage, with the idea that all of those things connect, so issues + requirements + testing all interact via code + web UI + CLI. The point is that we can specify how a product is to function and the steps to get there, and it's less a matter of telling a person or an LLM to read and implement the spec and more a matter of software actually keeping track at all times.
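To make "software actually keeping track" a bit more concrete, here is a minimal sketch of linking requirements to test outcomes. It is not the tool described above: the `Requirement` and `CheckResult` types, the `R1`-style IDs, and the `trace` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    """One acceptance criterion. The ID scheme is an assumption."""
    req_id: str
    text: str


@dataclass
class CheckResult:
    """A test outcome tagged with the requirement IDs it claims to cover."""
    name: str
    covers: list[str]
    passed: bool


def trace(requirements, results):
    """Return a status per requirement: 'untested', 'failing', or 'passing'."""
    status = {r.req_id: "untested" for r in requirements}
    for result in results:
        for req_id in result.covers:
            if req_id not in status:
                continue  # test points at a requirement that is not in the spec
            if not result.passed:
                status[req_id] = "failing"  # any failing test wins
            elif status[req_id] == "untested":
                status[req_id] = "passing"
    return status
```

A requirement counts as passing only if at least one test covers it and none of the covering tests fail, so anything still marked "untested" is a gap between the spec and the test suite.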
Why not just record the conversation? It contains all that is needed: the initial at-large scoping, the failed attempts at doing x and not y, how that specific line of code solves that specific edge case, etc.

When it’s time to review, review both code and conversation. 200 “user written messages asking why and what”? Likely a good PR. 15 “yes, yeah, ok, whatever”? Well you might want to give that PR some love.

It feels to me that when we commit, we throw away half, if not most, of the work done by not recording it.
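The review heuristic above could be roughed out like this. It is only a sketch under loud assumptions: the `LOW_EFFORT` word list, the `0.5` threshold, and both function names are made up for illustration, and the input is assumed to be a plain list of the user's messages from the recorded conversation.

```python
# Hypothetical list of bare acknowledgements; tune to taste.
LOW_EFFORT = {"yes", "yeah", "ok", "okay", "whatever", "sure", "lgtm"}


def engagement_ratio(user_messages):
    """Fraction of user messages that are more than a bare acknowledgement."""
    if not user_messages:
        return 0.0
    substantive = 0
    for msg in user_messages:
        normalized = msg.strip().lower().rstrip(".!")
        if normalized not in LOW_EFFORT:
            substantive += 1
    return substantive / len(user_messages)


def needs_extra_review(user_messages, threshold=0.5):
    """Flag a PR whose conversation is mostly rubber-stamping (assumed cutoff)."""
    return engagement_ratio(user_messages) < threshold
```

A string match like this is crude, but it is enough to sort PRs into "read the conversation first" and "probably fine" piles before a human looks at either.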

The exact reason you should start with one first. I support maximum specification. Atomic if you will. Lucky for modern development, you don't have to write it, you just have to proof it. If you can read a spec, it can guide your development, might as well have a system to manage them.
I think my use case is a little different than yours - I’m wanting to use it as a framework for me and an agent to manage specs, then split implementation into a separate agent fleet and use the spec as the interface layer - but the tool itself looks great and should work well!
I actually read it all since it did not contain any hints of being AI generated (although I wouldn't be surprised to learn you did use AI to write it), so thank you for that. It's kind of crazy how I now have the default expectation that posts posted here are AI slop with little thought or care put in.

I am also stealing the idea of talking to LLMs as if it's an email. So funny, we need to be joymaxxing a bit more I think :)

Beyond writing the spec down, you can share the spec or use someone else's spec. That's why spex.build was created, to be a hub with versioned specs so people can just create their own implementations, in the language, style, and particulars that they want.
So basically, talk with a rubber duck, but record the conversation
Great idea -- just one suggestion if you want it to catch on: perform some IncelCultureMinning on the nomenclature.

You probably don't want people associating your work with abusing crystal meth and hitting yourself in the face with a hammer.

For anyone missing the reference, SNL has a pretty good explainer:

https://www.youtube.com/watch?v=4XMPLdiXB1k