It would be good to see a real example. There’s a sketch of one in the README.md but I’d be interested to see how it works in real life with something complicated.
> Add users with authentication
> No, not like that
> Closer, but I don’t want avatars
> I need email validation too
> Use something off the shelf?
Someone in this place was saying this the other day: a lot of what might seem like public commits to main are really more like private commits to your feature branch. Once everything works you squash it all down to a final version ready for review and to commit to main.
It’s unclear what the “squash” process is for “make me a foo” + “no not like that”.
Yeah the squash question is the whole thing. If your commit history is "do X" -> "no, not like that" -> "closer" then your final commit message is just "do X" with no trace of why certain approaches were rejected. Which is arguably the most useful part of the conversation.
If you give a compiler the same source code and the same options, it should generate the same output every time, provided you aren't using compiler pragmas or something similar that embeds timestamps or random numbers. If you give an LLM the same input, it can generate different outputs (controlled by the temperature setting).
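A toy sketch of the temperature point, assuming a made-up three-token vocabulary and a hypothetical `sample_token` helper (this is not how any particular model is served): at temperature 0 the choice collapses to an argmax and repeats exactly, while at temperature 1.0 the result depends on the random state.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick an index from logits; temperature 0 is treated as greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Scale logits, then apply a numerically stable softmax as sampling weights.
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.5, 0.3]  # hypothetical scores for three candidate tokens

# Greedy decoding ignores the random state entirely.
greedy = {sample_token(logits, 0, random.Random()) for _ in range(100)}

# Sampling with a different seed per call stands in for "run the prompt again".
sampled = [sample_token(logits, 1.0, random.Random(seed)) for seed in range(200)]

print(greedy)             # a single index
print(len(set(sampled)))  # more than one distinct index
```

Pinning the seed makes the toy repeatable too, which is the compiler-like case; whether a hosted model exposes that level of control is a separate question.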
I'll be charitable here but you need to go out of your way to introduce non-determinism. Bit reproducible builds and distros exist so it is possible to have an entire distro that can be reliably reproduced bit-by-bit on different systems and at different times.
It's the other direction. You have to put in an extreme amount of effort, like Debian has, which you can just piggyback off of, to cause determinism to be introduced. 2013 they started that initiative. They're reasonably there, thirteen years later, but to disregard the amount of effort it took to get there would be to forget history. Give ChatGPT thirteen more years to iterate, and see where it is then.
Folks who bring up these "gotchas" should be forbidden from using or taking advantage of the things they are disingenuously whataboutism-ing. Reminds me of sovereign citizen behavior.
It is not clear to me that keeping prompts/conversations at something like this level of granularity is a _bad_ idea, nor that it's a good one. My initial response is that, while it seems cute, I can't really imagine myself reading it in most cases. Perhaps though you'd end up using it exactly when you're struggling to understand some code, the blame is unclear, the commit message is garbage, and no one remembers which ticket spawned it.
In my CLAUDE.md, I have Claude include all new prompts verbatim in the commit message body.
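For anyone wondering what that looks like, here is a rough sketch of a commit message with the prompts kept verbatim in the body. The layout and the `build_commit_message` helper are assumptions for illustration, not something Claude or git prescribes.

```python
def build_commit_message(subject, prompts):
    """Return a commit message: subject, blank line, then each prompt verbatim."""
    lines = [subject.strip(), ""]
    for number, prompt in enumerate(prompts, start=1):
        lines.append(f"Prompt {number}:")
        # Indent so multi-line prompts stay visually attached to their header.
        lines.extend("    " + line for line in prompt.splitlines())
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"

message = build_commit_message(
    "Add users with authentication",
    ["Add users with authentication", "Closer, but I don't want avatars"],
)
print(message)
```

The same shape would survive a squash: the rejected directions stay in the body even when the intermediate commits are gone.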
While I haven't used Claude long enough to need my prompts, I would appreciate seeing my coworkers' prompts when I review their LLM-generated code or proposals. Sometimes it's hard to tell if something was intentional that the author can stand behind, or fluff hallucinated by the LLM. It's a bit annoying to ask why something suspicious was written the way it is, and then they go ahead and wordlessly change it as if it's their first time seeing the code too.
Huh? Either I don't get it, or they don't get it, or both. I'm so puzzled it's probably both.
> Every ghost commit answers: what did I want to happen here? Not what bytes changed.
Aren't they just describing what commit messages are supposed to be? Their first `git log --oneline` output looks normal to me. You don't put the bytes changed in the commit message; git can calculate that from any two states of the tree. You summarize what you're trying to do and why. If you run `git log -p` or `git show`, then yeah you see the bytes changed, in addition to the commit message. Why would you put the commit messages in some separate git repo or storage system?
> Ghost snapshots the working tree before and after Claude runs, diffs the two, and stages only what changed. Unrelated files are never touched.
That's...just what git does? It's not even possible to stage a file that hasn't changed.
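To make the before/after claim concrete, here is a minimal sketch of what "snapshot, diff, stage only what changed" could mean. It is not Ghost's actual code; `snapshot` and `changed_paths` are hypothetical names. The one case where this differs from a plain `git add -A` is a file that already had uncommitted edits before the agent ran: its hash is identical in both snapshots, so it stays out of the list.

```python
import hashlib
import tempfile
from pathlib import Path

def snapshot(root):
    """Map each file's relative path to a SHA-256 of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def changed_paths(before, after):
    """Paths that were added, removed, or modified between two snapshots."""
    return sorted(
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app.py").write_text("print('hi')\n")
    (root / "notes.txt").write_text("unrelated\n")
    before = snapshot(root)

    # Stand-in for the agent's run: one file modified, one file created.
    (root / "app.py").write_text("print('hello')\n")
    (root / "auth.py").write_text("# new file\n")
    after = snapshot(root)

    to_stage = changed_paths(before, after)

print(to_stage)  # ['app.py', 'auth.py']
```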
> Every commit is reproducible. The prompt is preserved exactly. You can re-run any commit against a fresh checkout to see what Claude generates from the same instruction.
This is not what I mean by reproducible. I can re-run any commit against a fresh checkout but Claude will do something different than what it did when they ran it before.
It's surely supposed to be the perfect bridge between both - but we know in practice it often isn't.
I'd also quibble minorly with intent being what the code should be clearly communicating - it's its place and function and meaning within the wider system, which isn't necessarily what the particular programmer wished. Free-form English is a good medium for intent - not so much for "what this thing actually does"
I tried maintaining chat history and a summary in a 'changes' dir in the repo. Claude creates an md file before committing (timestamp.md; a commit hash doesn't work as a filename because of rebase/squash).
I had to stop doing this because it greatly slowed down and confused the model when it did a repo search and found results in some old md files. Plus token usage went through the roof.
So keeping changes in the open like that in the repo doesn't work.
Not sure how tfa works, but hopefully the model doesn't see that data.
I don't know. I get the idea that it's like committing C code that then gets compiled to machine code when someone needs the binary, but what if the prompt isn't complete?
For any formal language, there was a testing and iteration process that resulted in the programmer verifying that this code results in the correct functionality, and because a formal compiler is deterministic, they can know that the same code will have the same functionality when compiled and run by someone else (edge cases concerning different platforms and compilers notwithstanding).
But here, even if the prompt is iterated on and the prompter verifies functionality, it's not guaranteed (or even highly likely) to create the same code with the same functionality when someone else runs the prompt. Even if the coding agent is the same. Even if it's the same version. Simply due to the stochastic nature of these things.
This sounds like a bad idea. You gotta freeze your program in a reproducible form. Natural language prompts aren't it; formal language instructions are.
I noticed in the README that each commit message includes the agent and model, which is a nice start toward reproducibility.
I’m wondering how deep you plan to go on environment pinning beyond that. Is the system prompt / agent configuration versioned? Do you record tool versions or surrounding runtime context?
My mental model is that reproducible intent requires capturing the full "execution envelope", not just the human prompt + model & agent names. Otherwise it becomes more of an audit trail (which is also a good feature) than something you can deterministically re-run.
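A sketch of what recording that envelope might look like, under the assumption that it is just a handful of fields hashed into one fingerprint. The field names and the `ExecutionEnvelope` class are hypothetical and are not taken from the README.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field

@dataclass(frozen=True)
class ExecutionEnvelope:
    """Everything that was pinned for one agent run (hypothetical field set)."""
    prompt: str
    model: str
    agent: str
    system_prompt: str
    temperature: float
    tool_versions: dict = field(default_factory=dict)

    def fingerprint(self):
        # Canonical JSON (sorted keys) so logically equal envelopes hash the same.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

base = ExecutionEnvelope(
    prompt="Add users with authentication",
    model="example-model-v1",   # placeholder name
    agent="example-agent",      # placeholder name
    system_prompt="You are a careful coding agent.",
    temperature=0.0,
    tool_versions={"git": "2.x", "python": "3.x"},
)
same = ExecutionEnvelope(**{**asdict(base), "tool_versions": {"python": "3.x", "git": "2.x"}})
drifted = ExecutionEnvelope(**{**asdict(base), "system_prompt": "You are a fast coding agent."})

print(base.fingerprint() == same.fingerprint())     # True
print(base.fingerprint() == drifted.fingerprint())  # False
```

A matching fingerprint would only say the inputs were the same, not that the output will be; that is the audit-trail end of the spectrum rather than a deterministic replay.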
That’s fair - strict determinism isn’t possible in the traditional sense. I was thinking more along the lines of bounded reproducibility.
If the model, parameters, system prompt, and toolchain are pinned, you might not get identical output, but you can constrain the space of possible diffs.
It reminds me a bit of how StrongDM talks about reproducibility in their “Digital Twin” concept - not bit-for-bit replay, but reproducing the same observable behavior.
I like this concept because everyone's thought of "commit the agent prompts and reproduce everything from scratch every time" as a "dumb idea", but I'm unsure if anyone has actually executed on it in a snappy git-like UI.
Now, because the author took the time to work on it, we can see if this is actually a better method of software development. If LLM development continues deflating the cost of quality software, maybe this will turn out to be the future.
You don't need to be committing the prompts you're using. There's a whole bunch of back and forth in the prompts as you refine. That's not useful information.
What you should do, is use the context window that you've got from writing the code and refine that into a commit message using a skill.
I start with a conversation and then ask the coding agent to write a design doc. It might go through several revisions. The implementation might also be a bit different if something unexpected is found, so afterwards I ask the agent to update it to explain what was implemented.
This naturally happens over several commits. I suppose I could squash them, but I haven't bothered.
I love this idea although not sure I’d be comfortable with the level of steering control I would get without trying it for real! What would be even better would be to unshittify my poorly written commit message into a beautiful detailed commit message. We can still keep the original in the footnote if we have to.
I'm wondering if this is what I've been thinking of as "prompt source code": the prompts you use, viewed as source code, for producing whatever code actually comes out...
so people can look at what prompts you used to get whatever code you have generated
Poe’s Law applies here. “This is a pretty good parody of vibe coding” I thought, but then I didn’t see example commits like “update heart pacer firmware and push to users”.
I think I'd be more interested in a new worktree job type CLI. At the end of the day I don't want to be reverting commits that were clear mistakes, no matter how good AI will be in the near future.