|
let to unwrap, let me try to do it i misunderstood "wall of text" (i was thinking about bloating repo with it), my solution to understanding is just to create ad-hoc tools to parse the json i coded a web ui with simple toggles: show me what user said, what llm said (nice to see what I was thinking about, nice to see how LLM came up with solution X, you get tools calls, maybe it found something i didn't think about or viceversa)
you can search/grep (.ie: did i consider idempotency when i build feature X? open session, search/grep idempotency)
you can, up to some point, resume the conversation (yes i know, cache busting makes some usages of this impractical, but in general resuming and asking "when we did this, did we think about that" tends to work... let's say that research is ok, time travel, meh) overall, one of the advantages of LLMs is to be able to direct then ad data for insights, via standard CLI tools, via specific prompts, or building some mini tools (yeah vibe coding is fine sometimes)
whatever my question, if i have data i can have an answer LFS helps with the second aspect (buried bullshit). Unless you smudge, you have a pointer, and that is just 3 lines.
You need to learn some ergonomics, but ok, some of us learnt how to use Jira XD Taking your position a bit further, yes, committing chat sessions implies that you also need to review them so that bullshit doesn't filter through. Milage varies based on your personal preferences, which project you are working on, and many more heuristics.
Some will find it boring, some will think it's good project maintenance, all should be able to find a way to handle this based on their preference. It is also nice to pointout that cleaning bullshit doesn't need to happen at merge. LFS blobs being stored separately, you can have side flows helping you out, without clogging yoou CI pipelines. "no dude"-> rules
you can put down SOME rules
usually this happens to me at PR. I am tired fo saying "you should always check X", so i bolt it down "someplace".
I am running the usual motions i suppose most of us try to adopt: put this down in agents.md, in folder x or y, in path-scoped rules using agents, in memory files (i am exporting/importing those too), in subagents that review code before PRs. in the end it's an unsolved problem at large, but
1. hopefully it will get better, my feeling is that it's just a cambrian explosion, and the fittest will survive... (also, owning the harness should help, i suppose .. i use claude code :D )
2. in a team, having personal styles surface is valuable. "dude don't do that" is quite often .... design. When rules go in the repo, at least we can find an agreement in person, and at PR is not about linking a document you read at onboarding, but finding out why the agent did not respect the rule. To me, that is more grounded in a tool.
3. rules are ... not static? We change our minds? We get better at things? We want to experiment? I am not advocating for a perfect rule system that replaces me, but for a good enough one that removes cruft from my daily job. I think my approach is actually helpful when it's time to find root causes (YMMV). Via tools that parse sessions, you can see when that specific portion of code has been written with a better granularity. During that bit of the conversation the user was worried about X and asked AI to do Z and AI read this and that file and "thought" this and that and wrote that piece of code.
Maybe the user was making wrong assumptions, maybe the LLM did not read the correct files or instructions, in any case you have a better tool for investigation.
It's up you to decide wether to use it, wether this lead to just solving the bug or also fixing instructions, etc, i am just saying that it actually helps to have some measure of the context on which this change made sense. "Fluffy wall of text that looks good but is factually wrong. " it might be good or bad, right or wrong, but what is in the sessions is the truth of what happened.
PR desc are horrible, i share that feeling with you, but having the story of how that thing happened is just not the same as "final summary of what we did in the past X hours". As a sideline: LFS doesn't really pollute your repo once you get to learn its ergonomics. Having chats in LFS also lets you approach this Reproducibility...
To me those conversation are basically the history for decisions taken while implementing. They are documentation.
The real problem with docs was that no-one has ever liked writing them, nor it was easy to implement a standardization around them.
If you just record/log, there';s no extra effort needed, and once there tools and LLMS are pretty good at helping us extract insights. I am also assuming, there is a correlation between quality in the conversation and in the code. I know, i'm being hand-wavey, but overall i think critical thinking is what makes code better, and being able to see if/which it has been applied can be a good proxy.
I ask for forgivness already: I not going down the rabbit hole of quantifying quality etc. It's a broad statement that should be taken with a grain of salt. If you want to go abstract, you can think of coding as going from thoughts to 0-1 in bits. We have high(er) level languages that help us organize thoughts to help us so that we can better keep them in our cognitive flow/load.
LLMs are an upper layer, that scrambles the code and make it more difficult to grasp.
But the reasoning behind the code is now available, and quite easy to parse.
I think this is the core point to me.
Code is an intermediate artifact between thinking and bits.
Now we have a second artifact: the conversation/decision that led to that code.
Why are we not storing it? disclaimer:
I am, of course, mildly in love with my own project and ideas, so possibly i like this too much just because i built it. IKEA effect or whatever. |
Just very recently, I saw a PR comment on why someone was choosing to do something in that particular way and what the other bad options would've been, i.e. justfying thei choice (at least they did do the "calling out" part. I had to comment about how none of that made any sense to me and why we didn't just do "other thing Y". Well turns out the AI had misled them, they believed it and it went downhill into a rabbit hole from there. I do believe that w/ the right spidey senses, even in an "unknown situation", it's entirely possible to come out the other end. But many if not most people succumb to the AI's nice and "sounds true" type language.
LFS doesn't. Walls of text do, whether you use LFS or not. I.e. Nobody's really gonna read all that. The only way to get through it is to use LLMs, e.g. through summarization. That doesn't solve anything though. LLM summaries are very often wrong. Depends on the text/conversation and the LLM but have you tried slack summarizing a thread? Ouch! I've also tried Claude making tickets from slack threads. Ouch but less so. Still needs polishing. And more time polishing it than it would've required from myself to just type up the ticket myself. What LLMs are good at is if you put the actual "meat" down and they "fluff it up". But sorry, I'd rather juts have the meat and skip the fluff entirely.Most LLM assisted bug reports on the other hand are huge walls of text with low signal to noise ratio. I.e. essentially the old
Famously the first known instance in the English language apparently was a sentence translated from a text written by the French mathematician and philosopher Blaise Pascal. The French statement appeared in a letter in a collection called "Lettres Provinciales" in the year 1657. It totally absolutely 150% applies to LLM use ;) Absolutely! And the issue with LLMs is that they tend to make it less likely for people to apply critical thinking. Even from people that (I at least thought) applied it in the past. "Does ChatGPT harm critical thinking abilities? A new study from researchers at MIT’s Media Lab has returned some concerning results." https://time.com/7295195/ai-chatgpt-google-learning-school/Btw, I write all of this as someone that has been coding exclusively w/ the use of Claude Code and Codex for more than 6 months now. On purpose.