If they can be generated from the source they probably shouldn't be in the source. Maybe it should be an IDE plugin that displays comments for code as you hover over it.
I used to think that too, even as recently as the last time I saw someone posting how they made GPT-3 write commit messages automatically.
However, there are a couple of reasons why I think this may be a valid use case:
- Code is a static serialization artifact. It's unfortunate we still work with it directly, but that's another conversation; fact is, we are working with it directly, and if there is commentary relevant to the code, it's best placed in the code, so it remains available if the tools generating it dynamically (like your "on-hover" IDE plugin idea) become unavailable or stop working (or start generating different outputs after some update).
- It's true that the comments should talk about "why" much more than "how", as the "why" is often not apparent in the code itself. However, "not apparent" doesn't mean "independent" - the code structures and the reasons for their existence are correlated. GPT-4 may have seen enough and be smart enough to actually spot those - as if it was a developer familiar with the problem space, going all "yeah, I've seen this type of code before - it's likely trying to ${high level goal}". Generating those comments would be valuable.
Of course, generated comments should be reviewed and edited at the point of generation, to make sure they're accurate and reference actual documents/tickets/design decisions. Whether or not an average developer will do that - that's another topic.
Multiple different views and representations at every level - e.g. syntactic (coding to AST nodes, not characters), structural (imagine having editable class/module outline), semantic (editable views optimized for specific abstractions that allow you to e.g. edit anything resembling a state machine in a state-machine-specific view).
Views that give you a vertical slice through code, auto-inlining function calls when you need it, allowing you to read and edit a large block of code, and propagating changes to their respective source locations - this is the solution for the "lots of tiny functions vs. few large ones" "clean code" pseudo-problem.
Views that let you control focus. No more constant dealing with exceptions vs. Result<T, E>, 50 shades of async, or how to mix logging into it all - problems to which the modern solution seems to be all kinds of monadic bullshit that makes code completely unreadable, unless the language itself gives you some arcane syntax and semantics to hide it all. Instead, have your IDE hide all the things you don't care about - aka. cross-cutting concerns - from your view.
For example, are you working on the business logic, and focusing on what the code is trying to achieve, aka. the golden/success path? Have your IDE hide all error handling for you. Turn all the Result<T, E> return types into just T, making it look as if the code was using exception handling (and doing the handling somewhere else). Then do some vertical-slice auto-inlining to make a specific functionality more apparent. Too noisy with logging code? Turn display of that off. Conversely, if you're interested in error propagation, turn display of all the business code off.
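A minimal sketch of the "turn display of logging off" idea, using Python's standard `ast` module. Everything here is an assumption made for illustration: the sample `transfer` function, the `HideLogging` transformer, and the rule that a "logging statement" is any bare call on a name like `log`, `logger` or `logging`. A real IDE view would work on a richer model and would stay editable; this only renders a filtered, read-only copy.

```python
import ast

# Hypothetical sample code with logging mixed into the business logic.
SOURCE = '''
import logging

log = logging.getLogger(__name__)

def transfer(src, dst, amount):
    log.info("transfer requested: %s", amount)
    if src.balance < amount:
        log.warning("insufficient funds")
        raise ValueError("insufficient funds")
    src.balance -= amount
    dst.balance += amount
    log.info("transfer done")
'''


class HideLogging(ast.NodeTransformer):
    """Drop statements that are bare method calls on a logger-like name."""

    def __init__(self, logger_names=("log", "logger", "logging")):
        self.logger_names = set(logger_names)

    def visit_Expr(self, node):
        call = node.value
        if (
            isinstance(call, ast.Call)
            and isinstance(call.func, ast.Attribute)
            and isinstance(call.func.value, ast.Name)
            and call.func.value.id in self.logger_names
        ):
            return None  # returning None removes the statement from its block
        return node


def view_without_logging(source):
    """Render the source with logging statements filtered out.

    Caveat: a block consisting only of logging calls would end up empty,
    which this sketch does not handle.
    """
    tree = HideLogging().visit(ast.parse(source))
    return ast.unparse(tree)


view = view_without_logging(SOURCE)
print(view)
```

The same pattern, with a different predicate, could hide error handling or business code instead; the hard part the comment is really asking for is making such a view editable and mapping edits back to the full source.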
(Think of it as Aspect-Oriented Programming on steroids, in an interactive form.)
This and much, much, more. It starts with a simple idea though: stop thinking in terms of source code as text in files. Start looking at it as semantic units (classes, functions, statements, expressions) in a database of some kind. Instead of opening a file and editing its text, you would query the database to get an abstract code graph, and feed it to a view that renders it the way you need it. "SELECT Foo Bar from Classes, JOIN Fields, JOIN Methods", feed it to an editable outline view. "SELECT" whatever else you care about, feed it through some transformer, to a different custom view. Edit it, and have it automatically apply changes/"refactorings" to affected code.
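As a rough illustration of the "query the database, feed it to a view" idea, here is a sketch that indexes Python classes and methods into an in-memory SQLite database and runs a literal JOIN over them. The table layout (`classes`, `methods`), the helper names `build_code_db` and `outline`, and the sample classes are all made up for this example; nothing like this schema is implied by the comment above.

```python
import ast
import sqlite3

# Hypothetical code base, reduced to one string.
SOURCE = '''
class Foo:
    def load(self): ...
    def save(self): ...

class Bar:
    def render(self): ...
'''


def build_code_db(source):
    """Index classes and their methods into an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE classes (id INTEGER PRIMARY KEY, name TEXT, lineno INTEGER)")
    db.execute("CREATE TABLE methods (class_id INTEGER, name TEXT, lineno INTEGER)")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            cur = db.execute(
                "INSERT INTO classes (name, lineno) VALUES (?, ?)",
                (node.name, node.lineno),
            )
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    db.execute(
                        "INSERT INTO methods VALUES (?, ?, ?)",
                        (cur.lastrowid, item.name, item.lineno),
                    )
    return db


def outline(db, class_names):
    """SELECT the requested classes JOINed with their methods, in source order."""
    placeholders = ",".join("?" for _ in class_names)
    rows = db.execute(
        "SELECT c.name, m.name FROM classes c "
        "JOIN methods m ON m.class_id = c.id "
        f"WHERE c.name IN ({placeholders}) "
        "ORDER BY c.lineno, m.lineno",
        list(class_names),
    )
    return rows.fetchall()


db = build_code_db(SOURCE)
rows = outline(db, ["Foo", "Bar"])
print(rows)  # [('Foo', 'load'), ('Foo', 'save'), ('Bar', 'render')]
```

The result rows are what an outline view would render. Writing edits back to the code is the part this sketch leaves out.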
And yes, editing raw plaintext is something that's often very efficient and we have well-optimized tools for this. But this doesn't mean the plaintext in question has to correspond 1:1 with source code. Instead, you could have the class/module outline view be editable plaintext, so you could regex-replace half of it in 5 seconds, and then press a button, and it would rename and move methods and classes across the codebase, making it conform to your edited outline. Basically what dired mode does to filesystem in Emacs.
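The dired-style workflow can be sketched in a few lines, again with Python's `ast` module. This is a toy under loud assumptions: the outline is just one top-level function name per line, only renames are supported (no moves, adds or deletes), and the rename is scope-unaware, so a local variable that happens to share a function's name would be renamed too. The names `render_outline`, `ApplyRenames` and `apply_outline` are invented for the example.

```python
import ast
import re

# Hypothetical module to refactor.
SOURCE = '''
def fetch(url):
    return url

def parse(text):
    return text

def main():
    return parse(fetch("x"))
'''


def render_outline(source):
    """One top-level function name per line, in source order."""
    tree = ast.parse(source)
    return "\n".join(n.name for n in tree.body if isinstance(n, ast.FunctionDef))


class ApplyRenames(ast.NodeTransformer):
    """Rename function definitions and every plain reference to them."""

    def __init__(self, renames):
        self.renames = renames

    def visit_FunctionDef(self, node):
        node.name = self.renames.get(node.name, node.name)
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        # Scope-unaware: any identifier with a matching name is renamed.
        node.id = self.renames.get(node.id, node.id)
        return node


def apply_outline(source, edited_outline):
    """Make the source conform to an outline the user edited as plain text."""
    old = render_outline(source).splitlines()
    new = edited_outline.splitlines()
    if len(old) != len(new):
        raise ValueError("this sketch only supports renames, not adds/removes")
    renames = {o: n for o, n in zip(old, new) if o != n}
    tree = ApplyRenames(renames).visit(ast.parse(source))
    return ast.unparse(tree)


# "Regex-replace half of it in 5 seconds, then press a button."
edited = re.sub(r"^(fetch|parse)$", r"\1_page", render_outline(SOURCE), flags=re.M)
result = apply_outline(SOURCE, edited)
print(result)
```

Note that `ast.unparse` discards comments and original formatting, so a serious version would need a syntax tree that preserves them.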
> This sounds like it would enable more complex systems. Is that the goal?
Enabling more complex systems, making it much easier and faster to create safe, stable and efficient systems at current complexity levels - both are really the same goal. Making current complexity level easier to deal with also means you can increase complexity level to the point the work is as difficult as it was before. Your favorite cake suddenly costing half as much means you can save half the cost, or... just buy two.
> The skeptic in me thinks these are fancy bandaids for failure to keep complexity under control.
To me, most of the recent programming language trends are such fancy bandaids. You can't optimize for every possible concern simultaneously in a single plaintext format, but $deity, people try. That's how you get special syntax for Result<T, E> handling (e.g. ?, ?!), or increasingly impenetrable abstractions at the intersection of typing and monads - all because you'd like to represent error handling and logging and futures and a few other things in a maximally easy/readable way, in the same text, at the same time.
You're fighting two limits here - "in the same text" and "at the same time". IMHO, we should give up on both, and accept that the final "single source of truth" form will become some sort of unholy blend between C and Haskell, serving the role of assembly above assembly. Expressing everything in one place, but not casually readable. For day to day work, you would use many specialized representations, each focused on its specific concern, and free from constraints of a single common text format.
> The optimist in me thinks this sounds like a fabulously interesting development experience.
That's what I think too. It's about raising the tooling to meet us at the level we think at, making it work the way we think about code and systems - instead of trying to project every possible way of thinking into a single programming syntax directly.
Note: there is prior art for this, mostly in the Smalltalk world (including, recently, the Glamorous Toolkit). The short time I spent playing with those tools tells me this approach has great potential, but could use a much larger dev community giving it prolonged focus, to improve and streamline the tooling.
The way I see it, I'd expect that the generated comments would often get some human attention immediately afterward. Even if they don't get edited but there's just a crude "filter" where the developer keeps some comments and throws out those which are undesirable in some way, that's a very useful signal which can't be re-generated from the source and thus needs to be stored somewhere along with the code.
Under mine, the AI should sign and date the comments, something like so:
AI 20230524
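If comments carried a tag like that, finding them later is a one-regex job. A small sketch, assuming the tag is literally `AI` followed by an eight-digit `YYYYMMDD` date somewhere in a `#` comment; the sample comment text and the `ai_comment_dates` helper are invented for illustration, and the line-based matching is naive (a `#` inside a string literal would also match).

```python
import re
from datetime import datetime

# Assumed tag format: "AI YYYYMMDD" inside a '#' comment.
AI_TAG = re.compile(r"#.*\bAI (\d{8})\b")


def ai_comment_dates(source):
    """Return (line_number, date) for every comment line carrying an AI tag."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = AI_TAG.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y%m%d").date()
            found.append((lineno, stamp))
    return found


# Hypothetical file content.
SAMPLE = '''\
# Retry with backoff because the upstream API rate-limits bursts. AI 20230524
def fetch():
    pass  # hand-written note, no tag
'''

tagged = ai_comment_dates(SAMPLE)
print(tagged)
```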
Comments are an odd asset. Some of the most useful ones I've seen when debugging are rotted ones that describe the code that used to be there.
You'll then turn back the clock and find the bozo that "fixed" something by removing important code but for some reason left the comment behind. It's pretty common.
This AI tactic could protect against that so when the next bozo comes in with their wrecking ball fingers and leaves the ai comments behind, you can go in there with your mop and broom and recover things quicker.
What would be really nice is my favorite commenting technique which is where I write obvious but also counterintuitively very wrong code and I aggressively comment it like
"Hello, if you're here you may be asking yourself why it's not like this
(Wrong code)
I did too! That's what #231 and #302 are about! I know I know, I fell for it too. If you're going to change it, open a ticket or something because you're probably going to break it as well"
If the AI can do that then we're in gold territory.
That's another thing that could get removed/modified/disabled, and it's action at a distance.
Proximity is really fucking important, but of course it's only a proxy for other practices and anti-pattern avoidance.
Regardless, if you need to engage, do so at the obvious point of engagement, not off in some test suite where you cross your fingers on predicting the future on the diligence of its upkeep.
I mean write the test, do your rain dance. It's not going to protect you against the stupidity you have to worry about, X years after you've left the building.
Your code will live longer than you think and be modified more times, by more people you will never meet, than you realize.
You don't have to feel responsible for that. But I do and that's one of the examples of how I practice it. You're leaving notes for future archaeologists to remove their guesswork.
Again, tests are great, linters are fine, but your nth generational successors may not agree with you or how you did them or how often they should be run or... and there goes your hard work. Don't rely on them for assuring protection past your tenure.
Future coders are probably 10 times more likely to curse you as a nuisance than appreciate your diligence. Assume they'll hate you.
I actually completely agree with you in your approach, but also feel you/we should not take upon our shoulders quite so much responsibility to make the things we build perfectly resilient to bad decisions made by successors in the future. Put another way, I love and also do the “Note: do not use ‘86400 seconds’ - fails 2x per year” comment - however, I have not ever had the “there goes my hard work” thought after I have quit. If they don’t hire equal or more competent developers than me, and they mess up things, that’s actually kind of funny! (Note: if I worked in the medical, military, or nuclear fields, I’d feel very differently!)
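For anyone wondering about the "fails 2x per year" part: presumably it refers to daylight saving time, where one local calendar day per year is 23 hours long and another is 25. A quick check, assuming US Eastern time and the 2023 transition dates (March 12 and November 5); the UTC offsets are hard-coded as fixed zones for illustration, so no time zone database is needed.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets for US Eastern time, hard-coded for illustration.
EST = timezone(timedelta(hours=-5), "EST")  # standard time
EDT = timezone(timedelta(hours=-4), "EDT")  # daylight time


def seconds_between(start, end):
    """Elapsed real time between two aware datetimes, in seconds."""
    return int((end - start).total_seconds())


# Spring forward: local 2023-03-12 starts in EST and ends in EDT.
spring = seconds_between(
    datetime(2023, 3, 12, tzinfo=EST), datetime(2023, 3, 13, tzinfo=EDT)
)

# Fall back: local 2023-11-05 starts in EDT and ends in EST.
fall = seconds_between(
    datetime(2023, 11, 5, tzinfo=EDT), datetime(2023, 11, 6, tzinfo=EST)
)

# An ordinary summer day, midnight to midnight.
ordinary = seconds_between(
    datetime(2023, 6, 1, tzinfo=EDT), datetime(2023, 6, 2, tzinfo=EDT)
)

print(spring, fall, ordinary)  # 82800 90000 86400
```

So code that adds 86400 seconds to mean "same local time tomorrow" is off by an hour on exactly those two days.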
Those industries are fundamentally different. They don't build software by the same rules.
It's like comparing, say, clothing you buy at the mall to a hazmat suit - sewing is involved, materials, you need something that fits over a body. It looks roughly the same, but they're fundamentally different.
There's extensive compliance and regulation. I've done avionics/automobile software. It's not fun or sexy and it isn't supposed to be.
If you try to play by the same rules as the rest of software you get Theranos. It doesn't work.
Agreed this makes much more sense, from both a UX perspective as well as a code-cleanliness perspective.
There's also the AI evolution question. When we have GPT-10 next year, do we go through and regenerate all the comments? That would introduce a lot of noise into the repo's commit history and `git blame`, which I think is another indicator that the repo is not the right place to store this sort of thing. (And it'd have to be done again every time the AI got smarter...)
Having the AI perched on your shoulder and just analyzing the code as you look at it seems much simpler. Like a friendly, modern version of the pirate's parrot.
> There's also the AI evolution question. When we have GPT-10 next year, do we go through and regenerate all the comments? That would introduce a lot of noise into the repo's commit history and `git blame`, which I think is another indicator that the repo is not the right place to store this sort of thing.
Counterpoint: comments in code reflect what the writer thought at the time of their generation - whether they were a human or an AI (and, excepting automated stupidity, there would be a human reviewing and accepting AI-generated comments). "Having the AI perched on your shoulder" is like re-reading and re-interpreting what the code means. You get the benefit of experience (and improved AI models), but you'll also miss the context, long lost to time since the code in question was written.
I'd say we should do both. And code cleanliness... we won't make much more progress here than we've already made, not until we stop coding directly in the final, plaintext source form. There are too many conflicting concerns wrt. readability, and you can't have them individually optimized at the same time in a single piece of text.
Wow, thanks for the insightful discussion and feedback! This is definitely something that we will take into consideration and ideally, provide as an option.
Would it be so bad on git blame? Assuming all/nearly all comments are on their own lines, I would not expect that part to be a problem. The main problem I would see would be finding a way to merge in such a huge PR with lots of people actively working in these files, so there would be a lot of “merge conflicts” each time people tried to land their branches after one of these mega-comment-PRs went in.
I can see the argument and wouldn’t want to go overboard with generated comments, but it’s nice to have some in the source for now, since IDEs have tooling to display source comments in various contexts (eg hover over a function and get its docstring).
I can definitely see the utility of a “tell me more about this method” button that gets descriptions from GPT.
I also like the “ChatGDB” style of interface where all the local UI context is added to your GPT session (eg “what is this code doing” will answer about what you have selected, in the context of the whole file, and perhaps with the ability to retrieve other files too if needed for the explanation).
I can actually imagine a stage, past where we are now but before AGIs just writing all the software, where a repo consists of the prompts describing each module in a way that an AI would be able to generate it. Update the software by editing the prompts, or more likely, by asking an AI to make the necessary changes to all the prompts to add a particular feature.