Hacker News
Keeping documentation in sync with source code (cerbos.dev)
45 points by oguzhand95 1600 days ago
4 comments

Why not just track the number of commits since the documentation was last changed, as a quick way of knowing which docs are most likely to be out of sync? Docs written in Markdown could have a comment identifying which feature they belong to; code could similarly have a comment saying which feature it belongs to. Doc pages could even dynamically display the number of commits on a feature since that doc page was updated... just thinking out loud here.
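A minimal sketch of the commit-counting idea, assuming a made-up tag syntax (`<!-- feature: NAME -->` in Markdown, `# feature: NAME` or `// feature: NAME` in code) and a commit history already parsed into (sha, changed paths) records, oldest first. In a real repo that list could come from something like `git log --reverse --name-only`. `Commit`, `features_in` and `commits_since_doc_update` are hypothetical names, not an existing tool.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical tag syntax (not an established convention):
#   Markdown:  <!-- feature: auth -->
#   Python:    # feature: auth
#   C/JS/Go:   // feature: auth
FEATURE_TAG = re.compile(r"(?:<!--|#|//)\s*feature:\s*([\w-]+)")


@dataclass
class Commit:
    sha: str
    changed_paths: set[str] = field(default_factory=set)


def features_in(text: str) -> set[str]:
    """Return every feature name tagged in a file's contents."""
    return set(FEATURE_TAG.findall(text))


def commits_since_doc_update(doc_path: str, files: dict[str, str], history: list[Commit]) -> int:
    """Count commits that touched the doc's features since the doc last changed.

    files:   path -> current file contents (used to read feature tags)
    history: commits, oldest first
    """
    doc_features = features_in(files[doc_path])
    # Every other file that shares at least one feature tag with the doc.
    related = {
        path for path, text in files.items()
        if path != doc_path and features_in(text) & doc_features
    }
    count = 0
    for commit in history:
        if doc_path in commit.changed_paths:
            count = 0  # the doc was edited in this commit, so start over
        elif commit.changed_paths & related:
            count += 1  # related code changed without touching the doc
    return count


if __name__ == "__main__":
    files = {
        "docs/auth.md": "<!-- feature: auth -->\n# Authentication\n",
        "src/auth.py": "# feature: auth\ndef login(): ...\n",
        "src/billing.py": "# feature: billing\n",
    }
    history = [
        Commit("a1", {"docs/auth.md", "src/auth.py"}),
        Commit("b2", {"src/auth.py"}),
        Commit("c3", {"src/billing.py"}),
        Commit("d4", {"src/auth.py"}),
    ]
    print(commits_since_doc_update("docs/auth.md", files, history))  # prints 2
```

A commit that edits both the doc and its code resets the count, on the (optimistic) assumption that the doc was updated to match.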
You could gamify this by generating an update score. The more often and more consistently documentation is updated relative to commits, the higher the score. Nothing would be tied to the score; it’s just a number. Its existence would incentivize developers to drive it higher. Sure, you could get around it by just adding a space to the .md file, but at least then you’re opening the file, and you may see something in there that needs a quick change anyway. The score would have done its job.

This would stop working if bonuses or compensation were tied to the score. The moment that happens, people will start writing hooks that add a space on every commit, and then devs will ignore the score.

I think it might actually be better to reward someone for updating extremely old documentation. If there have been N commits since a piece of documentation was last updated, updating it should be worth N points. Then people are incentivized to update the oldest documentation. Sure, someone could wait for yet another commit so that it's worth N+1 points, but someone else might take it up first.
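The N-points rule can be sketched as a tiny ledger. `DocBounty` and its methods are hypothetical, and the sketch assumes something else already knows which docs each code commit affects (for example, the feature comments suggested upthread).

```python
from collections import Counter
from collections.abc import Iterable


class DocBounty:
    """Hypothetical scheme: updating a doc that is N commits stale earns N points."""

    def __init__(self) -> None:
        self.staleness: Counter = Counter()  # doc path -> commits since last update
        self.points: Counter = Counter()     # developer -> points earned

    def record_code_commit(self, affected_docs: Iterable[str]) -> None:
        # Each code commit makes every affected doc one commit staler.
        for doc in affected_docs:
            self.staleness[doc] += 1

    def record_doc_update(self, developer: str, doc: str) -> None:
        # Pay out the current staleness, then reset it.
        self.points[developer] += self.staleness[doc]
        self.staleness[doc] = 0

    def most_valuable(self, n: int = 3) -> list[tuple[str, int]]:
        # The stalest docs, i.e. the biggest bounties, first.
        return self.staleness.most_common(n)
```

Nothing here stops someone from collecting N points with a one-character edit, so it still relies on review to keep the updates honest.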
This assumes the docs written at the same time the code was changed are actually correct. This is an enormous assumption.

Documentation needs to be written as automated tests as far as humanly possible. I've worked towards this goal on iommi, and it's been eye-opening. Every time I've run more of the code examples from the docs as part of the test suite, I've found bugs.
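In Python, one standard-library way to run documentation examples as tests is the `doctest` module. A minimal sketch, assuming the docs contain `>>>` interactive examples; `slugify`, `DOC_PAGE` and `check_doc_examples` are made-up stand-ins, and this is not necessarily how iommi's own docs tests are set up.

```python
import doctest


def slugify(title: str) -> str:
    """Hypothetical library function that the docs describe."""
    return "-".join(title.lower().split())


# A documentation snippet as it might appear in a README or docs page.
DOC_PAGE = """
Turn a page title into a URL slug:

    >>> slugify("Keeping Docs In Sync")
    'keeping-docs-in-sync'
"""


def check_doc_examples(text: str, globs: dict, name: str = "docs") -> doctest.TestResults:
    """Run every >>> example in `text`; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, globs, name, filename=None, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)  # prints a report for each failing example
    return runner.summarize(verbose=False)
```

If the code changes and the doc page doesn't, the example fails in CI instead of silently going stale. For whole files, `doctest.testfile()` does the same job, and pytest can collect such files with its `--doctest-glob` option.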

> This assumes the docs written at the same time the code was changed are actually correct. This is an enormous assumption.

One would hope that the pull request/merge request review process would catch this. Ultimately the humans are the weak link in all of this, but if the tooling could point out that DocumentationForThisFeature.md has a comment pointing to the class you just updated, and that you didn't assert one way or the other that the docs are still valid (not sure of the syntax for this... maybe update a date attribute on the class, like featureDocsLastUpdatedOrVerified="2020-02-04T14:51Z"?), it could at least raise the "Oh yeah, I forgot to update that!" awareness. You could even go as far as making such an indication fail a CI/CD pipeline, if documentation matters that much.
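A minimal sketch of that CI check, assuming the marker format from the comment above and that the pipeline can supply the set of changed paths plus the before/after contents of each changed code file (e.g. via `git diff --name-only` and `git show <rev>:<path>`). `doc_refs`, `marker_value` and `stale_doc_warnings` are hypothetical names.

```python
from __future__ import annotations

import re

# Hypothetical marker, modeled on the suggestion above:
#   featureDocsLastUpdatedOrVerified="2020-02-04T14:51Z"
MARKER = re.compile(r'featureDocsLastUpdatedOrVerified="([^"]+)"')


def marker_value(source: str) -> str | None:
    """Return the marker's timestamp string, or None if there is no marker."""
    match = MARKER.search(source)
    return match.group(1) if match else None


def stale_doc_warnings(
    doc_refs: dict[str, set[str]],
    changed: set[str],
    old_sources: dict[str, str],
    new_sources: dict[str, str],
) -> list[str]:
    """List changed code files whose docs were neither edited nor re-verified.

    doc_refs:    doc path -> code paths the doc points at
    changed:     paths modified by the pull request
    old_sources / new_sources: code path -> contents before / after the change
    """
    warnings = []
    for doc, code_paths in sorted(doc_refs.items()):
        if doc in changed:
            continue  # the doc itself was edited in this change
        for path in sorted(code_paths & changed):
            before = marker_value(old_sources.get(path, ""))
            after = marker_value(new_sources.get(path, ""))
            if before == after:
                # Code changed, doc untouched, and the marker was not bumped.
                warnings.append(f"{path} changed but {doc} was not updated or re-verified")
    return warnings
```

A pipeline step could print the warnings and exit non-zero when the list is non-empty, or just post them as a review comment if failing the build is too strict.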

I wrote the docs for code I wrote myself, and they had bugs! And they were reviewed by a colleague. There were still bugs!

Then the underlying code was changed, introducing more bugs.

Cool project and nice write-up. I wonder why you didn't use the OpenDoc standard format. Were there project-specific reasons for the documentation format you chose?
We chose AsciiDoc for our documentation because it's a modern, feature-rich markup language that makes writing technical documentation really pleasant. To be honest, we didn't even consider OpenDoc as an option. Wasn't it discontinued a long time ago?

(I am the author btw)

> Another difficult problem was checking for correct indentation visually (because the configuration is YAML-based and indentation matters).

Why YAML? I hate it so much! My eyes hurt looking at it!

Why not use Hjson (https://hjson.github.io/)?

Shouldn't there be a GPT bot that can do this by now? I can see how translating human language into computer code would be hard, but going the other way seems like it should be fairly easy. There's a large corpus of well-documented code out there to train on, too.
This seems like some kind of holy grail to me, and frankly quite useful for development. Got some library you want to use? Feed it into the documentator! Need to integrate with some firmware that nobody cared to document? Feed it into the documentator! Show me the data formats by reversing the code! Would save a ton of time.

However, since I did try to start this project once, on a fairly small 1-million-line firmware codebase written in C, let me be the first to tell you that it is no small feat. Even producing a tool that could analyze code and output the valid inputs to any given function would be a phenomenal achievement.