| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by simonw 750 days ago

A subset of this idea is a hill I am willing to die on: the documentation for a codebase should live in the same repository as the codebase itself.

I'm talking about API documentation here - for both code-level APIs (how to use these functions and classes) as well as HTTP/JSON/GRPC/etc APIs that the codebase exposes to others.

If you keep the documentation in the same repo as the code you get so many benefits for free:

1. Automatic revision control. If you need to see documentation for a previous version it's right there in the repo history, visible under the release tag.

2. Documentation as part of code review: if a PR updates code but forgets to update the accompanying documentation you can catch that at review time.

3. You can run documentation unit tests - automated tests that check that the documentation at least mentions specific pieces of the code (discovered via introspection). I wrote about that a few years ago and it's been working great for me: https://simonwillison.net/2018/Jul/28/documentation-unit-tes...

4. Most important: your documentation can earn trust. Most documentation is out of date and everyone knows that, which means people default to not trusting documentation. If anyone who looks at the commit log can see that the documentation is being actively maintained alongside the code it documents they are far more likely to learn to trust it.

The exception to this rule for me is user-facing documentation describing how end users should use the features provided by the software. I'd ideally love to keep this in the repo too, but there are rational reasons not to - it might be maintained by the customer support team who may want to work in more of a CMS environment, for example.

9 comments

xorcist 750 days ago

Love your blog, but in this case I want to take a more nuanced, if not opposite, stance:

There are many things closely related to code, that shouldn't necessarily live in the same repository. First, we need a common understanding of what should live together in a repository. This is much like the discussion about mono vs. multi-repo. A good rule of thumb is that if it is branched together, it lives together.

Effective documentation is not only a strict API reference, and not something that can be generated from docstrings alone. It offers a high level overview to understand the problem being solved, the architecture of the software, and a general roadmap of how it is developed. Effective documentation should cover both backwards and forwards revisions and how those migrations should be handled.

But this is also true on a reference level. Reading the documentation of a specific function I want to know if something relevant happens to this function in the next revision. There is nothing worse than checking out documentation for current production revision 34.5 and follow best practice there only to discover I should have checked out revision 34.6 instead because best practice changes there. Specific revisions should be documented, but documentation should not be limited to a specific revision.

There is a scale of how closely other artifacts follow code revisions: Tests is mostly branched with code, and should probably live together. Documentation can sometimes be branched with code, some should and some shouldn't live together with code. Deployment code and configuration management must be able to deploy old and new code from the same code base, and is even less likely to benefit from living with it. Then there's application state and test data which is something else entirely.