| I'm going to mostly disagree with everyone here, much to my karma's detriment ;P I agree the end-goal should be infrastructure as code, and everyone here has covered those tools well. You also want monitoring across your infrastructure. Prometheus is the new poster-boy here, but the Nagios family, and many other decent OSS solutions exist as well. But you still need documentation. Your documentation should exist wherever you spend most of your time. Some examples: * If you spend most of your time on a Windows Desktop, doing windows admin type things, then OneNote or some other GUI note-taking/document program makes sense. * If you spend most of your time in Unix land(linux, BSD, etc) then plain text files on some shared disk somewhere for everyone to get to, makes WAY more sense. Bonus if you put these files in a VCS, and treat it like code, and super bonus if your documentation is just a part of your Infra as code repositories. * If you spend your time in a web browser, then use a Wiki, like MediaWiki, wikiwiki, etc. In other words, put your documentation tools right alongside your normal workflow, so you have a decent chance of actually using it, keeping it up to date, and having others on your team(s) also use it. We put our docs in the repo's right alongside the code that manages the infrastructure.. in plain text. It's versioned. We don't publish it anywhere, it's just in the repo, but then we spend most of our time in editors messing in that repo. |
Instead of documenting all the commands involved in configuring a machine as service X (ssh, run apt-get, paste this, etc.), I have documentation on how work with the configuration management system (roles in the roles/ directory, each node gets one role, commit to git, open PR, etc.). That documentation is in .md files in the config management source repo.
Instead of documenting how to rack a server (print and attach label to front and back, plug power into separate PDUs, enter PDU ports into management database, etc.), I document Terraform conventions (use module foo, name it xxx-yyy, tag with zzz, etc.).
It ends up being less documentation, as the "code" serves to document the steps taken, so the documentation can be higher level. Or if it isn't less documentation, it is documentation that needs to be updated less often, so hopefully there will be less drift between docs and what actually exists.