by tmm84 827 days ago
I felt most of this was just plain common sense. People read things by headings and subheadings. People look for relevant documentation for product x under product x, not y. Q&A and code samples prime the LLM for what most developers like myself hunt for (a quick answer or a simple code snippet). The forum part got me, seeing as forums tend to vary in the quality and quantity of info. If the author(s) suggest forums, why not Discord servers and Gitter chat as well? I know of several projects where the real documentation, examples and help are locked up in the Discord/Gitter channels. Also, in the same vein, why not GitHub PRs/issues as well? Having the LLM diagnose when an issue was cleared up, migration strategies, etc. from GitHub PRs/issues (as I've had to do from time to time) would be great too. Of course, GitHub/Discord/Gitter would require some kind of filtering to make sure the data is worth ingesting into the LLM, but if it can identify what was worth ingesting, then perhaps it could also suggest to the documentation team something worth documenting.
2 comments

You nailed it. All of the above sources are also super helpful for LLMs, but you have to be careful about how you ingest and parse them. For example, for a Discourse forum, only including questions that have been marked "Resolved" by an official team member can work quite well. The same goes for Discord/Slack forums, GitHub Discussions, etc.
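A rough sketch of that kind of filter, in case it helps. It assumes the threads have already been exported into simple records; the `Thread` class, its field names, and the "staff" role label are all made up for illustration and are not any real Discourse, Discord, or GitHub API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Thread:
    # Hypothetical export format; every field name here is an assumption.
    title: str
    question: str
    accepted_answer: Optional[str]
    resolved: bool
    resolved_by_role: Optional[str]  # e.g. "staff" or "community"


def select_for_ingestion(threads, trusted_roles=frozenset({"staff"})):
    """Keep only threads resolved by a trusted role that have an accepted answer."""
    return [
        t
        for t in threads
        if t.resolved
        and t.accepted_answer
        and t.resolved_by_role in trusted_roles
    ]


threads: List[Thread] = [
    Thread("How do I rotate API keys?", "Keys expire...", "Use the settings page.", True, "staff"),
    Thread("Build fails on ARM", "Getting a linker error...", "Try this flag.", True, "community"),
    Thread("Is v2 faster?", "Any benchmarks?", None, False, None),
]

kept = select_for_ingestion(threads)
for t in kept:
    print(t.title)
```

Only the first thread survives here: the second was resolved by a community member rather than staff, and the third was never resolved. Widening `trusted_roles` is one way to trade precision for coverage.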
A lot of innovation is common sense because, oddly enough, common sense is not widely distributed.