by dcre 656 days ago
The lack of detail here makes this post pretty useless, though I guess I’m not surprised generic docs bots aren’t that great.

Without knowing any more detail than “We got in touch with a few docs bot services and set up demos that were trained on our docs and blog posts.” it is hard to draw conclusions about RAG + chat in general. I’ve had very good results with a custom setup that uses Claude Haiku to narrow down the set of relevant docs for a question and then 3.5 Sonnet to answer it. The corpus is on the small side, so no vector embeddings or even text search are required. The trick is understanding the different kinds of docs involved (OpenAPI schemas, hand-written guides) and writing code that abbreviates them in an appropriate way for the retrieval/narrowing step to work well.
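For readers who want something concrete, here is a rough sketch of what the abbreviate-then-narrow step could look like. It is not the setup described above, just one plausible shape for it. Every name here (`abbreviate_openapi`, `abbreviate_guide`, `narrow`, the prompt wording, the `ask_small_model` callable) is hypothetical, and the model call is injected as a plain function so no particular SDK is assumed.

```python
from typing import Callable

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def abbreviate_openapi(spec: dict) -> str:
    """Reduce an OpenAPI document to one line per operation."""
    lines = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            # Path items can also hold non-operation keys such as "parameters".
            if method.lower() not in HTTP_METHODS:
                continue
            line = f"{method.upper()} {path}"
            summary = op.get("summary", "")
            if summary:
                line += f" - {summary}"
            lines.append(line)
    return "\n".join(lines)


def abbreviate_guide(markdown: str) -> str:
    """Keep only the heading lines of a hand-written guide.

    Simplification: a '#' comment inside a code fence would also be kept.
    """
    return "\n".join(
        line.strip() for line in markdown.splitlines() if line.startswith("#")
    )


def build_narrowing_prompt(question: str, abbreviated: dict[str, str]) -> str:
    """Assemble the prompt for the cheap narrowing model (wording is made up)."""
    parts = [f"[{doc_id}]\n{text}" for doc_id, text in abbreviated.items()]
    return (
        "Which of these docs are relevant to the question? "
        "Reply with doc ids separated by commas.\n\n"
        + "\n\n".join(parts)
        + f"\n\nQuestion: {question}"
    )


def narrow(
    question: str,
    abbreviated: dict[str, str],
    ask_small_model: Callable[[str], str],
) -> list[str]:
    """Ask the small model for relevant doc ids and drop any it invented."""
    reply = ask_small_model(build_narrowing_prompt(question, abbreviated))
    picked = [part.strip() for part in reply.split(",")]
    return [doc_id for doc_id in picked if doc_id in abbreviated]
```

The answering step would then pass the full text of the selected docs to the larger model. Filtering the reply against known ids is a cheap guard against the narrowing model returning a doc that does not exist.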

I also manually tuned the system prompts to get the kind of answers I want and avoid the ones I don’t. I imagine off-the-shelf solutions are mostly lacking this customization, and they kind of can’t add it, because if they do, you’d be wondering what the value-add is and why you don’t build the same thing yourself in a couple of days. I’m sure techniques will improve, and it’s possible that turnkey solutions will be decent eventually.

I also think the distinction between supervised and unsupervised is misapplied here at the end, even accepting the colloquial use of a technical term. A docs tool powered by a bunch of hand-written documents and a custom system prompt, with a person asking questions of it — that doesn’t sound very unsupervised.

2 comments

I don't think I mean to indict RAG + chat in general! I think it's totally possible that, if we put more work in, we'd get a great bot out.

But the bar is so, so high. It's gotta be a truly great bot for us to not be scared of misleading our new users. And I'm still worried that "truly great" is going to take a LOT of work.

And for now, that's the problem. We're still a startup with limited resources. This tool isn't ready for us because we don't have the bandwidth to put the work in.

I can't wait til that bar drops, though. GPT-4o is a really solid step in that direction.

That much I will concede. I said we’ve had good results, but we’ve still been a bit scared to roll it out. That’s more for potential cost and polish reasons than baseline quality, but of course I’m still worried about it saying something wrong.

Oh yeah, and I was worried about the "supervised/unsupervised" comment you made.

I'm not talking about supervised training. I think I mean to say that the OUTPUT is supervised/unsupervised. Like, I'm an experienced programmer, so I can supervise the output of Copilot, unlike our inexperienced docs users.

That's on me for not making that train of thought clear enough, and unfortunately choosing a term that's already in use by the AI/ML industry.

Added a footnote to clarify.

Yeah, I had some promising results in a project that split markdown-based docs by second-level headers, embedded them all, and then did basic RAG with GPT-4 serving the response. It was too slow at the time (June last year) but I'll probably pick it back up again this year.

The main things I took away were (1) if the information architecture isn't very splittable, this gets too hard, and (2) always link back to source information.
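As an illustration of the splitting idea, here is a minimal sketch that cuts a markdown page at second-level headers and keeps a link back to the source with each chunk. The `Chunk` type, `split_by_h2`, and especially the `slugify` anchor rule are assumptions made for the example; real docs sites generate anchors in their own ways. The embedding and retrieval steps are left out.

```python
import re
from dataclasses import dataclass

FENCE = "`" * 3  # marker that opens or closes a fenced code block


@dataclass
class Chunk:
    heading: str
    body: str
    source: str  # link back to the section this chunk came from


def slugify(heading: str) -> str:
    """Guess a heading anchor (assumption: lowercase, punctuation dropped, dashes)."""
    slug = re.sub(r"[^\w\s-]", "", heading.lower()).strip()
    return re.sub(r"\s+", "-", slug)


def split_by_h2(markdown: str, page_url: str) -> list[Chunk]:
    """Split a page into one chunk per '## ' section, plus any intro text."""
    chunks: list[Chunk] = []
    heading, lines, in_fence = "", [], False

    def flush() -> None:
        body = "\n".join(lines).strip()
        if heading or body:
            source = f"{page_url}#{slugify(heading)}" if heading else page_url
            chunks.append(Chunk(heading, body, source))

    for line in markdown.splitlines():
        if line.startswith(FENCE):
            in_fence = not in_fence
        # A '## ' line inside a code fence is sample text, not a real heading.
        if not in_fence and line.startswith("## "):
            flush()
            heading, lines = line[3:].strip(), []
        else:
            lines.append(line)
    flush()
    return chunks
```

Carrying `source` on every chunk means whatever answers the question can always cite where the text came from, which is the second takeaway above.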

Agreed on both counts. I do the same thing with headings and I use the results of the retrieval step to display a list of relevant docs while the answer is generating.

The latest models are way better and faster than GPT-4 was. You’ll probably be happy when you get back into it.
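One way to picture showing relevant docs while the answer is generating is a generator that emits the source list first and the answer text afterwards. This is only a sketch under assumptions: `answer_events`, the event names, and the `retrieve` and `stream_answer` callables are all hypothetical stand-ins for a real retrieval step and a real streaming model call.

```python
from typing import Callable, Iterable, Iterator


def answer_events(
    question: str,
    retrieve: Callable[[str], list[dict]],
    stream_answer: Callable[[str, list[dict]], Iterable[str]],
) -> Iterator[tuple[str, object]]:
    """Yield ('sources', ...) first, then ('text', ...) pieces, then ('done', None)."""
    docs = retrieve(question)
    # Emit the links before any answer text so a UI can render them immediately.
    yield ("sources", [{"title": d["title"], "url": d["url"]} for d in docs])
    for piece in stream_answer(question, docs):
        yield ("text", piece)
    yield ("done", None)
```

Because the sources event does not depend on the model's output, the reader has something useful to click on even if the answer is slow or turns out to be wrong.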