by dcre 656 days ago

The lack of detail here makes this post pretty useless, though I guess I’m not surprised generic docs bots aren’t that great.

Without knowing any more detail than “We got in touch with a few docs bot services and set up demos that were trained on our docs and blog posts.” it is hard to generalize to RAG + chat in general. I’ve had very good results with a custom setup that uses Claude Haiku to narrow down the set of relevant docs for a question and then 3.5 Sonnet to answer it. The corpus is on the small side, so no vector embeddings or even text search are required — the trick is understanding the different kinds of docs involved (OpenAPI schemas, hand-written guides) and writing code that abbreviates them in an appropriate way for the retrieval/narrowing step to work well.
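To make the "abbreviate, then narrow" idea concrete, here is a minimal sketch of what the abbreviation code might look like. It is not the actual setup described above: the function names, the one-line-per-operation format, and the prompt wording are all assumptions for illustration, and the step that sends the prompt to a small model (and the answer step with a larger one) is left out.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def abbreviate_openapi(spec: dict) -> list[str]:
    """Reduce an OpenAPI document to one line per operation (hypothetical format)."""
    lines = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            line = f"{method.upper()} {path}"
            summary = op.get("summary", "")
            if summary:
                line += f": {summary}"
            lines.append(line)
    return lines


def abbreviate_guide(markdown: str) -> list[str]:
    """Reduce a hand-written guide to its headings, ignoring '#' lines inside code fences."""
    headings, in_fence = [], False
    for line in markdown.splitlines():
        if line.startswith(FENCE):
            in_fence = not in_fence
        elif not in_fence and line.startswith("#"):
            headings.append(line.strip())
    return headings


def build_narrowing_prompt(question: str, outlines: dict[str, list[str]]) -> str:
    """Assemble the text for the narrowing step (wording is an assumption)."""
    parts = [
        "Pick the documents most relevant to the question.",
        "Reply with document ids only, one per line.",
        "",
    ]
    for doc_id, outline in outlines.items():
        parts.append(f"[{doc_id}]")
        parts.extend(f"  {entry}" for entry in outline)
    parts += ["", f"Question: {question}"]
    return "\n".join(parts)
```

The point of treating the two kinds of docs differently is that each has its own natural summary: an API schema is mostly a list of operations, while a guide is mostly its outline.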

I also manually tuned the system prompts to get the kind of answers I want and avoid the ones I don’t. I imagine off-the-shelf solutions are mostly lacking this customization, and they kind of can’t add it, because if they do, you’d be wondering what the value-add is and why you don’t build the same thing yourself in a couple of days. I’m sure techniques will improve, and it’s possible that turnkey solutions will be decent eventually.

I also think the distinction between supervised and unsupervised is misapplied here at the end, even accepting the colloquial use of a technical term. A docs tool powered by a bunch of hand-written documents and a custom system prompt, with a person asking questions of it — that doesn’t sound very unsupervised.

2 comments

dardarbinks 656 days ago

I don't think I mean to indict RAG + chat in general! I think it's totally possible that, if we put more work in, we'd get a great bot out.

But the bar is so, so high. It's gotta be a truly great bot for us to not be scared of misleading our new users. And I'm still worried that "truly great" is going to take a LOT of work.

And for now, that's the problem. We're still a startup with limited resources. This tool isn't ready for us because we don't have the bandwidth to put the work in.

I can't wait until that bar drops, though. GPT-4o is a really solid step in that direction.


dcre 656 days ago

That much I will concede. I said we’ve had good results, but we’ve still been a bit scared to roll it out. That’s more for potential cost and polish reasons than baseline quality, though of course I’m still worried about it saying something wrong.


dardarbinks 656 days ago

Oh yeah, and I was worried about the "supervised/unsupervised" comment you made.

I'm not talking about supervised training. I think I mean to say that the OUTPUT is supervised/unsupervised. Like, I'm an experienced programmer, so I can supervise the output of Copilot, unlike our inexperienced docs users.

That's on me for not making that train of thought clear enough, and unfortunately choosing a term that's already in use by the AI/ML industry.

Added a footnote to clarify.


phillipcarter 656 days ago

Yeah, I had some promising results in a project that split markdown-based docs by second-level headers, embedded them all, and then did basic RAG with GPT-4 serving a response. It was too slow at the time (June last year) but I'll probably pick it back up again this year.

The main things I took away were (1) if the information architecture isn't very splittable, this gets too hard, and (2) always link back to source information.
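For anyone curious what the splitting step can look like, here is a rough sketch under stated assumptions, not the code from that project. It cuts a markdown page at second-level headers and keeps a link back to the source on every chunk. The `Chunk` type, the function names, and the GitHub-style anchor slugs are all hypothetical, and the embedding and retrieval steps are omitted.

```python
import re
from dataclasses import dataclass

FENCE = "`" * 3  # marker that opens or closes a fenced code block


@dataclass
class Chunk:
    heading: str      # empty for intro text before the first "## "
    text: str
    source_url: str   # link back to the section this chunk came from


def slugify(heading: str) -> str:
    """Approximate a heading anchor (assumed scheme; real doc sites differ)."""
    slug = re.sub(r"[^\w\s-]", "", heading.lower()).strip()
    return re.sub(r"\s+", "-", slug)


def split_by_h2(markdown: str, page_url: str) -> list[Chunk]:
    """Split a page into one chunk per second-level header."""
    chunks: list[Chunk] = []
    heading = ""
    lines: list[str] = []
    in_fence = False

    def flush() -> None:
        # Emit the section collected so far, skipping empty ones.
        text = "\n".join(lines).strip()
        if text:
            anchor = f"#{slugify(heading)}" if heading else ""
            chunks.append(Chunk(heading, text, page_url + anchor))

    for line in markdown.splitlines():
        if line.startswith(FENCE):
            in_fence = not in_fence
        if not in_fence and line.startswith("## "):
            flush()
            heading, lines = line[3:].strip(), []
        else:
            lines.append(line)
    flush()
    return chunks
```

Deeper headings (`###` and below) stay inside their parent chunk, which is also where point (1) bites: a page with one huge section and no second-level headers comes back as a single chunk.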


dcre 656 days ago

Agreed on both counts. I do the same thing with headings and I use the results of the retrieval step to display a list of relevant docs while the answer is generating.

The latest models are way better and faster than GPT-4 was. You’ll probably be happy when you get back into it.
