Hacker News new | ask | show | jobs
by tom1337 371 days ago
Interesting approach using Gemini for checking whether the answer helped. I wonder how that would work if the response is hallucinated.

As an example: Stripe has sandboxes, I wanted to know how to create them via the CLI but found nothing in the docs. Asked the AI and it was like "Just do stripe sandboxes create <name>". The command does not exist and the whole functionality is not planned to be added. If another AI now reads this it could interpret it as answered successfully?

Just thinking - but I like the idea of the tool. Would love to use it but currently don't have any public docs to test with. Good luck with your project :)


> As an example: Stripe has sandboxes, I wanted to know how to create them via the CLI but found nothing in the docs. Asked the AI and it was like "Just do stripe sandboxes create <name>". The command does not exist and the whole functionality is not planned to be added. If another AI now reads this it could interpret it as answered successfully?

--> I would say so. Not quite sure what Stripe is using on their docs, but it sounds like it's been given free rein and isn't grounded in the actual docs.

Grounding answers in the actual docs is the easiest way to weed out hallucinations; it hasn't been a problem for me at all, to be honest.
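For anyone curious what a grounding check could look like, here is a minimal sketch. It is only an illustration, not how this tool or Stripe's docs assistant works: the function name `ungrounded_commands`, the sample `docs` text, and the assumption that answers wrap commands in backticks are all made up for the example. It flags any command in an answer that never appears verbatim in the docs.

```python
import re

# Assumption: the answer marks commands with backticks (Markdown code spans).
CODE_SPAN = re.compile(r"`([^`]+)`")


def ungrounded_commands(answer: str, docs: list[str]) -> list[str]:
    """Return code spans from the answer that appear verbatim in none of the docs."""
    corpus = "\n".join(docs)
    return [cmd for cmd in CODE_SPAN.findall(answer) if cmd not in corpus]


# Hypothetical docs snippet and the hallucinated answer from the comment above.
docs = ["Use `stripe login` to authenticate the CLI."]
answer = "Just do `stripe sandboxes create <name>` after `stripe login`."

print(ungrounded_commands(answer, docs))
```

A verbatim substring match is deliberately strict. It would miss paraphrased commands and different placeholder names, so a real check would need something fuzzier, but it is enough to catch a command that exists nowhere in the docs.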

P.S. For anyone reading, happy to refund any answers that you feel shouldn't have been billed.

Just tested the tool on some open source documentation. Looks and feels great, but somehow after the initial Preview in the onboarding flow I cannot get the Preview to work. It seems to always answer with the wrong dataset and suddenly says "The provided documents describe Agionic, an AI tool for documentation, and do not contain details about UI component libraries like Grid." Not sure what's happening here.

https://imgur.com/a/XiVmosl

I'm guessing the page was refreshed and the preview lost track of the right customer ID...

Thanks for trying it!