| This was a fun read. I’m not an AI expert by any means, and I’m also ESL, so please bear with me. The inaccuracy threshold seems fine for a museum, but in enterprise operations inaccuracy can mean lost revenue or, worse, lost trust and future business. I’m struggling with some more advanced AI use cases in my collaborative work platform. I use AI (LLMs) for things like summarization, communication, and finding information using embeddings. However, sometimes it is completely wrong. To test this I spent a few days (while doing something unrelated) building up a recipes database and then querying it for things like “I want to make a quick and easy drink”. I ran the data through classification and other steps to get the best data I could. The results would still include fries or some other food when I’m asking for drinks. So I have to ask: what the heck am I doing wrong? Again, for things like sending messages and reminders, coming up with descriptions, and finding old messages that match some input, no problem. But when I augment my data with additional information (trying to attach information that may be missing but can be deduced from what’s available) to enable richer workflows, I keep getting bitten in the butt. I feel like if I can figure this out I can provide way more value. Not sure if what I said makes sense. |
Not sure either, but here is the lesson I take from this and other sources: to improve the output, use a multistep approach. Get a first answer (one or more candidates), then pass each candidate through one or more verification steps. For example, ask 'for this *, is this * relevant?', or 'is it correct?', 'does it solve the problem?', and so on. Then select the answer with the best scores on those filters. In your recipes case, the second step would ask of each result, 'is this actually a drink?'. You see, it's very similar to the approach in the original post: get the first candidates, then filter.
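To make the "get candidates, then filter" idea concrete, here is a minimal sketch in Python using the recipes example. Everything in it is an assumption for illustration: the `Recipe` fields, the tag-overlap retrieval (a stand-in for embedding search), and the two check functions (stand-ins for an LLM or classifier asking "is this a drink?" and "is this quick?"). It is not a description of any particular library or of the original post's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Recipe:
    name: str
    category: str          # hypothetical label from an earlier classification step
    tags: frozenset[str]   # hypothetical descriptive tags


# A check scores one candidate between 0.0 (fails) and 1.0 (passes).
Check = Callable[[Recipe], float]


def retrieve_candidates(query_terms: set[str], recipes: list[Recipe], k: int) -> list[Recipe]:
    """Step 1: cheap, recall-oriented retrieval (tag overlap stands in for embeddings)."""
    scored = [(len(query_terms & r.tags), r) for r in recipes]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for score, r in scored[:k] if score > 0]


def verify_and_rank(candidates: list[Recipe], checks: list[Check], threshold: float = 0.5) -> list[Recipe]:
    """Step 2: drop candidates that fail any check, rank the rest by mean score."""
    kept = []
    for candidate in candidates:
        scores = [check(candidate) for check in checks]
        if min(scores) >= threshold:
            kept.append((sum(scores) / len(scores), candidate))
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in kept]


recipes = [
    Recipe("Lemonade", "drink", frozenset({"quick", "easy", "cold"})),
    Recipe("French fries", "food", frozenset({"quick", "easy", "fried"})),
    Recipe("Mulled wine", "drink", frozenset({"warm", "slow"})),
    Recipe("Iced tea", "drink", frozenset({"easy", "cold"})),
]

# Verification questions; in a real system each could be a yes/no prompt to an LLM.
checks: list[Check] = [
    lambda r: 1.0 if r.category == "drink" else 0.0,   # "is this a drink?"
    lambda r: 1.0 if "quick" in r.tags else 0.5,       # "is this quick?" (soft preference)
]

candidates = retrieve_candidates({"quick", "easy"}, recipes, k=3)
final = verify_and_rank(candidates, checks)

print([r.name for r in candidates])  # first pass still contains the fries
print([r.name for r in final])       # -> ['Lemonade', 'Iced tea']
```

The first step is allowed to be sloppy, since "quick and easy" matches fries just as well as lemonade. The second step is where the hard constraint ("must be a drink") is enforced, which similarity search alone does not guarantee.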