HN Mirror


by gkk 865 days ago

Hi hansonkd,

I'm working on Hotseat - a legal Q&A service where we put regulations in a hot seat and let people ask sophisticated questions. My experience aligns with your comment that vanilla GPT often performs poorly when answering questions about documents. However, if you combine focused effort on squeezing GPT's performance with product design, you can go pretty far.

Have you written about the specific failure modes you've seen when answering questions from documents? I'd love to check whether Hotseat handles them well.

If you're curious, I've written about some of the design choices we've made on our way to creating a compelling product experience: https://gkk.dev/posts/the-anatomy-of-hotseats-ai/


hansonkd 864 days ago

Thanks for the response. I will check it out.

Specific failure modes can be something as simple as extracting beneficiary information from a Trust document. Sometimes it works, but a lot of the time it doesn't, even with startups whose AI products are built specifically for extracting information from documents. For example, it will produce an incomplete list of beneficiaries, or, if there are contingent beneficiaries, it won't know what to do. That's not even a hard question about the contingency, just a simple list with percentages showing the distribution if no one dies.

Further, trying to get an AI to describe the contingency is a crapshoot.

While I expect these options to get better and better, I have fun trying them out and seeing what basic thing will break. :)


gkk 864 days ago

Thanks for the response! I'm not familiar with Trust documents but I asked ChatGPT about them: https://chat.openai.com/share/c9d86363-b64a-4e44-9fd4-1d5b18...

If the example is representative, I see two problems: a simple extraction of information that is laid out bare (the list of beneficiaries), and the reasoning needed to interpret the section on contingent beneficiaries and connect it to facts from other parts. Is that correct?

If that's the case, then Hotseat is miles ahead when it comes to analyzing regulations (from the civil law tradition, which is different from the US), and dealing with the categories of problems you mentioned.


DanielSantos 865 days ago

Your post is very interesting. Thanks for sharing.

If your focus is narrow enough, vanilla GPT can still provide good enough results. We narrow down the scope for GPT and ask it to answer binary questions. With that, we get good results.

Your approach is better for supporting broader questions. We support that as well, and there the results aren’t as good.
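For illustration only, here is a minimal sketch of what a narrow, binary-question setup like the one described above might look like. Nothing here comes from the thread: `build_binary_prompt`, `parse_binary_answer`, `ask_binary`, and the `ask_model` callable are all hypothetical names, and `ask_model` stands in for whatever GPT client you actually use.

```python
from typing import Callable, Optional


def build_binary_prompt(excerpt: str, question: str) -> str:
    """Wrap one narrow yes/no question around a single document excerpt."""
    return (
        "Answer strictly with YES or NO, based only on the excerpt below.\n\n"
        f"Excerpt:\n{excerpt}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def parse_binary_answer(reply: str) -> Optional[bool]:
    """Map a model reply to True/False; None means it was not a clean yes/no."""
    token = reply.strip().strip(".!").upper()
    if token == "YES":
        return True
    if token == "NO":
        return False
    return None  # Anything else is treated as "no usable answer".


def ask_binary(
    excerpt: str,
    question: str,
    ask_model: Callable[[str], str],  # Hypothetical: takes a prompt, returns text.
) -> Optional[bool]:
    """Ask one binary question and return a strict boolean, or None."""
    return parse_binary_answer(ask_model(build_binary_prompt(excerpt, question)))
```

Rejecting anything that is not a clean YES or NO keeps free-form replies from being silently counted as answers; whether that trade-off fits a given product is a judgment call.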


gkk 864 days ago

Thanks for reading it! I agree that binary questions are easy enough for vanilla GPT to answer. If your problem space fits them - great. Sadly, the space I'm in doesn't have an easy mode!
