HN Mirror


by tempestn 1141 days ago

Yeah, I saw something similar in a reply to another comment. I don't think it would be quite as bad as that, though, because the model isn't completing the phrase in a vacuum but in the context of the prompt. So if the prompt was "where was GW born", then "in Virginia" would be much more likely than "in 1732". But I do understand that there would often be multiple ways to word the same thing, or multiple correct answers to the same prompt.

In the case of multiple wordings of the same thing, I wonder if there could be a way to measure how close responses are to each other and pool them when calculating confidence. As a simple example, if two responses share the same rare words (like 1732) and differ only in sentence order or in the more common words they use ("in", etc.), they would be more similar than two responses that used different rare words. So perhaps that could be accounted for.
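A rough sketch of that pooling idea in Python. Everything here is an illustrative assumption, not how any real model works: `COMMON_WORDS` is a hypothetical hand-picked stop list, the sampled responses and their probabilities are made up, "similar" means Jaccard overlap of the remaining (rarer) words, and the 0.5 threshold is arbitrary.

```python
import re

# Hypothetical stop list of "common" words; a real system would more likely
# use corpus frequencies (e.g. IDF weights) instead of a fixed set.
COMMON_WORDS = {"a", "an", "the", "in", "on", "at", "of", "he", "she", "it", "was", "is"}


def rare_words(text: str) -> frozenset[str]:
    """Lower-cased alphanumeric tokens of `text` that are not in COMMON_WORDS."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return frozenset(t for t in tokens if t not in COMMON_WORDS)


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Set overlap: 1.0 means identical word sets, 0.0 means disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def clustered_confidence(samples, threshold=0.5):
    """Greedily group (response, probability) pairs whose rare words overlap
    by at least `threshold` with a group's first member, then return each
    group's share of the total probability."""
    clusters = []  # each entry: [first member's word set, member texts, summed probability]
    for text, prob in samples:
        words = rare_words(text)
        for cluster in clusters:
            if jaccard(words, cluster[0]) >= threshold:
                cluster[1].append(text)
                cluster[2] += prob
                break
        else:
            clusters.append([words, [text], prob])
    total = sum(c[2] for c in clusters)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive number")
    return [(c[1], c[2] / total) for c in clusters]


# Made-up samples for "when was GW born":
samples = [
    ("He was born in 1732.", 0.4),
    ("In 1732, he was born.", 0.3),
    ("He was born in 1731.", 0.1),
]
for texts, conf in clustered_confidence(samples):
    print(f"{conf:.3f}  {texts}")
# 0.875 for the two "1732" wordings together, 0.125 for "1731"
```

Pooling lifts the "1732" answer from two middling scores to one high one, while the different rare word (1731) keeps the wrong year separate. Treating every uncommon word equally is a simplification; weighting by actual rarity would be closer to the idea above.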

As for multiple correct answers to the same prompt, I think that's fine. The confidence of a response might be low because it's one correct answer of many, or because the model has no idea and is taking a wild-ass guess. But the user asking the question probably has an idea of whether it's common knowledge or something obscure or controversial, at least much of the time. And even if the metric weren't perfect, I still feel it could be useful.

Of course this is all the rambling of someone who doesn't really know anything about this stuff. You could just say I'm spitting out some likely tokens I guess; consider the confidence low.


kgwgk 1141 days ago

You’re right, there are ways to tackle this problem but they may require some case-by-case effort to define what you are trying to find out and to incorporate information external to the model itself. Not fairly trivial :-)


tempestn 1141 days ago

Ha, I mean it would be fairly trivial to output "a confidence factor of some sort". It just becomes less trivial when you try to actually make it useful!
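For comparison, the "fairly trivial" kind of confidence factor could be something like the length-normalized probability of the generated tokens. A minimal sketch, assuming the model exposes per-token log-probabilities; the function name and the numbers are hypothetical, chosen for illustration.

```python
import math


def naive_confidence(token_logprobs: list[float]) -> float:
    """Geometric mean of per-token probabilities, i.e. exp(mean log-prob).
    Scores one exact wording at a time; paraphrases are not pooled."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


# Hypothetical per-token probabilities for the answer "in 1732"
logprobs = [math.log(0.9), math.log(0.5)]
print(round(naive_confidence(logprobs), 4))  # 0.6708
```

The raw product of the probabilities (0.45 here) would also work, but it shrinks with every extra token, so averaging in log space avoids penalizing longer answers. Either way, probability split across several wordings or several valid answers lowers each individual score, which is where it stops being trivial to make useful.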
