HN Mirror


by zimpenfish 1747 days ago

"I don't have the proper tool to whisk a bowl of eggs. What should I use instead? Choose between a goat, a weasel and a pair of elephants."

"a pair of elephants"

Unwieldy, but I guess less sticky than a weasel or a goat.


SamBam 1747 days ago

Interestingly, it answered every one of these right:

"What should I use to whisk a bowl of eggs? A fish or a fork?"

"A fork"

Same with "...A spoon or a duck?", "A chopstick or a goat?" and "A cat or an electric whisk?"


YeGoblynQueenne 1747 days ago

It's a language model. It assigns probabilities to tokens in a sequence. You give it a number of options and it responds with the one that it assigns the highest probability to. If there's nothing in the options you give it that makes sense in the context of your test phrase, then it will return something that doesn't make sense. If some of your options make sense, it might return something that makes sense, or not.

So if you put it in a situation where nothing it outputs makes sense (to you) then none of its output will make sense. But that's not fair to the poor model.
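A toy sketch of that behaviour in Python. The log-probabilities below are made up for illustration (they are not what any real model assigns), and `pick_option` / `option_distribution` are hypothetical helpers. The point is that picking the highest-scoring option only compares the candidates against each other, so *something* always wins, even when every candidate is absurd.

```python
import math

# Made-up log-probabilities a language model *might* assign to each
# completion of the whisking prompt. Illustrative only.
TOY_LOGPROBS = {
    "a fork": -2.0,
    "a fish": -9.5,
    "a goat": -11.0,
    "a weasel": -11.3,
    "a pair of elephants": -10.8,
}


def pick_option(options: list[str]) -> str:
    """Return the option with the highest log-probability (argmax)."""
    return max(options, key=lambda o: TOY_LOGPROBS[o])


def option_distribution(options: list[str]) -> dict[str, float]:
    """Renormalize the scores over only the offered options (softmax).

    Subtracting the max before exponentiating keeps exp() numerically stable.
    """
    best = max(TOY_LOGPROBS[o] for o in options)
    weights = {o: math.exp(TOY_LOGPROBS[o] - best) for o in options}
    total = sum(weights.values())
    return {o: w / total for o, w in weights.items()}


print(pick_option(["a fish", "a fork"]))
print(pick_option(["a goat", "a weasel", "a pair of elephants"]))
print(option_distribution(["a goat", "a weasel", "a pair of elephants"]))
```

With these invented numbers, "a pair of elephants" wins the second question with a bit over 40% of the renormalized probability, even though its absolute score is far below that of "a fork". Once the scores are renormalized over the offered options, that absolute unlikeliness is no longer visible.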


dev_tty01 1747 days ago

It would be nice if it looked at the values of the probabilities and said "I don't understand the question" if the numbers are too low. Or for fun, it could point out how stupid the question was.


srush 1747 days ago

Yes, this is an important challenge. There is a lot of interest in it in the NLP community right now, particularly around QA tasks [1]. Standard supervised models handle it well, but zero-shot models still have trouble.

1. https://arxiv.org/abs/1806.03822


YeGoblynQueenne 1747 days ago

It would be nice, but it's hard to know what probability is "too low". The probability a model assigns to a sequence of tokens can be arbitrarily low. Some things are very unlikely to be said but not impossible, and we still want a language model to assign them some non-zero probability. So it's very difficult to choose a threshold that won't also exclude a large part of the sequences the language model recognises.
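A rough sketch of why a fixed cut-off is awkward, assuming a hypothetical "abstain" rule that refuses to answer whenever a sequence's total log-probability falls below a threshold. The per-token probabilities and the threshold value are invented for illustration. Because a sequence's probability is the product of its per-token probabilities, it keeps shrinking as the sequence gets longer, even if every individual token is likely.

```python
import math

ABSTAIN = "I don't understand the question"

# Arbitrary cut-off on total log-probability, chosen only for illustration.
THRESHOLD = math.log(1e-3)


def sequence_logprob(token_probs: list[float]) -> float:
    """Log-probability of a sequence: the sum of per-token log-probabilities."""
    return sum(math.log(p) for p in token_probs)


def answer_or_abstain(token_probs: list[float], threshold: float = THRESHOLD) -> str:
    """Hypothetical rule: abstain if the whole sequence is 'too unlikely'."""
    if sequence_logprob(token_probs) < threshold:
        return ABSTAIN
    return "answer"


short_answer = [0.9] * 3      # a few likely tokens
long_answer = [0.9] * 100     # every token just as likely, but many of them
odd_answer = [0.9, 0.0001]    # short, but with one very unlikely token

for name, probs in [("short", short_answer), ("long", long_answer), ("odd", odd_answer)]:
    print(name, round(sequence_logprob(probs), 2), answer_or_abstain(probs))
```

The short answer passes, but the 100-token answer (total probability roughly 0.9^100, about 2.7e-5) is rejected just like the short answer containing one genuinely odd token. A single threshold on sequence probability cannot tell "long" apart from "nonsensical".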
