It's a language model. It assigns probabilities to tokens in a sequence. You give it a number of options and it responds with the one that it assigns the highest probability to. If there's nothing in the options you give it that makes sense in the context of your test phrase, then it will return something that doesn't make sense. If some of your options make sense, it might return something that makes sense, or not.
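A toy sketch of the "pick the highest-probability option" behaviour described above. The log-probabilities and the `score`/`pick` helpers are hypothetical stand-ins, not any real model's API. The point is that the argmax always returns *something*, and that renormalising over the offered options makes even a bad choice look confident:

```python
import math

# Hypothetical toy log-probabilities standing in for a real language model:
# log P(next token | previous token). Unseen pairs get a tiny floor value.
TOY_LOGPROBS = {
    ("water", "two"): math.log(0.6),
    ("water", "one"): math.log(0.2),
    ("oxygen", "two"): math.log(0.001),
    ("oxygen", "one"): math.log(0.0005),
}
FLOOR = math.log(1e-9)


def score(context, option):
    """Sum of the log-probabilities of the option's tokens, left to right."""
    total, prev = 0.0, context[-1]
    for tok in option:
        total += TOY_LOGPROBS.get((prev, tok), FLOOR)
        prev = tok
    return total


def pick(context, options):
    """Return the highest-scoring option and the scores renormalised over the options."""
    scores = {" ".join(o): score(context, o) for o in options}
    best = max(scores, key=scores.get)
    # Softmax over the candidates only: the result always sums to 1, so it
    # looks "confident" even when every raw probability is tiny.
    top = max(scores.values())
    weights = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(weights.values())
    return best, {k: w / total for k, w in weights.items()}


best, probs = pick(["oxygen"], [["one"], ["two"]])
print(best, probs)  # "two" wins with ~0.67, although its raw probability is only 0.001
```

No option like "zero" is offered, so the model still has to return one of the candidates.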
So if you put it in a situation where nothing it outputs makes sense (to you) then none of its output will make sense. But that's not fair to the poor model.
It would be nice if it looked at the values of the probabilities and said "I don't understand the question" if the numbers are too low. Or for fun, it could point out how stupid the question was.
Yes, this is an important challenge, and there is a lot of interest in it in the NLP community right now, particularly around QA tasks [1]. Standard supervised models handle it well, but zero-shot models still have trouble.
It would be nice, but it's hard to know what probability is "too low". In short, the probability a model assigns to a sequence of tokens can be arbitrarily low. Some things are very unlikely to be said but not impossible, and we still want a language model to assign them some non-zero probability. So it's very difficult to choose a threshold that won't exclude a large part of the sequences a language model recognises.
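A small illustration of why a fixed threshold is awkward, using made-up per-token probabilities (the numbers and helper names are hypothetical). Because a sequence's probability is a product of per-token probabilities, a long, fluent answer can fall below a threshold that a short, odd answer clears. Dividing by length is one common heuristic, but it only fixes the length effect, not the deeper "unlikely but valid" problem:

```python
import math


def sequence_logprob(token_probs):
    """log P(sequence) = sum of per-token log-probabilities."""
    return sum(math.log(p) for p in token_probs)


def mean_token_logprob(token_probs):
    """Length-normalised score: average log-probability per token."""
    return sequence_logprob(token_probs) / len(token_probs)


# Hypothetical per-token probabilities.
fluent_long = [0.5] * 40  # each token fairly likely, but 40 of them
odd_short = [0.02] * 3    # each token unlikely, but only 3 of them

THRESHOLD = math.log(1e-6)  # arbitrary cut-off on the whole sequence

print(sequence_logprob(fluent_long) > THRESHOLD)  # False: 0.5**40 is about 9e-13
print(sequence_logprob(odd_short) > THRESHOLD)    # True: 0.02**3 is 8e-6
print(mean_token_logprob(fluent_long) > mean_token_logprob(odd_short))  # True
```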
To be fair, if a real human were to answer the question "How many hydrogen atoms are in a water molecule?" time and time again, it would be very easy for them to accidentally reply "two" when asked the same question about oxygen.
The real question is, after the model mistakenly replied "two" to your question, did it also internally trigger the neurons for "Wait a minute..." while inhibiting output?
Running the model multiple times doesn't reinforce the model. In general, you should not anthropomorphize algorithms, as human cognition has no bearing on how algorithms work.
It can. Look up "zero-shot learning": both sentences would be part of a single "evaluation", and the first sentence would prime the output of the second. (You basically combine multiple "evaluations" into one, and the context is held in tensors/blobs.)
Sure, but I feel like we're talking about different things. I consider "context held in tensors" as part of the model. That is, if you zero out these registers, then the model evolves in a deterministic way every time. In this case, when you perform a query, I assume those tensors are always initialized before your query.
It's the 'one-electron universe' theory [0]. In short: there is one electron that keeps going back and forth in time to play the role of every electron we see. A particle 'going backwards in time' is mathematically identical to its antiparticle, which we know exists, so the whole idea isn't too far-fetched.
I don't think it is falsifiable, so not really scientific, but a fun theory to believe in.
Hosted demo, "Logic puzzle" example:
"On a shelf, there are five books: a gray book, a red book, a purple book, a blue book, and a black book.
The red book is to the right of the gray book. The black book is to the left of the blue book. The blue book is to the left of the gray book. The purple book is the second from the right.
Which book is the leftmost book?"
Answer:
> The black book
Same puzzle with the question "Which book is the rightmost book?"
Answer:
> The black book
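For reference, the puzzle has a single consistent arrangement, which a brute-force check over all 5! orderings confirms. This is a minimal sketch that assumes "to the right of" means anywhere to the right, not necessarily adjacent:

```python
from itertools import permutations

BOOKS = ["gray", "red", "purple", "blue", "black"]


def satisfies(order):
    pos = {book: i for i, book in enumerate(order)}  # 0 = leftmost
    return (
        pos["red"] > pos["gray"]             # red is to the right of gray
        and pos["black"] < pos["blue"]       # black is to the left of blue
        and pos["blue"] < pos["gray"]        # blue is to the left of gray
        and pos["purple"] == len(order) - 2  # purple is second from the right
    )


solutions = [order for order in permutations(BOOKS) if satisfies(order)]
for order in solutions:
    print("left to right:", ", ".join(order))
    print("leftmost:", order[0], "| rightmost:", order[-1])
```

The only solution is black, blue, gray, purple, red, so "The black book" is right for the leftmost question and wrong for the rightmost one (the red book).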
I tried giving this problem to GPT-3 and Codex; they could not solve it either.
I doubt it; these language models clearly go beyond that. Calling them "just autocomplete", a term that evokes a lot of things people are already familiar with, is a way to downplay their significance.
Not true. Take a look at the paper and benchmarks. The point of the thing is that it does well on a number of NLP tasks by being an expensive autocomplete. As people demonstrate in this thread, it still has significant flaws.
Your post expressing hesitancy towards machine learning is not backed by scientific consensus and has been removed. Please receive a research grant before expressing opinions.
Woah woah, are you questioning science? AI research is a serious field and they're doing the best they can. The risks definitely outweigh the benefits. /s
I tried: "When is the first full moon after October the 18th 2021?"
It should have said the 20th of October but it said: "November the 19th 2021".
Big AI models have quite a way to go I think...
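The expected answer can be checked with a rough calculation. This is a sketch of a truncated version of the full-moon formula from Jean Meeus's *Astronomical Algorithms* (phases-of-the-Moon chapter): the mean lunation plus only the four largest periodic corrections. The truncation, the function names, and ignoring the roughly one-minute TT/UTC difference are my simplifications, so treat the result as good to within about an hour, not as an ephemeris:

```python
import math
from datetime import datetime, timedelta

SYNODIC_MONTH = 29.530588861  # mean synodic month in days (Meeus)
JDE_BASE = 2451550.09766      # mean new moon of 2000-01-06 (Meeus, k = 0)


def _full_moon_jde(k):
    """Approximate Julian Ephemeris Day of the full moon with index k (k = n + 0.5)."""
    t = k / 1236.85
    e = 1 - 0.002516 * t  # eccentricity factor (T**2 term dropped)
    m = math.radians(2.5534 + 29.10535670 * k)      # Sun's mean anomaly
    mp = math.radians(201.5643 + 385.81693528 * k)  # Moon's mean anomaly
    f = math.radians(160.7108 + 390.67050284 * k)   # Moon's argument of latitude
    jde = JDE_BASE + SYNODIC_MONTH * k
    # Only the four largest periodic corrections for the full-moon phase.
    jde += (-0.40614 * math.sin(mp)
            + 0.17302 * e * math.sin(m)
            + 0.01614 * math.sin(2 * mp)
            + 0.01043 * math.sin(2 * f))
    return jde


def _jde_to_datetime(jde):
    # JD 2451545.0 is 2000-01-01 12:00; the TT/UTC offset is ignored.
    return datetime(2000, 1, 1, 12) + timedelta(days=jde - 2451545.0)


def first_full_moon_after(when):
    jd = 2451545.0 + (when - datetime(2000, 1, 1, 12)).total_seconds() / 86400
    # Start one lunation early so the corrections can't make us skip a full moon.
    k = math.floor((jd - JDE_BASE) / SYNODIC_MONTH) - 0.5
    while _full_moon_jde(k) <= jd:
        k += 1
    return _jde_to_datetime(_full_moon_jde(k))


print(first_full_moon_after(datetime(2021, 10, 18)))  # a time on 2021-10-20
```

Asking for the full moon after 21 October instead lands on 19 November 2021, which suggests the model skipped a lunation.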
it said: 'Bicycle Parts Exchange'
Tried again with 'used lawnmower parts' and it said 'Green Thumb'
computer parts: 'Tom's Parts' (which made me chuckle)
used diapers: 'Diapers.com'
It may not understand chemistry, but it's still pretty cool.