by stevenhuang
1204 days ago
You're right, it seems particularly bad at navigation. I led with some general queries on the NYC system, then asked it for the stops that service lines F and A have in common, and it hallucinated there too. The wiki pages for the service lines have pretty complete representations of this, so the data should be well represented in its training corpus. Would be interested in how a multimodal LLM such as PaLM-E (trained on maps, etc.) fares on these sorts of queries. https://www.reddit.com/r/MachineLearning/comments/11krgp4/r_...
|
Here's another one that fails spectacularly: the digits 0-9 drawn as an ASCII 7-segment display. It gets most of them right, but it throws in a few erroneous non-numbers and repeats, reorders, or forgets some digits. Asking it for ASCII drawings of simple objects can go off the rails really quickly.
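For reference, here is a rough sketch of what a correct answer to that prompt looks like. The segment table uses the usual a-g labeling for 7-segment displays; the 3x3 character cell per digit and the names `SEGMENTS`, `_cell`, and `render` are just my assumptions for illustration, not anything the model or the prompt specified.

```python
# Segments lit for each digit, using the conventional labels:
# a = top, b = upper right, c = lower right, d = bottom,
# e = lower left, f = upper left, g = middle.
SEGMENTS = {
    "0": "abcdef", "1": "bc", "2": "abdeg", "3": "abcdg", "4": "bcfg",
    "5": "acdfg", "6": "acdefg", "7": "abc", "8": "abcdefg", "9": "abcdfg",
}


def _cell(on: str, seg: str, glyph: str) -> str:
    """Return the glyph if the segment is lit, otherwise a space."""
    return glyph if seg in on else " "


def render(digits: str) -> str:
    """Draw the digits side by side, each in an assumed 3x3 character cell."""
    rows = ["", "", ""]
    for ch in digits:
        on = SEGMENTS[ch]
        rows[0] += " " + _cell(on, "a", "_") + " "
        rows[1] += _cell(on, "f", "|") + _cell(on, "g", "_") + _cell(on, "b", "|")
        rows[2] += _cell(on, "e", "|") + _cell(on, "d", "_") + _cell(on, "c", "|")
    return "\n".join(rows)


if __name__ == "__main__":
    print(render("0123456789"))
```

Ten lookups in a fixed table, which is what makes the failures so striking.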
The failure mode is very consistent. When a prompt forces it to be specific and accurate on an unambiguous topic for 10 or more line items, it will virtually always hallucinate at least one or two, especially if the topic is too simple to hide behind a complex answer. Even if it's learned not to hallucinate 90% of the time, and even if that's good enough to pass at first glance, within a list of 10 things it only has about a 35% chance of not hallucinating any of them (0.9^10 ≈ 0.35, treating each item as independent).
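The 35% figure is just compounding, and it is easy to check. A quick sketch, assuming each item is right independently with the same probability (that independence is my simplification, and `p_all_correct` is a made-up helper name):

```python
def p_all_correct(per_item_accuracy: float, n_items: int) -> float:
    """Chance that every item in a list is correct, assuming independent items."""
    return per_item_accuracy ** n_items


if __name__ == "__main__":
    # 90% per-item accuracy over lists of increasing length.
    for n in (1, 5, 10, 20):
        print(f"{n:>2} items: {p_all_correct(0.9, n):.3f}")
    # The 10-item row comes out at about 0.349, i.e. the ~35% above.
```

Under the same assumption, per-item accuracy would have to be around 99% before a 10-item list comes out clean about 90% of the time.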
For what it's worth, it did very well on law questions. Try as I might, it refused to accept that there's a legal category known as "praiseworthy homicide". Though I suspect this has less to do with the underlying model and more to do with OpenAI paying special attention to profitable classes of queries.
I'm sorry to say, I think the problem may be more intrinsic to the current approach to AI. Frankly, LLMs work unreasonably well, beyond my expectations, but the approach is making up for the lack of a real first-principles theory of cognition with absurd amounts of parameters and training.