HN Mirror


by ben_w 586 days ago

> They can't handle counter-intuitive but absolutely logical cases like how eggplants and potatoes belong to same biological family but not radishes

"Can't" you say. "Does", I say: https://chatgpt.com/c/6735b10c-4c28-8011-ab2d-602b51b59a3e

Not that it matters, this isn't a demonstration of reasoning, it's a demonstration of knowledge.

A better test would be whether it can be fooled by statistics that have political aspects, so I went with the recent Veritasium video on this. At least with my custom instructions, it goes off and does actual maths by calling out to the Python code interpreter, so that's not going to demonstrate anything by itself: https://chatgpt.com/share/6735b727-f168-8011-94f7-a5ef8d3610...

But this then taints the "how would ${group member} respond to this?" question; if I convince it to not do real statistics and give me a purely word-based answer, you can see the same kinds of narratives that actual humans give when presented with this kind of info: https://chatgpt.com/share/6735b80f-ed50-8011-991f-bccf8e8b95...

> They're language models. It's in the name. They work like one.

Yes, they are.

Lojban is also a language.

Look, I'm not claiming they're fantastic at maths (at least when you stop them from using tools), but the biasing I'm talking about is part of language as it is used: the definition of "nurse" may not be gendered, but people are more likely to assume a nurse is a woman than a man, and that's absolutely a thing these models (and even their predecessors like Word2Vec) pick up on:

https://chanind.github.io/word2vec-gender-bias-explorer/#/qu...

(from: https://chanind.github.io/nlp/2021/06/10/word2vec-gender-bia...)

This is the kind of de-bias and re-bias I mean.
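To make the "nurse" point concrete, here is a minimal sketch of how that kind of bias can be measured and removed in an embedding space. Everything in it is an assumption for illustration: the 3-d vectors are made up (real Word2Vec vectors are learned and have hundreds of dimensions), and the helper names `bias_score` and `debias` are hypothetical, not taken from the linked explorer. It uses a common projection approach: take "she" minus "he" as a gender direction, measure how far a word leans along it, then subtract that component.

```python
import math

# Toy 3-d "embeddings", invented for illustration only.
# Real Word2Vec vectors are learned from text and are much larger.
TOY_VECTORS = {
    "he":       [1.0, 0.0, 0.2],
    "she":      [-1.0, 0.0, 0.2],
    "nurse":    [-0.6, 0.7, 0.1],
    "engineer": [0.5, 0.8, 0.1],
}


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def gender_direction(vectors):
    """Unit vector pointing from 'he' towards 'she'."""
    diff = [s - h for s, h in zip(vectors["she"], vectors["he"])]
    return normalize(diff)


def bias_score(word, vectors):
    """Cosine between the word and the gender direction.

    Positive leans towards 'she', negative towards 'he'.
    """
    return dot(normalize(vectors[word]), gender_direction(vectors))


def debias(word, vectors):
    """Remove the word's component along the gender direction."""
    g = gender_direction(vectors)
    v = vectors[word]
    projection = dot(v, g)
    return [x - projection * gi for x, gi in zip(v, g)]


if __name__ == "__main__":
    for word in ("nurse", "engineer"):
        print(word, round(bias_score(word, TOY_VECTORS), 3))
    print("nurse after debias:", debias("nurse", TOY_VECTORS))
```

With these toy numbers "nurse" leans towards "she" and "engineer" towards "he", and the debiased "nurse" vector has no component left along the gender direction. Re-biasing is the same operation in reverse: add a chosen multiple of the direction back.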

1 comment

numpad0 586 days ago

> "Can't" you say. "Does", I say:

Have you seriously not seen them make these kinds of grave mistakes? That's too much Kool-Aid you're taking.

ben_w 586 days ago

I literally gave you a link to a ChatGPT session where it did what you said it can't do.

And rather than use that as a basis for claiming that it's reasoning, I'm also saying that the test you proposed, and which I falsified, wasn't actually about reasoning.

Not sure what that would even be in a Kool-Aid themed metaphor in this case… "You said that drink was poisoned with something that would make our heads explode, Dave drank some and he's fine, but also poison doesn't do that and if the real poison is α-Amanitin we wouldn't even notice problems for about a day"?