by usaar333 1117 days ago
> TLDR: see this prompt and ChatGPT's response

And wow, that's GPT4.

I've had similar thoughts as you. It feels like amazing intelligence one day, but the next it seems like an extremely good but naive pattern matcher.

I've experienced similar GPT-4 disappointments when trying to teach it concepts not well covered in its training data (it does badly) or asking it to modify programs in ways that go outside the training data (e.g. making a tax calculator compute long-term capital gains tax correctly). It ends up doing much worse than a human.


To be clear, that's ChatGPT, not GPT4. GPT4 should be better, but it is still in limited beta and I haven't bothered joining. Note that 3.5-turbo (the API) is worse.

> They both weigh the same amount, which is 1 pound.

It is clearly a strong example of Gell-Mann amnesia when we can't trust it to tell us the difference between two simple things but we trust it to tell us complicated things.

It is also a clear example of how it is a stochastic parrot -- it doesn't understand what it is saying -- as it even explains the reasoning and is not self-consistent. We wouldn't expect an entity that can understand something to be wildly inconsistent in this short a period of time. Clearly the model is relying more on the statistics of the question (the pattern and frequency with which most of those words appear in that order) than on the actual content and meaning of those words.

Despite this, I still frequently use LLMs. I just scrutinize them and don't trust them. Utility and trust are different things and people seem to be forgetting this.

>> To be clear, that's ChatGPT, not GPT4. GPT4 should be better, but it is still limited beta and I haven't bothered joining.

Well, I can predict the next few token sequences you're about to get in response to your comment: "That's why you got that answer, GPT4 is so much better" etc.

Regarding your earlier comment about burnout, you're not alone. I stayed on HN because I could have the occasional good discussion about AI. There were always conversations that quickly got saturated with low-knowledge comments, the inevitable effect of discussions about "intelligence", "understanding" and other things everybody has some experience with but for which there is no commonly accepted formal definition that can keep the discussion focused. That kind of comment used to be more or less constant in quantity and I could usually still find the informed users' corner. After ChatGPT went viral though, those kinds of comments have really exploded and most conversations have no more space for reasoned and knowledgeable exchange.

>> LLM has a good memory.

Btw, intuitively, neural nets are memories. That's why they need so much data and still can't generalise (but, well, they need all that data because they can't generalise). There's a paper arguing so with actual maths, by Pedro Domingos, but a) it's a single paper, b) I haven't read it carefully and c) it's got an "Xs are Ys" type of title so I refuse to link it. With LLMs you can sort of see them working like random access memories when you have to tweak a prompt carefully to get a specific result (or like how you only get the right data from a relational database when you make the right query). I think, if we trained an LLM to generate prompts for an LLM, we'd find that the prompts that maximise the probability of a certain answer look nothing like the chatty, human-like prompts people compose when speaking to a chatbot; they'd even look random and incomprehensible to humans.

Well, it is good to know I'm not alone. These are strange times indeed. I often think one of the great filters of civilizations is overcoming a biological mechanism that designs brains to think simply (compute is costly and complexity is often unnecessary for survival objectives) and then advancing to a level of civilization where a significant share of the problems the civilization faces require going beyond first- and second-order approximations (which happens once most challenges have been solved to first- and second-order approximations). Unless one is able to rewire their consciousness I don't see how this wouldn't be an issue for any species, but maybe I'm thinking too narrowly or from too much of a bias.

> GPT4 should be better, but it is still limited beta and I haven't bothered joining.

Ah, my bad. GPT4 via Bing (Precise mode) gets it correct:

> A kilogram of feathers weighs more than a pound of bricks. A kilogram is a metric unit of mass and is equivalent to 2.20462 pounds. So, a kilogram is heavier than a pound.
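
For anyone who wants to check the arithmetic in that answer, here is a minimal sketch in Python. The only outside fact it assumes is the standard definition of the international pound as exactly 0.45359237 kg; the function and variable names are made up for illustration.

```python
# Assumption: the international avoirdupois pound is defined as
# exactly 0.45359237 kg. Names below are illustrative only.
KG_PER_LB = 0.45359237


def kg_to_lb(kg: float) -> float:
    """Convert a mass in kilograms to pounds."""
    return kg / KG_PER_LB


feathers_lb = kg_to_lb(1.0)  # one kilogram of feathers, in pounds
bricks_lb = 1.0              # one pound of bricks

print(round(feathers_lb, 5))    # 2.20462
print(feathers_lb > bricks_lb)  # True
```

The material doesn't enter into it; only the units do, which is exactly the part the "they both weigh the same" answer skipped.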