by godelski
1120 days ago
To be clear, that's ChatGPT, not GPT4. GPT4 should be better, but it is still in limited beta and I haven't bothered joining. Note that 3.5-turbo (the API) is worse:
> They both weigh the same amount, which is 1 pound.
It is clearly a strong example of Murray Gell-Mann Amnesia when we can't trust it to tell us the difference between two simple things but we trust it to tell us complicated things. It is also a clear example of how it is a stochastic parrot -- it doesn't understand what it is saying -- as it even explains its reasoning and still is not self-consistent. We wouldn't expect an entity that understands something to be wildly inconsistent over such a short span of time. Clearly the model is relying more on the statistics of the question (the pattern and frequency with which those words appear in that order) than on the actual content and meaning of those words.
Despite this, I still frequently use LLMs. I just scrutinize them and don't trust them. Utility and trust are different things and people seem to be forgetting this.
|
Well, I can predict the next few token sequences you're about to get in response to your comment: "That's why you got that answer, GPT4 is so much better", etc.
Regarding your earlier comment about burnout, you're not alone. I stayed on HN because I could have the occasional good discussion about AI. There were always conversations that quickly got saturated with low-knowledge comments, the inevitable effect of discussions about "intelligence", "understanding" and other things everybody has some experience with but for which there is no commonly accepted formal definition that can keep the discussion focused. That kind of comment used to be more or less constant in quantity and I could usually still find the informed users' corner. After ChatGPT went viral though, those kinds of comments have really exploded and most conversations have no more space for reasoned and knowledgeable exchange.
>> LLM has a good memory.
Btw, intuitively, neural nets are memories. That's why they need so much data and still can't generalise (but, well, they need all that data because they can't generalise). There's a paper by Pedro Domingos arguing so with actual maths, but a) it's a single paper, b) I haven't read it carefully and c) it's got an "Xs are Ys" type of title so I refuse to link it. With LLMs you can sort of see them working like random access memories when you have to tweak a prompt carefully to get a specific result (much like how you only get the right data from a relational database when you make the right query). I think, if we trained an LLM to generate prompts for an LLM, we'd find that the prompts that maximise the probability of a certain answer look nothing like the chatty, human-like prompts people compose when speaking to a chatbot; they might even look random and incomprehensible to humans.