by roenxi 171 days ago
As far as I can tell from poking people on HN about what "AGI" means, there might be a general belief that the median human is not intelligent. Given that the current batch of models apparently isn't AGI, I'm struggling to see a clean test of what AGI might be that a human can pass.

LLMs may appear to do well on certain programming tasks on which they are trained intensively, but they are incredibly weak. If you try to use an LLM to generate, for example, a story, you will find that it will make unimaginable mistakes. If you ask an LLM to analyze a conversation from the internet it will misrepresent the positions of the participants, often restating things so that they mean something different or making mistakes about who said what in a way that humans never do. The longer the exchange, the worse these problems get.

We are incredibly far from AGI.

We do have AI systems that write stories [0]. They work. The quality might not be spectacular, but if you've ever gone out and spent time reading fanfiction, you'd have to agree there are a lot of rather terrible human writers too (bless them). It still runs into the issue that if we want LLMs to compete with the best of humanity, they aren't there yet; but that means defining human intelligence as something that most people don't have access to.

> If you ask an LLM to analyze a conversation from the internet it will misrepresent the positions of the participants, often restating things so that they mean something different or making mistakes about who said what in a way that humans never do.

AI transcription & summary seems to be a strong point of the models so I don't know what exactly you're trying to get to with this one. If you have evidence for that I'd actually be quite interested because humans are so bad at representing what other people said on the internet it seems like it should be an easy win for an AI. Humans typically have some wild interpretations of what other people write that cannot be supported from what was written.

[0] https://github.com/google-deepmind/dramatron

I haven't tried Dramatron, but my experience is that it isn't possible to do sensibly. With regard to the second part:

>AI transcription & summary seems to be a strong point of the models so I don't know what exactly you're trying to get to with this one. If you have evidence for that I'd actually be quite interested because humans are so bad at representing what other people said on the internet it seems like it should be an easy win for an AI. Humans typically have some wild interpretations of what other people write that cannot be supported from what was written.

Transcription and summarization are indeed fine, but try pasting a longer Reddit or HN discussion you've been part of into any model of your choice, ask it to analyze it, and you will see severe errors very soon. It will consistently misrepresent the views expressed, and it doesn't really matter which model you go for. They can't do it.

I can see why they'd struggle; I'm not sure what you're trying to ask the model to do. What type of analysis are you expecting? If the model is supposed to represent the views expressed, that would be a summary. If you aren't asking it for a summary, what do you want it to do? Do you literally mean you want the model to perform conversation analysis (i.e., https://en.wikipedia.org/wiki/Conversation_analysis#Method)?

Usually I use the format "Analyze the following ...".

For simple discussions this is fine. For complex discussions, especially when people get into conflict (whether or not that conflict is really complex), problems usually result. The big problems are that the model will misquote or misrepresent views: attempted paraphrases that actually change the meaning, the ordinary hallucinations, etc.

For stories, the confusion is much greater. Much of it is due to the basic way LLMs work: stories have dialogue, so if the premise involves people who can't speak each other's language, problems arise very soon. I remember asking some recent Microsoft Copilot variant to write a portal scenario: some guys on vacation in Tenerife rent a catamaran and end up falling through a hole in the world into ASoIAF, landing in the seas off Essos, where they obviously have a terrible time. It kept forgetting that they don't know English.

This is of course not obviously relevant to what Copilot is intended for, but I feel that if you actually try this, you will understand how far we are from something like AGI, because if OpenAI's (or anyone else's) systems were in fact close, this would be close too. If we were close, we'd probably still see silly errors, but they'd be different kinds of errors: things like not telling you the story you want, rather than ignoring core instructions or failing to understand conversations.

Your points about misquotes and language troubles are very valid and interesting. But a word of caution on your prompt: you’re asking a lot of the word “analyze” here. If the LLM responded that the thread had 15 comments by 10 unique authors and a total of 2000 characters, I would classify that as a completely satisfactory answer (assuming the figures were correct) based on the query:
> Usually I use the format "Analyze the following ...".

It doesn't surprise me that you're getting nonsense; that is an ill-formed request. The AI can't fulfil it because the request isn't asking it to do anything. I'm in the same boat as an AI would be: I can't tell what outcome you want. I'd probably interpret it as "summarise this conversation" if someone asked that of me, but you seem to agree that AIs are good at summary tasks, so that doesn't seem like it would be what you want. If I had my troll hat on, I'd give you a frequency analysis of the letters and call it a day, which is more passive-aggressive than I'd expect of an AI; they tend to just blather when given a vague setup. They aren't psychic; it is necessary to give them instructions to carry out.

> We are incredibly far from AGI.

This, and we don't actually know what the foundation models for AGI are; we're just assuming LLMs are it.

This seems distant from my experience. Modern LLMs are superb at summarisation, far better than most people.

> there might be a general belief that the median human is not intelligent

This is to deconstruct the question.

I don't think it's even wrong: a lot of people are doing things, making decisions, and living life perfectly normally, even successfully, without applying intelligence in a personal way. Those with socially accredited 'intelligence' would be the worst offenders, IMO: they do not apply their intelligence personally but simply massage themselves and others towards consensus. That is ultimately materially beneficial to them, so why not?

For me 'intelligence' would be knowing why you are doing what you are doing without dismissing the question with reference to 'convention', 'consensus', someone/something else. Computers can only do an imitation of this sort of answer. People stand a chance of answering it.

>knowing why you are doing what you are doing[...] Computers can only do an imitation of this sort of answer. People stand a chance of answering it.

I'm not following. A computer's "why" is a written program, surely that is the most clear expression of its intent you could ask for?

A computer doesn't determine the why; it is programmed to do so. It doesn't determine meaning or value from whatever-it-is.

Did you mean it doesn't set its own goals? Or what did you mean by "determine the why", if not a stack trace of its motivations (which is to say, its programming)? Could you give an example of determining meaning or value?

Yes, set its own goals. Here's an example: say you wanted to track your spending; you might create a spreadsheet to do so. The spreadsheet won't write itself. If you want, you could perhaps task an AI with monitoring and tracking your spending, but it doesn't care. It is the human that cares/feels/values whatever-it-is. Computers are not that type.

Is your position that humans are pretty mechanistic, and simply playing out their programming, like computers? And that they can provide a stacktrace for what they do?

If so, this is what I was getting at with my initial comment. Most people do not apply their intelligence personally; they are simply playing out the goals that were inserted into them (by parents, society). There are alternative possibilities, but it seems that most people's operational procedures and actions are not something they have considered or actively sought.

>Is your position that humans are [...] simply playing out their programming?

Yes, at least it's what I wanted to drill further into.

Boiled down, I'm interested in hearing where, in your framework, "intelligent" people derive their motivations from, if not from outside themselves (I agree that most people are on "non-intelligent" autopilot, if you will, most of the time).

When does a goal start being my own, intelligent goal? Any impetus for something can be traced back to not-yourself: I might decide to start tracking my spending, but that decision doesn't form out of the void. Maybe I value frugality, but I did not create that value in myself. It was instilled in me by experience, or my peers, etc. I see no way for one to "spontaneously" form a motivation; or, if I wanted to take it one step further (into the Buddha's territory), I would have to question who, where, and what this "self" even is.

Being an intelligent being is not the same as being considered intelligent relative to the rest of your species. I think we’re just looking to create an intelligence, meaning something with the attributes that make a being intelligent, which are mostly the ability to reason and learn. I think the being might take over from there, no?

With humans, the speed and ease with which we learn and reason are capped. I think a very dumb intelligence will not stay dumb for very long, because every resource will be spent in making it smarter.

Why would the dumb intelligence be less constrained than a human in making itself smarter?
I have yet to see an LLM with hands, feet, or eyeballs.

Currently, LLMs require hooks and active engagement with humans to ‘do’ anything. Including learn.

> every resource will be spent in making it smarter

The root motivation on which every resource will be spent is simply and very obviously to make a profit.