| HN Mirror

Respectfully, I think we cracked basic intelligence. What do you imagine under basic intelligence?

LLMs can do homework, pass standardized exams, and give advice WITHOUT ANY SPECIFIC TRAINING.

You can invent an imaginary game, explain the rules to the LLM and let it play it. Just like that.

You can invent an imaginary computer language, explain the syntax to the LLM and it will write you valid programs in that language. Just like that.

If that is not intelligence, I do not know what is. In both cases, the request you put in is imaginary and exists only in your head; there are no previous examples or resources to train on.

sph 805 days ago

> Respectfully, I think we cracked basic intelligence. What do you imagine under basic intelligence?

It all depends on your definition of intelligence. Mine is the ability to solve novel problems.

AI is unable to solve novel problems, only things it has been trained against. AI is not intelligent, unless you change the very definition of the word.

I challenge you to invent an imaginary game or computer language and explain the rules to the LLM. It will learn and play the game (or write programs in your invented language), even though you just imagined it. There was no resource to train on. Nobody else knows of that game or language. The LLM learns on the spot from your instructions and plays the game.

I cannot understand grad-school-level mathematics even if you give me all the books and papers in the world. I was not formally trained in mathematics; does that make me unintelligent?

sph 805 days ago

"Ability to solve novel problems" does not mean ability to solve all problems, nor to have all the knowledge in the world. A caveman can be as intelligent as you and I, even without being able to read.

Not having knowledge in mathematics has no impact on your intelligence quotient.

I agree. But LLMs do solve novel problems in that case; you just have to explain the problem to them as you would to an intelligent caveman or to me.

Which novel problem can't an LLM solve? I gave the example of an imaginary game because the LLM cannot possibly have been trained on it: it was imagined by one person and nobody else knows how to play it.

I tried the first thing which came to mind, with GPT-4o. The problem is probably not novel, yet it failed anyway.

---

User:

Let’s play a game. Every time I give you a word, you separate it into its individual characters and assign a number to each letter, in order, following the Latin alphabet. So "a" is "1", "b" is "2", and so forth. You will then subtract 2 from each number, convert it back to the corresponding letter and return the new word. Did you get all that? Are you ready to start with my next word?

LLM:

Got it! I'm ready to start. What's your first word?

User:

anthropomorfic

LLM:

Sure! Let's break down "anthropomorfic".

1. Assign numbers to each letter:

a (1), n (14), t (20), h (8), r (18), o (15), p (16), o (15), m (13), o (15), r (18), f (6), i (9), c (3)

2. Subtract 2 from each number:

a (1-2=-1: Consider Zuzzer, i.e., `25`), n (14-2=12), t (20-2=18), h (8-2=6), r (18-2=16), o (15-2=13), p (16-2=14), o (15-2=13), m (13-2=11), o (15-2=13), r (18-2=16), f (6-2=4), i (9-2=7), c (3-2=1)

3. Convert them back to letters:

y, l, r, f, p, m, n, m, k, m, p, d, g, y

The new word is "ylrfpmnmkmpdgy".

---

Looks pretty good, right? It even found a good wraparound solution for “a” (I wasn’t trying to trick it, I just didn’t think about the word too hard), though I have no clue what it means by “Consider Zuzzer”. But then you look at the last one, where it converted “c” to “y”. That’s absolutely wrong (3 - 2 = 1, so it should be “a”) and is the easiest one any human would guess. Yet you could miss it because all the previous ones were right.
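For reference, the rule from the prompt is small enough to check mechanically. This is a minimal sketch, assuming the wraparound the model itself chose (numbers below 1 wrap to the end of the alphabet) and assuming non-letters are passed through unchanged; neither was specified in the prompt, and the function name `shift_word` is made up for illustration.

```python
import string

ALPHABET = string.ascii_lowercase  # "a" is 1, "b" is 2, ..., "z" is 26


def shift_word(word: str, shift: int = -2) -> str:
    """Apply the game's rule: letter -> number, add `shift`, number -> letter."""
    result = []
    for ch in word.lower():
        if ch not in ALPHABET:
            result.append(ch)  # assumption: leave non-letters untouched
            continue
        number = ALPHABET.index(ch) + 1          # 1-based position
        shifted = (number - 1 + shift) % 26 + 1  # assumption: wrap around, so -1 becomes 25
        result.append(ALPHABET[shifted - 1])
    return "".join(result)


word = "anthropomorfic"
expected = shift_word(word)
llm_answer = "ylrfpmnmkmpdgy"  # copied from the transcript above

# Collect (index, source letter, expected letter, letter the LLM gave) for each error.
mismatches = [
    (i, src, want, got)
    for i, (src, want, got) in enumerate(zip(word, expected, llm_answer))
    if want != got
]

print(expected)    # ylrfpmnmkmpdga
print(mismatches)  # [(13, 'c', 'a', 'y')]
```

Under those assumptions the only disagreement is the final letter, which matches the manual check.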

Here’s a bonus. When verifying the answer, I queried Brave search for “letters of the alphabet with numbers assigned”. The AI-generated answer at the top was: A=1, B=2, C=3, D=4, E=5, F=6, G=7, H=8, I=9, J=1, K=2, L=3, M=4, N=5, O=6, P=7, Q=8, R=9, S=1, T=2, U=3, V=4, W=5, X=6, Y=7, Z=8. Which, alright, I can see the logic, but that’s not what anyone would want by asking that question.
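The two numbering schemes can be put side by side. This is a sketch under the assumption that the search answer simply restarts the count after 9; that rule reproduces every value quoted above, but it is a guess at the logic, not something the search result stated. The function names are made up for illustration.

```python
import string

LETTERS = string.ascii_uppercase


def alphabet_position(letter: str) -> int:
    """What the question was after: A=1, B=2, ..., Z=26."""
    return LETTERS.index(letter.upper()) + 1


def digit_cycle_value(letter: str) -> int:
    """Assumed rule behind the search answer: count 1 to 9, then start over."""
    return (alphabet_position(letter) - 1) % 9 + 1


# The AI-generated answer as quoted in the comment above.
quoted = (
    "A=1, B=2, C=3, D=4, E=5, F=6, G=7, H=8, I=9, J=1, K=2, L=3, M=4, "
    "N=5, O=6, P=7, Q=8, R=9, S=1, T=2, U=3, V=4, W=5, X=6, Y=7, Z=8"
)
quoted_values = {}
for pair in quoted.split(", "):
    letter, value = pair.split("=")
    quoted_values[letter] = int(value)

# True if the assumed rule reproduces the quoted answer for every letter.
matches_cycle = all(digit_cycle_value(l) == v for l, v in quoted_values.items())

# Letters where the quoted answer differs from the plain alphabet position.
differs_from_position = [l for l, v in quoted_values.items() if alphabet_position(l) != v]

print(matches_cycle)
print(differs_from_position[0])
```

The quoted values agree with plain alphabet positions only for A through I, which is why the answer looks plausible at a glance.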

ben-schaaf 805 days ago

Huh? It's a meme that LLMs can't follow the rules of chess. I just tried tic-tac-toe on GPT-3.5, and not only did it pick bad moves, it also failed to evaluate the win condition.
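For a sense of how little there is to evaluate, here is a minimal sketch of a tic-tac-toe legality and win check. The board representation (a list of nine cells, row by row, with " " for empty) and the function names are assumptions made for illustration, not anything from the chat sessions discussed here.

```python
from typing import Optional, Sequence

# The eight winning lines on a 3x3 board, with cells indexed 0..8 row by row.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board: Sequence[str]) -> Optional[str]:
    """Return "X" or "O" if that player has three in a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def is_legal_move(board: Sequence[str], cell: int) -> bool:
    """A move is legal if the cell is on the board and still empty."""
    return 0 <= cell < 9 and board[cell] == " "


board = list("XX OO    ")  # X holds cells 0 and 1, O holds cells 3 and 4
print(winner(board))            # None
print(is_legal_move(board, 2))  # True
board[2] = "X"
print(winner(board))            # X
```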

I disagree, if a decent LLM cannot understand it there is a problem with the prompt.

Imagine someone not knowing chess and explaining it to them. Would they be able to understand it on the first try with your prompt?

https://news.ycombinator.com/item?id=38304184

> if a decent LLM cannot understand it there is a problem with the prompt.

Ah, yes, the “you’re holding it wrong” argument with a dash of “No True Scotsman” so the goalposts can be moved depending on what anyone says is a “decent LLM”.

Well, here are a few failures with GPT-3.5, GPT-4, and GPT-4o:

https://news.ycombinator.com/item?id=40368446

https://news.ycombinator.com/item?id=40368822

> Imagine someone not knowing chess and explaining it to them. Would they be able to understand it on the first try with your prompt?

Chess? Probably not. Tic-tac-toe? Probably yes. And the latter was what the person you’re responding to used.

ben-schaaf 805 days ago

I can ask GPT-3.5 for the rules of tic-tac-toe and it gives a perfect answer. It clearly knows the game and its rules.

If you think this is just a lack of prompt engineering, please provide a prompt that makes GPT-3.5 actually follow the rules of tic-tac-toe so that I may play a game with it.

orwin 805 days ago

If an LLM could invent consistent imaginary games (or anything, like a short novel, or a 3-page essay on anything it wants), maybe I would agree with you. The issue is that anything it creates is inconsistent. The issue might be an artificial limitation to avoid copyright issues, but still.

Actually my argument was the opposite. We as humans can imagine a game, explain it to the LLM and it learns, consistently, every time.

Generating new games is something else: that is creativity, not merely intelligence.

orwin 805 days ago

But even that. Did you try to use GPT-4 as a chess engine? I have issues with the Slav Defense when I start with the Queen's Gambit: I tend to lose tempo or position, or both. I asked it for continuations, and what I got was either Wikipedia entries or nonsense, with no in-between and no interesting insight. Now, I had asked a regional champion a bit before that (he is around 2.2k Elo, so not exceptionally good), and although I can't seem to understand or use the concepts, he gave me interesting enough ideas to build on.

Not saying that ChatGPT isn't a great tool for writing documentation or fiction (half my TTRPG campaigns feature descriptions by ChatGPT), but I wouldn't call it intelligent.

Chess is a very specific field that requires training. ChatGPT may not be optimized for chess.

And I think ChatGPT has some issues visualizing things like a chess board.

Therefore, to get a decent answer you'll have to explain that you are a professional chess player. You'll have to describe what tempo and position mean in chess. You'll have to describe what a gambit is, and so on. After these steps it will understand and guide you in whatever you need.

If you succeed, you can release it as a custom GPT.

It's a bit like asking the ship's supercomputer for tea in The Hitchhiker's Guide to the Galaxy.

orwin 804 days ago

I see your point, but I don't know enough to evaluate it rationally and agree (or disagree) with it.

It feels like ChatGPT already knows that stuff, because it knows 10 times more openings and variations than I do, and can tell me official game names and years. Still, it seems it doesn't really understand? That's the issue for me. Like the story of the French Scrabble champion who doesn't speak French at all. It's impressive, sure, but how can he know that much and not understand?

> What do you imagine under basic intelligence?

Consistency, for one. I have asked LLMs the exact same question twice in a row and got wildly different answers. Intelligence presupposes understanding. When I ask an LLM “give me the first X of Y” and it replies “I cannot give you the first X of Y because there have only been X+10, here’s the first X+5 instead”, I’m hard pressed to call it intelligent.

Have you tried specifying your field of inquiry, which was algebra? Try saying "solve this equation for me". I am a lawyer by day, so I constantly face the limitations of natural languages. The solution is to write less ambiguous prompts.

The field of inquiry was not algebra. It was a straightforward question using real numbers. I asked it about the first <number> kings of <country>. I don’t recall the exact number, only the stupidity of the answer.

So you understand, let’s say I asked for the first 20 kings of England. It told me it could not give the first 20 because there had only been 30, and that it would give me the first 25 instead.

Understood. I tried your prompt again and it seems it understood it.

Here is my experiment: https://chat.openai.com/share/98cae2bf-a7a6-42e7-b536-f3671c...

I gave minimum context like this: "I have a history exam. You are an expert in British royal history. List me the names of 20 kings and queens in England."

The answer was: "Certainly! Here's a list of 20 kings and queens of England:

1. William the Conqueror 2. William II (Rufus) 3. Henry I 4. Stephen 5. Henry II 6. Richard I (the Lionheart) 7. John 8. Henry III 9. Edward I (Longshanks) 10. Edward II 11. Edward III 12. Richard II 13. Henry IV 14. Henry V 15. Henry VI 16. Edward IV 17. Edward V 18. Richard III 19. Henry VII 20. Henry VIII"

> I tried your prompt again and it seems it understood it.

And like I said at the start of the conversation:

> Consistency, for one. I have asked LLMs the exact same question twice in a row and got wildly different answers.

You’ve proven my point.

> I gave minimum context like this: "I have a history exam. You are an expert in British royal history.

Your excuses are getting embarrassingly hilarious. As if you need a history exam and to be an expert to understand the context of the question.

By the way, that answer is wrong from the first one. So much for giving context and calling it an expert.