| HN Mirror


	by earthboundkid 1063 days ago
	An LLM is a tool. It is a very versatile tool. It can be used in many situations. It does not therefore follow that it should be used in all situations. Even if you wanted to use an AI to solve sudoku, there is no particular reason to begin with a model trained for language modeling instead of a model better suited to the task.

2 comments

joe_the_user 1063 days ago

I would, uh, bet that you're right.

But given that there has been a lot of discussion of the possibility that an LLM has "general intelligence", it seems worthwhile to figure out whether the solving of a random problem is possible.

sfn42 1061 days ago

They don't possess general intelligence, end of discussion. Thanks for attending my TED talk.

earthboundkid 1061 days ago

Seriously. Will someone make general intelligence by gluing together an LLM and some other AI stuff? I dunno, maybe. But currently existing LLMs don’t have GI and it’s really easy to show this by chatting with them and asking them GI questions not in the training data.

thumbuddy 1063 days ago

I don't get it. There are so many ways to solve sudokus, so why does anyone care about this anyway?

coldtea 1063 days ago

Well, it's not really about finding a way to solve sudokus.

Nobody involved in this cares for that as a goal in itself.

It's about the mystery of why an LLM can't do it well.

It's about the challenge of finding a way (prompt) to get it to.

It's about what this reveals about the inner workings and limitations of an LLM.

thumbuddy 1063 days ago

So maybe I think about things a little differently, but is there a theoretical reason why we should expect a large language model to be good at sudokus? I remember that not long ago they often struggled with adding two numbers.

coldtea 1063 days ago

>is there a theoretical reason why we should expect a large language model to be good at sudokus

Because LLMs have shown the ability to be good at many tasks not directly related to language, and even exhibited some crude "general intelligence" traits.

So, some people would like to find how far this can be pushed, and why it works for e.g. a lot of tasks involving abstract manipulation of symbols and logical analysis, but not for a basic enough and clear goal like solving a simple sudoku.

objektif 1063 days ago

What tasks would you say LLMs are good at that are not related to language?

skwirl 1063 days ago

It's very hard to define what is and is not "related to language" and this is kind of a fundamental question that seemed to get a lot of attention in the 20th century. Maybe these language models can help shine some light on that.

According to OpenAI, GPT-4 scores 4 on AP Calculus BC, 5 on AP Statistics, 4 on AP Chemistry, 4 on AP Physics 2. But is mathematical/logical reasoning largely a language task? I don't really know. I feel pretty confident saying that riding a bike is not a language task, but logical reasoning, I'm not so sure.

skwirl 1063 days ago

LLMs are good at a lot of things we don't have a good reason to expect them to be good at. It's very hard to come up with "theoretical reasons" they should be good at things; in "theory" they should not be nearly as capable as they are. Even NLP researchers have been shocked at how well this has worked.

thumbuddy 1063 days ago

If there is no theory or expected result, why should anyone care what it's good at or not? You kinda get what you get, and if you don't get what you want, you do what?

jyap 1063 days ago

It’s just a well-known problem case that has a straightforward answer that is easily verifiable.

E.g., can a model play tic-tac-toe or solve chess puzzles?
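
To make the "easily verifiable" point concrete, here is a minimal sketch of a checker for a completed 9x9 sudoku grid. The function name `is_valid_solution` and the sample grid are assumptions made up for illustration, not anything taken from the thread or from a particular library.

```python
def is_valid_solution(grid):
    """Return True if grid is a correctly completed 9x9 sudoku.

    grid is assumed to be a list of 9 lists of 9 integers.
    """
    digits = set(range(1, 10))
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False
    # Every row, column, and 3x3 box must contain exactly the digits 1..9.
    rows = [set(row) for row in grid]
    cols = [set(grid[r][c] for r in range(9)) for c in range(9)]
    boxes = [
        set(grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3))
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    return all(unit == digits for unit in rows + cols + boxes)


# Hypothetical sample: a valid grid generated from a shifting pattern.
solved = [
    [(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
    for r in range(9)
]

# Swapping two cells in one row keeps that row valid but breaks two columns.
broken = [row[:] for row in solved]
broken[0][0], broken[0][1] = broken[0][1], broken[0][0]

print(is_valid_solution(solved))
print(is_valid_solution(broken))
```

Checking an answer takes 27 set comparisons, which is why a puzzle like this works as a test case: whatever the model outputs can be graded mechanically.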

thumbuddy 1063 days ago

I feel like it's kind of a weird question because if you change the random seed enough times maybe one of them could be good at chess puzzles but suck at being a chat bot, or be good at sudokus but be a horrible pair programmer. I don't know what value a lot of these questions bring once a model hits a trillion parameters of which none or very very few are understood.
