by vikp 1280 days ago
It's unclear to me how you could separate knowledge and reasoning:

- Reasoning typically requires base knowledge to work from. A side effect of training reasoning is embedding knowledge into the model parameters.

- Even if you offload the search portion (either through outputting special tokens that are postprocessed, or applying the model in multiple steps with postprocessing), you still need embedded knowledge for the model to decide what to search for, and then to successfully integrate that knowledge (in the multi-step case).
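
A rough Python sketch of the "special tokens that are postprocessed" option may make the second point concrete. The `[SEARCH: ...]` token syntax and the `toy_search` backend are invented for illustration and are not any real model's format.

```python
import re

# Hypothetical token format: the model emits [SEARCH: some query] inline.
SEARCH_TOKEN = re.compile(r"\[SEARCH:\s*(.*?)\]")


def toy_search(query):
    """Stand-in for a real search backend (assumption: a tiny in-memory dict)."""
    facts = {"boiling point of water": "100 degrees C at sea level"}
    return facts.get(query.lower(), "<no result>")


def postprocess(model_output, search=toy_search):
    """Replace every search token in the model output with the search result."""
    return SEARCH_TOKEN.sub(lambda m: search(m.group(1)), model_output)


print(postprocess("Water boils at [SEARCH: boiling point of water]."))
```

Even in this toy version the model has to know enough to emit a sensible query, so the facts move out but the decision about what to look up stays in the weights.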

Maybe some kind of post-facto pruning of model weights?

7 comments

Reasoning is that which knows that it lacks some necessary knowledge, whereas knowledge isn't aware that it lacks some necessary reasoning.
Fascinating - makes me think of the distinction between dreaming and being awake. Only in the latter state can one tell the difference.
Not quite true, see lucid dreaming.
To be fair to the GP comment, I've personally never experienced lucid dreaming, so for me the distinction holds - when I'm dreaming, I never know I'm dreaming, and mostly I don't even remember my life awake, I only remember whatever's in the context of the dream.
That would be a third, distinct state from each. Dreaming while knowing that you are is quite a thrill!
This has to be one of the most insightful sentences I've ever read.
Would you mind expanding on it a bit? I do sincerely appreciate its pithiness, but curious to read it explained a bit further.
Think of it as: reasoning=computation, knowledge=data. Data alone doesn’t say it must be computed. But computation, by definition, is attempting to create data (the result) that doesn’t exist. Thus: knowledge isn’t aware it must be reasoned about, but reasoning knows it’s trying to find (deduce, compute) knowledge it lacks.
I disagree. If you have knowledge, and you don't try to do anything other than compress it to make space, reasoning about that knowledge will come about by sheer unintended consequence once the patterns you're compressing on reach some threshold of sophistication.

By definition, an optimal compression algo is a dimensionality reduction algo. A dimensionality reduction algo lets you do a bunch of machine learning tasks.

In the world of large language models, what part of "reasoning" is hard-coded and what part, if any, is learnt?

Is reasoning simply a scan/search of your vector space (i.e. your knowledge) according to some hard-coded algo?

Thanks for that!
Doesn't seem too unreasonable to me. If I asked you "How old is Obama" and you had a data source which had the ages of every person, you don't need to know the answer from memory. Your reasoning tells you that to find out how old someone is you need to check the external resource and what to do with the info once you get it.
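
A minimal Python sketch of that split, under some assumptions: the external resource is modelled as a dict of birth dates rather than ages (so the "what to do with the info" step is visible), and the names `BIRTH_DATES`, `age_on` and `how_old` are made up for the example.

```python
from datetime import date

# Assumed external resource: name -> birth date. Only this table holds "knowledge".
BIRTH_DATES = {"Barack Obama": date(1961, 8, 4)}


def age_on(birth, today):
    """Whole years between birth and today, checking whether the birthday has passed."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


def how_old(name, today, lookup=BIRTH_DATES):
    """The 'reasoning' part: look the person up, then compute the age."""
    birth = lookup.get(name)
    if birth is None:
        return None  # the resource has no entry for this name
    return age_on(birth, today)


print(how_old("Barack Obama", date(2023, 1, 1)))
```

Nothing in `how_old` or `age_on` is specific to any one person; swapping the table changes the answers without touching the procedure.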
The required knowledge is around entity recognition, with "Obama" referring to the 44th POTUS, and not somebody else who happens to have the same surname (and there are multiple of them actually, at least 4 given his family).
This can clearly be guessed from a search as well. Popularity can be well defined, and in the case of Obama, there is clearly one much more popular than the others.
The model still needs to infer from the sentence the entity to look up. It is also the case that this is a relatively simple example as 'Obama' refers to a single class of entities and there is not a lot of ambiguity around resolution of class, only resolution of specific entity.

Take this sentence:

> When was KitKat released?

It could refer to the sweet, or the Android OS. Vastly different classes, and the model here needs to "decide" to ask for more information to disambiguate the class, and if the class is the sweet, then it possibly needs to disambiguate the particular flavour, and even ask for the geographic location.
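
To sketch the two levels of ambiguity (class first, then specific entity) in Python: the candidate table and the `resolve` function below are hypothetical, and a real system would get its candidates from a knowledge base rather than a hard-coded dict.

```python
# Hypothetical candidate table: surface form -> list of (entity class, description).
CANDIDATES = {
    "kitkat": [("confectionery", "KitKat chocolate bar"),
               ("operating system", "Android KitKat")],
    "obama": [("person", "Barack Obama"), ("person", "Michelle Obama")],
}


def resolve(mention, hint_class=None):
    """Return (status, values): resolved, ask_class, ask_entity or unknown."""
    options = CANDIDATES.get(mention.lower(), [])
    if hint_class is not None:
        options = [o for o in options if o[0] == hint_class]
    if not options:
        return ("unknown", [])
    if len(options) == 1:
        return ("resolved", [options[0][1]])
    classes = sorted({cls for cls, _ in options})
    if len(classes) > 1:
        return ("ask_class", classes)  # different kinds of thing: ask which kind
    return ("ask_entity", [desc for _, desc in options])  # same kind: ask which one


print(resolve("KitKat"))
print(resolve("KitKat", hint_class="operating system"))
print(resolve("Obama"))
```

"KitKat" triggers a question about the class, while "Obama" only needs a choice among entities of the same class.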

Yes, but the amount of knowledge necessary to decide how to make those sorts of decisions is far smaller than the amount of knowledge necessary to answer all such questions.
And that's perfectly fine. Humans have exactly the same problem. They will get this wrong, and you will reply "no, I'm talking about the android version". Language is ambiguous so we cannot expect machines to get it right all the time.
I do agree with you that it is fine, what I was getting at was that there needs to be a way to measure uncertainty in a manner that is robust to unbalanced distributions or context drifting.
I hope nobody ever releases the famous Kitkat Club in Berlin from its chains. Because there are not so many.

My experience with ChatGPT is that it gets what I mean very well from the context.

Fair, although I’d think it acceptable if the response to your prompt was “which one? I found 4”
It would have way more than 4 people in the search results though. GP said there's at least 4 because there's him, his wife and his kids.

Even knowing Obama is a person is a knowledge-based leap. (To us humans) it's obvious the question means Barack Obama because he's the most notable subject for that name. But how do you prevent your AI from responding that the "Obama JS library is 5 years old"?

https://github.com/rgbkrk/obama

If it's based on frequency of training data, the ex-president will have far more hits in the training corpus.

Now, on less talked about topics it doesn't sound any different than what happens with people

Q "How old is Tim?"

A "Which Tim are you talking about, you didn't give me crap to work with?"
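
A small Python sketch of that behaviour, assuming mention counts are available per candidate. The counts below are invented placeholders, not real corpus statistics, and `pick_by_frequency` and its `min_share` threshold are hypothetical.

```python
def pick_by_frequency(counts, min_share=0.8):
    """Return the dominant candidate, or None if no candidate clearly dominates."""
    total = sum(counts.values())
    if total == 0:
        return None
    best = max(counts, key=counts.get)
    return best if counts[best] / total >= min_share else None


# Made-up counts for illustration only.
obama = {"Barack Obama (person)": 9500,
         "Michelle Obama (person)": 495,
         "obama (JS library)": 5}
tim = {"Tim Cook": 40, "Tim Berners-Lee": 35, "Tim Duncan": 25}

print(pick_by_frequency(obama))  # one candidate dominates
print(pick_by_frequency(tim))    # None: ask "which Tim?"
```

With a dominant candidate the function answers directly; otherwise it returns `None`, which is the cue to ask the clarifying question.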

The answer is obviously 62 because that's how old Tim Apple is.
Isn't it a sort of cultural knowledge to understand what is meant? I.e. when we say Obama we mean one specific guy who is very important, but if I'm talking about Mrs Watanabe it's a generic Japanese person?
You can already coax ChatGPT into interacting with external systems today; I set up a prompt where the model pretended to be a factory system on a communication bus. It could access its "inventory" by posting a prefixed message to the communication bus.

After a bit of prompt engineering the model could query inventory, "manufacture" various recipes, and store the end products in inventory.
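
For anyone wanting to try something similar, here is a minimal Python sketch of the non-model side of such a setup. The `BUS> ` prefix, the `QUERY`/`MAKE` verbs and the `Inventory` class are assumptions made up for this example, not the actual prompt or protocol used above.

```python
# Assumed protocol: lines starting with "BUS> " are commands for the inventory system.
PREFIX = "BUS> "


class Inventory:
    def __init__(self, stock, recipes):
        self.stock = dict(stock)
        self.recipes = recipes  # product -> {ingredient: quantity}

    def handle(self, line):
        """Process one line of model output; return a bus reply or None."""
        if not line.startswith(PREFIX):
            return None  # ordinary model text, not addressed to the bus
        parts = line[len(PREFIX):].split()
        if len(parts) != 2:
            return "ERR bad command"
        verb, item = parts
        if verb == "QUERY":
            return f"{item} {self.stock.get(item, 0)}"
        if verb == "MAKE":
            needs = self.recipes.get(item)
            if needs is None:
                return f"ERR no recipe for {item}"
            if any(self.stock.get(k, 0) < q for k, q in needs.items()):
                return f"ERR missing ingredients for {item}"
            for k, q in needs.items():
                self.stock[k] = self.stock.get(k, 0) - q  # consume ingredients
            self.stock[item] = self.stock.get(item, 0) + 1  # store the product
            return f"OK {item} {self.stock[item]}"
        return f"ERR unknown verb {verb}"


inv = Inventory({"steel": 3, "bolt": 4}, {"bracket": {"steel": 1, "bolt": 2}})
print(inv.handle("BUS> QUERY steel"))
print(inv.handle("BUS> MAKE bracket"))
```

Replies from `handle` would be fed back into the prompt, which is where the working-memory limit comes in.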

It might be possible to look at the weight activations as it reasons through contacting the external system over the emulated communication bus? For a suitably varied set of commands you might be able to find a subset of weights that are most correlated to the task and prune the others. Then you'd be left with a model that can retrieve and store information, as well as perform reasoning tasks.

Still has problems with working memory (the input token limit, since the model is auto-regressive) given all the external information is coming back in via the prompt, but ChatGPT seems to handle that gracefully right now.

> It's unclear to me how you could separate knowledge and reasoning

yeah, me too. There are a few cases I'd point your attention to.

1, preschool toys, kids somehow manage to put the square peg in the square hole. I mean, they may chew on them or push them around, but there's a "moment of magic" when they make it all click together. Maybe there's some implicit knowledge there, I know I played games like that, but I don't remember.

2, sudoku. you don't really need to know anything, just make each row, column and box different. no memorization, just look. but what about the rules? does that count as knowledge?

I've been reading some math books lately, and I think we're not alone. Coping with sets of sets is a hard question that people have been wondering about for, as far as I can tell, a long time.

For now, it's probably safe to say, knowledge about knowledge is different than just knowledge, and having one layer work on k1 and another layer work on k2 is ok. maybe someday add k3...kn. Other fields do that. Worth checking out.

I think, we could both get very fussy about what exactly that _means_. But for now, I'm happy to be charitable in my reading. I'd also expect them to run into some really thorny problems when they try to pin down exactly what's going on, just like everybody else does. For today, good for them. Seems like a nice win.

Different people might think differently, but when solving complex problems I do think I have separate "think about / gather the facts" and "formulate the solution" phases.

I don't think about the totality of facts in the world - I think my brain is mentally extracting the facts that are relevant to the problem and then reason about those facts.

There is certainly back and forth though, but I think I go "here is a bit of information, how does that apply? ok but what about this fact? ok here is how that would apply considering something else..." but I think this is still a gather -> solve -> gather -> solve

Humans (or at least I) have the ability to look up knowledge. What about an architecture where the main model has a background conversation with a knowledge model, just like I need to look up things I remember exist, but forget the details of? Heck, such a knowledge model would have great value for people as well.

Using written conversation as an interface between language models feels natural and completely bonkers at the same time.

I don’t know why you would want to separate them completely. You have some knowledge in your head, store some other in notes on your desk and some in a library (the kind with books in it).

You can make the model write to a local wiki when it encounters new information and read from its own wiki when it feels it needs to recall stuff. You can also make it spend time randomly browsing the knowledge base, reconsolidating, reorganising and labeling it.

The architecture of the wiki doesn’t have to be “clever” in any way. It is just an old-fashioned database with a query function that the model can write to and query from.
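
Something as plain as the following Python sketch would do as a starting point. The `Wiki` class, its method names and the substring-match query are all assumptions for illustration; any real store with a write and a query call would fit the same role.

```python
class Wiki:
    """A deliberately dumb note store: titles mapped to lists of notes."""

    def __init__(self):
        self.pages = {}  # title -> list of notes

    def write(self, title, note):
        """Append a note to a page, creating the page if needed."""
        self.pages.setdefault(title, []).append(note)

    def query(self, term):
        """Return sorted titles whose title or notes contain the term."""
        term = term.lower()
        return sorted(
            title for title, notes in self.pages.items()
            if term in title.lower() or any(term in n.lower() for n in notes)
        )


wiki = Wiki()
wiki.write("KitKat (Android)", "Android version codename")
wiki.write("KitKat (sweet)", "chocolate-covered wafer")
print(wiki.query("android"))
print(wiki.query("kitkat"))
```

The model would call `write` when it meets new information and `query` when it needs something back; all the cleverness stays on the model's side.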

Reasoning can be defined in the abstract. Knowledge cannot.

For example, if I need (A & B) | (D & E) to get C, I can reason that if I have B and want C, I need either A, or D & E.

Once I've acquired this reasoning skill, I can apply it to any kind of "bool-sequence X required for Y" situation, regardless of what specifically X and Y are, or how many entities X encompasses.
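
As a minimal Python sketch of that skill, assuming the requirement is written as a list of alternatives, each a set of things needed together (the function name `missing_options` is made up for the example):

```python
def missing_options(alternatives, have):
    """For each alternative (a set of required items), return what is still missing."""
    return [set(alt) - set(have) for alt in alternatives]


# (A & B) | (D & E) is required for C, and I already have B.
c_needs = [{"A", "B"}, {"D", "E"}]
print(missing_options(c_needs, {"B"}))

# Same procedure, completely different content.
stay_dry = [{"raincoat"}, {"umbrella"}]
print(missing_options(stay_dry, set()))
```

The function never looks at what the items mean, so it applies unchanged to any requirement of this shape.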

Whereas if I know that a rocket engine requires an oxygen/methane mix to function, I cannot transfer that to the knowledge that I need a raincoat or umbrella in order to avoid getting wet in the rain.