Hacker News
by worksonmine 1034 days ago
I don't see why not. It's not taking a single answer from a database no, it's taking several based on probability and merging them into what it thinks we're looking for. If you learn to multiply with code to perform one task, you can then apply that knowledge for a completely different task. It may look like solving a completely new problem but the LLM doesn't even see the difference.

When you use the term "custom library" that might be you over-complicating the task. It's still just looking up function to do x, function to do y and applying it to the output. Don't get me wrong it's impressive where we're at but there's no need to exaggerate it as magic.


> It's still just looking up function to do x, function to do y and applying it to the output.

I mean no, no it isn't.

I'm giving it info on how to construct data models with a custom library, so interacting with that is not using anything previously stored, and then giving it businesses/tasks to model as simple human descriptions.

If you tell me that something which

* Takes a human description of a problem

* Describes back to me the overall structure and components required to solve it with a hierarchy

* Converts that into code, correctly identifying where it makes sense for an address to be contained within a model or distinct and referenced

* Correctly reuses previously created classes that are obviously not in its original dataset

has no understanding or reasoning and is just regurgitating things it's seen before, simply mashed together, I don't know what to say.

Frankly

> it's taking several based on probability and merging them into what it thinks we're looking for.

Sounds pretty much like understanding and reasoning to me.

> but there's no need to exaggerate it as magic.

I'm absolutely not saying there's magic. Humans aren't magic and they can do reasoning. I'm saying it's not just looking up text and regurgitating it.

I think this is supported by things like othello-gpt, which builds an internal world model and outputs based on that.

It's difficult for me to assess how original your library is without examples, maybe I could find the exact implementation on github within 30 minutes. But I've yet to see anything that isn't just mashing together stackoverflow and git repositories to save time. I get the same answers with less wordy fluff from a simple search, but I also know where to look.

It's impressive that it knows the difference between "how many are 5 more apples than 10" compared to "how many percent are 5 apples of 10" (I don't know if it does, just assuming). But the first release also tried to reason about what the weight of 1 pound of nails depends on, given the simple prompt "how much do 1 pound of nails weigh". That's most likely a perfect example of it mashing in the classic "what weighs more, 1 pound of nails or 1 pound of feathers".

It IS just looking in a database, and mashing it with some fluff. I'm happy to be proven wrong but I need more than your word for it. My experience is that the more niche the topic (less data in the training set), the worse the answers I get, and it starts making things up based on probability. It doesn't reason in the sense I assume you're expecting.

Have you had a look at othello gpt? https://thegradient.pub/othello/

It's a nice constrained example of a transformer learning a world model, not just looking up responses.

> It's impressive that it knows the difference between "how many are 5 more apples than 10" compared to "how many percent are 5 apples of 10" (I don't know if it does, just assuming). But the first release also tried to reason about what the weight of 1 pound of nails depends on, given the simple prompt "how much do 1 pound of nails weigh". That's most likely a perfect example of it mashing in the classic "what weighs more, 1 pound of nails or 1 pound of feathers".

Is there a formulation here that would get to a point where you'd think it's not just mashing things together? Are there elements of a simple question that would be required?

Here's a slightly trickier one for it "Which weighs more, a pound of feathers or balloons made from one pound of rubber then filled with 100g of helium?"

https://chat.openai.com/share/b841c96f-e46c-4adf-8ec3-8778ff...

Very impressive, but is it any more original than classic search engines' old trick of regular expressions to figure out if I mean the currency or weight when I ask "1 pound =" with the contexts USD or kg after "="? Does it understand the input, or are there just enough discussions in the training data to make it look like it is? I'm not convinced it's not the latter.
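For reference, the kind of regex trick described above could look roughly like the sketch below. This is only an illustration of context-based disambiguation, not how any real search engine implements it; the pattern, the unit sets and the `classify` function are all made up for this example.

```python
import re

# Hypothetical pattern for queries shaped like "<number> pound(s) = <unit>".
QUERY = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*pounds?\s*=\s*(\w+)\s*$",
    re.IGNORECASE,
)

# Assumed context words; a real system would need far larger tables.
CURRENCY_TARGETS = {"usd", "eur"}
WEIGHT_TARGETS = {"kg", "g"}


def classify(query: str) -> str:
    """Guess whether 'pound' means currency or weight from the unit after '='."""
    match = QUERY.match(query)
    if match is None:
        return "unknown"  # no unit after "=", or a different query shape
    target = match.group(2).lower()
    if target in CURRENCY_TARGETS:
        return "currency"
    if target in WEIGHT_TARGETS:
        return "weight"
    return "unknown"


print(classify("1 pound = USD"))
print(classify("1 pound = kg"))
print(classify("1 pound ="))
```

The point of the comparison is that this approach only works for inputs that fit the fixed pattern; anything phrased differently falls through to "unknown".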

It uses context to figure out we're trying to convert something to something else. Then it adds all those numbers up. Taking helium into consideration is no doubt interesting, but they've also polished that task since that was the common critique they got so very wrong with the first release (which I mentioned they had fixed). I'm not qualified to assess this part of the answer:

> "If the balloons displace more than 100g of air when filled with helium, then they would effectively weigh less than if they were left empty. If they displace exactly 100g of air, then the balloons would have the same weight as if they were left empty."

I don't know enough to understand how much 100g of helium is and how it behaves. And it doesn't try to explain it to me, it mentions it then takes the easy route assuming it's a trick question. What does that tell you? I guess there are similar discussions around and it gives me the summary. Why doesn't it tell me how much air it displaces under what circumstances? Temperature etc, it should be easy if it's not just a simple discussion on a random forum. A conversion regex could do it.
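As a rough back-of-the-envelope check on the displacement question, here is a minimal sketch. It assumes ideal-gas behaviour, the helium at the same temperature and pressure as the surrounding air (so the extra pressure from the stretched rubber is ignored), and a mean molar mass for dry air of about 28.97 g/mol. The function names are made up for this example.

```python
# Molar masses in g/mol. The air value is an approximate mean for dry air.
M_HELIUM = 4.0026
M_AIR = 28.97


def displaced_air_grams(helium_grams: float) -> float:
    """Mass of air displaced by the helium.

    Under the ideal-gas assumption, equal volumes at the same temperature
    and pressure hold equal numbers of moles, so the displaced air has the
    same mole count as the helium.
    """
    moles = helium_grams / M_HELIUM
    return moles * M_AIR


def net_lift_grams(helium_grams: float) -> float:
    """Buoyant lift minus the helium's own mass, in gram-equivalents."""
    return displaced_air_grams(helium_grams) - helium_grams


print(round(displaced_air_grams(100.0)))  # about 724
print(round(net_lift_grams(100.0)))       # about 624
```

Under these assumptions the answer does not depend on the temperature or pressure chosen, as long as they are the same inside and outside: 100g of helium displaces roughly 724g of air, well over the 100g threshold in the quoted answer.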

This comment[1] has a very impressive example. But anything I'm qualified to assess has mostly been meh. If the fix is better training data does that mean it's reasoning or regurgitating? The mistakes it makes are what tell me how it works, not the times it tricks me into thinking it's correct. To me it's a very well polished search engine summary.

[1]: https://news.ycombinator.com/item?id=37219351

If you've not looked at it I really recommend othello gpt. That is an experiment explicitly designed to tackle this kind of question: has it just seen enough moves that it knows what should come next?

> Why doesn't it tell me how much air it displaces under what circumstances?

You can ask it and it'll answer.

> If the fix is better training data does that mean it's reasoning or regurgitating?

More training data helps with things you can just bring to the fore, same as a lot of learning. More useful training data though can also help reasoning, which makes sense - deliberate training of people helps improve their logical reasoning capabilities. I know that doesn't guarantee that's what LLMs are doing but humans benefit significantly from both more teaching and better teaching.

> Very impressive, but is it any more original than classic search engines' old trick of regular expressions to figure out if I mean the currency or weight when I ask "1 pound =" with the contexts USD or kg after "="? Does it understand the input, or are there just enough discussions in the training data to make it look like it is?

I'd be interested to know any requirements around this to clearly show the difference. I tried asking what if I filled a balloon at a children's party with a gas made of atoms that have 1 proton and 100 neutrons: https://chat.openai.com/share/71224df4-5c6c-45f7-88fd-eec316...

(tl;dr: "In the context of a child's birthday party, introducing such a balloon would be a grave mistake.")

It identifies:

* Whether it would float or not and why

* That it would be radioactive, and likely types of radiation from it

* What that would mean to the balloon

* How people would react and the likely consequences of releasing it in a room of children

This is an element that does not exist, in a setting where nothing like this has happened before, with details ranging from types of decay, consequences and human emotional reactions to something like this. Yes, there are real things you can use as a base (e.g. how do people respond to events that kill people), but I feel it's an example of where it's beyond a search engine summary.

> If you've not looked at it I really recommend othello gpt.

I skimmed it and read the conclusion, and it looks interesting, will take a closer look when I have time.

The prompt covers a subject that goes completely over my head so I can't tell how well it reasons. I don't know what 1 proton to 100 neutrons means, but I gather it's radioactive. I don't think it's far-fetched that it draws the same conclusion from the training set because to you it seems obvious, and is probably well known to anyone who knows the subject. Kind of like how it would understand that "hotter than the sun" is super hot, and can correlate that to different melting points. But I wouldn't say it understands the concept of temperature. Given the right prompt it might give you the impression it does.

The feelings in the scenario read like any PR comment after a tragedy. "We feel shock and disbelief" and so on. The scenario being hypothetical doesn't change that since it's probabilities. It acts just like you'd think it would. The earlier example with the helium balloon is similar: it assumes a human context rather than considering the form and environment the helium is in. True intelligence might not even consider the presence of atmosphere as the norm. "It has no weight outside of your human constraints" would be novel.

Let's say it has odd numbers between 1-9 in the database. Given the prompt 2 and 8 you will get back 1, 3, 7, 9, sprinkled with some natural language and we get the impression it's intelligent.
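The odd-number toy above can be written out as a nearest-neighbour lookup. This is only a sketch of the analogy as stated in this comment, not a description of how a transformer actually produces output; `DATABASE`, `nearest` and `answer` are names invented for the example.

```python
# The "database" from the analogy: odd numbers between 1 and 9.
DATABASE = [1, 3, 5, 7, 9]


def nearest(value: int, database: list[int] = DATABASE) -> list[int]:
    """Return every stored entry at the minimum distance from value."""
    best = min(abs(entry - value) for entry in database)
    return [entry for entry in database if abs(entry - value) == best]


def answer(prompt: list[int]) -> list[int]:
    """Merge the nearest stored entries for each number in the prompt."""
    hits = set()
    for value in prompt:
        hits.update(nearest(value))
    return sorted(hits)


print(answer([2, 8]))  # [1, 3, 7, 9]
```

Nothing here ever returns a value that is not already stored, which is the crux of the disagreement in this thread: whether the model's behaviour is fully described by this kind of lookup-and-merge.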

Are you saying it understands the effect the neutron to proton ratio has, as opposed to just comparing the vectors closest to your prompt that it builds the answer from? Being tested on new and hypothetical examples only means it will be further from the vectors but still close enough to give us the impression it understands the subject. If the training data didn't include the words neutron or proton it would have no idea where to begin.

In my first comment that started this chain I said:

> I don't see why not. It's not taking a single answer from a database no, it's taking several based on probability and merging them into what it thinks we're looking for.

I don't think even this latest answer is any proof of anything other than that. Are you claiming there is? And what are you claiming is happening?