by IanCal
1034 days ago
Have you had a look at Othello-GPT? https://thegradient.pub/othello/ It's a nice constrained example of a transformer learning a world model, not just looking up responses.
> It's impressive that it knows the difference between "how many are 5 more apples than 10" and "how many percent are 5 apples of 10" (I don't know if it does, just assuming). But the first release also tried to reason about what the weight of 1 pound of nails depends on, given the simple prompt "how much do 1 pound of nails weigh". That's most likely a perfect example of it mashing up the classic "what weighs more, 1 pound of nails or 1 pound of feathers".
Is there a formulation here that would get to a point where you'd think it's not just mashing things together? Are there elements of a simple question that would be required?
Here's a slightly trickier one for it: "Which weighs more, a pound of feathers or balloons made from one pound of rubber then filled with 100g of helium?" https://chat.openai.com/share/b841c96f-e46c-4adf-8ec3-8778ff...
|
It uses context to figure out that we're trying to convert something to something else, then it adds all those numbers up. Taking the helium into consideration is no doubt interesting, but they've also polished that kind of task, since it was the common critique of what they got so very wrong in the first release (which I mentioned they had fixed). I'm not qualified to assess this part of the answer:
> "If the balloons displace more than 100g of air when filled with helium, then they would effectively weigh less than if they were left empty. If they displace exactly 100g of air, then the balloons would have the same weight as if they were left empty."
I don't know enough to judge how much 100 g of helium is or how it behaves. And it doesn't try to explain it to me: it mentions it, then takes the easy route of assuming it's a trick question. What does that tell you? I guess there are similar discussions out there and it gives me the summary. Why doesn't it tell me how much air the helium displaces, and under what circumstances (temperature, etc.)? That should be easy if it's doing more than summarizing a discussion on a random forum. A conversion regex could do it.
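For reference, the displacement question can be roughed out with the ideal gas law. This is only a sketch under assumptions that are not from the thread: both gases are treated as ideal, the helium is at the same temperature and pressure as the surrounding air, and balloon overpressure and the rubber's own volume are ignored. `displaced_air_mass` is a hypothetical helper, not something ChatGPT produced.

```python
"""Rough estimate of how much air a given mass of helium displaces.

Assumptions (not from the thread): ideal gases, helium at the same
temperature and pressure as the surrounding air, no balloon overpressure,
and the rubber's own volume ignored.
"""

R = 8.314462618      # molar gas constant, J/(mol*K)
M_HE = 4.002602e-3   # molar mass of helium, kg/mol
M_AIR = 28.965e-3    # approximate molar mass of dry air, kg/mol


def displaced_air_mass(helium_kg, temp_c=20.0, pressure_pa=101_325.0):
    """Hypothetical helper: return (helium volume in m^3, displaced air mass in kg)."""
    moles = helium_kg / M_HE                        # amount of helium in mol
    temp_k = temp_c + 273.15                        # Celsius -> kelvin
    volume_m3 = moles * R * temp_k / pressure_pa    # ideal gas law: pV = nRT
    # At equal temperature and pressure, the same volume of air holds the same
    # number of moles, so the displaced air mass depends only on molar masses.
    air_kg = moles * M_AIR
    return volume_m3, air_kg


if __name__ == "__main__":
    for t in (0.0, 20.0, 35.0):
        vol, air = displaced_air_mass(0.100, temp_c=t)
        net_lift_g = (air - 0.100) * 1000  # displaced air minus the helium's own mass
        print(f"{t:5.1f} C: {vol * 1000:6.0f} L of helium, "
              f"{air * 1000:5.0f} g of air displaced, net lift {net_lift_g:4.0f} g")
```

Under these assumptions the temperature only changes the volume, not the displaced mass: the air holds as many moles as the helium, so its mass is the helium mass scaled by the molar-mass ratio (about 29/4, roughly 7.2). That puts 100 g of helium at roughly 600 L at 20 °C, displacing a little over 720 g of air, which is more than the 100 g the helium adds.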
This comment [1] has a very impressive example. But anything I'm qualified to assess has mostly been meh. If the fix is better training data, does that mean it's reasoning or regurgitating? The mistakes it makes are what tell me how it works, not the times it tricks me into thinking it's correct. To me it's a very well-polished search engine summary.
[1]: https://news.ycombinator.com/item?id=37219351