by saghul 792 days ago
Kinda surprised the 8B gets this wrong: "what's heavier a kilo of steel or two kilos of feathers?" GPT-3.5 gets it wrong too. The 70B model does get it right, so does GPT-4.

My pet question is “Which weighs more, 1000cm^3 of styrofoam or 1cm^3 of tungsten?”

Most LLMs go through the calculation and find that the styrofoam is heavier, then confidently announce that the tungsten weighs more. Strange, considering they’ll say something very nearly like “The styrofoam weighs 50 g and the tungsten weighs 19.3 g, therefore the tungsten is heavier.”
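
For reference, the arithmetic the model is supposed to do is tiny. Here is a rough sketch in Python. The tungsten density (19.3 g/cm^3) is the commonly quoted figure; the styrofoam density (0.05 g/cm^3) is an assumption back-derived from the 50 g quoted above, and real foam varies. The names `mass_g` and `heavier` are made up for illustration.

```python
# Densities in g/cm^3.
# Tungsten: commonly quoted value. Styrofoam: assumed, chosen so that
# 1000 cm^3 comes out to the 50 g mentioned in the comment above.
DENSITY_G_PER_CM3 = {
    "styrofoam": 0.05,
    "tungsten": 19.3,
}


def mass_g(material, volume_cm3):
    """Mass in grams: density times volume."""
    return DENSITY_G_PER_CM3[material] * volume_cm3


def heavier(a, b):
    """Compare two (material, volume_cm3) pairs by mass, not by density."""
    mass_a = mass_g(*a)
    mass_b = mass_g(*b)
    if mass_a == mass_b:
        return "equal"
    return a[0] if mass_a > mass_b else b[0]


styrofoam = ("styrofoam", 1000)
tungsten = ("tungsten", 1)

for material, volume in (styrofoam, tungsten):
    print(f"{volume} cm^3 of {material}: {mass_g(material, volume):.1f} g")
print("heavier:", heavier(styrofoam, tungsten))
```

The final comparison is on the computed masses only, which is exactly the step the models seem to skip when they fall back on "tungsten is dense, so tungsten wins."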

That's not how it responded to my query.

> What's heavier? 1 kg of lead or 2 kg of feathers?

That's a classic trick question!

The answer is: 2 kg of feathers.

Why? Because 2 kg is heavier than 1 kg, regardless of the material. The density of the material doesn't matter in this case, only the weight. So, 2 kg of feathers would weigh more than 1 kg of lead.

I stand corrected. I was inadvertently on the 70b model.
That seems to be the general experience. Maybe 8B is just too few parameters to achieve higher-level reasoning.
Maybe depth rather than parameter count.