Hacker News
by BlahGod420 1933 days ago
It's a data problem. Not a language model one.

Wolfram alpha will literally answer.


Evaluating the quality of language models is a challenge in itself. This post just presents another way to see how much your model can understand alongside a new model. This is typical when presenting new tech for which old evaluation methods might not tell you anything.

It's not so much about getting the answer to that question as it is about figuring out how much information your model can glean from text.

I tried "How heavy is half an elephant on Moon?" and all I got is a wiki-like page :/
That's a heck of a question to expect an AI to answer. It's weird (we almost never discuss half animals), oddly phrased ("on the moon" is much more standard), and runs afoul of an imprecision in English (we normally give masses when asked about weight; this requires an answer in force, or else by analogy a mass with equivalent Earth-weight).
A question that's not "weird" has been asked and answered before, so you can answer it with a database and zero "intelligence".

I think you're lowering expectations for an AI to zero.

It's tailored to be difficult, I took this course right after the question above was autosuggested to me.

Also while "the moon" could mean anything, there's only one Moon. I gave it a fighting chance.

Dunno if you're native but ".. on Moon" is not right. It should be "..on the Moon". Don't ask me why it is different for Earth and the Moon but it is!

I would expect any language model to be robust to minor errors like that though so I doubt it makes any difference.

It's because the name of the moon is "The Moon", not "Moon". I think SF writers sometimes pretend it's called "Luna" so it'll have a more interesting name.
It is "la luna" in Spanish. So I don't think SF writers are pretending.
They pretend it will be called Luna (in English), because "The Moon" is a silly name when you routinely travel to lots of moons.
Its name is Moon, and it's referred to as "the moon".
It is Luna (Луна) in Russian
Thanks! I'm indeed not native :) In my own defense it "felt" like it was missing something. However I am confident there are many moons and just one Moon, our own.

Also, the answer is 1.62 kN to 4.8 kN. I know Wolfram doesn't employ a language model or anything of the sort, but with all the other NLP magic I've seen I sort of expected a valid answer.

Bleh... English and its weird grammar... Sometimes I find it such a messed up language compared to my native tongue... I guess it's the separation of modifiers and nouns that's part of the problem.
Seriously a downvote ?? for what?? ughh.... "bavula kappa" ==> frog in the well.. a metaphor for biased people(or people who live in a bubble).
It shouldn't really be that difficult. If it's capable of structuring "How heavy is _ on the moon?", the answer would start out the same way, by looking up the weight of _.

It seems to me that these kinds of questions could be handled with some adjustments or tuning rather than fundamentally different approaches. We can't do everything at once, so there are going to be simple things that don't work for quite a while.
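
To make the "How heavy is _ on the moon?" idea concrete, here is a rough Python sketch of pulling the blank out of the question with a template. The regex and the function name `extract_subject` are made up for illustration; this is plain pattern matching, not how any real NLP system or Wolfram Alpha works, and it only covers this one phrasing.

```python
import re

# Hypothetical template for "How heavy is _ on (the) moon?".
# The article "the" is optional so that "on Moon" still matches.
QUESTION_TEMPLATE = re.compile(
    r"^how heavy is (?P<subject>.+?) on (?:the )?moon\??$",
    re.IGNORECASE,
)


def extract_subject(question: str):
    """Return the text filling the blank, or None if the template does not match."""
    match = QUESTION_TEMPLATE.match(question.strip())
    return match.group("subject") if match else None


if __name__ == "__main__":
    for q in (
        "How heavy is half an elephant on Moon?",
        "How heavy is an elephant on the Moon?",
        "What is the capital of France?",
    ):
        print(repr(q), "->", extract_subject(q))
```

Once the subject ("half an elephant") is extracted, the remaining step is the weight lookup, which is the part that needs a knowledge base.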

It's a question that a B+ student in high-school physics could answer with a few minutes of googling.

If that's beyond the capabilities of AI, then it obviously isn't living up to the hype.

Well, yes. A B+ high-school student can also reliably recognise the number 710. If you think that AI should be able to do something just because a high-school student can, I think it's your expectations which are out of touch.
My expectations are fine. The AI hype is out of touch.
It is a tough question, but we keep hearing from AI-boosters about how the Singularity is right around the corner and have you seen this AI that can make plausible-looking text?
Needs a supervisor for cheeky monkey curve ball playful question to interpret answer on gradient of seriousness from thousands of mind model with access to knowledge library and language as commonly used. The part before "on" doesn't sound out of place from a template for language exchange in a butcher's shop and "on Moon" is a playful reframing adjusting the physics model. A child lucky to have a parent with training in physics would get an answer easy to this question.
Or it simply needs to be augmented with something other than GPT. The form of my question is very easily solvable in a programmatic way, just not by a GPT-based neural net. Someone mentioned multi-model training, that sounds awesome!
> (elephant weight) / 2 * (moon gravity / earth gravity)

> 160 to 520 kg

There you go. Wolfram Alpha literally answered, just needed to translate your question to math ^_^

Nice answer, but it actually supports the point being made here. NLP is an attempt to see how far we can go in processing natural language, without doing this sort of translation, and without structured knowledge bases. As (I'm pretty sure) Wolfram Alpha contains a structured knowledge base and draws on it for this sort of question (though not without help in this case), its abilities are orthogonal to the issues addressed in the article.
A quibble is that kg is a unit of mass, so I expect it to measure the same whether on Earth or on the moon. The answer should be in newtons.

Our query to Wolfram Alpha should thus be:

(mass of elephant)/2 * moon gravity
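
For what it's worth, the arithmetic behind that query is easy to sketch in Python. The 2000 to 6000 kg elephant mass range and the rounded gravity values are my own assumptions for illustration, not numbers taken from Wolfram Alpha, and the function names are made up.

```python
MOON_GRAVITY = 1.62   # m/s^2, approximate surface gravity of the Moon
EARTH_GRAVITY = 9.81  # m/s^2, approximate surface gravity of the Earth


def half_weight_on_moon_newtons(elephant_mass_kg: float) -> float:
    """Weight (a force, in newtons) of half the elephant's mass on the Moon."""
    return (elephant_mass_kg / 2) * MOON_GRAVITY


def earth_equivalent_mass_kg(elephant_mass_kg: float) -> float:
    """Mass that would weigh the same on Earth as half the elephant does on the Moon."""
    return half_weight_on_moon_newtons(elephant_mass_kg) / EARTH_GRAVITY


if __name__ == "__main__":
    # Assumed mass range for an adult elephant (illustrative only).
    for mass_kg in (2000.0, 6000.0):
        force_n = half_weight_on_moon_newtons(mass_kg)
        equiv_kg = earth_equivalent_mass_kg(mass_kg)
        print(f"{mass_kg:.0f} kg elephant: {force_n:.0f} N on the Moon "
              f"(feels like {equiv_kg:.0f} kg on Earth)")
```

With those assumed masses this comes out at roughly 1.6 kN to 4.9 kN, close to the 1.62 kN to 4.8 kN quoted upthread, and the Earth-equivalent masses land in the same ballpark as the 160 to 520 kg figure.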

Good effort! But gram is not a unit of weight.
It was interesting to note that on typing "how heavy", the only search suggestion I got was "how heavy is an elephant".
Half an elephant is undefined. Which half are you counting? Each half doesn't necessarily have the same weight.
Actually "half" is pretty dang well defined :)
Well even if you take the most symmetric halves by volume, the density of an ideal elephant is not uniform. Some organs don’t come in pairs and all that

I guess it depends on how many hairs you want to split and how precise you want the answer

However, most unpaired organs don’t contribute to a mass asymmetry. It’s thought this has evolved to aid with locomotion, as any asymmetry will negatively impact efficiency.
The answer can never be precise for the very reason that not all elephants weigh the same (breed, age etc.).

I was looking for an average from the get-go, but a half is a half!

I agree with you that a reasonable answer is an average, but a half on a volume basis is not necessarily equivalent to a half on a mass basis
I disagree. You can have a left half and right half, but also a front half and rear half.

Half of the weight of an elephant would be well defined since it is just a number.

“Let’s assume a spherical elephant”