by riquito 931 days ago
> “The best thing to do in San Francisco is eat a sandwich and sit in Dolores Park on a sunny day.” Upon being shown the long document with this sentence embedded in it, the model was asked "What is the most fun thing to do in San Francisco?"

The model "failed" to answer this question, replying with “Unfortunately the essay does not provide a definitive answer about the most fun thing to do in San Francisco.”

It looks right to me... The best thing to do in San Francisco is not necessarily fun

6 comments

Sure...it's right in the literal sense, but a better answer would add "but it does recommend eating a sandwich in Dolores Park on a sunny day as the 'best' thing to do, if not the most fun."

It's the most correct answer, but not the best!

The appropriations bill example also looks right—the insertion doesn’t stylistically match the rest of the document. I’m much more skeptical of evaluations if this is how the sausage gets made. Feels like bullshit artistry.
These are not actual tests they used for themselves.

A third party ran these tests first (they were written up in an article and spread on social media), and the makers of Claude are responding to them.

I knew it was a weird test right when I first encountered it.

Interesting that the Claude team felt it was worth responding to.

Language can be ambiguous.

But these LLMs were fine-tuned on realistic human question-and-answer pairs to make them user-friendly.

I’m pretty sure the average person wouldn’t prefer an LLM whose output is always playing grammar Nazi or semantics tai chi on every word you said.

There has to be a reasonable “error correction” on the receiving end for language to work as a communication channel.

write supremacist

/s

"best thing" and "most fun" thing are not synonymous and the fact that it didn't conflate them is actually a sign of its precision.
The best thing to do is almost never the most fun thing to do.
Why?

In my experience, people usually recommend things to me that they thought were the best at a place because those things were really fun for them.

This comment and its comment section eerily remind me of Reddit, and I'm sad HN is turning into that.