by nearbuy 1129 days ago
Still fairly impressive. Probably better than most people could do if given 60 seconds, but probably worse than most people if given 10 minutes.

I would rate a person who provides no sentence at all as performing significantly better, and I suspect most people could pretty quickly come up with something.
> I would rate a person who provides no sentence at all as performing significantly better

Why?

> I suspect most people could pretty quickly come up with something

It only takes 60 seconds to test that on yourself. It's not that easy to come up with something of similar length to ChatGPT's answer that also sounds somewhat natural/sensible.

>Why?

For the same reason that "I don't know" is generally a better response than bullshitting.

>It's not that easy to come up with something of similar length to ChatGPT's answer that also sounds somewhat natural/sensible

Those weren't requirements.

> Those weren't requirements.

Then it seems we don't disagree on anything concrete. You're just using a different rating system than me when I judge it as impressive compared to what an average person would produce in 60 seconds.

Not sure if this is a general principle of yours. If ChatGPT were able to write a 1000 word essay using all 5-letter words except for a single mistake, would you still find it unimpressive? Do you think a tool or person who makes minor mistakes isn't useful? Or only when a tool/person makes major mistakes?

ChatGPT wasn't asked to be impressive, it was asked to write a single sentence containing only five-letter words. I think that a tool that is unreliable is significantly less useful than a tool that is reliable and that, all other things being equal, a tool that fails in difficult-to-verify ways is less reliable than one that fails in easy-to-verify ways.
I agree with all of that.

I guess I interpreted your first response as disagreeing with my comment, when you were actually just bringing up a different topic.

>I would rate a person who provides no sentence at all as performing significantly better

The logic failure in the above statement is probably worse than the logic failure of not being able to spontaneously compose a phrase with just 5-letter words - and slipping in one or two with a higher letter count.

>I suspect most people could pretty quickly come up with something

You'd be very surprised then. Most people fail at even more basic tasks.

Heck, most candidate programmers fail at fizz-buzz (not that much more difficult than the above).

>The logic failure in the above statement

And which alleged logic failure is that?

The idea that making a mistake but otherwise fulfilling most of the task is worse than failing to perform any part of it.

Especially in the context of "evaluating the performance of something".

Let's expand this a little to make it even more evident: if the task was "make a paragraph of 100 words using only 5 letter words" and an AI couldn't produce anything at all, whereas another came up with a paragraph of 100 words, except a couple of them had 6 or 4 letters, it would make absolutely no sense to rate the first as "better" than the second in performing the task.

As for understanding the task, the latter exhibits an understanding of it (since it produced a paragraph, and most of the words it used fulfilled the criteria, which wouldn't happen if it chose them randomly), it just made a couple of mistakes (the kind humans could easily make too in such a task). For the former we can't even be sure it understood the task at all.

We don't rate humans that way on performing tasks either (if they got it less than perfect it's worse than not doing it at all). Even math tests at the university level consider the approach and any partial results in the right direction; they don't just mark it 0 if there's an error, nor do they give a higher mark to students who didn't produce anything.

>The idea that making a mistake but otherwise fulfilling most of the task is worse than failing to perform any part of it.

There are many contexts in which correctness is important. In such contexts, an incorrect answer is often worse than an explicit non-answer.

>We don't rate humans that way on performing tasks either (if they got it less than perfect it's worse than not doing it at all). Even math tests at the university

Standardized tests often rate incorrect answers worse than non-answers, though yes a university maths test in particular isn't likely to be that sort of test.

That's wrong.

(An example of a sentence with only five letter words I wrote in less than 60 seconds)

I wasn't clear on how I was using "better". Your example is better in that it fulfills the requirement, but I don't think it's as impressive as ChatGPT's answer. How long would it take to make a sentence that is at least 7 words (and also making sense, and ideally sounding good)?
In 5-10 minutes I came up with "Alarm! Naked actor moons queen below (under?) fruit trees, later hides under cheap hotel floor".

Note that I used one of those minutes to get a list of all 4 and 5 letter words, which I'm not sure whether the rules allow or not.

It would take me longer to write an interesting, longer sentence that complied with the rules. But I'd remind you that GPT failed.
"That's" is not one word.
This isn't something that can be usefully discussed. "Word" has a vague enough definition that a contraction can validly be considered one or two words. If you try and look to linguistics you'll just see they use specialized words with stricter definitions.

Regardless it's more reasonable for me to say "that's" is a five letter word than it is for the AI to say "spells" is a five letter word.
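
For what it's worth, the letter-counting side of this is easy to check mechanically. Here is a minimal sketch, assuming a "word" is a whitespace-separated token and that only alphabetic characters count as letters (so the apostrophe in "that's" is ignored). Both of those are assumptions, not rules anyone in the thread agreed on, and `letter_count` and `offending_words` are names made up for illustration:

```python
def letter_count(word: str) -> int:
    """Count alphabetic characters only; apostrophes and punctuation are ignored."""
    return sum(1 for ch in word if ch.isalpha())


def offending_words(sentence: str, length: int = 5) -> list[str]:
    """Return the whitespace-separated tokens whose letter count is not `length`."""
    return [tok for tok in sentence.split() if letter_count(tok) != length]


if __name__ == "__main__":
    # Assumption: contractions count as a single word, letters only.
    print(letter_count("that's"), letter_count("spells"))
    print(offending_words("Happy books sound great."))
```

Under those assumptions "that's" counts as five letters and "spells" as six. Counting a contraction as two words would need a different tokenizer, which is exactly the part that is up for debate.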

I don't think that is true.
I tried, this is what I came up with under significant time pressure:

Happy books sound great.

It was very difficult to think of a plural verb with 5 letters, and once I realized that was an issue, I was worried that I wouldn't have enough time to come up with a singular noun that would fit any of the singular verbs that I was considering (reads, seems).

Interestingly, this is the exact same mistake that ChatGPT made! It has "spell" -> "spells" which is a plurality / correctness of sentence mistake.

My sentence is technically correct and could be used plausibly in conversation: "What kind of books do you want to read?" "Happy books sound great."

But it's a pretty weak sentence. Being restricted from articles makes it very difficult to get agreement.

Or....."I don't think that is true."

;)

Or "See Spot run."