by vidarh 1175 days ago
I didn't suggest you have anything against GPT, but that the way you're interacting with it is counterproductive if you want it to be useful to you like the person you first replied to rather than finding flaws in it. It's easy to find flaws in it. E.g. I just had a lengthy "argument" with it about the right directions between two places in Nice out of morbid curiosity of why it got the first question so wildly wrong (conclusion: It knows of lots of roads. It does not yet know how most of them are connected, which neighbourhoods they are in, which direction they go, or how to route; this should be unsurprising as it's unlikely to be in its training data; and that's fine, just don't use it for that).

In the example you picked apart, it got it close enough to be useful even though the answers have plenty of issues.

> The story of Clever Hans shows that humans are very good at convincing themselves something is smart when really they're subconsciously feeding it answers. So I do think validating GPT requires thinking a little adversarially rather than aiming to help it.

If the goal was to validate GPT, sure. But the goal above was not to validate GPT. The discussion was over whether it could be useful. That doesn't require "validating it". It just requires a rough understanding, more right than wrong, of which types of queries are productive in producing results that save us time without doing harm.

Yes, that means there are lots of applications where it's not suitable. That's fine.

> Then that's exactly what GPT should have said here.

I disagree. The question explicitly did not ask for that. It said it was too far to travel in one go, without explaining what the maximum acceptable number of hours of travel in one day was. A human might well implicitly interpret it that way based on personal preferences. But responding that there were no evenly split options would indicate a failure to carefully read the question. Explaining why the options did not split it evenly would be good (but GPT really would not be up to the job in this case).

Note that GPT still gets this plenty wrong, so I'm not suggesting it's up to scratch in this area.

Adding a constraint of no more than 8 hours per day, GPT3.5turbo (free ChatGPT) messes up (still suggests Florence, and gives the nonsensical suggestion of Avignon). GPT4 (paid ChatGPT only) suggests Genoa and Nice.

Lowering the threshold to 6 hours (which AFAIK is not possible), GPT3.5turbo gives the same broken set of options. GPT4 still suggests Genoa and Nice, wrongly claiming that these involve no more than 6 hours of travel per day, so that is definitely a problem. In this case it should have said there's no way of doing that.

Trying to be more explicit about this ("We want to travel no more than 6 hours from Rome to the stopover, and no more than 6 hours from the stopover to Montpellier") does not help, so this is indeed a strong indicator that it struggles with this particular type of constraint and you shouldn't trust it on this subject other than to give ideas.
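To make the constraint concrete, here is a minimal sketch of the check I'd expect a correct answer to pass: a stopover only qualifies if both legs fit under the daily limit, and if none do, the right answer is "no options". The function name `feasible_stopovers` and the stopover names are my own, and the hours are made-up placeholders, not real travel times; you'd fill them in from a route planner.

```python
def feasible_stopovers(leg_hours, max_hours):
    """Return the stopovers where BOTH legs fit within max_hours.

    leg_hours maps a stopover name to a tuple:
    (hours from origin to stopover, hours from stopover to destination).
    """
    return sorted(
        name
        for name, (first_leg, second_leg) in leg_hours.items()
        if first_leg <= max_hours and second_leg <= max_hours
    )


# Placeholder numbers for illustration only; these are NOT real travel times.
candidates = {
    "stopover_a": (5.0, 7.5),
    "stopover_b": (6.5, 4.0),
    "stopover_c": (9.0, 2.5),
}

print(feasible_stopovers(candidates, 8))  # ['stopover_a', 'stopover_b']
print(feasible_stopovers(candidates, 6))  # [] -> should answer "not possible"
```

The point is the empty list in the second call: when no candidate satisfies the limit, the honest output is that the trip can't be done that way, rather than repeating the earlier suggestions.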

Thinking about what it would take for it to get this right, the failure is not surprising: there likely aren't that many travel descriptions containing distances and travel times in its training data, and it can't read maps yet.

> If someone had a track record of doing rote work decently but messing up anything requiring critical thinking then I'd entrust them only with work appropriate to their skillset until they proved otherwise. That's exactly what I'm doing with GPT: I use it to do my busywork, but I'm not asking it for anything like travel advice until it gets quite a bit better.

That's exactly what I'm suggesting. Maybe with the extension that asking it "tell me how to do X" questions often works well, and that sometimes even questions where you know it'll mess up the details will give you enough ideas to go on. E.g. in this case, at least GPT4 gives reasonable options even though the travel times are messed up (when the plugins are opened up, hopefully this will improve significantly). That might not matter for a region you know, but for a region you don't, getting a list of cities to plug into route planners might still be worthwhile as long as they're more right than wrong.

1 comments

> but that the way you're interacting with it is counterproductive if you want it to be useful to you like the person you first replied to rather than finding flaws in it

I can't speak to the answer the person I replied to got, because they didn't post it. If they got the same answer I got when asking the same question, it wouldn't have been useful to them.

It's only counterproductive if I'm wrong. If I'm right that it's not yet useful for this sort of thing, I'd only waste my time giving it more chances.

> in this case, at least GPT4 gives reasonable options even though the travel times are messed up

The reasonableness of the options fundamentally depended on specific facts in this case. Mixing up a 3-hour train ride and a 12-hour train ride ruined the answer. So the answer I got from ChatGPT was fundamentally broken.