I plugged that into whatever chat.openai.com is.
> You also seems to focus on catching it out rather than getting a result.
Nope. I have nothing against GPT. I find Copilot useful enough to pay for. I'm just sick and tired of people promoting it with obviously wrong examples. The story of Clever Hans shows that humans are very good at convincing themselves something is smart when really they're subconsciously feeding it answers. So I do think validating GPT requires thinking a little adversarially rather than aiming to help it.
> The problem here is that there aren't as far as I can tell any very balanced options
Then that's exactly what GPT should have said here.
> For cases where there is in fact not a single perfect option, and where you need to weigh pro and con, it works well with the caveat that you do indeed need to be careful and check specifics. The same way I'd check specifics if I had a conversation about this with a friend.
If someone had a track record of doing rote work decently but messing up anything requiring critical thinking then I'd entrust them only with work appropriate to their skillset until they proved otherwise. That's exactly what I'm doing with GPT: I use it to do my busywork, but I'm not asking it for anything like travel advice until it gets quite a bit better.
In the example you picked apart, it came close enough to be useful, even though the answers have plenty of issues.
> The story of Clever Hans shows that humans are very good at convincing themselves something is smart when really they're subconsciously feeding it answers. So I do think validating GPT requires thinking a little adversarially rather than aiming to help it.
If the goal were to validate GPT, sure. But the goal above was not to validate GPT. The discussion was about whether it could be useful. That doesn't require "validating it". It just requires a rough understanding, more right than wrong, of which types of queries reliably produce results that save us time without doing harm.
Yes, that means there are lots of applications where it's not suitable. That's fine.
> Then that's exactly what GPT should have said here.
I disagree. The question explicitly did not ask for that. It said the trip was too far to travel in one go, without saying how many hours of travel per day were acceptable. A human might well interpret it that way based on personal preferences. But responding that there were no evenly split options would indicate a failure to read the question carefully. Explaining why the options did not split the journey evenly would be good (though GPT really isn't up to that job in this case).
Note that GPT still gets plenty of this wrong, so I'm not suggesting it's up to scratch in this area.
With an added constraint of no more than 8 hours of travel per day, GPT-3.5-turbo (free ChatGPT) gets it wrong: it still suggests Florence, and gives the nonsensical suggestion of Avignon. GPT-4 (paid ChatGPT only) suggests Genoa and Nice.
With the threshold lowered to 6 hours (which, AFAIK, is not achievable), GPT-3.5-turbo gives the same broken set of options. GPT-4 still suggests Genoa and Nice, wrongly claiming that neither day involves more than 6 hours of travel, so that is definitely a problem. In this case it should have said there's no way to do that.
Being more explicit about this ("We want to travel no more than 6 hours from Rome to the stopover, and no more than 6 hours from the stopover to Montpellier") does not help, so this is a strong indicator that it struggles with this particular type of constraint, and that you shouldn't trust it on this subject beyond generating ideas.
Considering what it would take to get this right, that's not surprising: there likely aren't many travel descriptions containing distances and travel times in its training data, and it can't read maps yet.
> If someone had a track record of doing rote work decently but messing up anything requiring critical thinking then I'd entrust them only with work appropriate to their skillset until they proved otherwise. That's exactly what I'm doing with GPT: I use it to do my busywork, but I'm not asking it for anything like travel advice until it gets quite a bit better.
That's exactly what I'm suggesting, perhaps with the addition that "tell me how to do X" questions often work well, and that even questions where you know it'll mess up the details will sometimes give you enough ideas to go on. E.g. in this case, GPT-4 at least gives reasonable options even though the travel times are wrong (hopefully this will improve significantly once the plugins are opened up). That might not matter for a region you know, but for one you don't, getting a list of cities to plug into route planners might still be worthwhile as long as the suggestions are more right than wrong.