| With GPT4, for me it suggests Rome->Genoa, stay, then Genoa->either Nice or Marseille, and then on to Montpellier. GPT3.5 suggested a list of Florence, Nice, Genoa and Marseille. When I asked it for a breakdown of the travel times ("Can you give me rough travel times for each of these options?"), it got them pretty close, given the large variations in travel time depending on specific timing and transfers for several of these. When I then asked it for somewhere closer to the midway point, it didn't come through: it suggested Pisa, which is about as bad as Florence; Turin, which is an alternative to Genoa but doesn't fix the imbalance; or Lyon, which is about as bad as the Pisa option but in the opposite direction (fast train to Montpellier). But the options also aren't in aggregate much worse. The problem here is that there aren't as far as I can tell any very balanced options. You can try e.g. Ventimiglia or Sanremo (which GPT4 gave when I pressed it on splitting the trip more evenly), which will give you a more evenly distributed travel time, but because the overall travel time will be longer they're not at all obvious options. You also seem to focus on catching it out rather than getting a result. If you want precision from the first query, then you need to ask far more precise questions. You need to treat it as a conversation, not a brief to be answered with a report. If you insist on treating it as a brief you will not get good results out of it. Your loss, in that case. For cases where a simple lookup will work, it's a waste. Just use Google. For cases where there is in fact not a single perfect option, and where you need to weigh pro and con, it works well with the caveat that you do indeed need to be careful and check specifics. The same way I'd check specifics if I had a conversation about this with a friend. With plugins, so it can verify precise details, we can expect a significant leap in capability here.
But even for now, the answers I got to this were ones I was happy with, and another reason for me to use it more. |
> You also seem to focus on catching it out rather than getting a result.
Nope. I have nothing against GPT. I find Copilot useful enough to pay for. I'm just sick and tired of people promoting it with obviously wrong examples.
The story of Clever Hans shows that humans are very good at convincing themselves something is smart when really they're subconsciously feeding it answers. So I do think validating GPT requires thinking a little adversarially rather than aiming to help it.
> The problem here is that there aren't as far as I can tell any very balanced options
Then that's exactly what GPT should have said here.
> For cases where there is in fact not a single perfect option, and where you need to weigh pro and con, it works well with the caveat that you do indeed need to be careful and check specifics. The same way I'd check specifics if I had a conversation about this with a friend.
If someone had a track record of doing rote work decently but messing up anything requiring critical thinking, then I'd entrust them only with work appropriate to their skillset until they proved otherwise. That's exactly what I'm doing with GPT: I use it to do my busywork, but I'm not asking it for anything like travel advice until it gets quite a bit better.