by diroussel 5 days ago
I found that Deep Research mode in Gemini was able to give me a well-planned 4-day trip to a major city.

I told it my preferences and those of the group members, where we arrived and departed, and at what times. I gave it my itinerary and then asked it to plan two new itineraries and also suggest a location to book a hotel that was convenient for the early flight on the last day.

I went away for 20 mins and it gave me a 20-page document with a good summary and decent options. I did choose some of the activities it suggested.

I did this 10 months ago. It’s probably better now.

But Gemini has access to Google Maps, so it can estimate travel times, and know which lunch places are near which sites and which hotels have good reviews. So if you want AI to work for travel planning you need to ground it in good data.


Or maybe there are just a few complete trips to major cities in the training data that it could copy from? I imagine major destinations are much easier.
I used LLMs last year to plan a multi-week itinerary through Japan with the family. I wasn't super happy with the result, so I tweaked it, but they provided a useful template and some surprising ideas.

As you guessed, there's a ton of info in the training data on this topic, but there's some value in being able to see it in one place with different options.

I think your experience with that trip echoes mine in a lot of areas. It’s a decent start. It takes care of some of the initial blue sky thinking to lay the groundwork. The problem is I think that’s the funnest part of a problem and I hate working on the details… it takes most of the creativity out of most problems as if it was drudgery, while leaving me to do the nitty gritty, which I consider the actual drudgery. I just don’t see LLMs’ contribution to tasks like this being anywhere close to being worth what they’ll cost after the VC subsidies run dry.
I think that's a large part behind the "success" of LLMs... people vastly overestimate their uniqueness.
It’s really one of the most flabbergasting things about discussing LLMs with the naysayers.

There are a lot of extremely legitimate concerns, like the environmental impact and so on.

But I just laugh when they point out that LLMs are merely clever regurgitators of their previous inputs… as if this isn’t how we as humans operate nearly all of the time. People realllllllllly want to think they’re special snowflakes.

It is not in fact how humans work at all.

Ask a human to plan a trip:

They do research; pick destinations led by their own experience/likes/dislikes; compare to other guides; plan itineraries so they can get there; check and share.

Ask an LLM to plan a trip:

It takes the prompt and continues it based on weights in the training data. If there is no data it picks the most likely thing (maybe made up). If there is it’ll mostly add things from that data. Maybe it’ll make tool calls and pull in data that way too but you can’t actually trust all the details.

These two processes are so different, it’s important to understand how they work, which is nothing like a human.
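
The "continue the prompt with the most likely thing" step described above can be illustrated with a toy. This is only a sketch under heavy assumptions: it is a word-level bigram counter with greedy selection, not how Gemini or any real LLM is implemented (those use neural networks over tokens and usually sample rather than always taking the top choice). The corpus and the function names `train_bigrams` and `continue_prompt` are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count which word follows which in the toy corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def continue_prompt(counts, prompt, max_words=5):
    """Greedily append the most frequent follower of the last word.

    Stops early when the last word was never seen with a follower.
    Assumes a non-empty prompt.
    """
    words = prompt.split()
    for _ in range(max_words):
        followers = counts.get(words[-1])
        if not followers:
            break
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical "training data": three near-identical itineraries.
corpus = [
    "day one visit the museum",
    "day two visit the market",
    "day three visit the museum",
]
counts = train_bigrams(corpus)
print(continue_prompt(counts, "day four visit"))
```

"four" never appears in the toy corpus, yet the continuation still comes out looking plausible, because only the last word is consulted. That is roughly the behaviour the comment is objecting to, in miniature.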

I was able to bully an LLM into giving me a 2wk travel itinerary to Somalia. My stipulations were that I wasn't interested in spending any money, so I'd walk everywhere and sleep outside. Getting there and back from Boston took some arguing--I initially suggested stowing away in a shipping container, which the LLM claimed was too unsafe. We eventually compromised on sailing as a reasonable alternative. It planned out a whole route with marina stops, calculated fuel burn, etc. I told it I don't need any of that: I have an anchor and sails, and won't use the engine or marinas (claimed I'd forage for fresh water ashore). It seemed fine with that idea, but raised some safety concerns about piracy. It was eventually satisfied with my answer that I'd bring a lot of guns to fend off pirates. Total trip cost including some 200+ cans of Dinty Moore and 50lb bags of rice came to something like $700.

I don't trust LLMs for this application lol.

Now, wait just a minute.

You presented an LLM with an obviously bonkers goal, the LLM told you it was a bad idea at multiple steps, and this is somehow... a shortcoming of the LLM?!?

You said it yourself: you needed to "bully" the LLM into even producing this plan.

Please, tell me what it should have done instead. Be very specific!

I think even if what you say is true, it doesn't address the parent's point that both humans and machines regurgitate what they've consumed.

But I'd also want to point out that the way you're characterizing an LLM planning a trip doesn't have any structure to it, which indicates that in your scenario you're not using any kind of harness. I've been amazed at how capable even 30 billion parameter models are when I put them inside of a harness that provides structure and task management. If you consider that scenario, especially with the ability to search the web and use skills, suddenly the LLM looks a lot more like what the human process looks like.
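
For anyone unsure what "harness" means here, a minimal sketch of the idea is below: a loop that hands the model one task at a time and routes tool requests. Everything in it is hypothetical, including the `Harness` and `Task` classes, the `TOOL:<name>:<arg>` reply convention, and the stubbed `fake_model` and `fake_search` functions. A real setup would call an actual model and a real search API instead.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    name: str
    done: bool = False
    result: str = ""


@dataclass
class Harness:
    model: Callable[[str], str]  # hypothetical interface: prompt in, text out
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    tasks: list[Task] = field(default_factory=list)

    def run(self) -> list[Task]:
        """Work through the task list in order, one model call per task."""
        for task in self.tasks:
            reply = self.model(task.name)
            # Assumed convention: "TOOL:<name>:<arg>" means "call this tool".
            if reply.startswith("TOOL:"):
                _, tool_name, arg = reply.split(":", 2)
                tool = self.tools.get(tool_name)
                reply = tool(arg) if tool else f"unknown tool {tool_name}"
            task.result = reply
            task.done = True
        return self.tasks


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call."""
    if "hotel" in prompt:
        return "TOOL:search:hotels near airport"
    return f"draft for: {prompt}"


def fake_search(query: str) -> str:
    """Stand-in for a real web search tool."""
    return f"results for '{query}'"


harness = Harness(
    model=fake_model,
    tools={"search": fake_search},
    tasks=[Task("list sights"), Task("find hotel")],
)
for task in harness.run():
    print(task.name, "->", task.result)
```

The structure (task list, tool routing, done flags) lives outside the model, which is the point being made: the model only ever sees one small, well-scoped prompt at a time.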

Agents and harnesses don’t change the fundamental nature of LLMs, as is demonstrated by their terrible performance at real world tasks.
There are plenty of humans who plan trips by concatenating destinations that appear the most frequently in their instagram feed. Not that different from how an LLM does things.

Where humans and (current) LLMs differ the most is their failure mode. A human friend could be bad at planning trips, but that's kinda predictable, we're used to it, we know how to catch that Exception. LLMs on the other hand still have failure modes that come across as really wacky, like, what are they smoking in Mountain View?

Which might actually serve as better evidence of different internal workings at a deeper level, than just parroting well-known superficial features of stochastic whatevertheysay.

At a high level, the processes are extremely similar in many (not all) ways.

They're obviously achieved in drastically different ways at a low enough level; LLMs obviously do not simulate neurons or any biological construct. (For the record, I'm absolutely not one of those people who thinks LLMs are "alive" or should be treated like they are.)

Reminds me of the olllllld days of Pentium II's when people got N64 emulation working shockingly quickly using HLE techniques. If you weren't around for this, it was quite the shocker at the time. I think the analogy is doubly apt, because HLE emulation has some serious limitations... it gets you maybe 80% of the way there really fast, and for the remaining 20% you need to roll up your sleeves and do serious LLE.

https://en.wikipedia.org/wiki/UltraHLE

    It takes the prompt and continues it based on weights in 
    the training data. If there is no data it picks the most 
    likely thing (maybe made up). If there is it’ll mostly 
    add things from that data. Maybe it’ll make tool calls and 
    pull in data that way too but you can’t actually trust all 
    the details.
I'd like you to point out which bits of this are different from talking to humans. If you replace "training data" with "memories", this is pretty much exactly how things might go if you asked a friend (or perhaps a flaky travel agent) for travel advice.

Note that I'm not arguing that LLMs are particularly talented at this particular use case. I'm pointing out that humans are also pretty unreliable.

You're also doing that thing where you point out that LLMs can be unreliable (yes, they are) without acknowledging how flawed nearly every other source of information is: people, websites, etc. I'm not defending LLMs in that regard... I'm just saying it's not a differentiator.