by robbomacrae 949 days ago
I was a bit surprised by their choice of example app. With a tour guide or info app, you need to be reassured that the information you're giving is correct, but asking LLMs to write it for you is a recipe for hallucinations. I built Summer AI to do audio tour guides using old-fashioned web scraping and then piping the output through LLMs just to summarize, and even then I use a couple of extra steps to ensure factual accuracy.

During testing I found that any other approach would inevitably make stuff up.

Not trying to disparage this work. I think it's great that you can build such a thing so quickly. But I wouldn't rely on the info these LLMs provide.


We are working on a similar application and have made the same observation: external data is required to avoid hallucinations, especially for lesser-known places. That's absolutely the case with GPT-3.5 and often with GPT-4. We will release our new content in the next few days. We are still deciding whether to eat the cost of expensive TTS or go with a cheap option for okay results. Can I ask which option you used for TTS?
Hello! I know of you guys' work! I try to keep up with all the competitors :) Feel free to reach out to me at rob @ summer dot ai and I'd be happy to talk shop.

For anyone else interested in this question: I've tried a whole bunch of TTS services and, IMHO, Microsoft and AWS are the best of the standard providers. These services also tend to have startup credits available, so I use a mix of the two; I try never to rely on just one provider. I've met with the ElevenLabs folks, and some of their demos of the upcoming V2 work are really amazing, but latency and pricing might rule them out as an option for the time being.
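The "never rely on just one provider" approach can be sketched as an ordered failover. This is a minimal sketch, assuming each provider is wrapped in a function that takes text and returns audio bytes, raising on failure. The names `synthesize_with_fallback`, `primary`, and `secondary` are hypothetical stand-ins; real wrappers would call each vendor's own SDK, which isn't shown here.

```python
from typing import Callable, Sequence, Tuple

# A synthesizer takes text and returns audio bytes, raising on any failure.
Synthesizer = Callable[[str], bytes]

def synthesize_with_fallback(
    text: str, providers: Sequence[Tuple[str, Synthesizer]]
) -> Tuple[str, bytes]:
    """Try each provider in order; return (provider_name, audio) from the first success."""
    errors = []
    for name, synthesize in providers:
        try:
            return name, synthesize(text)
        except Exception as exc:  # e.g. timeouts, throttling, exhausted credits
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS providers failed: " + "; ".join(errors))

# Hypothetical stand-in providers; real ones would wrap the vendors' SDK calls.
def primary(text: str) -> bytes:
    raise TimeoutError("request timed out")

def secondary(text: str) -> bytes:
    return text.encode("utf-8")  # placeholder "audio"

name, audio = synthesize_with_fallback(
    "Welcome to the old town.", [("primary", primary), ("secondary", secondary)]
)
print(name)  # secondary
```

The provider order doubles as a cost or quality preference. Rotating it per request would spread usage across providers' startup credits rather than draining one first.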

Thanks for the answer, Rob, we just reached out :) We arrived at the same conclusion; so far we mostly rely on AWS Polly. Hopefully the pricing of better alternatives comes down significantly in the next few months. We even tried running various open-source solutions, but we could not find anything SOTA.
Was this with GPT-4?

I used it to plan a summer European trip to 17 cities and didn't notice any hallucinations for what must have been 20 hours of back and forth creating itineraries, day trips, etc.

I did not use it as my only source of info and still watched some videos by Rick Steves and Wolters World, but it was far and away the most efficient part of the planning process for me.

Yes, GPT-4. I was fact-checking with places I knew pretty well and would find the occasional error when asking it to describe cities and what to do there.

Well, for itineraries, given that its training data is cut off at some point, I’d also be worried about it recommending places that are no longer open.

Note that I’m building an app for customers and attempting to build a brand that can be trusted, whereas in your case GPT-4 is probably ideal, given that you know its caveats and can check the important stuff.

I've also used it a few times to suggest things to do in the last few months. It came up with OK suggestions.

Also, they literally just demoed something similar to this project as part of their recent keynote on new OpenAI stuff. The demo focused on their new Assistants API.

I've actually grilled ChatGPT a bit on geographic information on a few occasions. It's not perfect, but it's surprisingly good. It obviously extracts a lot of that from sources like Wikipedia.

You can actually ask it to answer in GeoJSON format, paste the result into geojson.io, and end up with a usable map. It seems to know about a lot of landmarks, including their coordinates. For smaller venues, the coordinates are not super accurate.
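For reference, this is roughly the shape a valid answer should have before it goes into geojson.io. It is a minimal sketch that builds a GeoJSON FeatureCollection of point landmarks from (name, latitude, longitude) tuples and rejects out-of-range values. The function name `landmarks_to_geojson` is hypothetical, and the Eiffel Tower coordinates are approximate and only for illustration. The key detail, from RFC 7946, is that positions are ordered `[longitude, latitude]`, which is easy to get backwards.

```python
import json

def landmarks_to_geojson(landmarks):
    """Build a GeoJSON FeatureCollection from (name, latitude, longitude) tuples."""
    features = []
    for name, lat, lon in landmarks:
        # Reject out-of-range values; this will not catch every lat/lon swap.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"coordinates out of range for {name}: {lat}, {lon}")
        features.append({
            "type": "Feature",
            # RFC 7946 orders positions as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name},
        })
    return {"type": "FeatureCollection", "features": features}

# Approximate coordinates, for illustration only.
collection = landmarks_to_geojson([("Eiffel Tower", 48.8584, 2.2945)])
print(json.dumps(collection, indent=2))
```

Running model-produced coordinates through a check like this catches gross errors. It can't detect the smaller inaccuracies for minor venues mentioned above.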

You can also ask it to provide GeoJSON for the bounding box of a city or area. I even asked it to pretend to be an in-car navigation system and give me directions to an address. It half succeeded: it listed the correct streets but mixed up right and left. The latest ChatGPT just asks Bing.
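A bounding box is simple enough to build deterministically once you have the four edges, which makes it easy to check what the model returns. This is a minimal sketch, assuming the edges are given as west/south/east/north in degrees (the same order as the RFC 7946 `bbox` member). The function name `bbox_to_polygon` and the example values are hypothetical and illustrative. For simplicity, the sketch rejects boxes that cross the antimeridian.

```python
import json

def bbox_to_polygon(west: float, south: float, east: float, north: float) -> dict:
    """Return a GeoJSON Feature whose Polygon covers the bounding box."""
    if west >= east or south >= north:
        # This sketch does not handle boxes that cross the antimeridian.
        raise ValueError("expected west < east and south < north")
    # Exterior ring, counterclockwise, closed by repeating the first position.
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {
        "type": "Feature",
        "bbox": [west, south, east, north],
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {},
    }

# Illustrative values only.
print(json.dumps(bbox_to_polygon(2.22, 48.81, 2.47, 48.90)))
```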

The topic of this thread is intended to be a loose recreation of what was used in the OpenAI demo. I guess my assertion is that it could work quite well for travel-agent-type functionality, at least for the kind of service an average travel agent provides. For something like a tour guide that needs to give depth, it probably does fall flat quite a bit.

I can totally understand it not being great for smaller landmarks... It could also trigger a call to an external service for things it doesn't have high confidence about.
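The "call out when unsure" idea can be sketched as a simple router. This is a minimal sketch, assuming the model call returns an answer together with a confidence score in [0, 1]. How to obtain a trustworthy score is the hard, open part; self-reported confidence in particular is unreliable. `route_answer`, `fake_model`, and `fake_lookup` are hypothetical stand-ins for an LLM call and an external places or search service, and the 0.7 threshold is arbitrary.

```python
from typing import Callable, Tuple

def route_answer(
    question: str,
    ask_model: Callable[[str], Tuple[str, float]],
    lookup_external: Callable[[str], str],
    threshold: float = 0.7,
) -> Tuple[str, str]:
    """Use the model's answer when its confidence clears `threshold`; otherwise defer to an external source."""
    text, confidence = ask_model(question)
    if confidence >= threshold:
        return "model", text
    return "external", lookup_external(question)

# Hypothetical stand-ins for an LLM call and an external lookup service.
def fake_model(q: str) -> Tuple[str, float]:
    return ("It's a small neighbourhood cafe.", 0.4)

def fake_lookup(q: str) -> str:
    return "Listing from external service"

print(route_answer("What is Cafe X?", fake_model, fake_lookup))
# ('external', 'Listing from external service')
```

Returning the source label alongside the text lets an app show users where an answer came from. That matters for the trust concern raised earlier in the thread.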