Hacker News
by tholor 1652 days ago
The demo corpus there just contains documents about countries and capital cities. So you could try asking questions like "What's the climate of Beijing?" or "How many people live in the capital of the US?".
2 comments

Haystack looks great, but the demo maybe highlights some difficulties with this kind of task.

"What is the population of Italy?" ...gives the population of Rome as first answer at 78.32 relevance :)

I get similar results for some other countries.

"What is the population of Cambridge?" ...to be fair, this is an ambiguous place name as there are several around the world. However the answer it gives is quite far removed from any of them: "In 1788, Kingston had a population of 25,000", Relevance: 93.14

(Disclaimer: I'm a Haystack maintainer and I helped create this demo)

I had to try out the questions you asked, because the first one seems totally answerable to me. And indeed I do get the right answer in the first position (60 million). Did you ask exactly the same question you posted?

For the second, unfortunately we included only country pages and capital city pages, so it's likely that the information about the population of Cambridge simply wasn't there.

In general though I agree this task is not perfect for a demo. It's hard to tell whether the model is wrong because it doesn't have enough info, or whether it does have the data but couldn't find it. The best way to evaluate it will always be to try it out on your own data :)
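To make "try it out on your own data" a bit more concrete, here is a minimal sketch of an exact-match check over a handful of question/answer pairs. It is not Haystack's evaluation API: `ask` is a hypothetical callable that wraps whatever QA system you are testing and returns its top answer as a string, and the normalization (lowercasing, dropping punctuation and English articles) is just one common convention.

```python
import re
import string
from typing import Callable, Iterable, Tuple


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_rate(
    ask: Callable[[str], str],
    examples: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of questions whose top answer matches the expected answer.

    `ask` is a hypothetical wrapper around the system under test; it is
    assumed to return the top answer as a string ("" if none was found).
    """
    examples = list(examples)
    if not examples:
        return 0.0
    hits = sum(normalize(ask(q)) == normalize(gold) for q, gold in examples)
    return hits / len(examples)
```

A low score here still doesn't tell you whether retrieval or answer extraction failed, but it does give you a number to compare before and after fine-tuning.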

What is the population of Cambridge?

for me the demo returns that the model did not find an answer...

ah, it seems to matter if you use upper/lower case

"What is the population of italy?" returns Rome still

"What is the population of cambridge?" returns Kingston, Jamaica, circa 1788

I guess I typed the questions nicer in my original comment above than I did in the input box for the demo :)
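If anyone wants to check the casing effect systematically rather than by retyping, a rough sketch: ask the same question in a few casings and see whether the top answer changes. `ask` here is a hypothetical function wrapping the system you are testing (not a real Haystack call); it is assumed to take a question string and return the top answer as a string.

```python
from typing import Callable, Dict, List


def casing_variants(question: str) -> List[str]:
    """Return the question as typed, lowercased, and uppercased (deduplicated)."""
    variants = [question, question.lower(), question.upper()]
    return list(dict.fromkeys(variants))  # keeps first-seen order


def probe_casing(ask: Callable[[str], str], question: str) -> Dict[str, str]:
    """Map each casing variant to the answer the system gives for it."""
    return {variant: ask(variant) for variant in casing_variants(question)}


def is_casing_sensitive(ask: Callable[[str], str], question: str) -> bool:
    """True if at least two casing variants produce different answers."""
    return len(set(probe_casing(ask, question).values())) > 1
```

If `is_casing_sensitive` comes back true for many of your questions, that points towards a cased model and suggests looking at input normalization or fine-tuning.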

Yep, that's definitely a challenge with commonly available models. In real-life product development there's usually an important step of evaluating the model(s) and fine-tuning if necessary.
Re "Kingston" - interesting! :) Probably because of "Cambridgeshire"?
btw the demo can be found at https://haystack-demo.deepset.ai/