by smcin 698 days ago
Great work.

For some reason, when I gave it a very broad query I got the suggested result "[Table B18104?] Sex by Age by Cognitive Difficulty (Civilian noninstitutionalized population 5 years and over): Total".

No idea why it picked that table instead of the more general "[Table B01003]: Total Population" or "[Table B01001] Sex by Age". In general, I think a query's first result should be the least specific match.

The embeddings/full-text search also mishandles queries that have no close match: "People who look like Kevin Bacon" returns "Number of People: Population by Ancestry: Basque (2022)".


Hi - thank you for trying it out! These are both definitely real issues with the current approach. I've tried to rein in the "selecting an overly specific table" issue in the final "LLM-selects-from-search-results" stage, but clearly have some work left to do there.
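
For illustration only, here is a rough sketch of one way a "prefer the least specific match" tie-break could look. It is not how the site works: the function name `pick_least_specific`, the `margin` parameter, the scores, and the idea of counting " by " in a table title as a proxy for specificity are all assumptions made up for this example.

```python
def pick_least_specific(candidates, margin=0.05):
    """Pick the most general table among near-top search hits.

    candidates: list of (title, score) pairs, higher score = better match.
    margin: hypothetical tolerance; hits within this distance of the best
            score are treated as roughly tied.
    """
    if not candidates:
        return None
    top = max(score for _, score in candidates)
    # Keep only hits that are close to the best score.
    near_top = [(t, s) for t, s in candidates if top - s <= margin]
    # Crude specificity proxy (an assumption): each " by " in the title
    # adds a dimension. Fewer dimensions wins; higher score breaks ties.
    return min(near_top, key=lambda c: (c[0].lower().count(" by "), -c[1]))[0]


# Made-up scores, purely for demonstration.
hits = [
    ("Sex by Age by Cognitive Difficulty", 0.82),
    ("Sex by Age", 0.81),
    ("Total Population", 0.80),
    ("Population by Ancestry: Basque", 0.40),
]
print(pick_least_specific(hits))
```

The margin keeps a clearly better but more specific hit from being overridden; only near-ties get resolved toward the broader table.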

As for the second issue - when people search for things way outside of the available data - I have not done much to address this, but really should. This happens for more plausible queries too, e.g. "Crime Rate" seems like it could be cataloged by the Census, but is not part of the tables indexed by the site (ACS Detailed Tables). It selects variables somewhat randomly here when it should really say something like "no relevant results found".
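
A minimal sketch of the "no relevant results found" idea, assuming results are ranked by cosine similarity between a query embedding and table embeddings. The `search` function, the `min_score` cutoff, and the two-dimensional toy vectors are all hypothetical; a real cutoff would have to be tuned against actual queries.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def search(query_vec, index, min_score=0.5):
    """Return the best-matching label, or None if nothing is close enough.

    index: dict mapping a table label to its embedding vector.
    min_score: hypothetical similarity cutoff below which we give up.
    """
    scored = [(cosine(query_vec, vec), label) for label, vec in index.items()]
    if not scored:
        return None
    best_score, best_label = max(scored)
    return best_label if best_score >= min_score else None


# Toy embeddings, invented for the example.
TOY_INDEX = {
    "B01003 Total Population": [1.0, 0.0],
    "B01001 Sex by Age": [0.8, 0.6],
}

print(search([1.0, 0.1], TOY_INDEX) or "no relevant results found")
print(search([0.0, -1.0], TOY_INDEX) or "no relevant results found")
```

The second query points away from everything in the index, so it falls below the cutoff and gets the fallback message instead of an arbitrary table.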