by jedwhite 1841 days ago
Yes, ironically, SEO industry resources can be helpful, and we used them in putting together training data. If you're interested, there are also some good, simple, free ones to get started with, like these from Mondovo:

https://www.mondovo.com/keywords/

Brave browser uses aggregated search history data that's been anonymized, but we're not trying to personalize results (we're looking for objectively true, rather than "true for you"), so we're not trying to replicate ad-industry-style personalization. A good set of labelled data matching intents to phrases made it simple to build some models that are surprisingly good at picking intents :)
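
To give a rough idea of what "labelled data matching intents to phrases" can look like in practice, here is a minimal sketch of a bag-of-words naive Bayes intent classifier. This is not our actual model: the phrases, the three intent labels, and the `train` / `classify` helpers are all made up for illustration, and a real training set would be far larger.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled examples (phrase, intent), invented for this sketch.
TRAINING_DATA = [
    ("buy running shoes online", "transactional"),
    ("cheap flights to paris", "transactional"),
    ("order pizza near me", "transactional"),
    ("how to tie a tie", "informational"),
    ("what is a neural network", "informational"),
    ("why is the sky blue", "informational"),
    ("facebook login", "navigational"),
    ("youtube homepage", "navigational"),
    ("gmail sign in", "navigational"),
]


def train(examples):
    """Count words per intent and how many phrases each intent has."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    vocab = set()
    for phrase, intent in examples:
        intent_counts[intent] += 1
        for word in phrase.lower().split():
            word_counts[intent][word] += 1
            vocab.add(word)
    return word_counts, intent_counts, vocab


def classify(phrase, model):
    """Return the intent with the highest naive Bayes log-probability."""
    word_counts, intent_counts, vocab = model
    total_phrases = sum(intent_counts.values())
    best_intent, best_score = None, -math.inf
    for intent, n_phrases in intent_counts.items():
        # Log prior: share of training phrases carrying this intent.
        score = math.log(n_phrases / total_phrases)
        # Add-one (Laplace) smoothing over the shared vocabulary.
        denom = sum(word_counts[intent].values()) + len(vocab)
        for word in phrase.lower().split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[intent][word] + 1) / denom)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


model = train(TRAINING_DATA)
print(classify("gmail login", model))
```

Even something this small shows why the labelled phrases matter more than the model: the classifier only knows what the training phrases tell it about which words go with which intent.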