by vkkhare 1837 days ago
Makes sense, though I wonder what the original source of that data would be (someone like Google or Microsoft must be logging user data, then anonymizing parts of it and making them public).

Maybe also look into on-device learning; it can be combined efficiently with differential privacy and give more specific results.

1 comment

Yes, ironically, SEO industry resources can be helpful, and we used them in putting together training data. If you're interested, there are also some good, simple, free ones to get started with, like these from Mondovo:

https://www.mondovo.com/keywords/

Brave browser uses aggregated search history data that's been anonymized, but we're not trying to personalize results (we're looking for objectively true, rather than "true for you"), so we're not trying to replicate ad-industry-style personalization. A good set of labelled data matching intents to phrases made it simple to build some models that are surprisingly good at picking intents :)
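
For anyone wondering what "labelled data matching intents to phrases" might look like in practice, here is a minimal sketch of one simple approach: a bag-of-words Naive Bayes classifier in plain Python. This is an illustration only, not the models described above. The example phrases, the three intent labels, and the function names (`tokenize`, `train`, `predict`) are all made up for the sketch.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled phrases, invented for illustration only.
TRAINING_DATA = [
    ("buy running shoes online", "transactional"),
    ("cheap flights to paris", "transactional"),
    ("order pizza near me", "transactional"),
    ("how to tie a tie", "informational"),
    ("what is differential privacy", "informational"),
    ("why is the sky blue", "informational"),
    ("facebook login", "navigational"),
    ("youtube homepage", "navigational"),
    ("gmail sign in", "navigational"),
]


def tokenize(phrase):
    """Lowercase the phrase and split it on whitespace."""
    return phrase.lower().split()


def train(examples):
    """Count word occurrences per intent and how often each intent appears."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    vocabulary = set()
    for phrase, intent in examples:
        intent_counts[intent] += 1
        for token in tokenize(phrase):
            word_counts[intent][token] += 1
            vocabulary.add(token)
    return word_counts, intent_counts, vocabulary


def predict(phrase, model):
    """Return the intent with the highest Naive Bayes log-probability."""
    word_counts, intent_counts, vocabulary = model
    total_examples = sum(intent_counts.values())
    best_intent, best_score = None, -math.inf
    for intent in sorted(intent_counts):
        # Log prior: how common this intent is in the training data.
        score = math.log(intent_counts[intent] / total_examples)
        total_words = sum(word_counts[intent].values())
        for token in tokenize(phrase):
            # Add-one (Laplace) smoothing so unseen words do not zero out the score.
            count = word_counts[intent][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


MODEL = train(TRAINING_DATA)
```

Calling `predict("buy cheap shoes", MODEL)` picks the intent whose training phrases share the most vocabulary with the query. With a realistically sized labelled set, the same structure applies; only the data changes.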