by BlackForestBoy 2437 days ago
> Does it have a way to handle page content that's less relevant?

It already collects some basic interaction data like visit frequency, stay time and scroll %. This data could be used to clean the db a bit.
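
A rough sketch of how those signals could drive a cleanup pass. The `Visit` record, its field names and all thresholds are made up for illustration and are not the actual schema or tuned values:

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """Hypothetical per-page interaction record (not the real schema)."""
    url: str
    visit_count: int      # how often the page was opened
    stay_seconds: float   # total time spent on the page
    scroll_pct: float     # furthest scroll position, 0-100


def is_low_engagement(v: Visit,
                      min_visits: int = 2,
                      min_stay: float = 10.0,
                      min_scroll: float = 25.0) -> bool:
    """Flag a page only when every signal is weak (thresholds are guesses)."""
    return (v.visit_count < min_visits
            and v.stay_seconds < min_stay
            and v.scroll_pct < min_scroll)


def prune_candidates(visits: list[Visit]) -> list[str]:
    """Return URLs whose indexed terms could be dropped or down-weighted."""
    return [v.url for v in visits if is_low_engagement(v)]
```

Requiring all three signals to be weak keeps the pass conservative: a page visited once but read to the end stays in the index.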

Other than that, there is definitely room to improve at cleaning out the terms that are captured but don't really add value. It's a difficult task though, because every page is structured so differently.

What ideas do you have to reduce the number of unhelpful terms? A spontaneous one is to detect the footers of this OutBrain et al. crap and remove them from the HTML before extracting the words to index.
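
To make that idea concrete, here is a minimal sketch using only Python's standard `html.parser`. The tag list and the class/id hints are assumptions for illustration, not a vetted blocklist, and the depth counter assumes reasonably well-formed markup (unclosed tags inside a skipped block would throw it off):

```python
import re
from html.parser import HTMLParser

# Hypothetical noise markers; a real list would need tuning per widget vendor.
SKIP_TAGS = {"footer", "nav", "script", "style"}
SKIP_HINTS = ("outbrain", "taboola", "footer")
# Void elements never get a closing tag, so they must not affect depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class IndexableText(HTMLParser):
    """Collects text nodes, skipping subtrees that look like footers/widgets."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped subtree
        self.chunks = []

    def _is_noise(self, tag, attrs):
        if tag in SKIP_TAGS:
            return True
        attr_map = dict(attrs)
        label = " ".join(filter(None, (attr_map.get("class"),
                                       attr_map.get("id")))).lower()
        return any(hint in label for hint in SKIP_HINTS)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth or self._is_noise(tag, attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def terms_to_index(html: str) -> list[str]:
    """Strip noisy subtrees first, then split the remaining text into terms."""
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())
```

Matching on class/id substrings is crude and will miss widgets injected with obfuscated names, so it is a starting point rather than a full solution.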

Right now we are focusing on developing a stable service with better UX that does the current feature set really well. I'll take up your suggestions so we can think about how to use them in the upcoming overhaul of the search.

Thanks for your input!