by jsrfded
5702 days ago
We have our own crawl/index/serve technology end-to-end: a 3 billion page web crawl, a machine-learning trained ranker, and the slashtag vertical features. Since BOSS gives us an additional 20-40B pages for very long tail queries, we fall back to /yahoo if we don't have any results of our own. We're now auto-firing slashtags for certain regular queries, e.g. [cure for headaches] will auto-fire /health and [industrial design colleges] will auto-fire /colleges. We're doing this initially for health, lyrics, colleges, autos, hotels, recipes, and personal finance. Getting the crap from sites like ehow out of the results, and restricting results to a curated set of high-quality sites for queries in spammy categories, really cleans up the results there.
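
For readers trying to picture the flow described above, here is a rough sketch in Python. It is not the actual implementation: the keyword table, `pick_slashtag`, `search`, and the two backend callables are all hypothetical names made up for illustration, and the real auto-fire decision is presumably made by something smarter than keyword matching.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical trigger words per auto-fired slashtag (an assumption for
# illustration; the comment does not say how queries are classified).
AUTO_FIRE_TRIGGERS: Dict[str, Set[str]] = {
    "/health": {"cure", "headaches", "symptoms"},
    "/colleges": {"colleges", "university"},
    "/lyrics": {"lyrics"},
}


def pick_slashtag(query: str) -> Optional[str]:
    """Return the first slashtag whose trigger words appear in the query."""
    words = set(query.lower().split())
    for tag, triggers in AUTO_FIRE_TRIGGERS.items():
        if words & triggers:
            return tag
    return None


def search(
    query: str,
    own_index: Callable[[str, Optional[str]], List[str]],
    fallback: Callable[[str], List[str]],
) -> List[str]:
    """Query our own index first; use the fallback only if it returns nothing."""
    tag = pick_slashtag(query)
    results = own_index(query, tag)
    if results:
        return results
    # Stands in for the /yahoo (BOSS) fallback on very long tail queries.
    return fallback(query)
```

The point of the sketch is only the ordering: classify the query, try the in-house index with the auto-fired slashtag, and go to the larger external index only when nothing comes back.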
But it seems the /lyrics slashtag explicitly gives me #3, and actively excludes any results from the #1 or #2 categories that would normally come up.
For example, the ideal result for the search [pearl jam spin the black circle], imo, is the official page, http://pearljam.com/song/spin-black-circle. Without /lyrics this is the #4 result, which is decent. But when I add /lyrics, the official lyrics page gets excluded!
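
To illustrate why this can happen, here is a small Python sketch that assumes a slashtag behaves like a strict allow-list of hosts. That is a guess based on the "curated set of high-quality sites" description, not a confirmed detail, and the host names in `LYRICS_HOSTS` are made-up placeholders.

```python
from typing import List, Set
from urllib.parse import urlparse

# Hypothetical curated host list for /lyrics (placeholder domains).
LYRICS_HOSTS: Set[str] = {"lyrics.example.com", "songtext.example.org"}


def apply_slashtag(urls: List[str], allowed_hosts: Set[str]) -> List[str]:
    """Keep only results whose host is in the slashtag's curated set."""
    return [u for u in urls if urlparse(u).hostname in allowed_hosts]


unfiltered = [
    "http://lyrics.example.com/pearl-jam/spin-the-black-circle",
    "http://pearljam.com/song/spin-black-circle",
]
filtered = apply_slashtag(unfiltered, LYRICS_HOSTS)
```

Under that assumption, the official page is dropped simply because pearljam.com is not in the curated set, regardless of how well it ranks without the slashtag.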