by dvirsky 3570 days ago
There were a few attempts at that in the past, one being http://omgili.com/, which now seems to return pretty much garbage.

BTW, about 12 years ago I was building this search engine, and I was toying with the idea of building a classifier that classifies web pages based on their "genre" rather than category, so you can limit your search to shopping websites, forums, blogs, news sites, social media, etc. It was a bitch to train, and my classifier's algorithm was pretty crappy, but it showed some potential.

I think today modern search engines do that behind the scenes, and try to diversify the results to include pages from multiple genres, but they usually don't let you choose.
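
To make the "diversify" idea concrete, here is a minimal sketch of one way it could work: a round-robin interleave over genres. I have no idea what the big engines actually do; the function name `diversify_by_genre` and the input shape (a ranked list of `(url, genre)` pairs) are assumptions for illustration only.

```python
def diversify_by_genre(results):
    """Round-robin over genres, keeping the original rank order within each genre.

    `results` is a ranked list of (url, genre) pairs (hypothetical input shape).
    """
    # Group URLs by genre; dicts keep insertion order (Python 3.7+),
    # so genres are visited in order of their best-ranked page.
    buckets = {}
    for url, genre in results:
        buckets.setdefault(genre, []).append(url)

    out = []
    while buckets:
        # Take the next-best page from every genre that still has pages left.
        for genre in list(buckets):
            out.append(buckets[genre].pop(0))
            if not buckets[genre]:
                del buckets[genre]
    return out
```

Feeding it a ranked list dominated by one genre pushes the first forum and blog hits up next to the top shopping hit, instead of leaving them buried under the rest of the shops.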


Heh, classifying by "genre" is exactly what I was thinking of doing.

Had some debate with myself if I should start by focusing on training for shopping pages (product pages & product reviews) - because that might make some money; or start by training for forums - which I'd enjoy a lot more. Or build a more general system which would definitely never work and never get finished.

Google actually let you filter by "discussions" until a few years ago, so they certainly do this kind of classification. It didn't work perfectly but sometimes did the trick. Don't know why they removed that feature.

Google removed it because they aim at the mass market.

Another perspective: people who find answers in forums are less likely to be interested in ads. And who knows, maybe by making search shitty (in so many ways, not just forums), ad revenues rise?

IIRC I found that the easiest genres to train on were shops, forums, and porn. But another tricky bit was conceptual: genre and category overlap sometimes (e.g. porn). Anyway, I couldn't get it to yield proper results. But today we have things we didn't have back then, like Open Graph and schema.org tags, that give more semantic info.
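
As a rough sketch of what that semantic info looks like in practice, the snippet below pulls the Open Graph `og:type` and any schema.org JSON-LD `@type` values out of a page using only Python's standard library. The `GenreHintParser` class and `genre_hints` function are hypothetical names made up for this example, and the values would at best be extra features for a classifier, not a genre label on their own (plenty of pages have no markup, or wrong markup).

```python
import json
from html.parser import HTMLParser


class GenreHintParser(HTMLParser):
    """Collects og:type and JSON-LD @type values from a page (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.og_type = None
        self.schema_types = []
        self._in_jsonld = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:type":
            self.og_type = attrs.get("content")
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            # Start buffering the JSON-LD body; it may arrive in several chunks.
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                doc = json.loads("".join(self._buf))
            except ValueError:
                return  # malformed JSON-LD: skip it rather than fail
            items = doc if isinstance(doc, list) else [doc]
            for item in items:
                if isinstance(item, dict) and "@type" in item:
                    t = item["@type"]
                    self.schema_types.extend(t if isinstance(t, list) else [t])


def genre_hints(html):
    """Return the raw markup hints found in an HTML string."""
    parser = GenreHintParser()
    parser.feed(html)
    parser.close()
    return {"og_type": parser.og_type, "schema_types": parser.schema_types}
```

A shop page that declares `og:type` as `product`, or a forum thread marked up as schema.org `DiscussionForumPosting`, would show up directly in the returned dict. This sketch only reads top-level `@type` values and ignores nested `@graph` structures and microdata attributes.
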
IIRC Gigablast had such a feature.