Hacker News
by dvirsky 3570 days ago
IIRC, the easiest categories to train on were shops, forums, and porn. Another tricky bit was conceptual: genre and category sometimes overlap (e.g. porn). Anyway, I couldn't get it to yield proper results. But today we have things we didn't have back then, like Open Graph and schema.org tags, that give more semantic info.
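For illustration, here is a rough sketch of how those tags could be pulled out of a page as extra classifier features. This is not what I ran back then. The names `SemanticTagParser` and `semantic_hints` and the sample HTML are made up for the example, and it only uses Python's standard `html.parser`.

```python
from html.parser import HTMLParser


class SemanticTagParser(HTMLParser):
    """Collects Open Graph <meta> properties and schema.org itemtype values."""

    def __init__(self):
        super().__init__()
        self.og = {}          # e.g. {"og:type": "product"}
        self.itemtypes = []   # e.g. ["https://schema.org/Product"]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Open Graph data lives in <meta property="og:..." content="...">
        prop = attrs.get("property") or ""
        if tag == "meta" and prop.startswith("og:"):
            self.og[prop] = attrs.get("content") or ""
        # schema.org microdata marks a type with itemtype="https://schema.org/..."
        itemtype = attrs.get("itemtype")
        if itemtype and "schema.org" in itemtype:
            self.itemtypes.append(itemtype)


def semantic_hints(html):
    """Return the Open Graph properties and schema.org types found in html."""
    parser = SemanticTagParser()
    parser.feed(html)
    return {"og": parser.og, "itemtypes": parser.itemtypes}


# Hypothetical sample page, invented for this example
sample = """<html><head>
<meta property="og:type" content="product">
<meta property="og:title" content="Blue widget">
</head><body>
<div itemscope itemtype="https://schema.org/Product"></div>
</body></html>"""

hints = semantic_hints(sample)
```

Something like `hints["og"].get("og:type")` or the schema.org type could then be fed in next to the text features. Plenty of sites don't set these tags or set them wrong, so I'd treat them as hints, not labels.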