by DevAccount 5046 days ago
Could you possibly automate the items that do have enough information and collect the ones which don't for later manual review?
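A minimal sketch of that split, assuming each item is a text string and that you have some classifier that returns a category name, or `None` when an item lacks enough information to decide. The names `triage` and `toy_classifier` and the keyword checks inside it are hypothetical, purely for illustration.

```python
from typing import Callable, Iterable, Optional

def triage(items: Iterable[str],
           classify: Callable[[str], Optional[str]]):
    """Split items into (auto-categorized pairs, manual-review queue).

    `classify` returns a category name, or None when the item does not
    carry enough information to be categorized automatically.
    """
    categorized: list[tuple[str, str]] = []
    review_queue: list[str] = []
    for item in items:
        category = classify(item)
        if category is None:
            review_queue.append(item)   # set aside for a human
        else:
            categorized.append((item, category))
    return categorized, review_queue

# Hypothetical classifier: only decides when a known keyword is present.
def toy_classifier(text: str) -> Optional[str]:
    lowered = text.lower()
    if "invoice" in lowered:
        return "billing"
    if "password" in lowered:
        return "account"
    return None

done, todo = triage(["Invoice #12 overdue", "Reset my password", "Hello?"],
                    toy_classifier)
print(done)
print(todo)
```

The review queue also tells you how much coverage the automatic part is getting: if it stays large, the classifier needs more rules.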

I think this is the solution I'll go with, but I'm a bit unsure how it would work in practice. In particular, I don't know how I would evaluate the accuracy of the automated clustering.
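One common way to get an accuracy number, sketched here as a suggestion rather than anything from the thread: hand-label a random sample of items, then score the clustering on that sample. The sketch below uses *purity* (for each cluster, count how many items share its most common hand label, then divide the total by the sample size). The function names and the `seed` parameter are hypothetical. Note that purity rewards many tiny clusters (one item per cluster scores 1.0), so it is only meaningful when the number of clusters is fixed or reasonably small.

```python
import random
from collections import Counter, defaultdict

def draw_review_sample(items, k, seed=0):
    """Pick k items at random (reproducibly) for a human to label."""
    return random.Random(seed).sample(items, k)

def purity(cluster_ids, true_labels):
    """Fraction of sampled items whose cluster's majority label matches their own."""
    by_cluster = defaultdict(list)
    for cid, label in zip(cluster_ids, true_labels):
        by_cluster[cid].append(label)
    # For each cluster, count the items carrying its most common hand label.
    majority_hits = sum(Counter(labels).most_common(1)[0][1]
                        for labels in by_cluster.values())
    return majority_hits / len(true_labels)

# Cluster assignments for six hand-labelled items, and the human labels.
clusters = [0, 0, 0, 1, 1, 2]
labels = ["a", "a", "b", "b", "b", "c"]
print(purity(clusters, labels))
```

A few hundred labelled items is usually enough to tell a good clustering from a bad one, and the same labelled sample can be reused to compare different settings.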

Another solution might be a sort of automated-manual hybrid: e.g. manually identify common words/phrases for a particular category, then write a script that finds all items containing them and adds those items to that category.
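A rough sketch of that hybrid, assuming the hand-picked phrases are kept in a dictionary per category. The `RULES` contents and function names are hypothetical examples. Each category's phrases are compiled into one case-insensitive regex with word boundaries, so "delivery" does not also match inside an unrelated longer word.

```python
import re

# Hypothetical hand-written rules: category -> phrases identified manually.
RULES = {
    "shipping": ["tracking number", "delivery", "shipped"],
    "refunds": ["refund", "money back"],
}

def compile_rules(rules):
    """Build one case-insensitive, whole-word regex per category."""
    return {
        category: re.compile(
            r"\b(?:" + "|".join(map(re.escape, phrases)) + r")\b",
            re.IGNORECASE,
        )
        for category, phrases in rules.items()
    }

def apply_rules(items, compiled):
    """Add each item to every category whose phrases it contains."""
    assigned = {category: [] for category in compiled}
    unmatched = []
    for item in items:
        hits = [c for c, pattern in compiled.items() if pattern.search(item)]
        for c in hits:
            assigned[c].append(item)
        if not hits:
            unmatched.append(item)   # no rule fired: candidate for manual review
    return assigned, unmatched

items = ["Where is my tracking number?",
         "I want a refund for a late delivery",
         "Great product"]
assigned, unmatched = apply_rules(items, compile_rules(RULES))
print(assigned)
print(unmatched)
```

In this sketch an item that matches several categories lands in all of them; if each item must have exactly one category, those multi-match items are another natural candidate for the manual pile.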

Well, to write the automatic bit you'll have to figure out the rules manually :)

But yeah, something like that would be a good start. I don't know anything about this domain, so I'm of limited help here. It might be too hard to categorize based solely on words if the categories don't use distinct enough vocabulary.

You could use an algorithm to identify keywords in your dataset and then manually classify the most common ones.
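About the simplest such algorithm is a document-frequency count: for each word, count how many items it appears in, ignore very common filler words, and hand the top of the list to a human to classify. This is a sketch under assumptions; the tiny `STOPWORDS` set and the `top_keywords` name are hypothetical, and a weighting scheme like TF-IDF is a common swap-in if raw counts surface too many generic words.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "for",
             "my", "is", "i", "it", "in", "on"}

def top_keywords(items, n=20):
    """Return the n words that appear in the most items, with their counts."""
    doc_freq = Counter()
    for item in items:
        # Use a set so a word repeated within one item is counted once.
        words = set(re.findall(r"[a-z']+", item.lower())) - STOPWORDS
        doc_freq.update(words)
    return doc_freq.most_common(n)

items = ["Refund for my order", "Order never arrived",
         "Need a refund", "Order tracking"]
print(top_keywords(items, 2))
```

Each keyword you classify by hand can then become a rule for the automated part, and whatever no keyword covers goes to manual review.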