by jbob2000
3196 days ago
I'm guessing he has a bunch of restaurant menus that he needs to read, categorizing the items into appetizers, entrees, desserts, etc. The problem is that he'll need 50,000 menus to train the ML model on, and then another 50,000 to verify it. He'll also run into the problem of restaurants sometimes categorizing entrees as appetizers (is a Caesar salad an entree or an appetizer?), so the NLP portion will be especially difficult. What about dish names that are in other languages? I could go on... Tough problem!
One approach is the one described at
https://blog.openai.com/unsupervised-sentiment-neuron/
where you throw in a large amount of unlabeled data and build an internal representation that models the data well enough that you can then train something that works like an HMM or CRF on a tiny amount of labeled data.
If you are willing to do something rule-based, I've used case-based reasoning
(https://en.wikipedia.org/wiki/Case-based_reasoning)
to organize the work of annotating corpora. Often I can show that a certain rule set covers X% of the cases, then add a rule that brings coverage to (X+epsilon)%, and repeat until the results are "good enough".
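To make the coverage loop concrete, here is a minimal sketch of how I'd measure it. Everything in it is made up for illustration: the `keyword_rule`, `categorize`, and `coverage` helpers, the regex patterns, and the six sample dish names are assumptions, not part of any real tool or dataset.

```python
import re
from typing import Callable, List, Optional

# A rule maps a dish name to a category, or returns None if it does not apply.
Rule = Callable[[str], Optional[str]]


def keyword_rule(pattern: str, category: str) -> Rule:
    """Build a rule that assigns `category` when `pattern` matches the dish name."""
    regex = re.compile(pattern, re.IGNORECASE)
    return lambda item: category if regex.search(item) else None


def categorize(item: str, rules: List[Rule]) -> Optional[str]:
    """Return the label from the first rule that fires, or None if none do."""
    for rule in rules:
        label = rule(item)
        if label is not None:
            return label
    return None


def coverage(items: List[str], rules: List[Rule]) -> float:
    """Fraction of items that at least one rule can label."""
    if not items:
        return 0.0
    covered = sum(1 for item in items if categorize(item, rules) is not None)
    return covered / len(items)


# Hypothetical sample of menu items.
MENU_ITEMS = [
    "Caesar salad",
    "Tomato soup",
    "Grilled salmon",
    "Ribeye steak",
    "Chocolate cake",
    "Tiramisu",
]

# Start with a small rule set and measure how much it covers.
rules = [
    keyword_rule(r"\b(salad|soup)\b", "appetizer"),
    keyword_rule(r"\b(steak|salmon)\b", "entree"),
]
before = coverage(MENU_ITEMS, rules)

# Add one rule, then measure again to see the gain.
rules.append(keyword_rule(r"\bcake\b", "dessert"))
after = coverage(MENU_ITEMS, rules)

# Whatever is still uncovered tells you which rule to write next.
uncovered = [item for item in MENU_ITEMS if categorize(item, rules) is None]

print(f"coverage before: {before:.0%}, after: {after:.0%}")
print("still uncovered:", uncovered)
```

Note that this only measures coverage (did any rule fire), not whether the label is right; checking correctness needs a small hand-labeled sample on top.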
Feel free to click on my profile link and send me a message if you want to chat more.