Hacker News
by Abundnce10 2953 days ago
This method dramatically improves over previous approaches to text classification, and the code and pre-trained models allow anyone to leverage this new approach to better solve problems such as: Finding documents relevant to a legal case; Identifying spam, bots, and offensive comments; Classifying positive and negative reviews of a product; Grouping articles by political orientation;

I'm starting a new project where I'm given many recipes, and I need to take in free-form recipe ingredient text (e.g. "1/2 cup diced onions", "two potatoes, cut into 1-inch cubes", etc.) and build a program that identifies the ingredient (e.g. onion, potato) as well as the quantity (e.g. 0.5 cup, 2.0 units). Could I use something like Fast.ai to tackle this problem?

3 comments

If you have sufficient labelled data, conditional random fields work well for this kind of problem. A technical team from the NY Times has a great piece on it: https://open.blogs.nytimes.com/2015/04/09/extracting-structu...
CRF works quite well; it's actually what I use right now for recipe parsing on https://cookalo.com/. It's based on CRFsuite with Python bindings, trained on already-labeled recipes. If you build your own app and want to do some comparison, feel free to run some benchmarks against it.
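To make the CRF suggestion a bit more concrete, here is a minimal sketch of the per-token features such a model might be trained on. It is not the cookalo.com or NY Times implementation; the `UNITS` set, the feature names, and the tokenizer regex are all assumptions made up for illustration. A CRF toolkit would then be trained on sequences of these feature dicts paired with hand-labelled tags (quantity, unit, name, comment).

```python
import re

# Hypothetical, deliberately tiny unit vocabulary (an assumption, not a real list).
UNITS = {"cup", "cups", "tablespoon", "tablespoons", "teaspoon", "teaspoons"}

def tokenize(line):
    # Split into fractions, numbers, words, and single punctuation marks.
    return re.findall(r"\d+/\d+|\d+(?:\.\d+)?|[A-Za-z]+|[^\sA-Za-z0-9]", line)

def token_features(tokens, i):
    # Describe token i by its own shape plus its immediate neighbours,
    # which is the kind of local context a linear-chain CRF relies on.
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_numeric": bool(re.fullmatch(r"\d+(?:/\d+|\.\d+)?", tok)),
        "is_unit": tok.lower() in UNITS,
        "position": i,
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

def line_features(line):
    # Return the tokens and one feature dict per token.
    tokens = tokenize(line)
    return tokens, [token_features(tokens, i) for i in range(len(tokens))]

tokens, feats = line_features("1/2 cup diced onions")
```

The labelled data matters more than the exact feature set, so a small set like this is usually enough to start benchmarking.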
Very cool! It sounds like you followed an approach similar to the one the NY Times used for their recipe parsing, correct?

How does your API handle ingredients with multiple options (e.g. "1 1/2 cups seedless red or green grapes")?

Yes, that's correct, it's similar to the mechanisms the NY Times team was using. I've been focusing on the datasets used to train the CRF, as they are what drives the whole thing. This is the output I've got based on your example: [ { "unit": "cup", "input": "1$1/2 cups seedless red or green grapes", "name": "red grapes", "qty": "1$1/2", "comment": "seedless or green" } ]
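For anyone consuming output like this, a small sketch of turning the "qty" string into a number. It assumes that "$" separates the whole part from the fraction in a mixed number, which is only inferred from the single example above and is not documented behaviour of the API; `qty_to_float` is a hypothetical helper name.

```python
from fractions import Fraction

def qty_to_float(qty):
    # Assumption: "$" joins the parts of a mixed number, e.g. "1$1/2" means 1 + 1/2.
    total = Fraction(0)
    for part in qty.split("$"):
        total += Fraction(part)  # Fraction parses strings such as "1", "1/2", "0.5"
    return float(total)

print(qty_to_float("1$1/2"))
```

Summing exact fractions before converting avoids rounding surprises with quantities such as thirds.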

Don't hesitate to try the API out by pasting some examples into the white box on the site and pressing the "Try it out!" button. It's interactive :)

Sweet, I didn't realize it was interactive. I'll give it a try!

I'm not sure - what you're describing is information extraction. I haven't tried that yet, but I'm certainly interested in doing so (especially for medical data).
Similarly, could something like this be used to extract the command a user wants to run from a transcription? For example, "Add a user named Jenny to our client list." would result in the command 'create user Jenny', and "Could you add Jenny to our client list?" would result in the same command. Perhaps instead of outputting the next word, the model could output the expected command from a set of commands?