| HN Mirror


	by Abundnce10 2953 days ago
	> This method dramatically improves over previous approaches to text classification, and the code and pre-trained models allow anyone to leverage this new approach to better solve problems such as: Finding documents relevant to a legal case; Identifying spam, bots, and offensive comments; Classifying positive and negative reviews of a product; Grouping articles by political orientation;

	I'm starting a new project where I'm given many recipes. I need to take in free-form text of recipe ingredients (e.g. "1/2 cup diced onions", "two potatoes, cut into 1-inch cubes", etc.) and build a program that identifies the ingredient (e.g. onion, potato) as well as the quantity (e.g. 0.5 cup, 2.0 units). Could I use something like Fast.ai to tackle this problem?
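The target output in the question (0.5 cup, 2.0 units) implies a quantity-normalization step regardless of which model does the tagging. A minimal sketch of that step, assuming quantities arrive as plain fractions, mixed numbers, or a few spelled-out words; `parse_quantity` and `WORD_NUMBERS` are hypothetical names introduced here, not part of Fast.ai or any tool mentioned in the thread:

```python
from fractions import Fraction

# Hypothetical lookup for spelled-out quantities; extend as needed.
WORD_NUMBERS = {
    "a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "half": Fraction(1, 2),
}


def parse_quantity(text):
    """Convert '1/2', '1 1/2', '2' or 'two' to a float.

    Parts separated by whitespace are summed, so '1 1/2' becomes 1.5.
    Returns None when any part cannot be interpreted as a number.
    """
    parts = text.lower().split()
    if not parts:
        return None
    total = Fraction(0)
    for part in parts:
        if part in WORD_NUMBERS:
            total += WORD_NUMBERS[part]
        else:
            try:
                total += Fraction(part)  # accepts '2', '0.5' and '1/2'
            except (ValueError, ZeroDivisionError):
                return None
    return float(total)


print(parse_quantity("1/2"), parse_quantity("1 1/2"), parse_quantity("two"))
```

Summing the parts is a simplification: it handles mixed numbers, but it would also add up unrelated numbers that happen to be adjacent, so it is only suitable once the quantity span has already been isolated.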

3 comments

timClicks 2953 days ago

If you have sufficient labelled data, conditional random fields work well for this kind of problem. A technical team at the NY Times has a great piece on it: https://open.blogs.nytimes.com/2015/04/09/extracting-structu...
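To make the CRF suggestion concrete: a linear-chain CRF labels each token of an ingredient line using features of that token and its neighbours. The following is a minimal sketch of the feature-extraction step only, assuming a hand-picked feature set; the `UNITS` list, the feature names, and the helper functions are hypothetical and are not taken from the NY Times piece:

```python
import re

# Hypothetical, deliberately tiny unit vocabulary.
UNITS = {"cup", "cups", "tablespoon", "tablespoons", "teaspoon", "teaspoons",
         "ounce", "ounces", "pound", "pounds", "gram", "grams"}

# Keep fractions ("1/2") and number-word compounds ("1-inch") as single tokens.
TOKEN_RE = re.compile(r"\d+/\d+|\d+(?:\.\d+)?(?:-\w+)?|\w+(?:-\w+)*|[^\w\s]")
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?|\d+/\d+")


def tokenize(line):
    return TOKEN_RE.findall(line)


def token_features(tokens, i):
    """Features for token i, including a one-token window on either side."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_number": bool(NUMBER_RE.fullmatch(tok)),
        "is_unit": tok.lower() in UNITS,
        "position": i,
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }


def line_features(line):
    tokens = tokenize(line)
    return tokens, [token_features(tokens, i) for i in range(len(tokens))]


tokens, feats = line_features("1/2 cup diced onions")
for tok, f in zip(tokens, feats):
    print(tok, f)
```

A CRF trainer would then take one such feature sequence plus a matching label sequence (for example quantity, unit, name, comment) per labelled ingredient line; the training step itself is omitted here.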

ioot17 2953 days ago

CRF works quite well; it's actually what I use right now for recipe parsing on https://cookalo.com/. It's based on CRFsuite with Python bindings, trained on already labeled recipes. If you build your own app and want to do some comparison, feel free to run some benchmarks against it.

Abundnce10 2952 days ago

Very cool! It sounds like you followed an approach similar to the one the NY Times used for their recipe parsing, correct?

How does your API handle ingredients with multiple options (e.g. "1 1/2 cups seedless red or green grapes")?

ioot17 2951 days ago

Yes, that's correct. It's similar to the mechanism the NY Times team used, and I've been focusing on the datasets fed to the CRF, since the training data is what drives the whole thing. This is the output I get for your example:

  [
    {
      "unit": "cup",
      "input": "1$1/2 cups seedless red or green grapes",
      "name": "red grapes",
      "qty": "1$1/2",
      "comment": "seedless or green"
    }
  ]
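One way to read that output: if each token gets exactly one label and there is a single name field, then "red or green grapes" cannot yield two ingredient names, and the leftover words end up in the comment. A minimal sketch of the grouping step, assuming a hypothetical label set (QTY, UNIT, NAME, COMMENT) and hand-written labels chosen to mirror the output above; this is not the actual cookalo.com implementation, and it does not singularize the unit:

```python
def assemble(tokens, labels):
    """Group per-token labels into one record per ingredient line."""
    fields = {"qty": [], "unit": [], "name": [], "comment": []}
    for tok, lab in zip(tokens, labels):
        key = lab.lower()
        if key in fields:  # tokens with any other label are dropped
            fields[key].append(tok)
    return {key: " ".join(words) for key, words in fields.items()}


# Labels written by hand for illustration; a trained tagger would predict them.
tokens = ["1$1/2", "cups", "seedless", "red", "or", "green", "grapes"]
labels = ["QTY", "UNIT", "COMMENT", "NAME", "COMMENT", "COMMENT", "NAME"]

print(assemble(tokens, labels))
```

Supporting alternatives such as "red or green" as separate ingredients would need either a richer label scheme or a post-processing rule on top of this grouping.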

Don't hesitate to try the API out by pasting some examples into the white box on the site and pressing the "Try it out!" button, it's interactive :)

Abundnce10 2951 days ago

> Don't hesitate to try the API out by pasting some examples into the white box on the site and pressing the "Try it out!" button, it's interactive

Sweet, I didn't realize it was interactive. I'll give it a try!

jph00 2953 days ago

I'm not sure - what you're describing is information extraction. I haven't tried that yet, but I'm certainly interested in doing so (especially for medical data).

state_less 2953 days ago

Similarly, could something like this be used to extract the command a user wants to run from a transcription? For example, "Add a user named Jenny to our client list." would result in the command 'create user Jenny', and "Could you add Jenny to our client list?" would result in the same command. Perhaps instead of outputting the next word, the model could output the expected command from a fixed set of commands?
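The idea of choosing from a fixed set of commands amounts to text classification plus a slot-filling step for the argument. A minimal sketch of that framing, assuming a toy keyword-overlap scorer as a stand-in for a trained classifier and a capitalization heuristic for the name; the `EXAMPLES` table and all function names are hypothetical:

```python
import re
from collections import Counter

# Hypothetical example utterances per command.
EXAMPLES = {
    "create user": [
        "add a user named jenny to our client list",
        "could you add jenny to our client list",
        "create a new user called bob",
    ],
    "delete user": [
        "remove jenny from our client list",
        "delete the user named bob",
    ],
}


def bag(text):
    return Counter(re.findall(r"[a-z']+", text.lower()))


def classify(utterance):
    """Pick the command whose best example shares the most words."""
    words = bag(utterance)

    def score(command):
        return max(sum((words & bag(ex)).values()) for ex in EXAMPLES[command])

    return max(EXAMPLES, key=score)


def extract_name(utterance):
    """Heuristic slot filler: first capitalized word after the first token."""
    tokens = re.findall(r"[A-Za-z']+", utterance)
    for tok in tokens[1:]:
        if tok[0].isupper():
            return tok
    return None


def to_command(utterance):
    return f"{classify(utterance)} {extract_name(utterance)}"


print(to_command("Add a user named Jenny to our client list."))
print(to_command("Could you add Jenny to our client list?"))
```

In practice the `classify` step is where a fine-tuned text classifier would go; the overlap scorer only illustrates the interface of mapping free text onto a closed set of commands.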
