by slow_donkey 1959 days ago
There's a more practical, related cooking problem that I haven't seen discussed: when extracting recipes from websites, it's tricky to parse the actual ingredients and quantities.

For example, "1/2 cup of diced tomatoes or a can of tomatoes (preferably san marzano)". Freeform text like this doesn't suit regex very well, and it also lacks strong context clues. You'd most likely use named entity recognition, which could recognize that "1/2" is a quantity, "cup" is the unit, etc., but I haven't gotten very good results yet.
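
To make the regex limitation concrete, here is a minimal sketch of a regex baseline run on that example. The unit list, the pattern, and the `extract_quantities` name are all assumptions made up for illustration, not anything from an existing parser.

```python
import re
from fractions import Fraction

# Hypothetical, deliberately tiny unit list; a real one would be far longer.
UNITS = ("cup", "cups", "tbsp", "tsp", "can", "cans", "g", "oz")

# A quantity ("1/2", "2", "1.5", or the article "a"/"an") followed by a known unit.
# Longer unit names are tried first so "cups" is not cut short to "cup".
PATTERN = re.compile(
    r"\b(?P<qty>\d+/\d+|\d+(?:\.\d+)?|an?)\s+(?P<unit>"
    + "|".join(sorted(UNITS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)

def extract_quantities(text):
    """Return (quantity, unit) pairs found in text, in order of appearance."""
    results = []
    for match in PATTERN.finditer(text):
        raw = match.group("qty").lower()
        # Treat "a can" / "an ounce" as a quantity of one.
        qty = Fraction(1) if raw in ("a", "an") else Fraction(raw)
        results.append((qty, match.group("unit").lower()))
    return results

phrase = "1/2 cup of diced tomatoes or a can of tomatoes (preferably san marzano)"
print(extract_quantities(phrase))
```

The pattern does find both quantity/unit pairs, but it has no idea that they are alternatives joined by "or", which ingredient each one belongs to, or that the parenthetical is a preference rather than an ingredient. That structure is the part a sequence tagger would have to supply.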

Maybe I'll write up a post when I land on a solution.

1 comments

There was an old HN post about the NYT recipe tagger. This seems to be the most up-to-date repo: https://github.com/mtlynch/ingredient-phrase-tagger
Ah yeah, I saw this initially but haven't given CRFs a try yet. I was hoping ML had advanced enough that I could throw newer solutions at the problem. Thanks for linking this though; I just realized there's a lot of labelled training data the NYT provided, which I will definitely use.
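
For a rough idea of what trying a CRF would involve: the tagger assigns a label to each token based on per-token features, so the first step is turning a phrase into feature dicts. This is only a sketch under my own assumptions; the tokenizer, the feature names, and both function names are hypothetical and not taken from the linked repo.

```python
import re

NUMBER = r"\d+/\d+|\d+(?:\.\d+)?"

def tokenize(phrase):
    # Keep fractions such as "1/2" as one token; split words and punctuation apart.
    return re.findall(NUMBER + r"|[A-Za-z]+|[^\sA-Za-z\d]", phrase)

def token_features(tokens, i):
    """Build a feature dict for token i, including a little left/right context."""
    tok = tokens[i]
    before = tokens[:i]
    return {
        "lower": tok.lower(),
        "is_numeric": bool(re.fullmatch(NUMBER, tok)),
        # True when an opening parenthesis to the left has not been closed yet.
        "in_parens": before.count("(") > before.count(")"),
        "prev": tokens[i - 1].lower() if i > 0 else "<START>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }

tokens = tokenize("1/2 cup of diced tomatoes (preferably san marzano)")
features = [token_features(tokens, i) for i in range(len(tokens))]
print(features[0])
```

Each phrase becomes a list of feature dicts, which would be paired with one label per token from the labelled data to train the model. Context features like `prev`, `next`, and `in_parens` are what let a tagger treat "san marzano" inside parentheses differently from the main ingredient name.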