by scary-size 1162 days ago
Have fun going down the rabbit hole of parsing structured data from ingredient phrases. Had a fun few weeks with that!

Yeah, it has not been trivial so far! Let me know if you have any tips you can share.
I settled on a regex-based approach with lots of data clean-up and normalisation up front. There's an example on my site [4].
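For anyone wondering what "normalise first, then regex" can look like, here is a minimal sketch. It is not the code behind the site linked in [4]; the unit table, the fraction map, and the names `normalise` and `parse_ingredient` are all made up for illustration, and a real parser would need far more units and edge cases.

```python
import re
from fractions import Fraction

# Hypothetical alias table: maps spelling variants to one canonical unit.
UNIT_ALIASES = {
    "cup": "cup", "cups": "cup",
    "tbsp": "tbsp", "tablespoon": "tbsp", "tablespoons": "tbsp",
    "tsp": "tsp", "teaspoon": "tsp", "teaspoons": "tsp",
    "g": "g", "gram": "g", "grams": "g",
    "ml": "ml",
}

# Hypothetical map of Unicode fraction glyphs to ASCII fractions.
UNICODE_FRACTIONS = {"½": "1/2", "¼": "1/4", "¾": "3/4"}

# Longest aliases first so "cups" is tried before "cup".
UNIT_PATTERN = "|".join(
    sorted(map(re.escape, UNIT_ALIASES), key=len, reverse=True)
)

PHRASE_RE = re.compile(
    # Quantity: mixed number ("1 1/2"), fraction ("1/2"), or decimal/integer.
    rf"^(?P<qty>\d+\s+\d+/\d+|\d+/\d+|\d+(?:\.\d+)?)?\s*"
    # Unit: must end at a word boundary so "g" does not match "garlic".
    rf"(?:(?P<unit>{UNIT_PATTERN})\b\.?\s*)?"
    # Optional "of", then everything else is the ingredient name.
    rf"(?:of\s+)?(?P<name>.+)$"
)


def normalise(phrase: str) -> str:
    """Clean-up pass that runs before the regex sees the phrase."""
    text = phrase.lower()
    for glyph, ascii_form in UNICODE_FRACTIONS.items():
        text = text.replace(glyph, f" {ascii_form}")
    text = re.sub(r"\([^)]*\)", " ", text)  # drop parenthetical notes
    return re.sub(r"\s+", " ", text).strip()


def parse_quantity(raw):
    """Turn '1 1/2', '1/2' or '1.5' into a Fraction; None stays None."""
    if raw is None:
        return None
    return sum(Fraction(part) for part in raw.split())


def parse_ingredient(phrase: str):
    """Return quantity, canonical unit and name, or None if nothing matches."""
    match = PHRASE_RE.match(normalise(phrase))
    if match is None:
        return None
    return {
        "quantity": parse_quantity(match.group("qty")),
        "unit": UNIT_ALIASES.get(match.group("unit")),
        "name": match.group("name").strip(),
    }
```

With this sketch, `parse_ingredient("1½ cups Flour (sifted)")` gives a quantity of 3/2, unit `cup` and name `flour`, while `parse_ingredient("2 garlic cloves")` leaves the unit empty because the word boundary stops `g` from matching inside `garlic`. Most of the work ends up in the normalisation step and the alias table, not the regex itself.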

Some other approaches I spent a lot of time on:

* Extracting Structured Data From Recipes Using Conditional Random Fields [1]

* Chef Watson [2]

* Ingredient Parser - Model Guide [3]

[1] https://archive.nytimes.com/open.blogs.nytimes.com/2015/04/0...

[2] https://blog.kitchenpc.com/2011/07/06/chef-watson/

[3] https://ingredient-parser.readthedocs.io/en/latest/guide/ind...

[4] https://pretty-recip.es/recipe?recipe-url=https%3A%2F%2Fwww....

Maybe GPT-4 or some other LLM? It might be too expensive, but I would think they'd be able to handle the technical task.