

by pbowyer 1180 days ago

For the reasons others have given, I don't see it replacing 'traditional' scraping any time soon. But I am looking forward to it replacing current methods of extracting data from the scraped content.

I've been using Duckling [0] to extract fuzzy dates and times from text. It does a good job, but I needed a custom build with extra rules to turn that into a great job. And that's just for dates, one of the 13 dimensions it supports. Being able to use an AI that handles all of them more accurately will be fantastic.

Does a specialised model trained to extract times and dates already exist? It's entity tagging, but a specialised form of it (especially when dealing with historical documents, where you may need to handle both the Gregorian and Julian calendars).
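Once a tagger has pulled out a structured Julian (Old Style) date, normalising it to Gregorian is the mechanical part. A minimal sketch in Python that goes through the Julian Day Number; the function names are hypothetical, and it assumes the year is already counted from 1 January (English documents before 1752 often started the year on 25 March) and that the result falls on or after Gregorian 1 January AD 1, the earliest day `datetime.date` can represent.

```python
from datetime import date

# The Julian Day Number of proleptic Gregorian 0001-01-01 is 1721426, and
# date.toordinal() numbers that same day 1, so ordinal = JDN - 1721425.
_JDN_OFFSET = 1721425


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number for a date written in the Julian calendar."""
    a = (14 - month) // 12          # 1 for January/February, else 0
    y = year + 4800 - a             # count years from March so leap day is last
    m = month + 12 * a - 3          # March = 0 ... February = 11
    # Julian leap rule only: every 4th year, no century exceptions.
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a Julian-calendar date to the equivalent Gregorian date.

    Assumption: the year starts on 1 January. Raises ValueError if the
    Gregorian result is earlier than 0001-01-01 (a datetime.date limit).
    """
    return date.fromordinal(julian_to_jdn(year, month, day) - _JDN_OFFSET)


if __name__ == "__main__":
    # 23 April 1616 (Julian), Shakespeare's recorded date of death.
    print(julian_to_gregorian(1616, 4, 23))  # prints 1616-05-03
```

The same Julian Day Number route works in the other direction, which is why it is a common pivot when a corpus mixes both calendars.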

[0] https://github.com/facebook/duckling