|
|
|
|
|
by pbowyer
1180 days ago
|
|
For the reasons others have said I don't see it replacing 'traditional' scraping soon. But I am looking forward to it replacing current methods of extracting data from the scraped content. I've been using Duckling [0] for extracting fuzzy dates and times from text. It does a good job but I needed a custom build with extra rules to make that into a great job. And that's just for dates, 1 of 13 dimensions supported. Being able to use an AI that handles them with better accuracy will be fantastic. Does a specialised model trained to extract times and dates already exist? It's entity tagging but a specialised form (especially when dealing with historical documents where you may need Gregorian and Julian calendars). [0] https://github.com/facebook/duckling |
|