|
|
|
|
|
by ahmedhawas123
309 days ago
|
|
This may be a bit of an irrelevant and at best imaginative rant, but there is no shortage of solutions that are mediocre or near perfect for specific use cases out there to parse PDFs. This is a great addition to that. That said, over the last two years I've come across many use cases to parse PDFs and each has its own requirements (e.g., figuring out titles, removing page numbers, extracting specific sections, etc). And each require a different approach. My point is, this is awesome, but I wonder if there needs to be a broader push / initiative to stop leveraging PDFs so much when things like HTML, XML, JSON and a million other formats exist. It's a hard undertaking I know, no doubt, but it's not unheard of to drop technologies (e.g., fax) for a better technology. |
|