I built such a service, but it is impossible to guarantee any reliable result. I ended up shutting it down.
The PDF standard is a mess, and the number of 'tricks' I've seen done is astonishing.
Example: to add a shadow or border effect to text, most PDF generators simply add the text twice with a subtle offset and different colors. Result: your SaaS service returns every sentence twice.
Of course there were workarounds, but at some point it became unmaintainable.
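To make the doubled-text problem concrete, here is a minimal sketch of the kind of workaround it calls for. It is not what the service actually did. It assumes the extractor already yields each text run with its page coordinates; the `Span` type, the `drop_shadow_duplicates` name, and the 2-point tolerance are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical text run as an extractor might report it (units: points)."""
    text: str
    x: float
    y: float


def drop_shadow_duplicates(spans, tolerance=2.0):
    """Keep the first copy of any text run that is redrawn with a tiny offset.

    Two runs count as the same visible text when their strings match and
    both coordinates differ by at most `tolerance` (an assumed threshold).
    """
    kept = []
    for span in spans:
        is_duplicate = any(
            span.text == other.text
            and abs(span.x - other.x) <= tolerance
            and abs(span.y - other.y) <= tolerance
            for other in kept
        )
        if not is_duplicate:
            kept.append(span)
    return kept


spans = [
    Span("Hello", 100.0, 700.0),  # the shadow layer
    Span("Hello", 100.8, 699.2),  # the same text drawn on top, slightly offset
    Span("Hello", 100.0, 650.0),  # a genuine repeat further down the page
    Span("World", 140.0, 700.0),
]
deduplicated = drop_shadow_duplicates(spans)
```

Even this toy version shows why it gets unmaintainable: the tolerance is a guess, the check is quadratic in the number of runs, and it only covers one trick out of many.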
I'd say exactly the opposite. PDF makes it easy to create a document that looks exactly the way you want it to, which seems to be all that most web designers want (witness all the sites that force a narrow column on a large screen and won't reflow their text properly on a small screen).
In a way it has. In my experience, there have been multiple times where a "generate PDF" requirement has come up, with the best viable solution being "develop it in HTML using standard tech" followed by "and then convert it to PDF".
The demand for automating text extraction is still very high, or at least it feels like it when you’re working around the clock to cater to 3 of your customers, only to wake up to 10 more the next day. We’re small but growing extremely quickly.
Everything. Insurance companies to fledgling AI startups.
It’s definitely harder to get government business because the sales process is so long and compliance is so stringent. That said, we are GDPR compliant.