by Kikobeats 739 days ago
You can use Microlink to turn a PDF into HTML, and combine it with another service to process the text data.

Here's an example turning an arXiv paper into real text:

https://api.microlink.io/?data.html.selector=html&embed=html...

It looks like a PDF, but if you open devtools you can see it's actually a very precise HTML representation.
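
For anyone who wants to build that kind of request programmatically, here is a rough sketch of assembling the query string. Only `data.html.selector=html` and `embed=html` come from the link above, which is truncated; the `url` parameter name, the `build_pdf_to_html_url` helper, and the example.com PDF address are assumptions for illustration, and the real link may carry extra parameters that are not shown here.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the (truncated) link above.
MICROLINK_ENDPOINT = "https://api.microlink.io/"


def build_pdf_to_html_url(pdf_url: str) -> str:
    """Build a Microlink request URL for a PDF (hypothetical helper)."""
    params = {
        # Assumption: the target document is passed as `url`.
        "url": pdf_url,
        # These two parameters are visible in the original link.
        "data.html.selector": "html",
        "embed": "html",
    }
    # urlencode percent-encodes the PDF address so it survives as a query value.
    return f"{MICROLINK_ENDPOINT}?{urlencode(params)}"


# Placeholder PDF address, not the paper from the original link.
example = build_pdf_to_html_url("https://example.com/paper.pdf")
print(example)
```

The sketch only builds the URL; opening it in a browser or fetching it with an HTTP client is a separate step, and the response is what you would hand to whatever text-processing service you prefer.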