by dmv
5984 days ago
Actually, programmatically extracting plain text from a PDF in a consistent format is not trivial. In many cases it is easy, but the PDF format is visual first and content second, which leaves plenty of opportunities for CAPTCHA-like problems. Despite that, PDF, especially PDF/A, is the superior format for preserving published content.
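
To make "visual first, content second" concrete, here is a minimal sketch of the kind of guesswork an extractor has to do. It assumes the page has already been parsed into positioned text fragments; the `Fragment` type, the `rebuild_lines` helper, and the 2-point tolerance are all made up for illustration and do not come from any particular PDF library.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """Hypothetical positioned text run, as a PDF parser might report it."""
    x: float  # left edge, in points
    y: float  # baseline, measured up from the bottom of the page
    text: str


def rebuild_lines(fragments, line_tolerance=2.0):
    """Group fragments into lines by baseline, then order each line left to right."""
    lines = []  # list of (baseline, [fragments on that line])
    # PDF's y axis points upward, so a larger y is nearer the top of the page.
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0] - frag.y) <= line_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append((frag.y, [frag]))
    return [
        " ".join(f.text for f in sorted(group, key=lambda f: f.x))
        for _, group in lines
    ]


# Fragments listed in drawing order, which need not match reading order.
page = [
    Fragment(120.0, 700.0, "world"),
    Fragment(72.0, 650.0, "second line"),
    Fragment(72.0, 700.5, "Hello"),
]
print(rebuild_lines(page))
```

Even this toy heuristic only holds for a single column of text. With two columns, a table, or rotated text, sorting by position will interleave unrelated content, which is where the CAPTCHA-like guessing starts.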