by CJefferson 938 days ago
One big important project, which is making good progress, is "tagged PDFs".

The biggest advantage of these is that they should, eventually, be accessible to blind people, unlike normal PDFs created by LaTeX, which are honestly not much better than a blank page (this isn't PDF's fault; PDFs exported from Microsoft Word are very accessible).

For exciting reasons to do with the internals of TeX (mainly, it's actually a programming language, although it looks like a markup language), I know this has been a major project that has taken many years and will take many more. Still, I personally consider it a fairly big embarrassment for academia, which often claims it wants to be open and accessible, that we lock so much of our research in a format that many people simply can't read.

1 comments

Wouldn't this require authors to use semantic tags instead of visual-presentation ones? That is, a tag that specifies that what follows is code instead of just \texttt{}, a tag specifying that what follows is e.g. the title of a book instead of just \textit{}, etc. The TeX engine itself, as it processes the source into a PDF, cannot know what the original author meant.
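To make the distinction concrete, here is a minimal sketch of what "semantic instead of visual" markup could look like in a source file. The macro names \code and \booktitle are hypothetical, made up for illustration, and not taken from any package:

```latex
\documentclass{article}

% Hypothetical semantic macros: the names \code and \booktitle are
% illustrative only. Each one records what the text *is*; the visual
% styling is an implementation detail hidden behind the macro.
\newcommand{\code}[1]{\texttt{#1}}
\newcommand{\booktitle}[1]{\textit{#1}}

\begin{document}
Run \code{make install}, as described in \booktitle{The TeXbook}.
\end{document}
```

On their own these macros still typeset exactly like \texttt{} and \textit{}. They only preserve the author's intent in the source; whether any tagging code turns that intent into PDF structure is a separate question.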
This isn't about text formatting. It's about things like reading order, alternative text for figures, or even just making clear to the PDF engine that two consecutive words are part of the same paragraph (which isn't even the case by default!). Take a look at this: https://www.overleaf.com/learn/latex/An_introduction_to_tagg...

As GP said, it's really shameful that, in academia, we just ignore this kind of stuff. I tried making my articles' PDFs accessible and failed miserably.
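For anyone who wants to try it, below is a rough sketch of opting in to the experimental tagging support, assuming a recent LaTeX release that provides \DocumentMetadata. The key values are my choice for illustration, the section title and figure are placeholders, and results will vary with the packages a document loads:

```latex
% \DocumentMetadata must come before \documentclass.
% testphase=phase-III switches on the experimental tagging code;
% the other values here are illustrative choices, not requirements.
\DocumentMetadata{
  lang       = en-US,
  pdfversion = 2.0,
  testphase  = phase-III
}
\documentclass{article}
\usepackage{graphicx}

\begin{document}
\section{Results}

With tagging active, the words of this paragraph should end up inside a
single paragraph structure element rather than as unrelated runs of glyphs.

% alt= supplies the alternative text for the figure. example-image is a
% placeholder picture from the mwe package, used so the sketch compiles.
\includegraphics[width=0.5\linewidth,
  alt={Placeholder image standing in for a real figure}]{example-image}
\end{document}
```

This only covers the points mentioned above (paragraph structure and alternative text). It is a starting point under those assumptions, not a guarantee that the resulting PDF passes an accessibility check.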