Hacker News
by badsectoracula 1585 days ago
Considering that year-old mobiles can translate the text in photos on the fly, even through awful cameras, I find it weird that screen readers still rely on such hints.

Delete every CSS declaration (both inline and in stylesheets) from every website and see how easy the sites are to read. Not very, huh? Same deal with accessibility.

You can't "just OCR stuff" without losing all the visual meaning in a page. Just as we use borders, padding and colors to give information a hierarchy, screen readers rely on an information hierarchy too, so users can navigate a page conveniently.
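To make the "information hierarchy" point concrete, here is a minimal Python sketch. It assumes a toy, hypothetical `Node` structure (a `role`, a spoken `name` and a heading `level`), not any real platform's accessibility API. It shows how a screen reader can jump straight from heading to heading and skip the menu, something a flat OCR dump of the same page cannot offer.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified accessibility-tree node (not a real API)."""
    role: str                      # e.g. "heading", "paragraph", "link"
    name: str                      # the text a screen reader would speak
    level: int = 0                 # heading level; 0 for non-headings
    children: list[Node] = field(default_factory=list)


def walk(node: Node):
    """Yield nodes in document order (depth-first, pre-order)."""
    yield node
    for child in node.children:
        yield from walk(child)


def headings(root: Node, max_level: int = 6) -> list[tuple[int, str]]:
    """Return (level, name) for each heading, like a 'next heading' key."""
    return [(n.level, n.name) for n in walk(root)
            if n.role == "heading" and n.level <= max_level]


page = Node("document", "Example page", children=[
    Node("navigation", "Site menu",
         children=[Node("link", "Home"), Node("link", "About")]),
    Node("heading", "Article title", level=1),
    Node("paragraph", "Intro text..."),
    Node("heading", "Section A", level=2),
    Node("paragraph", "Body text..."),
])

print(headings(page))  # [(1, 'Article title'), (2, 'Section A')]
```

The menu links are never announced when the user navigates by headings; with OCR alone, every word on the page would be equally "flat".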

I wasn't referring to just OCR stuff (or even just web stuff), though. My point was that there is enough information on the screen to make out detail; computer vision is a much broader subject than scanning text. About 12 years ago I was working on getting a computer to figure out where 2D boxes were in a camera feed (for augmented reality, not accessibility). My algorithm was quite naive and primitive, but the source was also some awful webcam, not something as "pristine" as a screen's contents.
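As a rough illustration of the "naive and primitive" kind of vision described above (a sketch, not the commenter's actual algorithm), the Python below treats a screenshot as a 2D grid of pixel values and uses flood fill (connected-component labeling) to report the bounding box of each uniformly colored region. The tiny grid, the background value of 0 and the 4-connectivity are assumptions for illustration; real screen content with gradients, anti-aliasing and text would need far more than this.

```python
from collections import deque


def find_boxes(grid, background=0):
    """Return inclusive (top, left, bottom, right) bounding boxes of
    4-connected regions whose pixels share one non-background value."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or grid[r][c] == background:
                continue
            color = grid[r][c]
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of this region
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


screen = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 2],
    [0, 0, 0, 0, 0, 2],
]
print(find_boxes(screen))  # [(1, 1, 2, 3), (2, 5, 3, 5)]
```

A screen is far cleaner than a webcam frame (no noise, no perspective), which is the commenter's point; but a box alone still says nothing about whether it is a button, a text field or decoration.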

Of course I don't know that it's possible; it could be impossible. I just have the impression that there hasn't been much effort toward that approach. And TBH it feels like it'd be much better to have a solution that works with "everything" without that "everything" knowing about it (or at least with very little participation from it).

Also, FWIW, I often use a "simple" web browser like Dillo or ELinks to read articles, since it bypasses all the cruft, and the usual suspect for making things unreadable isn't CSS but JavaScript.

OCR is relatively easy, but accessibility information is more than just text. There are the types of elements, the possible interactions and the changes on the screen. It also gives users the ability to skip unnecessary information. Using ML for all of that would be taxing and probably not very practical until AGI is invented.
I wasn't referring to just OCR; check my other comment.
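The extra information listed in that reply (what kind of element something is, what you can do with it, and what just changed) is roughly what an accessibility node carries beyond its text. The Python sketch below uses a made-up `AccessibleNode` type and hypothetical `announce`/`toggle` helpers, not any platform's actual API, to show a screen reader composing an announcement from role, name and state and reacting to a state change. A purely pixel-based approach would have to infer every one of these fields.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AccessibleNode:
    """Hypothetical node: text plus semantics that pixels don't carry."""
    role: str                                          # kind of element
    name: str                                          # its label
    states: set[str] = field(default_factory=set)      # e.g. {"checked"}
    actions: list[str] = field(default_factory=list)   # e.g. ["toggle"]


def announce(node: AccessibleNode) -> str:
    """Build the phrase a screen reader might speak for this node."""
    parts = [node.name, node.role]
    if "checked" in node.states:
        parts.append("checked")
    elif node.role == "checkbox":
        parts.append("not checked")
    return ", ".join(parts)


def toggle(node: AccessibleNode) -> str:
    """Perform the 'toggle' action and return the change announcement."""
    if "toggle" not in node.actions:
        raise ValueError(f"{node.role} {node.name!r} cannot be toggled")
    node.states ^= {"checked"}  # flip the checked state in place
    return announce(node)


box = AccessibleNode("checkbox", "Remember me", actions=["toggle"])
print(announce(box))  # Remember me, checkbox, not checked
print(toggle(box))    # Remember me, checkbox, checked
```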