Hacker News
by jameshart 848 days ago
Next AI challenge: try to infer the intended panel reading sequence, and the flow of speech bubbles/narrative.

Would potentially be a useful augmentation to a digital comic book reader, refocusing from panel to panel in sequence. Not to mention making comic book content more accessible.
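A naive baseline for the reading-order problem might look like the sketch below. It assumes panels have already been detected as axis-aligned boxes, and `Panel` and `reading_order` are hypothetical names, not part of any existing tool. It groups panels into rows by vertical overlap, reads rows top to bottom, and reads each row left to right, or right to left for manga. It will handle simple grids and fail on the deliberately unconventional layouts a learned model would actually need to cope with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Panel:
    """Hypothetical panel box in page pixels (origin at top-left)."""
    name: str
    x: float
    y: float
    w: float
    h: float


def reading_order(panels, right_to_left=False, overlap=0.5):
    """Heuristic order: cluster panels into rows, then sort within rows.

    A panel joins an existing row when its vertical overlap with the row's
    first panel covers at least `overlap` of the shorter panel's height.
    Rows come out top to bottom because panels are visited in order of y.
    """
    rows = []
    for p in sorted(panels, key=lambda p: p.y):
        for row in rows:
            ref = row[0]
            shared = min(ref.y + ref.h, p.y + p.h) - max(ref.y, p.y)
            if shared >= overlap * min(ref.h, p.h):
                row.append(p)
                break
        else:
            rows.append([p])  # no existing row matched: start a new one
    ordered = []
    for row in rows:
        # Left to right by default; right to left for manga-style pages.
        row.sort(key=lambda p: p.x, reverse=right_to_left)
        ordered.extend(row)
    return ordered
```

For a page with panels A and B side by side above one wide panel C, this returns A, B, C, or B, A, C with `right_to_left=True`.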


This is such a good resource! Thank you!
Is that solvable? It's my personal experience that humans can't reliably do that; I could imagine a machine doing better, but I'm not sure what information it would be working with.
I feel the problem is that you often read some panels in the wrong order and only realize it after the fact, based on the text. The AI could probably use the text as an additional signal for determining the order.

Would probably never be perfect, but if it gets it right in all but a few outlying cases, that should be good enough.

Counterpoint: isn't reading and re-reading the page after figuring out the intended order part of the experience?

Take video games. From Software games tell stories through breadcrumbs: locations, speech lines, item descriptions. Only over time can you start to connect the dots.

Sure, you can watch a youtuber connect the dots for you, show you what you've missed, and make you aware of your own mental limitations. But it's not the same.

These programs would probably be useful for speeding up the translation of comics. For example, there are hundreds of thousands of self-published Japanese works, and many of them have no translation.
Crunchyroll used to offer a guided reading experience for some of their manga.

You could probably build a tool that tags each panel and attempts to figure out the order, and then have a human editor do a validation pass. If enough people are reading a series, you can probably crowdsource the panel sequence.
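One way to sketch the crowdsourcing step, assuming each reader submits a complete ordering of the panel ids on a page: combine the submissions with a Borda count (a standard rank-aggregation method, swapped in here purely for illustration) and route pages with low agreement to the human editor. `aggregate_orders` and `agreement` are hypothetical helpers.

```python
from collections import defaultdict


def aggregate_orders(submissions):
    """Combine reader-submitted panel orders for one page via a Borda count.

    In an n-panel submission, the panel at position i earns n - 1 - i points.
    Panels are ranked by total points; ties are broken by id for determinism.
    """
    scores = defaultdict(int)
    for order in submissions:
        n = len(order)
        for i, panel in enumerate(order):
            scores[panel] += n - 1 - i
    return sorted(scores, key=lambda p: (-scores[p], p))


def agreement(submissions, consensus):
    """Fraction of submissions that exactly match the consensus order."""
    if not submissions:
        return 0.0
    return sum(list(s) == consensus for s in submissions) / len(submissions)
```

With the submissions `[p1, p2, p3]`, `[p1, p2, p3]` and `[p2, p1, p3]`, the consensus is `[p1, p2, p3]` with an agreement of 2/3. What threshold should trigger a manual review is left open.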

Kindle just added one. I find it annoying because I can't remember how to exit single-panel mode once you get into it.
Files in ACBF format include panel metadata, so there should be lots of training data.

- https://launchpad.net/acbf
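A rough sketch of turning ACBF files into training pairs with Python's standard `xml.etree.ElementTree`. The element and attribute names used here (`page`, an `image` with an `href`, a `frame` with a `points` polygon) reflect our reading of the format and should be checked against the ACBF specification. Treating the order of frames in the file as the reading order is also an assumption. Namespaces are ignored so the sketch doesn't hard-code a particular ACBF version.

```python
import xml.etree.ElementTree as ET


def local(name):
    """Strip a '{namespace}' prefix from a tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def parse_points(points):
    """Parse a polygon string such as '0,0 400,0 400,300' into (x, y) pairs."""
    return [tuple(float(v) for v in pair.split(",")) for pair in points.split()]


def bbox(polygon):
    """Axis-aligned bounding box (x0, y0, x1, y1) of a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def panel_examples(acbf_xml):
    """Yield (image_href, [panel boxes in file order]) for every <page>.

    ASSUMPTION: each <page> holds an <image> with an href attribute and zero or
    more <frame points="x,y x,y ..."> polygons; verify against the ACBF spec.
    """
    root = ET.fromstring(acbf_xml)
    for page in root.iter():
        if local(page.tag) != "page":
            continue
        href, boxes = None, []
        for child in page:
            kind = local(child.tag)
            if kind == "image":
                # Accept href with or without a namespace prefix.
                href = next((v for k, v in child.attrib.items() if local(k) == "href"), None)
            elif kind == "frame" and child.get("points"):
                boxes.append(bbox(parse_points(child.get("points"))))
        yield href, boxes
```

Each yielded pair links a page image to its labeled panel boxes, which is the shape of data a panel detector or ordering model would be trained on.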

That sounds similar to having an AI explain where you should be looking in a painting, or where to pay attention in a movie. This might be genuinely useful as an accessibility feature, but I could also see strong sentiment against it, potentially from the creators of the art.
I'm more of a manga reader than a Western comics one, but why would there be sentiment against this? In what way is the reading order of panels up to interpretation?
Focusing on manga: the shoujo space, for instance, has a reputation for positioning characters out of frame and for not shying away from composing pages with a visual flow that doesn't follow the actual character speech.

It would be hard to pin down any given panel order as canonical when the author is playing games with the reader. I don't think authors would oppose an accessibility feature, but I could see the debate if it became a more prominent, sanctioned way of reading.

You've now focused too much on manga/comics. Step back a bit and look at art more generally, using the provided example of looking at a painting: who are you to tell me where I should be looking? Yeah, yeah, I see the obvious thing your AI is trying to point me to, but I'm looking at this less obvious thing that really strikes my fancy. Maybe it's a blemish. Maybe it's a unique brush stroke or technique that others might not care about, but an aspiring artist might (even if it is another stupid AI).
But this entire discussion is about manga/comics???
Because that was the idea proposed for expanding this concept to something else. That's how conversations evolve: someone says something that plants a seed for someone else, which gets someone else thinking, and so on.
Yeah, but isn't telling you where to look and what to think contrary to the idea of art? Sure, a guide explaining why Van Gogh's brushstrokes are interesting may give you a newfound appreciation for what, from a distance, looks like a simple painting, but treating looking at art as something to be min/maxed and optimized sounds depressing.
I think that aside from accessibility, the main use case is e-ink devices, where the resolution isn't high enough to comfortably display the entire page at once and the refresh rate isn't high enough to pan/zoom quickly. With those limitations, automatically "paging" through the panels probably makes for a much better reading experience (I haven't tried it myself).
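The arithmetic behind panel-by-panel paging is small. A minimal sketch, assuming a panel is given as a page-space box `(x0, y0, x1, y1)`: pick the largest uniform scale at which the panel fits the screen, then offset it so it is centered. Each panel change then becomes a single redraw rather than an animated pan, which plausibly suits a slow e-ink refresh. `fit_panel` is a hypothetical helper, not an API of any reader app.

```python
def fit_panel(panel, screen_w, screen_h, margin=0.0):
    """Return (scale, dx, dy) such that screen = page * scale + (dx, dy).

    The panel box (x0, y0, x1, y1) is scaled uniformly to the largest size
    that fits inside the screen minus `margin` on every side, then centered.
    """
    x0, y0, x1, y1 = panel
    pw, ph = x1 - x0, y1 - y0
    if pw <= 0 or ph <= 0:
        raise ValueError("panel must have positive width and height")
    scale = min((screen_w - 2 * margin) / pw, (screen_h - 2 * margin) / ph)
    # Center the scaled panel: its left edge lands at (screen_w - pw * scale) / 2.
    dx = (screen_w - pw * scale) / 2 - x0 * scale
    dy = (screen_h - ph * scale) / 2 - y0 * scale
    return scale, dx, dy
```

For example, `fit_panel((100, 200, 500, 500), 800, 600)` returns `(2.0, -200.0, -400.0)`, mapping the panel's top-left corner to the screen origin.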
Guided View was/is a thing in Comixology / Kindle comics.