HN Mirror


by a9t9 3285 days ago

Well, at least this confirms that the screenshots are not manipulated ;)

The tricky part for the OCR in this example is the diverse background, as the Chinese characters are directly inside the movie.

Your comment is interesting, as the original motivation for creating the Copyfish extension was to help me watch Chinese movies. So I can confirm that it works fine for this purpose. Of course, once in a while it gets some characters wrong, but it works OK with many movies.

Here is a screencast of Copyfish doing subtitle OCR:

https://www.youtube.com/watch?v=YNGkGWj8lA4


imron 3285 days ago

> as the Chinese characters are directly inside the movie.

Yep, same with TV shows, and soft copies of transcripts are difficult to come by, hence my interest in something like this.

I just watched the video. When used on a video does it keep a history of all OCRed text?

Finally, you might also like to try posting this on http://www.chinese-forums.com. If it mostly works well for TV and films, I'm sure there will be quite a few people there who are interested in it.


a9t9 3285 days ago

> When used on a video does it keep a history of all OCRed text?

Not yet - but this feature is already on my todo list ;)

Thanks for the hint about the Chinese forums!


imron 3285 days ago

> Not yet - but this feature is already on my todo list ;)

Another interesting feature would be to run some sort of statistical analysis on the Chinese text being OCRed and then combine that with the candidate characters suggested by the OCR. This would almost certainly prevent the mistake in the last two characters of the Chinese movie screenshot.
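A minimal sketch of that idea, assuming the OCR engine can return several candidate characters per position with a confidence for each (nothing in this thread says Copyfish exposes that). A character bigram model trained on some Chinese text rescores the candidates, and Viterbi decoding picks the reading with the best combined score. The names `train_bigrams`, `bigram_logprob` and `decode`, the `lm_weight` parameter, the toy corpus and the confidence numbers are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple

# Hypothetical OCR output: for each character position, a non-empty list of
# (candidate character, OCR confidence) pairs. Confidences must be > 0.
Candidates = List[List[Tuple[str, float]]]

START = "^"  # marks the beginning of a line in the bigram model


def train_bigrams(corpus: List[str]) -> Tuple[Counter, Counter, int]:
    """Count character bigrams in a corpus of subtitle-like lines."""
    contexts: Counter = Counter()
    bigrams: Counter = Counter()
    chars = set()
    for line in corpus:
        padded = START + line
        chars.update(line)
        contexts.update(padded[:-1])  # every char that is followed by another
        bigrams.update(zip(padded, padded[1:]))
    # +1 reserves some probability mass for characters never seen in the corpus
    return contexts, bigrams, len(chars) + 1


def bigram_logprob(prev: str, cur: str, contexts: Counter, bigrams: Counter,
                   vocab_size: int, alpha: float = 1.0) -> float:
    """log P(cur | prev) with add-alpha smoothing: unseen pairs are unlikely, not impossible."""
    return math.log((bigrams[(prev, cur)] + alpha) /
                    (contexts[prev] + alpha * vocab_size))


def decode(candidates: Candidates, contexts: Counter, bigrams: Counter,
           vocab_size: int, lm_weight: float = 1.0) -> str:
    """Viterbi search maximising OCR log-confidence + lm_weight * bigram log-probability."""
    # best[c] = (score, text) of the best partial reading that ends in character c
    best = {START: (0.0, "")}
    for position in candidates:
        new_best = {}
        for char, ocr_conf in position:
            emission = math.log(ocr_conf)
            new_best[char] = max(
                (score
                 + lm_weight * bigram_logprob(prev, char, contexts, bigrams, vocab_size)
                 + emission,
                 text + char)
                for prev, (score, text) in best.items()
            )
        best = new_best
    return max(best.values())[1]


# Toy usage: the OCR slightly prefers 围 over 国 in the middle position.
corpus = ["中国人", "中国", "我是中国人"]
contexts, bigrams, vocab_size = train_bigrams(corpus)
ocr = [[("中", 0.9), ("申", 0.1)],
       [("围", 0.55), ("国", 0.45)],
       [("人", 0.9), ("入", 0.1)]]
print(decode(ocr, contexts, bigrams, vocab_size))                 # 中国人
print(decode(ocr, contexts, bigrams, vocab_size, lm_weight=0.0))  # 中围人
```

With the language model switched off (`lm_weight=0.0`) the decoder simply keeps the top OCR guess at each position; with it on, the frequent pair 中国 outweighs the OCR's slight preference for 围. A real system would need a much larger corpus (or a word-level model) and some tuning of `lm_weight`.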
