by a9t9 3285 days ago
Well, at least this confirms that the screenshots are not manipulated ;)

The tricky part for the OCR in this example is the diverse background, as the Chinese characters are directly inside the movie.

Your comment is interesting, as the original motivation for creating the Copyfish extension was to help me watch Chinese movies. So I can confirm that for this purpose, it works fine. Of course, once in a while it gets some characters wrong, but it works OK with many movies.

Here is a screencast of Copyfish doing subtitle OCR:

https://www.youtube.com/watch?v=YNGkGWj8lA4


> as the Chinese characters are directly inside the movie.

Yep, same with TV shows, and soft-copies of transcripts are difficult to come by, hence my interest in something like this.

I just watched the video. When used on a video does it keep a history of all OCRed text?

Finally, you might also like to try posting this on Chinese-forums (http://www.chinese-forums.com). If it mostly works well for TV and films, I'm sure there will be quite a few people there who are interested in it.

> When used on a video does it keep a history of all OCRed text?

Not yet - but this feature is already on my todo list ;)

Thanks for the hint about the Chinese forums!

> Not yet - but this feature is already on my todo list ;)

Another interesting feature would be to do some sort of statistical analysis of the Chinese text being OCRed and then combine that with the candidate characters suggested by the OCR. This would almost certainly prevent the mistake in the last two characters of the Chinese movie screenshot.
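
To make the idea above a bit more concrete, here is a rough Python sketch. It assumes the OCR engine can return several candidate characters with a confidence score for each position; that input shape is an assumption for illustration, not how Copyfish is known to work. The names train_bigrams, rerank and lm_weight are made up for this example, and the "statistics" are just character-pair counts from a tiny sample corpus, combined with the OCR scores via a Viterbi search.

```python
import math
from collections import Counter


def train_bigrams(corpus):
    """Count single characters and adjacent character pairs in sample text."""
    unigrams = Counter()
    bigrams = Counter()
    for line in corpus:
        unigrams.update(line)
        bigrams.update(zip(line, line[1:]))
    return unigrams, bigrams


def bigram_logprob(prev, cur, unigrams, bigrams, vocab_size):
    """log P(cur | prev) with add-one smoothing so unseen pairs are not impossible."""
    return math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))


def rerank(candidates, unigrams, bigrams, lm_weight=1.0):
    """Pick the most likely string via Viterbi search.

    candidates: one list per character position, each holding
    (character, ocr_confidence) pairs with confidence in (0, 1].
    This input format is hypothetical.
    """
    if not candidates:
        return ""
    vocab_size = max(len(unigrams), 1)
    # best[ch] = (score of the best path ending in ch, that path)
    best = {ch: (math.log(p), ch) for ch, p in candidates[0]}
    for position in candidates[1:]:
        new_best = {}
        for ch, p in position:
            score, path = max(
                (
                    prev_score
                    + lm_weight * bigram_logprob(prev, ch, unigrams, bigrams, vocab_size),
                    prev_path,
                )
                for prev, (prev_score, prev_path) in best.items()
            )
            new_best[ch] = (score + math.log(p), path + ch)
        best = new_best
    return max(best.values())[1]


# Tiny made-up corpus; a real one would need far more text.
corpus = ["我喜欢看中国电影", "中国电影很好看", "我看电影"]
unigrams, bigrams = train_bigrams(corpus)

# Made-up OCR output: the second character is misread with higher confidence.
ocr_candidates = [
    [("电", 0.9)],
    [("彭", 0.55), ("影", 0.45)],
]
print(rerank(ocr_candidates, unigrams, bigrams))
```

Taking the top OCR candidate at each position would give 电彭 here, while the pair statistics pull the result back to 电影. With real subtitles you would want a much larger corpus and probably some tuning of lm_weight, so treat this as a starting point only.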