by column 4114 days ago
How I did it:

0. Download and install Tesseract https://code.google.com/p/tesseract-ocr/

1. Crop the image and adjust its curves in Paint.Net (to get better contrast), which makes OCR easier

2. Save as PNG in the folder of Tesseract

3. Open cmd in the Tesseract folder and type: tesseract kayak.png kayak (this produces kayak.txt)

4. Open kayak.txt in Notepad++, remove the empty lines (so that line numbers match the grid rows), then CTRL+F "KAYAK"
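
Step 4 can also be scripted instead of done by hand. Here is a minimal sketch, assuming the OCR output is already loaded as a string; the function name find_in_rows and the sample text are made up for illustration and are not part of the original workflow:

```python
def find_in_rows(text, word):
    """Return 1-indexed (row, column) pairs where `word` starts, reading left to right."""
    # Drop empty lines so row numbers match the rows of the puzzle grid.
    rows = [line.strip() for line in text.splitlines() if line.strip()]
    hits = []
    for r, row in enumerate(rows, start=1):
        start = row.find(word)
        while start != -1:
            hits.append((r, start + 1))
            # Continue one character later so overlapping matches are found too.
            start = row.find(word, start + 1)
    return hits


# Hypothetical sample with blank lines between rows, as in raw OCR output.
sample = "AKYA\n\nKAYAK\n\nYYKA\n"
print(find_in_rows(sample, "KAYAK"))
```

This only covers the left-to-right case, exactly like CTRL+F does.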

3 comments

Doesn't this only work if the word is west-to-east? Seems like it'd fail for any other orientation (there are 7 others).

(Well, for a palindrome, it works for east-to-west too - so I guess you missed 6).

You are correct. However, it is stated that 'KAYAK' appears only once and it seems to me that CTRL+Fing should be the first thing to do when searching for a string in a relatively small text. The next step, had the CTRL+F failed, would have been to search for "KAYAK" vertically/diagonally programmatically.

(On a side note, your counting is incorrect: since it is a palindrome, there really are only 4 directions in which the word can appear: W-E, N-S, NW-SE, NE-SW.)
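
For what it's worth, the programmatic search could look roughly like this. It is only a sketch, assuming a rectangular grid given as a list of equal-length strings and a non-empty word; find_word and DIRECTIONS are names I made up. It scans the four directions listed above and also accepts the reversed word, so non-palindromes are covered in all 8 orientations:

```python
# Directions as (row step, column step).
DIRECTIONS = {"W-E": (0, 1), "N-S": (1, 0), "NW-SE": (1, 1), "NE-SW": (1, -1)}


def find_word(grid, word):
    """Return 0-indexed (row, col, direction) for every occurrence of `word`.

    Assumes `grid` is rectangular (all rows the same length) and `word` is non-empty.
    For a reversed match, (row, col) is the cell holding the LAST letter of `word`.
    """
    targets = {word, word[::-1]}  # identical for a palindrome such as KAYAK
    height, width = len(grid), len(grid[0])
    hits = []
    for r in range(height):
        for c in range(width):
            for name, (dr, dc) in DIRECTIONS.items():
                end_r = r + dr * (len(word) - 1)
                end_c = c + dc * (len(word) - 1)
                # Skip scans that would run off the edge of the grid.
                if not (0 <= end_r < height and 0 <= end_c < width):
                    continue
                candidate = "".join(
                    grid[r + i * dr][c + i * dc] for i in range(len(word))
                )
                if candidate in targets:
                    hits.append((r, c, name))
    return hits


# Hypothetical 5x5 grid with KAYAK on the main diagonal.
demo = ["K....", ".A...", "..Y..", "...A.", "....K"]
print(find_word(demo, "KAYAK"))
```

If the OCR drops or adds a character in some row, the grid stops being rectangular and the diagonals shift, so the row lengths are worth checking first.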

For vertical, you could transpose the crossword and then Ctrl+F, keeping in mind that the result has its rows and columns switched.
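
A small sketch of that transpose trick, assuming every row has the same length (zip silently truncates to the shortest row otherwise); the helper name transpose and the tiny grid are just for illustration:

```python
def transpose(rows):
    """Swap rows and columns of a grid given as a list of equal-length strings."""
    return ["".join(column) for column in zip(*rows)]


# Hypothetical 3x3 grid whose last column reads KAY from top to bottom.
grid = ["ABK", "CDA", "EFY"]
columns = transpose(grid)
print(columns)
```

A match on line i at offset j of the transposed text corresponds to row j, column i of the original grid.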
This is the output of Tesseract (once empty lines are removed):

    AAYKAKAYKAYYKKAAKAYAAAYKYKYAYK
    KYYAKYAAAYKAAKKYAKKKAAAKAAYKAA
    KKKYAKAKAYKKYKKYKKAKYAAAAKAAKK
    YKAAKYAAKKKKKAYAYYYYYAAKAAKKKK
    KKAYYAAKKYYAYAAKAAKKAAYKAKKAAK
    KKAAAKAAAKKAKKKYAAAYKYAYAKYKAA
    KKYKAAAAAAAAAAKKAAAKYYAKAKAKKY
    KKKKKKAKYYKYYKKAAYAKAAAKYAKKAA
    KAKKAKAYKAAYAKYYKKAYKKKAAAAAKA
    KKAKAKAKAAAKAYAYKAKAYYAYKKAKYY
    AAAYAKKKAKKAKYKAYKKAKKYKAAKKYY
    YKKKYKKYAAAKKKKAKAYKKKKKAAKKKY
    KKAKAYKAAKAKKYYAKAYAAKYAAAAAAA
    AAAAKKYAAKKAAYKKAAAYAYAKYAKAYK
    AKKAKKAYKAAYKYKAKYAAKKYKKKKKKA
    KYKAAKKYAYKAKKKYKAKAYAYKYAAAKK
    KAYYKKAKYKAYAAKAYAAYYYAKKYKYKK
    KKKKAKAYAAAKKAYAYKKYAKAAAAYKAY
    AYKAKKKAAKAKAYAAAKKYKKAYKAYAYY
    KYAYKAYAKAKYYKYKAKAAYKYKKAYKAK
    AKAKKAKKAKKKKAAAKYYYYKAKAAAAAK
    KAKYYKAKKYKYAKYAAAAAKAKAYAAKKA
    KKAAAKKAKAKKAAKAYAAYAAAKKAAYYA
    KAKKYYKKAYAYKKAKKYYKYKYKKAAAKA
How do you do OCR in Paint.Net?
In Paint.Net I only adjusted the curves (CTRL+SHIFT+M) to get a better contrast. The OCR is done by Tesseract. This solution is all about using the right tool for the job.
Sorry, I misunderstood your comment. I was just wondering how you were doing OCR in Paint.Net. Thank you for the clarification.