Hacker News
by aragonite 51 days ago
> But I would love an option (emphasis on option) to see the text side by side with the page images. ... That way, I could "confirm" or "fact check" the faithfulness of the OCR.

You can already do that on Wikisource. For example, here's p. 658 from the entry on "Molecule":

https://en.wikisource.org/wiki/Page:EB1911_-_Volume_18.djvu/...

Also OP: I noticed some fidelity issues in your version (at https://britannica11.org/article/18-0684-s2/molecule). For example, parts of the math formula under the line that ends with "the molecules of other kinds" ([1]) are missing (compare [2]). Also, in your version fn. 1 of this article is attached to "as they have always done" ([3]), but it should actually be attached to "Atom" on p. 654 ([4]):

[1] https://britannica11.org/article/18-0684-s2/molecule#:~:text...

[2] https://en.wikisource.org/wiki/Page:EB1911_-_Volume_18.djvu/...

[3] https://britannica11.org/article/18-0684-s2/molecule#:~:text...

[4] https://en.wikisource.org/wiki/Page:EB1911_-_Volume_18.djvu/...
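For anyone wondering how the `#:~:text` links in [1] and [3] work: they are URL text fragments, which ask the browser to scroll to and highlight a quoted passage. Here is a minimal Python sketch of building one. The helper name `text_fragment_url` is my own, and it does not reconstruct the truncated links above. It only shows the general `#:~:text=start[,end]` shape and assumes the page URL has no fragment of its own.

```python
from typing import Optional
from urllib.parse import quote


def text_fragment_url(page_url: str, start: str, end: Optional[str] = None) -> str:
    """Build a link with a text fragment directive (#:~:text=start[,end]).

    Hypothetical helper for illustration; assumes page_url has no '#' part.
    """
    def encode(text: str) -> str:
        # Percent-encode everything (including ',' and '&'), then also
        # escape '-', which the text-fragment syntax reserves for
        # prefix/suffix markers and quote() leaves untouched.
        return quote(text, safe="").replace("-", "%2D")

    directive = encode(start)
    if end is not None:
        # An optional end term makes the match a range: start ... end.
        directive += "," + encode(end)
    return f"{page_url}#:~:text={directive}"


print(text_fragment_url(
    "https://britannica11.org/article/18-0684-s2/molecule",
    "the molecules of other kinds",
))
# prints https://britannica11.org/article/18-0684-s2/molecule#:~:text=the%20molecules%20of%20other%20kinds
```

Browsers that don't support text fragments simply ignore the directive and load the page normally, so such links degrade gracefully.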

3 comments

That's cool about the WikiSource parallel text+image page view, TIL. Thanks!

As an example flow (since it took a minute to figure out): start at https://en.wikisource.org/wiki/1911_Encyclopædia_Britannica, then browse volume > section > topic to get to a text page, click the Source tab, and click a page number (you may need to hunt around for the correct one) to see the parallel view, text + image. Previous and next page buttons are available, and they retain the parallel text + image view.

Following up, another WikiSource flow is the following:

1. Go to https://en.wikisource.org/wiki/1911_Encyclop%C3%A6dia_Britan...

2. Click button "Search the 1911 Encyclopædia Britannica". This currently goes to the page at https://en.wikisource.org/wiki/Special:Search?search=&prefix...

3. Enter the search term and click Search. (There is auto-suggest for some topics, but the Search button seems to give more complete results.)

4. Get to the text page of interest, such as https://en.wikisource.org/wiki/1911_Encyclop%C3%A6dia_Britan...

5. Notice the left margin contains hyperlinks like [105], where 105 is the page number, and each links directly to the side-by-side view of that page. Click the [105] link on the left (for example) to get to https://en.wikisource.org/wiki/Page%3AEB1911_-_Volume_02.djv... which shows the text and image side by side (for that page).

This flow avoids the hunting-for-the-right-page step by using the direct links.

There's also a side-by-side option now. On any article, clicking the little scan button above the double-navigation arrows in the right margin will open the scan at whatever page you're viewing, and it will scroll as you scroll the text.

Thanks for the suggestion.

These were both pipeline errors and they have just been corrected, thanks to your sharp eyes.