by UltraSane 62 days ago
Which is silly, because you can easily use screenshots and OCR to create DRM-free versions of Kindle books.

Not to mention it’s as easy to download books from Anna’s Archive as it is to buy them from Amazon. It’s weird going through so much effort to lock down books people already paid for.

I wonder how much this is about making it difficult for people to migrate to another platform. I recently switched to Kobo and the reader is far superior to Kindle. I had a hell of a time moving my library though.

I suspect at least some of this comes from publisher pressure. An acquaintance works for one of the big global book publishers and his general sense from upper management is that they still hate having to sell digital books.

It feels like the last major media industry that is holding out against a "future" that has been here for a long time already.

It's all from external pressure. Amazon spending energy on ebook DRM is a negative ROI activity for them.

A vanishingly small % of would-be ebook buyers even know pirated ones exist, and an even smaller one knows how to get those onto their Kindle.

My wife buys dozens of ebooks per year on Amazon, and her friends do too. I'm guessing that if I polled that group, none of them would even know where to start, nor care to.

"Piracy is almost always a service problem" is also true. I see a lot of people who were raised on pirated .mp3 and .epub files move to streaming platforms just because it's a bit more convenient.
Yes totally agreed. I pay for streaming music and for Youtube because the costs make sense to me for what I get.

I used to pay for Netflix but now that there's so many different streaming services I have returned to the high seas because we just don't watch enough shows (maybe 3-5 shows a year?), yet they are spread across different services that all cost $20/month now, so the costs don't make sense for us.

For books, honestly, I refuse to accept that an EPUB costs $25 when the hardcover version costs $30. I have also heard first-hand how little of that $25 goes to the author (for the average author, not for a famous one).

I do try to buy digital books directly from authors when I can, which is increasingly an option with up-and-coming writers, but otherwise, yarrrr...

This applies to newspapers too: if you compare the print version of a newspaper to the online version, you notice that a lot more attention is paid to the paper version, whereas the online version has all kinds of aggressive banners and ads.

I think it's a generational thing: for a lot of publishers the internet is still this newfangled thing.

It gets even weirder in the Netherlands, where the book industry has created a cartel. They have a minimum price that you cannot go under.

Of course what happened is that lots of people just started to import English paperbacks bypassing all the local laws. The price difference was just insane.

Dutch people in general do not have the overinflated view of their own language that you find in France.

It is really easy to buy a book, cut the spine off, and feed the pages into a sheet-fed scanner.
This reminds me of college, where I used to take my textbooks to the local copy shop to get the pages sliced out and three-hole punched so I only had to carry around currently relevant chapters rather than 30 lbs of books.

As for e-books, long story short, my low-tech chop-and-punch method tended to be cheaper and/or more convenient than the available legal e-book options at the time.

I considered scanning, and even had access to a sheet-fed duplex scanner, but given that the only mobile device I had at the time, a 17" PowerBook G4, was both awkward as an e-book reader and heavier than the unbound printed pages I was carrying around, it wasn't worth the hassle.

I actually bought a special flatbed book scanner where the glass was flush with one side, scanned every page of a book, and then returned the book. Scanning was tedious but not too bad while watching a good show or movie, and getting my money back felt so good. Adobe Acrobat Pro can convert 800MB of scanned pages into a 70MB PDF with searchable and copyable text.
It's to stop people from seeding new books to shadow libraries. It's not as easy to find new books on AA as on Amazon.
Given how quickly full-quality releases of movies and TV shows appear after they're first streamed, this is surprising to me, at least so long as the PC and/or Android Kindle apps continue to exist.
That's the title of the post.
OCR'd ebooks are universally trash. For one, all formatting is gone. Anything in the book other than ASCII characters will vanish. You lose links within the book and all other advanced features.

And OCR is generally just not accurate enough and still makes very visible mistakes throughout the text.

Have you read many OCR'd ebooks? I have, and every single one was massively inferior. Most I would consider barely readable.

For books where you want to keep the formatting, the best option is to use Adobe Acrobat Pro and its Editable Text and Images feature. This replaces the scanned letters with a custom TrueType font. I used this in college to scan textbooks and it worked really well. Modern OCR on books is incredibly accurate.

see https://www.youtube.com/watch?v=bhJ9zqY8Da0

An open-source, free version of this is Stirling PDF https://github.com/Stirling-Tools/Stirling-PDF where you can do very accurate OCR while keeping the formatting.
I love it when formatting is removed. Some ebooks, especially EPUBs, don't work well with alternative fonts for some reason.
What OCR do you guys use? I have only seen OCR that makes a lot of errors. Having it be usable requires tons of manual review. I probably wouldn't trust an LLM to do that review because it may introduce its own errors.

Edit: downvoters, would you like to answer my question? I would genuinely like to know. I thought based on the confidence of the comment above there must be a super accurate OCR I've never heard of, but after seeing the sibling comment I'm going to guess there isn't.

Stirling PDF https://github.com/Stirling-Tools/Stirling-PDF is a free self-hosted PDF tool that can do very accurate OCR while keeping the formatting.
Modern OCR is VERY accurate. Heck, Adobe Acrobat Pro OCR was essentially perfect 20 years ago.
One of my hobbies is typesetting modern editions of a certain type of rare, obscure old books that were poorly typeset to begin with. Modern OCR—and I’ve tried plenty of tools—is still rather error prone in my application.
Can you name a good open source one? I have spent many hours in the current decade correcting OCR errors. Mostly Tesseract.
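
One way to put a number on the "accurate" vs. "error prone" disagreement in this thread is to hand-correct a page or two and compute the character error rate (CER) of the raw OCR output against it. The snippet below is only a rough sketch of that idea, not something anyone above said they used; the function names are made up for illustration, and it assumes you already have both texts as plain strings.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions turning one string into the other."""
    # previous[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # drop a reference character
                current[j - 1] + 1,      # add a hypothesis character
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the hand-corrected reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: two misread letters in a 19-character line.
corrected = "The quick brown fox"
ocr_output = "Tbe quick hrown fox"
print(f"{character_error_rate(corrected, ocr_output):.1%}")  # prints 10.5%
```

A CER of even 1% still means a wrong character every line or two, which may be why the same tool can feel "essentially perfect" on a clean modern textbook and unusable on a poorly typeset old book.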