by enraged_camel 3713 days ago
Wow! That's some incredible gross incompetence on the part of the people at Amazon who designed this system, and those who QA'd it.

How would you have fixed it?

Assuming you implement the obvious thing -- tracking each page of a book the user has opened --

(a) How could you reliably track this if the user always keeps their Kindle on airplane mode?

(b) How could you track this accurately if the user reads a few hundred pages in the subway, where there's no Internet service?

(c) How could you distinguish that between someone who hopped around a book in that same time period?

(d) If Kindles don't already track page views this way, how do you update the software on all Kindles to start tracking this way? When do you switch your billing script to track purchases like that?

(e) If you're QAing a Kindle and you spot this loophole, how do you do all these fixes? How long are you willing to keep the software from shipping? How certain are you that your theoretical solution is better than what's already shipped?

Product development is hard, and it makes me angry when people handwave it as "gross incompetence" from a position of ignorance.

Uh, store a log file of every user action in that book, and send those log files to the mothership periodically, as internet is available? It does not have to be same day, just eventually.

Analyzing log files for duration/pages visited is probably easier than the equivalent for web server logs, and there are very many services that will analyze those for you.
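
For illustration, the store-and-forward idea above might look roughly like this in Python. This is only a sketch: `ReadingLog`, `record`, `sync`, and the `upload` callback are made-up names, and nothing here reflects how Kindle firmware actually works.

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class ReadingLog:
    """Hypothetical on-device log; events wait here until a sync succeeds."""
    pending: list = field(default_factory=list)

    def record(self, book_id, location, timestamp=None):
        # Append one reading event; works with no connectivity at all.
        ts = time.time() if timestamp is None else timestamp
        self.pending.append({"book": book_id, "loc": location, "ts": ts})

    def sync(self, online, upload):
        # Upload everything queued so far, but only when a connection exists.
        if not online or not self.pending:
            return 0
        upload(json.dumps(self.pending))  # if this raises, events stay queued
        sent = len(self.pending)
        self.pending.clear()
        return sent
```

Because the queue is only cleared after `upload` returns, a failed sync in the subway just means the same events go out on the next attempt.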

Yeah, I'm not getting the "This is a very difficult problem to solve" take on this.

The books currently track by location point and you could log on blocks (e.g. every 100 location points progressed).

Amazon's books are already DRM'd to hell, meaning the kindle has to use the unlimited books through the marketplace. Then it's just a matter of reporting user stats, which can be covered in the Unlimited TOS.

You're still trusting the client. Any system that trusts the client is flawed. Time to read per page varies, and if you read the original article it says the scammers are mitigating chances of alerts by clicking through a book over a three day period. When you're making $60,000 a month from this scam, you're going to find people somewhere in the world who will click through very cheaply.
You would still need to force the users through each page.

Fast-reading bad actor accounts can be flagged as abusers through pattern recognition. Since a subscription is necessary, creating numerous accounts to game the system becomes expensive fast.

This is kind of rough for technical or reference-ish books. Perfectly OK for fiction... well, except for anthologies and collections (HAL 9000 says I'm sorry Dave, I can't let you skip past the other Arthur C Clarke stories in this anthology, you'll have to read all the stories in order, at an acceptably slow speed)

I subscribe to F+SF and it would annoy me if I were technically prevented from skipping the end of stories that don't resonate with me.

Sorry, I meant the abusers would need to force their fraudulent user accounts through each page.

Normal people who don't find a book engaging can still skip the end, just that they'll (justifiably) be worth less to the author.

They don't have to be fast-reading. They can just create a fake log, and send it to the mothership after an appropriate period of time.

Still, I will agree that that will make scammers' lives harder.

Since only paid accounts are used in revenue attribution, faking logs for your own accounts would never work.
Amazon writes the client software (and ships hardware). If the clients communicate securely with the servers, Amazon should be able to trust them.

(I'm excluding Kindle Web Viewer, of course. Perhaps it should not have access to Kindle Unlimited.)

Fucking horrific level of data collection.
Yes, it is, but unfortunately I'm not sure how to get around it in a system where you aren't actually buying the goods, but borrowing them and then they are required to know how much of it you used. Thankfully you can still buy books outright if you don't want to be tracked (sort of. All KU books are Amazon exclusive, so Amazon will at least track that you bought it).

That said, Amazon is already syncing your location, and any annotations you've made[1] so they persist across all kindle devices, so there's already a bunch of tracking in place. Given that there's already some tracking, I wouldn't be too opposed to a per-page bit for whether it was read, triggered when the page has been lingered on for five or more seconds (scaled down to 1 second for partial pages, such as ends of chapters).
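
A rough Python sketch of that linger rule follows. The 5 second and 1 second figures come from the comment above; scaling linearly by how full the page is, and the names `required_dwell` and `mark_read`, are assumptions made only for the example.

```python
FULL_PAGE_SECONDS = 5.0  # from the comment: five seconds for a full page
MIN_SECONDS = 1.0        # from the comment: floor for partial pages


def required_dwell(fill_fraction):
    # Assumed rule: scale linearly with how full the page is, never below 1 s.
    fill = min(max(fill_fraction, 0.0), 1.0)
    return max(MIN_SECONDS, FULL_PAGE_SECONDS * fill)


def mark_read(read_pages, page, dwell_seconds, fill_fraction=1.0):
    # Set the "read" flag only if the page was lingered on long enough.
    if dwell_seconds >= required_dwell(fill_fraction):
        read_pages.add(page)
    return read_pages
```

With these numbers, a half-full chapter ending needs 2.5 seconds and a nearly empty one needs 1 second.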

1: Anyone remember the big episode years back over Amazon realizing they didn't have the license to a book, then removing it from all Kindle devices automatically, including the annotations made? In what is possibly the most ironic situation I can imagine, the book was 1984.

Speaking of time, why not bill by minutes spent reading instead of pages turned?
I believe this would create a bad incentive structure. You'd penalize the author for people getting hooked on the book (and therefore getting into the 'flow' and reading faster), and encourage scammers to just linger on pages (probably making the scam even easier).

Plus, lots of significant ambiguities to solve: user is reading a page, gets up to do something else, forgets Kindle open. How many minutes do you bill? This might be solvable with the proper signals and rules, but I believe this is far from trivial.

Way more in-depth levels of data collection happen on literally every single page of the web, for reference.

If you're comfortable browsing the internet, this level of reporting on a Kindle seems almost quaint by comparison.

> If you're comfortable browsing the internet, this level of reporting on a Kindle seems almost quaint by comparison.

Yeah, I'm not okay with that other tracking either. In addition, I am paying for my Kindle and my Kindle books or KU subscription. It used to be only free services tracked you, but I guess that limitation is coming to an end.

This. It doesn't even really need to be a log. A bitset with each bit representing a page and a `1` representing "this page read" would do the trick. On a massive 8000 page tome, that's only 1 KB (8000 bits = 1000 bytes).

If Amazon doesn't need the exact pages read, POPCNT the total and send that.
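
As a minimal sketch of the bitset idea in Python (the class name `PageBitset` and its methods are invented for illustration; counting set bits in software stands in for a hardware POPCNT instruction):

```python
class PageBitset:
    """One bit per page; a set bit means the page has been read."""

    def __init__(self, page_count):
        self.page_count = page_count
        self.bits = bytearray((page_count + 7) // 8)  # 8000 pages -> 1000 bytes

    def mark(self, page):
        if not 0 <= page < self.page_count:
            raise IndexError("page out of range")
        self.bits[page >> 3] |= 1 << (page & 7)

    def is_read(self, page):
        return bool(self.bits[page >> 3] & (1 << (page & 7)))

    def count(self):
        # Software popcount: total number of pages marked as read.
        return sum(bin(byte).count("1") for byte in self.bits)
```

Marking the same page twice leaves the count unchanged, so flipping back and forth does not inflate the total.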

...that wouldn't change anything. They'd just change the report file to sync straight 1s... no, you still need obfuscation and encryption, bloating it to at least 100kb.

but that's still pretty minor

I don't think you gain much by forging this number on a single device and you wouldn't be able to manipulate this on ALL devices.

The reason the scam worked is that it encourages all readers to jump to the end of the book (via a link on the first page). I don't think there would be an equivalent way to force people to page through and pause on each page.

That may not make the scam totally impractical to all but the most dedicated hackers, but it does increase the scam costs substantially. Maybe enough to remove the low-hanging-fruit from the scammers and have them target elsewhere.

So, I don't think "that wouldn't change anything".

And you don't think that those logs can be faked? It might stop the casual, "hey fans, read this 'book' to support me" but it won't stop the real scammers or people who would buy reads for revenue and ratings.
Kindles are pretty locked down...it's not that difficult to have the kindle sign the data it sends (probably does that already). Being scammed by hacked kindles is one thing, but they're not even trying here...
You could easily sign the log with the same certificate that is providing the DRM on the book itself. Or a different certificate. Encrypting things is not new, nor hard.
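
To make the signing idea concrete, here is a small Python sketch. It uses an HMAC with a per-device secret from the standard library rather than a certificate, purely as a simpler stand-in; `sign_report`, `verify_report`, and the report fields are hypothetical.

```python
import hashlib
import hmac
import json


def sign_report(report, device_key):
    # Serialize deterministically so the server recomputes the same bytes.
    payload = json.dumps(report, sort_keys=True, separators=(",", ":")).encode("utf-8")
    tag = hmac.new(device_key, payload, hashlib.sha256).hexdigest()
    return payload, tag


def verify_report(payload, tag, device_key):
    # Server side: recompute the tag and compare in constant time.
    expected = hmac.new(device_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)
```

This only detects tampering in transit or by someone without the key. If the key can be pulled off a hacked device, a forged report verifies just as well, which is the "trusting the client" objection raised elsewhere in the thread.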
Doesn't work.

You would need to fake the logs for paid accounts, and since rev sharing is a formula of all paid subscriptions, you'd be hard pressed to make positive returns.

Not that any of those are difficult -- and all of the problems you list around connectivity are, uh... it's not like tracking it poorly resolves that problem, they're still finding some way to eventually sync the data up now, it's just lower-quality data.

But... even if those ARE difficult problems, shouldn't you try to solve them BEFORE you launch a business model where you promise people (ie, your authors) that you can do these things? Hell, especially if they're difficult problems, you should fix them before telling people you've solved them.

>(a) How could you reliably track this if the user always keeps their Kindle on airplane mode?

I wouldn't let users always in airplane mode participate in the program. Actually, I'm pretty sure they already can't, as they need to connect to get books from KU.

>(b) How could you track this accurately if the user reads a few hundred pages in the subway, where there's no Internet service?

By using this magic thing called computer storage, and syncing later...

>(c) How could you distinguish that between someone who hopped around a book in that same time period?

By observing how much time they spend in each page (with some allowances for different reading speeds, skipping, speed reading etc) and making sure they've legitimately read a good portion of the book.

Even if they haven't actually read it, but only mimicked the above, this constraint just made the fraudsters' process much much slower to complete.

>(d) If Kindles don't already track page views this way, how do you update the software on all Kindles to start tracking this way?

You simply require users to update their software to continue participating in KU, and give them a deadline.

Users need to connect to browse/get new books anyway.

>When do you switch your billing script to track purchases like that?

After the deadline, only people with updated KU software will be there, so no problem, you just switch it.

In between, you could always switch it on an account basis (like you already have KU and non-KU account and other tiers) -- those who already updated get the new behavior, etc.

>(e) If you're QAing a Kindle and you spot this loophole, how do you do all these fixes? How long are you willing to keep the software from shipping? How certain are you that your theoretical solution is better than what's already shipped?

It's not like these things are rocket science. Companies do such QA and hold back BS products for a few months all the time. Even companies losing billions from doing so, like Apple. For Amazon, which barely breaks even and has lots of offerings that are loss leaders, that's even easier.

>Product development is hard, and it makes me angry when people handwave it as "gross incompetence" from a position of ignorance.

Hard or not, there are always lots of cases of actual, bona fide, certified, 100% legit, "gross incompetence" too...

Right. There is no easy, tradeoff-free way to automate the tracking and proportional payment process.

Which means that Amazon really does need to move to a more Apple-like human curation process for all new authors, and/or for all new titles. Doing so will immediately tank precious vanity metrics like # of titles added to the store each month. But the alternative is an ever-growing jungle of weeds crowding out the legitimate works. The more that happens, the harder it will be to eventually weed the garden.

I don't see why it needs to be always online. Just locally record the number of pages that were in view for X amount of time.
I'm also assuming that they had to take a bit of a lowest common denominator approach to m2m communication given that they have cell-based (read - costs amazon money) and wifi (does not cost amazon money) enabled versions of the device. If they tracked every page read and sent a log periodically, that _could_ get expensive quickly on the part of the cell-based versions depending on what network agreement they have (numerex, for example, still charges by the kb for this type of low byte traffic). Given that the rules needed to be the same for both types of devices, you couldn't necessarily have an if(wifi){ //send log} else { //send last page syncd} code branch. This is just a giant guess given that I know nothing of amazon's partner network agreements.
I have foobar2000 set up to track my plays for each song. It has an adjustable slider that I have set for 35%. Once 35% of the song has been played it increments the play count. It doesn't even need internet to do this! This stuff isn't that hard.
Spotify does it similarly: 30 seconds == a stream.
> (a) How could you reliably track this if the user always keeps their Kindle on airplane mode?

Spotify's solution is to require a device to sync at least once every 30 days or the offline content expires.

> (b) How could you track this accurately if the user reads a few hundred pages in the subway, where there's no Internet service?

With a log of events that syncs when they do reconnect.

> (c) How could you distinguish that between someone who hopped around a book in that same time period?

By measuring the duration spent on each page.

Are you serious? All of this is fixable in software, in a way that would just penalize revenue from (too) fast readers.
We run a small microsite service for designers and once enabled single page view tracking metrics - we had at the time very few customers and yet managed to smash through our 50k keen.io event allowance in a single day. Can't imagine it on something where books have hundreds of pages and users number in the millions.
You can do the aggregation locally on the device. You wouldn't want to send every page view as an immediate event, just send the aggregates every 15 minutes or at the start or end of each session.
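
A sketch of what on-device aggregation could look like in Python, assuming page views arrive as `(book_id, page, dwell_seconds)` tuples. That event shape and the function name `aggregate_page_views` are assumptions for the example.

```python
from collections import defaultdict


def aggregate_page_views(events):
    """Collapse raw page-view events into one small summary per book."""
    pages = defaultdict(set)
    seconds = defaultdict(float)
    for book_id, page, dwell_seconds in events:
        pages[book_id].add(page)          # distinct pages, so re-reads don't double count
        seconds[book_id] += dwell_seconds
    return {
        book_id: {"distinct_pages": len(pages[book_id]), "seconds": seconds[book_id]}
        for book_id in pages
    }
```

However many raw events a session produces, the upload is one record per book, so the event count seen by the server scales with sessions rather than page turns.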
That seems like an exceptionally low allowance for any kind of page view tracking. Given that they have the technology in place server-side, and it's not an incredibly hard problem, the server-side costs to Amazon of doing this would be tiny. No comment on the cost of designing a decent algorithm and keeping ahead in the cat and mouse games.
eh it's the cost of using a prepackaged solution. I'd move to an internal one but up until now developing features for the app had more precedence than developing a state of the art event tracking solution.
Yeah, the prices aren't that insane overall tbh. The small size of the free tier surprised me, but my impression is that free tiers on services have been shrinking since last time I was in that size of company
You don't think amazon can handle a few GB of log files every day for the entire kindle unlimited service? Really? Worst case you do some sampling.
I'm sure they can; they bill themselves at a different price
It's not even about the billing. Companies 1/1000 the size of Amazon can afford it.
you mean a company with 10 million a year in revenue? I wish we were there.
It needn't be a tracking event per page.

They're already firing {"current_page":3000}. They just need to start doing something like {"current_page":3000,"pages_seen":5}.
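
A small Python sketch of that extended payload. The two JSON field names are taken from the comment above; the `SyncState` class and its methods are hypothetical, and the real sync format is not known here.

```python
import json


class SyncState:
    """Tracks the current position plus distinct pages shown since last sync."""

    def __init__(self):
        self.current_page = 0
        self.seen_since_sync = set()

    def show_page(self, page):
        self.current_page = page
        self.seen_since_sync.add(page)

    def build_payload(self):
        # Same single sync message as before, with one extra counter.
        payload = json.dumps(
            {"current_page": self.current_page, "pages_seen": len(self.seen_since_sync)},
            separators=(",", ":"),
        )
        self.seen_since_sync.clear()
        return payload
```

Under this scheme a reader who opens page 1 and follows a link straight to page 3000 would report `pages_seen` of 2, not 3000.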

All of these issues were solved over a decade ago by sales contact software.
Kindles could store, locally, the pages that are displayed, and then upload to the server when they have internet connection
That's the difference between people designing a system against a malicious adversary and a user.

Nowadays, if it involves money, for godsakes please assume the former.

the system is still bad even without a malicious party, as said in the article, if the user goes back to page 1 after reading the book the system will count the book as unread and the author will not get paid.
Pretty sure that's not true. They sync "furthest page read" not "last page read".
I'm surprised "start read" times and "stopped read time" and "actual pages read" aren't taken into account in the whole process. If an entire 1000+ page book is read within less than 5 minutes, something is wrong. I don't care how fast of a reader you are, my classic Kindle wouldn't be able to process that many pages in under 5 minutes anyway.
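
That sanity check is easy to sketch in Python. The cap of 5 pages per minute is an arbitrary placeholder, not a figure from the article, and `is_plausible_session` is a made-up name.

```python
def is_plausible_session(pages_read, start_ts, stop_ts, max_pages_per_minute=5.0):
    """Return False when the claimed pages could not have been read in the time given."""
    if stop_ts <= start_ts:
        # No elapsed time: only an empty session is believable.
        return pages_read == 0
    minutes = (stop_ts - start_ts) / 60.0
    return pages_read / minutes <= max_pages_per_minute
```

A 1000 page book "read" in 5 minutes works out to 200 pages per minute and fails the check, while 20 pages in 10 minutes passes.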
Is "incredible gross incompetence" good or bad? I have problems judging people, so need help.
incompetence is bad. gross incompetence is bad on a widespread and surprising scale. incredible gross incompetence is shockingly bad on a widespread and surprising scale.

"Incredible" doesn't mean good, by definition - it means unbelievable.

Why is incompetence bad? I have no ego and don't have a way of judging things like this. I think I am unusual in not having an ego. I can't assert that anything is "bad" or "good", including incompetence. I understand "incredible" here. I also understand "gross", although "surprising" is a bit difficult. I have no ego, so no reference of what is surprising. For those with egos, surprising is usually in reference to the ego.

For example, a tebibyte of RAM is surprising to an OSX programmer. A tebibyte of RAM is not surprising to someone working on seismic simulations for oil extraction.

I do see your point. But I think you are being too pedantic... By saying it's incompetence and "bad", the poster probably meant that the people at Amazon aren't doing a great job (in achieving whatever goals they want to achieve or we customers want them to achieve, in this case keeping the scammers out). Maybe the poster is just frustrated with Amazon not doing its job. It doesn't necessarily have anything to do with egos in the sense that the people at Amazon are pathetic idiots that we should scoff at or that we can indulge in a sense of superiority. It's simply about solving a problem.

Also, I don't think you were fully correct in saying that surprising is usually in reference to the ego. I think it's simply in reference to what you've seen before, what you are used to seeing, what you expect to see or would like to see. It's not necessarily ego-related. But yeah, having an open mind and not being limited by what you've seen is important. But I digress.

Please remove your ego and try again.