Hacker News
by evandwight 1374 days ago
> 0.4% or 163 books sold 100,000 copies or more

> 0.7% or 320 books sold between 50,000-99,999 copies

> 2.2% or 1,015 books sold between 20,000-49,999 copies

> 3.4% or 1,572 books sold between 10,000-19,999 copies

> 5.5% or 2,518 books sold between 5,000-9,999 copies

> 21.6% or 9,863 books sold between 1,000-4,999 copies

> 51.4% or 23,419 sold between 12-999 copies

> 14.7% or 6,701 books sold under 12 copies

- Kristen McLean from NPD BookScan
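
As a quick sanity check on the quoted figures, here is a minimal Python sketch that sums the eight title counts and recomputes each share. The counts and percentages are copied from the quote above; the bucket labels and variable names are my own.

```python
# (label, titles, stated share in %) copied from the quoted breakdown.
# Labels and variable names are illustrative, not from the source.
buckets = [
    ("100,000+", 163, 0.4),
    ("50,000-99,999", 320, 0.7),
    ("20,000-49,999", 1015, 2.2),
    ("10,000-19,999", 1572, 3.4),
    ("5,000-9,999", 2518, 5.5),
    ("1,000-4,999", 9863, 21.6),
    ("12-999", 23419, 51.4),
    ("under 12", 6701, 14.7),
]

# Total number of titles covered by the breakdown.
total = sum(count for _, count, _ in buckets)

# Recompute each share from the raw counts, rounded to one decimal place.
recomputed = [
    (label, count, round(100 * count / total, 1), stated)
    for label, count, stated in buckets
]

for label, count, share, stated in recomputed:
    print(f"{label:>14}: {count:>6} titles, {share:4.1f}% (stated {stated}%)")

# Titles in the two lowest buckets, i.e. fewer than 1,000 copies sold.
under_1000 = sum(count for label, count, _ in buckets if label in ("12-999", "under 12"))
print(f"total titles: {total}, under 1,000 copies: {under_1000}")
```

The counts add up to 45,571 titles and every recomputed share rounds to the quoted figure. About two thirds of the titles (30,120) sold fewer than 1,000 copies.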

7 comments

Unfortunately, BookScan isn't a reliable source of information. It only counts a sale when a book's ISBN is physically scanned at a register (no ebooks, audiobooks, libraries, specialty sales, etc.), and it covers only 75% of retail in general. You can bet BookScan undercounts by a very wide margin.
The above numbers are from her response in the comments on the original article. It's worth reading that to understand how those numbers are put together; she explains in depth.
I would be interested to see Kindle data for self-published books.
The majority of books written don't even get published -- so the majority of books are definitely read by fewer than 12 people.
This way of looking at it might justify the statistic, but at the cost of making it uninteresting.
Interesting enough to me, and pretty relevant to a claim like "book-writing is rarely commercially worthwhile". No "points scored" against the publishing industry, but point-scoring is for shallow people.

For publishing, I wonder how many copies of little-bought books are read, and how many are printed -- both probably quite different to the number sold. And I also wonder how the outcome distribution compares to venture capital outcomes, and what predictor variables are useful. "Harry Potter" is a famous case of prediction being difficult (or at least badly done?) but you can probably get some signal from author (writing history, other celebrity), genre.

What counts as a book? If a book that's not published is a book, what about a collection of my notes and memoirs on my blog? It got read by loads of people; should it count toward the statistics?
This ambiguity around what a book is seems like an artificial one. Go to a bookstore or view the catalog in Apple Books. Those are books, even the $0.99 micro stories one might find on, e.g., Kindle. Anything else might be a book, but probably not in a way that is useful or constructive to analyze in the context of sales.
A sample of 75% of retail sales is a substantially larger sample than, say, the estimates by the US Census.

If you believe that BookScan isn't reliable, then you also have to believe that the US Census data is totally crap.

> Because this is clearly a slice, and most likely provided by one of the parties to the suit, I decided to limit my data to the frontlist sales for the top 10 publishers by unit volume in the U.S. Trade market. My ISBN list is a little smaller than the one quoted in the DOJ, but the principles will be the same.

> The data below includes frontlist titles from Penguin Random House, Simon & Schuster, Hachette Book Group, HarperCollins, Scholastic, Disney, Macmillan, Abrams, Sourcebooks, and John Wiley. The figures below only include books published by these publishers themselves, not publishers they distribute.

When you limit your data to those published by (fairly) large publishers, you've already skewed the data irreparably. Most of them won't even look at a book unless an agent brings it to them, and most agents won't represent most would-be authors.

On the other hand, some technical books don't require agents, and O'Reilly has to be a very large publisher in terms of books sold.

Some other categories don't, either -- I know someone who publishes "cozy mysteries" through a real publisher (not a giant one), and she doesn't have an agent.

> On the other hand, some technical books don't require agents, and O'Reilly has to be a very large publisher in terms of books sold.

I think you are drastically over-estimating the share of developers that read software development related books.

Out of the 150+ software engineers I worked with on a daily basis throughout my career so far, I can guarantee you not even 5% have ever read programming books (and I work at a FAANG, not your average mom and pop shop).

It's a niche market.

Really? I can't think of any dev I've worked with who didn't at least have some reference books handy. Though to be fair, it's been 10 years or so since I've worked in-person with people on a daily basis, so maybe my impression is just way out of date.

I still buy the occasional programming book, but nothing like I used to now that we have all the online resources.

> some reference books handy

Looking at a random list [1] of O'Reilly books, I can see 3 categories:

- The ones for beginners, like "Learning Python" or "JavaScript: The Definitive Guide",

- The ones that will be outdated even before reaching the shelves of a library, such as "Kubernetes in Action" or "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow",

- The ones that are more about concepts such as "Clean Code".

I can't see any of those being used as a reference book. The Internet and the official documentation are the reference book.

[1] https://www.toptechskills.com/top-tech-books/

At least half the people at my current job buy technical books because we get a monthly book allowance. Whether they finish reading them is another story.

In my experience this is pretty common. Almost everywhere I’ve worked has had some kind of training budget, and most places have had fairly well attended book clubs.

I think O'Reilly has really tried to adapt to the online revolution. I don't know how well it's worked.
Thinking over my last few programming book purchases, they're really more or less textbooks. I get them for the structure they provide to in-depth learning, which usually doesn't work so well with online materials.
Well, you have a different 150 than I do, I'm afraid, and I was also at a FAANG. A really high percentage of people I know have O'Reilly books on their shelves.
Also, the distribution is skewed. Between my wife's collection and my own, we have owned at least 120 O'Reilly books.
It used to be that Addison-Wesley was the unrivaled king of CS publishing. If you saw the AW logo on the cover, you knew it was gonna be a rock solid book. Sadly, at some point they seem to have slacked off on their standards a bit, and now it looks like they play second fiddle to O'Reilly. (I don't know if at present there's much of an advantage in quality either way.)
Is this the moment in the conversation where someone steps in to commend Fred Brooks and David Parnas?
With such a large portion under 1,000, it would be nice to see it broken down more. Was it more like 20, or 900?
Well, the distribution would almost have to be skewed to the lower side given the general trend (but obviously this isn't a very scientific method).
Can you just plot the histogram and guesstimate based on the distribution?
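
A rough way to do that guesstimate without a plot is to compare how densely titles are packed into each bucket. The sketch below is only that, a sketch: it reads "under 12" as 0-11 copies (my assumption), drops the open-ended 100,000+ bucket because it has no width, and uses names of my own choosing.

```python
# (low, high_inclusive, titles) for the closed buckets in the quoted breakdown.
# "Under 12" is read here as 0-11 copies (an assumption); the open-ended
# 100,000+ bucket has no upper bound, so it is left out.
closed_buckets = [
    (0, 11, 6701),
    (12, 999, 23419),
    (1000, 4999, 9863),
    (5000, 9999, 2518),
    (10000, 19999, 1572),
    (20000, 49999, 1015),
    (50000, 99999, 320),
]


def titles_per_copy_step(low, high, titles):
    """Average number of titles per one-copy step inside a bucket."""
    width = high - low + 1
    return titles / width


# Histogram height for each bucket, ordered from fewest to most copies sold.
densities = [titles_per_copy_step(*bucket) for bucket in closed_buckets]

for (low, high, _), density in zip(closed_buckets, densities):
    print(f"{low:>6}-{high:<6} {density:10.4f} titles per copy step")
```

The density drops at every step from the lowest bucket to the highest. If it keeps falling inside the 12-999 bucket as well, most of those titles sit nearer 20 than 900, but that is an extrapolation from the neighbouring buckets, not something the published figures show.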
Before you try to interpret these numbers, you should be aware:

- the numbers are for a 12-month period of sales, not for books published in a particular 12-month period (see below for why this matters)

- some of those books were published near the start of the 12 month period, so the count represents their first 52 weeks of sales

- some of the books were published in the last week of the 12-month period, so the count represents their first couple of weeks of sales

- some of the books were published almost a year before the start of the period, in which case the count represents the number of sales within the last couple of weeks of their first year of sales (sales >12 months after publishing don't count as 'frontlist' so aren't included in these numbers)

tl;dr the % figures towards the bottom of the list are probably too pessimistic
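
To put a rough number on that, here is a small Python sketch of the overlap between a title's first 52 weeks and a 52-week reporting window. It assumes the reading of "frontlist" given in the list above, and it assumes publication dates are spread evenly across the possible weeks, which is my simplification. The function and variable names are hypothetical.

```python
WEEKS = 52  # length of both the reporting window and a title's frontlist year


def weeks_counted(pub_offset):
    """Weeks of a title's first year that fall inside the reporting window.

    pub_offset is the publication week relative to the start of the window:
    negative means the title came out before the window opened.
    """
    return max(0, WEEKS - abs(pub_offset))


# Every publication offset that leaves at least one week inside the window.
offsets = range(-WEEKS + 1, WEEKS)
counted = [weeks_counted(offset) for offset in offsets]

# Average share of the first-year weeks that the window captures,
# assuming publication dates are evenly spread (a simplification).
mean_fraction = sum(counted) / (len(counted) * WEEKS)
print(f"average share of first-year weeks counted: {mean_fraction:.2f}")
```

Under those assumptions the window captures roughly half of a typical title's first-year weeks. Real sales are not spread evenly across the first year, so the share of units captured will differ from the share of weeks.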

McLean's comments are spot-on, if you read them carefully. She describes herself as a "numbers gal" and she is.

Limiting it to the top publishers immediately leaves out lots of books. But OK, these are major players who are putting their own resources on the line for some books, so that's a valid slice.

For that 14.7% that sold under 12 copies: as a self-published author, I have to say, "Why not my book instead of that crap?"

The problem, of course, is that they didn't expect it to sell that few copies when they printed it. They didn't say, "Hey, this one looks like 10 copies or so. Let's go with it!"

What do "book" and "sell" mean here, given the article's excellent explanation of how those terms vary wildly? Are those figures per year or lifetime for the books?
This reminds me of the Jordan Peterson thing where, when he's asked "Do you believe in God", he replies with "Depends on what you mean by believe and God".
To be fair to JP, he's not always right but in this instance, he does have a point.

I believe in physics but I also believe that the map is not the territory. Go back a little over a century and the model of the atom was the plum pudding. We now know better but our own models are most certainly not correct. Thus I'm believing in something I know to be faulty on some level.

Similarly with God, each religion's conception is different. If you believe in a pagan religion you believe in lots of different gods. If you believe in a monotheistic religion you believe in one God.

It's very difficult to have a conversation free of misunderstanding on abstract matters unless you have a good grasp of the underlying concepts your conversational partner embodies in a word and vice versa. With a subject as touchy as religion, it's prudent to define terms early.

Likewise, I think asking for definitions of 'book' and 'sold' is perfectly valid.

Do we have similar statistics for the App Store?