One study with a grand total of 50 subjects and we're already saying that readers absorb less on Kindles? While interesting, the headline is click-bait and is not something we can say without further study.
Something did ping my radar, although it's hard to judge because the study isn't published yet. What the news article says is:
But instead, the performance was largely similar, except when it came to the timing of events in the story. "The Kindle readers performed significantly worse on the plot reconstruction measure, ie, when they were asked to place 14 events in the correct order."
What I would like to know is: how many other performance measures did they test? How "significant" is "significantly worse"? If, say, they tested for 100 performance measures (unlikely, but I'm using a large number on purpose), then random chance means that there are likely to be some measures that are "significantly worse." If, on the other hand, they only tested 3 performance measures, then it's less likely to be random chance.
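To put rough numbers on that, here is a minimal sketch. It assumes every measure is tested independently at a significance threshold of 0.05 and that there is no real effect anywhere; neither assumption comes from the study, and the function names are just for illustration. Under those assumptions, 3 measures give about a 14% chance of at least one spurious "significant" result, while 100 measures give more than 99%.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of num_tests independent tests comes out
    'significant' purely by chance, when no real effect exists."""
    return 1.0 - (1.0 - alpha) ** num_tests


def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Average number of spurious 'significant' results across num_tests tests."""
    return num_tests * alpha


# alpha = 0.05 is an assumed threshold, not one reported for the study.
for m in (3, 100):
    print(
        f"{m:>3} measures: "
        f"P(at least one false positive) = {familywise_error_rate(m):.3f}, "
        f"expected false positives = {expected_false_positives(m):.2f}"
    )
```

Real performance measures taken from the same readers are probably correlated, so the true numbers would differ, but the direction of the effect is the same.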
Basically, if you run an experiment and you test for a large number of things, you can't say much about the outliers. With large enough numbers, there are bound to be outliers. However, after you run such experiments, and you see those outliers, you can run more experiments to test if that was random chance, or if there really is some correlation there.
While the XKCD comic has a lot of truth to it, it's mainly about many separate individual experiments (as well as some poorly done ones). When running a large set of correlations within one study, standard operating procedure is to use one of several techniques for multiple-comparison correction to counteract this effect.
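For anyone wondering what "one of several techniques" looks like in practice, here is a rough sketch of two common ones, Bonferroni and Holm. I have no idea which (if any) the study's authors used, and the p-values below are made up purely for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down: compare the smallest p-value to alpha / m, the next
    to alpha / (m - 1), and so on, stopping at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is also not rejected
    return reject


# Hypothetical p-values for five measures; NOT taken from the study.
p_values = [0.001, 0.012, 0.04, 0.30, 0.75]
print("uncorrected:", [p <= 0.05 for p in p_values])
print("bonferroni: ", bonferroni(p_values))
print("holm:       ", holm(p_values))
```

With these made-up numbers, three measures look significant before correction, one survives Bonferroni, and two survive Holm, which is the less conservative of the two.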
The Guardian article linked actually doesn't present the statistics of the study, which hasn't been published yet. Absent further information, critiquing the sample size sounds pretty reasonable to me.
Withholding evidence isn't a defense against criticism. If you won't TELL ME your effect size, but you do tell me the sample size, I can certainly say, "I am skeptical of your conclusion, because of your sample size."
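As a back-of-the-envelope illustration of why sample size alone can justify skepticism, the sketch below estimates the smallest standardized effect (Cohen's d) that a two-group comparison could reliably detect. Everything specific here is my assumption rather than something from the article: an even 25/25 split of the 50 subjects, a two-sided threshold of 0.05, 80% power, and a normal approximation in place of an exact t-test calculation.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_effect(n_per_group: int, alpha: float = 0.05,
                          power: float = 0.80) -> float:
    """Approximate smallest Cohen's d detectable in a two-sided,
    two-sample comparison (normal approximation, equal group sizes)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return (z_alpha + z_power) * sqrt(2 / n_per_group)


# Assumed split: 25 readers per group (the article only gives the total of 50).
print(f"n=25 per group:  d >= {min_detectable_effect(25):.2f}")
print(f"n=100 per group: d >= {min_detectable_effect(100):.2f}")
```

Under these assumptions, 25 per group only reliably detects effects of roughly 0.8 standard deviations, which is conventionally considered large. That doesn't make the finding wrong, but without the reported effect size it's a fair reason to be cautious.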
(Note: They don't link to the study in the article so naturally I cannot comment on the soundness of the study in question).
(EDIT: After I wrote this comment, the link was changed from the Guardian to the NYT, which provides more information, though the study has still not been published.)
Studies (especially psychological ones) with human subjects are very difficult and expensive to conduct, which is why sample sizes are often small. 50 is by no means an unusual size for an experiment of this sort.
An alternative is to use observational data, but it's very hard to differentiate between useful observational data and garbage observational data. Not only do you introduce a whole host of problems, but it's more difficult to parametrize the problems that are introduced. So you could easily create a study that boasts a large sample size, with a respectable p-value[0], but have no way of knowing which confounding variables were introduced during the data collection process.
A third option is self-reported data, which comes with an even bigger asterisk after it. For something like this, I'd much rather trust a controlled study of 50 than a self-reported survey of 300 (at that point you might as well post it as an 'Ask HN' and judge based on the comments!).
By contrast, while controlled experiments on human subjects are by no means unbiased or immune to confounding of variables during data collection, it's almost always easier either to limit these in a controlled setting or at least to parametrize them after they happen.
So in the end, a small controlled study usually ends up being the best feasible option (though not the best theoretically possible one), short of massively increasing funding for such studies.
[0] Which is usually the wrong way to look at studies anyway, but that's a separate topic of discussion.