by akyu 1423 days ago
No evidence for nudging =/= nudging doesn't exist.

I'm fairly sure anyone who has done A/B testing at scale has plenty of evidence that nudging works. Perhaps not up to the standard of science, but there are literally people who manipulate choice architecture for a living and I'm fairly convinced a lot of that stuff actually works.

7 comments

"... evidence that nudging works. Perhaps not up to the standard of science..." That's pretty close to saying it doesn't work. The point of this meta-study was precisely to show that the evidence claimed to support nudging was probably attributable to random variation + unnatural selection, where the unnatural selection was publication choice: either the researchers who got negative (null) results chose not to bother writing it up and submitting it, or papers that reported negative were rejected by publishers.

There are lots of people who do X for a living, but where X doesn't work: palm readers, fortune tellers, horoscope writers, and so on. I'm not even sure that funds managers reliably obtain results much above random.

I think what’s not clear is what’s in those papers, what exactly they have to say about nudging, and what definition they’re using. It strains credulity to think that changing defaults in software doesn’t change behavior, if only because most users aren’t technically savvy enough to change their settings.

On the other hand, the dream of nudge theory is something like a study done in the UK that suggests that adding the line “most of your fellow citizens pay their taxes” will increase the likelihood that people pay taxes. Here I’d be more likely to believe that the benefits are not clear and, more importantly, that they are difficult to replicate across time and culture.

It seems that trying to do a meta-analysis on all of nudge theory (or large categories of it) would indeed show no impact. It’s not like you’re testing one thing; you’re comparing well-designed programs with ones that aren’t.

>That's pretty close to saying it doesn't work.

No it's really not.

To say things a different way, I don't think this study will change anything for people actually doing choice architecture in applied settings. They have results that speak for themselves.

> results that speak for themselves.

This is exactly how a midwife explained to me why she uses magic crystals. She told me that there's science, and there's results, and that she's seen the crystals work.

Obviously they don't work by magical vibration, but are you sure they don't work at all? If the midwife feels and acts more confident from having that tool or the mother feels more relaxed because she thinks they will make the process easier, then the crystals do, in fact, work. They just don't work through the mechanism those individuals think they do.
I mean, yeah, if she has solid RCT data on thousands to millions of childbirths and has found a statistically significant impact from using the magic crystals, I would support their use. A/B testing and scientific research rest on the same basis.

The issue is that in fact the midwife will not have such data. The comparison being made is that A/B testing, if run competently, is pretty close to scientific research, in particular for research related to nudging.

I wonder how many engineers crack open a statistics book to find the correct test versus just plotting box plots and saying "see looks pretty different"
To be fair, the more profound a result the less math you need to convince anyone it is the case.
But if run rigorously, A/B testing is identical to scientific research, and the scientific research fails to show an effect.
The OP was referring to A/B tests that were "perhaps not up to the standard of science", not ones that were already science.
"I don't think this study will change anything for people actually doing choice architecture in applied settings." Probably true, but then evidence that horoscopes etc. don't work, doesn't prevent people from drawing horoscopes, or other people from relying on their horoscope to plan out their day.

"They have results that speak for themselves." Let me put my point differently. Suppose that nudges don't have any effect at all (null hypothesis). More concretely--and just to take a random number--suppose that 50% of the time when a nudge is used, the nudgees happen to behave in the direction that the nudge was intended to move them, and 50% of the time they don't move, or they move in the opposite direction. And suppose there are a number of nudgers, maybe 100. Then some nudgers will get better than random results, while others will get no result, or negative results. The former nudgers will have results that appear to speak for themselves, even if the nudges actually have no effect whatsoever.

This is the same as asking if a fair coin is tossed ten times, what is the probability that you'll get at least 7 heads. The probability of such a number of heads in a single run is ~17%. So 17% of those nudgers could be getting apparently significant results, even if their results are actually random.
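
For anyone who wants to check that figure, here is a minimal sketch of the exact binomial tail. The function name `prob_at_least` is made up for illustration, and the 100 nudgers and the 50% success rate are just the hypothetical numbers from the comment above, not data from the study.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Fair coin, ten tosses, at least seven heads.
tail = prob_at_least(7, 10)

# Hypothetical population of 100 nudgers whose nudges do nothing at all.
expected_lucky = 100 * tail

print(f"P(>=7 heads in 10 tosses) = {tail:.4f}")   # 0.1719
print(f"Expected 'successful' nudgers out of 100: {expected_lucky:.1f}")  # 17.2
```

So under the null, roughly 17 of the 100 would see 7 or more "wins" out of 10 purely by chance.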

I think gp and you probably see eye to eye, but gp has a problem with your phrasing. If the effect does not live up to scientific rigour, that (more or less) implies that the effect is roughly indistinguishable from randomness.

If folks have results that speak for themselves, then the effect more than likely is scientifically rigorously testable. It may already have been - by those very results.

They would be the people who published, in this scenario.
Seriously, what about that kind of publication bias: A/B tests don’t get published.

If you run a useful system where it would be meaningful and interesting to know whether a social science theory actually applied, you might run an A/B test to see if it works. If it works, it is adopted—but it is almost never published. And that is for two reasons: 1. no incentive to publish and 2. major incentive not to publish. #2 is recent (post Facebook experiment) and it is specifically because a large portion of the educated public accepts invisible A/B testing but recoils with moral indignation at the use of A/B testing results in published science. Too bad: Facebook keeps testing social science theories, but no longer publishes the results.

The standards of selecting a result of an A/B test are less stringent than those of publication for the advancement of knowledge. For publication, the goal is to determine whether a model is accurate. For A/B testing, the goal is to select the best design/intervention. The difference is that for scientific testing "inconclusive" means that there isn't enough evidence to consider it a solved problem and it should have more research, while in A/B testing "inconclusive" means that any effect is small so you should pick an option and move on.

As an example, suppose I flip a coin 1000 times and get heads 525 times. The 95% confidence interval for the probability of heads is [0.494, 0.556], so from a scientific standpoint I cannot conclude that the coin is biased. If, however, I am performing an A/B test, I would conclude that I'll bet on heads, because it is at worst equivalent to tails.
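
A quick sketch of where that interval comes from, assuming the usual normal-approximation (Wald) interval with z = 1.96. The comment does not say which method was used, so that choice is an assumption, and `wald_interval` is just an illustrative name.

```python
from math import sqrt

def wald_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a binomial proportion."""
    p_hat = successes / trials
    half_width = z * sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - half_width, p_hat + half_width

low, high = wald_interval(525, 1000)
covers_fair = low <= 0.5 <= high  # True means a fair coin cannot be ruled out

print(f"95% CI: [{low:.3f}, {high:.3f}]")  # [0.494, 0.556]
print("Interval includes 0.5:", covers_fair)
```

The interval contains 0.5, which is why the scientific reading is "inconclusive" while the A/B reading is "bet on heads anyway".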

I think you are missing the point. With academic publication bias, sometimes an unbiased coin gets heads 600 times by chance. Those studies get published. But, if you ran the test again, you might only get 525. That study won’t get published.

And, contrary to your assumption, there is nothing to prevent A/B tests from being published with high academic standards, such as a low p-value and a very large n. In an academic context, that’s just fine: it’s a small but significant effect.

A/B tests are simply controlled experiments—which are the gold standard of scientific evidence generation in psychology. My point is that the main generators of this evidence are only permitted to use this evidence to inform commerce not public knowledge. That is a loss for science and public policy, in my opinion.

You don't have to prove something doesn't exist; you have to prove it exists.
Absolutely.
They note that there is no evidence for nudging as being generally effective. So any individual nudge could be effective (except in finance in which they found that none are effective).
"We studied X extensively and there is no evidence that it works" is a textbook example of how scientists say "X doesn't work".

Except the article is more specific and has way more details than that.

>I'm fairly sure anyone who has done A/B testing at scale has plenty of evidence that nudging works

Lol! A/B testing in practice is rife with p-hacking and various other statistical fallacies.

What exactly makes you convinced that it works? To be specific: why wouldn’t there be bias in the A/B testing results, too?

There are literally people who give astrological analyses for a living.

A/B testing has a ton of issues as well that make it easy to be fooled

https://biggestfish.substack.com/p/data-as-placebo

Of course.
We are talking about publication bias, where the decision whether to publish something is biased by the outcome of the experiment.

I think this doesn't really apply to A/B testing, because people are incentivized to pay as much attention to negative results as to positive ones.

From what I’ve seen there is even more incentive to focus on positive A/B tests. It’s the way you get credit for your work at a company. A negative test is counted as barely anything. So your incentive is to run tons of tests, then cherry pick only the positive ones and announce them widely. Another strategy is to track multiple metrics for each test and not adjust for that when computing p values. But then at the end you only report the one metric that was positive.
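
To put a rough number on the multiple-metrics problem, here is a back-of-the-envelope sketch. It assumes the metrics are independent and each is tested at alpha = 0.05, which real metrics usually are not, so treat it as an illustration and not a model of any particular team's setup. Both function names are hypothetical.

```python
def familywise_error_rate(num_metrics: int, alpha: float = 0.05) -> float:
    """Chance that at least one of several independent null metrics looks significant."""
    return 1 - (1 - alpha) ** num_metrics

def bonferroni_alpha(num_metrics: int, alpha: float = 0.05) -> float:
    """Per-metric threshold that keeps the overall false-positive rate at or below alpha."""
    return alpha / num_metrics

for k in (1, 5, 10, 20):
    fwer = familywise_error_rate(k)
    print(f"{k:>2} metrics: P(at least one false positive) = {fwer:.2f}, "
          f"Bonferroni threshold = {bonferroni_alpha(k):.4f}")
```

With ten unadjusted metrics and no real effect, there is about a 40% chance that at least one of them comes out "positive", which is the one that ends up in the announcement.
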
People are incentivized to pay attention to the result that increases their mid-year bonus the most.
I cannot share the reason I am convinced it works. But I can tell you I am convinced it works.

I'm sure many people here are in similar situations.

Great minds! I was writing more or less the same thing, you beat me to publication by three minutes.