| HN Mirror



	by akyu 1423 days ago
	No evidence for nudging =/= nudging doesn't exist. I'm fairly sure anyone who has done A/B testing at scale has plenty of evidence that nudging works. Perhaps not up to the standard of science, but there are literally people who manipulate choice architecture for a living and I'm fairly convinced a lot of that stuff actually works.

7 comments

mcswell 1423 days ago

"... evidence that nudging works. Perhaps not up to the standard of science..." That's pretty close to saying it doesn't work. The point of this meta-study was precisely to show that the evidence claimed to support nudging was probably attributable to random variation + unnatural selection, where the unnatural selection was publication choice: either the researchers who got negative (null) results chose not to bother writing them up and submitting them, or papers that reported negative results were rejected by publishers.

There are lots of people who do X for a living, but where X doesn't work: palm readers, fortune tellers, horoscope writers, and so on. I'm not even sure that funds managers reliably obtain results much above random.

mikkergp 1423 days ago

I think what’s not clear is what’s in those papers, what exactly they have to say about nudging, and what definition they’re using. It strains credulity to think that changing defaults in software doesn’t change behavior, if only because most users aren’t technically savvy enough to change their settings.

On the other hand, the dream of nudge theory is something like a study done in the UK that suggests that adding the line “most of your fellow citizens pay their taxes” will increase the likelihood that people pay taxes. Here I’d be more likely to believe that the benefits are not clear and, more importantly, difficult to replicate across time and culture.

It seems that trying to do a meta-analysis on all of nudge theory (or large categories of it) would indeed show no impact. It’s not like you’re testing one thing; you’re comparing well-designed programs with ones that aren’t.

akyu 1423 days ago

>That's pretty close to saying it doesn't work.

No it's really not.

To say things a different way, I don't think this study will change anything for people actually doing choice architecture in applied settings. They have results that speak for themselves.

pessimizer 1423 days ago

> results that speak for themselves.

This is exactly how a midwife explained to me why she uses magic crystals. She told me that there's science, and there's results, and that she's seen the crystals work.

msrenee 1423 days ago

Obviously they don't work by magical vibration, but are you sure they don't work at all? If the midwife feels and acts more confident from having that tool or the mother feels more relaxed because she thinks they will make the process easier, then the crystals do, in fact, work. They just don't work through the mechanism those individuals think they do.

rsanek 1423 days ago

I mean, yeah, if she has solid RCT data on thousands to millions of childbirths and has found a statistically significant impact from using the magic crystals, I would support their use. A/B testing and scientific research rest on the same basis.

The issue is that in fact the midwife will not have such data. The comparison being made is that A/B testing, if run competently, is pretty close to scientific research, in particular for research related to nudging.

asdff 1423 days ago

I wonder how many engineers crack open a statistics book to find the correct test versus just plotting box plots and saying "see looks pretty different"

saalweachter 1423 days ago

To be fair, the more profound a result the less math you need to convince anyone it is the case.

pessimizer 1423 days ago

But if run rigorously, A/B testing is identical to scientific research, and the scientific research fails to show an effect.

sacrosancty 1423 days ago

The OP was referring to A/B tests that were "perhaps not up to the standard of science", not ones that were already science.

mcswell 1423 days ago

"I don't think this study will change anything for people actually doing choice architecture in applied settings." Probably true, but then evidence that horoscopes etc. don't work doesn't prevent people from drawing horoscopes, or other people from relying on their horoscope to plan out their day.

"They have results that speak for themselves." Let me put my point differently. Suppose that nudges don't have any effect at all (null hypothesis). More concretely--and just to take a random number--suppose that 50% of the time when a nudge is used, the nudgees happen to behave in the direction that the nudge was intended to move them, and 50% of the time they don't move, or they move in the opposite direction. And suppose there are a number of nudgers, maybe 100. Then some nudgers will get better than random results, while others will get no result, or negative results. The former nudgers will have results that appear to speak for themselves, even if the nudges actually have no effect whatsoever.

This is the same as asking if a fair coin is tossed ten times, what is the probability that you'll get at least 7 heads. The probability of such a number of heads in a single run is ~17%. So 17% of those nudgers could be getting apparently significant results, even if their results are actually random.
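The ~17% figure can be checked with an exact binomial tail sum. This is only a minimal sketch: the function name `prob_at_least` is made up for illustration, and it assumes each of the 100 hypothetical nudgers runs exactly ten independent 50/50 trials.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Exact binomial tail: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance that a fair coin shows at least 7 heads in 10 tosses.
tail = prob_at_least(7, 10)

# With 100 nudgers whose nudges do nothing, this many are expected
# to see "at least 7 out of 10" purely by luck.
expected_lucky = 100 * tail

print(f"P(at least 7 heads in 10 tosses) = {tail:.4f}")
print(f"Expected lucky nudgers out of 100: {expected_lucky:.1f}")
```

The tail works out to 176/1024, so roughly 17 of the 100 nudgers would have results that "speak for themselves" even under the null hypothesis.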

Beldin 1423 days ago

I think gp and you probably see eye to eye, but gp has a problem with your phrasing. If the effect does not live up to scientific rigour, that (more or less) implies that the effect is roughly indistinguishable from randomness.

If folks have results that speak for themselves, then the effect more than likely is scientifically rigorously testable. It may already have been - by those very results.

DangitBobby 1423 days ago

They would be the people who published, in this scenario.

dr_dshiv 1423 days ago

Seriously, what about that kind of publication bias: A/B tests don’t get published.

If you run a useful system where it would be meaningful and interesting to know whether a social science theory actually applied, you might run an A/B test to see if it works. If it works, it is adopted—but it is almost never published. And that is for two reasons: 1. no incentive to publish and 2. major incentive not to publish. #2 is recent (post Facebook experiment) and it is specifically because a large portion of the educated public accepts invisible A/B testing but recoils with moral indignation at the use of A/B testing results in published science. Too bad: Facebook keeps testing social science theories, but no longer publishes the results.

MereInterest 1423 days ago

The standards of selecting a result of an A/B test are less stringent than those of publication for the advancement of knowledge. For publication, the goal is to determine whether a model is accurate. For A/B testing, the goal is to select the best design/intervention. The difference is that for scientific testing "inconclusive" means that there isn't enough evidence to consider it a solved problem and it should have more research, while in A/B testing "inconclusive" means that any effect is small so you should pick an option and move on.

As an example, suppose I flip a coin 1000 times and get heads 525 times. The 95% confidence interval for the probability of heads is [0.494, 0.556], so from a scientific standpoint I cannot conclude that the coin is biased. If, however, I am performing an A/B test, I would conclude that I'll bet on heads, because it is at worst equivalent to tails.
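The interval above can be reproduced with a normal-approximation (Wald) interval. The comment does not say which method was used, so the choice of the Wald formula with z = 1.96 is an assumption here, and `wald_interval` is a name made up for this sketch.

```python
from math import sqrt

def wald_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a binomial proportion."""
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# 525 heads out of 1000 flips, as in the example.
low, high = wald_interval(525, 1000)

print(f"95% CI for P(heads): [{low:.3f}, {high:.3f}]")
print("Interval contains 0.5:", low <= 0.5 <= high)
```

Because 0.5 falls inside the interval, the data do not rule out a fair coin, yet the point estimate of 0.525 still favors heads if a choice has to be made.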

dr_dshiv 1423 days ago

I think you are missing the point. With academic publication bias, sometimes an unbiased coin gets heads 600 times by chance. Those studies get published. But, if you ran the test again, you might only get 525. That study won’t get published.
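A rough simulation of that selection effect, as a sketch only: every study flips a fair coin, and only studies whose heads count clears a one-sided significance threshold get "published". The threshold rule, the number of studies, and the name `run_studies` are all assumptions chosen for illustration, not anything taken from the meta-study.

```python
import random
from math import sqrt

def run_studies(num_studies=2000, flips=1000, z=1.96, seed=0):
    """Simulate fair-coin studies and keep only the 'significant' ones."""
    rng = random.Random(seed)
    # Under a fair coin the heads count has mean flips/2 and SD sqrt(flips)/2.
    threshold = flips / 2 + z * sqrt(flips) / 2
    # Each study: count the 1-bits among `flips` random bits.
    all_counts = [bin(rng.getrandbits(flips)).count("1") for _ in range(num_studies)]
    # Assumed publication filter: only clearly "positive" studies are written up.
    published = [c for c in all_counts if c > threshold]
    return all_counts, published

all_counts, published = run_studies()
mean_all = sum(all_counts) / len(all_counts) / 1000
mean_pub = sum(published) / len(published) / 1000

print(f"Studies run: {len(all_counts)}, published: {len(published)}")
print(f"Mean heads rate, all studies:       {mean_all:.3f}")
print(f"Mean heads rate, published studies: {mean_pub:.3f}")
```

Across all studies the heads rate stays close to 0.5, while the published subset shows a clear "effect" that exists only because of the filter.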

And, in opposition to your assumption: there is nothing to prevent A/B tests being published with high academic standards— like a low p value and tons of n. In an academic context, that’s just fine— it’s a small but significant effect.

A/B tests are simply controlled experiments—which are the gold standard of scientific evidence generation in psychology. My point is that the main generators of this evidence are only permitted to use this evidence to inform commerce not public knowledge. That is a loss for science and public policy, in my opinion.

themitigating 1423 days ago

You don't have to prove something doesn't exist, you have to prove it exists.

akyu 1423 days ago

Absolutely.

zeroonetwothree 1423 days ago

They note that there is no evidence for nudging as being generally effective. So any individual nudge could be effective (except in finance in which they found that none are effective).

marcosdumay 1423 days ago

"We studied X extensively and there is no evidence that it works" is a textbook example of how scientists say "X doesn't work".

Except the article is more specific and has way more details than that.

aaaaaaaaaaab 1423 days ago

>I'm fairly sure anyone who has done A/B testing at scale has plenty of evidence that nudging works

Lol! A/B testing in practice is rife with P-hacking and various other statistical fallacies.

omginternets 1423 days ago

What exactly makes you convinced that it works? To be specific: why wouldn’t there be bias in the A/B testing results, too?

There are literally people who give astrological analyses for a living.

zeroonetwothree 1423 days ago

A/B testing has a ton of issues as well that make it easy to be fooled

https://biggestfish.substack.com/p/data-as-placebo

akyu 1423 days ago

Of course.

lIl-IIIl 1423 days ago

We are talking about publication bias, where the decision whether to publish something is biased by the outcome of the experiment.

I think this doesn't really apply to A/B testing, because people are incentivized to pay as much attention to negative results as to positive ones.

zeroonetwothree 1423 days ago

From what I’ve seen there is even more incentive to focus on positive A/B tests. It’s the way you get credit for your work at a company. A negative test is counted as barely anything. So your incentive is to run tons of tests, then cherry pick only the positive ones and announce them widely. Another strategy is to track multiple metrics for each test and not adjust for that when computing p values. But then at the end you only report the one metric that was positive.
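To put a number on the multiple-metrics problem, here is a small sketch of how the chance of at least one false positive grows with the number of metrics tracked. It assumes the metrics are independent and that the change has no real effect on any of them; both function names are made up for illustration, and the Bonferroni correction is shown as one common adjustment, not necessarily what any given team uses.

```python
def familywise_false_positive_rate(num_metrics, alpha=0.05):
    """P(at least one metric looks significant) when no effect is real.

    Assumes the metrics are statistically independent.
    """
    return 1 - (1 - alpha) ** num_metrics

def bonferroni_alpha(num_metrics, alpha=0.05):
    """Per-metric threshold that keeps the overall rate at or below alpha."""
    return alpha / num_metrics

for k in (1, 5, 10, 20):
    rate = familywise_false_positive_rate(k)
    print(f"{k:2d} metrics: P(some false positive) = {rate:.3f}, "
          f"Bonferroni per-metric alpha = {bonferroni_alpha(k):.4f}")
```

With 20 unadjusted metrics, the chance that at least one comes out "positive" by luck is well over half, which is why reporting only the winning metric is misleading.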

aaaaaaaaaaab 1423 days ago

People are incentivized to pay attention to the result that increases their mid-year bonus the most.

akyu 1423 days ago

I cannot share the reason I am convinced it works. But I can tell you I am convinced it works.

I'm sure many people here are in similar situations.

mcswell 1423 days ago

Great minds! I was writing more or less the same thing, you beat me to publication by three minutes.