by Phemist 2532 days ago
Fun anecdote about some of my own dabblings in online psych experiments and MTurk.

7 or 8 years ago, my then supervisor and I experimented with running psychological (behavioural) experiments on MTurk. We had created a method that would run in browsers in native JavaScript and were looking to validate it. Coming from a testing-1st-year-psych-students-in-the-basement-of-the-faculty-building kind of thing, we naively took the going wage for that (something like 8 euros/hour) as our reference point, put an experiment online that would take around 20 min to complete, and paid 2 euros for it.

My supervisor used his own credit card and set the spend limit to around 1000 euros, thinking that we'd never hit it. Boy, how wrong we were. Apparently this 6 euro/hour wage was _much_ _much_ higher than the going rate, and we hit the spend limit in around 2 hours. Even though we had to throw out around 70% of the completions, we ended up with usable data from around 150 participants.

We went in expecting to run the experiment for a week or 2, and get maybe 50 participants, but came out with 500 in a span of 2 hours. Safe to say, we celebrated a job well done that night over one or two drinks. People were commenting on how nice of a change of pace it was compared to the then-usual MTurk tasks and that they would have done it for free. Some even left their e-mail addresses should we run another online experiment. I've since been out of the field of online behavioural experimentation, but it seems to have taken off quite a bit.

1 comments

Holy smokes that is an amazing story. Do you have any more? I am always fascinated to hear how researchers design studies to prevent people from just spamming in any old answer.
I'm not sure what the state of the art is nowadays. Back in the day, the method was so new that we just assumed there would be no bots able to complete the experiment successfully, or that they would stand out like a sore thumb upon analysing the data. Modern frameworks will eventually have to deal with this, I'm sure. The structured way in which experiments are defined also makes it easy to develop tools to automate them. Unfortunately I'm not into the field enough anymore to know how people are dealing with it.

The biggest problem we had was with people on high-latency connections. The effects we were looking for were measured in tens of milliseconds. In order to tease out these effects, we had to be very particular about the timing of when certain stimuli were presented to the participant. High-latency page reloads (which were unavoidable in the system we built our method on) would mess with this high-precision presentation requirement. We measured the latency, but did not pre-emptively exclude anyone based on it, hence the high % of people with unusable data in our initial validation experiment.

For subsequent experiments I built a "loader" screen that would pretend to be loading the experiment. What it was in fact doing was refreshing the webpage several times (of course, while advancing a progress bar) to measure the latency of the participant's connection. Connections with high average latency and high latency variance were excluded. The thresholds were based on what we found in the early validation study.
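
To make the screening step concrete, here is a rough sketch of how the exclusion rule could look on the analysis side. It is not the code we ran: the function name, the threshold values, and the choice to require both the mean and the spread to stay under their limits are assumptions for illustration only.

```python
from statistics import mean, stdev

# Hypothetical thresholds. The real ones came from the validation study
# and are not given in this thread.
MAX_MEAN_MS = 150.0
MAX_SD_MS = 40.0


def passes_latency_screen(samples_ms, max_mean_ms=MAX_MEAN_MS, max_sd_ms=MAX_SD_MS):
    """Return True if the measured reload latencies look fast and stable.

    samples_ms: latencies (in milliseconds) from the repeated page reloads
    performed behind the fake "loader" screen.
    """
    if len(samples_ms) < 2:
        raise ValueError("need at least two latency samples")
    # Assumption: a connection is kept only if BOTH the average latency
    # and its sample standard deviation are under their thresholds.
    return mean(samples_ms) <= max_mean_ms and stdev(samples_ms) <= max_sd_ms
```

A participant with samples like `[50, 55, 60, 52]` would be let through, while one with `[20, 200, 30, 180]` would be excluded for being too erratic even though the average is acceptable.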

Surprisingly, after throwing out the high-latency data, there were no other exclusions necessary. It seemed that for our validation study, all participants were very attentive during the experiment.

In "in-person" cases, researchers would add attention checks to their experiments, with the logic that failing these attention checks by itself is no indication of "spamming", but seeing weird quirks in the data + failed attention checks would be. One that my supervisor was fond of was throwing in instruction screens in the middle of runs of trials, which told you to press a very specific button to continue. E.g., the experiment has you press 'A' and 'J' constantly, and to continue you have to press 'N'. Secretly though, A and J were also valid ways to continue with the experiment. The thought being that if you were hammering one of those buttons to quickly get through the experiment, you would also skip past the instruction screen very quickly with it.
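
A minimal sketch of that logic, assuming the experiment logs which key dismissed each instruction screen. The names `failed_attention_check` and `flag_participant` are made up for this example, and how you decide that the data has "weird quirks" is left to the analyst.

```python
# Keys used during the trials themselves. They secretly also dismiss the
# instruction screen, which is what makes the check work.
TASK_KEYS = {"a", "j"}
# The key the instruction screen explicitly asks for.
CONTINUE_KEY = "n"


def failed_attention_check(key):
    """True if the instruction screen was dismissed with a task key."""
    return key.lower() in TASK_KEYS


def flag_participant(instruction_keys, has_data_quirks):
    """Flag a participant only when failed checks AND odd data coincide.

    instruction_keys: the key that dismissed each instruction screen
        (only 'a', 'j' or 'n' can appear, since nothing else continues).
    has_data_quirks: the analyst's own judgement of the trial data.
    """
    failed = sum(failed_attention_check(k) for k in instruction_keys)
    # A failed check on its own is not treated as evidence of spamming.
    return failed > 0 and has_data_quirks
```

So `flag_participant(["a", "n"], False)` stays unflagged, while the same key log combined with quirky data gets flagged.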