Effective Samples in Online Experiments: A Role for Volunteer Labs?

Earlier this month Science Magazine noted the potentially worrisome dependence of psychologists on Amazon.com's Mechanical Turk (MTurk) for experimental research subjects. Online labor markets, and MTurk in particular, have equipped social science (and many other academic and non-academic fields) with a low-cost, often reliable source of survey participants. "Social science for pennies" has caught on quickly, as reflected in the rapid proliferation of academic publications using data from MTurk. While not everyone agrees about exactly how it should be integrated into social science, MTurk is generally perceived as an enormous and diverse subject pool for academic research.

So what's the cause for concern? As the Science piece points out, researchers using capture-recapture methods imported from wildlife ecology previously found that, while MTurk boasts a participant pool of over half a million registered workers, its effective sample is only about 7,300 people. This essentially means that when studies are advertised on MTurk, only a small fraction of the potential pool is likely to participate.
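
To make the notion of an "effective sample" more concrete, here is a minimal sketch of a Chapman-corrected Lincoln-Petersen (capture-recapture) estimator of the kind wildlife ecologists use to size populations. The function name and the worker counts below are illustrative assumptions, not the actual method or figures behind the 7,300 estimate cited in Science.

```python
def lincoln_petersen(n1, n2, m):
    """Chapman-corrected Lincoln-Petersen estimate of population size.

    n1: distinct workers seen in a first batch of HITs ("marked")
    n2: distinct workers seen in a later batch of HITs
    m:  workers who appear in both batches ("recaptured")
    """
    if m < 0 or m > n1 or m > n2:
        raise ValueError("recaptures cannot exceed either sample")
    # The +1 Chapman correction avoids division by zero and reduces small-sample bias.
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical numbers for illustration only:
# 1,000 workers took study A, 1,200 took study B, and 150 took both.
print(round(lincoln_petersen(1000, 1200, 150)))  # roughly 7,960 "effective" workers
```

The intuition is that the more overlap there is between two independently recruited batches of respondents, the smaller the underlying pool of active workers must be, regardless of how many accounts are nominally registered.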

This is obviously problematic if one of MTurk's most attractive features is its purported ability to provide diverse and representative samples. While certainly an extreme scenario, it would be strange and unsettling if most of the thousands of studies now being published using MTurk relied on the same small set of respondents. The small effective sample may stem from the financial incentives Turkers face: while thousands upon thousands of people may initially sign up, only a subset find Human Intelligence Tasks (HITs) involving social science surveys lucrative and appealing enough to participate repeatedly.

This is one area in which, by design, volunteer labs might possess an advantage over MTurk and other paid online labs. Whereas paid subjects may become inactive once they grow dissatisfied with compensation or simply find something better to do, volunteer subjects presumably join DLABSS or other initiatives for non-monetary reasons. This may create entirely new selection-bias issues (which we will address in the coming weeks and months), but it might also result in larger effective samples. Indeed, the DLABSS volunteer community is quickly approaching 6,000 people, just short of MTurk's recently estimated effective sample size. To date, over half of DLABSS volunteers have participated in more than one study, and nearly 40% have participated in more than two; the average DLABSS volunteer has taken 3.7 studies.

In any case, effective samples are just one of many challenges that social scientists are facing as they pivot to online experimental outlets such as MTurk. It remains to be seen precisely how severe MTurk's issues are, as well as the extent to which volunteer-based labs can help mitigate such problems. 

DLABSS is attempting to help fill these knowledge gaps by comparing its own subjects, and their performance, with MTurk and other survey modalities across a wide range of experimental settings this summer. Of course, even if volunteer labs are cheaper and prove better at avoiding small effective samples and other biases, we would like to see them gradually emerge as complements to existing survey platforms. For instance, they might help "break ties" when a study obtains different results across in-person and MTurk samples, and could help researchers discover bugs in their experimental research designs.

We plan on releasing a comprehensive summary of our initial findings in a working paper in early Fall 2016. Our preliminary assessments suggest that DLABSS volunteers are in fact demographically similar to "Turkers" and that the Lab is able to replicate research findings obtained using other experimental modalities. In the meantime, we will publish snapshots of various findings as they come in over the summer. So stay tuned!