Using Amazon's Mechanical Turk to Recruit Transition-Age Adults with Autism
Amazon's Mechanical Turk (MTurk) has been described as comparable to other online platforms and to in-person laboratory studies within the social sciences and clinical psychology (Thomas & Clifford, 2017). However, researchers have recently reported a decline in MTurk data quality due to the use of Virtual Private Servers (VPS), which allow participants to circumvent traditional screening methods that give researchers some control over participant recruitment (Dennis, Goodson, & Pearson, 2018). Further, the use of MTurk with clinical populations has led to demographic deception (i.e., misrepresentation of personal characteristics to meet eligibility requirements), which can result in incorrect conclusions being drawn from the data (Chandler & Paolacci, 2017; Kan & Drummey, 2018).
We examined these issues on MTurk and asked the following: What percentage of participant attempts in a survey of daily living activities is usable in a clinical sample of individuals with a self-reported ASD diagnosis aged 18-22 or their caregivers? What are the most common reasons for non-payment (i.e., failure to meet study eligibility criteria)? What are the qualitative and quantitative differences in responses from VPS respondents and non-VPS respondents?
Participants were 370 individuals with a self-reported ASD diagnosis aged 18-22 years, or their caregivers, recruited via MTurk to answer questions about daily activities. Several screening questions (e.g., Individualized Education Program in school, formal diagnosis of ASD) were included to filter out ineligible participants; once filtered, these participants were sent to the end of the survey without payment. Additional questions were asked in multiple formats to confirm eligibility (e.g., for age: Are you 18-22? What year were you born? How old are you?). Last, qualitative responses from VPS participants were compared with those from participants using traditional internet providers.
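The cross-format eligibility check described above can be sketched as a simple consistency rule: the same fact (age) is asked three ways, and answers that disagree or fall out of range flag the attempt as ineligible. The function and field names below are illustrative assumptions, not the study's actual survey schema or scoring code.

```python
def age_responses_consistent(reported_age: int,
                             birth_year: int,
                             in_range_claim: bool,
                             survey_year: int,
                             age_min: int = 18,
                             age_max: int = 22) -> bool:
    """Return True only if all three age answers agree and fall in range.

    reported_age   -- answer to "How old are you?"
    birth_year     -- answer to "What year were you born?"
    in_range_claim -- answer to "Are you 18-22?"
    """
    derived_age = survey_year - birth_year
    # Allow one year of slack for birthdays not yet reached this year.
    ages_match = abs(derived_age - reported_age) <= 1
    in_range = age_min <= reported_age <= age_max
    return ages_match and in_range and (in_range_claim == in_range)

# A consistent respondent passes; one whose birth year implies age 35
# while claiming to be 20 is flagged as inconsistent.
print(age_responses_consistent(20, 1999, True, survey_year=2019))
print(age_responses_consistent(20, 1984, True, survey_year=2019))
```

A check like this would run before payment, routing flagged attempts to the end of the survey as the abstract describes.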
Of 604 total attempts (including participants who attempted the survey multiple times), 39 were deemed usable, a 6.5% usable return rate. On average, VPS participants attempted the survey four times, while non-VPS participants attempted it twice. The most frequent reasons for non-payment were failure to meet demographic criteria (e.g., out of age range, additional sensory diagnoses) and multiple attempts with inconsistent responses. Over fifty percent of response attempts were deemed unusable within the first seven demographic questions. Twenty percent of non-VPS respondents reported having an ASD diagnosis in one version of the survey and then reported not having one in a different version, indicating that participants switched their answers to meet inclusion criteria. VPS respondents used fewer words and gave nonsensical short-answer responses to open-ended questions (see Table 1 for an example).
Despite the inclusion of several screening criteria, usable return rates were low, and demographic deception occurred at high rates. Thus, research conclusions based on MTurk samples should be interpreted with caution, as traditional screening methods may not be adequate for screening out low-quality or invalid responses. Consistent with previous recommendations, open-ended responses within surveys can assist in flagging problematic responses (Dennis, Goodson, & Pearson, 2018).