It’s amazing what some people will do in order to make a buck-fifty.
Two recent studies have brought to light how sophisticated panel fraud has become. There are still the old-fashioned frauds: the people who respond “good” or “lkjlgadlkj” to open-ends, speeders, liars, straightliners, etc. We covered all this in our report Still More Dirty Little Secrets of Online Panels.
But in a recent panel study, we discovered it’s becoming much worse.
Working with a variety of panels on a low-incidence study, we asked charitable donors two open-ended questions:
- Which organization do you give the most to?
- What made that organization rise to the top of your giving?
Our First Hint
From one well-known panel, we received responses that just didn’t feel right. The organizations people named were all legitimate brands, but the names were formal and complete. We’re used to seeing “Goodwill” or “Red Cross,” but most people don’t write in the full official names “Goodwill Industries International” or “American National Red Cross.” Yet we saw names such as Young Men’s Christian Association and 1000 Missionary Movement North America, Inc. Odd, but not definitive proof of a problem.
A Bit Too Good
The verbatims were well written and complete. In fact, a bit too well written and complete – no slang, no typos, no punctuation problems, no partial sentences. In other words, not the way people write (particularly when completing surveys).
We also noticed many respondents weren’t directly answering the question. We asked what made the organization rise to the top of their giving, but we got details about the organization’s work. For instance, “To promote and teach spiritual growth and development through mentoring and education of Christian based programs.”
Complete and articulate, and certainly it could be the reason someone would support this organization. But too many responses were too perfect.
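For what it's worth, some of these "too perfect" signals can be checked mechanically before anyone reads a single verbatim. Here's a minimal Python sketch that flags suspiciously polished open-ends; the specific heuristics and thresholds are our own illustrative guesses, not a validated screen:

```python
import re

def looks_too_polished(text: str) -> bool:
    """Flag open-ends that read like marketing copy rather than how
    real respondents write. Heuristics and cutoffs are illustrative."""
    text = text.strip()
    if not text:
        return False
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s]
    signals = 0
    # Proper sentence case with terminal punctuation.
    if text[0].isupper() and text[-1] in ".!?":
        signals += 1
    # Every sentence is fully formed (real respondents write fragments).
    if sentences and all(len(s.split()) >= 6 for s in sentences):
        signals += 1
    # No informal markers (a tiny illustrative list; extend as needed).
    informal = {"lol", "idk", "dont", "cant", "im", "gonna"}
    words = {w.lower().strip(",.") for w in text.split()}
    if not words & informal:
        signals += 1
    # Require all three signals before flagging.
    return signals == 3
```

A flag here isn't proof of fraud, of course; it just tells you which responses deserve a closer human look.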
Scraped from the Web
We did a Google search and found the reason this response was so complete and articulate: it came verbatim from the website of an organization called 222 Ministries.
We discovered many other verbatims scraped directly from websites. All came from respondents 30 to 49 years old, and most of them had exact, official names such as Goodwill Industries International. Further, many of them started with numbers (e.g. 222 Ministries, 21st Century STEM Foundation).
Apparently, someone designed a bot to scrape websites for official-sounding organizational names and descriptions and populate our survey over and over with this information.
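If you suspect scraping, the check we did by hand with Google can be semi-automated: collect the public website copy for the organizations respondents name, then look for word-for-word matches. A minimal sketch, assuming you have already assembled a `site_texts` dictionary of page copy yourself (the names and data structure here are hypothetical):

```python
def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial edits don't hide a match.
    return " ".join(text.lower().split())

def scraped_matches(verbatims, site_texts):
    """Return (verbatim, organization) pairs where the open-end appears
    word-for-word inside a page in site_texts, a hypothetical dict of
    {organization name: page copy} that you assemble yourself."""
    pages = {org: normalize(copy) for org, copy in site_texts.items()}
    hits = []
    for v in verbatims:
        nv = normalize(v)
        for org, page in pages.items():
            if nv and nv in page:
                hits.append((v, org))
                break
    return hits
```

Exact substring matching only catches the laziest copy-paste; a fraudster who lightly rewords the copy would slip past it, so treat this as a first pass, not a complete defense.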
Easy to Overlook
Truly frightening is how realistic and diverse those responses were. They qualified for a very specific, low-incidence study. All other responses were realistic and made sense (e.g. they weren’t claiming to earn $70,000 while giving $30,000 to charity). Somehow, they learned how to qualify for a low-incidence study in order to commit fraud.
Although the panel company claims to do digital fingerprinting and ensure panelists only have one account, the same responses came from people with different demographics, including different locations. Consider three “respondents” naming 21st Century STEM Foundation:
- A 41-year-old Minnesota white male
- A 40-year-old Oregon American Indian male
- A 45-year-old Washington black male
All three verbatims were identical, yet for all its “quality control” the panel is allowing this one person to complete surveys under multiple identities. Among respondents earning under $50,000 a year, the chances this organization would be named even once are low. Three times would require divine intervention.
Another panel gave us 24 “qualified” responses from one person who used an identical gibberish answer – and these responses supposedly came from males and females, different ages, different races, and three different states.
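Catching this particular trick doesn't require anything fancy: group open-ends by normalized text and flag any answer that appears word-for-word under more than one respondent ID. A minimal sketch (the `(respondent_id, open_end)` record format is illustrative):

```python
from collections import defaultdict

def duplicate_verbatims(responses):
    """Return open-end answers given word-for-word by more than one
    respondent ID. `responses` is a list of (respondent_id, open_end)
    pairs; the field layout is illustrative."""
    groups = defaultdict(set)
    for rid, text in responses:
        # Normalize case and whitespace so trivial variants still collide.
        key = " ".join(text.lower().split())
        if key:
            groups[key].add(rid)
    return {text: ids for text, ids in groups.items() if len(ids) > 1}
```

Run against the respondent file, not just within a single quota cell, since these duplicates showed up across supposedly different demographics.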
Denying There’s a Problem
What did the panel company (that on their home page promises “industry-leading data quality”) have to say about this? “Our quality team is questioning some of the bad data open ends, as those completes pass a lot of our criteria (well written, no quality flags). I really think the issue here is that there are a decent amount of good completes being thrown out ‘just to be safe.’”
In other words, “We are oblivious to fraudulent respondents who pass our obviously insufficient ‘quality checks’ and you’re just a pain in the neck.” Yes, the responses are well written and have no quality flags – they were written by marketing professionals to be included on each organization’s website!
Not an Isolated Case
I spoke with one of our field partners, Joey Harmon at Harmon Research (they were not involved with our study). Joey related a similar experience from a skin care study.
“We asked a question to panelists about why they prefer a specific skin care product. The responses were extremely detailed and well written, to the point where initially we were really impressed with the quality. But that also raised some suspicions when we noticed that over 70 respondents out of 200 completes were of this nature, and some of the language we were getting just wasn’t how consumers write.”
Here are two examples of the verbatims:
- “It helps evaluate the body’s oxidative stress status and antioxidant reserve.”
- “A prolonged state of oxidative stress speeds up the skin aging process. Specifically, it contributes to the loss of collagen and elastin fibers, resulting in fine wrinkles, sagging, and texture changes.”
When they did a little exploring, they found these sentences were scraped directly from the Internet.
Not only do we have to guard against poor quality open-end responses – now we have to guard against open-end responses that are too good!
In Harmon’s short questionnaire with 40% incidence, respondents were probably getting paid $1.50. So those 70 responses likely netted someone a whopping $105 for all their time programming the bot. Imagine what fraudsters will do for a B2B study paying $10.
On our 3% incidence study with an extremely specific population, we rejected 58% of respondents for duplicates, multiple problems, and/or obvious fraud, including extremely sophisticated fraud that was hard to identify.
Given the panel company’s response, what do you think the chances are that these people were removed from the panel? No, they’re most likely all still there, waiting to give you some high quality verbatims on your next study.
The Moral of the Story
Unless you are searching through your respondents with a fine-toothed comb, by hand, one by one, looking for every possible problem, oddity, or unusual pattern, and unless you’re familiar enough with the product category to catch these issues, the chances are very good that you are getting taken for a ride.
Throwing in a “Please choose the third option” and tossing out speeders won’t cut it. Quality control algorithms and digital fingerprinting won’t cut it. And, as our experience shows, depending on panel providers won’t cut it.
If you’re doing quantitative research, you have three choices:
- Trust what you’re getting is “industry-leading data quality” even though a large proportion of it is demonstrably garbage, and provide it to your clients so they can make wrong decisions from wrong data
- Commit to this level of data quality and invest the time and effort necessary to get there, no matter how painful it is
- Stop using panels
I doubt most of us will choose number three, as we still have quantitative work to do. I sincerely hope few choose number one…if this option remains popular, it will eventually mean the death of any remaining industry credibility.