Diederik Stapel was a controversial Dutch psychology professor, well known for his frequent media presence. He produced many eye-catching social psychology papers that attracted press attention; one of his many headline-making papers purported to show that an untidy environment increased people’s racist tendencies. From 2006 he was based at Tilburg University, where he was appointed a dean of faculty in 2010. In September of the following year he was suspended by the university after junior researchers in his laboratory made allegations of data fabrication. He admitted that he had used faked data when it became clear that some of the scenarios he had described were impossible. He returned his PhD to the University of Amsterdam before an investigation into his thesis data had been completed. A series of reports commissioned by three of his past employers found that many of his papers, his own PhD thesis and many of the PhD theses that he had supervised contained fabricated data. In June 2013 he reached a plea bargain with Dutch prosecutors and was sentenced to 120 hours of community service and some financial penalties. In 2013 an interview with Stapel by Yudhijit Bhattacharjee was published in the New York Times. This interview gives fascinating details of how he fabricated some of his published data and also gives some insight into his motivation for perpetrating these frauds.
Stapel was born in 1966 and graduated with distinction from the University of Amsterdam in Psychology and Communication Science in 1991. After graduation he spent some time in the USA at the Universities of Chicago and Michigan. In 1997 he was awarded a PhD with distinction by the University of Amsterdam, and his thesis won a prize from the Dutch organisation for social psychology (ASPO), of which he was president from 2002 to 2007. He remained at the University of Amsterdam until 2000, when he was appointed a full professor at the University of Groningen. In 2006 he moved to Tilburg University, where he was appointed Professor of Cognitive Psychology. He helped to set up the Tilburg Institute for Behavioural Economics Research (TiBER) and became its director in 2007. This is a research institute within the university that is:
“Devoted to studying the psychological processes underlying individual choice and economic decision making from an interdisciplinary perspective.”
On September 1st 2010 Stapel was appointed Dean of the Tilburg School of Social and Behavioural Sciences. Almost exactly a year later, in early September 2011, Stapel was suspended from his position at the university, and shortly afterwards he was dismissed for fabricating data.
In November 2012, Stapel published a book detailing his fraudulent activities and trying to explain or justify his actions. The book, entitled Ontsporing (translated as “derailed” or “off the rails”), is only available in Dutch. Copies of the book were leaked as PDF files on the web in an attempt to prevent him from making money from his fraudulent activities. He seems to have made a number of attempts to obtain payment for speaking about his activities, and even tried to appear in a short play based on his case. He has also appeared in a video in which he tells his fellow rail travellers his story, suggesting that he lost his moral compass but has now regained it. This video is one of a series under the umbrella of the TEDxMaastricht BrainTrain, said to be inspiring talks by interesting speakers given on the intercity train between Maastricht and Amsterdam.
The pseudo-star of Dutch psychology
Stapel produced at least 138 English-language papers, many of them in highly regarded psychology journals, and his findings often attracted the science correspondents of the popular media with their “sexy” themes: “meat eaters are more selfish”, “an untidy environment encourages racist attitudes”, “seeing someone crying increases empathy”, “encouraging people to think about capitalism increases greed”. His papers usually had clear, clean outcomes that supported the original hypothesis with statistically significant results. Many of the topics, like those listed, would be of at least fleeting interest to the ordinary newspaper reader, even one intuitively unconvinced of the value of the work and the general applicability of the findings.
Just a few days before he was suspended by Tilburg University, Stapel and two of his colleagues made newspaper headlines in Holland and elsewhere (e.g., Dutch Daily News of 30/8/2011) when a press release of unpublished findings suggested that “meat brings out the worst in people”. When people were made to feel insecure they were much more likely to choose steak from a choice of three dishes (steak, fish or omelette). When people were made to think of meat by being shown a picture of a steak, they were more likely to make selfish choices in sharing situations than those shown neutral pictures of a cow or a tree. After eating meat, people were said to feel less connected to others, lonely and unpopular. This study not only made newspaper headlines but was also picked up by societies promoting vegetarianism and animal rights. One of Stapel’s collaborators in this fictitious and never-published study was Professor Roosje Vonk of Radboud University. Although not personally responsible for producing any fabricated data, she was reprimanded for publishing premature conclusions related to data that she had not verified or collected. She later apologised for releasing this fabricated data; she wrote on her blog that the affair
“Shows how even us psychologists can completely misjudge people”.
In 2011 Stapel and a colleague published data in the prestigious journal Science which suggested that people are more racist in an untidy environment than in a tidy one. When cleaners were on strike at a Dutch railway station, he claimed to have set up an experiment to assess homophobic and racist attitudes amongst white travellers. He asked white passengers to fill out a questionnaire to assess their level of racism or homophobia in exchange for a small reward. They were invited to sit in a row of chairs, one of which was already occupied by either a white or a black man. The experiment was later repeated when the cleaners were back at work and the station was in its normal tidy state. In the paper it is claimed that questionnaire responses were more racist and homophobic in the dirty environment than in the clean one, and that in the dirty environment the white passengers chose seats further away from the black man. He performed a similar study in a neighbourhood that had been made deliberately untidy for the study, again comparing questionnaire responses in the neat and the untidy states, and also asked subjects to contribute to a charity called “Money for Minorities”. Again, people gave more racist questionnaire responses in the untidy neighbourhood and also contributed less in the charity collection.
In his interview with Bhattacharjee, Stapel says that in his early research career he was frustrated by the messiness and complexity of real data, which rarely produced unequivocal conclusions. Journals preferred clear-cut and potentially headline-generating results, so he set out to manufacture interesting and sexy data that would appeal to journal editors and to the popular media. He recounts how he set up an experiment to see how being exposed to an image of an attractive or an unattractive female would affect subjects’ ratings of their own attractiveness. His hypothesis was that those subjects exposed to the attractive image would rate themselves as less attractive than those exposed to the unattractive image. The data he collected when he actually ran the experiment did not confirm his hypothesis, and so he would either have had to abandon the experiments or repeat them. He decided instead to produce a set of results that would support his hypothesis. He played around with the ratings (on a 0-7 scale) until he produced a set of data that supported his hypothesis but in which the differences were not so large as to attract suspicion. He produced this data in a few hours spread out over a few days and then published the results in The Journal of Personality and Social Psychology in 2004.
In this interview he also describes an experiment in which he would try to test whether individuals consumed more when primed with the idea of capitalism. With a collaborator, he devised a questionnaire which contained some questions relating to capitalism and consumption but subjects were also presented with a mug full of M&Ms to eat whilst filling in their questionnaire. For half of the subjects the Dutch word for capitalism was printed on the mug but for the other half the letters were jumbled. The hypothesis was that subjects would eat more sweets from the mug with capitalism spelt out than when the letters were jumbled. He got a student to prepare the questionnaires and mugs of M&Ms and then took them away saying he would run the experiment at a high school where a friend worked. He went home and tested out roughly how many M&Ms he consumed whilst filling in the questionnaire to get an idea of what was a reasonable figure. He then built his data set around this measurement with those exposed to the word capitalism consistently consuming a few more than the other group.
As a final example, he and a colleague designed an experiment to test whether seeing someone crying would increase levels of empathy. Children were to be given pictures of a cartoon character to colour; in half of these the character had a neutral expression but in the other half it was shedding a tear. After completing the colouring task the children were to be given sweets and asked if they would be willing to share these with other children. The results generated by Stapel showed that children exposed to the crying character were more likely to share than those who had been shown the neutral image. Stapel and his colleague at Tilburg, Professor Ad Vingerhoets, prepared all the required material with the help of a research assistant, and Stapel took this away to collect the data “from a school where he said that he had contacts”. A few weeks later, Stapel provided Vingerhoets with a seemingly interesting and publishable data set purportedly obtained using these materials. However, Vingerhoets became suspicious when he asked Stapel for access to the raw data in order to write the paper and was told that it had not yet been entered on a computer, even though some quite sophisticated analysis had been conducted on it. Vingerhoets took advice on his suspicions but did not take any formal action; as far as I can tell, he never published any papers with Stapel.
Stapel’s faked studies often followed a pattern: collaboration in the design and preparation of materials and in writing up the data, but with the actual data collection “organised” by Stapel alone; e.g., he would tell graduate students that he knew of a school or college that would co-operate in collecting the data. In many cases he seems to have then gone home and constructed a set of data that would give clean and clear support for the hypothesis being tested. Co-authors would then collaborate in writing up the paper, and graduate students would incorporate the data into their theses. It is unusual practice for a research supervisor to collect data for a graduate student. The supervisor would normally give guidance on the design, collection and analysis of research data, but the student would usually collect and analyse their own data. Stapel’s unusual practices were generally accepted by his graduate students, who no doubt would have been reluctant to question the practices of a famous professor.
Stapel’s suspension and dismissal were triggered by the actions of three graduate students. They collected evidence of Stapel’s misconduct and then reported their suspicions to the head of the department, who forwarded them to the rector of Tilburg University. The rector interviewed Stapel and was not convinced by his response to these accusations. Stapel soon realised that some of the scenarios he had described would not stand up to scrutiny, and so he confessed his guilt. Earlier suspicions about some of Stapel’s data being too good to be true, and doubts about its origins, were not reported or were effectively suppressed by Stapel’s use of his power and prestige. During September 2011, three investigative committees were set up by the universities where he had spent most of his career:
- The Levelt committee, chaired by Professor Willem Levelt, set up by Tilburg University
- The Noort committee, chaired by Professor E. Noort, set up by the University of Groningen
- The Drenth committee, chaired by Professor P. J. D. Drenth, set up by the University of Amsterdam.
On 28th November 2012 these three committees produced a joint final report of more than 100 pages covering all aspects of this case and its wider ramifications. Many instances of definite or probable data fabrication were found amongst his published work, his PhD thesis and PhD theses that he had supervised; these are detailed in the next section. This report is freely accessible online. Whilst Dutch academics may feel ashamed of the actions of their countryman Stapel, they can feel considerable pride at the extremely open, transparent and thorough way his fraudulent activities were investigated and made known to the world.
In November 2011 Stapel voluntarily returned his PhD to the University of Amsterdam because he considered that his recent activities made him unworthy of this honour. The University of Tilburg decided to initiate a criminal prosecution of Stapel, and in June 2013 he reached an agreement with Dutch prosecutors. He was ordered to complete 120 hours of community service and to forgo some financial benefits from Tilburg University. He was found not to have defrauded the taxpayer because he had spent research grants on activities relating to research, like staff salaries, even though he had fabricated much of the data that these awards were intended to generate.
The committees that investigated Stapel’s activities were made up of no fewer than 18 senior academics, including five statisticians. They examined 138 research papers and 18 doctoral theses as well as some book chapters, review articles and Stapel’s own PhD thesis. In total they interviewed 93 people, including co-authors, ex-colleagues and Stapel’s ex-graduate students. They concluded that there was strong evidence of fraud in 55 research papers and in some chapters of 10 PhD theses. They also found that 4 reviews or book chapters co-authored by Stapel were partly based upon material in his fraudulent papers. They further concluded that two chapters of Stapel’s own PhD thesis showed strong evidence of fraud. Stapel admitted that he began to fabricate data around 2004, but they found strong indications that he had committed research fraud as early as 1996.
For each item deemed to contain fraudulent material there is a summary of the evidence. The papers where strong evidence of fraud was found have been retracted. The chapters of PhD theses with false data are listed to correct the scientific record, but none of the PhDs have been revoked. This humane decision was taken because these young people were unaware that they were being fed false data by someone who was supposed to be training them and whom they should have been able to trust. There was nothing to suggest that these research students knowingly colluded in the production of falsified data, and their theses also contained their own honestly generated data, although this was usually less interesting than the false data provided by Stapel. One unfortunate young woman was due to defend her thesis in October 2011, shortly after evidence of Stapel’s fraud became known. She voluntarily withdrew her thesis prior to this defence, and as far as I can tell she was still working towards a PhD in September 2014.
The report discusses the types of evidence that indicated that a publication contained fraudulent data:
- Stapel admitted fabricating data in many of his later publications. There was also evidence of fraud in some early work where he had denied fraud or was unsure whether fabricated data was present or not
- The experiments could not have been conducted as described in the paper; e.g., many of the studies conducted in schools, railway stations or amongst judges could not have been conducted in the manner described
- Some of the variables reported in papers were not included in the experimental instruments used
- When raw data was available, it was not consistent with the published data or showed strong statistical indications of fabrication
- Evidence provided by co-authors helped to confirm fraud
- Examination of the published data often showed strong indications of fabrication or manipulation.
Stapel’s data was often “too good to be true”: fluctuations were lower than was reasonable, effects were larger than reasonable, and there were no missing values or odd outlying results. There were also some odd, unexpected multivariate relationships, i.e. certain associations were fabricated, but this inadvertently led to unbelievable consequences when other associations were tested.
The committees concluded that whilst Stapel produced lots of interesting and headline-making data, he made no significant contribution to social psychology theory. For all his fame and media attention, the citation rates for his papers were modest and lower than those of some of his less well-known colleagues. According to the SCOPUS database, just six of his publications had received 50 or more citations. Of these, four have been retracted and one other was considered likely to contain fraudulent data. His most highly cited paper (cited 133 times) has not been retracted and no strong evidence of fraud was found, although as it was published in 1996 such evidence might be difficult to find. Although he may have had limited impact on social psychology theory, he did damage the reputations of the institutions where he worked and of his graduate students and co-authors. He also undermined confidence in the field of social psychology.
None of his co-authors deliberately collaborated in data falsification, but the committees did suggest that some had shown a lack of effective scientific criticism and accepted poor scientific practices. There were similar criticisms of those who had reviewed Stapel’s fraudulent publications. Warning signs that co-authors and reviewers failed to act upon were:
- Data and findings that were too good to be true
- Improbably large effects
- Lack of missing and deviant values
- Improbably low variability
- Impossible statistical values
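Some of these warning signs can be checked mechanically from the summary statistics reported in a paper. As a minimal illustrative sketch (not the committees' actual method; the scale bounds and function name here are assumed for illustration), the following Python function tests whether a reported mean is even arithmetically possible given integer responses on a bounded rating scale, the idea behind what was later formalised as the GRIM test:

```python
def mean_is_possible(reported_mean, n, scale_max=7, decimals=2):
    """GRIM-style sanity check: with n integer responses on a 0..scale_max
    scale, the true mean must equal (some integer total) / n. Return True
    if any achievable total rounds to the reported mean at the given
    precision; False means the reported mean is an impossible value."""
    return any(
        round(total / n, decimals) == round(reported_mean, decimals)
        for total in range(0, n * scale_max + 1)
    )

# With n = 20 a reported mean of 3.45 is achievable (69/20 = 3.45),
# but with n = 23 no integer total rounds to 3.45.
print(mean_is_possible(3.45, 20))  # True
print(mean_is_possible(3.45, 23))  # False
```

Simple arithmetic constraints of this kind, together with variance checks, are one way a statistician can flag "impossible statistical values" from nothing more than the means and sample sizes printed in a paper.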
Some editors and reviewers were said to have encouraged bad scientific practices:
- Removal of certain variables
- Leaving out conditions where no effect had been found even though it had originally been anticipated
- Insisting that retrospective pilot studies be conducted and then reported as if they had been conducted in advance.
Not infrequently, reviewers strongly favoured telling an interesting, elegant and compelling story, possibly at the expense of scientific diligence. Real experiments often generate messy data and do not tell a story that can be fully verified from the data produced. If reviewers and editors insist on clean data that tells a clear story, then authors may be tempted to resort to unsound practices to clean up their data and make their conclusions appear clearer.