Paper presented at the 3rd World Conference on Research Integrity,
Montreal, 5-8 May 2013
Pieter J. D. Drenth
Em. Professor VU University, Amsterdam
Hon. President All European Academies (ALLEA)
Since the public disclosure of Diederik Stapel’s fraud, and especially since the publication of the report Flawed science: the fraudulent research practices of social psychologist Diederik Stapel by the three committees commissioned by the three universities that had employed Mr. Stapel (University of Amsterdam 1995-2000, University of Groningen 2000-2005, and Tilburg University 2005-2012), reactions in the professional and public press have been manifold. Surprise and disbelief were soon overtaken by anger and vexation. Articles in the public press in particular (often on the front page) were outcries of indignation. Many of them ventured to analyse the psychological motives and personality of the scientific swindler. The national and international scientific press has been somewhat more reserved, but it too clearly denounced the dishonest practices exposed in the report. Even the few attempts to criticise the report, mostly by social psychologists who felt that the whole sub-discipline of social psychology had been brought into discredit, condemned Mr. Stapel’s misbehaviour without exception.
Still, these expressions of anger and frustration, however understandable they may be, are not very fruitful. It is much more meaningful to analyse in depth how this major case of scientific misconduct could have developed and continued unnoticed for more than 15 years.
Nature and extent of the fraud
First, the facts: in the Stapel case we are dealing with falsification and fabrication, not with plagiarism, the least harmful of the ‘big three’.
After a thorough investigation of all 137 official publications of Mr. Stapel, the three Committees concluded that in 25 of these articles data had been manipulated and that in 30 articles data had been fabricated (non-existent schools or never-examined samples of respondents). For another 10 articles fraud was judged highly probable (on the basis of statistical analysis). Almost half of Stapel’s published articles, therefore, were based on fraudulent data. Moreover, fictitious data were used in 10 of the 18 dissertations of doctoral students whom he supervised.
The Committees found no evidence that co-authors deliberately collaborated in the data falsification, although in many cases a more critical attitude could have been expected. Nor were the PhD students concerned aware that the data were fictitious, and in the opinion of the Committees there was no element of culpable ignorance. These findings therefore had no formal repercussions for their previously awarded doctoral degrees.
The damage which a deceit of this magnitude has caused is manifold:
Personal: co-authors, post-doctoral researchers and especially PhD graduates saw a great deal of their work go up in smoke; a sometimes large share of their publications suddenly became worthless.
Science, and especially social psychology, has suffered from the public exposure of such large-scale fraud; public scorn and mockery struck not only Mr. Stapel personally, but also social psychological research in general. A full recovery will probably take quite some time.
Universities where Mr. Stapel worked suffered reputational damage. It is praiseworthy that they nevertheless decided to get to the very bottom of the matter.
International journals have had to retract 65 articles; scientific journals dislike such retractions, since they tarnish their respectability.
External and internal funding agencies must regret that the funds allocated to Mr. Stapel were misused and were not granted to other applicants who would have deserved them more.
The pressing question now is how this serious infringement of research integrity could have taken place over such a long period, which factors fostered this violation of norms and standards, what was neglected in the immediate and wider scientific environment of the perpetrator, and how we could prevent such misconduct in the future. In short, which lessons can we learn from the Stapel case? In the following, a number of these lessons and recommendations are brought to the fore.
In cases where complaints or allegations of fraud are lodged, investigating committees usually restrict themselves to an investigation of the articles reported to be suspect or to a small random sample of the publications of the author in question. Often this can indeed deliver proof of fraud, on the basis of which appropriate measures can be taken against the fraudulent researcher. But in many cases more is at stake. Particularly when we are dealing with serious misbehaviour such as falsification or fabrication of data, there is no reason to believe that all the other publications of the same author are ‘clean’. In other words, other fraudulent articles and research reports could very well proliferate within the literature. This could lead to a waste of time and money, or to false progress, if other researchers were to build on these findings or if meta-analyses were based on these fabricated data. It could also have harmful or dangerous consequences, particularly in the applied sciences; think of medicine, where products or treatments based on fraudulent research could be life-threatening.
It is to the credit of the three universities that they gave the three committees a free hand to analyse all of Stapel’s publications. All his publications and the dissertations of the PhD students he had supervised were re-examined. Statisticians were hired to subject the data and designs in these publications to a rigorous (re)analysis. The results were summarised above. Admittedly, this was a labour-intensive and expensive operation, but the universities owed it to themselves and to the whole scientific community.
The Drenth Committee, which had to investigate Stapel’s possible fraud during his Amsterdam period (1995-2000), was faced with two difficulties: in the first place, all original research data were lost (understandably, after more than 12 years), and, secondly, there was no confession. In his own words: ‘I presented the data in the best light possible, but there was, as far as I can recall, no fraud during that period.’ Consequently, the Committee’s analysis had to focus exclusively on suspect irregularities and statistically highly implausible results. A number of such irregularity indicators were used. The most important one was contributed by Chris Klaassen, professor of statistics and member of the Drenth Committee. He developed a Bayesian formula indicating the probability that the reported data in the experimental and control groups had been manipulated, by comparing their composition and distribution with those of data that would have been obtained if the groups had been selected according to scientific principles. The resulting quotient is then an indication of ‘evidence of manipulation’.
Using these indicators, all of Stapel’s Amsterdam publications were judged on the probability of fraud on the following scale: not applicable – none – negligible – slight – relatively strong – strong. The last two categories were taken as sufficient ground for the judgement ‘evidence of fraud’. It has to be pointed out, however, that this ‘evidence’ implies a high degree of probability from a statistical perspective, but does not constitute legal proof of fraud. It remains to be seen whether this argument would hold water in a court of law if a fraudster were to challenge a university’s decision to nullify a granted degree on the basis of such ‘evidence’ of fraud.
In any case, I want to call attention to the wide applicability of the Bayesian formula. For any committee of enquiry, faculty or university board, or editorial board of a journal that has to deal with an allegation of fraud but has no verification data available, nor a confession from the accused author, this formula could prove helpful in reaching a conclusion.
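The report does not reproduce Klaassen’s formula itself, but the underlying idea, asking how probable the reported summary statistics would be if the groups had really been sampled honestly, can be illustrated with a small simulation. The sketch below is my own illustration, not the Committee’s actual method: it estimates the probability that honestly drawn groups would produce sample means as suspiciously close together as those reported. All function and parameter names are assumptions for this example.

```python
import random
import statistics

def closeness_prob(observed_means, n_per_group, pop_mean, pop_sd,
                   n_sim=20000, seed=1):
    """Estimate by simulation the probability that len(observed_means)
    honestly drawn groups of size n_per_group would yield sample means
    at least as close together (range no larger) as those observed.
    A very small probability flags reported means that are 'too similar'
    to have arisen from random sampling.  Illustrative sketch only; not
    the Klaassen formula from the Stapel report."""
    rng = random.Random(seed)
    k = len(observed_means)
    observed_range = max(observed_means) - min(observed_means)
    hits = 0
    for _ in range(n_sim):
        sim_means = [
            statistics.fmean(rng.gauss(pop_mean, pop_sd)
                             for _ in range(n_per_group))
            for _ in range(k)
        ]
        # Count simulated samples whose means are at least as tightly
        # clustered as the reported ones.
        if max(sim_means) - min(sim_means) <= observed_range:
            hits += 1
    return hits / n_sim
```

For instance, three reported condition means of 5.00, 5.01 and 5.02 with n = 20 per group and a population SD of 1.0 yield a probability well under one percent, whereas widely spread means raise no flag. As in the report, such a result is statistical evidence of manipulation, not legal proof.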
Reactions of universities and research institutes
The exposure of Mr. Stapel’s scientific fraud sent a shock wave through the universities and research institutes in the Netherlands (and abroad). This resulted in the development or reinforcement of a number of measures and regulations to foster a climate of scientific integrity at national and institutional levels, including:
the acceptance and publication of codes of conduct, rules of good practice, and well-defined procedures for cases of misconduct;
clear mechanisms for reporting misconduct, the appointment of integrity officers and integrity committees, protection of whistle-blowers, the use of electronic plagiarism detection systems;
the requirement to establish a system of data storage and archiving, while maintaining the accessibility of the data;
the requirement that researchers being appointed as employees sign a statement that they have taken cognizance of the code of conduct and will observe the current standards of responsible research;
insistence on the retraction of tainted publications by the journals concerned, preferably with mention of the reason for the retraction.
At the same time, universities realised, probably more than before, that they have an obligation to foster research integrity within the academic community through the teaching and mentoring of students and junior staff, and through senior staff setting an example in their own research. It was realised that responsible conduct should be a mandatory subject in courses on methodology and experimentation, and that students and staff members have a duty to take action if they observe or seriously suspect violations of the rules of responsible research.
Universities realised that these requirements also apply to international collaborative research, which has increased sharply during the last few decades. It is important that in such research international agreements on standards of research integrity are reached, and that agreed procedures are established to bring suspected deviations from these standards to the immediate attention of the responsible research leaders (as recommended, for instance, in the European Code of Conduct, ESF/ALLEA, 2011).
A discussion has also started on the desired alleviation of the emphasis on metrics (number of publications, citation index, number of citations in high-impact journals, h-index, etc.) in incentive systems and promotion policy, and on the search for additional ways of measuring quality. It is not unlikely that the ‘publish or perish’ culture has contributed to the prevalence of misconduct. A dysfunctional craving for ‘high scores’ leads to behaviour that crosses the limits of what is admissible (see also IAP, 2012).
The Committees found it too simplistic to dismiss their findings as a merely individual or local aberration. Mr. Stapel worked in at least three different places and in many different capacities. His work went through many hands: supervisors, doctoral examination committees, co-authors, colleagues, reviewers, editors. And there was hardly ever criticism, suspicion or mistrust within the peer community.
Add to this that, as an additional yield of the thorough analysis of all of Mr. Stapel’s publications, many indications of sloppy science came to light; not straightforward fraud, but all kinds of infringements of the rules of proper science, including increasing sample sizes until significant differences are found, deleting the outcomes of experiments that produced no confirming data, removing unwelcome observations with post-hoc justifications, statistical defects, lapses in experimental testing, incomplete or incorrect reporting, etc. In short, many manifestations of what is called ‘verification bias’. The Committees found it striking that none of the co-authors, reviewers or colleagues traced these statistical and experimental shortcomings.
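The first of these practices, increasing sample sizes until significant differences are found, is not a harmless shortcut. A small simulation (my own illustration; the function name and all parameters are assumptions, not taken from the report) shows how such ‘optional stopping’ inflates the false-positive rate far above the nominal 5%, even when both groups come from exactly the same population.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def optional_stopping_fpr(n_start=10, n_max=50, step=5, alpha=0.05,
                          n_studies=2000, seed=7):
    """Simulate a researcher who keeps adding subjects and re-testing
    until a 'significant' difference appears.  Both groups are drawn
    from the SAME normal distribution, so every significant result is
    a false positive.  Returns the false-positive rate over n_studies
    simulated studies."""
    rng = random.Random(seed)
    norm = NormalDist()
    false_positives = 0
    for _ in range(n_studies):
        a, b = [], []
        significant = False
        while len(a) < n_max and not significant:
            batch = n_start if not a else step
            a.extend(rng.gauss(0.0, 1.0) for _ in range(batch))
            b.extend(rng.gauss(0.0, 1.0) for _ in range(batch))
            n = len(a)
            # Two-sample z-test with known SD = 1, repeated after
            # every batch of added subjects.
            z = (fmean(a) - fmean(b)) / sqrt(2.0 / n)
            p = 2.0 * (1.0 - norm.cdf(abs(z)))
            significant = p < alpha
        if significant:
            false_positives += 1
    return false_positives / n_studies
```

With nine looks at the data (n = 10, 15, …, 50 per group), the simulated false-positive rate comes out well above the nominal 5%, which is precisely why this practice belongs under ‘verification bias’ rather than mere sloppiness.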
It is clear that a sharpening of control mechanisms is needed. This could be pursued, among other things, by:
increasing the alertness and sensitivity of supervisors, collaborators and reviewers to possible violations of research integrity and signs of verification bias;
proper storage and management of the data in order to make it possible to carry out replication studies without too much trouble;
more appreciation of and support for replication and verification studies;
emphasizing the full responsibility of co-authors for the whole publication, unless explicitly stated otherwise;
making sure that the supervision of doctoral students also involves checking and assessing the quality of the original data; supervisors must be certain that the data have been collected, coded and processed honestly and responsibly;
stressing transparency, and the obligation to share and discuss findings with colleagues and, if possible, research group members;
never allowing data collection and interpretation to be an isolated, individual and uncontrolled activity;
urging scientific journals to deal with the problem of ‘publication bias’ (Fanelli, 2012), which paints an overly rosy picture of the state of affairs in a research field, discourages high-risk projects and tempts researchers to lapse into verification bias or, even worse, to falsify or fabricate their data.
In the European Code of Conduct, mention is made of ‘minor misdemeanours’: some ‘adjustment’ of data or figures, omitting one or two unwelcome observations, summarising incorrectly, cutting a corner here and there; in short, much of what is called ‘verification bias’ in the Stapel report. These misdemeanours may not lead to formal allegations and investigations, as gross misconduct should, but they may be just as damaging, for two reasons: first, because of their probable frequency, and, secondly, because they may be a prelude to more serious malpractice. This is well illustrated by the development from bad to worse depicted in the Stapel report and confessed in Mr. Stapel’s own ‘coming out’ book (Stapel, 2012). The European Code of Conduct also maintains that such minor misdemeanours are unacceptable violations of the principles of scientific integrity: falsification in statu nascendi.
The positive side of the revelation and in-depth analysis of the deplorable Stapel case is a sharpened focus, in the Netherlands and abroad, on research integrity at all levels. If this contributes to fostering a prevailing culture of responsible research and robust management methods that ensure awareness and application of high standards, the examination committees’ major efforts will not have been in vain.
Drenth, P.J.D. (2013), Institutional responses to violations of research integrity. Paper at the COPE European seminar Publication ethics from student to professional. London, 22 March 2013.
Drenth, P.J.D., Levelt, W.J.M., & Noort, E. (2013), Flawed science? A rejoinder. The Psychologist, 26, 2, 80-81.
ESF/ALLEA (2011), The European Code of Conduct for Research Integrity. ESF (www.esf.org) / ALLEA (www.allea.org).
Fanelli, D. (2012), Negative results are disappearing from most disciplines and countries. Scientometrics, 90, 891-904.
Gibson, S. (2013), Flawed science? The Psychologist, 26, 2, 80.
InterAcademy Council (IAC)/InterAcademy Panel (IAP) (2012), Responsible Conduct in the Global Research Enterprise. www.interacademycouncil.net / www.interacademies.net, ISBN 9789069846453.
Levelt, W.J.M., Drenth, P.J.D., & Noort, E. (2012), Falende wetenschap: De frauduleuze onderzoekspraktijken van sociaal-psycholoog Diederik Stapel (Flawed Science: The fraudulent research practices of social psychologist Diederik Stapel). Report, Tilburg University, University of Amsterdam, University of Groningen.
Stapel, D. (2012), Ontsporing (Derailment). Amsterdam: Prometheus.