TY - JOUR
T1 - Can We Mathematically Spot the Possible Manipulation of Results in Research Manuscripts Using Benford’s Law?
AU - Lazebnik, Teddy
AU - Gorlitsky, Dan
N1 - Publisher Copyright: © 2023 by the authors.
PY - 2023/11
Y1 - 2023/11
AB - The reproducibility of academic research has long been a persistent issue, contradicting one of the fundamental principles of science. Recently, an increasing number of false claims have been found in academic manuscripts, casting doubt on the validity of reported results. In this paper, we use an adapted version of Benford’s law, a statistical phenomenon that describes the distribution of leading digits in naturally occurring datasets, to identify the potential manipulation of results in research manuscripts, using only the aggregated data presented in those manuscripts rather than the commonly unavailable raw datasets. Our methodology applies the principles of Benford’s law to analyses commonly employed in academic manuscripts, thus reducing the need for the raw data itself. To validate our approach, we employed 100 open-source datasets and accurately predicted (Formula presented.) of them using our rules. Moreover, we tested the proposed method on known retracted manuscripts, showing that around half (48.6%) can be detected. Additionally, we analyzed 100 manuscripts published in the last two years across ten prominent economics journals, with 10 manuscripts randomly sampled from each journal. Our analysis predicted a (Formula presented.) occurrence of results manipulation with a (Formula presented.) confidence level. Our findings show that Benford’s law, adapted for aggregated data, can be an initial tool for identifying data manipulation; however, it is not a silver bullet, and each flagged manuscript requires further investigation due to the relatively low prediction accuracy.
KW - anomaly detection
KW - first digit law
KW - results reproduction
KW - statistical analysis
UR - http://www.scopus.com/inward/record.url?scp=85178168147&partnerID=8YFLogxK
U2 - 10.3390/data8110165
DO - 10.3390/data8110165
M3 - Article
AN - SCOPUS:85178168147
SN - 2306-5729
VL - 8
JO - Data
JF - Data
IS - 11
M1 - 165
ER -