
As if political broadsides weren't enough to undermine public confidence in science, a deep-seated issue became apparent from within science itself in the 2010s: the replication crisis. Researchers began to realise that many published papers, especially in psychology and medicine, contained results that couldn't be replicated. This surfeit of bad science also undermined subsequent work that had been built on the faulty results.
But according to a new paper published in Advances in Methods and Practices in Psychological Science, psychology at least may have learnt its lesson. Its author, Duke University postdoc Paul Bogdan, parsed 2.4 lakh papers published between 2004 and 2024 to check whether the field had become more robust since the crisis unfolded. Bogdan focused on fragile p-values: results whose p-values fall between 0.01 and 0.05, barely clearing the conventional 0.05 cut-off for statistical significance. The larger the share of such values among a field's significant results, the shakier its evidence.
According to Bogdan’s analysis, the share of fragile significant results had dropped from 32% at the start of the crisis to 26%. He also found that the downward slide appeared in every major sub-discipline, suggesting a broad cultural shift toward sturdier work.
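For illustration, here is a minimal sketch in Python of how such a fragile share could be computed once p-values have been mined from papers. This is not Bogdan's actual pipeline; the function name, the default thresholds, and the sample values are hypothetical.

```python
# Minimal sketch (not Bogdan's pipeline): compute the share of
# significant results that are "fragile", i.e. whose p-values fall
# in the 0.01-0.05 window and so barely clear the 0.05 cut-off.

def fragile_share(p_values, lower=0.01, upper=0.05):
    """Fraction of significant p-values (p < upper) lying in [lower, upper)."""
    significant = [p for p in p_values if p < upper]
    if not significant:
        return 0.0
    fragile = [p for p in significant if p >= lower]
    return len(fragile) / len(significant)

# Hypothetical p-values extracted from a batch of papers
reported = [0.003, 0.048, 0.021, 0.0004, 0.012, 0.049, 0.007]
print(f"Fragile share: {fragile_share(reported):.0%}")  # -> 57%
```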

Sample size was a key driver. The median sample size climbed rapidly from 2015 while reported effect sizes inched downward. This pattern is expected: small studies tend to overestimate effects, because only the lucky, inflated results clear the significance bar, whereas bigger studies give truer but smaller estimates. Together, these trends pointed to rising statistical power across the literature.
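A toy simulation makes the logic concrete. Under assumptions of my own choosing (a true effect of d = 0.2 and publication of only positive, significant results; none of this is from the paper), small studies that reach significance report inflated effects, while large ones land near the truth.

```python
# Toy simulation (assumptions mine, not from the paper): when only
# significant results are published, small studies overestimate the
# true effect far more than large ones do.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
TRUE_D = 0.2  # true standardised mean difference (Cohen's d)

def mean_published_d(n_per_group, n_sims=5000):
    """Average observed effect size among studies reaching p < 0.05."""
    kept = []
    for _ in range(n_sims):
        a = rng.normal(TRUE_D, 1.0, n_per_group)
        b = rng.normal(0.0, 1.0, n_per_group)
        t, p = stats.ttest_ind(a, b)
        if p < 0.05 and t > 0:  # only positive, significant results "publish"
            pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
            kept.append((a.mean() - b.mean()) / pooled_sd)
    return float(np.mean(kept))

for n in (20, 80, 320):
    print(f"n = {n:>3} per group -> mean published d = {mean_published_d(n):.2f}")
# Small samples: published d far above the true 0.2; large samples: close to it.
```

In such a run, the n = 20 studies that happen to cross the threshold typically report effects of roughly triple the true value, whereas the n = 320 studies hover near 0.2: exactly the small-study inflation the article describes.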
Journals with higher impact factors and papers with more citations also tended to feature fewer fragile p-values, reversing a pre-crisis pattern in which splashy outlets often published weaker but more sensational findings.
Bogdan's analysis also revealed one curiosity: scientists at top-ranked universities still published slightly shakier numbers. He used text-mining to explain the mismatch: words tied to biology-heavy, clinically demanding studies were associated both with fragile results and with high-ranking institutions. Such projects are expensive, labour-intensive, and often ethically constrained, which makes large samples difficult to gather.
In sum, psychology appears to have tightened its standards even as some better-funded corners of the field remain under-powered because they’re tackling tough questions.
To rebuild public trust, scholars at large have recommended that research groups and journals adopt open-data policies and preregister studies (so even negative results are reported), and that governments better fund resource-heavy studies.
Published – June 15, 2025 06:00 am IST