What Peer Review Actually Does and Does Not Guarantee

Peer review occupies a position in public discourse somewhere between quality stamp and sacred rite. Understanding what it genuinely provides — and what it was never designed to do — matters both for evaluating scientific claims and for defending science honestly against bad-faith critics.

What peer review was designed to do

The modern peer review system emerged in its recognizable form in the mid-twentieth century, though editorial consultation is older. Its core function is gatekeeping for plausibility and presentation: reviewers, typically two to four experts in the relevant field, assess whether a submitted paper's methodology is sound, its claims are supported by its data, its citations are accurate, and its conclusions do not outrun its evidence.

That is a meaningful but limited task. Reviewers are not running independent experiments. They are reading a manuscript, and they can only evaluate what appears on the page. They cannot detect fabricated data that has been skillfully disguised. They cannot know what the authors left out. They typically work unpaid, in their spare time, under no formal obligation to be thorough, and they are often drawn from a small pool of specialists who may have professional relationships, competitive or collegial, with the authors.

The system was built to filter out obvious errors and enforce disciplinary norms. It was not built to guarantee truth.

What it does not catch

The replication crisis, concentrated in psychology and medicine but visible across many fields, made plain that peer-reviewed publication is compatible with results that do not hold up. Studies passed review because their statistical methods were orthodox at the time, even when those methods permitted substantial false-positive rates. P-hacking — running multiple analyses and reporting only the significant one — leaves no visible trace in a manuscript. Small sample sizes, underpowered designs, and outcome switching are easy to miss when a reviewer has no access to the original data or pre-registered protocol.
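The arithmetic behind selective reporting can be made concrete with a rough sketch. It assumes that no real effect exists and that each analysis is statistically independent, so each p-value is uniform between 0 and 1. Real analyses of a single dataset are usually correlated, so the figures are illustrative rather than exact. The function names and the 0.05 threshold are choices made for this example.

```python
import random


def familywise_false_positive_rate(num_analyses: int, alpha: float = 0.05) -> float:
    """Chance that at least one of several independent null tests has p < alpha."""
    return 1.0 - (1.0 - alpha) ** num_analyses


def simulate_selective_reporting(
    num_analyses: int,
    alpha: float = 0.05,
    num_studies: int = 20000,
    seed: int = 0,
) -> float:
    """Fraction of simulated null studies that could report a 'significant' result."""
    rng = random.Random(seed)
    reportable = 0
    for _ in range(num_studies):
        # Assumption: under a true null, each valid p-value is uniform on [0, 1).
        p_values = [rng.random() for _ in range(num_analyses)]
        # The author reports only the best-looking analysis.
        if min(p_values) < alpha:
            reportable += 1
    return reportable / num_studies


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        exact = familywise_false_positive_rate(k)
        simulated = simulate_selective_reporting(k)
        print(f"{k:>2} analyses: exact {exact:.3f}, simulated {simulated:.3f}")
```

Under these assumptions the chance of at least one spurious "significant" result is 5% with one analysis, about 23% with five, about 40% with ten, and about 64% with twenty. A reviewer who sees only the reported analysis cannot tell which of these situations produced it.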

Fraud is rarer but real. The cases that have come to light — in nutrition science, social psychology, anaesthesiology — were not caught by peer review. They were caught later, by failed replications, whistleblowers, or data forensics applied after publication. Peer review has almost no tools for detecting deliberate manipulation of datasets.

There is also a subtler problem: reviewer expertise is unavoidably imperfect. Journals try to match reviewers to papers, but at the frontier of any discipline the pool of genuine experts is small. Reviewers may be competent in the general field but unfamiliar with a specific technique. Highly mathematical sections may be skimmed. Novel methods may be accepted on the authors' authority.

Why it still matters

None of this is an argument for abandoning peer review. Critics who cite its failures to dismiss science wholesale commit a different error: they hold science to a standard of infallibility that no human epistemic process could meet, and they ignore the comparison class. Personal blogs, press releases, and political testimony pass through no expert filter at all, and preprint servers typically apply only light screening rather than expert review. Peer review, imperfect as it is, systematically excludes a large volume of work that is too poorly conducted or reported to meet even minimal disciplinary standards.

Beyond filtering, peer review performs a function that is easy to undervalue: it forces authors to anticipate objections. The knowledge that a manuscript will be read critically by expert strangers changes how it is written. Methods sections become more explicit. Claims become more hedged where hedging is warranted. This effect is real even when reviewers miss things.

Open peer review, registered reports, and post-publication review are genuine improvements being adopted by a growing number of journals. Registered reports — in which a journal commits to publish a study based on its design before results are known — directly address the problem of outcome-dependent publication. These reforms treat the system's weaknesses honestly rather than pretending they do not exist.

What this means for evaluating claims

A published, peer-reviewed paper is not a settled fact. It is an entry into the scientific record that has cleared a meaningful but fallible threshold. The right response is not skepticism toward science as such, but calibrated confidence that tracks the evidence behind any specific claim: the size of the effect, whether it has been independently replicated, whether the study was pre-registered, and whether the finding coheres with related work.

Single studies, however prestigious the journal, are hypotheses with some initial support. Systematic reviews and meta-analyses aggregate evidence across many studies. Findings replicated across independent labs are more robust still. Understanding these gradations is not a concession to anti-science sentiment; it is what taking science seriously actually requires.
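One way to see why a single study counts as initial support while a replication counts for much more is a simple application of Bayes' rule. The sketch below is illustrative only. The prior plausibility (10%), statistical power (50%), and significance threshold (5%) are assumed numbers, not estimates for any real field, and the calculation assumes the replication is fully independent and free of selective reporting.

```python
def probability_effect_is_real(prior: float, power: float, alpha: float = 0.05) -> float:
    """Bayes' rule: probability the effect is real, given one significant result.

    prior: plausibility of the hypothesis before the study (assumed value)
    power: chance a study detects the effect if it is real (assumed value)
    alpha: chance of a significant result if the effect is not real
    """
    true_positive = power * prior
    false_positive = alpha * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


if __name__ == "__main__":
    # Hypothetical inputs chosen for illustration.
    prior, power, alpha = 0.10, 0.50, 0.05

    after_one_study = probability_effect_is_real(prior, power, alpha)
    # An independent replication updates from where the first study left off.
    after_replication = probability_effect_is_real(after_one_study, power, alpha)

    print(f"After one significant study: {after_one_study:.2f}")
    print(f"After an independent replication: {after_replication:.2f}")
```

With these assumed inputs, one significant result moves the hypothesis from 10% plausible to roughly 53%, and an independent replication moves it to roughly 92%. Selective reporting of the kind described earlier effectively raises the false-positive rate above the nominal threshold, which would lower both figures.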

The strongest defense of peer review is also the most honest one: it is a human institution doing a hard job imperfectly, and the scientific community has been more willing than most human institutions to examine its own failures and redesign its processes in response.