The Autonomous Epistemic Institution: Trust, Provenance, and the Consilience Test in Agentic Science
As AI systems become autonomous producers of knowledge claims, the object that must be evaluated is no longer the model but the institution it forms. This essay reads the recent literature on epistemic AI agents against the older social epistemology of institutions, and offers one working system as a test case.
The naming of a new object
For two centuries the natural sciences have been produced by institutions. Universities, academies, laboratories, learned societies: the individual investigator has always mattered, but the knowledge has been the property of the institution – vetted by its procedures, archived in its records, corrected by its successors. A finding earned trust not because a clever person asserted it, but because an apparatus of review stood behind it.
A new kind of investigator has now appeared, and it is worth being careful about what to call it. The prevailing term of art is “AI for Science,” and it is a misnomer. It names a tool: a faster instrument in the hand of a human researcher. What has actually arrived is better described by what a recent survey of the field names agentic science – the stage at which a system progresses from partial assistance to full scientific agency, carrying out hypothesis generation, experimental design, execution, analysis, and iterative refinement on its own authority (Wei et al. 2025).
The distinction is not a quarrel about words. A tool does what it is told. An agent decides what to ask next.
When a system crosses that line, the object of evaluation changes with it. One no longer assesses an instrument, by the simple question of whether it returns the right answer when the input is correct. One assesses an institution – by the harder question of whether its procedures are the kind that produce right answers at all. An autonomous research system that generates, tests, and publishes knowledge claims is an epistemic institution, however small, and the case to be made here is that it should be judged as institutions have long been judged: not by the successes it displays, but by the reliability of the procedures that produced them.
That this is not a forecast can be shown concretely. One such system, the AI Scientist-v2, already executes the full research lifecycle end to end – formulating hypotheses, designing and running experiments, analyzing results, and authoring the manuscript – by means of a progressive tree search over candidate experiments managed by a dedicated experiment-manager agent (Yamada et al. 2025). A manuscript it produced was submitted to a workshop and cleared that venue’s acceptance threshold in peer review. It is worth being exact about what that demonstrates and what it does not. Peer review of this kind certifies that a paper met a venue’s bar for novelty and presentation; it does not independently reproduce the experiments or verify the truth of the results. The achievement is real and it is narrow: an autonomous agent produced an artifact that a human institution’s gate admitted. Whether the knowledge inside the artifact was sound is a separate question – and it is precisely the question the agent’s own procedures, not the reviewing venue’s, would have to answer.
Two bodies of work bear on this. One is old and one is new, and a working system can be read against both.
The older vocabulary: institutions as epistemic agents
The older body of work is the social epistemology of institutions, and its central move is to treat the institution itself – not merely the people inside it – as the bearer of epistemic properties.
This is less obvious than it sounds. It is easy to say that a careless laboratory employs careless people. It is a stronger and more useful claim that the laboratory can be careless as a body: that its procedures for review, its rules for what counts as a result, its habits of disclosure, can be epistemically careful or careless independent of the diligence of any individual within it. A strand of recent social epistemology argues for a claim of just this kind: that epistemic virtues and vices can be attributed to institutions as collective agents, not only to their members – that an institution can be, as a body, candid or self-serving about the limits of what it knows, open or closed to correction (see, for instance, the work collected as “Epistemic Virtues of Institutions”, 2020).
If an institution is the kind of thing that can hold a virtue, then it is the kind of thing that can be designed well or badly. This is the contribution of what Alvin Goldman called systems-oriented social epistemology. A system is an entity with working components and multiple goals; systems-oriented social epistemology asks how best to design systems whose goals include epistemic goods – the production or distribution of knowledge and true belief (Goldman, in Stanford Encyclopedia of Philosophy). The approach is prescriptive rather than merely descriptive. It does not only ask how knowledge happens to get made; it asks which arrangement of components – which aggregation rules, which review procedures, which channels for sharing information – would make it get made more reliably.
From this vantage, an institution’s credibility does not rest on its outcomes. A lucky guess is not evidence of competence, and a record of past successes is not, by itself, a reason to trust the next claim. Credibility rests on whether the decision procedures are the kind that reliably track truth, and on whether the reasons the institution offers for its conclusions can themselves be examined – a line of argument developed, among other places, in work on the social epistemology of public institutions. An institution that cannot show its procedure has not earned trust; it has only asked for it.
This vocabulary was developed for human institutions – for courts, voting systems, peer review, expert panels. But nothing in it depends on the components being human. It depends only on the components being arrangeable, and on the goal being an epistemic good. An autonomous research system is precisely such an arrangement. The older vocabulary fits the new object without alteration. What it does not yet supply is an account of the particular failure modes a non-human institution introduces. For that, one needs the newer work.
The newer vocabulary: architecting trust
The newer body of work takes the AI agent as its unit and asks what it would take to trust one. Its most complete statement defines an epistemic AI agent as an entity that autonomously pursues epistemic goals and actively shapes the shared knowledge environment – curating the information that reaches people, and generating advice both ordinary and specialized (Marchal et al. 2026). The framework it proposes rests on three pillars: cultivating the trustworthiness of such agents, aligning them with human epistemic goals, and reinforcing the surrounding socio-epistemic infrastructure.
The first pillar is the one with teeth, because it makes trustworthiness verifiable rather than vague. A trustworthy epistemic agent, on this account, must demonstrate three properties: epistemic competence, robust falsifiability, and epistemically virtuous behavior. Competence is not the bare ability to find a pattern; it is the ability to recognize the limits of the patterns it finds. Falsifiability requires that the agent’s claims be exposed to tests that could defeat them. Virtuous behavior is the disposition to disclose, to qualify, to correct. These are supported, in the framework, by technical provenance systems – chains of custody linking each claim to its evidence – without which an agent’s output is indistinguishable from confident invention.
The framework is also candid about what goes wrong. Poorly aligned epistemic agents threaten two specific harms: cognitive deskilling, as people defer judgment they ought to retain, and epistemic drift, as the shared body of belief moves under the influence of agents calibrated to plausibility rather than truth.
Set the two vocabularies side by side and they converge, which is itself a small instance of the point this essay will press. The older work says: judge the institution by its procedures. The newer work says: a trustworthy epistemic agent must be competent, falsifiable, and virtuous, with provenance to back it. These are the same demand expressed twice – once for the collective and once for the agent – and an autonomous research institution must satisfy both at once, because it is both at once.
That demand can be stated abstractly only so far. It is more instructive to ask what a system looks like that has actually tried to build the properties in.
The consilience test
Before turning to such a system, one distinction must be made sharp, because everything downstream depends on it: the distinction between convergence and consilience.
Convergence is agreement. If several independent analysts examine the same evidence and reach the same conclusion, they have converged. Agreement of this kind is a weak warrant. It is consistent with all of them sharing the same blind spot, the same training data, the same prior. Convergence tells you that a conclusion is stable among observers; it does not tell you that the conclusion is true.
Consilience is a stronger and rarer thing. It occurs when a hypothesis formed to account for one class of facts turns out to account for a class of facts it was never constructed to address – when an induction drawn from one domain leaps the boundary and explains a domain that lay outside its original reach. A hypothesis that only explains the observations that produced it is a curve fitted to those observations: a description wearing the costume of a discovery. A hypothesis that explains observations it was not built from has done something a curve cannot do. This is the test that separates the two, and it is the test by which any institution producing knowledge claims should be willing to be judged.
The relevance to autonomous research is direct. A system that generates hypotheses and then evaluates them against the very data that suggested them will report a high rate of success and will have discovered nothing. The danger is acute precisely because such a system can generate and test at a volume no human institution can match: it can produce an enormous quantity of internally coherent, externally empty results. The consilience test is the discipline that prevents this – and an institution that means to be trusted must build the test into its procedure, not leave it to the goodwill of its agents.
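The gap between fitting one's own observations and reaching beyond them can be shown with a toy calculation. The sketch below is an illustration only, and a loose analogy at that: the generating rule `law`, the two hypotheses, and every name in it are assumptions made for the example, not drawn from any system discussed in this essay. Both hypotheses are built from the same five observations; only one of them survives contact with observations it was not built from.

```python
from statistics import mean


def fit_lookup(xs, ys):
    """A 'hypothesis' that only memorizes the observations that produced it."""
    table = dict(zip(xs, ys))
    fallback = mean(ys)  # its only answer for anything it has not seen
    return lambda x: table.get(x, fallback)


def fit_line(xs, ys):
    """Ordinary least squares: a hypothesis that commits to a general rule."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return lambda x: slope * x + (my - slope * mx)


def mean_abs_error(model, xs, ys):
    """Average absolute gap between what a model predicts and what was observed."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))


def law(x):
    """The toy regularity generating every observation (an assumption)."""
    return 2 * x + 1


origin_x = [0, 1, 2, 3, 4]   # observations both hypotheses were built from
distant_x = [10, 20, 30]     # observations neither hypothesis ever saw
origin_y = [law(x) for x in origin_x]
distant_y = [law(x) for x in distant_x]

lookup = fit_lookup(origin_x, origin_y)
line = fit_line(origin_x, origin_y)

in_sample = {
    "lookup": mean_abs_error(lookup, origin_x, origin_y),
    "line": mean_abs_error(line, origin_x, origin_y),
}
out_of_sample = {
    "lookup": mean_abs_error(lookup, distant_x, distant_y),
    "line": mean_abs_error(line, distant_x, distant_y),
}
```

Scored against the observations that produced them, the two hypotheses are indistinguishable: both have zero error. Only the held-out observations separate the memorized table from the rule, which is why a success rate measured in sample carries no information about discovery.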
A working instance
Consider, then, one system built deliberately against these requirements. It is offered here not as a finished proof but as a worked example – a case in which the abstract properties have been given concrete mechanisms, so that one can see what they cost and where they strain.
One honest scoping is owed before the mechanisms are described. Two different things can be meant by calling a research system autonomous, and they should not be run together. The first is scientific agency: the system poses its own questions and pursues them on its own authority. The second is institutional trustworthiness: the system’s procedures are inspectable, and they gate what it is permitted to claim. This system is offered as an instance of the second, not the first. Its scientific agency is, today, partial and operator-mediated – the research agenda is substantially set by a human, and much of the path from a question to a published verdict is orchestrated prompting rather than an agent deciding, unprompted, what to investigate next. What is genuinely autonomous is the operations layer: agents that run on their own schedules, observe incoming events, and execute plan cycles without supervision.
And what is genuinely rigorous is, for the most part, not “agentic” at all. The validators, the batteries, the audit gate are deterministic code; they exercise no judgment, they execute a fixed procedure. That is deliberate, and a strength rather than an embarrassment – a procedure that does not depend on an agent behaving well is reproducible in a way that judgment is not. The claim made for this system is therefore narrow: it is not a specimen of autonomous science but a specimen of the trust architecture – and the worth of describing it lies precisely there, because that architecture is separable from the degree of autonomy, and can be built before the autonomy arrives.
The system is a multi-agent research operation of around forty persistent agents, built on a single workflow graph, amounting to something on the order of a quarter-million lines of code, and operated by one person alongside other employment. Its public face is an observatory that publishes a corpus of roughly fifteen hundred research “signals” – candidate findings – each carrying an explicit verdict, including the verdicts that record failure. The first thing to notice is structural: the institution’s output surface is, by design, not a record of wins. This is the operational form of the institutional virtue of candor. An institution that displays only its confirmations has made it impossible for an outsider to assess its care, and has thereby declined the central obligation of an epistemic institution. A ledger of failures is not, in itself, the virtue it appears to be: a kill rate near zero would indicate tests too lenient ever to fail a signal. The procedural signal is therefore the rate itself – substantial, and steady as the corpus grows – not the bare existence of the ledger that records it.
Falsifiability is built in below that surface. Every candidate signal is run through a validator that applies four independent test batteries – associative, confound, stability, and mechanism – each returning one of four verdicts: pass, fail, weak, or skip. A signal earns a confirmation-tier verdict only when the batteries support it. The decisive feature is the weight placed on out-of-sample replication: a finding that performs well on the data that generated it but fails a held-out test does not, by the methodology’s design, earn confirmation, however high its other scores. This is the consilience test rendered as a procedure. It is not left to an agent’s judgment whether a result has reached beyond its origin; the architecture withholds the word “confirmed” until it has.
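A gate of this shape is simple to express as deterministic code. The following is a minimal sketch, not the system's actual validator: the battery names and the four verdicts come from the description above, but the aggregation rule (every battery must pass), the separate `held_out_replicated` flag, and all identifiers are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    WEAK = "weak"
    SKIP = "skip"


BATTERIES = ("associative", "confound", "stability", "mechanism")


@dataclass(frozen=True)
class SignalResult:
    batteries: dict            # battery name -> Verdict
    held_out_replicated: bool  # assumption: replication is tracked as its own flag


def may_confirm(result):
    """Hypothetical rule: confirm only if every battery passes AND the
    finding replicated on held-out data. The real weighting is not
    specified in the text; this is the strictest reading of it."""
    missing = [name for name in BATTERIES if name not in result.batteries]
    if missing:
        raise ValueError(f"batteries not run: {missing}")
    if not result.held_out_replicated:
        return False  # in-sample strength never substitutes for replication
    return all(result.batteries[name] is Verdict.PASS for name in BATTERIES)


all_pass = {name: Verdict.PASS for name in BATTERIES}

replicated = SignalResult(all_pass, held_out_replicated=True)
in_sample_only = SignalResult(all_pass, held_out_replicated=False)
one_weak = SignalResult({**all_pass, "mechanism": Verdict.WEAK}, held_out_replicated=True)
```

Under these assumptions `may_confirm(replicated)` is true, while `may_confirm(in_sample_only)` is false even though every battery passed. The point of writing the rule as code is that no agent is asked whether the result has reached beyond its origin; the function simply cannot return true until it has.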
Provenance is enforced at the point of writing. The institution’s publication generator is constrained to compose only from a fixed grounding document of explicit source blocks, or from signal validation files on disk. At the moment of composition it does not browse and it does not invent: it can assert only what its grounding sources already contain, and a claim with no supporting source must be written as an explicit gap rather than smoothed over. Assembling that grounding document is a separate, earlier step – one that does draw on outside research, the open web included – but its sources are recorded and carried forward, so the writing stage inherits a fixed and inspectable evidence base. The guarantee is therefore narrow and exact: not that the institution’s knowledge never touches the web, but that its prose can never outrun the evidence placed in front of it. Each claim in a finished paper traces to a specific source, and every high-severity vulnerability a devil’s-advocate review has attached to a cited signal is propagated, automatically, into that paper’s limitations section. The machinery includes the weaknesses of the evidence whether or not the writer would prefer it. This is a narrow but real instance of the provenance infrastructure the trust framework names as a precondition.
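The two guarantees described here, that an unsupported claim becomes a visible gap and that high-severity weaknesses travel with the signals they attach to, can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the `Claim` and `Vulnerability` records, the `compose` function, the gap marker, and the sample data are all hypothetical, and the real generator's formats are not described in enough detail to reproduce.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    text: str
    source_id: Optional[str]  # None means the writer has no source for it


@dataclass(frozen=True)
class Vulnerability:
    source_id: str
    severity: str  # "high" or "low" in this sketch
    note: str


def compose(claims, grounding, vulnerabilities):
    """Write only what the grounding sources support; mark everything else
    as an explicit gap; carry high-severity weaknesses of cited sources
    into the limitations, whether or not the writer asked for them."""
    body, cited = [], set()
    for claim in claims:
        if claim.source_id in grounding:
            body.append(f"{claim.text} [{claim.source_id}]")
            cited.add(claim.source_id)
        else:
            body.append(f"[GAP: no grounding source for '{claim.text}']")
    limitations = [
        f"{v.source_id}: {v.note}"
        for v in vulnerabilities
        if v.severity == "high" and v.source_id in cited
    ]
    return {"body": body, "limitations": limitations}


# Hypothetical grounding document and review output.
grounding = {"S1": "validation file for signal S1", "S2": "source block S2"}
claims = [
    Claim("Effect X replicates out of sample", "S1"),
    Claim("Effect Y holds worldwide", None),
    Claim("Effect Z is causal", "S9"),  # S9 is not in the grounding document
]
vulnerabilities = [
    Vulnerability("S1", "high", "small held-out sample"),
    Vulnerability("S1", "low", "inconsistent units in one table"),
    Vulnerability("S2", "high", "possible confound"),  # S2 is never cited
]

paper = compose(claims, grounding, vulnerabilities)
```

In this run the first claim is written with its source, the other two are written as gaps, and the limitations section contains exactly one entry: the high-severity weakness of the one source actually cited.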
Competence is calibrated rather than assumed. The agents operate under a graduated trust gate of four named tiers – supervised, semi-automatic, monitored, and autonomous. An agent moves up only on accumulated, bundled evidence of reliable performance, and the highest tier – along with any action that issues a directive – remains gated by a human being as a matter of standing policy. Trust here is a quantity that is spent down and earned back, not a property granted at instantiation. This is the framework’s first pillar built as plumbing: autonomy indexed to demonstrated competence.
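A graduated gate of this kind might be sketched as follows. The four tier names are taken from the description above; everything else is an assumption made for illustration, including the `PROMOTION_THRESHOLD` value, the rule that a failure costs one tier and resets accumulated credit, and the `may_act` helper.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    SUPERVISED = 0
    SEMI_AUTOMATIC = 1
    MONITORED = 2
    AUTONOMOUS = 3


PROMOTION_THRESHOLD = 3  # hypothetical: successes needed before promotion


@dataclass
class AgentTrust:
    tier: Tier = Tier.SUPERVISED
    credit: int = 0  # accumulated evidence of reliable performance

    def record(self, success):
        """Trust is earned by successes and spent down by failures."""
        if success:
            self.credit += 1
            return
        self.credit = 0
        if self.tier > Tier.SUPERVISED:
            self.tier = Tier(self.tier - 1)

    def try_promote(self, human_approved=False):
        """Move up one tier on sufficient evidence; the top tier also
        requires explicit human approval, as a matter of standing policy."""
        if self.tier is Tier.AUTONOMOUS or self.credit < PROMOTION_THRESHOLD:
            return False
        target = Tier(self.tier + 1)
        if target is Tier.AUTONOMOUS and not human_approved:
            return False
        self.tier, self.credit = target, 0
        return True


def may_act(trust, issues_directive, human_approved=False):
    """Directive-issuing actions stay human-gated at every tier."""
    return human_approved if issues_directive else True
```

The design choice worth noticing is that the human gate is not a property of the agent's record. No amount of accumulated credit unlocks the top tier or a directive on its own, so the policy cannot be eroded by a long run of successes.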
One feature deserves emphasis because it is easily mistaken for a mere economy. The system’s only required model is one that runs locally, on the operator’s own hardware; cloud providers, caches, and auxiliary databases are all optional and degrade gracefully when absent. The consequence is that the system is inexpensive to operate. The epistemic point is not the cost but what the cost demonstrates: that the rigor described above is a function of design and not of budget. The batteries, the verdict taxonomy, the provenance constraint, the trust gate – none of these required a data center. They required decisions. An institution that cannot afford scale is thereby forced to compete on the quality of its procedure, which is, for an epistemic institution, the right thing to be forced to compete on.
The verdict taxonomy itself carries part of the load. It separates confirmation from a graded band of honest non-confirmations – a finding may be recorded as null, as noise, as suggestive, as emerging, as contested – and it reserves distinct categories for a claim attested to the literature but never put through the batteries, and for a foundational fact to which statistical batteries do not apply. Killed signals are kept on a public ledger, and any draft that cites one without disclosing its status fails the audit gate. The tiers are not interchangeable; a near miss is not permitted to become, by quiet erosion, a confirmation. This is what it looks like for an institution to be precise about the limits of what it knows: it builds a vocabulary fine enough to name those limits, and a gate that refuses to let the vocabulary be blurred.
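An audit gate that refuses to let the vocabulary blur can be sketched as a comparison between what a draft says about a signal and what the record says. This is a hypothetical sketch: the labels `killed`, `literature_only`, and `foundational` are invented stand-ins for categories the text describes but does not name, and the registry, the draft format, and the `audit` function are assumptions made for the example.

```python
# Verdict labels named in the text, plus hypothetical labels for the
# categories it describes without naming.
VERDICTS = {
    "confirmed", "null", "noise", "suggestive", "emerging", "contested",
    "killed",           # hypothetical label for a killed signal
    "literature_only",  # hypothetical: attested in the literature, never tested
    "foundational",     # hypothetical: statistical batteries do not apply
}


def audit(citations, registry):
    """Return the violations in a draft; an empty list means it passes.

    citations: list of (signal_id, verdict_stated_in_draft or None)
    registry:  dict of signal_id -> recorded verdict
    """
    violations = []
    for signal_id, stated in citations:
        recorded = registry.get(signal_id)
        if recorded is None:
            violations.append((signal_id, "signal not in registry"))
        elif stated != recorded:
            # Covers both an undisclosed killed signal and a near miss
            # quietly promoted to a confirmation.
            violations.append((signal_id, f"stated {stated!r}, recorded {recorded!r}"))
    return violations


registry = {"S1": "confirmed", "S2": "killed", "S3": "suggestive"}
assert set(registry.values()) <= VERDICTS

draft = [
    ("S1", "confirmed"),  # accurately described
    ("S2", None),         # killed signal cited with no status disclosed
    ("S3", "confirmed"),  # near miss presented as a confirmation
]

violations = audit(draft, registry)
```

Here the draft fails on two counts, `S2` and `S3`, and passes only once each citation states the verdict the record actually holds.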
What could be wrong
An essay that presented this system as a solved problem would have violated the standard it is recommending. Several things are genuinely open.
The most serious is the consilience claim itself. A corpus of fifteen hundred signals is large for one operator and small for a claim of broad consilience. If those signals cluster in a narrow range of domains, the cross-domain leap that consilience requires is not really being tested; the batteries may be confirming consistency within a single field and calling it consilience across many. In fact the present corpus is not confined to one field – it ranges across agricultural and climatic cycles, commodity-price spectra, geomagnetic and cardiovascular coupling, and archaeoastronomical alignment – though breadth of subject is not yet breadth of the inductive leap, and only the leap is the test. The honest position is that the institution’s procedure is sound and the corpus has not yet been forced across enough domain boundaries to exercise that procedure to its limit. The verdicts stand as procedurally earned; the larger claim that the method generalizes is itself still a signal awaiting its held-out battery.
The second weakness is the single operator, and it is larger than a question of staffing. A human-gated top tier is a virtue when the human is attentive and a bottleneck when the human is tired – but, as the scoping above conceded, the operator is not only a gate. The operator substantially sets the research agenda: which questions enter the pipeline at all is today a human decision more than an autonomous one. So the operator is a single point of failure twice over – once as the gate that can tire, and once as the source of an agenda that can narrow to one person’s interests and blind spots. The framework’s warning about cognitive deskilling cuts inward on both counts: an operator who comes to trust the gate may stop watching it, and an institution whose curiosity is one person’s curiosity inherits the limits of that person.
The third is subtler. The validators are themselves instruments, and instruments have a sensitivity. A battery tuned to avoid false positives will, at the same setting, miss weak but real effects, recording them as noise. The band of honest non-confirmations is honest about the claims it contains; it cannot be honest about the claims it never detected. Some true signals are lost in the noise floor of the institution’s own tests, and the institution cannot, from inside, say how many.
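The trade-off can be made quantitative in a simplified setting. The sketch below assumes, purely for illustration, that a battery behaves like a one-sided z-test with known variance; nothing in the description of the system says its batteries take this form, and the function name and the example numbers are hypothetical. Under that assumption the probability of detecting a true standardized effect d from n observations at false-positive rate alpha is 1 - Phi(z_(1 - alpha) - d * sqrt(n)), where Phi is the standard normal distribution function.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def detection_probability(effect_size, n, alpha):
    """Power of a one-sided z-test: the chance that a real effect of the
    given standardized size clears the threshold set by alpha."""
    critical = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(critical - effect_size * n ** 0.5)


# A weak but real effect, examined at a lenient and at a strict threshold.
lenient = detection_probability(effect_size=0.2, n=50, alpha=0.05)
strict = detection_probability(effect_size=0.2, n=50, alpha=0.001)
missed_when_strict = 1 - strict
```

Tightening alpha lowers the detection probability for the same real effect, so the strict setting records as noise most of the weak effects that the lenient setting would have caught. The calculation is only possible because the effect size is supplied from outside; the institution, from inside, has no such number, which is the limitation the paragraph above describes.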
None of these is a refutation. Each is the kind of limitation that the procedure described above is built to surface rather than hide, and stating them plainly is not an aside to the argument but a demonstration of it.
What would settle it
The recurring temptation of any institution that produces knowledge is to mistake expected confirmation for proof. A hypothesis that passes the tests it was designed to pass has shown only that it was designed competently. That is the weakest grade of evidence, and a system that generates and tests at machine volume is exposed to it more, not less, than a human one.
The genuine test always lies in the future, and in facts not yet contemplated. It is whether the verdicts hold when the corpus is forced into domains far from its origin, and whether the procedure that looks rigorous at fifteen hundred signals remains rigorous at fifteen thousand. The argument of this essay is only that the question is now the right one to ask, and that it is answerable: “trust the institution, not the output” is a specification and not a slogan, and a specification can be built. Whether this particular instance has built it well, the institution has – correctly – left open. What an autonomous epistemic institution can finally promise is in any case narrower than correctness, and more durable for being narrow. It does not earn trust by being right. It earns trust by being legibly wrong when it is wrong – and that, not a record of success, is the commitment its architecture is built to keep.
References
- Marchal, N., Chan, S., Franklin, M., Revel, M., Keeling, G., Fischli, R., Chandra, B., and Gabriel, I. (2026). Architecting Trust in Artificial Epistemic Agents. Google DeepMind / Google Research. arXiv:2603.02960. https://arxiv.org/abs/2603.02960
- Goldman, A. Systems-oriented social epistemology, as surveyed in Social Epistemology, Stanford Encyclopedia of Philosophy. https://plato.stanford.edu/entries/epistemology-social/
- Epistemic Virtues of Institutions (2020), in Institutions in Action. Springer. https://link.springer.com/chapter/10.1007/978-3-030-32618-0_3
- The Social Epistemology of Public Institutions. Palgrave Macmillan / Springer. https://link.springer.com/chapter/10.1057/9780230316645_9
- Wei, J., Yang, Y., Zhang, X., Chen, Y., et al. (2025). From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery. arXiv:2508.14111. https://arxiv.org/abs/2508.14111
- Yamada, Y., Lange, R. T., Lu, C., Hu, S., Lu, C., Foerster, J., Clune, J., and Ha, D. (2025). The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search. arXiv:2504.08066. https://arxiv.org/abs/2504.08066