The name Consiliences Institute is not an evocation - it is an operational commitment. It derives from William Whewell’s 1840 formulation of the consilience of inductions: a hypothesis earns credibility not when it accommodates the data that inspired it, but when it predicts, explains, or coheres with phenomena in domains it was never constructed to address. Whewell’s own example was Newton’s theory of gravity: built to account for falling bodies and the planetary orbits, yet confirmed by its ability to explain tidal patterns and the precession of the equinoxes - phenomena that played no part in its construction. The convergence of independent lines of evidence on a single mechanism is what he called consilience, and he named it to sharpen the distinction between confirmation bias and genuine discovery.

We apply this as a formal quality criterion, not a rhetorical posture. Every signal in the Observatory is evaluated against eight validation dimensions before receiving a verdict. Findings that pass become CONFIRMED; those confirmed by two or more genuinely independent research traditions are upgraded to CONSILIENCE. The Observatory currently tracks 707 signals across all domains. Of these, 459 carry some form of confirmed verdict, 68 are formally classified as noise, and 59 carry the Observatory’s highest confirmation tier - having passed all eight validation dimensions and demonstrated pattern confirmation across multiple independent datasets or research groups. We publish the kill list - the signals classified as noise - because a research programme that never kills anything is not doing science.
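As a sketch of how that tiering can be read (field names, thresholds, and the pass criterion for the lower confirmed tiers are illustrative assumptions, not the Observatory’s actual schema):

```python
from dataclasses import dataclass

DIMENSIONS = 8  # the eight validation dimensions described above

@dataclass
class Signal:
    name: str
    dimensions_passed: int       # validation dimensions cleared, 0..8
    independent_traditions: int  # independent research traditions confirming the pattern
    killed: bool = False         # formally classified as noise

def verdict(sig: Signal) -> str:
    """Illustrative tiering of the verdicts named above; not the real schema."""
    if sig.killed:
        return "NOISE"        # goes on the published kill list
    if sig.dimensions_passed == DIMENSIONS:
        # Passing all eight dimensions plus confirmation from two or more
        # genuinely independent traditions earns the highest tier.
        return "CONSILIENCE" if sig.independent_traditions >= 2 else "CONFIRMED"
    return "UNRESOLVED"       # still under evaluation

# e.g. verdict(Signal("example", dimensions_passed=8, independent_traditions=3)) == "CONSILIENCE"
```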


The research uses AI to execute the cross-domain parallel analysis. We are direct about why, and equally direct about what this does and does not mean.

Rigorous consilience testing at scale requires holding independent data streams from climatology, agronomy, epidemiology, economic history, and long-wave cycles simultaneously - not as metaphors for each other, but as data. What AI adds is the ability to run this in parallel across domains without the coordination overhead of specialist teams. That is a genuine operational advantage. It is not a philosophical transformation of the method.
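A minimal sketch of that parallelism, assuming hypothetical dataset paths and a placeholder run_validation routine rather than the Observatory’s actual pipeline:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical domain datasets; the paths below are placeholders.
DOMAIN_DATASETS = {
    "climatology":      "data/climate_series.csv",
    "agronomy":         "data/crop_yields.csv",
    "epidemiology":     "data/incidence_rates.csv",
    "economic_history": "data/price_series.csv",
}

def run_validation(domain: str, path: str) -> dict:
    """Run the same candidate-signal test against one domain's data stream."""
    # Placeholder: load the series, compute the test statistics,
    # and return a result record for the validation file.
    return {"domain": domain, "source": path}

def test_across_domains() -> dict:
    # The same hypothesis is tested against every domain at once,
    # as data rather than as metaphor.
    with ThreadPoolExecutor() as pool:
        futures = {d: pool.submit(run_validation, d, p)
                   for d, p in DOMAIN_DATASETS.items()}
        return {d: f.result() for d, f in futures.items()}
```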

The research agenda itself emerges from autonomous AI agent activity. The Observatory agents identify candidate signals, run validation pipelines, and surface findings without human pre-specification of what to look for. A human operator maintains the publication standards, applies the quality gate at the editorial stage, and decides what is released and what is killed. The AI is not executing a narrow brief; it is exploring a problem space within a framework designed to minimise its failure modes.
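One way to picture that division of labour, as a sketch with illustrative stage names rather than the Observatory’s actual workflow:

```python
from enum import Enum, auto

class Stage(Enum):
    CANDIDATE = auto()   # surfaced by an Observatory agent
    VALIDATED = auto()   # cleared the automated validation pipeline
    RELEASED = auto()    # published - an editorial decision
    KILLED = auto()      # rejected - an editorial decision

# Illustrative transition tables: agents identify and validate on their own;
# release and kill decisions are reserved for the human operator.
AGENT_MOVES = {(None, Stage.CANDIDATE), (Stage.CANDIDATE, Stage.VALIDATED)}
HUMAN_MOVES = {(Stage.VALIDATED, Stage.RELEASED), (Stage.VALIDATED, Stage.KILLED)}

def allowed(actor: str, current, target) -> bool:
    moves = AGENT_MOVES if actor == "agent" else AGENT_MOVES | HUMAN_MOVES
    return (current, target) in moves

assert not allowed("agent", Stage.VALIDATED, Stage.RELEASED)  # the editorial gate
```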

The risks this introduces are real and we document them elsewhere. The short version: AI systems will fabricate plausible-sounding statistics and cannot be trusted to originate quantitative analysis. Our structural guard: deterministic statistical code produces the validation files; the AI writes prose summarising results it did not generate; a second-pass validator checks the handoff against the raw data; human review is required before publication. The critical chain is data in, code validates, AI summarises, human approves. We have not closed the loop in the dangerous direction.
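A minimal sketch of that chain, with placeholder stages standing in for the steps just described (nothing here is the Observatory’s actual code); the structural point is that the AI only ever receives numbers computed upstream, and nothing is released without the downstream checks:

```python
def compute_validation_stats(raw_data):
    """Deterministic statistical code: produces the validation file (stub values)."""
    return {"n_obs": len(raw_data), "correlation": 0.0}

def ai_summarise(validation):
    """The AI writes prose about results it did not generate."""
    return f"Across {validation['n_obs']} observations, r = {validation['correlation']}."

def second_pass_check(draft, validation):
    """Second-pass validator: the draft must not contradict the validation file."""
    return str(validation["correlation"]) in draft

def release_pipeline(raw_data, human_approves: bool):
    """The critical chain: data in, code validates, AI summarises, human approves."""
    validation = compute_validation_stats(raw_data)   # 1. code, not AI
    draft = ai_summarise(validation)                  # 2. prose about given numbers
    if not second_pass_check(draft, validation):      # 3. handoff check
        raise ValueError("draft is inconsistent with the validation file")
    return ("PUBLISHED", draft) if human_approves else ("KILLED", draft)  # 4. human gate
```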


We do not report trends. We do not report correlations. We report patterns that have survived falsification attempts across independent domains - surrogate testing, era-splitting, devil’s advocate review, mechanism plausibility checks - and that converge on a single explanatory mechanism from research traditions that did not know they were testing the same hypothesis.
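Two of those tests can be made concrete. Surrogate testing asks whether the observed relationship beats what shuffled versions of one series produce; era-splitting asks whether it holds in each half of the record independently. A minimal sketch, using a simple permutation surrogate and plain correlation as the stand-in test statistic (time-series work typically needs autocorrelation-preserving surrogates; this is not the Observatory’s actual test code):

```python
import numpy as np

def surrogate_test(x, y, n_surrogates=1000, seed=0):
    """Fraction of shuffled surrogates whose correlation with y is at least as
    strong as the observed one; a small value argues against pure chance."""
    rng = np.random.default_rng(seed)
    observed = abs(np.corrcoef(x, y)[0, 1])
    exceed = sum(
        abs(np.corrcoef(rng.permutation(x), y)[0, 1]) >= observed
        for _ in range(n_surrogates)
    )
    return exceed / n_surrogates

def era_split(x, y):
    """Correlation in the early and late halves of the record; a surviving
    pattern should hold, with the same sign, in both eras."""
    mid = len(x) // 2
    return (np.corrcoef(x[:mid], y[:mid])[0, 1],
            np.corrcoef(x[mid:], y[mid:])[0, 1])
```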

If that standard produces few results, that is the point. The 8.3% top-tier pass rate — 59 of 707 candidate signals reaching the highest confirmation tier — is not a failure of ambition. It is a calibration. Whewell’s own criterion was stringent precisely because easy confirmation is worthless. The signals that survive are worth reporting because most do not.

The quality criterion does not distinguish between mainstream and heterodox sources. A finding that passes the eight-dimension framework is worth reporting regardless of whether the underlying question is fashionable or the source tradition is academically respectable. A finding that fails is not worth reporting regardless of how conventional its pedigree. The standard is the filter — not the field.