Methodology Taxonomy
This taxonomy defines the canonical strings (the eight statistical dimensions, the four batteries, the verdict tiers, the Whewell rubric, and the devil’s advocate process) used across every Consiliences signal, paper, bulletin, and dashboard. Anyone writing content that references a dimension, battery, or verdict, whether a human author or an agent, must use the exact string from this taxonomy. Validator scripts and audit gates grep for non-canonical variants and refuse them before they reach production.
This page is the public surface of the repository’s source-of-truth file. Drift in canonical strings across dossiers, validators, and audit gates is one of the easier ways to lose epistemic discipline; holding a single source of truth is the structural countermeasure.
The integrated validation framework, made up of the statistical battery, the devil’s advocate process, the nine-criterion Whewell rubric, and the audit gate, is collectively the Whewell Gate. The sections below define each component; “Whewell Gate” is the shorthand for all four working together, while the individual names (Whewell rubric, audit gate, and so on) continue to name the parts.
Section 1 — The eight dimensions
The Layer 1 statistical battery is a deterministic set of eight statistical tests applied by validator code against each candidate signal’s data. Each dimension targets a different failure mode. The order is canonical.
| # | Canonical name | What it targets |
|---|---|---|
| 1 | Bonferroni-corrected significance | Multiple-comparisons false positives |
| 2 | Effect size | Statistically-significant-but-trivial findings |
| 3 | Out-of-sample hold-out replication | In-sample overfitting |
| 4 | Mechanism plausibility (domain literature review) | Spurious correlation without causal substrate |
| 5 | Confound isolation through partial regression | Confounded associations |
| 6 | Directional falsification across regimes | Sign-flipping signals (regime instability) |
| 7 | Era and epoch stability | Era-specific artifacts |
| 8 | Phase-randomised surrogate specificity | Indistinguishability from structured noise |
These eight dimensions are the deterministic layer — reproducible from the dossier. Layer 2 (devil’s advocate) and Layer 3 (Whewell rubric) are scored by structured review and tracked separately.
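Because the order is canonical as well as the names, a check on a dossier's dimension list can compare both at once. The following is a minimal sketch, not the repository's validator: the function name `check_dimension_order` and the idea that a dossier exposes its dimensions as a plain list are assumptions made for illustration. Only the eight strings themselves come from the table above.

```python
# Canonical dimension names, in canonical order (copied from the table above).
DIMENSIONS = (
    "Bonferroni-corrected significance",
    "Effect size",
    "Out-of-sample hold-out replication",
    "Mechanism plausibility (domain literature review)",
    "Confound isolation through partial regression",
    "Directional falsification across regimes",
    "Era and epoch stability",
    "Phase-randomised surrogate specificity",
)


def check_dimension_order(reported):
    """Return a list of problems; an empty list means names and order match.

    `reported` is a hypothetical list of dimension names read from a dossier.
    """
    problems = []
    if len(reported) != len(DIMENSIONS):
        problems.append(
            f"expected {len(DIMENSIONS)} dimensions, got {len(reported)}"
        )
    # Compare position by position so a swap or a re-cased name is reported.
    for position, (got, want) in enumerate(zip(reported, DIMENSIONS), start=1):
        if got != want:
            problems.append(f"dimension {position}: {got!r} is not {want!r}")
    return problems
```

The comparison is exact on purpose: a re-cased or re-spelled name such as "Effect Size" is reported as a mismatch rather than silently accepted.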
Section 2 — The four batteries
The eight dimensions are organised into four named batteries. These battery names appear in each signal’s dossier validator_results slots.
| Roman | Canonical name | Slot prefix |
|---|---|---|
| I | Associative | battery_1_* |
| II | Confound | battery_2_* |
| III | Stability | battery_3_* |
| IV | Mechanism | battery_4_* |
Every battery slot in a dossier must record a verdict of PASS, FAIL, WEAK, or SKIP.
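The slot rule above can be sketched as a small check. This is an illustration under stated assumptions: it treats `validator_results` as a flat mapping from slot name to verdict string, and the slot suffixes used in the usage note are hypothetical. Only the four slot prefixes and the four verdict strings come from this page.

```python
import re

# The four verdicts a battery slot may record.
SLOT_VERDICTS = frozenset({"PASS", "FAIL", "WEAK", "SKIP"})

# Matches the slot prefixes battery_1_* through battery_4_* from the table above.
BATTERY_SLOT = re.compile(r"^battery_[1-4]_")


def invalid_battery_slots(validator_results):
    """Return {slot: value} for battery slots whose verdict is not canonical.

    Assumes `validator_results` maps slot names to verdict values; keys that
    do not start with a battery prefix are ignored.
    """
    return {
        slot: value
        for slot, value in validator_results.items()
        if BATTERY_SLOT.match(slot)
        and not (isinstance(value, str) and value in SLOT_VERDICTS)
    }
```

For example, given `{"battery_1_example": "PASS", "battery_2_example": "pass"}`, only the second slot is returned, because the lower-cased `pass` is not one of the four canonical strings.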
Section 3 — Verdicts
The verdict surface comes in two shapes: the user-facing six-tier set used on public surfaces and in reader-oriented copy, and the full dossier set — the larger canonical vocabulary that lives in each signal’s dossier. Both are valid. The user-facing set is the simplification; the dossier set is the operational truth. A mapping table connects them.
3a — User-facing six-tier set
Used on public surfaces, in essays, and in any reader-oriented copy. Lower-cased by design.
| # | Canonical string | Meaning |
|---|---|---|
| 1 | consilience | Strongest verdict: clears all three layers, with two or more independent research traditions converging on the mechanism |
| 2 | confirmed | Clears the statistical battery and at least six of nine Whewell criteria; primary citation tier |
| 3 | confirmed w/ caveats | Core finding holds with documented binding limitations |
| 4 | statistical | Passes the statistical battery; full three-layer review pending or under revision |
| 5 | suspended | Insufficient data for resolution; under continued monitoring |
| 6 | killed | Failed primary test or fatal confound identified; permanently retired |
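Since the user-facing strings are lower-cased by design, a check on reader-oriented copy can separate exact matches from near-misses. The sketch below is one possible approach, not the production validator: the function name `classify_tier` and the three return labels are assumptions. It applies only to user-facing copy, where an upper-cased dossier string would count as a variant.

```python
# The six user-facing tier strings, exactly as listed in the table above.
USER_FACING_TIERS = (
    "consilience",
    "confirmed",
    "confirmed w/ caveats",
    "statistical",
    "suspended",
    "killed",
)


def classify_tier(text):
    """Return 'canonical', 'variant', or 'unknown' for a candidate tier string."""
    if text in USER_FACING_TIERS:
        return "canonical"
    # Collapse case and whitespace to catch near-misses such as "Confirmed".
    normalised = " ".join(text.lower().split())
    if normalised in USER_FACING_TIERS:
        return "variant"
    return "unknown"
```

A result of `variant` marks a string that would read as a tier to a human but would not match an exact-string search, which is the kind of drift the taxonomy is meant to prevent.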
3b — Full dossier verdict set
The full dossier set is used in the dossier verdict and verdict_canonical fields. It is the larger canonical vocabulary of twenty-two strings and the operational truth behind the six-tier presentation set above. It splits the confirmed family into routing, strong, structural, statistical, weak, and caveated tiers; the suspended family into emerging, suggestive, contested, unscorable, prediction, and proof-of-concept; and the killed family into null, noise, falsified, inverted, and retired. Two strings sit outside the six-tier mapping: REDUNDANT (a bookkeeping tier; cite the canonical sister signal) and DEFINITIONAL (a foundational fact where the statistical batteries do not apply by design; see Definitional Anchors).
The full twenty-two-row table, with the user-facing tier and notes for each dossier verdict, lives on its own citable page: Verdict Tiers.
3c — Verdict tiers are not interchangeable
CONFIRMED_WEAK, CONFIRMED_WITH_CAVEATS, SUSPENDED, NULL, EMERGING, CONSILIENCE, SUGGESTIVE, CONTESTED, and NOISE are not CONFIRMED. A paper that treats them as equivalent fails the audit gate. When citing a signal in a public artifact, the dossier verdict is the truth; the user-facing tier is a presentation simplification.
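One practical consequence is that code should compare dossier verdicts by exact equality, never by prefix or substring. The sketch below illustrates the trap; the helper name `is_confirmed` is hypothetical, and it assumes the dossier stores the verdict as a plain upper-case string, as the examples on this page suggest.

```python
# Dossier verdicts that the paragraph above names as NOT equivalent to CONFIRMED.
NOT_CONFIRMED = frozenset({
    "CONFIRMED_WEAK",
    "CONFIRMED_WITH_CAVEATS",
    "SUSPENDED",
    "NULL",
    "EMERGING",
    "CONSILIENCE",
    "SUGGESTIVE",
    "CONTESTED",
    "NOISE",
})


def is_confirmed(dossier_verdict):
    """True only for the exact string CONFIRMED; no prefix or substring match."""
    return dossier_verdict == "CONFIRMED"
```

A prefix test such as `verdict.startswith("CONFIRMED")` would wrongly accept CONFIRMED_WEAK and CONFIRMED_WITH_CAVEATS, which is why the comparison here is strict equality.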
3d — Failure-mode presentation strings
The Killed Signals page describes each retired signal with a short, human-readable failure-mode tag — a label that says why the signal failed, in plainer language than a canonical verdict. These tags are presentation strings only: they are not verdicts, and validator code never reads them. The table below maps each failure-mode tag to its canonical verdict so the readable label and the operational truth stay tied together.
| Failure-mode tag | Canonical verdict | What it means |
|---|---|---|
| PURE NOISE | NOISE | Indistinguishable from phase-randomised structured noise |
| DIRECTIONAL INVERSION | INVERTED | Sign-flipped from the stated hypothesis; killed for regime instability |
| CONFOUNDED | KILLED | A fatal confound explains the association |
| ADAPTIVE RESPONSE | KILLED | The apparent signal is a system adapting, not the hypothesised mechanism |
| ERA-SPECIFIC | KILLED | Holds only inside one era or epoch; fails era-stability |
| UNDERPOWERED | KILLED or WEAK | Sample too small to resolve; KILLED verdict if the test failed, WEAK battery-slot verdict if inconclusive |
| PARTIAL | CONFIRMED_STATISTICAL | Battery passed but full three-layer review is pending or incomplete |
| WEAK 1/4 PASS | (battery-pass rate) | Not a verdict — a count of how many of the four batteries passed; reported alongside the dossier verdict |
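The mapping in the table above can be held as a lookup so that a page's readable tag can be checked against the dossier. This is a minimal sketch: the names `FAILURE_MODE_TO_CANONICAL`, `canonical_for`, and `tag_agrees` are assumptions, and the text above states that validator code never reads these tags, so a check like this would belong to presentation tooling only.

```python
# Failure-mode tag -> canonical strings it may map to (from the table above).
# UNDERPOWERED maps to either a KILLED verdict or a WEAK battery-slot verdict.
# WEAK 1/4 PASS is a battery-pass count, not a verdict, so it maps to nothing.
FAILURE_MODE_TO_CANONICAL = {
    "PURE NOISE": ("NOISE",),
    "DIRECTIONAL INVERSION": ("INVERTED",),
    "CONFOUNDED": ("KILLED",),
    "ADAPTIVE RESPONSE": ("KILLED",),
    "ERA-SPECIFIC": ("KILLED",),
    "UNDERPOWERED": ("KILLED", "WEAK"),
    "PARTIAL": ("CONFIRMED_STATISTICAL",),
    "WEAK 1/4 PASS": (),
}


def canonical_for(tag):
    """Return the canonical strings for a tag; raises KeyError if unknown."""
    return FAILURE_MODE_TO_CANONICAL[tag]


def tag_agrees(tag, canonical):
    """True if `canonical` is one of the strings the tag is allowed to map to."""
    return canonical in canonical_for(tag)
```

Raising `KeyError` on an unknown tag is a deliberate choice in this sketch, so that a new presentation string cannot appear without a row being added to the mapping.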
Section 4 — The devil’s advocate process
Layer 2 of validation is a structured counter-argument pass required for every signal that reaches a tentative CONFIRMED verdict. The reviewer must explicitly state the strongest case for why the signal might be a false positive: where the data is thin, where the mechanism requires an untested assumption, and what specific observation would refute the finding. The protocol is documented for every signal — whether ultimately published or not — and it interrogates both the statistical verdict of Layer 1 and the epistemic judgment of Layer 3. If the devil’s advocate pass kills a finding, it is reported as killed; findings are never inflated past what survives this layer.
Section 5 — The Whewell rubric
Layer 3 of validation is a nine-criterion epistemic rubric drawn from the historical philosophy of science and associated with William Whewell’s doctrine of consilience. It is scored by structured review of the full signal dossier, not by the Layer 1 batteries.
| # | Canonical key | Meaning |
|---|---|---|
| 1 | prediction | Did the hypothesis predict an observation before measurement? |
| 2 | consilience | Do independent data streams converge on the same mechanism? |
| 3 | mechanism_plausible | Is the proposed mechanism biologically or physically plausible? |
| 4 | mechanism_cited | Is the mechanism citation present in peer-reviewed literature? |
| 5 | falsifiability | Is there a specific observation that would refute the hypothesis? |
| 6 | specificity | Does the signal survive negative-control specificity tests? |
| 7 | reproducibility | Has the result been reproduced by independent code or personnel? |
| 8 | accuracy | Are the quantitative estimates accurate against external data? |
| 9 | generalizability | Does the result hold beyond the discovery sample? |
A Whewell score of six or more out of nine is the publication threshold for the CONFIRMED tier. A score of nine out of nine is required for CONSILIENCE.
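The two thresholds can be sketched as a count over the nine canonical keys. This illustration assumes each criterion is scored as simply met or not met, which the "six or more out of nine" wording suggests but this page does not spell out; the function names are hypothetical. The thresholds are treated as necessary conditions only, since the verdict tables above attach further requirements to each tier.

```python
# The nine canonical Whewell keys, in the order of the table above.
WHEWELL_KEYS = (
    "prediction",
    "consilience",
    "mechanism_plausible",
    "mechanism_cited",
    "falsifiability",
    "specificity",
    "reproducibility",
    "accuracy",
    "generalizability",
)


def whewell_score(criteria):
    """Count how many of the nine criteria are met.

    `criteria` is assumed to map each canonical key to True or False.
    A missing key raises ValueError rather than counting as unmet.
    """
    missing = [key for key in WHEWELL_KEYS if key not in criteria]
    if missing:
        raise ValueError(f"missing Whewell criteria: {missing}")
    return sum(1 for key in WHEWELL_KEYS if criteria[key])


def thresholds_met(score):
    """Report which Whewell thresholds a score satisfies."""
    return {"CONFIRMED": score >= 6, "CONSILIENCE": score == 9}
```

A dossier with six criteria met reaches the CONFIRMED threshold but not the CONSILIENCE requirement; one with five reaches neither.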
Section 6 — Artefact types
The network publishes several distinct content types. Each has a fixed role; the type is canonical so that readers and agents agree on what a given artefact is.
| Artefact type | What it is |
|---|---|
| Signal dossier | The internal source-of-truth record for one signal — data, batteries, verdict, and provenance. |
| Bulletin | A short, signal-level note marking a single result or update. |
| Paper | A long-form work synthesising multiple signals into one argument. |
| Essay | A long-form methodological reflection or retrospective. It argues about how the research is conducted (revision practice, epistemic limits, what a retest battery revealed) rather than presenting new empirical findings. Distinct from a Paper, which advances an evidentiary argument. |
| Lab note | A persona’s reflection on a specific verdict or amendment. |
| A&F Story | A news event examined through five or more interpretive lenses. |
| A&F Spark | A one-sentence persona response to a news item. |
| A&F Diary | An operator-direct post, written without persona attribution. |
| Phronopolis Article | A persona essay on a Phronopolis theme. |
| Phronopolis Dream | An autonomous, A-grade output produced without a human prompt. |
See Methodology for the framework these strings instantiate, and Epistemic Limits for the limits the framework does not overcome.
Source of truth: the canonical taxonomy file in the Observatory repository; this page is its public rendering.