Trading Battery Kills (April 2026)
Seven trading signals killed by the April 2026 five-test finding-validator battery.
Hypotheses the Observatory tested and formally retired.
A research programme that never kills anything is not doing science. It is collecting confirmations.
The clear failures documented below are the easy part. The harder editorial judgement — what to do with an r = 0.18, p = 0.03, surrogate-passing but small-effect signal — is not a kill and so does not appear in any list. That is where the effect-size weighting and the consilience-upgrade requirements do most of their work, and it is where the remaining epistemic risk lives. Read the kill list with that in mind: it documents the unambiguous retirements, not the marginal calls.
| Signal class | Kills / total | Kill rate |
|---|---|---|
| Corpus-wide (all evaluated hypotheses) | 289 of 1594 | 18.1% |
| Trading signals (April 2026 battery) | 7 of 16 | 44% |
| Ancient-knowledge subset | 26 of ~75 | 34.6% |
| Session 6 quantitative-measurement batch | 1 of 19 | 5.3% |
The trading-signal and ancient-knowledge rates run higher than the corpus-wide figure because they apply a stricter multi-test battery; the Session 6 batch runs lower because quantitative physical measurements survive translation. Each rate is discussed in context below.
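The rates in the table are simple ratios, and a minimal sketch such as the one below can be used to check them. The counts are copied from the table; the cohort keys and the `kill_rate` helper are hypothetical names introduced here. The ancient-knowledge total is approximate in the source ("~75"), so with exactly 75 the computed figure lands at 34.7% rather than the quoted 34.6%.

```python
# Kill counts and totals as quoted in the summary table.
# The ancient-knowledge total is approximate ("~75") in the source,
# so 75 is an assumption made only for this check.
COHORTS = {
    "corpus_wide": (289, 1594),
    "trading_signals": (7, 16),
    "ancient_knowledge": (26, 75),
    "session_6": (1, 19),
}


def kill_rate(kills: int, total: int) -> float:
    """Return the kill rate as a percentage of the cohort size."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * kills / total


rates = {name: kill_rate(k, n) for name, (k, n) in COHORTS.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")
```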
The Observatory has formally evaluated 1594 independent hypotheses. Of these, 289 have been permanently retired: killed after failing the primary test, succumbing to a fatal confound, or failing a formal multi-test battery that caught what single-pass validation did not. What follows is a selection of the more scientifically instructive kills.
Not all kills are equal. The cases here are tagged by failure mode; the tags used on this page are CONFOUNDED, PURE NOISE, ADAPTIVE RESPONSE, and UNDERPOWERED.
The corpus-wide kill rate across all evaluated hypotheses is 18.1% — 289 of 1594 signals retired. We consider that kill rate a quality signal, not a problem.
A selection note: the cases shown here are not the easiest kills. We highlight the hard cases — plausible mechanisms, coherent hypotheses, null results — because they better illustrate what the process is actually doing.
CONFOUNDED
Hypothesis: Elevated geomagnetic storm activity (Kp index, Dst index) drives measurable increases in equity market volatility (VIX) through physiological stress pathways in traders.
Why it seemed plausible: Laboratory studies show geomagnetic storms affect melatonin secretion and autonomic nervous system function. If this influences decision-making under stress, a market signal might follow.
What the tests showed:
A note on the literature: Some published papers, including a widely cited 2003 study, report geomagnetic effects on equity returns. The Observatory’s verdict diverges from these because our tests controlled explicitly for the solar/business-cycle confound (geomagnetic activity and economic volatility share a common driver in the 11-year solar cycle) and applied surrogate significance testing. The positive papers did not. We are not dismissing that literature; we are reporting what happens when those controls are added. The effect disappears.
Verdict: KILLED. The EM-to-financial pathway is one of the clearest cognitive bias artifacts in the Observatory: humans intuitively reach for a “cosmic forces affect human behaviour” explanation. The data consistently refuses it.
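The common-driver argument can be illustrated with a partial correlation: regress both series on the shared driver and correlate what is left. The sketch below is a synthetic illustration under stated assumptions, not the Observatory's actual test. The series names, the 132-month period standing in for the 11-year solar cycle, and the noise levels are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, z):
    """Residuals of y after least-squares regression on z (with intercept)."""
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    slope = sum((a - mz) * (b - my) for a, b in zip(z, y)) / sum(
        (a - mz) ** 2 for a in z
    )
    return [b - my - slope * (a - mz) for a, b in zip(z, y)]


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    return pearson(residuals(x, z), residuals(y, z))


# Synthetic data: two series that share a driver but have no direct link.
rng = random.Random(0)
n_months = 2000
driver = [math.sin(2 * math.pi * t / 132) for t in range(n_months)]
geomag = [d + rng.gauss(0, 0.5) for d in driver]      # hypothetical storm index
volatility = [d + rng.gauss(0, 0.5) for d in driver]  # hypothetical volatility index

raw = pearson(geomag, volatility)
partial = partial_corr(geomag, volatility, driver)
print(f"raw r = {raw:.2f}, partial r given driver = {partial:.2f}")
```

In this toy setup the raw correlation is substantial while the partial correlation sits near zero, which is the pattern a shared driver produces.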
PURE NOISE
The claim that the lunar cycle leaves a footprint in equity returns is one of the most persistent in alternative finance, and on the surface it is not absurd. A respectable body of academic papers has reported the effect; the proposed mechanism (moonlight disrupting sleep, sleep disrupting investor mood) has at least biological plausibility; and the 29.5-day cycle is short enough that decades of market data yield hundreds of observations to test against. None of that survives contact with the right control.
The decisive test is a surrogate one. When the observed return series is compared against randomised surrogate series carrying the same autocorrelation structure, the lunar signal returns a p-value of 0.531 — indistinguishable from noise. The published positive findings collapse for a related reason: they do not survive Bonferroni correction once you account for the number of markets and time windows that were searched before a “significant” pairing was found. And the mechanism itself does not hold up either, because modern indoor lighting has decoupled human melatonin production from actual moonlight for the better part of a century.
Verdict: KILLED. The lunar-equity literature is a multiple-comparisons artifact, and the surrogate control is what makes that diagnosis unambiguous rather than merely suspected.
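The surrogate logic described above can be sketched as follows. The source does not say which surrogate scheme was used, so this example swaps in circular time-shift surrogates, a simple scheme that preserves the autocorrelation of the return series exactly while breaking its alignment with the lunar signal. The function name, the AR(1) toy returns, and all parameters are assumptions for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def shift_surrogate_pvalue(signal, returns, n_surrogates=500, seed=0):
    """Empirical p-value for |r(signal, returns)| against circular-shift surrogates.

    Each surrogate rotates `returns` by a random offset, which keeps its
    autocorrelation intact but destroys any alignment with `signal`.
    """
    rng = random.Random(seed)
    n = len(returns)
    observed = abs(pearson(signal, returns))
    exceed = 0
    for _ in range(n_surrogates):
        k = rng.randrange(1, n)
        surrogate = returns[k:] + returns[:k]
        if abs(pearson(signal, surrogate)) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return (exceed + 1) / (n_surrogates + 1)


# Synthetic data: autocorrelated "returns" that are independent of the lunar phase.
rng = random.Random(42)
n_days = 1500
lunar = [math.cos(2 * math.pi * t / 29.5) for t in range(n_days)]
returns = [0.0]
for _ in range(n_days - 1):
    returns.append(0.3 * returns[-1] + rng.gauss(0, 1))

p_value = shift_surrogate_pvalue(lunar, returns)
print(f"surrogate p-value: {p_value:.3f}")
```

A Bonferroni-style correction would then divide the significance threshold by the number of markets and windows searched, which this sketch does not attempt.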
ADAPTIVE RESPONSE
Hypothesis: Documented declines in wild pollinator populations produce measurable crop yield reductions across insect-pollinated commodity crops.
Why it seemed plausible: The biological mechanism is unambiguous, since roughly 75% of flowering crops depend on animal pollination. Wild bee decline is well-documented. The causal pathway from colony collapse to yield reduction should be detectable.
What the tests showed:
Verdict: KILLED as a commodity signal. The ecology is correct; the economics neutralises it. This is a useful lesson in the difference between a confirmed biological mechanism and a tradeable signal: the market can adapt faster than the mechanism propagates.
PURE NOISE
Hypothesis: Earth’s Schumann resonance frequencies (7.83 Hz fundamental) affect human neurological function through electromagnetic coupling, producing measurable physiological or cognitive effects.
Why it seemed plausible: The Schumann resonance frequency overlaps with the human alpha brain wave range. Some researchers proposed resonant entrainment as a mechanism.
What the tests showed:
Verdict: KILLED on mechanism grounds. The field strength is physically insufficient by many orders of magnitude. This is a case where the mechanism review (Validation Dimension 8) is decisive before statistical testing is even necessary.
CONFOUNDED
Hypothesis: The 87-year Gleissberg solar cycle modulates epidemic outbreak frequency in Chinese historical records through solar-climate-immune pathway interactions.
Why it seemed plausible: Chinese dynastic records contain unusually detailed epidemic documentation spanning multiple centuries. The Gleissberg cycle has documented climate effects. An immune system link had theoretical support.
What the tests showed:
Verdict: KILLED (NULL). The source data has a systematic bias that mimics cycles. Any periodicity in the record reflects the cycles of Chinese bureaucratic capacity as much as epidemic biology.
These five kills share a common structure: plausible mechanism, coherent hypothesis, null result. In each case, something specific ended the inquiry: a surrogate control, a field strength calculation, an adaptive market response, or a data provenance problem.
The Observatory’s 8-step validation framework is designed to surface these failure modes before publication, not after. The devil’s advocate pass (Dimension 6) and the surrogate significance test (Dimension 4) between them account for four of the five kills above.
One thing this list cannot show: how we would have reported a case where the test returned a weak positive and we chose to proceed anyway. The kill list documents the clear failures, not the marginal calls — as the preamble at the top of this page notes, that is where the remaining epistemic risk lives.
The confirmed signals survived this process. The killed signals did not. Both facts matter.
In April 2026 a formal five-test finding-validator battery was applied to every active trading signal on the public dashboard. Of sixteen signals tested, seven failed all four pass-fail criteria: a 44% kill rate, well above the corpus-wide rate (18.1% on the 289-of-1594 count in the summary table; 16.3% on the 251-of-1,542 count verified 15 May 2026). Four COT contrarian signals failed (corn, wheat, and crude oil outright; EUR/USD as a directional kill), two macro signals inverted directionally (Sahm Rule, SIPRI military spending), and one spectral-transmission signal failed (market-microstructure Kitchin).
The full per-signal writeups are on the dedicated Trading Battery Kills page.
Extending the battery to long-wave and transmission-chain hypotheses produced two more consequential kills.
PURE NOISE
The Kondratieff long wave (a roughly 55-year cycle in inflation and prices, supposedly driven by successive clusters of technological innovation, from steam and textiles through rail and steel, electricity and chemicals, automobiles and petrochemicals, to ICT and digital) has a long pedigree in heterodox economics. We took it seriously. The Bank of England’s Millennium-scale CPI series (1270–2016) is one of the longest price records available anywhere, prior analyses had reported spectral significance across several long-run datasets, and our own earlier validation had given the signal a CONSILIENCE verdict on the strength of cross-dataset replication. The retest exists because that earlier verdict was reached before phase-randomised surrogate testing was applied at this scale.
When the surrogate test was applied, the cycle dissolved. Five hundred phase-randomised iterations put the observed spectral peak in the 40–65 year band at 0.0035 against a surrogate null mean of 0.0034 — a z-score of 0.12, p = 0.36. The peak is exactly what random data of the same amplitude spectrum produces. The supporting tests tell the same story: split the record at 1900 and the pre-1900 half is marginal (p = 0.13) while the post-1900 half shows nothing (p = 0.67); widen or shift the band and only the narrow 45–55 year window clears p < 0.05 while every neighbouring band fails; and the one additional dataset tested, Shiller’s S&P 500 annual returns, is a single corroboration rather than the independent replication a CONSILIENCE verdict requires.
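For readers who want to see how a z-score and a p-value are read off a set of surrogate statistics, here is a minimal sketch. It is agnostic about how the surrogates are generated, and the function names are hypothetical. It also shows the normal-tail p-value that a given z would imply, which comes out at roughly 0.45 for z = 0.12. The two need not agree: an empirical p-value counts exceedances directly and can differ from the normal approximation when the surrogate distribution is skewed. The source does not state which convention produced its p = 0.36.

```python
import statistics


def surrogate_summary(observed, null_values):
    """Return (z, empirical_p) for an observed statistic against surrogate values.

    z is measured in surrogate standard deviations; empirical_p is the
    one-sided share of surrogates at or above the observed value, with an
    add-one correction.
    """
    if len(null_values) < 2:
        raise ValueError("need at least two surrogate values")
    mean = statistics.fmean(null_values)
    sd = statistics.stdev(null_values)
    z = (observed - mean) / sd
    exceed = sum(1 for v in null_values if v >= observed)
    empirical_p = (exceed + 1) / (len(null_values) + 1)
    return z, empirical_p


def normal_upper_tail(z):
    """One-sided p-value a z-score would imply under a normal null."""
    return 1.0 - statistics.NormalDist().cdf(z)


# The z-score quoted in the text, converted under a normal approximation.
print(f"normal-tail p for z = 0.12: {normal_upper_tail(0.12):.3f}")
```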
Verdict: KILLED on surrogate grounds. The 55-year cycle reported in roughly 750 years of price history (1270–2016) is indistinguishable from its phase-randomised surrogates: an artifact, not a periodic signal. The earlier CONSILIENCE verdict was an overstatement traceable to a test that had not yet been run. We are not claiming the cycle cannot exist, only that the best long-run dataset available cannot distinguish it from noise. The methodological lesson is the durable part: a spectral peak in a single dataset is not evidence of a cycle, and the surrogate comparison is what separates the two.
CONFOUNDED
Hypothesis: A four-step causal chain from inventory-cycle indicators to equity volatility, mediated specifically by a credit-sentiment intermediary, propagates the Kitchin inventory cycle through to financial-market outcomes.
Why it seemed plausible: Each pairwise link in the chain is statistically significant on the full sample. The chain has a coherent economic story.
What the tests showed:
Verdict: PARTIAL. The specific-mediator framing does not survive multi-test review; intermediary specificity is not supported. The claim is retained as a downgraded statistical observation. Operational implications and reframed-claim details are withheld from public reporting.
The April 2026 long-wave retest adds one full kill (Kondratieff 55-year wave) and one major caveat on a credit-mediator hypothesis. The total Observatory kill rate rises marginally; more importantly, two widely-cited claims have been reframed.
Taking the trading-signal battery and the long-wave battery together, the revised pattern is clear: claims that survive single-test validation often do not survive a multi-test battery, and the distinction is decisive for commercial application.
One more battery worth publishing here, because the result runs against the narrative we had grown comfortable with.
CONFOUNDED
The hypothesis under test here is not about any single practice but about a whole class of them. Traditional-knowledge systems (Moerman ethnobotany, Ayurveda, TCM, Ifa divination pharmacology, waru waru Andean engineering, Nubian tetracycline, Vedic constitutional frameworks, Mesoamerican calendrical astronomy, and their kin) were proposed to encode practices that hold up under modern science more often than random hypothesis-generation would. If that held, the accumulated pattern-recognition of pre-modern empirical cultures would amount to a privileged source of research-worthy hypotheses. It was a comfortable idea to hold. The Observatory carries case-level confirmed examples across several of these traditions, the published literature contains independent cross-cultural convergence findings, and the narrative that ancient cultures encoded durable truths fit the data we had looked at.
It does not fit the data once you look at the whole batch. The figures that follow are within the ancient-knowledge signal batch specifically — 26 killed of roughly 75 ancient-knowledge signals — and are not corpus-wide rates. Inside that subset, the confirmed rate is 57.7 per cent against 81.0 per cent for all other signals: a gap of 23.3 percentage points, with z = −2.98 against a permutation null. The kill rate runs the same direction — 34.6 per cent inside the ancient-knowledge subset versus 10.4 per cent across the comparison set. Far from outperforming, ancient-source signals as a class do worse on both confirmation and kill rate than the rest of the Observatory at this scope.
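A comparison of confirmation rates against a permutation null can be sketched as below. The counts are assumptions: 15 of 26 is inferred from the 57.7 per cent figure and the 26-signal cohort, while 81 of 100 is a purely hypothetical stand-in for the comparison set, whose size the source does not give. The resulting z therefore will not reproduce the reported −2.98; the sketch only shows the shape of the calculation.

```python
import random
import statistics


def permutation_z(confirmed_a, n_a, confirmed_b, n_b, n_perm=2000, seed=0):
    """Compare two confirmation rates against a label-shuffling null.

    Returns (observed_difference, z, two_sided_p), where the difference is
    rate_a - rate_b and z is measured in permutation-null standard deviations.
    """
    rng = random.Random(seed)
    total_confirmed = confirmed_a + confirmed_b
    outcomes = [1] * total_confirmed + [0] * (n_a + n_b - total_confirmed)
    observed = confirmed_a / n_a - confirmed_b / n_b
    null = []
    for _ in range(n_perm):
        rng.shuffle(outcomes)            # reassign outcomes to groups at random
        a = sum(outcomes[:n_a])          # confirmed count landing in group A
        b = total_confirmed - a
        null.append(a / n_a - b / n_b)
    z = (observed - statistics.fmean(null)) / statistics.stdev(null)
    extreme = sum(1 for d in null if abs(d) >= abs(observed))
    p = (extreme + 1) / (n_perm + 1)
    return observed, z, p


# Assumed counts: 15/26 inferred from the text, 81/100 hypothetical.
diff, z, p = permutation_z(15, 26, 81, 100)
print(f"difference = {diff:+.3f}, z = {z:.2f}, two-sided p = {p:.3f}")
```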
The denominator is what gives the number its weight. The corpus-wide kill rate is 16.3 per cent (251 signals killed on NOISE and NULL verdicts of 1,542 in the corpus, verified 15 May 2026), so the 34.6 per cent within-subset figure is about 2.1 times that baseline, and about 3.3 times the 10.4 per cent comparison-set rate. That gap is the finding. But the picture is not uniform: a separate 19-signal Session 6 batch covering calendars, agronomy, surveying, and time-keeping recorded just one kill of nineteen (5.3 per cent). Where the underlying observation is a quantitative physical measurement that survives translation, the pattern reverses. Both results stand together: the broad-population claim fails, the narrow specific-subset claim holds.
Verdict: KILLED at the population level. What is retired here is the broad framing — ancient knowledge as a privileged source of research hypotheses, full stop — not the confirmed cases inside it. We publish this kill because a killed signal is an output of the research process, not a failure of it, and a programme that retires hypotheses publicly when the evidence demands it is structurally different from one that does not. That this particular kill cuts against a theme we had grown comfortable with is the point, not an embarrassment.
Several different counts of “ancient-knowledge signals” appear across the papers — 19, 26, 75, and 119. These are not contradictory. They are different cohorts, defined at different times and at different scopes, as the ancient-knowledge work grew. The table below names each one so a reader meeting two figures in two papers can see why they differ.
| Cohort | When | n | Notes |
|---|---|---|---|
| Earliest ancient-corpus reference | April 2026 | 119 | The first broad inventory — the count of traditional-knowledge claims surveyed in the Ancient Precision Archive paper, before the standard validation framework had been applied to all of them as Observatory signals. Wider than the later signal batches because it includes claims that were never promoted to formal signals. |
| Session-6 quantitative-measurement batch | April 2026 | 19 | A focused batch of calendar, agronomy, surveying, and time-keeping signals — claims whose core is a quantitative physical measurement that survives translation. One kill (5.3%). |
| Full ancient-knowledge battery | April 2026 | ~75 | The full set of ancient-source signals carried in the Observatory at the time of the meta-battery. 26 killed (34.6%). |
| Ancient Precision Archive corpus | April–May 2026 | 26 | The ancient-source signals examined as a class in the population-level reconciliation above — the cohort that yielded the 57.7% confirmed vs 81.0% comparison figure. A subset of the ~75-signal battery, scoped to the class-level confirmation test. |
The numbers move because the cohort moves: 119 is the widest (surveyed claims), ~75 is the formal signal battery, 26 is the class-level reconciliation subset, and 19 is the narrow quantitative-measurement batch. Reading any one figure as the “true” corpus size, or treating two of them as a contradiction, mistakes a sequence of nested cohorts for a single fixed population.
UNDERPOWERED
Hypothesis: Galactic cosmic ray flux, indexed by neutron monitor counts or the inverted sunspot number, leads corn prices by approximately 24 months. A previously-cited figure of r=+0.475 at the 24-month lag had entered the Observatory forecast scenarios as support for the 2031-2032 agricultural-price prediction.
Why it seemed plausible: The Svensmark cosmic-ray-cloud hypothesis has a coherent mechanism (GCR flux modulates low cloud nucleation, which modulates temperature and precipitation, which modulates crop yields). The 2029-2030 solar minimum is predictable astronomy, so if the chain held, the forward prediction would be crisp.
What the tests showed:
Verdict: WEAK, 1/4 PASS. The previously-cited r=+0.475 at 24 months is not supported by the data. The best observed correlation is four times smaller than claimed, at a different lag. The magnitude does not support the 2031-2032 agricultural-price prediction at the strength originally attached to it.
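A lead-lag claim of this kind is usually checked by scanning the correlation across lags rather than quoting a single lag. The sketch below does that on synthetic data; the function names, the six-step true lag, and the noise level are hypothetical and bear no relation to the cosmic-ray or corn series. Note that picking the best lag from a scan is itself a multiple-comparisons step, so the winning correlation still needs a surrogate or corrected significance test.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_corr(leader, follower, lag):
    """Pearson r between leader[t] and follower[t + lag]."""
    n = len(leader)
    if len(follower) != n:
        raise ValueError("series must have equal length")
    if lag < 0 or lag > n - 3:
        raise ValueError("lag out of range")
    return pearson(leader[: n - lag], follower[lag:])


def best_lag(leader, follower, max_lag):
    """Scan lags 0..max_lag; return (lag, r, full_scan) for the largest |r|."""
    scan = {lag: lagged_corr(leader, follower, lag) for lag in range(max_lag + 1)}
    best = max(scan, key=lambda k: abs(scan[k]))
    return best, scan[best], scan


# Synthetic data: the follower copies the leader six steps later, plus noise.
rng = random.Random(7)
n = 400
true_lag = 6
leader = [rng.gauss(0, 1) for _ in range(n)]
follower = [
    (leader[t - true_lag] if t >= true_lag else 0.0) + 0.5 * rng.gauss(0, 1)
    for t in range(n)
]

lag_found, r_found, scan = best_lag(leader, follower, max_lag=36)
print(f"best lag = {lag_found}, r = {r_found:+.2f}")
```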