Trading Battery Kills (April 2026)
Seven trading signals killed by the April 2026 five-test finding-validator battery.
Hypotheses the Observatory tested and formally retired.
A research programme that never kills anything is not doing science. It is collecting confirmations.
The Observatory has formally evaluated over 400 independent hypotheses. Of these, over 120 have been permanently retired — killed after failing the primary test, a fatal confound, or a formal multi-test battery that single-pass validation did not catch. What follows is a selection of the more scientifically instructive kills.
Not all kills are equal. The cases here are tagged by failure mode: CONFOUNDED, PURE NOISE, ADAPTIVE RESPONSE, DIRECTIONAL INVERSION, and UNDERPOWERED.
The kill rate across all evaluated hypotheses is approximately 30%. The trading-signal battery applied in April 2026 produced a higher rate — 44% — because it applies a stricter multi-test standard than single-pass confirmation. We consider both rates a quality signal, not a problem.
A selection note: the cases shown here are not the easiest kills. We highlight the hard cases — plausible mechanisms, coherent hypotheses, null results — because they better illustrate what the process is actually doing.
CONFOUNDED
Hypothesis: Elevated geomagnetic storm activity (Kp index, Dst index) drives measurable increases in equity market volatility (VIX) through physiological stress pathways in traders.
Why it seemed plausible: Laboratory studies show geomagnetic storms affect melatonin secretion and autonomic nervous system function. If this influences decision-making under stress, a market signal might follow.
What the tests showed:
A note on the literature: Some published papers - including a widely cited 2003 study - report geomagnetic effects on equity returns. The Observatory’s verdict diverges from these because our tests controlled explicitly for the solar/business-cycle confound (geomagnetic activity and economic volatility share a common driver in the 11-year solar cycle) and applied surrogate significance testing. The positive papers did not. We are not dismissing that literature; we are reporting what happens when you add those controls. The effect disappears.
Verdict: KILLED. The mechanism operates through physical climate systems (cloud cover, temperature, precipitation). It does not operate through trader psychology. The EM-to-financial pathway is one of the clearest cognitive bias artifacts in the Observatory: humans intuitively reach for a “cosmic forces affect human behaviour” explanation. The data consistently refuses it.
What survives: The physical climate pathway (EM → cloud → agriculture) is confirmed. The psychological market pathway is not.
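The surrogate significance testing mentioned in the note on the literature can be sketched roughly as follows. This is a minimal illustration, not the Observatory's actual code: it assumes the surrogates are phase-randomised copies of one series (same power spectrum, scrambled phases), and the function names `dft`, `surrogate_from_spectrum`, `pearson` and `surrogate_p_value`, the AR(1) test data, and the default of 200 surrogates are all hypothetical choices made for the example.

```python
import cmath
import math
import random


def dft(values):
    """Naive O(n^2) discrete Fourier transform; adequate for short series."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def surrogate_from_spectrum(spectrum, rng):
    """Build a real series with the given amplitudes and randomised phases."""
    n = len(spectrum)
    shuffled = [0j] * n
    shuffled[0] = spectrum[0]  # keep the mean unchanged
    for k in range(1, (n + 1) // 2):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        shuffled[k] = abs(spectrum[k]) * cmath.exp(1j * phase)
        shuffled[n - k] = shuffled[k].conjugate()  # symmetry keeps the output real
    if n % 2 == 0:
        shuffled[n // 2] = spectrum[n // 2]  # Nyquist term is left untouched
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(shuffled)) / n).real
        for t in range(n)
    ]


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def surrogate_p_value(x, y, n_surrogates=200, seed=0):
    """Share of surrogates of x whose |r| with y matches or beats the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    spectrum = dft(x)
    exceed = sum(
        abs(pearson(surrogate_from_spectrum(spectrum, rng), y)) >= observed
        for _ in range(n_surrogates)
    )
    return (exceed + 1) / (n_surrogates + 1)


# Hypothetical data: two independent but strongly autocorrelated AR(1) series.
rng = random.Random(42)
series_a, series_b = [0.0], [0.0]
for _ in range(63):
    series_a.append(0.9 * series_a[-1] + rng.gauss(0, 1))
    series_b.append(0.9 * series_b[-1] + rng.gauss(0, 1))
print("naive r:", round(pearson(series_a, series_b), 3))
print("surrogate p:", round(surrogate_p_value(series_a, series_b), 3))
```

The point of the design is that the null distribution is built from series with the same autocorrelation as the real one, so slow shared drift cannot pass itself off as a relationship the way it can under a textbook p-value.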
PURE NOISE
Hypothesis: Lunar phase cycles produce statistically significant patterns in equity market returns, mediated by circadian rhythm disruption or investor sentiment effects.
Why it seemed plausible: Numerous academic papers claimed lunar effects on stock markets. The mechanism had biological plausibility (melatonin, sleep quality). The 29.5-day cycle is short enough to accumulate many observations.
What the tests showed:
Verdict: KILLED. The published literature on lunar-equity effects is a multiple comparisons artifact. The surrogate control is the decisive test.
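To see why multiple comparisons alone can generate a literature of positive findings, a short worked calculation helps. The sketch below assumes the tests are independent and uses hypothetical counts of specifications (lunar windows, indices, sample periods); the number actually tried in the published papers is not known to us, and `familywise_false_positive_rate` is a name introduced only for this example.

```python
def familywise_false_positive_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Hypothetical specification counts, for illustration only.
for n_tests in (1, 5, 20):
    rate = familywise_false_positive_rate(n_tests)
    print(f"{n_tests:>2} tests at alpha=0.05 -> {rate:.1%} chance of a spurious hit")
```

Under these assumptions, twenty specifications tested at the 5% level give better-than-even odds of at least one "significant" result even when no effect exists at all.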
ADAPTIVE RESPONSE
Hypothesis: Documented declines in wild pollinator populations produce measurable crop yield reductions across insect-pollinated commodity crops.
Why it seemed plausible: The biological mechanism is unambiguous - roughly 75% of flowering crops depend on animal pollination. Wild bee decline is well-documented. The causal pathway from colony collapse to yield reduction should be detectable.
What the tests showed:
Verdict: KILLED as a commodity signal. The ecology is correct; the economics neutralises it. This is a useful lesson in the difference between a confirmed biological mechanism and a tradeable signal: the market can adapt faster than the mechanism propagates.
PURE NOISE
Hypothesis: Earth’s Schumann resonance frequencies (7.83 Hz fundamental) affect human neurological function through electromagnetic coupling, producing measurable physiological or cognitive effects.
Why it seemed plausible: The Schumann resonance frequency overlaps with the human alpha brain wave range. Some researchers proposed resonant entrainment as a mechanism.
What the tests showed:
Verdict: KILLED on mechanism grounds. The field strength is physically insufficient by many orders of magnitude. This is a case where the mechanism review (Validation Dimension 8) is decisive before statistical testing is even necessary.
CONFOUNDED
Hypothesis: The 87-year Gleissberg solar cycle modulates epidemic outbreak frequency in Chinese historical records through solar-climate-immune pathway interactions.
Why it seemed plausible: Chinese dynastic records contain unusually detailed epidemic documentation spanning multiple centuries. The Gleissberg cycle has documented climate effects. An immune system link had theoretical support.
What the tests showed:
Verdict: KILLED (NULL). The source data has a systematic bias that mimics cycles. Any periodicity in the record reflects the cycles of Chinese bureaucratic capacity as much as epidemic biology.
These five kills share a common structure: plausible mechanism, coherent hypothesis, null result. In each case, something specific ended the inquiry - a surrogate control, a field strength calculation, an adaptive market response, a data provenance problem.
The Observatory’s 8-step validation framework is designed to surface these failure modes before publication, not after. The devil’s advocate pass (Dimension 6) and the surrogate significance test (Dimension 4) between them account for four of the five kills above.
One thing this list cannot show: how we would have reported a case where the test returned a weak positive and we chose to proceed anyway. The kill list documents the clear failures. The harder editorial judgement - what to do with a r = 0.18, p = 0.03, surrogate-passing but small-effect signal - is not captured here. That is where the effect size weighting and consilience upgrade requirements do most of their work, and where the remaining epistemic risk lives.
The confirmed signals survived this process. The killed signals did not. Both facts matter.
In April 2026 we applied a formal five-test finding-validator battery to every active trading signal on the public dashboard. The battery: Monte Carlo null, blind replication with era split, specificity against negative controls, tolerance sensitivity. Of sixteen signals tested, seven failed all four pass-fail criteria. Three were directional inversions where the observed effect ran opposite to the published claim. We document them here in the same spirit as the five cases above.
PURE NOISE · EUR/USD DIRECTIONAL INVERSION
Hypothesis: Extreme commercial hedger short positioning (z-score below minus two) signals contrarian buy opportunities.
Why it seemed plausible: Commercial hedgers are producers and end-users; when they are maximally short, the market prices in the most bearish scenario physical participants expect. Contrarian trades at these extremes have produced returns in academic literature.
What the tests showed:
Verdict: KILLED for corn, wheat, crude. EUR/USD directionally inverted. COT positioning as a contrarian signal survives only in soybeans (88% hit at z below minus two, but n = 17 over 39 years, 5 post-2010). We continue to track soybeans; the others leave the dashboard.
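The soybean caveat (n = 17) can be made concrete with a confidence interval on the hit rate. The sketch below is illustrative only: it assumes the 88% figure corresponds to 15 hits out of 17 events (the count is inferred from the percentage, not stated in the results), and it uses the standard Wilson score interval, which is our choice for the example and not necessarily the method used in the battery.

```python
import math


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p_hat = hits / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Assumption: 88% of 17 events is taken to mean 15 hits.
low, high = wilson_interval(hits=15, n=17)
print(f"hit rate 15/17 = {15 / 17:.1%}, interval {low:.1%} to {high:.1%}")
```

With so few events the interval runs from roughly two-thirds to the high nineties, which is why the signal is tracked and not promoted; the five post-2010 events taken alone would give an even wider range.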
DIRECTIONAL INVERSION
Hypothesis: When the Sahm Rule triggers, commodity demand destruction follows and commodity prices decline over six to twelve months.
Why it seemed plausible: The Sahm Rule is a reliable recession indicator; recessions reduce industrial activity which drives commodity demand.
What the tests showed:
The more extreme the recession signal, the larger the subsequent commodity rally. The 2008 and 2020 Sahm triggers both produced massive commodity rallies - China stimulus after 2008, Federal Reserve quantitative easing and reflation after 2020.
Verdict: DIRECTIONALLY INVERTED. The published claim is directionally backwards. The real pattern appears to be that extreme recession signals precede policy response which precedes a commodity-friendly reflationary regime. Whether a reframed version of the claim survives its own devil’s advocate pass is a separate question we have not completed.
DIRECTIONAL INVERSION
Hypothesis: Global military spending surges (year-over-year growth above five per cent in real terms) lead commodity price booms by two to four years.
Why it seemed plausible: World wars produced commodity booms; Cold War build-ups drove sustained metals demand. NATO commitments of 2023 and 2024 looked like the start of another cycle.
What the tests showed:
Military surge effect is indistinguishable from baseline. Military declines actually precede higher commodity returns. The hypothesis does not survive at any tolerance threshold.
Verdict: KILLED. Historical wartime booms were driven by specific supply disruptions (oilfields occupied, minerals in combat zones, shipping interdicted), not generic industrial-mobilisation demand. Peacetime military surges do not reproduce the effect. The 2025-2027 bullish-commodities prediction attached to NATO 2023-2024 surges is not supported by the seventy-four-year record.
PURE NOISE
Hypothesis: The 3.4-year Kitchin inventory cycle propagates through financial microstructure, producing detectable spectral peaks in the 2.5-4.5 year band of credit spreads (BAA-10Y) and VIX.
Why it seemed plausible: The Kitchin cycle is well-documented in inventory data. If inventory drives real activity it should leave a spectral signature in credit and volatility series sensitive to that activity.
What the tests showed:
The Kitchin-band peak is the smallest of the three. It is also indistinguishable from phase-randomised surrogates (surrogate null 0.085 ± 0.021; observed 0.079 is below the null mean). Both BAA and VIX fail the peak-significance test.
Verdict: KILLED as a spectral-transmission claim. This is importantly different from the Kitchin phase-clock signal, which operates on ISRATIO and maps instantaneous phase to equity-regime forward returns. The phase-clock survives a 5-test battery with held-out z = 8.58 on post-2010 data; the spectral-transmission claim on financial series does not. The mechanism operates through real-economic phase, not credit-spread spectral power.
As of April 2026 the Observatory’s trading-signal kill rate after formal battery testing is approximately 44 per cent of tested signals. This is higher than the overall Observatory kill rate of 28 per cent because the battery applies a stricter multi-test standard than the original confirmation process. We consider the higher number a feature. The signals that survive battery testing - VIX term structure, gold-silver ratio, Kitchin phase clock, EBP equity stress, VIX regime - carry correspondingly higher confidence.
The main list above illustrates classical kill modes (mechanism, surrogate, adaptive response). The April 2026 set illustrates a different lesson: a signal passing single-test validation does not imply the signal survives held-out replication, era stability, threshold sensitivity, and specificity against controls simultaneously. The five-test battery is how we separate the two.
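A toy example shows how a signal can pass a pooled single test and still fail era stability. The data here are synthetic and deliberately contrived, and `era_split_check` with its `min_abs_r` threshold is a hypothetical stand-in written for this illustration, not the battery's actual criterion.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def era_split_check(x, y, split, min_abs_r=0.1):
    """Require the same sign and a minimum |r| in both eras (hypothetical rule)."""
    r_early = pearson(x[:split], y[:split])
    r_late = pearson(x[split:], y[split:])
    same_sign = r_early * r_late > 0
    strong_enough = min(abs(r_early), abs(r_late)) >= min_abs_r
    return {"r_early": r_early, "r_late": r_late, "passed": same_sign and strong_enough}


# Synthetic series: a strong positive link in the early era,
# a weaker negative link in the late era.
x = list(range(1, 11)) * 2
y = [float(v) for v in range(1, 11)] + [5.5 - 0.3 * (v - 5.5) for v in range(1, 11)]

pooled_r = pearson(x, y)
result = era_split_check(x, y, split=10)
print(f"pooled r = {pooled_r:.2f}")
print(f"early r = {result['r_early']:.2f}, late r = {result['r_late']:.2f}, "
      f"passed = {result['passed']}")
```

The pooled correlation is comfortably positive, yet the sign flips between eras, so the era-split check fails. A single pooled test hides exactly this kind of instability.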
Extending the battery to long-wave and transmission-chain hypotheses produced two more consequential results: one full kill and one partial reframing.
PURE NOISE
Hypothesis: A ~55-year Kondratieff long wave in inflation and prices shows a statistically significant spectral peak in the Bank of England Millennium-scale CPI series (1270-2016), reflecting a cycle of technological-innovation clusters (steam/textiles, rail/steel, electricity/chemicals, automobiles/petrochemicals, ICT/digital) driving macro regimes.
Why it seemed plausible: The Kondratieff claim has a long pedigree in heterodox economics. The BoE millennium dataset is one of the longest price series available anywhere. Prior analyses reported spectral significance across multiple long-run datasets. Our own earlier validation gave the signal a CONSILIENCE verdict on the basis of cross-dataset replication.
What the tests showed:
Verdict: KILLED on surrogate grounds. The previously-reported 55-year cycle in 800-year price history appears to be a phase-randomised surrogate-producible artifact, not a true periodic signal. The earlier CONSILIENCE verdict was overstated because surrogate testing had not been applied at this scale. We are not claiming the cycle cannot exist - only that with the best 800-year dataset available, there is no evidence to distinguish it from noise.
This is an important methodological lesson. A spectral peak observed in a single dataset is not a sufficient basis for the Kondratieff claim. The test that separates a real cycle from a coincidental peak is phase-randomised surrogate comparison, and when we apply it, the 55-year claim does not survive.
CONFOUNDED
Hypothesis: The four-step causal chain ISRATIO → EBP → NFCI → VIX mediates the propagation of the Kitchin inventory cycle through to equity volatility. EBP specifically (the Gilchrist-Zakrajsek Excess Bond Premium) captures credit-sentiment shifts before they show up in financial-conditions indices.
Why it seemed plausible: Each pairwise link is statistically significant in VAR(4) on 385 monthly observations. The chain has a coherent economic story.
What the tests showed:
Verdict: PARTIAL. The ISRATIO → macro-stress → NFCI → VIX transmission is real. EBP is NOT specifically mediating it - unemployment mediates equally well or better. The claim should be reframed. The forward-bet thesis based on NFCI → VIX (the robust link) is preserved; the EBP-specific-mediator framing is not.
The April 2026 long-wave retest adds one full kill (Kondratieff 55-year wave) and one major caveat (EBP not specifically Kitchin-mediating). The total Observatory kill rate rises marginally; more importantly, two widely-cited claims have been reframed.
Taking the trading-signal battery and long-wave battery together, the revised pattern is clear: claims that survive single-test validation often do not survive multi-test battery, and the distinction is decisive for commercial application.
One more battery worth publishing here, because the result runs against the narrative we had grown comfortable with.
CONFOUNDED
Hypothesis: Traditional-knowledge systems (Moerman ethnobotany, Ayurveda, TCM, Ifa divination pharmacology, waru waru Andean engineering, Nubian tetracycline, Vedic constitutional frameworks, Mesoamerican calendrical astronomy, and similar ancestral knowledge traditions) selected practices that modern science confirms more often than randomly chosen hypotheses would. The claim is that the accumulated pattern-recognition of pre-modern empirical cultures surfaces truths that modern science later confirms, and that this makes traditional knowledge a privileged source of research-worthy hypotheses.
Why it seemed plausible: The Observatory has genuine confirmed cases - Nubian tetracycline (bone labels demonstrate therapeutic levels 1,598 years before Western discovery), Ifa pharmacological hit rate (with publication bias caveat), aboriginal fire management, waru waru, berberine for diabetes in TCM, artemisinin from qinghao, and the phylogenetic convergence finding (Saslis-Lagoudakis 2012 PNAS p<0.001 across seven zero-contact traditions). The narrative that ancient cultures encoded durable truths in their practices appeared to fit the data.
What the tests showed:
Verdict: KILLED as a general claim. The position that ancient knowledge serves as a privileged source of research hypotheses is not supported by the Observatory’s data. Ancient-source signals underperform on both confirmation and kill rate relative to the rest of the Observatory.
What survives: The phylogenetic convergence anchor (Saslis-Lagoudakis 2012) remains a real cross-cultural finding at p<0.001. Specific named signals that did pass (Nubian tetracycline, aboriginal fire management, berberine-for-diabetes, waru waru agriculture) are real and remain in the corpus. The phenomenon of a few ancient traditions encoding durable empirical truths is not in question. What fails is the inference from “some ancient claims validate” to “ancient knowledge is a privileged source of validated knowledge.”
Revised narrative: Traditional-knowledge sources produce hypotheses that validate at below-average rates by Observatory standards. The residual positives are real and worth studying case-by-case. The framing of ancient wisdom as a reliable corpus of truths deserves to be retired.
This is an uncomfortable kill for a research programme that has valorised the frequency-convergence and traditional-knowledge themes. We include it here because that is what the evidence says.
UNDERPOWERED
Hypothesis: Galactic cosmic ray flux, indexed by neutron monitor counts or the inverted sunspot number, leads corn prices by approximately 24 months. A previously-cited figure of r=+0.475 at the 24-month lag had entered the Observatory forecast scenarios as support for the 2031-2032 agricultural-price prediction.
Why it seemed plausible: The Svensmark cosmic-ray-cloud hypothesis has a coherent mechanism (GCR flux modulates low cloud nucleation, which modulates temperature and precipitation, which modulates crop yields). The 2029-2030 solar minimum is predictable astronomy, so if the chain held, the forward prediction would be crisp.
What the tests showed:
Verdict: WEAK, 1/4 PASS. The previously-cited r=+0.475 at 24 months is not supported by the data. The best observed correlation is four times smaller than claimed, at a different lag. Direction is correct, but the magnitude does not support the 2031-2032 agricultural-price prediction at the strength originally attached to it. The post-1990 subsample showing r=0.30 is worth watching as a possible recent-era regime but is not a substitute for the stronger long-run claim.
Revised narrative: The GCR-climate-crop mechanism remains biophysically plausible; the Observatory’s own data provides only weak empirical support for the commodity-price signal at the claimed strength.