Military Strikes Forecasting

A compact military-strikes forecaster built from public news

10.6% lower Brier score than GPT-5.4

2× better calibration than GPT-5.4

8.4× larger Brier Skill Score lift than GPT-5.4

We trained a compact forecaster for Numinous-style military-strikes questions using public news, resolved outcomes, and Foresight Learning. On a held-out set of military-strikes forecasts, it beat GPT-5.4 on Brier score, calibration, and Brier Skill Score.
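For readers unfamiliar with the three metrics, the following is a minimal sketch of how they are commonly computed for binary forecasts. It is not taken from the case-study notebook. The function names, the use of the held-out base rate as the Brier Skill Score reference forecast, and the ten-bin expected calibration error are all assumptions made for illustration; the actual evaluation may use a different reference or calibration measure.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Skill relative to a reference forecast (higher is better).

    Assumption: the reference always predicts the base rate of the
    evaluation set. Other references are possible.
    """
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    if reference == 0:
        raise ValueError("reference Brier score is zero; skill is undefined")
    return 1.0 - brier_score(probs, outcomes) / reference


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted gap between mean forecast and observed frequency per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, o))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(o for _, o in members) / len(members)
        ece += (len(members) / total) * abs(mean_p - freq)
    return ece


# Toy data, invented for illustration only (not results from the case study).
toy_probs = [0.9, 0.1, 0.8, 0.2]
toy_outcomes = [1, 0, 1, 0]
toy_brier = brier_score(toy_probs, toy_outcomes)
toy_bss = brier_skill_score(toy_probs, toy_outcomes)
toy_ece = expected_calibration_error(toy_probs, toy_outcomes)
```

Lower is better for the Brier score and the calibration error; higher is better for the Brier Skill Score, where 0 means no improvement over the reference forecast.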

What we did

Generated military-strikes forecasting examples from five public-news search queries.
Resolved each question from later reporting, with no annotators or proprietary feeds.
Tuned from GPT-OSS-120B with Foresight Learning and benchmarked against GPT-5.4 and the base model.

Primary artifacts for this case study.

Notebook (github.com)

Military strikes fine-tune

Notebook for training and evaluating a military-strikes forecaster
