A forecaster that predicts supplier and logistics disruptions before they happen
A sample training example — question, source, and outcome-derived label.
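The page does not show the example's exact schema. The sketch below shows how a question, its source, and an outcome-derived label could fit together in one training record. Every field name, the `label_from_outcome` helper, and the sample question are hypothetical. The point it illustrates is that the label comes from what actually happened by the resolution date, not from a human annotator.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ForecastExample:
    """Hypothetical schema for one training example (field names are illustrative)."""
    question: str          # a yes/no question about a possible future disruption
    source: str            # the document the question was derived from
    resolution_date: date  # when the outcome becomes observable
    label: int             # 1 if the event happened, 0 otherwise


def label_from_outcome(event_occurred: bool) -> int:
    """Hypothetical helper: turn the observed outcome into a binary label."""
    return 1 if event_occurred else 0


# Hypothetical example record; the supplier and question are made up.
example = ForecastExample(
    question="Will Supplier X report a shipment delay of more than 7 days before 2025-03-31?",
    source="news article about port congestion (hypothetical)",
    resolution_date=date(2025, 3, 31),
    label=label_from_outcome(event_occurred=True),
)
```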
Benchmark comparisons against frontier models.
The trained model beats GPT-5 and gpt-oss-120b on every headline metric. Against GPT-5, it lowers the Brier score from 0.1203 to 0.0791 and raises the Brier Skill Score from −26.4% to +16.9%. Its expected calibration error (ECE) is 0.0525, compared with 0.1740 for the pretrained base model, a ~70% reduction in calibration error.
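For readers unfamiliar with these metrics, here is a minimal sketch of how they are commonly computed for binary outcomes. The Brier score is the mean squared error between predicted probabilities and 0/1 outcomes (lower is better). The Brier Skill Score is 1 − BS / BS_ref. This sketch *assumes* the reference is a constant base-rate forecast, because the page does not name one. ECE here uses 10 equal-width bins, which is also an assumption about the exact binning.

```python
def brier_score(p, y):
    """Mean squared error between predicted probabilities p and 0/1 outcomes y."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(p)


def brier_skill_score(p, y, p_ref=None):
    """1 - BS / BS_ref. Assumption: default reference is a constant base-rate forecast.
    Undefined (division by zero) if all outcomes are identical."""
    if p_ref is None:
        base_rate = sum(y) / len(y)
        p_ref = [base_rate] * len(y)
    return 1.0 - brier_score(p, y) / brier_score(p_ref, y)


def expected_calibration_error(p, y, n_bins=10):
    """Sum over non-empty equal-width bins of (bin share) * |observed freq - mean prob|."""
    bins = {}
    for pi, yi in zip(p, y):
        b = min(int(pi * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins.setdefault(b, []).append((pi, yi))
    ece = 0.0
    for members in bins.values():
        mean_prob = sum(pi for pi, _ in members) / len(members)
        observed = sum(yi for _, yi in members) / len(members)
        ece += len(members) / len(p) * abs(observed - mean_prob)
    return ece
```

A check against the reported numbers: the trained model's BSS implies BS_ref ≈ 0.0791 / (1 − 0.169) ≈ 0.0952. With that same reference, GPT-5 gets 1 − 0.1203 / 0.0952 ≈ −0.264, which matches −26.4%. Likewise, 1 − 0.0525 / 0.1740 ≈ 0.70 matches the ~70% ECE reduction.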
The trained model's reliability curve closely tracks the perfect-calibration diagonal: when it assigns a 30% probability, roughly 30% of those disruptions materialize. GPT-5 and the base model severely under-predict high-risk events. Precision@10% improves roughly 4× (35% vs. 9%).
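The two claims above can be made concrete with a short sketch. A reliability table groups forecasts into probability bins and compares the mean predicted probability with the observed event frequency in each bin. For a well-calibrated model the two are close, and under-prediction shows up as observed frequency above mean prediction. The page does not define Precision@10%. This sketch *assumes* it means the share of real disruptions among the top 10% of examples ranked by predicted risk. Both function names are hypothetical.

```python
def reliability_table(p, y, n_bins=10):
    """Rows of (mean predicted prob, observed frequency, count) per non-empty bin."""
    bins = {}
    for pi, yi in zip(p, y):
        b = min(int(pi * n_bins), n_bins - 1)  # equal-width bins; p == 1.0 -> last bin
        bins.setdefault(b, []).append((pi, yi))
    rows = []
    for b in sorted(bins):
        members = bins[b]
        mean_pred = sum(pi for pi, _ in members) / len(members)
        observed = sum(yi for _, yi in members) / len(members)
        rows.append((mean_pred, observed, len(members)))
    return rows


def precision_at_fraction(p, y, frac=0.10):
    """Assumed definition of Precision@10%: positive rate among the top
    round(frac * n) examples (at least 1) ranked by predicted probability."""
    k = max(1, round(frac * len(p)))
    ranked = sorted(range(len(p)), key=lambda i: p[i], reverse=True)
    return sum(y[i] for i in ranked[:k]) / k
```

In a table from this sketch, a row like (0.30, 0.30, n) is what "when it assigns a 30% probability, roughly 30% materialize" looks like. High-probability bins whose observed frequency clearly exceeds the mean prediction would indicate under-prediction of high-risk events.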
Papers, models, datasets, notebooks, and write-ups for this case study.
Leverage your own raw data or use public sources. No labeling required.