Supply Chain Disruptions

A forecaster that predicts supplier and logistics disruptions before they happen

34%
more accurate than GPT-5 on supply chain predictions
↗ arXiv paper
59%
better calibrated than GPT-5
4×
precision@10% vs. baseline on high-risk disruptions

What we did


Example datapoint

A sample training example — question, source, and outcome-derived label.


Results

Benchmark comparisons against frontier models.

Aggregate Performance vs. GPT-5 and gpt-oss-120b

The trained model beats GPT-5 and gpt-oss-120b on every headline metric. Against GPT-5, its Brier score is 0.0791 vs. 0.1203 (about 34% lower) and its Brier Skill Score is +16.9% vs. −26.4%. Its ECE is 0.0525 vs. 0.1740 for the pretrained base model, a ~70% reduction in calibration error.

Three bar charts comparing Binary Brier, Brier Skill Score, and Binary ECE for the trained model, base rate, GPT-5, and gpt-oss-120b — trained model leads on all three
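
The headline figures above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the standard definitions of the Brier score (mean squared error between forecast probability and binary outcome) and the Brier Skill Score (1 minus the ratio to a reference forecast's Brier score), and it assumes the reference is a base-rate forecast, which this page does not state explicitly. The function names are hypothetical, not taken from the paper.

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(np.mean((probs - outcomes) ** 2))

def brier_skill_score(brier, brier_ref):
    """Skill relative to a reference forecast; positive means better than the reference."""
    return 1.0 - brier / brier_ref

def relative_reduction(new, old):
    """Fractional reduction of an error metric (lower is better)."""
    return (old - new) / old

# Toy forecasts to show the definition in action (not data from the case study).
toy = brier_score([0.9, 0.1, 0.8, 0.3], [1, 0, 1, 0])  # approx. 0.0375

# Figures reported on this page.
BRIER_TRAINED, BRIER_GPT5 = 0.0791, 0.1203
ECE_TRAINED, ECE_BASE = 0.0525, 0.1740
BSS_TRAINED = 0.169

brier_gain = relative_reduction(BRIER_TRAINED, BRIER_GPT5)  # about 0.34
ece_gain = relative_reduction(ECE_TRAINED, ECE_BASE)        # about 0.70

# Assumption: both skill scores share one reference Brier score.
# Back it out from the trained model, then recompute GPT-5's skill score.
implied_ref = BRIER_TRAINED / (1.0 - BSS_TRAINED)
bss_gpt5 = brier_skill_score(BRIER_GPT5, implied_ref)       # about -0.26

print(f"Brier reduction vs GPT-5: {brier_gain:.1%}")
print(f"ECE reduction vs base:    {ece_gain:.1%}")
print(f"Implied GPT-5 skill:      {bss_gpt5:.1%}")
```

Under these assumptions the reported numbers are mutually consistent: the 34% and ~70% figures are relative reductions in Brier score and ECE, and the −26.4% skill score follows from the same reference forecast that gives the trained model +16.9%.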

Calibration vs. GPT-5 on Supply Chain Predictions

The trained model closely tracks the perfect-calibration diagonal: when it assigns a 30% probability, roughly 30% of those disruptions materialize. GPT-5 and the base model severely under-predict high-risk events. Precision@10% improves roughly 4× (35% vs. 9%).

Reliability diagram showing the trained model (yellow) tracking perfect calibration while GPT-5 and gpt-oss-120b deviate significantly from the diagonal
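
To make the calibration and precision claims concrete, here is a minimal sketch of how such metrics are commonly computed. It assumes equal-width probability bins for ECE and reads "precision@10%" as the share of true disruptions among the 10% of cases with the highest predicted probability. Both are assumptions, since the page does not define them, and the function names and toy data are hypothetical.

```python
import numpy as np

def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted average gap between mean forecast and observed rate per bin."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    # Equal-width bins on [0, 1]; a probability of exactly 1.0 goes in the last bin.
    bin_ids = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue
        gap = abs(probs[mask].mean() - outcomes[mask].mean())
        ece += mask.mean() * gap  # weight by the fraction of cases in the bin
    return float(ece)

def precision_at_top_fraction(probs, outcomes, fraction=0.10):
    """Observed event rate among the highest-scored fraction of cases."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    k = max(1, int(round(fraction * len(probs))))
    top = np.argsort(-probs, kind="stable")[:k]
    return float(outcomes[top].mean())

# Toy data: ten cases at 30% with 3 events, ten cases at 80% with 8 events.
outcomes = np.array([1] * 3 + [0] * 7 + [1] * 8 + [0] * 2)
calibrated = np.array([0.3] * 10 + [0.8] * 10)
underpredicting = np.full(20, 0.1)  # always says 10%; the true rate is 55%

ece_good = expected_calibration_error(calibrated, outcomes)      # close to 0
ece_bad = expected_calibration_error(underpredicting, outcomes)  # close to 0.45
p_at_10 = precision_at_top_fraction(calibrated, outcomes)

# Lift implied by the figures reported on this page.
lift = 0.35 / 0.09  # about 3.9, i.e. roughly 4x
```

A forecaster that sits on the diagonal yields an ECE near zero, while one that systematically under-predicts accumulates a large gap in the bins where events are frequent.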

Read more

Papers, models, datasets, notebooks, and write-ups for this case study.
