Trump Admin Actions

An LLM that predicts what the Trump administration will do next

+29% better calibration than baseline on political outcomes (↗ Model card)

682 held-out test questions evaluated on live political events (↗ Model card)

What we did


Example datapoint

A sample training example — question, source, and outcome-derived label.


Results

Benchmark comparisons against frontier models.

Calibration on 682 Political Test Questions

Trump-Forecaster achieves lower Expected Calibration Error (ECE) than GPT-5 in both the context-aware (0.079 vs. 0.091) and context-free (0.164 vs. 0.191) settings, a 13–14% relative improvement in calibration, with the largest gain when no additional context is available.

[Figure: grouped bar chart comparing ECE of Foresight vs. GPT-5 in with-context and without-context conditions on 682 held-out political questions]
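For readers unfamiliar with the metric: Expected Calibration Error bins forecasts by confidence and measures the weighted gap between average confidence and observed accuracy in each bin. The page does not publish the evaluation code, so this is a minimal illustrative sketch (standard equal-width binning; function name and bin count are our own choices):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: bin-weight-averaged |mean confidence - observed accuracy| per bin."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # First bin is closed on the left so probability 0.0 is not dropped.
        mask = (probs >= lo) if i == 0 else (probs > lo)
        mask &= probs <= hi
        if not mask.any():
            continue
        conf = probs[mask].mean()   # average predicted probability in the bin
        acc = labels[mask].mean()   # observed frequency of the outcome in the bin
        ece += mask.mean() * abs(conf - acc)
    return ece

# Two 95% forecasts that resolved yes, one 5% forecast that resolved no:
# each occupied bin has a 0.05 confidence-accuracy gap, so ECE = 0.05.
print(round(expected_calibration_error([0.95, 0.95, 0.05], [1, 1, 0]), 3))
```

Lower is better: an ECE of 0.079 means forecast probabilities deviate from observed outcome frequencies by about 8 points on average, weighted by bin size.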

Read more

Papers, models, datasets, notebooks, and write-ups for this case study.

Ready to build your own expert?

Leverage your own raw data or use public sources. No labeling required.