An LLM that predicts what the Trump administration will do next
A sample training example — question, source, and outcome-derived label.
Benchmark comparisons against frontier models.
Trump-Forecaster achieves lower Expected Calibration Error (ECE) than GPT-5 in both the context-aware (ECE 0.079 vs. 0.091) and context-free (ECE 0.164 vs. 0.191) settings, a 13–14% relative improvement in calibration, with the largest gain when no additional context is available.
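For readers unfamiliar with the metric: ECE bins predictions by confidence and takes the weighted average gap between each bin's accuracy and its mean confidence, so lower is better. A minimal sketch follows; the equal-width 10-bin scheme is an assumption for illustration, not necessarily what this evaluation used.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width confidence bins (an illustrative choice).

    probs: predicted probabilities of the positive outcome, in [0, 1].
    labels: observed binary outcomes (0 or 1).
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins (lo, hi], with the first bin closed at 0.
        mask = (probs > lo) & (probs <= hi) if lo > 0 else (probs >= lo) & (probs <= hi)
        if not mask.any():
            continue
        confidence = probs[mask].mean()   # average predicted probability in bin
        accuracy = labels[mask].mean()    # observed frequency of the outcome in bin
        ece += mask.mean() * abs(accuracy - confidence)
    return ece
```

A perfectly calibrated forecaster (events predicted at 80% happening 80% of the time) scores an ECE of 0; systematic over- or under-confidence pushes the score toward 1.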
Papers, models, datasets, notebooks, and write-ups for this case study.
Leverage your own raw data or use public sources. No labeling required.