An LLM that predicts what the Trump administration will do next
A sample training example — question, source, and outcome-derived label.
Benchmark comparisons against frontier models.
Trump-Forecaster achieves lower Expected Calibration Error (ECE) than GPT-5 in both the context-aware (ECE 0.079 vs. 0.091) and context-free (ECE 0.164 vs. 0.191) settings, a 13–14% relative improvement in calibration, with the largest gain when no additional context is available.
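For readers unfamiliar with the metric: ECE bins predictions by confidence and takes the weighted average gap between each bin's accuracy and its mean confidence, so lower is better. A minimal sketch follows; the equal-width 10-bin scheme is an assumption for illustration, not necessarily what this evaluation used.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width confidence bins (an illustrative choice).

    probs: predicted probabilities of the positive outcome, in [0, 1].
    labels: observed binary outcomes (0 or 1).
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins (lo, hi], with the first bin closed at 0.
        mask = (probs > lo) & (probs <= hi) if lo > 0 else (probs >= lo) & (probs <= hi)
        if not mask.any():
            continue
        confidence = probs[mask].mean()   # average predicted probability in bin
        accuracy = labels[mask].mean()    # observed frequency of the outcome in bin
        ece += mask.mean() * abs(accuracy - confidence)
    return ece
```

A perfectly calibrated forecaster (events predicted at 80% happening 80% of the time) scores an ECE of 0; systematic over- or under-confidence pushes the score toward 1.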
Papers, models, datasets, notebooks, and write-ups for this case study.
Leverage your own raw data or use public sources. No labeling required.