Prediction Market Forecasting

Beat frontier AI models 100× your size on live predictions

Train specialized models to outperform general-purpose LLMs on prediction markets. The eventual outcome of each question serves as its training label, so models learn causal reasoning from timestamped real-world data with no manual annotation required.
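As a rough illustration of what "future outcomes as training labels" can mean in practice, the sketch below pairs each resolved question with the outcome that later occurred, and keeps only context published before the question date so the answer cannot leak into the prompt. Every name here (ResolvedQuestion, Document, build_example) is hypothetical; the actual training pipeline is not described on this page and may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Union


@dataclass(frozen=True)
class ResolvedQuestion:
    """Hypothetical record: a question asked in the past whose outcome is now known."""
    text: str
    asked_on: date
    outcome: bool  # what actually happened after asked_on


@dataclass(frozen=True)
class Document:
    """Hypothetical timestamped source document (news article, filing, etc.)."""
    published_on: date
    text: str


def build_example(
    question: ResolvedQuestion, corpus: List[Document]
) -> Dict[str, Union[str, float]]:
    """Turn one resolved question into a (prompt, label) training pair.

    Only documents published strictly before the question date are used,
    so the label (the future outcome) never appears in the context.
    """
    context = [d.text for d in corpus if d.published_on < question.asked_on]
    prompt = "\n".join(context + [f"Question: {question.text}"])
    label = 1.0 if question.outcome else 0.0  # supplied by the world, not an annotator
    return {"prompt": prompt, "label": label}


# Toy data, invented purely for illustration.
corpus = [
    Document(date(2024, 1, 5), "Central bank signals a possible rate cut."),
    Document(date(2024, 3, 1), "Central bank cuts rates."),  # published after the question
]
question = ResolvedQuestion("Will rates be cut by March?", date(2024, 1, 10), True)
example = build_example(question, corpus)
```

The date filter is the important design choice: without it, documents written after the outcome was known would let a model look up the answer rather than forecast it.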

#1
ranked forecaster on ProphetArena, beating GPT-5, Gemini & Claude
↗ Blog post
69%
reduction in calibration error vs. base model on live Polymarket questions
↗ Blog post
10–100×
smaller than the frontier models it beats
↗ TMLR paper

Example prediction questions

The kinds of questions a model trained on your data can answer.


Key results

Benchmark comparisons against frontier models

ProphetArena Overall Leaderboard

Foresight V3 holds the #1 spot on ProphetArena's live benchmark, ahead of Gemini 3 Pro and GPT-5.2 — while being 10–100× smaller than the frontier models it beats.

Figure: ProphetArena leaderboard. Foresight V3 #1, Gemini 3 Pro #2, GPT-5.2 #3.

Live Polymarket Benchmark

On 251 live Polymarket questions, Foresight-32B achieved a Brier score of 0.199 vs. GPT-5's 0.207 (lower is better), with 69% lower calibration error (ECE 6.0% vs. 16.1%). It also earned a positive simulated trading profit, while frontier models lost money.

Figure: three bar charts comparing Brier score, calibration error, and simulated trading profit. Foresight-32B leads on all three against o3, Gemini 2.5 Pro, Grok-4, and Claude Opus.
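For readers unfamiliar with the two accuracy metrics quoted in this section, here is a minimal sketch of how a Brier score and an expected calibration error (ECE) are commonly computed for binary questions. It assumes equal-width probability bins and a default of 10 bins; the binning scheme behind the reported figures is not stated here, so treat this as an illustration of the metrics rather than a reproduction of the benchmark. The toy forecasts are invented.

```python
from typing import List, Sequence, Tuple


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better; always answering 0.5 scores 0.25.
    """
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Equal-width binned ECE (an assumed, commonly used definition).

    Forecasts are grouped by probability; within each bin we compare the
    average forecast with the observed frequency of "yes", then average the
    absolute gaps weighted by bin size.
    """
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_forecast = sum(p for p, _ in members) / len(members)
        observed_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(avg_forecast - observed_rate)
    return ece


# Toy forecasts and resolutions, invented for illustration only.
forecasts = [0.9, 0.8, 0.3, 0.1]
resolved = [1, 1, 0, 0]
toy_brier = brier_score(forecasts, resolved)
toy_ece = expected_calibration_error(forecasts, resolved)
```

The two metrics answer different questions. Brier score rewards forecasts that are both confident and right, while ECE only asks whether stated probabilities match observed frequencies, which is why a model can lead on one and trail on the other.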

Explore

Primary write-ups and artifacts for this solution.

Ready to build your own expert?

Leverage your own raw data or use public sources. No labeling required.