Beat frontier AI models 100× your size on live predictions
Train specialized models to outperform general-purpose LLMs on prediction markets. Using future outcomes as training labels, models learn causal reasoning from real-world temporal data — no annotation required.
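As a rough illustration of how future outcomes can stand in for hand-written labels, the sketch below builds training examples from resolved questions. It is a minimal sketch under stated assumptions, not the actual training pipeline: the `Question` and `Document` types, their field names, and the `build_examples` helper are all hypothetical. The one idea it shows is that the label comes from the resolved outcome, while the context is restricted to documents published before the forecast date so nothing from the future leaks in.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Question:            # hypothetical schema, for illustration only
    text: str
    asked_on: date         # date the forecast would be made
    resolved_on: date      # date the real-world outcome became known
    outcome: int           # 1 if the event happened, 0 otherwise


@dataclass
class Document:            # hypothetical schema, for illustration only
    text: str
    published_on: date


def build_examples(questions, documents, today):
    """Turn resolved questions into (context, label) training examples."""
    examples = []
    for q in questions:
        if q.resolved_on > today:
            continue  # not resolved yet, so there is no label to learn from
        # Only documents published strictly before the forecast date are visible.
        context = [d.text for d in documents if d.published_on < q.asked_on]
        examples.append({"question": q.text, "context": context, "label": q.outcome})
    return examples
```

The label here is simply whatever happened, which is why no annotator is needed. The date filter is the part that matters most: without it, a model could read about the outcome in its own context.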
The kinds of questions a model trained on your data can answer.
Benchmark comparisons against frontier models
Foresight V3 holds the #1 spot on ProphetArena's live benchmark, ahead of Gemini 3 Pro and GPT-5.2 — while being 10–100× smaller than the frontier models it beats.
On 251 live Polymarket questions, Foresight-32B achieved a Brier score of 0.199 vs. GPT-5's 0.207 (lower is better), with 69% lower calibration error (ECE 6.0% vs. 16.1%) and a positive simulated trading profit, while the frontier models lost money.
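For readers unfamiliar with the two metrics quoted above, here is a minimal sketch of how a Brier score and an expected calibration error (ECE) are commonly computed for binary forecasts. The function names and the choice of ten equal-width bins are assumptions made for illustration; the benchmark's exact binning scheme is not stated here.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted average gap between mean forecast and observed frequency per bin.

    Uses equal-width probability bins; n_bins=10 is an assumed default.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_forecast = sum(p for p, _ in members) / len(members)
        observed_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_forecast - observed_rate)
    return ece
```

As a point of reference, a forecaster that always answers 0.5 gets a Brier score of 0.25 on any set of binary questions, so scores near 0.20 reflect a real but modest edge over an uninformed guess. ECE measures something different: whether events forecast at, say, 70% actually happen about 70% of the time.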
Primary write-ups and artifacts for this solution.
Leverage your own raw data or use public sources. No labeling required.