Prediction Markets

Specialized forecasters that beat frontier LLMs on live prediction markets

#1 overall on Prophet Arena, ahead of GPT-5.2, Gemini 3 Pro, Claude, Grok, and Kimi
69% lower calibration error vs. the base model on live Polymarket questions
~65% of the Brier gap closed between the base model and Polymarket prices
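For readers unfamiliar with the metrics behind these figures, here is a minimal sketch of how such numbers are commonly computed. It is not taken from the case study: the function names are hypothetical, the calibration measure is assumed to be expected calibration error with equal-width bins, and "gap closed" is assumed to mean the share of the base-to-market Brier difference that the trained model recovers.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_gap_closed(base: float, tuned: float, market: float) -> float:
    """Fraction of the base-to-market Brier gap recovered by the tuned model.

    Assumed definition: (base - tuned) / (base - market). Lower Brier is better,
    so 1.0 would mean the tuned model matches market prices.
    """
    return (base - tuned) / (base - market)


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE (an assumed choice of calibration metric)."""
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Clamp p == 1.0 into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_prob = sum(p for p, _ in b) / len(b)
        hit_rate = sum(y for _, y in b) / len(b)
        # Weight each bin's |confidence - frequency| by its share of forecasts.
        ece += (len(b) / total) * abs(mean_prob - hit_rate)
    return ece
```

Under these assumed definitions, a base Brier score of 0.25, a trained-model score of 0.20, and a market score of 0.15 would give a gap closed of 50%. These inputs are illustrative only and are not results from the case study.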

Foresight V3 was trained on historical open-web news. The SDK generated prediction questions using only the information available at each article's timestamp and resolved them from later sources; Foresight Learning then reinforced the reasoning paths that produced better probabilities. The resulting model reached #1 on Prophet Arena, where every model receives identical context.
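The time-split idea in this paragraph can be illustrated with a short sketch. This is not the SDK's API: `Article`, `split_by_cutoff`, and `brier_reward` are hypothetical names, and the reward shape (one minus squared error) is an assumption about what "better probabilities" might mean in a training signal.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Article:
    """Hypothetical record for one news article."""
    published: datetime
    text: str


def split_by_cutoff(
    articles: list[Article], cutoff: datetime
) -> tuple[list[Article], list[Article]]:
    """Separate articles usable as context from those usable for resolution.

    Only articles published at or before the cutoff may inform the question
    and the forecast; anything later is reserved for resolving the outcome,
    so no future information leaks into the forecast.
    """
    context = [a for a in articles if a.published <= cutoff]
    later = [a for a in articles if a.published > cutoff]
    return context, later


def brier_reward(prob: float, outcome: int) -> float:
    """Assumed reward: 1 minus squared error, so better probabilities score higher."""
    return 1.0 - (prob - outcome) ** 2


# Usage: a question is posed as of 5 January; the later article resolves it.
articles = [
    Article(datetime(2024, 1, 1), "Central bank signals a possible rate cut."),
    Article(datetime(2024, 1, 10), "Central bank cuts rates."),
]
context, later = split_by_cutoff(articles, cutoff=datetime(2024, 1, 5))
```

Keeping context and resolution sources strictly separated by timestamp is what lets historical news stand in for live forecasting: the model is scored only on outcomes it could not have seen when it made the forecast.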

What we did

