The Future is the Label.

Peer-reviewed research. Live benchmark wins. Government-cleared.

Feb 2026 Benchmark

#1 on ProphetArena Sports

Foresight-32B beats every other model at predicting sports outcomes on ProphetArena, a live prediction market leaderboard — with 105.9% Market Return, ahead of GPT-5.2, Minimax M2, Gemini 3 Pro, and Qwen3-235B.

ProphetArena Sports leaderboard — Foresight V1 32B #1
Jan 29, 2026 Benchmark

Foresight-32B outperforms frontier models on ForecastBench

Top 5 on the ForecastBench tournament, outperforming Gemini 3 Pro, Claude Sonnet 4.5, and o3.

ForecastBench tournament leaderboard
Jan 27, 2026 Research

Foresight-tuned 32B model outperforms GPT-5 at predicting public company risks

Foresight learning on raw SEC filings trains a 32B-parameter model to beat GPT-5 in both accuracy and calibration at predicting public company risks. Deployable on a single GPU for maximum data privacy.
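
For readers unfamiliar with the accuracy metrics named here, the following is a minimal sketch of the standard Brier score and Brier Skill Score for binary events. It assumes the reference forecast is the base rate of the evaluated outcomes, which may differ from the baseline used in the study, and the function names and sample numbers are illustrative only.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    Lower is better; 0.0 is a perfect forecast.
    """
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Skill relative to a reference forecast: 1 - BS_model / BS_reference.

    Assumption: the reference always predicts the base rate of `outcomes`.
    Positive values mean the model beats that reference.
    """
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    if reference == 0:
        raise ValueError("reference Brier score is zero; skill score is undefined")
    return 1.0 - brier_score(probs, outcomes) / reference


# Made-up forecasts for four hypothetical risk events (1 = event occurred).
example_probs = [0.9, 0.2, 0.7, 0.1]
example_outcomes = [1, 0, 1, 0]

print("Brier score:", brier_score(example_probs, example_outcomes))
print("Brier skill score:", brier_skill_score(example_probs, example_outcomes))
```

A skill score above zero indicates the forecasts carry more information than always predicting the base rate.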

SEC Risk: Brier Score, Brier Skill Score, ECE, and Calibration Reliability Diagram
Jan 9, 2026 Core Method

Future-as-Label enables scalable RL

We show that AI can learn directly from real-world outcomes at unlimited scale, with no human annotation required. The future itself becomes the training signal. The method improved Brier scores by 27% and halved calibration error, outperforming Qwen3-235B with a 32B model.
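
As an illustration of what "calibration error" typically measures, here is a minimal sketch of expected calibration error (ECE) for binary forecasts. It assumes ten equal-width probability bins, a common convention that is not confirmed as the setup behind the numbers above; the function name and sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted average gap between mean forecast and observed frequency per bin.

    Assumption: equal-width bins over [0, 1]; a forecast of exactly 1.0 is
    placed in the last bin.
    """
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and the same length")

    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_forecast = sum(p for p, _ in members) / len(members)
        observed_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_forecast - observed_rate)
    return ece


# Made-up example: forecasts of 0.9 that come true only half the time.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0])
print("ECE for an overconfident forecaster:", overconfident)
```

A well-calibrated forecaster, whose 80% predictions come true about 80% of the time, scores close to zero on this metric.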

Brier scores: Foresight training vs base models
Aug 2025 Performance

Foresight-32B beats frontier LLMs on live Polymarket predictions

On live Polymarket data, Foresight-32B defeated models 100x larger across every key metric — Brier score, calibration error, and simulated trading profit.

Polymarket benchmark — Brier Scores, Calibration Error, Simulated Trading
Jul 2025 Government

Defense & DARPA awardable

Vetted and approved for immediate defense procurement. Government agencies can access our technology directly via the ERIS and CDAO Tradewinds federal innovation marketplaces.

CDAO Tradewinds Solutions Marketplace: Awardable
DARPA ERIS Marketplace: Awardable
May 2025 Peer-Reviewed

Published in TMLR: Outcome-based RL achieves frontier accuracy with a 14B model

Our 14B model matches OpenAI o1 in predictive accuracy and generates >10% profit in live trading simulations — published in Transactions on Machine Learning Research.

Simulated trading profit across models
Feb 2025 Research

LLMs can teach themselves to predict the future

Self-play and DPO yield 7–10% accuracy improvements on Phi-4 14B and DeepSeek-R1 14B — bringing smaller models to frontier-level forecasting performance without any human-annotated training data.

Ridge Plot of Brier Scores — fine-tuned vs base models

Our Founder

Ben Turtel

Founder & CEO
Founder & CTO of Rivet @ Area 120
Acquired by Google Assistant
10+ years in Machine Learning, AI, and NLP
6+ years Google SWE (L5), applied AI
Master's in Scientific Computing from NYU
Mentor to startups at StartX (Stanford), The Garage (Northwestern), and CoinTelegraph Accelerator
StartX, Koa Lab, Area 120, CDL, Google, Microsoft, NVIDIA Inception, Higher Ground Labs, Gumi Cryptos, Phaze Ventures, Endless Frontier Labs

Train AI experts for any domain.

See how Lightning Rod turns your sources into verified training data in minutes.
