Pro Golf

+17% Brier Skill Score on held-out golf questions

41% lower calibration error than GPT-5

lower Brier score than GPT-5

We built a public golf forecasting dataset and trained Golf-Forecaster on resolved tournament questions. The model card reports better Brier score, Brier Skill Score, and calibration than GPT-5 on temporally held-out golf questions.
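For readers unfamiliar with the three metrics named above, the following is a minimal sketch of how they are commonly computed for binary forecasts. It is not taken from the model card. The function names, the choice of reference forecast for the Brier Skill Score, and the equal-width binning used for calibration error are all assumptions made for illustration.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(
    probs: Sequence[float],
    outcomes: Sequence[int],
    reference_probs: Sequence[float],
) -> float:
    """1 - BS_model / BS_reference. Positive values beat the reference forecast.

    Which reference the model card uses is not stated here; it is an assumption.
    """
    reference = brier_score(reference_probs, outcomes)
    if reference == 0.0:
        raise ValueError("reference Brier score is zero; skill score is undefined")
    return 1.0 - brier_score(probs, outcomes) / reference


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted mean gap between average confidence and observed frequency per bin.

    Equal-width bins are an assumption; other binning schemes are also common.
    """
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, o))
    total = len(probs)
    error = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        frequency = sum(o for _, o in members) / len(members)
        error += (len(members) / total) * abs(mean_prob - frequency)
    return error


# Made-up forecasts for illustration only; these are not results from the case study.
example_probs = [0.9, 0.2, 0.7, 0.4]
example_outcomes = [1, 0, 1, 0]
coin_flip = [0.5] * len(example_probs)
print(brier_score(example_probs, example_outcomes))
print(brier_skill_score(example_probs, example_outcomes, coin_flip))
print(expected_calibration_error(example_probs, example_outcomes))
```

Lower is better for Brier score and calibration error, while higher is better for Brier Skill Score, which is why the headline figures are phrased in opposite directions.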

What we did

Generated golf forecasting questions from public news context and resolved outcomes.
Split the data temporally so evaluation used later questions than training.
Fine-tuned GPT-OSS-120B with a Brier-score reward, then evaluated the result against GPT-5.
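The temporal split and the Brier-score reward described in the steps above can be sketched as follows. This is an illustrative outline, not the notebook's code. The `Question` record, its field names, the cutoff date, and the exact form of the reward (negative squared error) are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Question:
    """Hypothetical record for one resolved forecasting question."""
    text: str
    asked_on: date
    outcome: int  # 1 if the event happened, 0 otherwise


def temporal_split(
    questions: Sequence[Question], cutoff: date
) -> Tuple[List[Question], List[Question]]:
    """Put questions asked before the cutoff in train and the rest in test.

    Every test question is therefore later than every training question.
    """
    train = [q for q in questions if q.asked_on < cutoff]
    test = [q for q in questions if q.asked_on >= cutoff]
    return train, test


def brier_reward(prob: float, outcome: int) -> float:
    """Negative squared error, so a perfect forecast earns the maximum reward of 0.

    The source only says "Brier-score reward"; this particular form is an assumption.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must lie in [0, 1]")
    return -((prob - outcome) ** 2)


# Placeholder questions for illustration only; not drawn from the dataset.
questions = [
    Question("Will player A make the cut?", date(2025, 1, 10), 1),
    Question("Will player B finish in the top 10?", date(2025, 3, 5), 0),
    Question("Will player C win the tournament?", date(2025, 6, 1), 1),
]
train, test = temporal_split(questions, cutoff=date(2025, 3, 1))
print(len(train), len(test))
print(brier_reward(0.8, 1))
```

Splitting by date rather than at random keeps information about later tournaments out of training, which is the point of a temporally held-out evaluation.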

Primary artifacts for this case study.

Notebook (github.com)

Golf forecasting fine-tune

Step-by-step notebook for training a golf outcome predictor

Model (huggingface.co)

Golf-Forecaster

Fine-tuned model for golf tournament outcome prediction

Dataset (huggingface.co)

Golf forecasting dataset

Tournament questions paired with resolved finishing positions
