Clinical Outcomes

A compact clinical forecaster trained directly from raw patient notes

+27%
Brier Skill Score vs. the clinical base-rate baseline
↗ arXiv paper
~69%
lower calibration error vs. the base model
↗ arXiv paper
84%
win rate vs. the base model in blind reasoning review
↗ arXiv paper

We tuned a compact clinical forecaster from GPT-OSS-120B to predict clinical events from raw MIMIC-III notes, using later patient records to resolve the outcomes. The paper reports a Brier Skill Score of +27% against the clinical base-rate baseline, about 69% lower calibration error than the base model, and a slightly better Brier score than GPT-5.
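For readers unfamiliar with the headline metric, here is a minimal sketch of how a Brier Skill Score against a base-rate baseline is commonly computed. It is not the paper's evaluation code. The function names and toy numbers are hypothetical, and the sketch assumes the baseline always predicts the event rate observed in the evaluation set, which may differ from how the paper defines its clinical base rate.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Relative improvement over a constant base-rate forecast.

    Assumption: the reference forecast is the event rate of `outcomes` itself.
    A score of 0 matches the baseline; 1 is a perfect forecast.
    """
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(probs, outcomes) / reference


# Toy data for illustration only, not taken from MIMIC-III or the paper.
toy_outcomes = [1, 0, 0, 1]
toy_probs = [0.8, 0.2, 0.1, 0.7]

toy_bs = brier_score(toy_probs, toy_outcomes)
toy_bss = brier_skill_score(toy_probs, toy_outcomes)
print(f"Brier score: {toy_bs:.3f}, Brier Skill Score: {toy_bss:.2f}")
```

Under this definition, a reported +27% means the forecaster's Brier score is 27% lower than that of the constant base-rate forecast.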

What we did


Read more

Primary artifacts for this case study.