Public Company Risks

Predict which 10-K risk factors actually precede enforcement, restatements, and drawdowns

+11.6% Brier Skill Score on 6,109 SEC risk queries
64.7% lower calibration error than GPT-5

What we did


Example datapoint

A sample training example — question, source, and outcome-derived label.


Results

Benchmark comparisons against frontier models.

SEC Risk Prediction: Brier Score, Skill, and Calibration

The fine-tuned Qwen3-32B achieves a Brier Skill Score of +11.6% with an ECE of 0.029 across 6,109 SEC risk queries, a 64.7% lower calibration error than GPT-5 (ECE 0.081). The model learns to distinguish boilerplate legal language from meaningful signals that precede adverse outcomes.
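For readers unfamiliar with the metrics above, here is a minimal sketch of how Brier score, Brier Skill Score, and ECE are typically computed for binary forecasts. The definitions are standard; the equal-width 10-bin ECE shown here is an assumption, since the case study does not specify its binning scheme.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def brier_skill_score(probs, outcomes):
    """Improvement over a naive baseline that always predicts the base rate.
    Positive values mean the forecasts beat the baseline."""
    base_rate = sum(outcomes) / len(outcomes)
    bs_ref = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(probs, outcomes) / bs_ref

def expected_calibration_error(probs, outcomes, n_bins=10):
    """Average |mean confidence - observed frequency| per probability bin,
    weighted by the fraction of forecasts falling in each bin."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, o))
    n, ece = len(probs), 0.0
    for bucket in bins:
        if bucket:
            avg_p = sum(p for p, _ in bucket) / len(bucket)
            freq = sum(o for _, o in bucket) / len(bucket)
            ece += (len(bucket) / n) * abs(avg_p - freq)
    return ece
```

A lower Brier score and ECE are better; a Brier Skill Score above zero means the model adds information beyond the historical base rate of adverse outcomes.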

Four charts: Brier score, Brier Skill Score, ECE, and calibration reliability diagram comparing trained Qwen3-32B vs. GPT-5 vs. Naive Baseline

Read more

Papers, models, datasets, notebooks, and write-ups for this case study.

Ready to build your own expert?

Leverage your own raw data or use public sources. No labeling required.