Foresight Data
Training Data That Labels Itself
Go from messy historical data to verified training datasets — no labeling or annotation needed.
Real-world data has timestamps.
Not clean labels.
Turn historical data into verified training datasets automatically using Future-as-Label.
Turn messy data into training-ready datasets
Choose Sources
Public web, news, filings — or your own docs, emails, tickets.
Define Questions
Natural language instructions + examples. No schema required.
Auto-Label
Outcomes from later in the data become ground-truth labels.
Verify
Every row traceable to sources. Full provenance built in.
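To make the Auto-Label and Verify steps concrete, here is a minimal sketch of the Future-as-Label idea in plain Python. It is an illustration only, not the SDK's implementation: the `Event` class, the `label_from_future` helper, the toy events, and the keyword match used to resolve the outcome are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    day: date
    text: str


def label_from_future(events, cutoff, outcome_keyword):
    """Split events at the cutoff and resolve the label from what came later.

    Events on or before the cutoff form the context a model may see.
    Events after the cutoff are searched for the outcome; a keyword match
    is a toy stand-in for real outcome resolution.
    """
    context = [e for e in events if e.day <= cutoff]
    future = [e for e in events if e.day > cutoff]
    evidence = [e for e in future if outcome_keyword.lower() in e.text.lower()]
    return {
        "context": [e.text for e in context],
        "label": 1 if evidence else 0,
        # Keep the supporting text so every label stays traceable.
        "evidence": [e.text for e in evidence],
    }


# Hypothetical, made-up events used only to exercise the sketch.
events = [
    Event(date(2024, 1, 10), "Acme announces plan to acquire Beta Corp"),
    Event(date(2024, 2, 5), "Regulators open review of Acme deal"),
    Event(date(2024, 6, 20), "Acme completes acquisition of Beta Corp"),
]

row = label_from_future(
    events, cutoff=date(2024, 3, 1), outcome_keyword="completes acquisition"
)
unresolved = label_from_future(
    events, cutoff=date(2024, 3, 1), outcome_keyword="deal blocked"
)
print(row["label"], row["evidence"])
```

The key design point is the time split: nothing dated after the cutoff leaks into the context, so the label comes from what actually happened rather than from a human or model judgment.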
Simple, powerful API
Define your sources, time window, and question style. The SDK handles generation, labeling, and verification.
- Bring your own files or use built-in public sources
- Questions auto-generated from your domain context
- Labels verified against real outcomes — no human annotation
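# Assumption: the component classes are exported from the package root alongside Pipeline.
from lightningrod import (
    Pipeline,
    FileSetSeedGenerator,
    ForwardLookingQuestionGenerator,
    RAGLabeler,
)

pipeline = Pipeline([
    # Seed the pipeline from your uploaded files.
    FileSetSeedGenerator(file_set_id="your-fileset-id"),
    # Generate forward-looking questions from the seeds.
    ForwardLookingQuestionGenerator(
        instructions="Questions about business outcomes"
    ),
    # Label each generated question.
    RAGLabeler(),
])
dataset = pipeline.run(n_samples=100)

Every Record is Verified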
Each data point comes with evidence, citations, and confidence — not just a label.
- Ground-truth labels from real outcomes, not LLM opinions
- Full citations traceable to original sources
- Reasoning chain explaining how each answer was resolved
- Ready for fine-tuning — export as HuggingFace, Parquet, or JSON
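As a rough sketch of the JSON route, the standard library is enough to filter records and write them as JSON Lines. This assumes records use the field names from the sample record in this section and that labels are binary (0 or 1); the `has_provenance` and `to_jsonl` helpers are hypothetical and not part of the SDK.

```python
import json

# One record, copied from the sample in this section (elisions left as-is).
records = [
    {
        "question": "Will the EU AI Act be enforced against a major tech company by Feb 2025?",
        "correct_answer": 0,
        "resolution_reasoning": "Prohibited practices provisions took effect Feb 2, 2025. No enforcement actions announced...",
        "source_citations": ["reuters.com/...", "ec.europa.eu/..."],
    }
]


def has_provenance(row):
    """Keep only rows with a binary label and at least one citation."""
    return row.get("correct_answer") in (0, 1) and len(row.get("source_citations", [])) > 0


def to_jsonl(rows):
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in rows)


usable = [r for r in records if has_provenance(r)]
jsonl = to_jsonl(usable)
print(jsonl)
```

JSON Lines keeps one record per line, which makes large exports easy to stream and to split into train and evaluation sets.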
{
  "question": "Will the EU AI Act be enforced against a major tech company by Feb 2025?",
  "correct_answer": 0,
  "resolution_reasoning": "Prohibited practices provisions took effect Feb 2, 2025. No enforcement actions announced...",
  "source_citations": [
    "reuters.com/...",
    "ec.europa.eu/..."
  ]
}

Trusted by teams building AI