Generate training data from real-world sources.
Instantly.
Go from messy historical data to verified datasets in minutes.
Real-world data has timestamps.
Not clean labels.
We turn historical data into verified training datasets automatically using Future-as-Label.
Use our built-in public sources—news, SEC filings, web—or bring your own docs, emails, and tickets.
Go from zero data to deployable AI in hours, not months.
Simple, powerful API
Generate verified datasets in a few lines of code. Our SDK handles the complexity.
- Grounded in retrieved evidence, not synthetic slop
- Bootstrap with public feeds: news, SEC filings, Wikipedia
- Complete provenance with citations and source docs
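from lightningrod import Pipeline
# Assumption: the stage classes used below are importable from the same package.
from lightningrod import NewsSeedGenerator, QuestionGenerator, WebSearchLabeler

# Stages run in order: seed documents, then questions, then labels.
pipeline = Pipeline([
    NewsSeedGenerator(query="AI regulation"),
    QuestionGenerator(
        instructions="Forward-looking questions with dates"
    ),
    WebSearchLabeler(),
])

# Request 100 samples from the pipeline.
dataset = pipeline.run(n_samples=100)

How it works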
Choose Sources
Public web, news, filings—or your own docs, emails, tickets.
Define Questions
Natural language instructions + examples. No schema required.
Auto-Label
Outcomes from later in the data become ground-truth labels.
Verify
Every row traceable to sources. Full provenance built in.
The future is the label.
We pioneered Future-as-Label training: using temporal structure in historical data to generate supervision at scale. We used this to beat frontier AIs 100x larger on live prediction benchmarks.
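To make the idea concrete, here is a minimal sketch of Future-as-Label in plain Python. It is not the lightningrod SDK: the Document and LabeledRow types, the label_from_future helper, the keyword-match resolver, and the sample documents are all hypothetical, chosen only to show how documents published after a question's date can supply its label and its citations.

```python
from dataclasses import dataclass
from datetime import date
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Document:
    """A timestamped source document (hypothetical schema)."""
    doc_id: str
    published: date
    text: str


@dataclass(frozen=True)
class LabeledRow:
    """One training row, with the documents that justify its label."""
    question: str
    asked_on: date
    label: bool
    evidence_ids: Tuple[str, ...]


def label_from_future(
    question: str,
    asked_on: date,
    resolve_by: date,
    keyword: str,
    docs: Sequence[Document],
) -> LabeledRow:
    """Label a yes/no question using only documents published after it was asked.

    The keyword match is a stand-in for a real resolver. A False label here
    only means no matching evidence appeared in the window.
    """
    # Keep only documents from the question's "future": (asked_on, resolve_by].
    later = [d for d in docs if asked_on < d.published <= resolve_by]
    evidence = tuple(
        d.doc_id for d in later if keyword.lower() in d.text.lower()
    )
    return LabeledRow(question, asked_on, bool(evidence), evidence)


# Made-up sample documents, for illustration only.
docs = [
    Document("a1", date(2024, 1, 10), "Regulators open consultation on the AI bill."),
    Document("a2", date(2024, 3, 2), "Parliament passes the AI bill."),
    Document("a3", date(2024, 7, 1), "AI bill enters into force."),
]

row = label_from_future(
    question="Will parliament pass the AI bill by 2024-06-30?",
    asked_on=date(2024, 1, 15),
    resolve_by=date(2024, 6, 30),
    keyword="passes the AI bill",
    docs=docs,
)
print(row.label, row.evidence_ids)
```

The question is dated before the outcome, so a model trained on this row sees only what was known on the asking date, while the label and its evidence IDs come from what was published afterward.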





