QA from Public News

Query in.
Dataset out.

Start with a search query and a time window. We pull domain-specific news, generate forward-looking questions, and verify the answers automatically.

No data? No problem.

Start from nothing but a search query. We pull from historical news sources and build a complete, verified dataset for any topic.

Grounded in real outcomes

Questions are forward-looking, and answers are verified against what actually happened. Every label is backed by evidence, not opinion.

Any topic, any time window

AI regulation, biotech, crypto, geopolitics — pick a domain and a date range. The pipeline handles sourcing, question generation, and labeling.

Simple, powerful API

Just a search query and a time window. We handle the rest.

  • Pull from historical news sources across any domain
  • Forward-looking questions verified against real outcomes
  • Full provenance with evidence and citations
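# Assumption: the three components below are exported from the same package as Pipeline.
from lightningrod import (
    Pipeline,
    NewsSeedGenerator,
    ForwardLookingQuestionGenerator,
    WebSearchLabeler,
)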

pipeline = Pipeline([
    NewsSeedGenerator(query="AI regulation"),
    ForwardLookingQuestionGenerator(
        instructions="Questions about policy outcomes"
    ),
    WebSearchLabeler()
])

dataset = pipeline.run(n_samples=100)
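
Curious what "forward-looking" and "verified against real outcomes" might mean in practice? Here is a minimal, self-contained sketch of the idea. It does not use the lightningrod package and is not how the pipeline is implemented. The `Article` and `LabeledQuestion` types, the `split_by_cutoff` and `label_question` helpers, the keyword-matching labeler, and the example.com articles are all hypothetical stand-ins, made up for illustration. News published before a cutoff date seeds the question, and only news published on or after it can supply the answer and its evidence.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Article:
    # Hypothetical news record, not a Lightning Rod type.
    published: date
    headline: str
    url: str


@dataclass
class LabeledQuestion:
    question: str
    asked_on: date
    label: Optional[bool] = None        # None means "not resolved yet"
    evidence_url: Optional[str] = None  # citation backing the label


def split_by_cutoff(
    articles: List[Article], cutoff: date
) -> Tuple[List[Article], List[Article]]:
    """Articles before the cutoff seed questions; later ones can resolve them."""
    seeds = [a for a in articles if a.published < cutoff]
    outcomes = [a for a in articles if a.published >= cutoff]
    return seeds, outcomes


def label_question(
    question: str, keyword: str, asked_on: date, outcomes: List[Article]
) -> LabeledQuestion:
    """Toy labeler: a later headline containing the keyword counts as evidence."""
    for article in sorted(outcomes, key=lambda a: a.published):
        if keyword.lower() in article.headline.lower():
            return LabeledQuestion(question, asked_on, True, article.url)
    # No evidence found: leave the question unresolved rather than guess.
    return LabeledQuestion(question, asked_on)


# Made-up articles for illustration only.
articles = [
    Article(date(2024, 1, 10), "Lawmakers debate draft AI bill", "https://example.com/a"),
    Article(date(2024, 3, 5), "AI bill signed into law", "https://example.com/b"),
]
seeds, outcomes = split_by_cutoff(articles, cutoff=date(2024, 2, 1))
result = label_question(
    "Will the draft AI bill be signed into law after 2024-02-01?",
    keyword="signed into law",
    asked_on=date(2024, 2, 1),
    outcomes=outcomes,
)
print(result.label, result.evidence_url)  # True https://example.com/b
```

The one design choice worth noting in this sketch: a question with no supporting evidence stays unlabeled instead of defaulting to "no", so every label it emits carries a citation.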

Trusted by teams building AI

Shore Capital
Swayable
AirHelp
Brunswick Group
Fabletics
InPolicy
Precognition Labs
Caremaze
Takeoff 41
★★★★★

"Super impressed by Lightning Rod. We thought data prep would take weeks. We handed them our internal docs and got back 10,000 high-quality, citable QA pairs in hours—we were fine-tuning the next day."

Joe Phongpreecha, Co-founder & CEO, Takeoff 41
★★★★★

"10,000 labeled examples that we immediately put to work in our eval pipeline, teleporting us weeks ahead. The quality and thoroughness of the explanation made us highly confident to start using the data."

★★★★★

"Lightning Rod took a messy set of conversational transcripts and turned them into a complete training set ready for fine-tuning. The turnaround was fast enough that we went from idea to deployment in a single sprint. Without this, we would have been stuck in a proof-of-concept loop for months—instead, we got awesome results we could use on day one."

★★★★★

"We have an enormous amount of unstructured data about our portfolio companies, but it wasn't labeled or usable for training. Lightning Rod is the only solution that turns messy sources into high-quality, verified training data—unlocking real AI solutions to make smarter, better decisions."

★★★★★

"We had an excellent experience with Lightning Rod Labs. They delivered thousands of high-confidence Q&A pairs in an incredibly short amount of time—something that would have taken our team weeks to do manually. The cross-checking gave us strong confidence in the accuracy and reliability of the output. I highly recommend them to any team building AI!"

BB Chen, Co-founder, CareTie
★★★★★

"We rapidly generated high-quality synthetic datasets to stress-test edge cases and policy variants that were difficult to source organically, significantly improving precision and recall in a fraction of the time."

★★★★★

"Incredibly easy way to generate high-quality datasets from public sources."

Generate datasets from public news today.