Building a Multi-Agent Research Lab: Scientist, Librarian, Analyst, and Writer Agents
Build a multi-agent research system with four specialized agents — Scientist, Librarian, Analyst, and Writer — that collaborate on source discovery, evidence analysis, and paper generation, with complete Python code.
The Research Lab Concept
Research is inherently a multi-stage process: formulating questions, finding sources, analyzing evidence, and synthesizing findings into a coherent document. A single AI agent attempting all four stages produces shallow results because it cannot specialize — it must juggle search queries, citation tracking, statistical reasoning, and academic writing simultaneously.
A multi-agent research lab assigns each stage to a specialized agent. The Scientist formulates hypotheses and directs research. The Librarian discovers and manages sources. The Analyst evaluates evidence and finds patterns. The Writer synthesizes everything into a structured document. Each agent excels at its narrow responsibility, and the handoffs between them enforce quality gates.
Shared Data Structures
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional
import uuid

@dataclass
class Source:
    source_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    title: str = ""
    url: str = ""
    content_summary: str = ""
    relevance_score: float = 0.0
    source_type: str = ""  # "paper", "article", "dataset", "book"
    metadata: Dict[str, Any] = field(default_factory=dict)

@dataclass
class ResearchQuestion:
    question: str
    sub_questions: List[str] = field(default_factory=list)
    hypothesis: Optional[str] = None
    priority: int = 1

@dataclass
class AnalysisFinding:
    claim: str
    supporting_sources: List[str]  # Source IDs
    confidence: float = 0.0  # 0.0 to 1.0
    evidence_summary: str = ""
    contradicting_sources: List[str] = field(default_factory=list)

@dataclass
class ResearchProject:
    topic: str
    questions: List[ResearchQuestion] = field(default_factory=list)
    sources: List[Source] = field(default_factory=list)
    findings: List[AnalysisFinding] = field(default_factory=list)
    draft: str = ""
    status: str = "initialized"
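One detail worth noting: `source_id` uses `default_factory` rather than a plain default, so each instance gets its own fresh UUID. A quick self-contained check (the class is trimmed to two fields for brevity):

```python
from dataclasses import dataclass, field
import uuid

@dataclass
class Source:  # trimmed to the two fields relevant here
    source_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    title: str = ""

a, b = Source(title="A"), Source(title="B")
assert a.source_id != b.source_id  # the factory runs once per instance
```

A plain default like `source_id: str = str(uuid.uuid4())` would be evaluated once at class-definition time, giving every Source the same ID.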
The Scientist Agent
The Scientist drives the research process. It formulates research questions, evaluates whether enough evidence has been gathered, and decides when the research is complete.
from openai import AsyncOpenAI
import json

client = AsyncOpenAI()

async def scientist_agent(
    topic: str, existing_findings: Optional[List[AnalysisFinding]] = None
) -> List[ResearchQuestion]:
    context = f"Research topic: {topic}\n"
    if existing_findings:
        context += "\nExisting findings:\n"
        for f in existing_findings:
            context += f"- {f.claim} (confidence: {f.confidence})\n"
        context += "\nIdentify gaps and generate follow-up questions.\n"
    else:
        context += "Generate initial research questions and hypotheses.\n"
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a research scientist. Generate structured research "
                    "questions with sub-questions and hypotheses. Return JSON: "
                    "questions (list of objects with question, sub_questions, "
                    "hypothesis, priority)."
                ),
            },
            {"role": "user", "content": context},
        ],
        response_format={"type": "json_object"},
    )
    data = json.loads(response.choices[0].message.content)
    return [ResearchQuestion(**q) for q in data["questions"]]
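Note that `ResearchQuestion(**q)` raises `TypeError` if the model returns any key the dataclass does not declare. A defensive option is to whitelist fields before unpacking; `coerce_question` below is a hypothetical helper, not part of the agent above:

```python
def coerce_question(raw: dict) -> dict:
    """Keep only the fields ResearchQuestion declares, with safe defaults,
    so unexpected keys in the model's JSON don't raise TypeError."""
    return {
        "question": raw.get("question", ""),
        "sub_questions": raw.get("sub_questions") or [],
        "hypothesis": raw.get("hypothesis"),
        "priority": int(raw.get("priority", 1)),
    }

# A response with an extra "rationale" key now parses cleanly:
q = coerce_question({"question": "Does X cause Y?", "rationale": "..."})
```

The result can then be unpacked safely with `ResearchQuestion(**coerce_question(q))`.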
The Librarian Agent
The Librarian handles source discovery and management. It searches for relevant materials, deduplicates sources, and maintains a citation index.
async def librarian_agent(
    questions: List[ResearchQuestion],
    existing_sources: List[Source],
) -> List[Source]:
    existing_titles = {s.title for s in existing_sources}
    search_prompt = "Find relevant sources for these research questions:\n"
    for q in questions:
        search_prompt += f"- {q.question}\n"
        for sq in q.sub_questions:
            search_prompt += f"  - {sq}\n"
    if existing_sources:
        search_prompt += (
            f"\nAlready have {len(existing_sources)} sources. "
            "Find complementary sources that fill gaps."
        )
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a research librarian. For each research question, "
                    "suggest relevant academic papers, articles, and datasets. "
                    "Return JSON: sources (list of objects with title, url, "
                    "content_summary, relevance_score, source_type)."
                ),
            },
            {"role": "user", "content": search_prompt},
        ],
        response_format={"type": "json_object"},
    )
    data = json.loads(response.choices[0].message.content)
    new_sources = []
    for s in data["sources"]:
        if s["title"] not in existing_titles:
            new_sources.append(Source(**s))
    return new_sources
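Exact-title deduplication misses near-duplicates that differ only in casing or punctuation. A stricter comparison is sketched below; `normalize_title` is an assumed helper, not part of the agent above:

```python
import re

def normalize_title(title: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so that
    near-identical titles compare equal."""
    cleaned = re.sub(r"[^\w\s]", "", title.lower())
    return re.sub(r"\s+", " ", cleaned).strip()

seen = {normalize_title("Attention Is All You Need!")}
assert normalize_title("attention is  all you need") in seen
```

In the librarian, the check `s["title"] not in existing_titles` would become a comparison of `normalize_title(s["title"])` against a set of normalized existing titles.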
The Analyst Agent
The Analyst evaluates evidence across sources, identifies patterns, and produces structured findings with confidence scores.
async def analyst_agent(
    questions: List[ResearchQuestion],
    sources: List[Source],
) -> List[AnalysisFinding]:
    analysis_prompt = "Analyze these sources against the research questions.\n"
    analysis_prompt += "\nQUESTIONS:\n"
    for q in questions:
        analysis_prompt += f"- {q.question} (hypothesis: {q.hypothesis})\n"
    analysis_prompt += "\nSOURCES:\n"
    for s in sources:
        analysis_prompt += (
            f"- [{s.source_id[:8]}] {s.title}: {s.content_summary}\n"
        )
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a research analyst. Cross-reference sources to "
                    "produce evidence-based findings. For each finding, cite "
                    "supporting source IDs and note any contradictions. Return "
                    "JSON: findings (list of objects with claim, "
                    "supporting_sources, confidence, evidence_summary, "
                    "contradicting_sources)."
                ),
            },
            {"role": "user", "content": analysis_prompt},
        ],
        response_format={"type": "json_object"},
    )
    data = json.loads(response.choices[0].message.content)
    return [AnalysisFinding(**f) for f in data["findings"]]
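Model-reported confidence scores occasionally fall outside [0, 1], and cited source IDs are not guaranteed to exist. One hedged post-processing step, applied to the raw JSON dicts before constructing `AnalysisFinding` (the `sanitize_finding` helper is illustrative, not part of the agent above):

```python
def sanitize_finding(finding: dict, known_ids: set) -> "dict | None":
    """Clamp confidence into [0, 1] and keep only citations matching a
    known source ID prefix; drop the finding if no real citation remains."""
    cited = [
        sid for sid in finding.get("supporting_sources", [])
        # The analyst prompt shows 8-char prefixes, so match by prefix.
        if any(k.startswith(sid) for k in known_ids)
    ]
    if not cited:
        return None
    cleaned = dict(finding)
    cleaned["supporting_sources"] = cited
    cleaned["confidence"] = min(1.0, max(0.0, float(finding.get("confidence", 0.0))))
    return cleaned
```

Because the prompt truncates IDs to `s.source_id[:8]`, the helper matches cited IDs as prefixes of the full UUIDs rather than requiring exact equality.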
The Writer Agent
The Writer synthesizes findings into a structured research document with proper citations.
async def writer_agent(
    project: ResearchProject,
) -> str:
    write_prompt = f"Topic: {project.topic}\n\n"
    write_prompt += "FINDINGS:\n"
    for f in project.findings:
        write_prompt += (
            f"- {f.claim} (confidence: {f.confidence})\n"
            f"  Evidence: {f.evidence_summary}\n"
        )
    write_prompt += "\nSOURCES:\n"
    for s in project.sources:
        write_prompt += f"- [{s.source_id[:8]}] {s.title} ({s.url})\n"
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are an academic writer. Synthesize the findings into "
                    "a structured research document with sections: Abstract, "
                    "Introduction, Methodology, Findings, Discussion, "
                    "Conclusion, References. Use inline citations [source_id]. "
                    "Write in a clear, evidence-based academic style."
                ),
            },
            {"role": "user", "content": write_prompt},
        ],
    )
    return response.choices[0].message.content
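Readers usually expect numbered citations rather than raw UUID prefixes. A small post-processing pass can renumber them; `number_citations` is an assumed helper, not part of the writer above:

```python
def number_citations(draft: str, source_ids: list) -> str:
    """Replace [abcd1234]-style inline citations (8-char UUID prefixes,
    matching the writer prompt) with [1], [2], ... in source order."""
    for n, sid in enumerate(source_ids, start=1):
        draft = draft.replace(f"[{sid[:8]}]", f"[{n}]")
    return draft

text = number_citations("As shown in [deadbeef].", ["deadbeef-0000-4000"])
# text == "As shown in [1]."
```

Running this over `project.draft` with `[s.source_id for s in project.sources]` keeps the reference numbering consistent with the SOURCES list in the prompt.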
The Research Orchestrator
The orchestrator runs the full research loop, allowing the Scientist to request additional rounds of source gathering and analysis.
async def run_research_lab(
    topic: str, max_rounds: int = 3
) -> ResearchProject:
    project = ResearchProject(topic=topic)
    for round_num in range(1, max_rounds + 1):
        print(f"\n--- Research Round {round_num} ---")
        # Scientist formulates questions
        questions = await scientist_agent(topic, project.findings or None)
        project.questions.extend(questions)
        # Librarian finds sources
        new_sources = await librarian_agent(questions, project.sources)
        project.sources.extend(new_sources)
        print(f"Found {len(new_sources)} new sources")
        # Analyst evaluates evidence
        findings = await analyst_agent(questions, project.sources)
        project.findings.extend(findings)
        # Check if we have sufficient high-confidence findings
        high_confidence = [
            f for f in project.findings if f.confidence >= 0.7
        ]
        if len(high_confidence) >= 5:
            print("Sufficient evidence gathered")
            break
    # Writer produces the final document
    project.draft = await writer_agent(project)
    project.status = "completed"
    return project
FAQ
How do I integrate real source retrieval instead of LLM-generated sources?
Replace the Librarian agent's LLM call with actual API calls to Google Scholar (via SerpAPI), Semantic Scholar, arXiv, or PubMed. Feed the retrieved abstracts and metadata into the Source dataclass. The Analyst then works with real evidence instead of synthesized summaries. You can also combine both: use the LLM to generate search queries, execute them against real APIs, then let the LLM rank and summarize the results.
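As a concrete starting point, the arXiv API accepts plain HTTP queries. The sketch below only builds the query URL; executing it and parsing the Atom response are left out, and the parameter names follow the public arXiv API:

```python
from urllib.parse import urlencode

def arxiv_query_url(question: str, max_results: int = 10) -> str:
    """Build an arXiv API query URL for a research question."""
    params = {
        "search_query": f"all:{question}",
        "start": 0,
        "max_results": max_results,
    }
    return "http://export.arxiv.org/api/query?" + urlencode(params)

url = arxiv_query_url("sleep and memory consolidation")
```

In the hybrid setup described above, the LLM would generate the `question` strings, this function would turn them into API calls, and the LLM would then rank and summarize the returned abstracts.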
How does the Scientist decide when research is "done"?
The Scientist evaluates two criteria: coverage (do the findings address all research questions?) and confidence (are the confidence scores above the threshold?). In the orchestrator above, we stop when we have at least 5 high-confidence findings. In production, you would also check that each research question has at least one finding addressing it.
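Both criteria can be expressed as one small pure function; findings are simplified here to (question_text, confidence) pairs for illustration:

```python
def research_complete(questions, findings, threshold=0.7, min_confident=5):
    """True when every question has at least one finding addressing it
    (coverage) and enough findings clear the confidence bar."""
    answered = {q for q, _ in findings}
    coverage = all(q in answered for q in questions)
    confident = sum(1 for _, c in findings if c >= threshold)
    return coverage and confident >= min_confident

fs = [("q1", 0.9), ("q2", 0.8), ("q1", 0.75), ("q2", 0.71), ("q1", 0.7)]
assert research_complete(["q1", "q2"], fs)        # covered and confident
assert not research_complete(["q1", "q2", "q3"], fs)  # q3 unaddressed
```

Swapping this in for the simple count in the orchestrator's loop would enforce the coverage criterion as well.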
Can I add a Peer Reviewer agent to improve quality?
Absolutely — add a Peer Reviewer between the Analyst and Writer stages. The Peer Reviewer checks findings for logical consistency, flags unsupported claims, and verifies that citations actually support the claims made. If the review fails, loop back to the Scientist with the reviewer's feedback to trigger another research round targeting the weaknesses identified.
#ResearchAgents #MultiAgentLab #KnowledgeManagement #AIPaperGeneration #ResearchAutomation #AgenticAI #PythonAI #AIResearch