
AI Research Agent: Automated Literature Search and Summary Generation

Build an AI research agent that searches academic papers via the Semantic Scholar API, summarizes key findings, manages citations, and synthesizes insights across multiple sources into a coherent literature review.

The Research Bottleneck

A typical literature review involves searching multiple databases, skimming dozens of abstracts, reading a handful of full papers, extracting key claims, and weaving them into a coherent narrative. A single researcher might spend two to three weeks on this process. An AI research agent compresses the search-and-summarize loop from days to minutes while the human focuses on critical evaluation and synthesis decisions.

The agent we build here uses the Semantic Scholar API for paper discovery, LLM-powered summarization for each paper, and a synthesis step that identifies themes and contradictions across the collected literature.

Paper Search Tool

Semantic Scholar provides a free API that returns paper metadata, abstracts, citation counts, and more. The search tool wraps this API:

import httpx
from agents import Agent, Runner, function_tool

S2_BASE = "https://api.semanticscholar.org/graph/v1"

@function_tool
async def search_papers(query: str, limit: int = 10) -> str:
    """Search Semantic Scholar for papers matching a query.
    Returns titles, authors, year, citation count, and abstracts."""
    params = {
        "query": query,
        "limit": limit,
        "fields": "title,authors,year,abstract,citationCount,externalIds",
    }
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{S2_BASE}/paper/search", params=params)
        resp.raise_for_status()

    papers = resp.json().get("data", [])
    results = []
    for p in papers:
        authors = ", ".join(a["name"] for a in (p.get("authors") or [])[:3])
        doi = (p.get("externalIds") or {}).get("DOI", "N/A")
        abstract = (p.get("abstract") or "No abstract available.")[:500]
        results.append(
            f"Title: {p['title']}\n"
            f"Authors: {authors}\n"
            f"Year: {p.get('year', 'N/A')} | Citations: {p.get('citationCount', 0)}\n"
            f"DOI: {doi}\n"
            f"Abstract: {abstract}\n"
        )
    return "\n---\n".join(results) if results else "No papers found."
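The defensive `or` fallbacks matter because Semantic Scholar can return `null` for `authors`, `abstract`, or `externalIds`. A quick way to sanity-check that logic is a plain (non-tool) version of the formatter run against a sample record — the payload below is invented for illustration, not a real API response:

```python
def format_paper(p: dict) -> str:
    """Same formatting logic as search_papers, without the tool wrapper."""
    authors = ", ".join(a["name"] for a in (p.get("authors") or [])[:3])
    doi = (p.get("externalIds") or {}).get("DOI", "N/A")
    abstract = (p.get("abstract") or "No abstract available.")[:500]
    return (
        f"Title: {p['title']}\n"
        f"Authors: {authors}\n"
        f"Year: {p.get('year', 'N/A')} | Citations: {p.get('citationCount', 0)}\n"
        f"DOI: {doi}\n"
        f"Abstract: {abstract}\n"
    )

# Illustrative record with the null fields the API can legitimately return
sample = {
    "title": "Dense Passage Retrieval",
    "authors": None,
    "year": 2020,
    "abstract": None,
    "citationCount": 3500,
    "externalIds": {},
}
print(format_paper(sample))
```

Without the `or` fallbacks, this record would raise a `TypeError` on the `authors` slice and the `externalIds` lookup.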

Citation Management Tool

Keeping track of references is essential. This tool stores papers the agent decides are relevant and outputs formatted citations:

_citation_store: list[dict] = []

@function_tool
def save_citation(title: str, authors: str, year: str, doi: str) -> str:
    """Save a paper to the citation list for the final bibliography."""
    entry = {"title": title, "authors": authors, "year": year, "doi": doi}
    _citation_store.append(entry)
    return f"Saved. Total citations: {len(_citation_store)}"

@function_tool
def get_bibliography() -> str:
    """Return all saved citations in APA-like format."""
    if not _citation_store:
        return "No citations saved yet."
    lines = []
    for i, c in enumerate(_citation_store, 1):
        lines.append(f"[{i}] {c['authors']} ({c['year']}). {c['title']}. DOI: {c['doi']}")
    return "\n".join(lines)
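Because `@function_tool` wraps these functions for the agent runtime, the easiest way to sanity-check the bibliography format is a plain version of the same storage logic (the paper used here is just an example):

```python
# Same storage and formatting logic as the tools above, minus the decorator.
_store: list[dict] = []

def save(title: str, authors: str, year: str, doi: str) -> None:
    _store.append({"title": title, "authors": authors, "year": year, "doi": doi})

def bibliography() -> str:
    if not _store:
        return "No citations saved yet."
    return "\n".join(
        f"[{i}] {c['authors']} ({c['year']}). {c['title']}. DOI: {c['doi']}"
        for i, c in enumerate(_store, 1)
    )

save("Attention Is All You Need", "Vaswani et al.", "2017",
     "10.48550/arXiv.1706.03762")
print(bibliography())
# [1] Vaswani et al. (2017). Attention Is All You Need. DOI: 10.48550/arXiv.1706.03762
```

A module-level list works for a single-run script; for anything longer-lived you would want per-session storage keyed by conversation, so parallel research runs do not share a bibliography.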

Assembling the Research Agent

The agent needs instructions that define a clear research workflow — search, filter, summarize, synthesize:


research_agent = Agent(
    name="Research Agent",
    instructions="""You are an academic research agent. When given a research topic:
1. Use search_papers to find the 10 most relevant papers.
2. Evaluate each abstract for relevance. Discard papers that do not directly
   address the topic.
3. For each relevant paper, save_citation to build the bibliography.
4. Summarize each relevant paper in 2-3 sentences focusing on methodology
   and key findings.
5. After reviewing all papers, write a synthesis section that identifies
   common themes, conflicting results, and open questions.
6. End with the full bibliography from get_bibliography.""",
    tools=[search_papers, save_citation, get_bibliography],
)
import asyncio

async def main():
    result = await Runner.run(
        research_agent,
        "Survey the recent literature on retrieval-augmented generation "
        "for question answering systems. Focus on papers from 2024-2026.",
    )
    print(result.final_output)

asyncio.run(main())

The agent searches for RAG papers, filters by relevance and recency, saves citations for the strongest matches, summarizes each one, and produces a synthesis section identifying trends like the shift from sparse to dense retrieval and the emergence of hybrid chunking strategies.

Enhancing with Full-Text Analysis

Abstracts only tell part of the story. For deeper analysis, add a tool that fetches full paper text via open-access repositories:

@function_tool
async def fetch_paper_text(doi: str) -> str:
    """Fetch the full text of an open-access paper via Unpaywall."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            f"https://api.unpaywall.org/v2/{doi}",
            params={"email": "your@email.com"},  # Unpaywall requires a contact email
        )
        if resp.status_code != 200:
            return "Paper not available open-access."
        data = resp.json()
        # best_oa_location is null when no open-access copy exists,
        # so fall back to an empty dict before looking up the PDF URL
        oa_url = (data.get("best_oa_location") or {}).get("url_for_pdf")
        if not oa_url:
            return "No open-access PDF URL found."
        return f"Full text available at: {oa_url}"

Handling Rate Limits and Errors

Academic APIs enforce rate limits. Wrap HTTP calls with exponential backoff:

import asyncio

async def resilient_get(client, url, params, max_retries=3):
    for attempt in range(max_retries):
        resp = await client.get(url, params=params)
        if resp.status_code == 429:
            wait = 2 ** attempt
            await asyncio.sleep(wait)
            continue
        resp.raise_for_status()
        return resp
    raise RuntimeError("Max retries exceeded")

FAQ

Can this agent access papers behind paywalls?

No. The agent uses public APIs and open-access repositories. For paywalled content, you would need institutional access or an API key from a licensed database like IEEE Xplore or PubMed Central.

How accurate are the LLM-generated summaries?

LLM summaries of abstracts are generally reliable for capturing high-level findings. However, they can miss nuances in methodology sections. Always have a domain expert review the synthesis before using it in a formal publication.

How do I focus the search on a specific time range?

Add a year filter to the Semantic Scholar request by including a "year" parameter, such as "year": "2024-2026", in the params dict sent to the /paper/search endpoint. You can also instruct the agent to discard papers outside the target date range during the filtering step.
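For example, the params dict from search_papers extended with a year range (Semantic Scholar also accepts open-ended ranges like "2024-" or "-2026"):

```python
# The search_papers params, with a year-range filter added for /paper/search.
params = {
    "query": "retrieval-augmented generation",
    "limit": 10,
    "fields": "title,authors,year,abstract,citationCount,externalIds",
    "year": "2024-2026",  # inclusive range
}
print(params["year"])  # 2024-2026
```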


#Research #LiteratureReview #SemanticScholar #Summarization #AIAgents #AgenticAI #LearnAI #AIEngineering

CallSphere Team