AI Research Agent: Automated Literature Search and Summary Generation
Build an AI research agent that searches academic papers via the Semantic Scholar API, summarizes key findings, manages citations, and synthesizes insights across multiple sources into a coherent literature review.
The Research Bottleneck
A typical literature review involves searching multiple databases, skimming dozens of abstracts, reading a handful of full papers, extracting key claims, and weaving them into a coherent narrative. A single researcher might spend two to three weeks on this process. An AI research agent compresses the search-and-summarize loop from days to minutes while the human focuses on critical evaluation and synthesis decisions.
The agent we build here uses the Semantic Scholar API for paper discovery, LLM-powered summarization for each paper, and a synthesis step that identifies themes and contradictions across the collected literature.
Paper Search Tool
Semantic Scholar provides a free API that returns paper metadata, abstracts, citation counts, and more. The search tool wraps this API:
import httpx

from agents import Agent, Runner, function_tool

S2_BASE = "https://api.semanticscholar.org/graph/v1"


@function_tool
async def search_papers(query: str, limit: int = 10) -> str:
    """Search Semantic Scholar for papers matching a query.

    Returns titles, authors, year, citation count, and abstracts."""
    params = {
        "query": query,
        "limit": limit,
        "fields": "title,authors,year,abstract,citationCount,externalIds",
    }
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{S2_BASE}/paper/search", params=params)
        resp.raise_for_status()
    papers = resp.json().get("data", [])
    results = []
    for p in papers:
        authors = ", ".join(a["name"] for a in (p.get("authors") or [])[:3])
        doi = (p.get("externalIds") or {}).get("DOI", "N/A")
        abstract = (p.get("abstract") or "No abstract available.")[:500]
        results.append(
            f"Title: {p['title']}\n"
            f"Authors: {authors}\n"
            f"Year: {p.get('year', 'N/A')} | Citations: {p.get('citationCount', 0)}\n"
            f"DOI: {doi}\n"
            f"Abstract: {abstract}\n"
        )
    return "\n---\n".join(results) if results else "No papers found."
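The formatting loop is pure string manipulation, so it can be sanity-checked offline before wiring the tool into an agent. Here is the same logic as a standalone helper run on a mock API record (format_paper and the mock data are hypothetical, for illustration only):

```python
def format_paper(p: dict) -> str:
    # Same formatting rules as the tool: first three authors,
    # missing-field fallbacks, abstract truncated to 500 chars.
    authors = ", ".join(a["name"] for a in (p.get("authors") or [])[:3])
    doi = (p.get("externalIds") or {}).get("DOI", "N/A")
    abstract = (p.get("abstract") or "No abstract available.")[:500]
    return (
        f"Title: {p['title']}\n"
        f"Authors: {authors}\n"
        f"Year: {p.get('year', 'N/A')} | Citations: {p.get('citationCount', 0)}\n"
        f"DOI: {doi}\n"
        f"Abstract: {abstract}"
    )

mock = {
    "title": "An Example Paper on RAG",
    "authors": [{"name": "A. Author"}, {"name": "B. Author"}],
    "year": 2024,
    "citationCount": 12,
    "externalIds": {},
    "abstract": None,
}
print(format_paper(mock))
```

Note that a null abstract falls back to the placeholder string and a missing DOI prints as N/A, so one malformed record cannot crash the whole search result.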
Citation Management Tool
Keeping track of references is essential. This tool stores papers the agent decides are relevant and outputs formatted citations:
_citation_store: list[dict] = []


@function_tool
def save_citation(title: str, authors: str, year: str, doi: str) -> str:
    """Save a paper to the citation list for the final bibliography."""
    entry = {"title": title, "authors": authors, "year": year, "doi": doi}
    _citation_store.append(entry)
    return f"Saved. Total citations: {len(_citation_store)}"


@function_tool
def get_bibliography() -> str:
    """Return all saved citations in APA-like format."""
    if not _citation_store:
        return "No citations saved yet."
    lines = []
    for i, c in enumerate(_citation_store, 1):
        lines.append(f"[{i}] {c['authors']} ({c['year']}). {c['title']}. DOI: {c['doi']}")
    return "\n".join(lines)
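Because function_tool wraps these functions for the agent runtime, it helps to see the underlying store-and-format pattern on its own. A minimal sketch with plain functions (hypothetical names, not part of the agent code):

```python
citations: list[dict] = []

def add_citation(title: str, authors: str, year: str, doi: str) -> None:
    citations.append({"title": title, "authors": authors, "year": year, "doi": doi})

def bibliography() -> str:
    if not citations:
        return "No citations saved yet."
    # Numbered, APA-like entries in insertion order
    return "\n".join(
        f"[{i}] {c['authors']} ({c['year']}). {c['title']}. DOI: {c['doi']}"
        for i, c in enumerate(citations, 1)
    )

add_citation("An Example Paper", "A. Author", "2024", "10.1234/example")
print(bibliography())  # [1] A. Author (2024). An Example Paper. DOI: 10.1234/example
```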
Assembling the Research Agent
The agent needs instructions that define a clear research workflow — search, filter, summarize, synthesize:
research_agent = Agent(
    name="Research Agent",
    instructions="""You are an academic research agent. When given a research topic:

    1. Use search_papers to find the 10 most relevant papers.
    2. Evaluate each abstract for relevance. Discard papers that do not directly
       address the topic.
    3. For each relevant paper, call save_citation to build the bibliography.
    4. Summarize each relevant paper in 2-3 sentences focusing on methodology
       and key findings.
    5. After reviewing all papers, write a synthesis section that identifies
       common themes, conflicting results, and open questions.
    6. End with the full bibliography from get_bibliography.""",
    tools=[search_papers, save_citation, get_bibliography],
)
Running a Literature Search
import asyncio


async def main():
    result = await Runner.run(
        research_agent,
        "Survey the recent literature on retrieval-augmented generation "
        "for question answering systems. Focus on papers from 2024-2026.",
    )
    print(result.final_output)


asyncio.run(main())
The agent searches for RAG papers, filters by relevance and recency, saves citations for the strongest matches, summarizes each one, and produces a synthesis section identifying trends like the shift from sparse to dense retrieval and the emergence of hybrid chunking strategies.
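One caveat when running the agent on several topics in a row: _citation_store is module-global, so a second topic would inherit the first topic's bibliography. Clearing the list between runs avoids this; a sketch (reset_citations is a hypothetical helper):

```python
def reset_citations(store: list) -> None:
    # Clear in place rather than rebinding, so any code already
    # holding a reference to this list sees the empty state.
    store.clear()

store = [{"title": "Leftover from a previous run"}]
reset_citations(store)
print(len(store))  # 0
```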
Enhancing with Full-Text Analysis
Abstracts only tell part of the story. For deeper analysis, add a tool that fetches full paper text via open-access repositories:
@function_tool
async def fetch_paper_text(doi: str) -> str:
    """Locate the open-access full text of a paper via Unpaywall."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            f"https://api.unpaywall.org/v2/{doi}",
            params={"email": "your@email.com"},
        )
    if resp.status_code != 200:
        return "Paper not available open-access."
    data = resp.json()
    # best_oa_location is null for closed papers, so guard before .get()
    oa_url = (data.get("best_oa_location") or {}).get("url_for_pdf")
    if not oa_url:
        return "No open-access PDF URL found."
    return f"Full text available at: {oa_url}"
Handling Rate Limits and Errors
Academic APIs enforce rate limits. Wrap HTTP calls with exponential backoff:
import asyncio


async def resilient_get(client, url, params, max_retries=3):
    """GET with exponential backoff on HTTP 429 responses."""
    for attempt in range(max_retries):
        resp = await client.get(url, params=params)
        if resp.status_code == 429:
            # Back off 1s, 2s, 4s, ... before retrying
            await asyncio.sleep(2 ** attempt)
            continue
        resp.raise_for_status()
        return resp
    raise RuntimeError("Max retries exceeded")
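The backoff loop can be exercised without touching a live API by substituting a stub client that answers 429 once before succeeding. The stub classes below are test scaffolding (not part of httpx), and the retry function is repeated so the snippet runs on its own:

```python
import asyncio

class FakeResponse:
    def __init__(self, status_code: int):
        self.status_code = status_code
    def raise_for_status(self) -> None:
        pass  # stub: never raises

class FakeClient:
    """Returns canned status codes in order, one per get() call."""
    def __init__(self, statuses):
        self.statuses = list(statuses)
    async def get(self, url, params=None):
        return FakeResponse(self.statuses.pop(0))

async def resilient_get(client, url, params, max_retries=3):
    for attempt in range(max_retries):
        resp = await client.get(url, params=params)
        if resp.status_code == 429:
            await asyncio.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
            continue
        resp.raise_for_status()
        return resp
    raise RuntimeError("Max retries exceeded")

# First call is rate-limited, second succeeds after a 1s backoff
resp = asyncio.run(resilient_get(FakeClient([429, 200]), "https://example.org", {}))
print(resp.status_code)  # 200
```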
FAQ
Can this agent access papers behind paywalls?
No. The agent uses public APIs and open-access repositories. For paywalled content, you would need institutional access or an API key for a licensed database such as IEEE Xplore or ScienceDirect.
How accurate are the LLM-generated summaries?
LLM summaries of abstracts are generally reliable for capturing high-level findings. However, they can miss nuances in methodology sections. Always have a domain expert review the synthesis before using it in a formal publication.
How do I focus the search on a specific time range?
Add a year filter to the Semantic Scholar request by including a year parameter (for example, year=2024-2026) alongside the other query parameters. You can also instruct the agent to discard papers outside the target date range during the filtering step.
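In the search_papers tool this is a one-line change to the params dict; the Semantic Scholar search endpoint accepts year as a single year or an inclusive range:

```python
params = {
    "query": "retrieval-augmented generation",
    "limit": 10,
    "fields": "title,authors,year,abstract,citationCount,externalIds",
    "year": "2024-2026",  # inclusive range; a single year like "2024" also works
}
```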
#Research #LiteratureReview #SemanticScholar #Summarization #AIAgents #AgenticAI #LearnAI #AIEngineering