Building a Research Agent with Claude: Web Search, Analysis, and Report Generation
Build a complete research agent that searches the web, evaluates sources, synthesizes findings, and generates structured reports using Claude and the Anthropic SDK.
What a Research Agent Does
A research agent automates the cycle that human researchers follow: formulate questions, search for information, evaluate source credibility, extract key findings, and synthesize everything into a coherent report. With Claude's large context window and reasoning capabilities, you can build agents that handle this entire pipeline — from raw web search results to polished analysis.
The agent we will build uses three tools: web search, page content extraction, and report formatting. Claude orchestrates them in a loop, deciding when to search for more information and when it has enough to write the final report.
Architecture Overview
The research agent follows a plan-search-synthesize pattern:
- Planning: Claude breaks the research question into sub-queries
- Searching: The agent searches the web for each sub-query
- Extraction: Relevant pages are fetched and key content is extracted
- Synthesis: Claude analyzes all gathered information and produces a report
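Before wiring in Claude, the pattern above can be sketched as a plain pipeline over a shared state object. The function bodies here are illustrative stubs (in the real agent, Claude performs the planning and synthesis steps):

```python
from dataclasses import dataclass, field

@dataclass
class ResearchState:
    """Accumulates everything the agent gathers before synthesis."""
    topic: str
    sub_queries: list = field(default_factory=list)
    findings: list = field(default_factory=list)

def plan(state: ResearchState) -> ResearchState:
    # Planning: in the real agent Claude generates these sub-queries
    state.sub_queries = [
        f"{state.topic}: background",
        f"{state.topic}: recent developments",
    ]
    return state

def search_and_extract(state: ResearchState) -> ResearchState:
    # Searching + extraction: each sub-query yields findings (stubbed here)
    for q in state.sub_queries:
        state.findings.append({"query": q, "content": f"notes on {q}"})
    return state

def synthesize(state: ResearchState) -> str:
    # Synthesis: in the real agent Claude writes the report from the findings
    return "\n".join(f["content"] for f in state.findings)

report = synthesize(search_and_extract(plan(ResearchState(topic="solid-state batteries"))))
```

The rest of the article replaces each stub with a real tool call orchestrated by Claude.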
Defining the Research Tools
```python
import json
import os

import anthropic
import requests

client = anthropic.Anthropic()

tools = [
    {
        "name": "web_search",
        "description": "Search the web for information. Returns a list of results with titles, URLs, and snippets. Use specific, targeted queries for best results.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query string",
                },
                "num_results": {
                    "type": "integer",
                    "description": "Number of results to return (1-10)",
                    "default": 5,
                },
            },
            "required": ["query"],
        },
    },
    {
        "name": "fetch_page",
        "description": "Fetch and extract the main text content from a URL. Use this to get details from a search result.",
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "The URL to fetch",
                },
            },
            "required": ["url"],
        },
    },
    {
        "name": "save_report",
        "description": "Save the final research report to a file. Call this when research is complete and the report is written.",
        "input_schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string", "description": "Report title"},
                "content": {"type": "string", "description": "Full report in markdown"},
                "sources": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "List of source URLs used",
                },
            },
            "required": ["title", "content", "sources"],
        },
    },
]
```
Implementing Tool Execution
Each tool maps to a real function. For web search, you can use any search API — here we use a generic pattern:
```python
def execute_tool(name: str, inputs: dict) -> dict:
    """Dispatch a tool call from Claude to the matching Python function."""
    if name == "web_search":
        return perform_web_search(inputs["query"], inputs.get("num_results", 5))
    elif name == "fetch_page":
        return fetch_page_content(inputs["url"])
    elif name == "save_report":
        return save_research_report(inputs)
    return {"error": f"Unknown tool: {name}"}


def perform_web_search(query: str, num_results: int) -> dict:
    # Replace with your preferred search API (Brave, Serper, SerpAPI)
    api_key = os.environ["SEARCH_API_KEY"]
    response = requests.get(
        "https://api.search.brave.com/res/v1/web/search",
        headers={"X-Subscription-Token": api_key},
        params={"q": query, "count": num_results},
    )
    results = response.json().get("web", {}).get("results", [])
    return {
        "results": [
            {"title": r["title"], "url": r["url"], "snippet": r.get("description", "")}
            for r in results
        ]
    }


def fetch_page_content(url: str) -> dict:
    try:
        resp = requests.get(url, timeout=10, headers={"User-Agent": "ResearchBot/1.0"})
        # trafilatura strips navigation and boilerplate; readability-lxml is an alternative
        from trafilatura import extract

        text = extract(resp.text) or ""
        return {"content": text[:8000], "url": url}  # Truncate to manage tokens
    except Exception as e:
        return {"error": str(e), "url": url}


def save_research_report(data: dict) -> dict:
    filename = data["title"].lower().replace(" ", "_")[:50] + ".md"
    with open(filename, "w") as f:
        f.write(f"# {data['title']}\n\n")
        f.write(data["content"])
        f.write("\n\n## Sources\n\n")
        for url in data["sources"]:
            f.write(f"- {url}\n")
    return {"saved": filename, "word_count": len(data["content"].split())}
```
The Research Agent Loop
The agent loop runs until Claude either saves a report or exhausts its research budget:
```python
def run_research_agent(topic: str, max_turns: int = 20) -> str:
    system = """You are a thorough research agent. Given a topic:
1. Break it into 3-5 specific sub-questions
2. Search for each sub-question
3. Fetch the most relevant pages for detailed information
4. Synthesize findings into a comprehensive report
5. Save the report using the save_report tool

Always cite your sources. Prioritize recent, authoritative sources."""

    messages = [{"role": "user", "content": f"Research this topic: {topic}"}]

    for turn in range(max_turns):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            system=system,
            tools=tools,
            messages=messages,
        )

        if response.stop_reason == "end_turn":
            # Done: return whatever closing text Claude produced
            return "".join(b.text for b in response.content if b.type == "text")

        messages.append({"role": "assistant", "content": response.content})

        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(result),
                })
        messages.append({"role": "user", "content": tool_results})

    return "Research agent reached maximum turns without completing."
```
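To make the message shapes concrete, here is a hand-built two-turn transcript of the kind this loop accumulates (the IDs and values are illustrative):

```python
import json

# Turn 1: Claude asks for a search; turn 2: we feed the result back.
transcript = [
    {"role": "user", "content": "Research this topic: solid-state batteries"},
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_use",
                "id": "toolu_01",
                "name": "web_search",
                "input": {"query": "solid-state battery energy density"},
            }
        ],
    },
    {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": "toolu_01",  # must match the tool_use id above
                "content": json.dumps({"results": []}),
            }
        ],
    },
]
```

The key invariant is that every `tool_result` carries the `tool_use_id` of the request it answers; mismatched or missing IDs cause the API to reject the message.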
Source Evaluation Strategy
Strong research agents do not just collect information — they evaluate it. Add instructions that guide Claude to assess source quality:
```python
system_with_evaluation = """When evaluating sources, consider:
- Domain authority (academic, government, established media vs blogs)
- Publication date (prefer sources from the last 12 months)
- Author credentials (named experts vs anonymous content)
- Corroboration (do multiple independent sources agree?)

If sources conflict, note the disagreement and explain which
position has stronger evidence."""
```
Claude will naturally apply these criteria when writing the synthesis, noting where sources agree and where they diverge.
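Wiring the criteria in is just string concatenation: append them to the agent's base system prompt before starting the loop. Short stand-ins are used here in place of the full prompts shown earlier:

```python
# Stand-ins for the full prompts defined earlier in the article
system = "You are a thorough research agent."
system_with_evaluation = (
    "When evaluating sources, consider domain authority, recency, "
    "author credentials, and corroboration."
)

# Pass full_system as the `system` argument to client.messages.create(...)
full_system = f"{system}\n\n{system_with_evaluation}"
```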
FAQ
How do I prevent the agent from searching endlessly?
Set a max_turns limit as shown in the code above. You can also add a searches_remaining counter in your system prompt and decrement it with each search call. Another approach is to track total tokens used and stop when approaching a budget threshold.
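One way to enforce a hard search budget is a thin wrapper around the tool executor. This is a sketch that assumes the `execute_tool` dispatcher defined earlier (any callable with the same signature works):

```python
def make_budgeted_executor(execute_tool, max_searches: int = 8):
    """Wrap execute_tool so web_search calls beyond the budget return an error."""
    budget = {"remaining": max_searches}

    def budgeted(name: str, inputs: dict) -> dict:
        if name == "web_search":
            if budget["remaining"] <= 0:
                # Returning an error as a tool result nudges Claude to stop
                # searching and write the report with what it has
                return {"error": "Search budget exhausted. Write the report now."}
            budget["remaining"] -= 1
        return execute_tool(name, inputs)

    return budgeted
```

Swap the wrapped executor into the agent loop in place of `execute_tool`; non-search tools like `fetch_page` and `save_report` pass through unmetered.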
What search APIs work best with research agents?
Brave Search API and Serper.dev both provide reliable, affordable web search. For academic research, consider Google Scholar via SerpAPI. The choice depends on your use case — Brave is best for general web content, while specialized APIs work better for niche domains like medical or legal research.
How do I handle rate limits during intensive research?
Implement exponential backoff in your perform_web_search function and add a short delay between consecutive searches. For Claude API rate limits, catch anthropic.RateLimitError and retry with backoff. The Anthropic SDK has built-in retry logic that handles transient errors automatically.
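A minimal backoff wrapper for the search call might look like the following sketch; the retry count and base delay are arbitrary choices, and it retries on any exception rather than inspecting status codes:

```python
import time

def with_backoff(fn, max_attempts: int = 4, base_delay: float = 1.0):
    """Retry fn with exponential backoff (1s, 2s, 4s, ...) on any exception."""
    def wrapped(*args, **kwargs):
        for attempt in range(max_attempts):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if attempt == max_attempts - 1:
                    raise  # out of retries; surface the original error
                time.sleep(base_delay * (2 ** attempt))
    return wrapped

# Usage: search = with_backoff(perform_web_search)
```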