Building Your First Agentic AI App: From Zero to Production
A beginner-friendly walkthrough of building a complete agentic AI app, from project setup and agent creation to testing and deployment, with complexity added step by step.
What We Are Building
In this tutorial, you will build a complete agentic AI application: a Research Assistant that can search the web, summarize documents, save notes, and answer follow-up questions using its saved context. By the end, you will have a working application deployed behind a FastAPI server with a clean API.
This guide is designed for developers who are comfortable with Python but new to agentic AI. We start simple and add complexity progressively. Every step produces working code you can test.
Prerequisites
- Python 3.11 or later
- An Anthropic API key (this tutorial uses Claude; the FAQ covers adapting the loop to OpenAI)
- Basic familiarity with REST APIs and async Python
- Docker installed (for the database in later steps)
Step 1: Project Setup
Create a new project and install dependencies:
mkdir research-agent && cd research-agent
python -m venv venv
source venv/bin/activate
pip install anthropic fastapi uvicorn python-dotenv pydantic
pip freeze > requirements.txt
Create your project structure:
mkdir -p app/{agents,tools,models}
touch app/__init__.py app/main.py
touch app/agents/__init__.py app/agents/research.py
touch app/tools/__init__.py app/tools/web_search.py app/tools/notes.py
touch app/models/__init__.py app/models/schemas.py
touch .env .env.example .gitignore
Add your API key to .env:
ANTHROPIC_API_KEY=sk-ant-your-key-here
APP_ENV=development
And configure .gitignore:
.env
__pycache__/
venv/
.pytest_cache/
Step 2: Define Your Data Models
Start by defining the data structures your application will use. Clear models prevent bugs and make your API self-documenting:
# app/models/schemas.py
from pydantic import BaseModel, Field
from datetime import datetime
class ChatMessage(BaseModel):
role: str = Field(
..., description="Message role: user or assistant"
)
content: str = Field(..., description="Message content")
class ChatRequest(BaseModel):
message: str = Field(
..., description="User message to the agent"
)
conversation_id: str | None = Field(
None, description="ID to continue an existing conversation"
)
class ChatResponse(BaseModel):
response: str = Field(..., description="Agent response")
conversation_id: str = Field(
..., description="Conversation ID for follow-ups"
)
tools_used: list[str] = Field(
default_factory=list,
description="Tools the agent used in this turn",
)
class Note(BaseModel):
title: str
content: str
created_at: datetime = Field(
default_factory=datetime.utcnow
)
Step 3: Build the Tools
Tools give your agent the ability to interact with the outside world. We will build two tools: web search and note management.
Web Search Tool
For this tutorial, we use a mocked search function so everything runs offline. In production, you would call an API such as Tavily, Brave Search, or SerpAPI:
# app/tools/web_search.py
import json
from urllib.request import urlopen, Request
from urllib.parse import quote_plus
def web_search(query: str, max_results: int = 5) -> str:
"""Search the web for information on a topic.
Returns a JSON string with search results including
title, snippet, and URL for each result.
"""
# In production, replace with Tavily or Brave Search API
# This is a simplified mock for tutorial purposes
mock_results = [
{
"title": f"Result about {query}",
"snippet": (
f"Comprehensive information about {query}. "
"This source covers key concepts, recent "
"developments, and practical applications."
),
"url": f"https://example.com/{quote_plus(query)}",
}
]
return json.dumps({
"query": query,
"results": mock_results[:max_results],
})
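Since the tool returns a JSON string rather than a Python object, it is worth a quick sanity check that downstream code can parse it. The snippet below inlines a trimmed copy of the function so it runs standalone:

```python
import json
from urllib.parse import quote_plus

def web_search(query: str, max_results: int = 5) -> str:
    """Mock search, identical in output shape to the tutorial version."""
    mock_results = [
        {
            "title": f"Result about {query}",
            "snippet": f"Comprehensive information about {query}.",
            "url": f"https://example.com/{quote_plus(query)}",
        }
    ]
    return json.dumps({"query": query, "results": mock_results[:max_results]})

# The agent always receives a JSON string, never a Python object,
# so verify the round-trip before wiring the tool into the loop.
payload = json.loads(web_search("agentic AI"))
print(payload["query"])                  # agentic AI
print(len(payload["results"]))           # 1
print(payload["results"][0]["url"])      # https://example.com/agentic+AI
```

Keeping every tool's return type a JSON string makes the agent loop uniform: tool results are passed straight back to the model without per-tool serialization logic.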
Notes Tool
The notes tool lets the agent save and retrieve information across conversation turns:
# app/tools/notes.py
import json
from datetime import datetime
# In-memory storage (use a database in production)
_notes: dict[str, dict] = {}
def save_note(title: str, content: str) -> str:
"""Save a research note for later reference.
Use this when you find important information that
the user might need later.
"""
note_id = f"note-{len(_notes) + 1}"
_notes[note_id] = {
"id": note_id,
"title": title,
"content": content,
"created_at": datetime.utcnow().isoformat(),
}
return json.dumps({
"status": "saved",
"note_id": note_id,
"title": title,
})
def list_notes() -> str:
"""List all saved research notes.
Use this to check what information has already been
saved before doing redundant searches.
"""
if not _notes:
return json.dumps({"notes": [], "count": 0})
summaries = []
for note in _notes.values():
summaries.append({
"id": note["id"],
"title": note["title"],
"created_at": note["created_at"],
})
return json.dumps({"notes": summaries, "count": len(summaries)})
def get_note(note_id: str) -> str:
"""Retrieve a specific saved note by its ID."""
note = _notes.get(note_id)
if note:
return json.dumps(note)
return json.dumps({"error": f"Note {note_id} not found"})
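The in-memory dict disappears on restart. Before reaching for a full database, a file-backed variant is a small intermediate step; this sketch persists the same structure to a JSON file (the filename and the omission of timestamps are simplifications):

```python
import json
from pathlib import Path

NOTES_FILE = Path("notes.json")  # illustrative location

def _load() -> dict[str, dict]:
    # Read the whole store on each call; fine at tutorial scale.
    if NOTES_FILE.exists():
        return json.loads(NOTES_FILE.read_text())
    return {}

def _store(notes: dict[str, dict]) -> None:
    NOTES_FILE.write_text(json.dumps(notes, indent=2))

def save_note(title: str, content: str) -> str:
    """Same signature and return shape as the in-memory version."""
    notes = _load()
    note_id = f"note-{len(notes) + 1}"
    notes[note_id] = {"id": note_id, "title": title, "content": content}
    _store(notes)
    return json.dumps({"status": "saved", "note_id": note_id, "title": title})
```

Because the signature and return shape are unchanged, you can swap this in without touching the tool definitions or the agent loop.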
Step 4: Create the Agent
Now build the agent that uses these tools. This is the core of your application:
# app/agents/research.py
import json
from anthropic import Anthropic
from dotenv import load_dotenv
load_dotenv()
client = Anthropic()
SYSTEM_PROMPT = """You are a research assistant. Your job is to
help users research topics by searching the web, summarizing
findings, and saving important notes for future reference.
Guidelines:
- Always search before answering factual questions
- Save key findings as notes so they persist across the
conversation
- Check existing notes before searching for something you
may have already found
- Cite your sources when providing information
- If you are unsure about something, say so clearly
- Be concise but thorough"""
TOOLS = [
{
"name": "web_search",
"description": (
"Search the web for information. Use this for any "
"factual question or research task."
),
"input_schema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query",
},
"max_results": {
"type": "integer",
"description": "Number of results (default 5)",
"default": 5,
},
},
"required": ["query"],
},
},
{
"name": "save_note",
"description": (
"Save a research note. Use when you find important "
"information the user might need later."
),
"input_schema": {
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "Brief note title",
},
"content": {
"type": "string",
"description": "Detailed note content",
},
},
"required": ["title", "content"],
},
},
{
"name": "list_notes",
"description": "List all saved notes.",
"input_schema": {
"type": "object",
"properties": {},
},
},
{
"name": "get_note",
"description": "Retrieve a specific note by ID.",
"input_schema": {
"type": "object",
"properties": {
"note_id": {
"type": "string",
"description": "The note ID to retrieve",
},
},
"required": ["note_id"],
},
},
]
Now add the agent loop function to the same file:
# Add to app/agents/research.py
from app.tools.web_search import web_search
from app.tools.notes import save_note, list_notes, get_note
TOOL_FUNCTIONS = {
"web_search": web_search,
"save_note": save_note,
"list_notes": list_notes,
"get_note": get_note,
}
def execute_tool(name: str, inputs: dict) -> str:
"""Execute a tool by name with the given inputs."""
func = TOOL_FUNCTIONS.get(name)
if not func:
return json.dumps({"error": f"Unknown tool: {name}"})
try:
return func(**inputs)
except Exception as e:
return json.dumps({"error": str(e)})
def run_research_agent(
user_message: str,
conversation_history: list[dict] | None = None,
max_iterations: int = 8,
) -> tuple[str, list[dict], list[str]]:
"""Run the research agent and return response, updated
history, and list of tools used.
"""
messages = conversation_history or []
messages.append({"role": "user", "content": user_message})
tools_used = []
for _ in range(max_iterations):
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
system=SYSTEM_PROMPT,
tools=TOOLS,
messages=messages,
)
if response.stop_reason == "tool_use":
messages.append({
"role": "assistant",
"content": response.content,
})
tool_results = []
for block in response.content:
if block.type == "tool_use":
tools_used.append(block.name)
result = execute_tool(
block.name, block.input
)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": result,
})
messages.append({
"role": "user",
"content": tool_results,
})
else:
final_text = ""
for block in response.content:
if hasattr(block, "text"):
final_text += block.text
messages.append({
"role": "assistant",
"content": final_text,
})
return final_text, messages, tools_used
return "Max iterations reached.", messages, tools_used
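The dispatch-and-error-capture pattern in execute_tool is what keeps one bad tool call from crashing the whole loop, so it is worth seeing in isolation. The registry and stub tools below are stand-ins for illustration, not the real tools:

```python
import json

def shout(text: str) -> str:
    return json.dumps({"result": text.upper()})

def broken(text: str) -> str:
    raise RuntimeError("simulated tool failure")

TOOL_FUNCTIONS = {"shout": shout, "broken": broken}

def execute_tool(name: str, inputs: dict) -> str:
    """Same shape as the tutorial version: always returns a JSON string."""
    func = TOOL_FUNCTIONS.get(name)
    if not func:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        return func(**inputs)
    except Exception as e:
        # The error goes back to the model as a tool result, so the
        # agent can retry or explain instead of the server crashing.
        return json.dumps({"error": str(e)})

print(execute_tool("shout", {"text": "hi"}))   # {"result": "HI"}
print(execute_tool("nope", {}))                # {"error": "Unknown tool: nope"}
print(execute_tool("broken", {"text": "hi"}))  # {"error": "simulated tool failure"}
```

Returning errors as data rather than raising them is a deliberate design choice: the model sees the failure and can decide what to do next.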
Step 5: Build the API Layer
Wrap the agent in a FastAPI server with proper request handling:
# app/main.py
import uuid
from fastapi import FastAPI, HTTPException
from app.models.schemas import ChatRequest, ChatResponse
from app.agents.research import run_research_agent
app = FastAPI(
title="Research Agent API",
description="An AI research assistant that can search, "
"summarize, and save notes.",
version="1.0.0",
)
# In-memory conversation storage
# Use Redis or PostgreSQL in production
conversations: dict[str, list[dict]] = {}
@app.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
"""Send a message to the research agent."""
conversation_id = request.conversation_id or str(uuid.uuid4())
history = conversations.get(conversation_id, [])
try:
response, updated_history, tools_used = (
run_research_agent(request.message, history)
)
conversations[conversation_id] = updated_history
return ChatResponse(
response=response,
conversation_id=conversation_id,
tools_used=tools_used,
)
except Exception as e:
raise HTTPException(
status_code=500,
detail=f"Agent error: {str(e)}",
)
@app.get("/health")
async def health():
return {"status": "healthy"}
Run the server:
uvicorn app.main:app --reload --port 8000
Test it:
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"message": "Research the latest advances in agentic AI"}'
Step 6: Add Conversation Persistence
For production, replace the in-memory conversation storage with a database. Here is a minimal PostgreSQL integration using the asyncpg driver (pip install asyncpg):
# app/core/database.py
import json
import asyncpg
import os
DATABASE_URL = os.getenv("DATABASE_URL")
async def get_pool():
return await asyncpg.create_pool(DATABASE_URL)
async def save_conversation(
pool, conversation_id: str, messages: list
):
await pool.execute(
"""INSERT INTO conversations (id, messages, updated_at)
VALUES ($1, $2, NOW())
ON CONFLICT (id) DO UPDATE SET
messages = $2, updated_at = NOW()""",
conversation_id,
json.dumps(messages),
)
async def load_conversation(
pool, conversation_id: str
) -> list | None:
row = await pool.fetchrow(
"SELECT messages FROM conversations WHERE id = $1",
conversation_id,
)
if row:
return json.loads(row["messages"])
return None
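The helpers above assume a conversations table already exists. A minimal schema that matches the queries (the column types are one reasonable choice, not the only one):

```sql
CREATE TABLE IF NOT EXISTS conversations (
    id          TEXT PRIMARY KEY,
    messages    JSONB NOT NULL,
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```

By default asyncpg passes json and jsonb values as strings, which is why the helpers call json.dumps on write and json.loads on read.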
Step 7: Deploy to Production
Create a Dockerfile (it installs dependencies from requirements.txt; generate one with pip freeze > requirements.txt if you have not already):
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app/ app/
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
Build and Run
docker build -t research-agent .
docker run -p 8000:8000 --env-file .env research-agent
For production deployments, add:
- A reverse proxy (nginx or Caddy) for TLS termination
- Health check endpoints for container orchestration
- Structured logging with correlation IDs
- Rate limiting to prevent API abuse
- Authentication to protect the endpoint
At CallSphere, we run similar agent backends behind FastAPI with Kubernetes, using horizontal pod autoscaling to handle variable load across our production agent deployments.
What to Build Next
Once your basic agent is working, enhance it progressively:
- Add streaming responses — Use Claude's streaming API with Server-Sent Events to send tokens to the frontend as they are generated
- Implement real web search — Replace the mock with Tavily or Brave Search API
- Add a frontend — Build a simple React or Next.js chat UI
- Implement authentication — Add API key or JWT authentication
- Add observability — Integrate LangSmith or Arize Phoenix for tracing
- Write tests — Unit test your tools and integration test the agent loop
Frequently Asked Questions
How much does it cost to run this agent in production?
For a research agent using Claude Sonnet 4 (the model in this tutorial), each conversation turn costs approximately 0.01 to 0.05 USD depending on context length and tool usage. A system handling 1,000 conversations per day would cost roughly 300 to 1,500 USD per month in LLM API fees. The biggest cost driver is conversation length — longer histories mean more input tokens per request. Implement conversation summarization for long sessions to control costs.
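Before building full summarization, a sliding-window trim already caps input tokens. A minimal sketch (the window size is arbitrary):

```python
def trim_history(messages: list[dict], max_messages: int = 20) -> list[dict]:
    """Keep the first message (the original request) plus the most recent turns."""
    if len(messages) <= max_messages:
        return messages
    return messages[:1] + messages[-(max_messages - 1):]

history = [{"role": "user", "content": f"msg {i}"} for i in range(50)]
trimmed = trim_history(history)
print(len(trimmed))            # 20
print(trimmed[0]["content"])   # msg 0
print(trimmed[-1]["content"])  # msg 49
```

One caveat: a naive trim can split a tool_use block from its tool_result, which the API rejects. In practice, trim on conversation-turn boundaries rather than raw message counts.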
Can I use OpenAI instead of Anthropic?
Yes. The agent loop pattern is identical — send messages, check for tool calls, execute tools, feed results back. Replace the Anthropic client with the OpenAI client, adjust the message format slightly (OpenAI uses a tool_calls field on the assistant message rather than tool_use content blocks), and change the model name. The tool definitions, business logic, and API layer remain the same.
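The difference in miniature: both snippets encode the same assistant turn requesting a web_search call, with illustrative IDs and values:

```python
# Anthropic: tool calls are content blocks on the assistant message,
# and results come back inside a user message.
anthropic_assistant = {
    "role": "assistant",
    "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "web_search",
         "input": {"query": "agentic AI"}},
    ],
}
anthropic_result = {
    "role": "user",
    "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01",
         "content": '{"results": []}'},
    ],
}

# OpenAI: tool calls live in a dedicated tool_calls field with
# JSON-string arguments, and results use a separate "tool" role.
openai_assistant = {
    "role": "assistant",
    "tool_calls": [
        {"id": "call_01", "type": "function",
         "function": {"name": "web_search",
                      "arguments": '{"query": "agentic AI"}'}},
    ],
}
openai_result = {
    "role": "tool",
    "tool_call_id": "call_01",
    "content": '{"results": []}',
}
```

Note that OpenAI's arguments field is a JSON string you must parse, while Anthropic's input is already a dict.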
How do I handle slow tool executions?
Some tools (web search, database queries on large datasets) can take several seconds. Two strategies: (1) Set timeouts on tool execution and return an error if the tool takes too long, letting the agent decide how to proceed. (2) For truly slow operations, return a "processing" status immediately and implement a polling mechanism where the agent checks back for results. For the best user experience, stream the agent's text output while tools execute in the background.
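Strategy (1) can be sketched with a thread pool from the standard library (the timeout values here are illustrative):

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout

_pool = ThreadPoolExecutor(max_workers=4)

def run_with_timeout(func, inputs: dict, timeout_s: float = 5.0) -> str:
    """Run a tool function, returning a JSON error if it exceeds timeout_s."""
    future = _pool.submit(func, **inputs)
    try:
        return future.result(timeout=timeout_s)
    except FuturesTimeout:
        # The underlying call keeps running in its worker thread;
        # true cancellation requires process isolation.
        return json.dumps({"error": f"tool timed out after {timeout_s}s"})

def fast_tool() -> str:
    return json.dumps({"ok": True})

def slow_tool() -> str:
    time.sleep(0.3)
    return json.dumps({"ok": True})

print(run_with_timeout(fast_tool, {}, timeout_s=1.0))
print(run_with_timeout(slow_tool, {}, timeout_s=0.05))
```

You would call run_with_timeout(func, block.input) in place of the direct call inside execute_tool. Note this bounds the agent's wait, not the work itself: a timed-out tool keeps running in its worker thread.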
What is the best way to test this agent?
Test at three levels. First, unit test each tool function independently with expected inputs and edge cases. Second, create a set of 20-30 representative user messages and run them through the agent, checking that the correct tools are called with reasonable arguments. Third, write end-to-end tests that send HTTP requests to the API and verify the response structure. Save conversation fixtures from real usage for regression testing.
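A first-level unit test might look like this; the tool under test is inlined so the file runs on its own, whereas in your project you would import it from app.tools.notes:

```python
import json
import unittest

_notes: dict[str, dict] = {}

def save_note(title: str, content: str) -> str:
    """Inlined copy of the tutorial tool for a self-contained test file."""
    note_id = f"note-{len(_notes) + 1}"
    _notes[note_id] = {"id": note_id, "title": title, "content": content}
    return json.dumps({"status": "saved", "note_id": note_id, "title": title})

class TestSaveNote(unittest.TestCase):
    def test_ids_differ_across_saves(self):
        first = json.loads(save_note("a", "1"))["note_id"]
        second = json.loads(save_note("b", "2"))["note_id"]
        self.assertNotEqual(first, second)

    def test_returns_json_with_note_id(self):
        result = json.loads(save_note("AI trends", "Agents are everywhere."))
        self.assertEqual(result["status"], "saved")
        self.assertIn("note_id", result)
```

Run with python -m unittest. The same structure scales to the other tools and to edge cases like empty titles.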
How do I prevent the agent from hallucinating?
Three key techniques: (1) Always make the agent search before answering factual questions — never let it answer from memory alone. (2) Include a "cite your sources" instruction in the system prompt so the agent ties answers to retrieved information. (3) Implement an output guardrail that checks for unsupported claims by comparing the response against the tool results from that turn. No method is foolproof, but layering these approaches significantly reduces hallucination rates.
CallSphere Team
Expert insights on AI voice agents and customer communication automation.