Prompt Engineering for AI Agents: System Prompts, Tool Descriptions, and Few-Shot Patterns
Agent-specific prompt engineering techniques: crafting effective system prompts, writing clear tool descriptions for function calling, and building few-shot examples that improve performance on complex tasks.
Why Agent Prompts Are Different
Prompt engineering for AI agents is fundamentally different from prompting for single-turn completions. A chat prompt aims to produce a good response to one question. An agent prompt must guide behavior across dozens of turns, tool interactions, edge cases, and error conditions — often running autonomously without human oversight between turns.
The three pillars of agent prompt engineering are: (1) system prompts that define identity, boundaries, and behavioral rules; (2) tool descriptions that enable accurate function calling; and (3) few-shot examples that demonstrate complex reasoning patterns the model cannot reliably discover on its own.
Crafting Effective System Prompts
A system prompt for an agent serves as its operating manual. It must be precise enough to prevent unwanted behavior but flexible enough to handle novel situations. The best system prompts follow a structured format.
The ROLE-RULES-TOOLS-STYLE Framework
```python
SYSTEM_PROMPT_TEMPLATE = """
## ROLE
You are {role_description}.
Your primary objective is {primary_objective}.
You serve {audience_description}.

## RULES
{numbered_rules}

## CONSTRAINTS
- NEVER {hard_constraint_1}
- NEVER {hard_constraint_2}
- ALWAYS {required_behavior_1}
- ALWAYS {required_behavior_2}

## AVAILABLE TOOLS
{tool_summary}

## RESPONSE STYLE
- {style_guideline_1}
- {style_guideline_2}
- {style_guideline_3}

## EXAMPLES OF CORRECT BEHAVIOR
{behavioral_examples}
"""

# Concrete example: customer service agent
customer_service_prompt = """
## ROLE
You are a customer service agent for CloudSync, a cloud storage
platform. Your primary objective is to resolve customer issues
efficiently while maintaining a positive customer experience.
You serve individual and business customers who contact support
via chat.

## RULES
1. Verify customer identity before accessing any account data.
   Ask for their email address and last 4 digits of their
   payment method.
2. For billing issues, you may issue refunds up to $50 without
   approval. Amounts over $50 require the refund_approval tool.
3. If a customer reports data loss, immediately escalate to the
   data recovery team — do not attempt to troubleshoot.
4. For feature requests, log them using the feature_request tool
   and thank the customer.
5. If you cannot resolve an issue in 5 exchanges, offer to
   escalate to a senior agent.

## CONSTRAINTS
- NEVER share another customer's information
- NEVER promise features or timelines not in the knowledge base
- NEVER attempt to debug server-side infrastructure issues
- ALWAYS confirm destructive actions (account deletion,
  data purging) before executing
- ALWAYS end resolved conversations with a satisfaction check

## AVAILABLE TOOLS
- lookup_account: Find customer account by email
- check_subscription: Get current plan and billing details
- issue_refund: Process refunds up to $50
- refund_approval: Request approval for refunds over $50
- create_ticket: Create a support ticket for follow-up
- feature_request: Log a feature request
- escalate: Transfer to senior agent or specialist team
- search_kb: Search the knowledge base for solutions

## RESPONSE STYLE
- Be empathetic but efficient — acknowledge frustration,
  then move to resolution
- Use short paragraphs (2-3 sentences max)
- When providing steps, use numbered lists
- Never use corporate jargon — speak plainly
- If the customer is upset, validate their feelings before
  problem-solving
"""
```
Common System Prompt Mistakes
Mistake 1: Vague boundaries. "Be helpful and answer questions" gives the agent no guardrails. Specify exactly what the agent can and cannot do.
Mistake 2: No failure mode instructions. Agents need to know what to do when they cannot help: escalate, ask for clarification, or acknowledge the limitation.
Mistake 3: Conflicting rules. "Always be brief" combined with "Always provide detailed explanations" creates unpredictable behavior. Resolve conflicts explicitly: "Be brief for simple questions; provide detailed explanations for complex troubleshooting."
Mistake 4: Missing tool usage guidance. Listing available tools is not enough. Specify when to use each tool and in what order.
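One way to address Mistake 4 is a dedicated tool-ordering section appended to the system prompt. A minimal sketch — the tool names follow the CloudSync example earlier in the article, but the specific ordering rules here are illustrative assumptions, not rules from that example:

```python
# Illustrative tool-ordering rules; adapt to your own tool set
TOOL_USAGE_GUIDANCE = """
## TOOL USAGE ORDER
1. ALWAYS call lookup_account before any account-specific tool.
2. Try search_kb BEFORE escalate; only escalate when no KB article applies.
3. issue_refund is for amounts up to $50; use refund_approval above that.
"""


def with_tool_guidance(system_prompt: str) -> str:
    """Append the tool-ordering rules to an existing system prompt."""
    return system_prompt.rstrip() + "\n" + TOOL_USAGE_GUIDANCE


prompt = with_tool_guidance("## ROLE\nYou are a support agent.")
```

Keeping the ordering rules in one labeled section makes them easy to audit and to update when tools change.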
Writing Effective Tool Descriptions
Tool descriptions are the bridge between natural language intent and function execution. When a user says "check if my payment went through," the model must map this to the correct tool with the correct parameters. The quality of your tool descriptions directly determines function calling accuracy.
Anatomy of a Good Tool Description
```python
# BAD tool description
bad_tool = {
    "type": "function",
    "function": {
        "name": "get_data",
        "description": "Gets data from the database",
        "parameters": {
            "type": "object",
            "properties": {
                "id": {"type": "string"},
                "type": {"type": "string"},
            },
        },
    },
}

# GOOD tool description
good_tool = {
    "type": "function",
    "function": {
        "name": "lookup_payment_status",
        "description": (
            "Check the status of a specific payment transaction. "
            "Returns the payment amount, status (pending, completed, "
            "failed, refunded), processing date, and payment method. "
            "Use this when a customer asks about a specific payment "
            "or wants to know if their payment was processed."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "payment_id": {
                    "type": "string",
                    "description": (
                        "The payment transaction ID, usually "
                        "starting with 'PAY-' followed by 12 "
                        "alphanumeric characters. Example: "
                        "'PAY-A1B2C3D4E5F6'"
                    ),
                },
                "customer_email": {
                    "type": "string",
                    "description": (
                        "The customer's email address associated "
                        "with the payment. Used as a fallback "
                        "lookup if payment_id is not available."
                    ),
                },
            },
            "required": ["payment_id"],
        },
    },
}
```
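A well-specified parameters schema also lets you validate the model's tool-call arguments before executing anything. A minimal stdlib-only sketch (a production system would use a full JSON Schema validator; `payment_schema` below mirrors the lookup_payment_status parameters):

```python
def validate_tool_args(args: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the args pass.

    Simplified sketch: checks required keys, unknown keys, and basic
    JSON-type conformance. Does not handle nested objects or enums.
    """
    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required parameter: {name}")
    type_map = {"string": str, "boolean": bool, "integer": int,
                "number": (int, float), "object": dict, "array": list}
    for name, value in args.items():
        if name not in props:
            errors.append(f"unknown parameter: {name}")
            continue
        expected = type_map.get(props[name].get("type"))
        if expected and not isinstance(value, expected):
            errors.append(f"{name}: expected {props[name]['type']}")
    return errors


# Demo schema mirroring the lookup_payment_status parameters above
payment_schema = {
    "type": "object",
    "properties": {
        "payment_id": {"type": "string"},
        "customer_email": {"type": "string"},
    },
    "required": ["payment_id"],
}
```

Rejecting malformed calls with a descriptive error message (rather than executing them) also gives the model a chance to retry with corrected arguments.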
Key Principles for Tool Descriptions
```python
import json


class ToolDescriptionBuilder:
    """Helper to build consistent, high-quality tool descriptions."""

    @staticmethod
    def build(
        name: str,
        what_it_does: str,
        when_to_use: str,
        parameters: dict,
        returns: str,
        example_input: dict | None = None,
        common_errors: list[str] | None = None,
    ) -> dict:
        # Assemble the description from its parts in a fixed order
        description_parts = [what_it_does]
        if when_to_use:
            description_parts.append(f"Use when: {when_to_use}")
        if returns:
            description_parts.append(f"Returns: {returns}")
        if common_errors:
            description_parts.append(
                "Common errors: " + "; ".join(common_errors)
            )
        if example_input:
            description_parts.append(
                f"Example input: {json.dumps(example_input)}"
            )
        return {
            "type": "function",
            "function": {
                "name": name,
                "description": " ".join(description_parts),
                "parameters": parameters,
            },
        }
```
# Usage
cancel_subscription_tool = ToolDescriptionBuilder.build(
name="cancel_subscription",
what_it_does=(
"Cancels a customer's subscription at the end of the "
"current billing period."
),
when_to_use=(
"the customer explicitly requests cancellation AND "
"you have confirmed their identity AND they have "
"confirmed they want to cancel after you have offered "
"a retention discount."
),
parameters={
"type": "object",
"properties": {
"account_id": {
"type": "string",
"description": "Customer account ID from lookup_account",
},
"reason": {
"type": "string",
"enum": [
"too_expensive",
"not_using",
"competitor",
"missing_features",
"poor_support",
"other",
],
"description": "Reason for cancellation",
},
"immediate": {
"type": "boolean",
"description": (
"If true, cancel immediately with prorated refund. "
"If false (default), cancel at end of billing period."
),
},
},
"required": ["account_id", "reason"],
},
returns=(
"Confirmation with cancellation date, any prorated "
"refund amount, and data retention policy"
),
common_errors=[
"ALREADY_CANCELLED: subscription was already cancelled",
"ANNUAL_LOCK: annual plans cannot be cancelled mid-term "
"without early termination fee",
],
)
Few-Shot Examples for Complex Tasks
Few-shot prompting provides concrete examples of desired input-output behavior. For agents, few-shot examples are particularly valuable for demonstrating: (1) multi-step reasoning patterns, (2) correct tool usage sequences, and (3) edge case handling.
```python
FEW_SHOT_EXAMPLES = """
## EXAMPLE INTERACTIONS

### Example 1: Simple account inquiry
User: What plan am I on?
Agent thought: Customer is asking about their subscription.
  I need to look up their account first.
Agent action: lookup_account(email="customer@email.com")
Tool result: {account_id: "ACC-123", plan: "Pro", status: "active"}
Agent response: You're on the Pro plan, which is currently active.
  Is there anything you'd like to change about your subscription?

### Example 2: Refund request over limit
User: I was charged twice last month. I need a refund of $150.
Agent thought: This is a billing issue. The amount ($150) exceeds
  my $50 refund limit, so I need to use refund_approval.
Agent action: lookup_account(email="customer@email.com")
Tool result: {account_id: "ACC-456", plan: "Business"}
Agent action: check_billing_history(account_id="ACC-456", months=2)
Tool result: {charges: [{amount: 75, date: "2026-03-01"},
  {amount: 75, date: "2026-03-01"}]}
Agent thought: I can see two $75 charges on the same day.
  This looks like a duplicate charge. I'll request refund
  approval for one of them.
Agent action: refund_approval(
  account_id="ACC-456", amount=75,
  reason="Duplicate charge on 2026-03-01"
)
Agent response: I can see you were charged $75 twice on
  March 1st. I've submitted a refund request for the duplicate
  charge of $75. Our billing team will process this within
  2-3 business days. You'll receive a confirmation email once
  it's completed.

### Example 3: Edge case — unknown issue
User: My files are showing as corrupted when I download them.
Agent thought: This could indicate data corruption, which is a
  data loss scenario. Per my rules, I must escalate immediately
  to the data recovery team.
Agent action: escalate(
  team="data_recovery",
  reason="Customer reports file corruption on download",
  account_id="ACC-789"
)
Agent response: I'm taking this very seriously. I've immediately
  escalated your case to our data recovery specialists who have
  the tools to investigate file integrity. They'll reach out to
  you within 1 hour. In the meantime, please don't delete any
  files — our team will need them for investigation.
"""
```
Dynamic Few-Shot Selection
For agents that handle diverse tasks, maintaining a library of examples and dynamically selecting the most relevant ones reduces token usage while improving accuracy.
```python
from dataclasses import dataclass


@dataclass
class FewShotExample:
    id: str
    task_category: str
    input_text: str
    output_text: str
    embedding: list[float] | None = None
    difficulty: str = "medium"  # easy, medium, hard


class DynamicFewShotSelector:
    """Selects the most relevant few-shot examples for a query."""

    def __init__(self, embeddings_client, example_store):
        self.embeddings = embeddings_client
        self.store = example_store

    async def select(
        self,
        query: str,
        n_examples: int = 3,
        diversity_weight: float = 0.3,
    ) -> list[FewShotExample]:
        query_embedding = await self.embeddings.embed(query)

        # Retrieve top candidates (over-fetch so MMR has room to diversify)
        candidates = await self.store.query(
            embedding=query_embedding,
            top_k=n_examples * 3,
        )

        # Select a diverse subset using MMR (Maximal Marginal Relevance)
        selected: list[FewShotExample] = []
        remaining = list(candidates)
        for _ in range(n_examples):
            if not remaining:
                break
            best = None
            best_score = -float("inf")
            for candidate in remaining:
                relevance = candidate.get("similarity", 0)
                # Diversity bonus: distance to the nearest already-selected example
                diversity = min(
                    (
                        self._embedding_distance(
                            candidate["embedding"],
                            s.embedding,
                        )
                        for s in selected
                    ),
                    default=1.0,
                )
                score = (
                    (1 - diversity_weight) * relevance
                    + diversity_weight * diversity
                )
                if score > best_score:
                    best_score = score
                    best = candidate
            if best:
                selected.append(FewShotExample(
                    id=best["id"],
                    task_category=best["metadata"]["category"],
                    input_text=best["metadata"]["input"],
                    output_text=best["metadata"]["output"],
                    embedding=best["embedding"],
                ))
                remaining.remove(best)
        return selected

    def _embedding_distance(
        self, a: list[float], b: list[float]
    ) -> float:
        # Cosine distance: 1 minus cosine similarity
        if not a or not b:
            return 1.0
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = sum(x ** 2 for x in a) ** 0.5
        norm_b = sum(x ** 2 for x in b) ** 0.5
        similarity = dot / (norm_a * norm_b) if norm_a and norm_b else 0
        return 1 - similarity

    def format_examples(
        self, examples: list[FewShotExample]
    ) -> str:
        formatted = "## RELEVANT EXAMPLES\n\n"
        for i, ex in enumerate(examples, 1):
            formatted += (
                f"### Example {i} ({ex.task_category})\n"
                f"Input: {ex.input_text}\n"
                f"Output: {ex.output_text}\n\n"
            )
        return formatted
```
Assembling the Complete Agent Prompt
Combining all three elements into a coherent agent prompt:
```python
class AgentPromptBuilder:
    """Assembles system prompt, tools, and few-shot examples."""

    def __init__(
        self,
        system_prompt: str,
        tools: list[dict],
        few_shot_selector: DynamicFewShotSelector,
    ):
        self.system_prompt = system_prompt
        self.tools = tools
        self.few_shot = few_shot_selector

    async def build(
        self,
        user_query: str,
        conversation_history: list[dict],
        user_context: dict | None = None,
    ) -> dict:
        # Select few-shot examples relevant to this query
        examples = await self.few_shot.select(
            query=user_query, n_examples=2
        )
        examples_text = self.few_shot.format_examples(examples)

        # Build a context-aware system prompt
        context_additions = ""
        if user_context:
            context_additions = (
                f"\n## CURRENT USER CONTEXT\n"
                f"- Name: {user_context.get('name', 'Unknown')}\n"
                f"- Account: {user_context.get('account_id', 'Not verified')}\n"
                f"- Plan: {user_context.get('plan', 'Unknown')}\n"
            )

        full_system = (
            self.system_prompt
            + context_additions
            + "\n"
            + examples_text
        )

        messages = [
            {"role": "system", "content": full_system},
            *conversation_history,
            {"role": "user", "content": user_query},
        ]
        return {
            "messages": messages,
            "tools": self.tools,
            "tool_choice": "auto",
        }
```
FAQ
How long should an agent system prompt be?
Most effective agent system prompts are 500-1500 tokens. Below 500, you lack sufficient detail for consistent behavior. Above 1500, the model starts ignoring parts of the prompt (especially middle sections). If you need more than 1500 tokens, move behavioral examples and edge case handling into few-shot examples rather than cramming them into the system prompt. The system prompt should contain identity, core rules, and constraints. Everything else goes into examples or conversation context.
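To enforce this budget in practice, a rough length check at prompt-assembly time can catch drift. A minimal sketch: exact counts require the model's tokenizer, so the 4-characters-per-token heuristic below is an approximation for English text, and the 500/1500 thresholds are the ones from this answer:

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def check_prompt_budget(system_prompt: str,
                        lo: int = 500, hi: int = 1500) -> str:
    """Flag system prompts outside the recommended token range."""
    n = estimate_tokens(system_prompt)
    if n < lo:
        return f"~{n} tokens: likely too sparse for consistent behavior"
    if n > hi:
        return f"~{n} tokens: consider moving examples out of the system prompt"
    return f"~{n} tokens: within the recommended range"
```

Running this as a lint step whenever the prompt template changes keeps the budget from eroding silently.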
Should tool descriptions include examples of when NOT to use the tool?
Yes, especially for tools with similar capabilities. If you have both "issue_refund" (for quick refunds up to $50) and "refund_approval" (for larger amounts), explicitly stating "Do NOT use issue_refund for amounts over $50" in the tool description prevents misuse. Negative examples reduce tool confusion by 20-30% based on production data from function-calling deployments.
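This negative guidance can be applied systematically rather than hand-edited into each description. A minimal sketch (the `issue_refund` spec below is a stripped-down stand-in for the full tool, and `add_negative_guidance` is a hypothetical helper, not a library API):

```python
import copy


def add_negative_guidance(tool: dict, do_not_use_when: str) -> dict:
    """Return a copy of a tool spec with an explicit 'Do NOT use' clause."""
    updated = copy.deepcopy(tool)  # leave the original spec untouched
    fn = updated["function"]
    fn["description"] = (
        fn["description"].rstrip(". ")
        + f". Do NOT use this tool when {do_not_use_when}."
    )
    return updated


issue_refund = {
    "type": "function",
    "function": {
        "name": "issue_refund",
        "description": "Process a refund of up to $50 immediately.",
        "parameters": {"type": "object", "properties": {}},
    },
}

guarded = add_negative_guidance(
    issue_refund, "the amount exceeds $50; use refund_approval instead"
)
```

Centralizing the clause format also keeps negative guidance phrased consistently across every tool in the registry.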
How many few-shot examples should I include?
Two to three examples provide the best balance between accuracy improvement and token cost. One example is often insufficient for the model to generalize the pattern. Four or more examples show diminishing returns and consume significant context. For diverse tasks, use dynamic few-shot selection to ensure the examples are relevant to the current query rather than using a fixed set.
Do I need different prompts for different LLM providers?
Yes, prompt effectiveness varies between models. Claude models respond well to structured XML-style formatting and explicit rules. GPT-4 class models prefer natural language instructions with markdown formatting. Open-source models like Llama often need more explicit formatting instructions and more examples. The core content should be the same, but the presentation format should be adapted to each model's strengths. Maintain a prompt template per model family and run A/B tests to optimize.
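One way to maintain per-family templates without duplicating content is a formatter registry keyed by model family. A minimal sketch: the formatters and family keys below are illustrative assumptions, showing the same rule content rendered as XML-style tags for one family and markdown for another:

```python
from typing import Callable


def format_for_claude(rules: list[str]) -> str:
    """Render rules with XML-style tags (illustrative Claude-family format)."""
    body = "\n".join(f"<rule>{r}</rule>" for r in rules)
    return f"<rules>\n{body}\n</rules>"


def format_for_gpt(rules: list[str]) -> str:
    """Render rules as a markdown list (illustrative GPT-family format)."""
    return "## Rules\n" + "\n".join(f"- {r}" for r in rules)


FORMATTERS: dict[str, Callable[[list[str]], str]] = {
    "claude": format_for_claude,
    "gpt": format_for_gpt,
}


def render_rules(model_family: str, rules: list[str]) -> str:
    """Pick the formatter for a model family, defaulting to markdown."""
    formatter = FORMATTERS.get(model_family, format_for_gpt)
    return formatter(rules)
```

Because the rule content lives in one place and only the rendering varies, A/B testing a new format for one family does not risk drift in the others.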
#PromptEngineering #SystemPrompts #ToolDescriptions #FewShot #AIAgents #FunctionCalling
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.