Tool Use in AI Agents: Extending LLM Capabilities with External Functions
Master the design and implementation of tools for AI agents — why tools matter, how to write effective tool descriptions, execution flow, error handling, and best practices for production tool systems.
Why Tools Are the Bridge Between Thinking and Doing
An LLM without tools is a brain without hands. It can reason, analyze, and generate text — but it cannot check the weather, query a database, send an email, or read a file. Tools are what turn a language model from a conversationalist into an agent that can affect the real world.
Tool use (also called function calling) is the mechanism by which an LLM requests the execution of an external function. The model does not run the function itself — it generates a structured request (function name + arguments), your code executes it, and the result is fed back into the model's context.
The Tool Execution Flow
Understanding the exact flow of a tool call is essential for debugging and designing reliable agents.
1. LLM receives messages + tool definitions
2. LLM decides to call a tool (instead of responding with text)
3. LLM outputs: {"tool": "search_db", "args": {"query": "overdue invoices"}}
4. Your code intercepts this, executes search_db(query="overdue invoices")
5. Your code appends the result as a tool message
6. LLM receives the result and decides what to do next
7. Repeat until LLM responds with text (no tool call)
The critical insight is that the LLM never executes anything. It only generates the intent to use a tool. Your application code is the executor, which means you have full control over permissions, validation, and error handling.
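The seven steps above can be sketched as a minimal loop. Everything here is illustrative: `fake_llm` stands in for a real LLM API call with tool definitions, and `search_db` is a stub tool.

```python
import json

# Stub model: emits a tool intent on the first pass, plain text once a
# tool result is in context. A real agent would call an LLM API here.
def fake_llm(messages):
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "search_db", "args": {"query": "overdue invoices"}}
    return {"text": "Found 1 overdue invoice."}

def search_db(query: str) -> list[dict]:
    return [{"id": 17, "status": "overdue"}]  # stub tool

TOOLS = {"search_db": search_db}

def agent_loop(user_message: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = fake_llm(messages)                          # steps 1-3
        if "tool" in reply:
            result = TOOLS[reply["tool"]](**reply["args"])  # step 4: we execute
            messages.append({"role": "tool", "content": json.dumps(result)})  # step 5
            continue                                        # step 6: feed result back
        return reply["text"]                                # step 7: final text answer
    return "Step limit reached."

print(agent_loop("Which invoices are overdue?"))  # Found 1 overdue invoice.
```

Note the `max_steps` cap: a real loop needs a hard limit so a confused model cannot call tools forever.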
Designing Effective Tools
Tool quality directly determines agent quality. A poorly designed tool confuses the LLM and leads to wrong arguments, unnecessary calls, or missed opportunities to use the right tool.
Good Tool Design Principles
# GOOD: Clear name, specific description, well-typed parameters
{
    "type": "function",
    "function": {
        "name": "search_invoices",
        "description": (
            "Search for invoices by status, client name, or date range. "
            "Returns up to 20 matching invoices with amount, status, and due date."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "status": {
                    "type": "string",
                    "enum": ["paid", "overdue", "pending", "cancelled"],
                    "description": "Filter by invoice status",
                },
                "client_name": {
                    "type": "string",
                    "description": "Partial or full client name to search for",
                },
                "due_before": {
                    "type": "string",
                    "description": "ISO date string. Return invoices due before this date.",
                },
            },
        },
    },
}
# BAD: Vague name, no description, untyped parameters
{
    "type": "function",
    "function": {
        "name": "search",
        "description": "Search for stuff",
        "parameters": {
            "type": "object",
            "properties": {
                "q": {"type": "string"},
            },
        },
    },
}
The description is the most important field. The LLM reads it to decide when and how to use the tool. Write descriptions as if you were explaining the tool to a new team member — be specific about what it does, what it returns, and any limitations.
Building a Tool Registry
In production, you need a systematic way to register, discover, and execute tools. Here is a clean pattern:
from typing import Callable, Any
import json


class ToolRegistry:
    def __init__(self):
        self._tools: dict[str, dict] = {}
        self._executors: dict[str, Callable] = {}

    def register(self, func: Callable, description: str, parameters: dict):
        name = func.__name__
        self._tools[name] = {
            "type": "function",
            "function": {
                "name": name,
                "description": description,
                "parameters": parameters,
            },
        }
        self._executors[name] = func

    def get_tool_definitions(self) -> list[dict]:
        return list(self._tools.values())

    def execute(self, name: str, arguments: dict) -> Any:
        if name not in self._executors:
            return {"error": f"Unknown tool: {name}"}
        try:
            return self._executors[name](**arguments)
        except Exception as e:
            return {"error": f"Tool execution failed: {str(e)}"}


# Usage
registry = ToolRegistry()


def get_weather(city: str, units: str = "celsius") -> dict:
    # In production, call a real weather API
    return {"city": city, "temperature": 22, "units": units, "condition": "sunny"}


registry.register(
    func=get_weather,
    description="Get current weather for a city. Returns temperature and conditions.",
    parameters={
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "units": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature units (default: celsius)",
            },
        },
        "required": ["city"],
    },
)
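Hand-writing a parameters schema for every tool gets tedious. One way to cut the boilerplate (a sketch, not part of the registry above) is to derive the schema from the function's type hints with `inspect`; the type mapping below is an assumption that covers only flat, simple signatures, and descriptions still have to be written by hand.

```python
import inspect

# Assumed mapping from Python annotations to JSON-schema types;
# handles only simple flat signatures.
_TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean"}

def schema_from_signature(func) -> dict:
    props: dict[str, dict] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        props[name] = {"type": _TYPE_MAP.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value -> required parameter
    return {"type": "object", "properties": props, "required": required}

def get_weather(city: str, units: str = "celsius") -> dict:
    ...

schema = schema_from_signature(get_weather)
# schema["required"] -> ["city"]
```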
Error Handling in Tool Execution
Tools fail. APIs time out, databases go down, users pass invalid arguments. How you handle tool errors determines whether your agent recovers gracefully or spirals into confusion.
def safe_execute_tool(registry: ToolRegistry, name: str, raw_args: str) -> str:
    """Execute a tool with comprehensive error handling."""
    # Parse arguments
    try:
        arguments = json.loads(raw_args)
    except json.JSONDecodeError as e:
        return json.dumps({
            "error": "Invalid arguments format",
            "details": str(e),
            "suggestion": "Please provide valid JSON arguments",
        })

    # Execute with timeout protection
    try:
        result = registry.execute(name, arguments)
        return json.dumps(result, default=str)
    except TimeoutError:
        return json.dumps({
            "error": f"Tool '{name}' timed out",
            "suggestion": "Try again with a simpler query or different parameters",
        })
    except Exception as e:
        return json.dumps({
            "error": f"Tool '{name}' failed: {str(e)}",
            "suggestion": "Check the arguments and try again",
        })
The key insight is to always return structured error messages to the LLM, not raw exceptions. Include a suggestion field — it guides the LLM toward recovery instead of just repeating the same failing call.
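One caveat: the `except TimeoutError` branch above only fires if the tool itself raises `TimeoutError`. If you want an actual wall-clock limit, one option (a sketch; the 2-second default is an arbitrary choice) is to run the tool in a worker thread and bound how long you wait for the result:

```python
import concurrent.futures
import time

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def run_with_timeout(func, arguments: dict, timeout_s: float = 2.0):
    # Run the tool in a worker thread and cap how long we wait.
    # Caveat: the thread keeps running after a timeout; true
    # cancellation requires a subprocess.
    future = _pool.submit(func, **arguments)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"tool exceeded {timeout_s}s") from None

def slow_tool(seconds: float) -> str:
    time.sleep(seconds)
    return "done"

print(run_with_timeout(slow_tool, {"seconds": 0.01}))  # done
```

Calling `registry.execute` through `run_with_timeout` makes the `TimeoutError` branch in `safe_execute_tool` reachable for any tool, not just ones that time themselves.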
Tool Permissions and Safety
Not all tools should be available to all agents. A customer-facing agent should not have access to delete_database. Implement tool-level permissions:
class PermissionedToolRegistry(ToolRegistry):
    def __init__(self):
        super().__init__()
        self._permissions: dict[str, str] = {}  # tool_name -> permission level

    def register(self, func, description, parameters, permission="read"):
        super().register(func, description, parameters)
        self._permissions[func.__name__] = permission

    def get_tools_for_level(self, level: str) -> list[dict]:
        levels = {"read": 0, "write": 1, "admin": 2}
        max_level = levels.get(level, 0)
        return [
            self._tools[name]
            for name, perm in self._permissions.items()
            if levels.get(perm, 0) <= max_level
        ]
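To see the filtering in action, here is a self-contained sketch with a trimmed-down registry (the tool names are illustrative):

```python
class MiniPermissionedRegistry:
    # Trimmed-down stand-in for the registries above, just enough
    # to demonstrate level-based filtering.
    LEVELS = {"read": 0, "write": 1, "admin": 2}

    def __init__(self):
        self._permissions: dict[str, str] = {}

    def register(self, name: str, permission: str = "read"):
        self._permissions[name] = permission

    def tools_for_level(self, level: str) -> list[str]:
        max_level = self.LEVELS.get(level, 0)
        return [
            name for name, perm in self._permissions.items()
            if self.LEVELS.get(perm, 0) <= max_level
        ]

reg = MiniPermissionedRegistry()
reg.register("search_invoices", permission="read")
reg.register("update_invoice", permission="write")
reg.register("delete_database", permission="admin")

print(reg.tools_for_level("read"))   # ['search_invoices']
print(reg.tools_for_level("write"))  # ['search_invoices', 'update_invoice']
```

A customer-facing agent gets `get_tools_for_level("read")`, so destructive tools never even appear in its context.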
FAQ
How many tools should an agent have access to?
Keep it under 20 for most agents. Research shows that LLM tool selection accuracy degrades as the number of available tools increases. If you need more, use a router pattern — a first LLM call selects the relevant tool category, then a second call picks the specific tool from a smaller set.
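The router pattern can be sketched roughly like this. The category names are illustrative, and `select_category` is a keyword stub; in production that step is itself an LLM call that sees only category descriptions, not all the tools.

```python
# Illustrative category map; in a real system each category carries
# a short description the router LLM reads.
TOOL_CATEGORIES = {
    "billing": ["search_invoices", "send_invoice", "refund_payment"],
    "weather": ["get_weather", "get_forecast"],
    "crm": ["lookup_customer", "update_customer"],
}

def select_category(user_message: str) -> str:
    # Stub router: a real implementation is a first LLM call that
    # picks a category from names/descriptions only.
    text = user_message.lower()
    if "invoice" in text or "payment" in text:
        return "billing"
    if "weather" in text or "forecast" in text:
        return "weather"
    return "crm"

def tools_for_message(user_message: str, all_tools: list[dict]) -> list[dict]:
    allowed = set(TOOL_CATEGORIES[select_category(user_message)])
    return [t for t in all_tools if t["function"]["name"] in allowed]

tools = [{"function": {"name": n}} for names in TOOL_CATEGORIES.values() for n in names]
subset = tools_for_message("Show me overdue invoices", tools)
# subset contains only the three billing tools
```

The second LLM call then runs with `subset` instead of the full tool list, keeping the selection problem small.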
Should tool descriptions include examples?
Yes, especially for tools with complex parameters. Including a brief example in the description (like "Example: search_invoices(status='overdue', client_name='Acme')") significantly improves the LLM's ability to construct correct arguments.
How do I test tools independently from the agent?
Write unit tests for each tool function that verify correct outputs for valid inputs and proper error handling for invalid inputs. Then write integration tests that run the full agent loop with mock tool responses to verify the agent calls tools correctly. Test tools in isolation before testing them within the agent.
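As a sketch of those two layers (plain asserts; `get_weather` is the example tool from earlier, restated here so the snippet stands alone):

```python
def get_weather(city: str, units: str = "celsius") -> dict:
    return {"city": city, "temperature": 22, "units": units, "condition": "sunny"}

# Layer 1: unit test for the tool in isolation
def test_get_weather_defaults():
    result = get_weather("Oslo")
    assert result["city"] == "Oslo"
    assert result["units"] == "celsius"

# Layer 2: integration test with a mocked tool, verifying the agent
# passes the arguments you expect
def test_agent_uses_weather_tool():
    calls = []
    def mock_weather(city: str, units: str = "celsius") -> dict:
        calls.append((city, units))
        return {"city": city, "temperature": 10, "units": units}
    # Register mock_weather in place of the real tool, run the agent
    # loop on "What's the weather in Berlin?", then assert on `calls`:
    # assert calls == [("Berlin", "celsius")]
```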
CallSphere Team
Expert insights on AI voice agents and customer communication automation.