Tool Use in AI Agents: Extending LLM Capabilities with External Functions
Master the design and implementation of tools for AI agents — why tools matter, how to write effective tool descriptions, execution flow, error handling, and best practices for production tool systems.
Why Tools Are the Bridge Between Thinking and Doing
An LLM without tools is a brain without hands. It can reason, analyze, and generate text — but it cannot check the weather, query a database, send an email, or read a file. Tools are what turn a language model from a conversationalist into an agent that can affect the real world.
Tool use (also called function calling) is the mechanism by which an LLM requests the execution of an external function. The model does not run the function itself — it generates a structured request (function name + arguments), your code executes it, and the result is fed back into the model's context.
The Tool Execution Flow
Understanding the exact flow of a tool call is essential for debugging and designing reliable agents.
1. LLM receives messages + tool definitions
2. LLM decides to call a tool (instead of responding with text)
3. LLM outputs: {"tool": "search_db", "args": {"query": "overdue invoices"}}
4. Your code intercepts this, executes search_db(query="overdue invoices")
5. Your code appends the result as a tool message
6. LLM receives the result and decides what to do next
7. Repeat until LLM responds with text (no tool call)
The critical insight is that the LLM never executes anything. It only generates the intent to use a tool. Your application code is the executor, which means you have full control over permissions, validation, and error handling.
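The seven steps above can be sketched as a minimal loop. Everything here is illustrative: `fake_llm` stands in for a real LLM API call with tool definitions, and `search_db` is a stub tool.

```python
import json

# Stub model: emits a tool intent on the first pass, plain text once a
# tool result is in context. A real agent would call an LLM API here.
def fake_llm(messages):
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "search_db", "args": {"query": "overdue invoices"}}
    return {"text": "Found 1 overdue invoice."}

def search_db(query: str) -> list[dict]:
    return [{"id": 17, "status": "overdue"}]  # stub tool

TOOLS = {"search_db": search_db}

def agent_loop(user_message: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = fake_llm(messages)                          # steps 1-3
        if "tool" in reply:
            result = TOOLS[reply["tool"]](**reply["args"])  # step 4: we execute
            messages.append({"role": "tool", "content": json.dumps(result)})  # step 5
            continue                                        # step 6: feed result back
        return reply["text"]                                # step 7: final text answer
    return "Step limit reached."

print(agent_loop("Which invoices are overdue?"))  # Found 1 overdue invoice.
```

Note the `max_steps` cap: a real loop needs a hard limit so a confused model cannot call tools forever.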
Designing Effective Tools
Tool quality directly determines agent quality. A poorly designed tool confuses the LLM and leads to wrong arguments, unnecessary calls, or missed opportunities to use the right tool.
Good Tool Design Principles
# GOOD: Clear name, specific description, well-typed parameters
{
    "type": "function",
    "function": {
        "name": "search_invoices",
        "description": (
            "Search for invoices by status, client name, or date range. "
            "Returns up to 20 matching invoices with amount, status, and due date."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "status": {
                    "type": "string",
                    "enum": ["paid", "overdue", "pending", "cancelled"],
                    "description": "Filter by invoice status",
                },
                "client_name": {
                    "type": "string",
                    "description": "Partial or full client name to search for",
                },
                "due_before": {
                    "type": "string",
                    "description": "ISO date string. Return invoices due before this date.",
                },
            },
        },
    },
}
# BAD: Vague name, no description, untyped parameters
{
    "type": "function",
    "function": {
        "name": "search",
        "description": "Search for stuff",
        "parameters": {
            "type": "object",
            "properties": {
                "q": {"type": "string"},
            },
        },
    },
}
The description is the most important field. The LLM reads it to decide when and how to use the tool. Write descriptions as if you were explaining the tool to a new team member — be specific about what it does, what it returns, and any limitations.
Building a Tool Registry
In production, you need a systematic way to register, discover, and execute tools. Here is a clean pattern:
from typing import Callable, Any
import json


class ToolRegistry:
    def __init__(self):
        self._tools: dict[str, dict] = {}
        self._executors: dict[str, Callable] = {}

    def register(self, func: Callable, description: str, parameters: dict):
        name = func.__name__
        self._tools[name] = {
            "type": "function",
            "function": {
                "name": name,
                "description": description,
                "parameters": parameters,
            },
        }
        self._executors[name] = func

    def get_tool_definitions(self) -> list[dict]:
        return list(self._tools.values())

    def execute(self, name: str, arguments: dict) -> Any:
        if name not in self._executors:
            return {"error": f"Unknown tool: {name}"}
        try:
            return self._executors[name](**arguments)
        except Exception as e:
            return {"error": f"Tool execution failed: {str(e)}"}


# Usage
registry = ToolRegistry()


def get_weather(city: str, units: str = "celsius") -> dict:
    # In production, call a real weather API
    return {"city": city, "temperature": 22, "units": units, "condition": "sunny"}


registry.register(
    func=get_weather,
    description="Get current weather for a city. Returns temperature and conditions.",
    parameters={
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "units": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature units (default: celsius)",
            },
        },
        "required": ["city"],
    },
)
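Hand-writing a parameters schema for every tool gets tedious. One way to cut the boilerplate (a sketch, not part of the registry above) is to derive the schema from the function's type hints with `inspect`; the type mapping below is an assumption that covers only flat, simple signatures, and descriptions still have to be written by hand.

```python
import inspect

# Assumed mapping from Python annotations to JSON-schema types;
# handles only simple flat signatures.
_TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean"}

def schema_from_signature(func) -> dict:
    props: dict[str, dict] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        props[name] = {"type": _TYPE_MAP.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value -> required parameter
    return {"type": "object", "properties": props, "required": required}

def get_weather(city: str, units: str = "celsius") -> dict:
    ...

schema = schema_from_signature(get_weather)
# schema["required"] -> ["city"]
```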
Error Handling in Tool Execution
Tools fail. APIs time out, databases go down, users pass invalid arguments. How you handle tool errors determines whether your agent recovers gracefully or spirals into confusion.
def safe_execute_tool(registry: ToolRegistry, name: str, raw_args: str) -> str:
    """Execute a tool with comprehensive error handling."""
    # Parse arguments
    try:
        arguments = json.loads(raw_args)
    except json.JSONDecodeError as e:
        return json.dumps({
            "error": "Invalid arguments format",
            "details": str(e),
            "suggestion": "Please provide valid JSON arguments",
        })

    # Execute with timeout protection
    try:
        result = registry.execute(name, arguments)
        return json.dumps(result, default=str)
    except TimeoutError:
        return json.dumps({
            "error": f"Tool '{name}' timed out",
            "suggestion": "Try again with a simpler query or different parameters",
        })
    except Exception as e:
        return json.dumps({
            "error": f"Tool '{name}' failed: {str(e)}",
            "suggestion": "Check the arguments and try again",
        })
The key insight is to always return structured error messages to the LLM, not raw exceptions. Include a suggestion field — it guides the LLM toward recovery instead of just repeating the same failing call.
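One caveat: the `except TimeoutError` branch above only fires if the tool itself raises `TimeoutError`. If you want an actual wall-clock limit, one option (a sketch; the 2-second default is an arbitrary choice) is to run the tool in a worker thread and bound how long you wait for the result:

```python
import concurrent.futures
import time

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def run_with_timeout(func, arguments: dict, timeout_s: float = 2.0):
    # Run the tool in a worker thread and cap how long we wait.
    # Caveat: the thread keeps running after a timeout; true
    # cancellation requires a subprocess.
    future = _pool.submit(func, **arguments)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"tool exceeded {timeout_s}s") from None

def slow_tool(seconds: float) -> str:
    time.sleep(seconds)
    return "done"

print(run_with_timeout(slow_tool, {"seconds": 0.01}))  # done
```

Calling `registry.execute` through `run_with_timeout` makes the `TimeoutError` branch in `safe_execute_tool` reachable for any tool, not just ones that time themselves.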
Tool Permissions and Safety
Not all tools should be available to all agents. A customer-facing agent should not have access to delete_database. Implement tool-level permissions:
class PermissionedToolRegistry(ToolRegistry):
    def __init__(self):
        super().__init__()
        self._permissions: dict[str, str] = {}  # tool_name -> permission level

    def register(self, func, description, parameters, permission="read"):
        super().register(func, description, parameters)
        self._permissions[func.__name__] = permission

    def get_tools_for_level(self, level: str) -> list[dict]:
        levels = {"read": 0, "write": 1, "admin": 2}
        max_level = levels.get(level, 0)
        return [
            self._tools[name]
            for name, perm in self._permissions.items()
            if levels.get(perm, 0) <= max_level
        ]
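To see the filtering in action, here is a self-contained sketch with a trimmed-down registry (the tool names are illustrative):

```python
class MiniPermissionedRegistry:
    # Trimmed-down stand-in for the registries above, just enough
    # to demonstrate level-based filtering.
    LEVELS = {"read": 0, "write": 1, "admin": 2}

    def __init__(self):
        self._permissions: dict[str, str] = {}

    def register(self, name: str, permission: str = "read"):
        self._permissions[name] = permission

    def tools_for_level(self, level: str) -> list[str]:
        max_level = self.LEVELS.get(level, 0)
        return [
            name for name, perm in self._permissions.items()
            if self.LEVELS.get(perm, 0) <= max_level
        ]

reg = MiniPermissionedRegistry()
reg.register("search_invoices", permission="read")
reg.register("update_invoice", permission="write")
reg.register("delete_database", permission="admin")

print(reg.tools_for_level("read"))   # ['search_invoices']
print(reg.tools_for_level("write"))  # ['search_invoices', 'update_invoice']
```

A customer-facing agent gets `get_tools_for_level("read")`, so destructive tools never even appear in its context.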
FAQ
How many tools should an agent have access to?
Keep it under 20 for most agents. Research shows that LLM tool selection accuracy degrades as the number of available tools increases. If you need more, use a router pattern — a first LLM call selects the relevant tool category, then a second call picks the specific tool from a smaller set.
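The router pattern can be sketched roughly like this. The category names are illustrative, and `select_category` is a keyword stub; in production that step is itself an LLM call that sees only category descriptions, not all the tools.

```python
# Illustrative category map; in a real system each category carries
# a short description the router LLM reads.
TOOL_CATEGORIES = {
    "billing": ["search_invoices", "send_invoice", "refund_payment"],
    "weather": ["get_weather", "get_forecast"],
    "crm": ["lookup_customer", "update_customer"],
}

def select_category(user_message: str) -> str:
    # Stub router: a real implementation is a first LLM call that
    # picks a category from names/descriptions only.
    text = user_message.lower()
    if "invoice" in text or "payment" in text:
        return "billing"
    if "weather" in text or "forecast" in text:
        return "weather"
    return "crm"

def tools_for_message(user_message: str, all_tools: list[dict]) -> list[dict]:
    allowed = set(TOOL_CATEGORIES[select_category(user_message)])
    return [t for t in all_tools if t["function"]["name"] in allowed]

tools = [{"function": {"name": n}} for names in TOOL_CATEGORIES.values() for n in names]
subset = tools_for_message("Show me overdue invoices", tools)
# subset contains only the three billing tools
```

The second LLM call then runs with `subset` instead of the full tool list, keeping the selection problem small.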
Should tool descriptions include examples?
Yes, especially for tools with complex parameters. Including a brief example in the description (like "Example: search_invoices(status='overdue', client_name='Acme')") significantly improves the LLM's ability to construct correct arguments.
How do I test tools independently from the agent?
Write unit tests for each tool function that verify correct outputs for valid inputs and proper error handling for invalid inputs. Then write integration tests that run the full agent loop with mock tool responses to verify the agent calls tools correctly. Test tools in isolation before testing them within the agent.
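As a sketch of those two layers (plain asserts; `get_weather` is the example tool from earlier, restated here so the snippet stands alone):

```python
def get_weather(city: str, units: str = "celsius") -> dict:
    return {"city": city, "temperature": 22, "units": units, "condition": "sunny"}

# Layer 1: unit test for the tool in isolation
def test_get_weather_defaults():
    result = get_weather("Oslo")
    assert result["city"] == "Oslo"
    assert result["units"] == "celsius"

# Layer 2: integration test with a mocked tool, verifying the agent
# passes the arguments you expect
def test_agent_uses_weather_tool():
    calls = []
    def mock_weather(city: str, units: str = "celsius") -> dict:
        calls.append((city, units))
        return {"city": city, "temperature": 10, "units": units}
    # Register mock_weather in place of the real tool, run the agent
    # loop on "What's the weather in Berlin?", then assert on `calls`:
    # assert calls == [("Berlin", "celsius")]
```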
CallSphere Team
Expert insights on AI voice agents and customer communication automation.