
Building Agentic AI Tool Libraries: A Developer's Guide to Custom Functions

Step-by-step guide to building reusable tool libraries for agentic AI with design patterns, validation, testing, and permission systems.

Tools Are What Make Agents Useful

A language model without tools is a conversation partner. A language model with tools is an agent. Tools are the functions an agent can call to interact with the outside world — querying databases, sending emails, creating records, calling APIs, executing calculations. The quality and reliability of your tool library directly determines how useful your agent can be.

Yet tool development is often treated as an afterthought. Teams spend weeks on prompt engineering and minutes on tool design. The result is brittle tools with poor error messages, inconsistent parameter schemas, and no testing. This guide covers how to build production-grade tool libraries that are reusable, well-tested, and secure.

Tool Design Principles

Good tools share several characteristics that make them reliable in agent workflows.

Single Responsibility

Each tool should do one thing well. A tool called "manage_user" that can create, update, delete, and query users is harder for the agent to use correctly than four separate tools: "create_user", "update_user", "delete_user", and "get_user". The agent makes better function-calling decisions when tools have focused, unambiguous purposes.
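As a sketch, the split might look like this. Each tool's name and docstring state one unambiguous action; the bodies here are hypothetical stubs standing in for real implementations:

```python
# Hypothetical stubs: one focused tool per action instead of a
# catch-all "manage_user" with a mode parameter.
def create_user(email: str, name: str) -> dict:
    """Create a new user account."""
    return {"action": "create", "email": email, "name": name}

def get_user(user_id: str) -> dict:
    """Look up a single user by ID."""
    return {"action": "get", "user_id": user_id}

def update_user(user_id: str, name: str) -> dict:
    """Change the display name of an existing user."""
    return {"action": "update", "user_id": user_id, "name": name}

def delete_user(user_id: str) -> dict:
    """Permanently remove a user account."""
    return {"action": "delete", "user_id": user_id}
```

Each function maps to exactly one intent, so the model never has to guess which mode of a multi-purpose tool applies.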

Descriptive Names and Parameters

The tool name and parameter descriptions are part of the prompt. The LLM uses them to decide which tool to call and what arguments to pass. Invest in clear, specific names.

# Bad: vague name and parameters
def query(table: str, filter: str) -> str:
    """Query a table."""
    ...

# Good: specific name with documented parameters
def get_customer_orders(
    customer_id: str,
    status: str = "all",
    limit: int = 10
) -> list[dict]:
    """
    Retrieve orders for a specific customer.

    Args:
        customer_id: The unique identifier for the customer
        status: Filter by order status. Options: "all", "pending",
                "shipped", "delivered", "cancelled"
        limit: Maximum number of orders to return (1-100)
    """
    ...

Structured Return Values

Tools should return structured data, not free-form strings. When an agent receives structured output, it can reason about the data more effectively and extract specific fields.

from pydantic import BaseModel

class OrderResult(BaseModel):
    success: bool
    orders: list[dict]
    total_count: int
    has_more: bool
    message: str

def get_customer_orders(customer_id: str, limit: int = 10) -> OrderResult:
    # Fetch one extra row so we can tell whether more results exist
    orders = db.query_orders(customer_id, limit=limit + 1)
    returned = orders[:limit]
    return OrderResult(
        success=True,
        orders=returned,
        total_count=len(returned),
        has_more=len(orders) > limit,
        message=f"Found {len(returned)} orders for customer {customer_id}"
    )

Parameter Validation

Never trust that the LLM will pass valid parameters. LLMs hallucinate parameter values, confuse parameter types, and occasionally pass completely unexpected inputs. Every tool must validate its inputs before executing.

Using Pydantic for Tool Parameters

from pydantic import BaseModel, Field, field_validator
from typing import Literal
from datetime import date

class SearchOrdersParams(BaseModel):
    customer_id: str = Field(
        ...,
        min_length=1,
        max_length=50,
        description="Customer ID to search orders for"
    )
    status: Literal["all", "pending", "shipped", "delivered", "cancelled"] = "all"
    date_from: date | None = Field(
        None,
        description="Start date for order search (YYYY-MM-DD)"
    )
    date_to: date | None = Field(
        None,
        description="End date for order search (YYYY-MM-DD)"
    )
    limit: int = Field(10, ge=1, le=100)

    @field_validator("date_to")
    @classmethod
    def validate_date_range(cls, v, info):
        if v and info.data.get("date_from") and v < info.data["date_from"]:
            raise ValueError("date_to must be after date_from")
        return v
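A quick check shows the schema in action (assuming Pydantic v2 is installed; the model below is a trimmed version of SearchOrdersParams for illustration). Invalid arguments are rejected before the tool ever runs:

```python
from pydantic import BaseModel, Field, ValidationError

# Trimmed version of SearchOrdersParams for illustration.
class Params(BaseModel):
    customer_id: str = Field(..., min_length=1, max_length=50)
    limit: int = Field(10, ge=1, le=100)

ok = Params(customer_id="cust-42", limit=5)   # passes validation
try:
    Params(customer_id="cust-42", limit=500)  # limit out of range
except ValidationError as e:
    print(e.errors()[0]["loc"])  # ('limit',)
```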

Error Messages for the Agent

When validation fails, the error message goes back to the agent. Make these messages instructive so the agent can self-correct.

from pydantic import ValidationError

def execute_tool(tool_name: str, params: dict) -> dict:
    try:
        validated = TOOL_SCHEMAS[tool_name](**params)
        return TOOL_FUNCTIONS[tool_name](validated)
    except ValidationError as e:
        error_details = []
        for error in e.errors():
            field = ".".join(str(p) for p in error["loc"])
            error_details.append(f"Parameter '{field}': {error['msg']}")
        return {
            "success": False,
            "error": f"Invalid parameters for {tool_name}",
            "details": error_details,
            "hint": "Check the tool schema and correct the parameters.",
        }

Error Handling in Tools

Tools interact with external systems that fail. Databases go down, APIs time out, rate limits trigger. Your tools must handle these failures gracefully and return useful information to the agent.
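For timeouts specifically, one common approach is to bound every external call explicitly. A minimal sketch with asyncio, where call_external_service is a stand-in for a real API call:

```python
import asyncio

async def call_external_service() -> str:
    # Stand-in for a real API call; sleeps longer than the timeout.
    await asyncio.sleep(2)
    return "response"

async def fetch_with_timeout(timeout_s: float = 0.1) -> dict:
    try:
        data = await asyncio.wait_for(call_external_service(), timeout=timeout_s)
        return {"success": True, "data": data}
    except asyncio.TimeoutError:
        # Surface a structured, retryable error instead of crashing the agent.
        return {"success": False, "error_code": "TIMEOUT", "retryable": True}

result = asyncio.run(fetch_with_timeout())
```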


The Tool Result Pattern

Every tool should return a consistent result structure that the agent can always parse.

from functools import wraps
from typing import Any

class ToolResult(BaseModel):
    success: bool
    data: Any = None
    error: str | None = None
    error_code: str | None = None
    retryable: bool = False
    suggestion: str | None = None

def safe_tool_execution(func):
    @wraps(func)  # preserve the tool's name and docstring for the registry
    async def wrapper(*args, **kwargs) -> ToolResult:
        try:
            result = await func(*args, **kwargs)
            return ToolResult(success=True, data=result)
        except RateLimitError:
            return ToolResult(
                success=False,
                error="API rate limit exceeded",
                error_code="RATE_LIMIT",
                retryable=True,
                suggestion="Wait 30 seconds and try again"
            )
        except TimeoutError:
            return ToolResult(
                success=False,
                error="External service timeout",
                error_code="TIMEOUT",
                retryable=True,
                suggestion="The service is slow. Try again."
            )
        except PermissionError as e:
            return ToolResult(
                success=False,
                error=f"Permission denied: {e}",
                error_code="FORBIDDEN",
                retryable=False,
                suggestion="This action requires elevated permissions."
            )
        except Exception as e:
            return ToolResult(
                success=False,
                error=f"Unexpected error: {type(e).__name__}",
                error_code="INTERNAL",
                retryable=False,
            )
    return wrapper

Tool Composition

Complex operations often require composing multiple simple tools. Rather than building monolithic tools, create composable primitives that the agent chains together.

Atomic Tools

# Primitive tools
async def get_customer(customer_id: str) -> Customer: ...
async def get_customer_orders(customer_id: str) -> list[Order]: ...
async def get_order_items(order_id: str) -> list[OrderItem]: ...
async def calculate_refund(order_id: str, item_ids: list[str]) -> RefundCalc: ...
async def process_refund(refund_calc: RefundCalc) -> RefundResult: ...

The agent composes these into a workflow: look up the customer, find their orders, identify the relevant items, calculate the refund, and process it. Each step is independently testable and reusable.
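Sketched end to end with stub data, where in-memory dicts stand in for the real primitives above (the order and price values are illustrative):

```python
import asyncio

# In-memory stand-ins for the atomic tools above.
ORDERS = {"cust-1": [{"id": "ord-9", "items": ["item-a", "item-b"]}]}
PRICES = {"item-a": 25.0, "item-b": 10.0}

async def get_customer_orders(customer_id: str) -> list[dict]:
    return ORDERS.get(customer_id, [])

async def calculate_refund(order: dict, item_ids: list[str]) -> float:
    return sum(PRICES[i] for i in item_ids if i in order["items"])

async def refund_workflow(customer_id: str, item_ids: list[str]) -> float:
    # An agent would normally chain these as separate tool calls;
    # here the chain is written out explicitly for clarity.
    orders = await get_customer_orders(customer_id)
    return await calculate_refund(orders[0], item_ids)

amount = asyncio.run(refund_workflow("cust-1", ["item-a"]))
```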

Composite Tools for Common Workflows

When certain tool chains are called frequently, create composite tools that bundle them together. This reduces the number of LLM calls and makes common operations faster.

async def process_customer_refund(
    customer_id: str,
    order_id: str,
    item_ids: list[str],
    reason: str
) -> RefundResult:
    """
    Process a refund for specific items in a customer order.
    Validates the customer, order ownership, item eligibility,
    calculates the refund amount, and processes the refund.

    Args:
        reason: Free-text reason for the refund, recorded for auditing.
    """
    customer = await get_customer(customer_id)
    if customer is None:
        raise ValueError(f"Customer {customer_id} not found")

    orders = await get_customer_orders(customer_id)
    target_order = next((o for o in orders if o.id == order_id), None)
    if not target_order:
        raise ValueError(f"Order {order_id} not found for customer {customer_id}")

    calc = await calculate_refund(order_id, item_ids)
    return await process_refund(calc)

Testing Tools in Isolation

Tools must be tested independently from the agent. This means testing the function logic, parameter validation, error handling, and edge cases without involving the LLM.

import pytest

class TestGetCustomerOrders:
    async def test_returns_orders_for_valid_customer(self, db_session):
        # Arrange
        customer = await create_test_customer(db_session)
        order = await create_test_order(db_session, customer_id=customer.id)

        # Act
        result = await get_customer_orders(customer.id)

        # Assert
        assert result.success is True
        assert len(result.data) == 1
        assert result.data[0]["id"] == order.id

    async def test_returns_empty_for_customer_with_no_orders(self, db_session):
        customer = await create_test_customer(db_session)
        result = await get_customer_orders(customer.id)
        assert result.success is True
        assert len(result.data) == 0

    async def test_handles_invalid_customer_id(self):
        result = await get_customer_orders("nonexistent-id")
        assert result.success is False
        assert result.error_code == "NOT_FOUND"

    async def test_respects_limit_parameter(self, db_session):
        customer = await create_test_customer(db_session)
        for _ in range(10):
            await create_test_order(db_session, customer_id=customer.id)

        result = await get_customer_orders(customer.id, limit=3)
        assert len(result.data) == 3
        assert result.has_more is True

Tool Registries

As your tool library grows, you need a systematic way to register, discover, and manage tools.

from typing import Callable, Any

class ToolRegistry:
    def __init__(self):
        self._tools: dict[str, dict] = {}

    def register(
        self,
        name: str,
        func: Callable,
        schema: type[BaseModel],
        description: str,
        permissions: list[str] | None = None,
        category: str = "general"
    ):
        self._tools[name] = {
            "function": func,
            "schema": schema,
            "description": description,
            "permissions": permissions or [],
            "category": category,
        }

    def get_tools_for_agent(
        self,
        agent_permissions: list[str],
        categories: list[str] | None = None
    ) -> list[dict]:
        available = []
        for name, tool in self._tools.items():
            if categories and tool["category"] not in categories:
                continue
            if all(p in agent_permissions for p in tool["permissions"]):
                available.append({
                    "name": name,
                    "description": tool["description"],
                    "parameters": tool["schema"].model_json_schema(),
                })
        return available

# Usage
registry = ToolRegistry()
registry.register(
    name="get_customer_orders",
    func=get_customer_orders,
    schema=SearchOrdersParams,
    description="Retrieve orders for a specific customer",
    permissions=["orders:read"],
    category="orders",
)

Permission Systems for Tools

Not every agent should have access to every tool. A customer-facing agent should not have access to the "delete_customer" tool. A read-only analytics agent should not have write tools.

Implement permissions at the registry level so that when you construct an agent, you specify its permission set and the registry returns only the tools that agent is authorized to use. This is defense-in-depth — even if prompt injection tries to convince the agent to call a tool it should not, the tool is not available in its function catalog.
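A stripped-down illustration of that filtering, with plain dicts standing in for the Pydantic schemas used earlier (tool names and permission strings are examples):

```python
TOOLS = {
    "get_customer_orders": {"permissions": ["orders:read"]},
    "delete_customer": {"permissions": ["customers:write"]},
}

def tools_for_agent(agent_permissions: list[str]) -> list[str]:
    # Only expose tools whose required permissions are all granted.
    return [
        name for name, tool in TOOLS.items()
        if all(p in agent_permissions for p in tool["permissions"])
    ]

support_agent_tools = tools_for_agent(["orders:read"])
# "delete_customer" never appears in the support agent's catalog.
```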

Frequently Asked Questions

How many tools should a single agent have access to?

There is no hard limit, but practical experience shows that agent performance degrades with more than 15-20 tools in a single context. The LLM must understand all available tools to make good selection decisions, and too many tools lead to confusion and incorrect tool calls. Use tool categories and agent specialization to keep each agent's tool set focused. If you need more tools, use a multi-agent architecture where a router directs requests to specialized agents.

Should tools be synchronous or asynchronous?

Use asynchronous tools whenever the tool calls external services (databases, APIs, file systems). This prevents blocking the event loop and allows the agent runtime to handle multiple concurrent operations. Synchronous tools are fine for pure computation (formatting, calculations) that completes in microseconds.

How do you test that an agent uses tools correctly?

Test tools in isolation first (unit tests for the function logic). Then create integration tests with recorded LLM interactions — provide a user message, assert that the agent calls the expected tool with the expected parameters, and verify it handles the tool response correctly. Use deterministic LLM settings (temperature 0) for reproducible test results.
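A sketch of such an assertion against a recorded model response. The response shape here is a generic stand-in, not any specific provider's format:

```python
# Recorded function-call response from a temperature-0 run (stand-in shape).
recorded_response = {
    "tool_call": {
        "name": "get_customer_orders",
        "arguments": {"customer_id": "cust-42", "limit": 10},
    }
}

def assert_expected_tool_call(response: dict, name: str, required_args: dict) -> bool:
    # Check the tool name and that each required argument matches exactly.
    call = response["tool_call"]
    return call["name"] == name and all(
        call["arguments"].get(k) == v for k, v in required_args.items()
    )

ok = assert_expected_tool_call(
    recorded_response, "get_customer_orders", {"customer_id": "cust-42"}
)
```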

What is the best way to handle tool timeouts?

Set explicit timeout values for every external call within your tools (database queries, API requests). When a timeout occurs, return a structured error with retryable=True so the agent can decide whether to retry. For critical operations, implement circuit breaker patterns that fail fast when a downstream service is consistently slow.
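A minimal circuit breaker sketch; the thresholds, timing, and class name are illustrative:

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        # Open circuit: fail fast until the cool-down expires.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after_s:
                return False
            self.opened_at = None  # half-open: let one attempt through
            self.failures = 0
        return True

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

breaker = CircuitBreaker(failure_threshold=2)
breaker.record_failure()
breaker.record_failure()
blocked = not breaker.allow()  # circuit is now open
```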

Can agents compose tools on their own without composite tool functions?

Yes, and this is one of the core capabilities of agentic AI. Given atomic tools, agents can reason about which tools to call in what order to accomplish a goal. However, for very common multi-step workflows, composite tools reduce latency (fewer LLM calls) and improve reliability (the composition logic is deterministic rather than LLM-dependent).

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
