
API Design for AI Agent Tool Functions: Best Practices and Anti-Patterns

How to design tool functions that LLMs can use effectively with clear naming, enum parameters, structured responses, informative error messages, and documentation.

Tool Functions Are APIs for LLMs

When you design a REST API, you think about your consumer: a developer reading documentation, building a client, and handling responses. When you design tool functions for AI agents, your consumer is an LLM. The LLM reads the function name, description, and parameter schema, then decides when and how to call it.

This difference matters more than most developers realize. An LLM cannot browse your code, read inline comments, or ask clarifying questions about ambiguous parameter names. It makes decisions based entirely on the metadata you provide in the tool definition. Bad tool design leads to incorrect tool calls, wrong parameters, and confused agent behavior — not because the model is dumb, but because the API is unclear.

This post covers the principles, patterns, and anti-patterns of designing tool functions that LLMs can use reliably and effectively.

Principle 1: Names Must Be Self-Explanatory

An LLM selects a tool based primarily on its name and description. The name must convey what the tool does without ambiguity. Use verb-noun naming that reads like a command: search_products, get_order_status, create_support_ticket, cancel_subscription.

# GOOD: Clear, action-oriented names
tools = [
    {"name": "search_knowledge_base", "description": "Search support articles by keyword"},
    {"name": "get_customer_details", "description": "Retrieve a customer's profile and account info"},
    {"name": "create_support_ticket", "description": "Create a new support ticket for the customer"},
    {"name": "check_order_status", "description": "Check the current status of an order by order ID"},
    {"name": "schedule_callback", "description": "Schedule a phone callback from a support agent"},
]

# BAD: Ambiguous or overly generic names
tools = [
    {"name": "search", "description": "Search for things"},           # Search what?
    {"name": "get_data", "description": "Gets data from the system"}, # What data? What system?
    {"name": "process", "description": "Process the request"},        # What kind of processing?
    {"name": "handle_customer", "description": "Handle customer"},    # Handle how?
    {"name": "do_action", "description": "Performs an action"},       # Completely useless
]

The anti-pattern to watch for is over-abstraction. Developers who are used to building flexible, generic APIs create tools like execute_query or perform_operation that technically do everything but tell the LLM nothing about when to use them.

Principle 2: Use Enums, Not Free-Text, for Categorical Parameters

When a parameter has a fixed set of valid values, define it as an enum. LLMs are significantly more accurate at selecting from a list of options than generating the correct value from memory.

# GOOD: Enum parameters with clear descriptions
{
    "name": "update_ticket_priority",
    "description": "Change the priority level of a support ticket",
    "parameters": {
        "type": "object",
        "properties": {
            "ticket_id": {
                "type": "string",
                "description": "The support ticket ID (format: TKT-XXXXX)"
            },
            "priority": {
                "type": "string",
                "enum": ["low", "medium", "high", "critical"],
                "description": "The new priority level. Use 'critical' only for system outages or data loss."
            }
        },
        "required": ["ticket_id", "priority"]
    }
}

# BAD: Free-text parameter for categorical values
{
    "name": "update_ticket_priority",
    "description": "Change the priority level of a support ticket",
    "parameters": {
        "type": "object",
        "properties": {
            "ticket_id": {
                "type": "string",
                "description": "The ticket ID"
            },
            "priority": {
                "type": "string",
                "description": "The priority (e.g., low, medium, high)"
                # LLM might generate: "urgent", "P1", "very high", "ASAP"
            }
        }
    }
}

The enum approach eliminates an entire class of errors. Without enums, the LLM might generate "urgent" instead of "critical," "P1" instead of "high," or "normal" instead of "medium." Each incorrect value causes a validation error or worse — gets accepted and causes incorrect behavior.
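Enums also make server-side validation trivial, and echoing the valid options back on failure gives the model exactly what it needs to retry. A minimal sketch (the `VALID_PRIORITIES` set and response shape are illustrative, not from any specific framework):

```python
VALID_PRIORITIES = {"low", "medium", "high", "critical"}

def validate_priority(priority: str) -> dict:
    """Check a priority value against the enum before touching the database."""
    if priority not in VALID_PRIORITIES:
        return {
            "success": False,
            "error": "invalid_priority",
            "message": f"'{priority}' is not a valid priority",
            # Echo the valid options so the LLM can self-correct on the next call
            "valid_values": sorted(VALID_PRIORITIES),
        }
    return {"success": True}
```

Note that the error response lists the valid values rather than just rejecting the input: this turns a dead-end validation failure into a recoverable one.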

Principle 3: Descriptions Should Include When-to-Use Guidance

The function description is not just documentation — it is a routing instruction for the LLM. A good description tells the model not just what the tool does but when to use it and when not to use it.


# GOOD: Description includes when-to-use and when-not-to-use guidance
{
    "name": "escalate_to_human",
    "description": (
        "Transfer the conversation to a human support agent. "
        "Use this when: (1) the customer explicitly asks to speak to a human, "
        "(2) you cannot resolve the issue after 2 attempts, "
        "(3) the issue involves a billing dispute over $100, or "
        "(4) the customer expresses frustration or dissatisfaction. "
        "Do NOT use this for simple questions that can be answered from the knowledge base."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "reason": {
                "type": "string",
                "enum": [
                    "customer_requested",
                    "unresolved_after_attempts",
                    "billing_dispute",
                    "customer_frustrated",
                    "technical_issue_beyond_scope"
                ],
                "description": "The reason for escalation"
            },
            "conversation_summary": {
                "type": "string",
                "description": "Brief summary of the conversation so far for the human agent"
            }
        },
        "required": ["reason", "conversation_summary"]
    }
}

# BAD: Minimal description that does not guide usage
{
    "name": "escalate_to_human",
    "description": "Escalate to a human agent",
    "parameters": {
        "type": "object",
        "properties": {
            "reason": {"type": "string"},
            "summary": {"type": "string"}
        }
    }
}

Principle 4: Return Structured, Actionable Responses

Tool responses should be structured data that the LLM can reason over, not raw text blobs. Include the data the model needs to formulate its response to the user, and exclude internal implementation details.

# GOOD: Structured response with actionable data
async def check_order_status(order_id: str) -> dict:
    order = await db.get_order(order_id)
    if not order:
        return {
            "found": False,
            "message": f"No order found with ID {order_id}",
            "suggestion": "Ask the customer to verify the order ID or check their confirmation email"
        }

    return {
        "found": True,
        "order_id": order.id,
        "status": order.status,
        "status_description": STATUS_DESCRIPTIONS[order.status],
        "items": [
            {"name": item.product_name, "quantity": item.quantity, "price": item.price}
            for item in order.items
        ],
        "total": order.total,
        "estimated_delivery": order.estimated_delivery.isoformat() if order.estimated_delivery else None,
        "tracking_url": order.tracking_url,
        "can_cancel": order.status in ["pending", "processing"],
        "can_modify": order.status == "pending",
    }

# BAD: Unstructured text response
async def check_order_status(order_id: str) -> str:
    order = await db.get_order(order_id)
    return f"Order {order_id} status: {order.status}, total: ${order.total}"
    # Missing: what items? Can it be cancelled? Tracking info?

Notice the structured response includes flags like can_cancel and can_modify. These guide the LLM's next action without requiring it to reason about business logic. The model sees can_cancel: true and knows it can offer cancellation. Without this flag, the model has to guess whether the order status allows cancellation.

Principle 5: Error Responses Should Be Helpful, Not Generic

When a tool call fails, the error message is the only information the LLM has to recover. A generic "Something went wrong" gives the model nothing to work with. A specific error with a suggestion lets the model correct course.

# GOOD: Specific errors with recovery suggestions
async def apply_discount_code(cart_id: str, code: str) -> dict:
    cart = await get_cart(cart_id)
    if not cart:
        return {
            "success": False,
            "error": "cart_not_found",
            "message": f"Cart {cart_id} does not exist or has expired",
            "suggestion": "The cart may have expired. Ask the customer to re-add items."
        }

    discount = await validate_discount(code)
    if not discount:
        return {
            "success": False,
            "error": "invalid_code",
            "message": f"Discount code '{code}' is not valid",
            "suggestion": "Ask the customer to double-check the code spelling. "
                          "Common codes: WELCOME10, SUMMER25, LOYALTY15"
        }

    if discount.min_order_amount and cart.total < discount.min_order_amount:
        return {
            "success": False,
            "error": "minimum_not_met",
            "message": f"Cart total ${cart.total:.2f} is below the minimum "
                       f"${discount.min_order_amount:.2f} for code '{code}'",
            "suggestion": f"The customer needs to add ${discount.min_order_amount - cart.total:.2f} "
                          f"more to qualify for this discount."
        }

    # Apply discount
    new_total = cart.total - discount.amount
    await update_cart_total(cart_id, new_total)
    return {
        "success": True,
        "discount_applied": discount.amount,
        "new_total": new_total,
        "code": code,
    }

# BAD: Generic error messages
async def apply_discount_code(cart_id: str, code: str) -> dict:
    try:
        result = await internal_apply_discount(cart_id, code)
        return {"success": True, "total": result.total}
    except Exception as e:
        return {"success": False, "error": str(e)}
        # LLM receives: "error": "NoneType has no attribute 'amount'"
        # Completely unhelpful for recovery

Anti-Pattern: The God Tool

The most common anti-pattern is the "god tool" — a single tool that does everything based on a type parameter. This forces the LLM to remember which action requires which parameters and provides no structural guidance.

# ANTI-PATTERN: God tool
{
    "name": "manage_customer",
    "description": "Manage customer operations",
    "parameters": {
        "type": "object",
        "properties": {
            "action": {
                "type": "string",
                "enum": ["lookup", "update", "create", "delete", "merge"]
            },
            "customer_id": {"type": "string"},
            "data": {"type": "object"},  # What shape? Depends on action.
        }
    }
}

# BETTER: Separate tools with clear contracts
tools = [
    {"name": "lookup_customer", "parameters": {"customer_id": {"type": "string"}}},
    {"name": "update_customer_email", "parameters": {"customer_id": {"type": "string"}, "new_email": {"type": "string"}}},
    {"name": "update_customer_phone", "parameters": {"customer_id": {"type": "string"}, "new_phone": {"type": "string"}}},
]

Anti-Pattern: Exposing Internal IDs Without Context

Tools that require internal database IDs as inputs are unusable unless the agent has already called another tool that returned those IDs. Always provide a way for the agent to discover IDs from user-facing information.

# ANTI-PATTERN: Requires internal ID with no way to discover it
{
    "name": "get_subscription",
    "parameters": {
        "subscription_id": {"type": "string", "description": "Internal subscription UUID"}
    }
}

# BETTER: Accept user-facing identifiers
{
    "name": "get_subscription",
    "description": "Look up a subscription by customer email or subscription ID",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_email": {
                "type": "string",
                "description": "Customer's email address (preferred lookup method)"
            },
            "subscription_id": {
                "type": "string",
                "description": "Subscription ID if known (format: SUB-XXXXX)"
            }
        }
    }
}

Testing Your Tool Design

The best way to validate tool design is to run the agent against diverse user inputs and inspect the tool-call trace, then look for patterns. If the agent consistently picks the wrong tool, the names or descriptions are ambiguous. If it passes invalid parameter values, you need enums or tighter parameter descriptions. If it calls tools in the wrong order, add sequencing hints to the tool descriptions.

Build a test suite specifically for tool selection — give the agent a user message and assert which tool it calls and with what parameters. Run this suite after every tool definition change.
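Such a suite can be small. A minimal sketch, assuming a hypothetical `run_agent_turn` helper that sends one user message through the agent and returns the first tool call as a dict (the cases and helper are illustrative):

```python
# Each case: user message -> expected tool name and key parameters (None = don't check params).
TOOL_SELECTION_CASES = [
    ("Where is my order ORD-1234?", "check_order_status", {"order_id": "ORD-1234"}),
    ("I want to talk to a real person", "escalate_to_human", {"reason": "customer_requested"}),
    ("Do you have a refund policy?", "search_knowledge_base", None),
]

def run_tool_selection_suite(run_agent_turn) -> list[str]:
    """Return a list of failure descriptions; an empty list means all cases passed."""
    failures = []
    for message, expected_tool, expected_params in TOOL_SELECTION_CASES:
        call = run_agent_turn(message)  # -> {"name": ..., "arguments": {...}}
        if call["name"] != expected_tool:
            failures.append(f"{message!r}: called {call['name']}, expected {expected_tool}")
        elif expected_params:
            for key, value in expected_params.items():
                if call["arguments"].get(key) != value:
                    failures.append(
                        f"{message!r}: {key}={call['arguments'].get(key)!r}, expected {value!r}"
                    )
    return failures
```

Wiring this into CI means a renamed tool or reworded description that degrades selection accuracy fails a test instead of surfacing in production conversations.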

FAQ

How many tools should an agent have?

Research suggests that current LLMs handle 5-15 tools well. Beyond 20 tools, selection accuracy degrades because the model has to compare more options and the tool descriptions compete for attention in the context window. If you need more than 20 tools, consider a two-tier architecture: a routing agent that selects a category, and specialized agents with 5-10 tools each.
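The two-tier idea can be sketched as a router whose only tools are coarse categories, each unlocking a focused specialist tool set (the category and tool names here are illustrative):

```python
# Tier 1: the routing agent sees only a handful of coarse "category" tools.
ROUTER_TOOLS = [
    {"name": "route_to_orders", "description": "Handle order status, cancellations, and returns"},
    {"name": "route_to_billing", "description": "Handle invoices, refunds, and payment methods"},
    {"name": "route_to_account", "description": "Handle login, profile, and subscription changes"},
]

# Tier 2: each category maps to a specialist agent with a small, focused tool set.
SPECIALIST_TOOLS = {
    "route_to_orders": ["check_order_status", "cancel_order", "start_return"],
    "route_to_billing": ["get_invoice", "issue_refund", "update_payment_method"],
    "route_to_account": ["lookup_customer", "update_customer_email", "cancel_subscription"],
}

def tools_for_route(route: str) -> list[str]:
    """Return the specialist tool names for a chosen category (empty if unknown)."""
    return SPECIALIST_TOOLS.get(route, [])
```

Each model call now compares at most a handful of options, which keeps selection accuracy high even as the total tool count grows.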

Should tool descriptions mention other tools?

Yes, when there is a natural workflow relationship. For example, a check_order_status description might include "Use this before calling cancel_order to verify the order is eligible for cancellation." This helps the agent plan multi-step operations. But avoid creating circular references where tool A's description references tool B and vice versa.
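As a sketch, the workflow hint is just a sentence in one tool's description, while the referenced tool stays silent about its caller to avoid a circular reference (descriptions are illustrative):

```python
tools = [
    {
        "name": "check_order_status",
        "description": (
            "Check the current status of an order by order ID. "
            "Use this before calling cancel_order to verify the order "
            "is still eligible for cancellation."
        ),
    },
    {
        "name": "cancel_order",
        # States its own precondition instead of pointing back at check_order_status,
        # so the reference runs one way only.
        "description": "Cancel an order. Only valid for orders in 'pending' or 'processing' status.",
    },
]
```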

How do you version tool functions without breaking the agent?

Follow the same principles as API versioning: make backward-compatible changes (adding optional parameters, adding new response fields) without a version bump. For breaking changes (removing parameters, changing response structure), deploy the new version alongside the old one and update the agent's tool definitions in a coordinated change. Run evaluation benchmarks before and after to detect regressions.
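As a sketch of the backward-compatible case: adding an optional parameter leaves `required` untouched, so calls written against the old definition still validate (tool and parameter names are illustrative):

```python
# Original definition
check_order_status_v1 = {
    "name": "check_order_status",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

# Backward-compatible revision: one new optional parameter, "required" unchanged,
# so agents holding the v1 definition keep working without a coordinated deploy.
check_order_status_v2 = {
    "name": "check_order_status",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string"},
            "include_tracking": {
                "type": "boolean",
                "description": "Also return tracking events for the order",
            },
        },
        "required": ["order_id"],
    },
}
```

Removing `order_id` or adding `include_tracking` to `required` would be the breaking case, which calls for a parallel definition and a coordinated cutover.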

Should tool responses include next-step suggestions?

Yes, for complex workflows. Including a next_steps or suggestion field in the response guides the agent toward the appropriate follow-up action. For example, after a successful order lookup that shows a delayed shipment, the suggestion might be "Offer to check the tracking status or escalate to the shipping team." This reduces the reasoning burden on the LLM and produces more consistent agent behavior.
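A sketch of such a response for the delayed-shipment example (field names like `next_steps` are illustrative, not a standard):

```python
def delayed_order_response(order_id: str, tracking_url: str) -> dict:
    """Build a lookup response that nudges the agent toward sensible follow-ups."""
    return {
        "found": True,
        "order_id": order_id,
        "status": "delayed",
        "tracking_url": tracking_url,
        # Explicit suggestions mean the model reads its options
        # instead of inferring them from business logic.
        "next_steps": [
            "Offer to check the tracking status",
            "Offer to escalate to the shipping team if the customer is unhappy",
        ],
    }
```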

Written by

CallSphere Team
