Learn Agentic AI · 14 min read

Building Agents Without Frameworks: When Raw API Calls Beat Abstractions

Learn when and how to build agents using direct LLM API calls instead of frameworks, with a minimal implementation that demonstrates the agent loop, tool calling, and state management from scratch.

The Framework Tax

Every framework adds a layer between your code and the LLM API. That layer provides convenience — tool registration, conversation management, retry logic — but also adds complexity. You inherit the framework's abstractions, opinions, bugs, and update cadence. When something goes wrong, you debug through the framework's stack traces instead of your own code.

For many use cases, the framework tax is worth paying. But for others — especially simple agents, latency-sensitive applications, or systems with unusual requirements — building directly against the LLM API gives you full control with minimal overhead.

The Minimal Agent Loop

An agent is fundamentally a loop: send a message to the LLM, check if it wants to call a tool, execute the tool, send the result back, and repeat until the LLM produces a final response. Here is that loop in about 60 lines:

import json
import openai

client = openai.OpenAI()

# Tool registry: maps function names to callables
TOOLS = {}

def tool(func):
    """Register a function as an agent tool."""
    TOOLS[func.__name__] = func
    return func

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # In production, call a real weather API
    return json.dumps({"city": city, "temp_f": 72, "condition": "sunny"})

@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    try:
        # Stripping __builtins__ blocks the obvious abuse, but eval on
        # untrusted input is still unsafe; use a real expression parser
        # in production.
        result = eval(expression, {"__builtins__": {}})
        return json.dumps({"result": result})
    except Exception as e:
        return json.dumps({"error": str(e)})

# Tool schemas for the API
TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Evaluate a mathematical expression.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string", "description": "Math expression"}
                },
                "required": ["expression"],
            },
        },
    },
]
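Hand-writing these schemas duplicates information that already lives in the function signature and docstring. A sketch of deriving them automatically instead, assuming every parameter is a plain string (true for the two tools above, but not in general — the `schema_for` helper is an illustration, not part of any SDK):

```python
import inspect
import json

def schema_for(func):
    """Derive a minimal tool schema from a function's signature and docstring.

    Assumes every parameter is a string; real code would map type
    annotations to JSON Schema types.
    """
    params = {
        name: {"type": "string", "description": name}
        for name in inspect.signature(func).parameters
    }
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": (func.__doc__ or "").strip(),
            "parameters": {
                "type": "object",
                "properties": params,
                "required": list(params),
            },
        },
    }

# Repeated here so the sketch is self-contained
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return json.dumps({"city": city, "temp_f": 72, "condition": "sunny"})

TOOL_SCHEMAS = [schema_for(get_weather)]
```

With this in place, registering a tool and exposing it to the API becomes a single decorator plus one `schema_for` call, and the schema can never drift out of sync with the function.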


def run_agent(user_message: str, system_prompt: str = "You are a helpful assistant.") -> str:
    """Run a minimal agent loop."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]

    max_iterations = 10
    for _ in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=TOOL_SCHEMAS,
        )
        choice = response.choices[0]

        # If the model wants to call tools
        if choice.finish_reason == "tool_calls":
            messages.append(choice.message)
            for tool_call in choice.message.tool_calls:
                fn_name = tool_call.function.name
                fn_args = json.loads(tool_call.function.arguments)
                # Guard against hallucinated tool names instead of crashing
                fn = TOOLS.get(fn_name)
                result = fn(**fn_args) if fn else json.dumps(
                    {"error": f"Unknown tool: {fn_name}"}
                )
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result,
                })
        else:
            # Final response
            return choice.message.content

    return "Agent reached maximum iterations without completing."

# Usage
answer = run_agent("What is the weather in Tokyo, and what is 42 * 17?")
print(answer)

This is a complete, working agent. It handles parallel tool calls in a single turn, loops until the model decides it is done, and caps iterations to prevent runaway costs. No framework required.
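One convenience the raw approach does give up is built-in retry logic, which frameworks typically bundle. Replicating the standard exponential-backoff pattern takes a few lines; this `with_retries` helper is a sketch, not part of any SDK:

```python
import random
import time

def with_retries(call, max_attempts=3, base_delay=1.0):
    """Retry a zero-argument callable with exponential backoff and jitter.

    Wrap the API call as e.g.
    with_retries(lambda: client.chat.completions.create(...)).
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Back off: base, 2x base, 4x base, ... plus a little jitter
            time.sleep(base_delay * 2 ** attempt + random.random() * 0.1)
```

In production you would narrow the `except` to the SDK's transient error types (rate limits, timeouts) rather than retrying every exception.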

Adding Streaming

Streaming is straightforward with the raw API:

def run_agent_streaming(user_message: str, system_prompt: str = "You are a helpful assistant."):
    """Run agent with streaming final response."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]

    max_iterations = 10
    for _ in range(max_iterations):
        # Non-streaming call for tool use turns
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=TOOL_SCHEMAS,
        )
        choice = response.choices[0]

        if choice.finish_reason == "tool_calls":
            messages.append(choice.message)
            for tool_call in choice.message.tool_calls:
                fn_name = tool_call.function.name
                fn_args = json.loads(tool_call.function.arguments)
                # Guard against hallucinated tool names instead of crashing
                fn = TOOLS.get(fn_name)
                result = fn(**fn_args) if fn else json.dumps(
                    {"error": f"Unknown tool: {fn_name}"}
                )
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result,
                })
        else:
            # Re-generate the final turn with streaming. This costs one extra
            # API call; streaming the first call and accumulating tool-call
            # deltas avoids it, at the price of more code.
            stream = client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                stream=True,
            )
            for chunk in stream:
                if chunk.choices[0].delta.content:
                    yield chunk.choices[0].delta.content
            return

    yield "Agent reached maximum iterations."

When Frameworks Are Not Worth It

Simple single-agent tools: If your agent has 2-5 tools and a single conversation loop, the raw API is cleaner than importing a framework.

Latency-critical paths: Frameworks add milliseconds of overhead per turn from abstraction layers, event hooks, and serialization. For sub-second agent responses, every millisecond counts.


Unusual conversation patterns: If your agent loop does not fit the standard "LLM calls tools in a loop" pattern — for example, you need to interleave human approval steps, external event triggers, or custom branching logic — a framework's assumptions may fight you.
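Because you own the loop, inserting a human approval step is a one-line change at the point of tool execution. A minimal sketch, assuming a hypothetical `approve` callback that shows the pending call to a human and returns True or False:

```python
import json

# Tools that may run without human sign-off; everything else must be approved.
SAFE_TOOLS = {"get_weather", "calculate"}

def execute_with_approval(fn_name, fn_args, tools, approve):
    """Run a tool call, pausing for human approval on anything risky.

    `approve` is a hypothetical callback, e.g. a CLI prompt or a button in
    a review UI. With the raw loop this check slots in directly before tool
    execution -- no framework hooks required.
    """
    if fn_name not in SAFE_TOOLS and not approve(fn_name, fn_args):
        return json.dumps({"error": f"Human rejected call to {fn_name}"})
    return tools[fn_name](**fn_args)
```

The rejection goes back to the model as an ordinary tool result, so it can explain the refusal or propose an alternative instead of silently failing.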

Learning and understanding: Building from scratch once teaches you what frameworks actually do. You become a better user of frameworks when you understand the primitives underneath.

When Frameworks Win

Multi-agent orchestration: Handoffs, delegation, and group chat patterns are genuinely complex. Frameworks like the OpenAI Agents SDK and AutoGen save significant effort here.

Observability: Built-in tracing, logging, and debugging tools in frameworks like LangChain (with LangSmith) or the Agents SDK are hard to replicate manually.

Rapid prototyping: When you need to test an idea quickly, frameworks eliminate boilerplate and let you focus on the logic.

Team projects: Frameworks provide conventions that keep a team's code consistent. Without a framework, every developer invents their own agent loop.

FAQ

Is there a performance difference between framework agents and raw API agents?

The LLM API call dominates execution time (hundreds of milliseconds to seconds). Framework overhead is typically 1-10ms per turn — negligible for most applications. The performance argument for raw APIs is strongest in high-throughput scenarios processing thousands of agent runs per second.

How do I handle errors in a framework-free agent?

Add try/except around tool execution and include the error in the tool response message. The LLM will see the error and can retry or adjust its approach. Also add timeout handling on the API call itself and validate tool arguments before execution.

Should I build my own framework over time?

Many teams start with raw API calls and gradually extract reusable patterns into an internal library. This is a valid approach — you end up with a framework tailored to your specific needs. The risk is maintaining it as the LLM APIs evolve.


#AgentArchitecture #APIDesign #Python #MinimalAgents #FrameworkFree #AgenticAI #LearnAI #AIEngineering

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
