Building AI Agent APIs: REST vs GraphQL vs gRPC Patterns
How to design APIs for AI agent platforms — comparing REST, GraphQL, and gRPC for agent invocation, streaming responses, tool registration, and multi-agent orchestration.
Agent APIs Are Not Like Traditional APIs
Traditional APIs serve predictable request-response patterns. You call an endpoint, it processes the request in milliseconds to seconds, and returns a structured response. AI agent APIs break these assumptions in several ways:
- Long-running requests: Agent executions take seconds to minutes, not milliseconds
- Streaming output: Agents generate tokens incrementally — users expect to see partial results
- Multi-step execution: A single agent invocation may involve many internal steps, each with observable state
- Callbacks and tool use: The agent may need to call external tools or request human input during execution
- Unpredictable response shapes: Agent outputs vary in structure based on the task
These characteristics create unique API design challenges regardless of whether you choose REST, GraphQL, or gRPC.
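Structured, typed events are the common thread across all three protocols. A minimal Python sketch of such an event model (the type names and fields here are illustrative, not any specific platform's schema):

```python
from dataclasses import dataclass
from typing import Any, Union

# Illustrative event types an agent run might emit.

@dataclass
class TokenEvent:
    content: str                 # one incremental chunk of model output

@dataclass
class ToolCallEvent:
    tool: str                    # e.g. "sql_query"
    args: dict[str, Any]

@dataclass
class StatusEvent:
    steps_completed: int         # observable progress for long runs

@dataclass
class DoneEvent:
    result: dict[str, Any]       # final structured output

AgentEvent = Union[TokenEvent, ToolCallEvent, StatusEvent, DoneEvent]
```

Clients can then switch on the event type to render partial output, show tool activity, or update a progress indicator, instead of parsing raw text.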
REST: The Default Choice
REST is the most widely used pattern for AI agent APIs. OpenAI, Anthropic, and most agent platforms expose REST APIs. The pattern is well-understood, widely supported by client libraries, and works with standard HTTP infrastructure.
Agent Invocation Pattern
```http
POST /api/v1/agents/{agent_id}/runs
Content-Type: application/json

{
  "input": "Analyze Q4 sales performance",
  "config": {
    "model": "gpt-4o",
    "max_steps": 10,
    "tools": ["sql_query", "chart_generator"]
  },
  "stream": true
}
```
Streaming with Server-Sent Events (SSE)
For streaming agent output, SSE is the standard REST-compatible approach. The server sends events as the agent executes — token-by-token output, tool call notifications, and status updates.
```python
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/api/v1/agents/{agent_id}/runs")
async def run_agent(agent_id: str, request: RunRequest):
    # RunRequest and agent are assumed to be defined elsewhere
    async def event_stream():
        async for event in agent.execute(request):
            match event.type:
                case "token":
                    yield f"data: {json.dumps({'type': 'token', 'content': event.token})}\n\n"
                case "tool_call":
                    yield f"data: {json.dumps({'type': 'tool_call', 'tool': event.tool, 'args': event.args})}\n\n"
                case "done":
                    yield f"data: {json.dumps({'type': 'done', 'result': event.result})}\n\n"
    return StreamingResponse(event_stream(), media_type="text/event-stream")
```
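On the client side, each SSE event arrives as a `data:` line followed by a blank line. A minimal parser for this format (a sketch that handles only JSON `data:` payloads; real clients should also handle `event:`, `id:`, and multi-line data fields):

```python
import json

def parse_sse_events(raw: str) -> list[dict]:
    """Parse a text/event-stream body into a list of JSON payloads.

    Assumes one `data:` line per blank-line-delimited event block,
    each carrying a JSON object.
    """
    events = []
    for block in raw.split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data: "):
                events.append(json.loads(line[len("data: "):]))
    return events
```

In practice you would feed this incrementally from a streaming HTTP response rather than a complete string, but the framing logic is the same.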
Long-Running Operations with Polling
For agent runs that take minutes, the async operation pattern works well: return a run ID immediately, and the client polls for status.
```http
POST /api/v1/agents/{agent_id}/runs  →  202 Accepted, {"run_id": "abc123"}
GET  /api/v1/runs/abc123             →  200 OK, {"status": "running", "steps_completed": 3}
GET  /api/v1/runs/abc123             →  200 OK, {"status": "completed", "result": {...}}
```
OpenAI's Assistants API uses exactly this pattern — creating a run and then polling (or streaming) for results.
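A client-side polling loop for this pattern can be sketched as follows. Here `fetch_status` is a callable standing in for the HTTP GET so the loop is testable without a live server; production code would also use exponential backoff and honor `Retry-After` headers:

```python
import time

def poll_run(fetch_status, run_id: str, interval_s: float = 1.0, max_polls: int = 60) -> dict:
    """Poll the run's status endpoint until it leaves the 'running' state.

    fetch_status(run_id) should return the parsed JSON status body,
    e.g. {"status": "running", "steps_completed": 3}.
    """
    for _ in range(max_polls):
        status = fetch_status(run_id)
        if status["status"] != "running":
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"run {run_id} still running after {max_polls} polls")
```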
GraphQL: Flexible but Complex
GraphQL's strength is flexible querying — clients request exactly the data they need. For agent platforms with rich metadata (run history, step details, tool configurations), GraphQL reduces over-fetching.
Where GraphQL Shines
```graphql
query AgentRunDetails {
  run(id: "abc123") {
    status
    startedAt
    steps {
      type
      duration
      ... on LLMStep {
        model
        tokenUsage { input output }
      }
      ... on ToolStep {
        toolName
        input
        output
      }
    }
    result {
      content
      citations
    }
  }
}
```
This single query returns exactly the data the client needs, with type-specific fields for different step types. In REST, this would require multiple endpoints or a complex query parameter scheme.
Where GraphQL Struggles
Streaming is not native to GraphQL. GraphQL subscriptions over WebSockets can handle it, but the implementation is more complex than SSE. File uploads (for document-processing agents) are awkward in GraphQL. And the overhead of the GraphQL layer adds latency that matters for real-time agent interactions.
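When streaming over GraphQL is unavoidable, it typically takes the form of a subscription keyed by run ID. A hypothetical sketch (the field and type names are illustrative, not a standard schema):

```graphql
subscription RunEvents($runId: ID!) {
  runEvents(runId: $runId) {
    __typename
    ... on TokenEvent { content }
    ... on ToolCallEvent { toolName args }
    ... on DoneEvent { result }
  }
}
```

This works, but it requires a WebSocket transport and a pub/sub layer on the server, which is substantially more moving parts than an SSE endpoint.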
gRPC: Best for Inter-Agent Communication
gRPC shines for server-to-server communication in multi-agent systems. Its binary protocol, strong typing via Protocol Buffers, and native streaming support make it ideal for agent orchestration.
Agent Service Definition
```protobuf
syntax = "proto3";

service AgentService {
  // Unary: simple request-response
  rpc InvokeAgent(AgentRequest) returns (AgentResponse);

  // Server streaming: agent sends incremental results
  rpc StreamAgent(AgentRequest) returns (stream AgentEvent);

  // Bidirectional: interactive agent with tool callbacks
  rpc InteractiveAgent(stream ClientMessage) returns (stream AgentEvent);
}

message AgentEvent {
  oneof event {
    TokenEvent token = 1;
    ToolCallEvent tool_call = 2;
    StatusEvent status = 3;
    CompletionEvent completion = 4;
  }
}
```
Bidirectional Streaming for Human-in-the-Loop
gRPC's bidirectional streaming is uniquely suited for interactive agent workflows. The agent streams its execution, and the client can inject approvals, corrections, or additional context mid-execution — something that is difficult to implement cleanly with REST or GraphQL.
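The message flow can be modeled in plain Python with a generator: the agent yields events and suspends at an approval point until the client sends a decision. This is a toy model of the bidirectional exchange, not real gRPC code; the event shapes and the "approve" protocol are assumptions for illustration:

```python
def interactive_agent():
    """Toy model of a bidirectional stream with a human approval gate.

    next(gen) pulls agent events; gen.send(decision) injects the
    client's response at the suspended tool-call yield.
    """
    yield {"type": "token", "content": "Draft ready."}
    decision = yield {"type": "tool_call", "tool": "send_email", "needs_approval": True}
    if decision == "approve":
        yield {"type": "done", "result": "email sent"}
    else:
        yield {"type": "done", "result": "cancelled"}
```

In real gRPC, the client and server each hold one half of the stream and the same suspend-and-resume pattern is expressed as reads and writes on the two message streams.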
Recommendation by Use Case
| Use Case | Recommended | Why |
|---|---|---|
| Public API for agent platform | REST + SSE | Universal client support, simple integration |
| Dashboard / admin interface | GraphQL | Flexible querying for complex data models |
| Multi-agent orchestration | gRPC | Low latency, typed contracts, bidirectional streaming |
| Mobile client | REST + SSE | Simpler than GraphQL on mobile, good library support |
| Internal microservices | gRPC | Performance, code generation, streaming |
Universal Design Principles
Regardless of protocol, AI agent APIs should follow these principles:
- Idempotent run creation: Clients should be able to safely retry agent invocation requests without creating duplicate runs
- Structured events: Every agent step should emit structured events (not just raw text) that clients can parse and display appropriately
- Cancellation support: Long-running agent executions must be cancellable
- Cost transparency: Include token usage and estimated cost in responses so clients can make informed decisions
- Rate limiting by compute: Rate limit by estimated compute cost, not just request count — one complex agent run should consume more rate limit budget than a simple query
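The first principle, idempotent run creation, is usually implemented with a client-supplied idempotency key. A minimal in-memory sketch (the class and header name are assumptions; in production the key-to-run mapping would live in a database with a TTL):

```python
import uuid

class RunStore:
    """Sketch of idempotent run creation keyed by an Idempotency-Key header."""

    def __init__(self):
        self._by_key: dict[str, str] = {}

    def create_run(self, idempotency_key: str, payload: dict) -> str:
        # A retry with the same key returns the original run_id
        # instead of starting a duplicate execution.
        if idempotency_key in self._by_key:
            return self._by_key[idempotency_key]
        run_id = uuid.uuid4().hex
        self._by_key[idempotency_key] = run_id
        # ... enqueue the actual agent execution here ...
        return run_id
```

A network timeout on `POST /runs` then becomes safe to retry: the client resends with the same key and gets back the run that already started.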
The API is the contract between your agent platform and its consumers. Getting the design right early saves significant refactoring as the platform scales.