API Pagination for AI Agent Data: Cursor-Based, Offset, and Keyset Pagination

Why Pagination Matters for AI Agent APIs

AI agents generate enormous volumes of data: conversation histories, tool call logs, evaluation results, and audit trails. Returning all records in a single response is impractical. Without pagination, a single query for an agent's conversation history could return millions of messages, consuming excessive memory, saturating the network, and timing out.

Pagination splits large result sets into manageable pages. The three dominant strategies — offset-based, cursor-based, and keyset pagination — each offer different performance characteristics and consistency guarantees.

Offset-Based Pagination: Simple but Fragile

Offset pagination uses a page number or offset combined with a limit. It is the most intuitive approach and maps directly to SQL's LIMIT and OFFSET clauses.

from fastapi import FastAPI, Query
from pydantic import BaseModel
from sqlalchemy import select, func
from sqlalchemy.ext.asyncio import AsyncSession

app = FastAPI()

class PaginatedResponse(BaseModel):
    data: list[dict]
    total: int
    offset: int
    limit: int
    has_more: bool

@app.get("/v1/agents/{agent_id}/messages")
async def list_messages_offset(
    agent_id: str,
    offset: int = Query(0, ge=0),
    limit: int = Query(20, ge=1, le=100),
    db: AsyncSession = Depends(get_db),
):
    total = await db.scalar(
        select(func.count())
        .select_from(Message)
        .where(Message.agent_id == agent_id)
    )

    rows = await db.execute(
        select(Message)
        .where(Message.agent_id == agent_id)
        .order_by(Message.created_at.desc())
        .offset(offset)
        .limit(limit)
    )
    messages = rows.scalars().all()

    return PaginatedResponse(
        data=[m.to_dict() for m in messages],
        total=total,
        offset=offset,
        limit=limit,
        has_more=offset + limit < total,
    )

The problem with offset pagination is performance degradation at scale. OFFSET 1000000 forces the database to scan and discard one million rows before returning results. It also suffers from consistency issues: if new records are inserted while the client is paginating, pages can shift, causing duplicated or skipped items.

Cursor-Based Pagination: Consistent and Scalable

Cursor pagination uses an opaque token representing the position of the last item on the current page. The server decodes the cursor to determine where to start the next page, avoiding the performance cliff of large offsets.

import base64
import json

def encode_cursor(created_at: str, id: str) -> str:
    payload = json.dumps({"created_at": created_at, "id": id})
    return base64.urlsafe_b64encode(payload.encode()).decode()

def decode_cursor(cursor: str) -> dict:
    payload = base64.urlsafe_b64decode(cursor.encode()).decode()
    return json.loads(payload)

class CursorPaginatedResponse(BaseModel):
    data: list[dict]
    next_cursor: str | None
    has_more: bool

@app.get("/v1/agents/{agent_id}/conversations")
async def list_conversations_cursor(
    agent_id: str,
    cursor: str | None = Query(None),
    limit: int = Query(20, ge=1, le=100),
    db: AsyncSession = Depends(get_db),
):
    query = (
        select(Conversation)
        .where(Conversation.agent_id == agent_id)
        .order_by(
            Conversation.created_at.desc(),
            Conversation.id.desc(),
        )
    )

    if cursor:
        decoded = decode_cursor(cursor)
        query = query.where(
            (Conversation.created_at < decoded["created_at"])
            | (
                (Conversation.created_at == decoded["created_at"])
                & (Conversation.id < decoded["id"])
            )
        )

    rows = await db.execute(query.limit(limit + 1))
    items = rows.scalars().all()

    has_more = len(items) > limit
    items = items[:limit]

    next_cursor = None
    if has_more and items:
        last = items[-1]
        next_cursor = encode_cursor(
            last.created_at.isoformat(), str(last.id)
        )

    return CursorPaginatedResponse(
        data=[c.to_dict() for c in items],
        next_cursor=next_cursor,
        has_more=has_more,
    )

The trick of fetching limit + 1 items lets you determine whether more pages exist without running a separate count query.

Keyset Pagination: Maximum Database Performance

Keyset pagination is a variant of cursor pagination that directly uses column values rather than opaque tokens. It requires a strict, unique ordering and leverages database indexes for maximum efficiency.

See AI Voice Agents Handle Real Calls

Book a free demo or calculate how much you can save with AI voice automation.

Book a Demo ROI Calculator

@app.get("/v1/agents/{agent_id}/tool-calls")
async def list_tool_calls_keyset(
    agent_id: str,
    after_id: int | None = Query(None),
    limit: int = Query(50, ge=1, le=200),
    db: AsyncSession = Depends(get_db),
):
    query = (
        select(ToolCall)
        .where(ToolCall.agent_id == agent_id)
        .order_by(ToolCall.id.asc())
    )

    if after_id is not None:
        query = query.where(ToolCall.id > after_id)

    rows = await db.execute(query.limit(limit + 1))
    items = rows.scalars().all()
    has_more = len(items) > limit
    items = items[:limit]

    return {
        "data": [t.to_dict() for t in items],
        "next_after_id": items[-1].id if has_more else None,
        "has_more": has_more,
    }

This generates a simple WHERE id > :after_id ORDER BY id LIMIT :limit query that uses an index seek instead of a sequential scan, performing consistently regardless of how deep into the dataset you paginate.

Choosing the Right Strategy

Use offset pagination for admin dashboards and internal tools where datasets are small, users need to jump to specific pages, and simplicity is valued over performance.

Use cursor pagination for public APIs consumed by AI agents that iterate through large datasets sequentially. It provides stable results and consistent performance.

Use keyset pagination when you control both the API and the client, your ordering column is indexed and unique, and you need maximum query performance on tables with millions of rows.

FAQ

Can I mix pagination strategies in the same API?

Yes, but be consistent within each resource. For example, use cursor pagination for conversation messages (which are append-heavy and sequentially accessed) and offset pagination for a paginated admin dashboard that needs page jumping. Document the strategy clearly in your OpenAPI spec for each endpoint.

How do I handle filtering with cursor pagination?

Apply filters before cursor conditions. The cursor encodes position within the filtered result set. If a user changes filters mid-pagination, they must start from the beginning with no cursor. Never reuse a cursor from a different filter combination — the underlying position may point to a record that no longer matches the new filter.

What page size should I default to for AI agent APIs?

Start with 20 to 50 items per page, with a maximum of 100 to 200. AI agents processing data in bulk may benefit from larger pages to reduce HTTP round trips, but excessively large pages increase memory pressure and response latency. Let clients specify the page size via a limit query parameter with a sane default and a hard maximum.

#APIPagination #CursorPagination #FastAPI #DatabasePerformance #AIAgents #AgenticAI #LearnAI #AIEngineering

API Pagination for AI Agent Data: Cursor-Based, Offset, and Keyset Pagination

Choosing the Right Strategy

FAQ

What page size should I default to for AI agent APIs?

Try CallSphere AI Voice Agents

Related Articles

WebArena and Real-World Web Agent Benchmarks: How We Measure Browser Agent Performance

Taking Screenshots and Recording Videos with Playwright for AI Analysis

Playwright Selectors Deep Dive: CSS, XPath, Text, and Role-Based Element Finding