
Connection Pooling for AI Applications: PgBouncer, pgpool, and Application-Level Pools

Configure and optimize database connection pooling for AI agent applications, comparing PgBouncer, pgpool-II, and application-level pools, with monitoring strategies and troubleshooting guidance.

Why Connection Pooling Is Critical for AI Agents

AI agent applications have a unique database access pattern. A single agent invocation might open a database connection, fetch context, then wait several seconds for an LLM to respond before writing the result back. During that LLM wait, the database connection sits idle — but it still consumes a PostgreSQL backend process, memory, and a slot in max_connections.

With 50 concurrent agent sessions each holding connections during LLM calls, you quickly exhaust PostgreSQL's default 100 connections. Connection pooling solves this by multiplexing many application connections over a smaller number of database connections, returning idle connections to the pool during LLM wait times.
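In application code, this means scoping connection acquisition tightly around database work. A minimal sketch with asyncpg-style pool semantics (the `sessions` table and the `call_llm` coroutine are hypothetical placeholders):

```python
import asyncio

async def handle_agent_request(pool, call_llm, session_id):
    # Hold a connection only while actually talking to the database.
    async with pool.acquire() as conn:
        context = await conn.fetchval(
            "SELECT context FROM sessions WHERE id = $1", session_id
        )
    # No connection is held here -- the LLM call may take seconds,
    # and the pooled connection is free to serve other requests.
    answer = await call_llm(context)
    # Reacquire briefly to persist the result.
    async with pool.acquire() as conn:
        await conn.execute(
            "UPDATE sessions SET answer = $1 WHERE id = $2", answer, session_id
        )
    return answer
```

The pattern to avoid is wrapping the entire request, LLM call included, in a single `pool.acquire()` block.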

Application-Level Pooling

Most ORMs and database drivers include built-in connection pools. Configure them correctly as your first line of defense:

SQLAlchemy (Python):

from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine(
    "postgresql+asyncpg://user:pass@localhost/agents",
    pool_size=10,         # Steady-state connections
    max_overflow=5,       # Extra connections under load
    pool_timeout=30,      # Seconds to wait for a connection
    pool_recycle=3600,    # Recycle connections after 1 hour
    pool_pre_ping=True,   # Verify connections before use
)

Prisma (TypeScript):

# In DATABASE_URL
postgresql://user:pass@localhost/agents?connection_limit=15&pool_timeout=30

asyncpg (Python, direct driver):

import asyncpg

pool = await asyncpg.create_pool(
    "postgresql://user:pass@localhost/agents",
    min_size=5,
    max_size=20,
    max_inactive_connection_lifetime=300,  # Drop idle connections after 5 min
)

The key rule: set your pool size to match your actual concurrent database usage, not your concurrent user count. If each agent request uses the database for 100ms out of a 3-second total request time, you need far fewer database connections than active requests.

PgBouncer: External Connection Pooler

PgBouncer sits between your application and PostgreSQL, pooling connections across all application instances:

# pgbouncer.ini
[databases]
agents = host=127.0.0.1 port=5432 dbname=agents

[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

pool_mode = transaction
default_pool_size = 20
max_client_conn = 200
reserve_pool_size = 5
reserve_pool_timeout = 3
server_idle_timeout = 600
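The auth_file referenced above holds one quoted username/password pair per line. A minimal illustrative example — with md5 auth, the second field is `md5` followed by the MD5 hex digest of the password concatenated with the username (the hash below is a placeholder):

```ini
# /etc/pgbouncer/userlist.txt
"agent_user" "md50123456789abcdef0123456789abcdef"
```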

The critical setting is pool_mode. Three options:


  • session — a server connection is assigned for the entire client session. Safest but provides the least multiplexing.
  • transaction — a server connection is assigned only for the duration of a transaction. Best for most agent workloads.
  • statement — a server connection is assigned per statement. Breaks multi-statement transactions. Rarely appropriate.

Use transaction mode for agent applications. It provides the best connection reuse because connections return to the pool between transactions. Your application sees 200 client connections but PostgreSQL only handles 20 actual connections.
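The trade-off is that session-scoped state (SET commands, temporary tables, advisory locks) cannot be relied on across transactions, since consecutive transactions may land on different backends. Statements that depend on each other should be grouped into one explicit transaction, which PgBouncer pins to a single backend. A sketch in asyncpg style (table names are hypothetical):

```python
import asyncio

async def record_result(pool, session_id, answer):
    async with pool.acquire() as conn:
        # One explicit transaction keeps both statements on the same
        # PostgreSQL backend under PgBouncer's transaction mode.
        async with conn.transaction():
            await conn.execute(
                "UPDATE sessions SET answer = $1 WHERE id = $2",
                answer, session_id,
            )
            await conn.execute(
                "INSERT INTO audit_log (session_id, event) VALUES ($1, 'answered')",
                session_id,
            )
```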

PgBouncer with Kubernetes

Deploy PgBouncer as a sidecar container alongside your agent application pods:

# In your deployment spec
containers:
  - name: agent-api
    image: agent-api:latest
    env:
      - name: DATABASE_URL
        value: "postgresql://user:pass@localhost:6432/agents"

  - name: pgbouncer
    image: bitnami/pgbouncer:latest
    ports:
      - containerPort: 6432
    env:
      - name: PGBOUNCER_DATABASE
        value: agents
      - name: POSTGRESQL_HOST
        value: postgres-service.database.svc.cluster.local
      - name: PGBOUNCER_POOL_MODE
        value: transaction
      - name: PGBOUNCER_DEFAULT_POOL_SIZE
        value: "15"
      - name: PGBOUNCER_MAX_CLIENT_CONN
        value: "100"

The application connects to localhost:6432 (PgBouncer), which proxies to the actual PostgreSQL service. Each pod gets its own PgBouncer instance with dedicated pool management.

Monitoring Connection Usage

Track pool health with these queries:

-- Current PostgreSQL connections by state
SELECT state, count(*)
FROM pg_stat_activity
WHERE datname = 'agents'
GROUP BY state;

-- PgBouncer pool stats (connect to PgBouncer admin console)
-- pgbouncer admin: SHOW POOLS;
-- pgbouncer admin: SHOW STATS;

In your application, expose pool metrics:

from fastapi import FastAPI

app = FastAPI()

@app.get("/health/db")
async def db_health():
    pool = get_pool()  # application helper returning the shared asyncpg pool
    return {
        "pool_size": pool.get_size(),
        "pool_free": pool.get_idle_size(),
        "pool_used": pool.get_size() - pool.get_idle_size(),
        "min_size": pool.get_min_size(),
        "max_size": pool.get_max_size(),
    }

Troubleshooting Common Issues

"too many connections for role" — your total pool size across all instances exceeds PostgreSQL's max_connections. Calculate: instances * pool_max_size < max_connections - reserved_connections.
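A quick back-of-the-envelope check of that inequality (all numbers illustrative):

```python
instances = 8             # application pods/processes
pool_max_size = 15        # pool_size + max_overflow per instance
max_connections = 100     # PostgreSQL max_connections
reserved = 3              # superuser_reserved_connections

total_needed = instances * pool_max_size   # 120
available = max_connections - reserved     # 97
print(total_needed <= available)           # False: this layout oversubscribes
```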

Connection timeouts under load — increase max_overflow (SQLAlchemy) or reserve_pool_size (PgBouncer) to handle burst traffic. Also check if connections are leaked by code paths that do not properly return connections to the pool.

Idle connections consuming memory — set pool_recycle or server_idle_timeout to close connections that have been idle too long. Each PostgreSQL backend process holds memory even when idle; figures around 5-10MB per connection are commonly cited, though the exact amount depends on workload.

FAQ

Should I use PgBouncer or application-level pooling?

Use both. Application-level pooling manages connections within a single process. PgBouncer manages connections across all processes and application instances. A typical production setup uses an application pool of 10-15 connections per instance, with PgBouncer sitting in front of PostgreSQL to further multiplex across instances.

Does PgBouncer transaction mode break prepared statements?

Yes. In transaction mode, PgBouncer may route consecutive statements to different PostgreSQL backends, which do not share prepared statement caches. Most ORMs handle this transparently by using simple query protocol. If you use explicit prepared statements, switch to session mode or disable server-side prepare in your driver (statement_cache_size=0 in asyncpg).
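For asyncpg, that option is set at pool creation — a configuration sketch pointing at PgBouncer's port (6432) rather than PostgreSQL directly:

```python
import asyncpg

pool = await asyncpg.create_pool(
    "postgresql://user:pass@localhost:6432/agents",
    min_size=5,
    max_size=20,
    statement_cache_size=0,  # disable server-side prepared statements
)
```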

How do I size my connection pool correctly?

Start with the formula: pool_size = (concurrent_requests * db_time_per_request) / total_request_time. For an agent API handling 100 concurrent requests where each request uses the database for 200ms out of a 4-second total, you need 100 * 0.2 / 4 = 5 connections. Add a 2-3x safety margin, giving you 10-15 connections. Monitor and adjust from there.
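That calculation, with the safety margin applied, as a small helper (the 2-3x margin is this article's rule of thumb, not a universal constant):

```python
import math

def size_pool(concurrent_requests, db_time_s, total_request_s, margin=2.5):
    # Average connections in use (Little's law), scaled by a safety margin.
    base = concurrent_requests * db_time_s / total_request_s
    return math.ceil(base * margin)

# The example above: 100 concurrent requests, 200 ms of DB time out of
# a 4 s request -> 5 connections on average, 13 with a 2.5x margin.
print(size_pool(100, 0.2, 4.0))  # 13
```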


#ConnectionPooling #PgBouncer #PostgreSQL #Performance #AIAgents #AgenticAI #LearnAI #AIEngineering
