Agentic AI Development Environment: VS Code, Docker, and GPU Setup Guide
Step-by-step guide to setting up your agentic AI dev environment — VS Code extensions, Docker Compose for LLM services, GPU passthrough, and debugging config.
Why Your Dev Environment Matters for Agentic AI
Agentic AI development has unique requirements that a standard web development setup does not cover. You need to manage API keys for multiple LLM providers, run local model servers for testing, handle streaming responses, debug non-deterministic agent behavior, and sometimes leverage GPU hardware for local inference or embedding generation.
A well-configured development environment reduces the friction between writing code and testing agent behavior. This guide walks you through setting up a complete agentic AI development environment using VS Code, Docker, and optional GPU support.
VS Code Configuration
Essential Extensions
Install these extensions for an optimal agentic AI development experience:
Python Development:
- Python (ms-python.python) — Core Python support
- Pylance (ms-python.vscode-pylance) — Fast, feature-rich Python language server
- Ruff (charliermarsh.ruff) — Extremely fast Python linter and formatter (replaces Black, isort, and Flake8)
AI and Agent Development:
- Continue (continue.continue) — AI code assistant that works with Claude and other models
- REST Client (humao.rest-client) — Test API endpoints directly from VS Code
- Thunder Client (rangav.vscode-thunder-client) — GUI-based API testing
Infrastructure:
- Docker (ms-azuretools.vscode-docker) — Docker file support and container management
- YAML (redhat.vscode-yaml) — YAML validation for Docker Compose and Kubernetes configs
- Remote - SSH (ms-vscode-remote.remote-ssh) — Develop on remote GPU machines seamlessly
VS Code Settings
Configure VS Code for Python agentic AI development:
{
"python.defaultInterpreterPath": "./venv/bin/python",
"python.analysis.typeCheckingMode": "basic",
"editor.formatOnSave": true,
"[python]": {
"editor.defaultFormatter": "charliermarsh.ruff",
"editor.codeActionsOnSave": {
"source.fixAll": "explicit",
"source.organizeImports": "explicit"
}
},
"files.associations": {
"*.env": "dotenv",
"*.env.*": "dotenv"
},
"editor.rulers": [88],
"files.exclude": {
"**/__pycache__": true,
"**/.pytest_cache": true,
"**/node_modules": true
}
}
Launch Configuration for Debugging
Debugging agentic AI code requires special configuration because agent loops are often async and involve external API calls. Create a .vscode/launch.json:
{
"version": "0.2.0",
"configurations": [
{
"name": "Debug Agent Server",
"type": "debugpy",
"request": "launch",
"module": "uvicorn",
"args": [
"app.main:app",
"--reload",
"--port", "8000"
],
"env": {
"ANTHROPIC_API_KEY": "${env:ANTHROPIC_API_KEY}",
"OPENAI_API_KEY": "${env:OPENAI_API_KEY}",
"LOG_LEVEL": "DEBUG"
},
"console": "integratedTerminal",
"justMyCode": false
},
{
"name": "Debug Agent Script",
"type": "debugpy",
"request": "launch",
"program": "${file}",
"env": {
"ANTHROPIC_API_KEY": "${env:ANTHROPIC_API_KEY}",
"OPENAI_API_KEY": "${env:OPENAI_API_KEY}"
},
"console": "integratedTerminal"
},
{
"name": "Debug Tests",
"type": "debugpy",
"request": "launch",
"module": "pytest",
"args": [
"${file}",
"-v",
"--tb=short"
],
"console": "integratedTerminal"
}
]
}
The justMyCode: false setting is important — it lets you step into framework code (Anthropic SDK, OpenAI SDK) when debugging agent behavior.
Environment Variable Management
The .env File Structure
Agentic AI projects typically need many environment variables. Organize them clearly:
# .env
# ── LLM Providers ──
ANTHROPIC_API_KEY=sk-ant-your-key-here
OPENAI_API_KEY=sk-proj-your-key-here
GOOGLE_API_KEY=your-google-key-here
# ── Database ──
DATABASE_URL=postgresql://user:pass@localhost:5432/agents
REDIS_URL=redis://localhost:6379/0
# ── Vector Database ──
QDRANT_URL=http://localhost:6333
PINECONE_API_KEY=your-pinecone-key
# ── Observability ──
LANGSMITH_API_KEY=your-langsmith-key
LANGSMITH_PROJECT=my-agent-project
# ── Application ──
APP_ENV=development
LOG_LEVEL=DEBUG
AGENT_MAX_ITERATIONS=10
AGENT_TIMEOUT_SECONDS=30
Security Best Practices
Never commit API keys to version control. Create a .env.example file with placeholder values and add .env to .gitignore:
# .gitignore
.env
.env.local
.env.*.local
For team development, use a shared secrets manager. Options include:
- 1Password CLI — op run -- python main.py injects secrets at runtime
- Doppler — Syncs secrets across environments and team members
- AWS Secrets Manager — Good for teams already on AWS
- HashiCorp Vault — Self-hosted, enterprise-grade
Load environment variables in your Python code with python-dotenv:
from dotenv import load_dotenv
import os
load_dotenv() # loads .env file
anthropic_key = os.getenv("ANTHROPIC_API_KEY")
if not anthropic_key:
raise ValueError("ANTHROPIC_API_KEY not set")
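Beyond checking a single key, it is worth validating every required variable at startup so the agent fails fast with a clear message instead of dying mid-conversation. A minimal sketch (the variable names follow the .env example above; which variables you treat as required, and the defaults, are assumptions to adapt):

```python
import os

REQUIRED_VARS = ["ANTHROPIC_API_KEY", "DATABASE_URL"]

def load_settings() -> dict:
    """Validate required env vars and parse typed settings, failing fast."""
    missing = [v for v in REQUIRED_VARS if not os.getenv(v)]
    if missing:
        raise ValueError(f"Missing required environment variables: {', '.join(missing)}")
    return {
        "anthropic_api_key": os.environ["ANTHROPIC_API_KEY"],
        "database_url": os.environ["DATABASE_URL"],
        # Numeric settings arrive as strings; convert them once, here.
        "agent_max_iterations": int(os.getenv("AGENT_MAX_ITERATIONS", "10")),
        "agent_timeout_seconds": float(os.getenv("AGENT_TIMEOUT_SECONDS", "30")),
    }
```

Call load_settings() once at import time in app/core/config.py so a missing key surfaces before the first request.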
Docker Compose for Local Services
A Docker Compose file lets you spin up all the services your agent needs with one command. Here is a complete local development setup:
# docker-compose.yml
services:
# ── PostgreSQL with pgvector ──
postgres:
image: pgvector/pgvector:pg16
ports:
- "5432:5432"
environment:
POSTGRES_USER: agents
POSTGRES_PASSWORD: localdev
POSTGRES_DB: agents_dev
volumes:
- pgdata:/var/lib/postgresql/data
- ./db/init.sql:/docker-entrypoint-initdb.d/init.sql
healthcheck:
test: ["CMD-SHELL", "pg_isready -U agents"]
interval: 5s
timeout: 3s
retries: 5
# ── Redis for caching and sessions ──
redis:
image: redis:7-alpine
ports:
- "6379:6379"
volumes:
- redisdata:/data
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 5s
timeout: 3s
retries: 5
# ── Qdrant vector database ──
qdrant:
image: qdrant/qdrant:v1.12.0
ports:
- "6333:6333"
- "6334:6334"
volumes:
- qdrant_data:/qdrant/storage
environment:
QDRANT__SERVICE__GRPC_PORT: 6334
# ── Local LLM server (Ollama) ──
ollama:
image: ollama/ollama:latest
ports:
- "11434:11434"
volumes:
- ollama_data:/root/.ollama
# Uncomment for GPU support:
# deploy:
# resources:
# reservations:
# devices:
# - driver: nvidia
# count: 1
# capabilities: [gpu]
# ── Observability: Jaeger for tracing ──
jaeger:
image: jaegertracing/all-in-one:1.55
ports:
- "16686:16686" # UI
- "4317:4317" # OTLP gRPC
- "4318:4318" # OTLP HTTP
volumes:
pgdata:
redisdata:
qdrant_data:
ollama_data:
Start all services:
docker compose up -d
Verify everything is running:
docker compose ps
docker compose logs postgres --tail 20
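If you script your setup, it helps to block until services actually accept connections before starting the agent. A standard-library sketch (the host/port pairs match the compose file above; the polling interval and timeout are arbitrary choices):

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll a TCP port until it accepts connections or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # service not up yet; retry shortly
    return False

# Example: wait_for_port("localhost", 5432) before running migrations,
# then wait_for_port("localhost", 6333) before creating Qdrant collections.
```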
GPU Setup for Local Inference
If you run local models (via Ollama, vLLM, or text-generation-inference), GPU acceleration dramatically improves inference speed.
NVIDIA GPU Setup on Ubuntu
# Install NVIDIA drivers
sudo apt update
sudo apt install -y nvidia-driver-550
# Verify driver installation
nvidia-smi
# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
# Configure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify GPU access in Docker
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
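From Python, you can check whether the driver is visible before choosing local inference over a cloud API. A small sketch that shells out to nvidia-smi and degrades gracefully on machines without it (the return shape is an illustrative choice, not a standard):

```python
import shutil
import subprocess

def gpu_info() -> dict:
    """Report GPU availability by querying nvidia-smi, if installed."""
    if shutil.which("nvidia-smi") is None:
        return {"available": False, "gpus": []}
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10, check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # Driver present but not responding; treat as unavailable.
        return {"available": False, "gpus": []}
    gpus = [line.strip() for line in out.stdout.splitlines() if line.strip()]
    return {"available": bool(gpus), "gpus": gpus}
```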
Running Local Models with Ollama
Once Docker GPU support is configured, run Ollama with GPU acceleration:
# Pull a model
docker compose exec ollama ollama pull llama3.1:8b
# Test inference
docker compose exec ollama ollama run llama3.1:8b "Explain agentic AI in one paragraph"
Use the local model in your agent code by pointing to the Ollama API:
from openai import OpenAI
# Ollama exposes an OpenAI-compatible API
local_client = OpenAI(
base_url="http://localhost:11434/v1",
api_key="ollama", # Required but not validated
)
response = local_client.chat.completions.create(
model="llama3.1:8b",
messages=[
{"role": "user", "content": "Hello!"}
],
)
print(response.choices[0].message.content)
This is useful for development and testing where you do not want to burn API credits for every debug iteration.
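A small factory keeps the local/cloud switch in one place instead of scattered base URLs. In this sketch, the USE_LOCAL_LLM variable and the model names are assumptions for illustration; pass the resulting config into your client of choice:

```python
import os

def llm_client_config(use_local=None) -> dict:
    """Choose between the local Ollama endpoint and a hosted API.

    `use_local` defaults to the (hypothetical) USE_LOCAL_LLM env var so the
    same code path serves debugging and production.
    """
    if use_local is None:
        use_local = os.getenv("USE_LOCAL_LLM", "0") == "1"
    if use_local:
        return {
            "base_url": "http://localhost:11434/v1",
            "api_key": "ollama",   # required by the client, ignored by Ollama
            "model": "llama3.1:8b",
        }
    return {
        "base_url": None,          # use the provider's default endpoint
        "api_key": os.getenv("OPENAI_API_KEY", ""),
        "model": "gpt-4o",
    }

# Usage sketch:
# cfg = llm_client_config()
# client = OpenAI(base_url=cfg["base_url"], api_key=cfg["api_key"])
```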
Project Structure
A well-organized project structure makes navigation intuitive and testing straightforward:
my-agent-project/
├── app/
│ ├── __init__.py
│ ├── main.py # FastAPI entry point
│ ├── agents/
│ │ ├── __init__.py
│ │ ├── base.py # Agent base class
│ │ ├── triage.py # Triage agent
│ │ └── specialists/
│ │ ├── support.py
│ │ └── billing.py
│ ├── tools/
│ │ ├── __init__.py
│ │ ├── database.py # Database query tools
│ │ ├── email.py # Email sending tools
│ │ └── search.py # Knowledge base search
│ ├── models/
│ │ ├── __init__.py
│ │ └── schemas.py # Pydantic models
│ └── core/
│ ├── config.py # Settings management
│ ├── database.py # DB connection
│ └── llm.py # LLM client factory
├── tests/
│ ├── conftest.py
│ ├── test_tools/
│ ├── test_agents/
│ └── fixtures/
│ └── conversations.json
├── db/
│ ├── init.sql
│ └── migrations/
├── docker-compose.yml
├── Dockerfile
├── pyproject.toml
├── .env.example
├── .gitignore
└── README.md
Debugging Tips for Agent Development
Log Every LLM Interaction
The single most useful debugging technique for agentic AI is logging every LLM request and response. Use a middleware or wrapper:
import structlog
logger = structlog.get_logger()
def log_llm_call(messages, response, duration_ms):
logger.info(
"llm_call",
model=response.model,
input_tokens=response.usage.input_tokens,
output_tokens=response.usage.output_tokens,
stop_reason=response.stop_reason,
duration_ms=duration_ms,
tool_calls=[
b.name for b in response.content
if hasattr(b, "name")
],
)
Use Breakpoints in the Agent Loop
Set breakpoints at key points in the agent loop: after the LLM response, before tool execution, and after tool results are formatted. This lets you inspect the agent's reasoning at each step.
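To make those three breakpoint locations concrete, here is a stripped-down agent loop sketch. call_model and execute_tool are hypothetical stand-ins for your LLM client and tool dispatcher; the response shape is illustrative, not any SDK's API:

```python
def run_agent(call_model, execute_tool, messages, max_iterations=10):
    """Minimal agent loop; comments mark useful breakpoint locations."""
    for _ in range(max_iterations):
        response = call_model(messages)
        # Breakpoint 1: inspect the raw model response here.
        if not response.get("tool_calls"):
            return response["content"]
        for call in response["tool_calls"]:
            # Breakpoint 2: inspect tool name and arguments before execution.
            result = execute_tool(call["name"], call["args"])
            # Breakpoint 3: inspect the formatted tool result before it
            # re-enters the conversation.
            messages.append({"role": "tool", "name": call["name"], "content": result})
    raise RuntimeError("Agent exceeded max iterations")
```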
Replay Conversations
Save conversation histories as JSON fixtures. When you encounter a bug, save the conversation state and replay it deterministically in tests. This is far more effective than trying to reproduce non-deterministic agent behavior manually.
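A minimal replay helper might look like this. The fixture format (a list of turns carrying the recorded model output and the expected tool result) is an assumption for illustration, not a standard:

```python
import json
from pathlib import Path

def replay_conversation(fixture_path, agent_step):
    """Replay a recorded conversation deterministically.

    `agent_step` is your agent's tool-handling function; each recorded
    model output is fed back through it and checked against the result
    captured when the bug occurred.
    """
    turns = json.loads(Path(fixture_path).read_text())
    for turn in turns:
        actual = agent_step(turn["model_output"])
        assert actual == turn["expected_tool_result"], (
            f"Divergence at turn {turn['id']}: "
            f"{actual!r} != {turn['expected_tool_result']!r}"
        )
```

Drop fixtures into tests/fixtures/ (as in the project structure above) and call this from pytest so regressions stay caught.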
Frequently Asked Questions
Do I need a GPU for agentic AI development?
No. Most agentic AI development uses cloud-hosted models (Claude, GPT-4o) via API calls, which require no local GPU. A GPU is only needed if you want to run local models (Llama, Mistral) for development and testing without API costs, or if you generate embeddings locally for RAG. A modern laptop with 16GB RAM is sufficient for most agentic AI development work. Consider using a cloud GPU instance (Lambda, RunPod, or a cloud provider) for occasional local model testing rather than investing in a dedicated GPU machine.
What Python version should I use?
Use Python 3.11 or 3.12. Both the Anthropic and OpenAI SDKs require Python 3.9+, but 3.11 and 3.12 offer significant performance improvements and better error messages. Avoid Python 3.13 if you rely on libraries that have not yet updated their C extensions. Use pyenv to manage multiple Python versions and create virtual environments per project.
Should I use virtual environments or Docker for Python dependencies?
Use both. Virtual environments (venv or uv) for local development give you fast iteration with IDE integration. Docker for running services (databases, vector stores, local models) that your agent depends on. Your agent code runs locally in the virtual environment and connects to Dockerized services. For deployment, package everything in Docker. This approach gives you the best developer experience while maintaining production parity.
How do I manage multiple LLM API keys across projects?
Use a .env file per project with python-dotenv for loading. For shared keys across projects, use direnv with a ~/.envrc file that exports common variables, or use a secrets manager like 1Password CLI. Never set API keys as global environment variables in your shell profile — this makes them available to every process on your machine, which is a security risk.
How do I debug streaming agent responses?
Streaming complicates debugging because you cannot inspect the full response at a breakpoint. Two strategies: (1) Add a debug mode flag that disables streaming and uses the synchronous API instead, making the full response available for inspection. (2) Accumulate streamed chunks into a buffer and log the complete response after streaming finishes. Use the VS Code debug console to inspect the accumulated buffer at breakpoints after the stream completes.
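Strategy (2) can be as simple as a small accumulator around the chunk iterator. The chunk shape here (plain text deltas) is an assumption, since exact streaming event types vary by provider:

```python
import logging

logger = logging.getLogger("agent.stream")

def consume_stream(chunks):
    """Accumulate streamed text chunks into a buffer for post-hoc inspection."""
    buffer = []
    for chunk in chunks:
        buffer.append(chunk)
        print(chunk, end="", flush=True)  # live output, as in normal streaming
    full_text = "".join(buffer)
    # Set a breakpoint on the next line: the complete response is available.
    logger.debug("stream complete: %d chars", len(full_text))
    return full_text
```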