Agentic AI Development Environment: VS Code, Docker, and GPU Setup Guide
Step-by-step guide to setting up your agentic AI dev environment — VS Code extensions, Docker Compose for LLM services, GPU passthrough, and debugging config.
Why Your Dev Environment Matters for Agentic AI
Agentic AI development has unique requirements that a standard web development setup does not cover. You need to manage API keys for multiple LLM providers, run local model servers for testing, handle streaming responses, debug non-deterministic agent behavior, and sometimes leverage GPU hardware for local inference or embedding generation.
A well-configured development environment reduces the friction between writing code and testing agent behavior. This guide walks you through setting up a complete agentic AI development environment using VS Code, Docker, and optional GPU support.
VS Code Configuration
Essential Extensions
Install these extensions for an optimal agentic AI development experience:
Python Development:
- Python (ms-python.python) — Core Python support
- Pylance (ms-python.vscode-pylance) — Fast, feature-rich Python language server
- Ruff (charliermarsh.ruff) — Extremely fast Python linter and formatter (replaces Black, isort, and Flake8)
AI and Agent Development:
- Continue (continue.continue) — AI code assistant that works with Claude and other models
- REST Client (humao.rest-client) — Test API endpoints directly from VS Code
- Thunder Client (rangav.vscode-thunder-client) — GUI-based API testing
Infrastructure:
- Docker (ms-azuretools.vscode-docker) — Docker file support and container management
- YAML (redhat.vscode-yaml) — YAML validation for Docker Compose and Kubernetes configs
- Remote - SSH (ms-vscode-remote.remote-ssh) — Develop on remote GPU machines seamlessly
VS Code Settings
Configure VS Code for Python agentic AI development:
{
"python.defaultInterpreterPath": "./venv/bin/python",
"python.analysis.typeCheckingMode": "basic",
"editor.formatOnSave": true,
"[python]": {
"editor.defaultFormatter": "charliermarsh.ruff",
"editor.codeActionsOnSave": {
"source.fixAll": "explicit",
"source.organizeImports": "explicit"
}
},
"files.associations": {
"*.env": "dotenv",
"*.env.*": "dotenv"
},
"editor.rulers": [88],
"files.exclude": {
"**/__pycache__": true,
"**/.pytest_cache": true,
"**/node_modules": true
}
}
Launch Configuration for Debugging
Debugging agentic AI code requires special configuration because agent loops are often async and involve external API calls. Create a .vscode/launch.json:
{
"version": "0.2.0",
"configurations": [
{
"name": "Debug Agent Server",
"type": "debugpy",
"request": "launch",
"module": "uvicorn",
"args": [
"app.main:app",
"--reload",
"--port", "8000"
],
"env": {
"ANTHROPIC_API_KEY": "${env:ANTHROPIC_API_KEY}",
"OPENAI_API_KEY": "${env:OPENAI_API_KEY}",
"LOG_LEVEL": "DEBUG"
},
"console": "integratedTerminal",
"justMyCode": false
},
{
"name": "Debug Agent Script",
"type": "debugpy",
"request": "launch",
"program": "${file}",
"env": {
"ANTHROPIC_API_KEY": "${env:ANTHROPIC_API_KEY}",
"OPENAI_API_KEY": "${env:OPENAI_API_KEY}"
},
"console": "integratedTerminal"
},
{
"name": "Debug Tests",
"type": "debugpy",
"request": "launch",
"module": "pytest",
"args": [
"${file}",
"-v",
"--tb=short"
],
"console": "integratedTerminal"
}
]
}
The justMyCode: false setting is important — it lets you step into framework code (Anthropic SDK, OpenAI SDK) when debugging agent behavior.
Environment Variable Management
The .env File Structure
Agentic AI projects typically need many environment variables. Organize them clearly:
# .env
# ── LLM Providers ──
ANTHROPIC_API_KEY=sk-ant-your-key-here
OPENAI_API_KEY=sk-proj-your-key-here
GOOGLE_API_KEY=your-google-key-here
# ── Database ──
DATABASE_URL=postgresql://user:pass@localhost:5432/agents
REDIS_URL=redis://localhost:6379/0
# ── Vector Database ──
QDRANT_URL=http://localhost:6333
PINECONE_API_KEY=your-pinecone-key
# ── Observability ──
LANGSMITH_API_KEY=your-langsmith-key
LANGSMITH_PROJECT=my-agent-project
# ── Application ──
APP_ENV=development
LOG_LEVEL=DEBUG
AGENT_MAX_ITERATIONS=10
AGENT_TIMEOUT_SECONDS=30
Security Best Practices
Never commit API keys to version control. Create a .env.example file with placeholder values and add .env to .gitignore:
# .gitignore
.env
.env.local
.env.*.local
For team development, use a shared secrets manager. Options include:
- 1Password CLI — op run -- python main.py injects secrets at runtime
- Doppler — Syncs secrets across environments and team members
- AWS Secrets Manager — Good for teams already on AWS
- HashiCorp Vault — Self-hosted, enterprise-grade
Load environment variables in your Python code with python-dotenv:
from dotenv import load_dotenv
import os
load_dotenv() # loads .env file
anthropic_key = os.getenv("ANTHROPIC_API_KEY")
if not anthropic_key:
raise ValueError("ANTHROPIC_API_KEY not set")
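Beyond checking a single key, it is worth validating every required variable at startup so the agent fails fast with a clear message instead of dying mid-conversation. A minimal sketch (the variable names follow the .env example above; which variables you treat as required, and the defaults, are assumptions to adapt):

```python
import os

REQUIRED_VARS = ["ANTHROPIC_API_KEY", "DATABASE_URL"]

def load_settings() -> dict:
    """Validate required env vars and parse typed settings, failing fast."""
    missing = [v for v in REQUIRED_VARS if not os.getenv(v)]
    if missing:
        raise ValueError(f"Missing required environment variables: {', '.join(missing)}")
    return {
        "anthropic_api_key": os.environ["ANTHROPIC_API_KEY"],
        "database_url": os.environ["DATABASE_URL"],
        # Numeric settings arrive as strings; convert them once, here.
        "agent_max_iterations": int(os.getenv("AGENT_MAX_ITERATIONS", "10")),
        "agent_timeout_seconds": float(os.getenv("AGENT_TIMEOUT_SECONDS", "30")),
    }
```

Call load_settings() once at import time in app/core/config.py so a missing key surfaces before the first request.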
Docker Compose for Local Services
A Docker Compose file lets you spin up all the services your agent needs with one command. Here is a complete local development setup:
# docker-compose.yml
services:
# ── PostgreSQL with pgvector ──
postgres:
image: pgvector/pgvector:pg16
ports:
- "5432:5432"
environment:
POSTGRES_USER: agents
POSTGRES_PASSWORD: localdev
POSTGRES_DB: agents_dev
volumes:
- pgdata:/var/lib/postgresql/data
- ./db/init.sql:/docker-entrypoint-initdb.d/init.sql
healthcheck:
test: ["CMD-SHELL", "pg_isready -U agents"]
interval: 5s
timeout: 3s
retries: 5
# ── Redis for caching and sessions ──
redis:
image: redis:7-alpine
ports:
- "6379:6379"
volumes:
- redisdata:/data
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 5s
timeout: 3s
retries: 5
# ── Qdrant vector database ──
qdrant:
image: qdrant/qdrant:v1.12.0
ports:
- "6333:6333"
- "6334:6334"
volumes:
- qdrant_data:/qdrant/storage
environment:
QDRANT__SERVICE__GRPC_PORT: 6334
# ── Local LLM server (Ollama) ──
ollama:
image: ollama/ollama:latest
ports:
- "11434:11434"
volumes:
- ollama_data:/root/.ollama
# Uncomment for GPU support:
# deploy:
# resources:
# reservations:
# devices:
# - driver: nvidia
# count: 1
# capabilities: [gpu]
# ── Observability: Jaeger for tracing ──
jaeger:
image: jaegertracing/all-in-one:1.55
ports:
- "16686:16686" # UI
- "4317:4317" # OTLP gRPC
- "4318:4318" # OTLP HTTP
volumes:
pgdata:
redisdata:
qdrant_data:
ollama_data:
Start all services:
docker compose up -d
Verify everything is running:
docker compose ps
docker compose logs postgres --tail 20
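If you script your setup, it helps to block until services actually accept connections before starting the agent. A standard-library sketch (the host/port pairs match the compose file above; the polling interval and timeout are arbitrary choices):

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll a TCP port until it accepts connections or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # service not up yet; retry shortly
    return False

# Example: wait_for_port("localhost", 5432) before running migrations,
# then wait_for_port("localhost", 6333) before creating Qdrant collections.
```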
GPU Setup for Local Inference
If you run local models (via Ollama, vLLM, or text-generation-inference), GPU acceleration dramatically improves inference speed.
NVIDIA GPU Setup on Ubuntu
# Install NVIDIA drivers
sudo apt update
sudo apt install -y nvidia-driver-550
# Verify driver installation
nvidia-smi
# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
# Configure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify GPU access in Docker
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
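From Python, you can check whether the driver is visible before choosing local inference over a cloud API. A small sketch that shells out to nvidia-smi and degrades gracefully on machines without it (the return shape is an illustrative choice, not a standard):

```python
import shutil
import subprocess

def gpu_info() -> dict:
    """Report GPU availability by querying nvidia-smi, if installed."""
    if shutil.which("nvidia-smi") is None:
        return {"available": False, "gpus": []}
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10, check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # Driver present but not responding; treat as unavailable.
        return {"available": False, "gpus": []}
    gpus = [line.strip() for line in out.stdout.splitlines() if line.strip()]
    return {"available": bool(gpus), "gpus": gpus}
```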
Running Local Models with Ollama
Once Docker GPU support is configured, run Ollama with GPU acceleration:
# Pull a model
docker compose exec ollama ollama pull llama3.1:8b
# Test inference
docker compose exec ollama ollama run llama3.1:8b "Explain agentic AI in one paragraph"
Use the local model in your agent code by pointing to the Ollama API:
from openai import OpenAI
# Ollama exposes an OpenAI-compatible API
local_client = OpenAI(
base_url="http://localhost:11434/v1",
api_key="ollama", # Required but not validated
)
response = local_client.chat.completions.create(
model="llama3.1:8b",
messages=[
{"role": "user", "content": "Hello!"}
],
)
print(response.choices[0].message.content)
This is useful for development and testing where you do not want to burn API credits for every debug iteration.
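A small factory keeps the local/cloud switch in one place instead of scattered base URLs. In this sketch, the USE_LOCAL_LLM variable and the model names are assumptions for illustration; pass the resulting config into your client of choice:

```python
import os

def llm_client_config(use_local=None) -> dict:
    """Choose between the local Ollama endpoint and a hosted API.

    `use_local` defaults to the (hypothetical) USE_LOCAL_LLM env var so the
    same code path serves debugging and production.
    """
    if use_local is None:
        use_local = os.getenv("USE_LOCAL_LLM", "0") == "1"
    if use_local:
        return {
            "base_url": "http://localhost:11434/v1",
            "api_key": "ollama",   # required by the client, ignored by Ollama
            "model": "llama3.1:8b",
        }
    return {
        "base_url": None,          # use the provider's default endpoint
        "api_key": os.getenv("OPENAI_API_KEY", ""),
        "model": "gpt-4o",
    }

# Usage sketch:
# cfg = llm_client_config()
# client = OpenAI(base_url=cfg["base_url"], api_key=cfg["api_key"])
```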
Project Structure
A well-organized project structure makes navigation intuitive and testing straightforward:
my-agent-project/
├── app/
│ ├── __init__.py
│ ├── main.py # FastAPI entry point
│ ├── agents/
│ │ ├── __init__.py
│ │ ├── base.py # Agent base class
│ │ ├── triage.py # Triage agent
│ │ └── specialists/
│ │ ├── support.py
│ │ └── billing.py
│ ├── tools/
│ │ ├── __init__.py
│ │ ├── database.py # Database query tools
│ │ ├── email.py # Email sending tools
│ │ └── search.py # Knowledge base search
│ ├── models/
│ │ ├── __init__.py
│ │ └── schemas.py # Pydantic models
│ └── core/
│ ├── config.py # Settings management
│ ├── database.py # DB connection
│ └── llm.py # LLM client factory
├── tests/
│ ├── conftest.py
│ ├── test_tools/
│ ├── test_agents/
│ └── fixtures/
│ └── conversations.json
├── db/
│ ├── init.sql
│ └── migrations/
├── docker-compose.yml
├── Dockerfile
├── pyproject.toml
├── .env.example
├── .gitignore
└── README.md
Debugging Tips for Agent Development
Log Every LLM Interaction
The single most useful debugging technique for agentic AI is logging every LLM request and response. Use a middleware or wrapper:
import structlog
logger = structlog.get_logger()
def log_llm_call(messages, response, duration_ms):
logger.info(
"llm_call",
model=response.model,
input_tokens=response.usage.input_tokens,
output_tokens=response.usage.output_tokens,
stop_reason=response.stop_reason,
duration_ms=duration_ms,
tool_calls=[
b.name for b in response.content
if hasattr(b, "name")
],
)
Use Breakpoints in the Agent Loop
Set breakpoints at key points in the agent loop: after the LLM response, before tool execution, and after tool results are formatted. This lets you inspect the agent's reasoning at each step.
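To make those three breakpoint locations concrete, here is a stripped-down agent loop sketch. call_model and execute_tool are hypothetical stand-ins for your LLM client and tool dispatcher; the response shape is illustrative, not any SDK's API:

```python
def run_agent(call_model, execute_tool, messages, max_iterations=10):
    """Minimal agent loop; comments mark useful breakpoint locations."""
    for _ in range(max_iterations):
        response = call_model(messages)
        # Breakpoint 1: inspect the raw model response here.
        if not response.get("tool_calls"):
            return response["content"]
        for call in response["tool_calls"]:
            # Breakpoint 2: inspect tool name and arguments before execution.
            result = execute_tool(call["name"], call["args"])
            # Breakpoint 3: inspect the formatted tool result before it
            # re-enters the conversation.
            messages.append({"role": "tool", "name": call["name"], "content": result})
    raise RuntimeError("Agent exceeded max iterations")
```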
Replay Conversations
Save conversation histories as JSON fixtures. When you encounter a bug, save the conversation state and replay it deterministically in tests. This is far more effective than trying to reproduce non-deterministic agent behavior manually.
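A minimal replay helper might look like this. The fixture format (a list of turns carrying the recorded model output and the expected tool result) is an assumption for illustration, not a standard:

```python
import json
from pathlib import Path

def replay_conversation(fixture_path, agent_step):
    """Replay a recorded conversation deterministically.

    `agent_step` is your agent's tool-handling function; each recorded
    model output is fed back through it and checked against the result
    captured when the bug occurred.
    """
    turns = json.loads(Path(fixture_path).read_text())
    for turn in turns:
        actual = agent_step(turn["model_output"])
        assert actual == turn["expected_tool_result"], (
            f"Divergence at turn {turn['id']}: "
            f"{actual!r} != {turn['expected_tool_result']!r}"
        )
```

Drop fixtures into tests/fixtures/ (as in the project structure above) and call this from pytest so regressions stay caught.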
Frequently Asked Questions
Do I need a GPU for agentic AI development?
No. Most agentic AI development uses cloud-hosted models (Claude, GPT-4o) via API calls, which require no local GPU. A GPU is only needed if you want to run local models (Llama, Mistral) for development and testing without API costs, or if you generate embeddings locally for RAG. A modern laptop with 16GB RAM is sufficient for most agentic AI development work. Consider using a cloud GPU instance (Lambda, RunPod, or a cloud provider) for occasional local model testing rather than investing in a dedicated GPU machine.
What Python version should I use?
Use Python 3.11 or 3.12. Both the Anthropic and OpenAI SDKs require Python 3.9+, but 3.11 and 3.12 offer significant performance improvements and better error messages. Avoid Python 3.13 if you rely on libraries that have not yet updated their C extensions. Use pyenv to manage multiple Python versions and create virtual environments per project.
Should I use virtual environments or Docker for Python dependencies?
Use both. Virtual environments (venv or uv) for local development give you fast iteration with IDE integration. Docker for running services (databases, vector stores, local models) that your agent depends on. Your agent code runs locally in the virtual environment and connects to Dockerized services. For deployment, package everything in Docker. This approach gives you the best developer experience while maintaining production parity.
How do I manage multiple LLM API keys across projects?
Use a .env file per project with python-dotenv for loading. For shared keys across projects, use direnv with a ~/.envrc file that exports common variables, or use a secrets manager like 1Password CLI. Never set API keys as global environment variables in your shell profile — this makes them available to every process on your machine, which is a security risk.
How do I debug streaming agent responses?
Streaming complicates debugging because you cannot inspect the full response at a breakpoint. Two strategies: (1) Add a debug mode flag that disables streaming and uses the synchronous API instead, making the full response available for inspection. (2) Accumulate streamed chunks into a buffer and log the complete response after streaming finishes. Use the VS Code debug console to inspect the accumulated buffer at breakpoints after the stream completes.
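Strategy (2) can be as simple as a small accumulator around the chunk iterator. The chunk shape here (plain text deltas) is an assumption, since exact streaming event types vary by provider:

```python
import logging

logger = logging.getLogger("agent.stream")

def consume_stream(chunks):
    """Accumulate streamed text chunks into a buffer for post-hoc inspection."""
    buffer = []
    for chunk in chunks:
        buffer.append(chunk)
        print(chunk, end="", flush=True)  # live output, as in normal streaming
    full_text = "".join(buffer)
    # Set a breakpoint on the next line: the complete response is available.
    logger.debug("stream complete: %d chars", len(full_text))
    return full_text
```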