AI Agent for Code Teaching: Interactive Programming Tutorials with Live Feedback
Build an AI code teaching agent with sandboxed execution, intelligent error analysis, graduated hint generation, and progress tracking across programming concepts.
The Challenge of Teaching Code with AI
Teaching programming is fundamentally different from teaching other subjects because code either works or it does not. A student can have the right idea but a single misplaced colon produces an error that feels indistinguishable from complete confusion. An effective code teaching agent needs to do three things that generic chatbots cannot: execute code safely, analyze errors with educational context, and provide graduated hints that guide without revealing the answer.
Sandboxed Code Execution
The first component is a safe execution environment. Never run student code directly in your main process. Use subprocess isolation with strict resource limits:
```python
import os
import resource  # POSIX-only; used to cap the child's memory
import subprocess
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExecutionResult:
    stdout: str
    stderr: str
    return_code: int
    timed_out: bool
    error_type: Optional[str] = None
    error_line: Optional[int] = None


def execute_student_code(
    code: str,
    timeout_seconds: int = 10,
    max_memory_mb: int = 128,
) -> ExecutionResult:
    """Execute student code in an isolated subprocess."""

    def limit_memory():
        # Runs in the child before exec: cap its address space so a
        # runaway allocation fails inside the sandbox, not the host.
        limit = max_memory_mb * 1024 * 1024
        resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".py", delete=False
    ) as f:
        f.write(code)
        temp_path = f.name
    try:
        result = subprocess.run(
            ["python3", temp_path],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            preexec_fn=limit_memory,
            env={
                "PATH": os.environ.get("PATH", ""),
                "HOME": "/tmp",
            },
        )
        error_type, error_line = parse_error(result.stderr)
        return ExecutionResult(
            stdout=result.stdout,
            stderr=result.stderr,
            return_code=result.returncode,
            timed_out=False,
            error_type=error_type,
            error_line=error_line,
        )
    except subprocess.TimeoutExpired:
        return ExecutionResult(
            stdout="",
            stderr="Code execution timed out. Check for infinite loops.",
            return_code=-1,
            timed_out=True,
            error_type="TimeoutError",
        )
    finally:
        os.unlink(temp_path)


def parse_error(stderr: str) -> tuple[Optional[str], Optional[int]]:
    """Extract error type and line number from a Python traceback."""
    if not stderr:
        return None, None
    lines = stderr.strip().split("\n")
    error_type = None
    error_line = None
    for line in lines:
        # Keep the last "File ..., line N" entry: the innermost frame
        # is where the error actually occurred.
        if "line " in line and "File" in line:
            try:
                parts = line.split("line ")
                error_line = int(parts[-1].split(",")[0].split()[0])
            except (ValueError, IndexError):
                pass
    if lines and "Error" in lines[-1]:
        error_type = lines[-1].split(":")[0].strip()
    return error_type, error_line
```
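To see the parser in action without spawning a subprocess, you can generate a real traceback in-process and feed it through the same logic. This is a self-contained sketch, so it repeats the parsing code from above verbatim:

```python
import traceback


def parse_error(stderr: str):
    # Same parsing logic as above, repeated so this sketch runs standalone.
    if not stderr:
        return None, None
    lines = stderr.strip().split("\n")
    error_type = None
    error_line = None
    for line in lines:
        if "line " in line and "File" in line:
            try:
                error_line = int(line.split("line ")[-1].split(",")[0].split()[0])
            except (ValueError, IndexError):
                pass
    if lines and "Error" in lines[-1]:
        error_type = lines[-1].split(":")[0].strip()
    return error_type, error_line


# Produce a genuine traceback: the exec'd snippet references an
# undefined name, just as a student's submission might.
try:
    exec("total = subtotal + tax")
except Exception:
    tb = traceback.format_exc()

etype, eline = parse_error(tb)
print(etype, eline)  # NameError 1
```

The innermost frame in the traceback is the `<string>` frame at line 1, which is what the loop's last-match-wins behavior picks up.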
Error Analysis Agent
Raw error messages confuse beginners. The error analysis agent translates Python tracebacks into educational explanations that identify what went wrong and why:
```python
from agents import Agent, Runner

error_analyzer = Agent(
    name="Error Analyzer",
    instructions="""You analyze Python error messages for students
learning to program. For each error:

1. State what the error means in plain language
2. Identify the exact line and character causing it
3. Explain WHY this error occurs conceptually
4. Describe the common mistake pattern that causes it
5. Do NOT provide the fix — just explain the problem

Common beginner error patterns to recognize:
- IndentationError: mixing tabs/spaces, forgetting to indent after colon
- NameError: typos in variable names, using before assignment
- TypeError: operating on incompatible types, wrong argument count
- IndexError: off-by-one errors, empty list access
- SyntaxError: missing colons, unmatched parentheses

Adjust your explanation complexity based on the student's level.""",
)
```
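Before the traceback ever reaches the model, a cheap deterministic pre-filter can tag it with the course concepts it most likely touches, giving the agent (and the progress tracker) a head start. The mapping below is illustrative, not part of any library; the names `ERROR_CONCEPT_MAP` and `concepts_for_error` are assumptions for this sketch:

```python
# Hypothetical mapping from Python exception types to the course
# concepts they most often indicate.
ERROR_CONCEPT_MAP = {
    "IndentationError": ["conditionals", "for_loops", "functions"],
    "NameError": ["variables"],
    "TypeError": ["data_types", "parameters"],
    "IndexError": ["lists"],
    "KeyError": ["dictionaries"],
    "SyntaxError": ["conditionals", "functions"],
}


def concepts_for_error(error_type):
    """Return the concepts an error type usually points at."""
    return ERROR_CONCEPT_MAP.get(error_type, [])


print(concepts_for_error("IndexError"))  # ['lists']
print(concepts_for_error("ZeroDivisionError"))  # []
```

A lookup like this also lets you skip the LLM call entirely for the most common cases and serve a canned explanation instead.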
Graduated Hint System
The hint system provides three levels of assistance. The student gets the least revealing hint first and can ask for more if needed:
```python
import json

from agents import function_tool


@dataclass
class ChallengeState:
    challenge_id: str
    description: str
    solution_code: str
    hint_level: int = 0  # 0 = no hints, 1-3 = increasing help
    attempts: int = 0
    student_code: str = ""


HINT_STRATEGIES = {
    1: "Give a conceptual direction hint. Name the concept or "
       "approach needed without referencing specific code. Example: "
       "'Think about how you would repeat something N times.'",
    2: "Give a structural hint. Describe the code structure needed "
       "without writing actual code. Example: 'You need a loop that "
       "iterates over each character in the string, with an if-check "
       "inside.'",
    3: "Give a near-solution hint. Show pseudocode or a partial "
       "implementation with key parts replaced by comments. Do NOT "
       "show the complete solution.",
}


@function_tool
def get_hint(
    challenge_id: str,
    student_code: str,
    hint_level: int,
    challenge_description: str,
) -> str:
    """Generate a graduated hint for the student."""
    # Clamp to the 1-3 range so a first request (level 0 or 1) never
    # falls through to the near-solution strategy.
    level = max(1, min(hint_level, 3))
    strategy = HINT_STRATEGIES[level]
    return json.dumps({
        "hint_level": level,
        "strategy": strategy,
        "max_hints": 3,
        "remaining": max(0, 3 - level),
    })
```
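The escalation bookkeeping itself is simple: advance one level per request and cap at three. A condensed stand-in for the three strategies above, so the sketch runs on its own:

```python
# Shorthand names for the three HINT_STRATEGIES levels; illustrative only.
STRATEGY_NAMES = {1: "conceptual", 2: "structural", 3: "near-solution"}


def next_hint_level(current_level: int) -> int:
    """Advance one level per request, capped at level 3."""
    return min(current_level + 1, 3)


level = 0
seen = []
for _ in range(5):  # the student asks for a hint five times
    level = next_hint_level(level)
    seen.append(STRATEGY_NAMES[level])

print(seen)
# ['conceptual', 'structural', 'near-solution', 'near-solution', 'near-solution']
```

Capping rather than refusing repeat requests matters pedagogically: a stuck student can re-read the near-solution hint without the agent ever crossing into handing over the answer.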
Progress Tracking Across Concepts
Track which programming concepts the student has demonstrated understanding of:
```python
from dataclasses import dataclass, field


@dataclass
class ConceptMastery:
    concept: str
    times_demonstrated: int = 0
    times_struggled: int = 0
    challenges_completed: list[str] = field(default_factory=list)

    @property
    def mastery_score(self) -> float:
        total = self.times_demonstrated + self.times_struggled
        if total == 0:
            return 0.0
        return self.times_demonstrated / total


PROGRAMMING_CONCEPTS = [
    "variables", "data_types", "conditionals", "for_loops",
    "while_loops", "functions", "parameters", "return_values",
    "lists", "dictionaries", "string_methods", "list_comprehensions",
    "file_io", "error_handling", "classes", "inheritance",
]


@dataclass
class StudentProgress:
    student_id: str
    concepts: dict[str, ConceptMastery] = field(default_factory=dict)
    completed_challenges: list[str] = field(default_factory=list)

    def suggest_next_topic(self) -> str:
        """Suggest the next concept to learn based on prerequisites."""
        for concept in PROGRAMMING_CONCEPTS:
            mastery = self.concepts.get(concept)
            if mastery is None or mastery.mastery_score < 0.6:
                return concept
        return "advanced_topics"

    def record_attempt(self, concept: str, succeeded: bool):
        if concept not in self.concepts:
            self.concepts[concept] = ConceptMastery(concept=concept)
        entry = self.concepts[concept]
        if succeeded:
            entry.times_demonstrated += 1
        else:
            entry.times_struggled += 1
```
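A quick walkthrough of how the 0.6 threshold plays out. This sketch re-declares a condensed version of the classes above (and a shortened concept list) so it runs on its own:

```python
from dataclasses import dataclass


@dataclass
class ConceptMastery:
    concept: str
    times_demonstrated: int = 0
    times_struggled: int = 0

    @property
    def mastery_score(self) -> float:
        total = self.times_demonstrated + self.times_struggled
        return self.times_demonstrated / total if total else 0.0


ORDER = ["variables", "data_types", "conditionals"]  # shortened for the sketch


def suggest_next(concepts: dict) -> str:
    # The first concept below the 0.6 mastery threshold comes next.
    for name in ORDER:
        m = concepts.get(name)
        if m is None or m.mastery_score < 0.6:
            return name
    return "advanced_topics"


progress = {}
# Three successes and one struggle on "variables": score 0.75, above 0.6,
# so the tutor moves on to the first concept with no recorded attempts.
progress["variables"] = ConceptMastery("variables", times_demonstrated=3,
                                       times_struggled=1)
print(suggest_next(progress))  # data_types
```

Because the list is ordered by prerequisite, a single linear scan doubles as a curriculum: a student never gets pushed to `dictionaries` while still below threshold on `lists`.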
The Code Teaching Agent
Combine all components into the teaching agent that presents challenges, runs code, analyzes errors, and tracks progress:
```python
async def run_code_challenge(
    student_id: str,
    challenge: ChallengeState,
    student_code: str,
):
    # Execute the code in the sandbox
    result = execute_student_code(student_code)
    challenge.attempts += 1

    # Build context for the teaching agent
    context = {
        "challenge": challenge.description,
        "student_code": student_code,
        "execution_result": {
            "stdout": result.stdout,
            "stderr": result.stderr,
            "success": result.return_code == 0,
            "error_type": result.error_type,
        },
        "attempt_number": challenge.attempts,
        "hints_used": challenge.hint_level,
    }

    outcome = (
        "success" if result.return_code == 0
        else f"error: {result.error_type}"
    )

    teacher = Agent(
        name="Code Teacher",
        instructions=f"""You are teaching Python programming.

The student is working on this challenge:
{challenge.description}

Their code produced: {outcome}

If the code succeeded, congratulate them, explain what they did well,
and suggest how it could be improved.

If the code failed, analyze the error educationally. Do NOT give the
answer. Guide them to discover the fix themselves. Ask a targeted
question about the specific line that failed.""",
        tools=[get_hint],
    )

    response = await Runner.run(teacher, json.dumps(context))
    return response.final_output
```
FAQ
Is subprocess isolation secure enough for production use?
For a learning environment, subprocess isolation with timeouts and environment restrictions is a reasonable starting point. For production deployment serving untrusted users, use container-level isolation with Docker or a dedicated sandbox service like Pyodide running in WebAssembly. The key principle is the same — never execute student code in your main application process.
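As a concrete sketch of the container-level option, here is one way to assemble a locked-down `docker run` invocation for a single submission. The flags are standard Docker options, but the exact set (image tag, mount path, limits) is an assumption you should tune for your deployment:

```python
def docker_sandbox_argv(script_path: str, memory_mb: int = 128,
                        timeout_s: int = 10) -> list[str]:
    """Build a locked-down `docker run` command for one student script."""
    return [
        "docker", "run", "--rm",
        "--network", "none",               # no network access at all
        f"--memory={memory_mb}m",          # hard memory cap
        "--pids-limit", "64",              # bound fork bombs
        "--read-only",                     # immutable container filesystem
        "-v", f"{script_path}:/code/main.py:ro",  # mount submission read-only
        "python:3.12-slim",                # illustrative image choice
        "timeout", str(timeout_s), "python3", "/code/main.py",
    ]


argv = docker_sandbox_argv("/tmp/student.py")
print(" ".join(argv))
```

You would pass this list to `subprocess.run` in place of the bare `["python3", temp_path]` command, keeping the rest of `execute_student_code` unchanged.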
How does the agent decide which concept a student is struggling with from their code?
The agent analyzes both the error type and the code structure. A NameError on a loop variable suggests confusion about scope, while a TypeError in a function call suggests confusion about argument types. The error analyzer maps Python exception types to programming concepts, and the progress tracker records which concepts are associated with each challenge.
Can this approach scale to languages beyond Python?
Yes. The execution sandbox and the error parser are the main language-specific components. Replace `python3` with `node`, `javac && java`, or `rustc && ./a.out` for other languages, and give the error analyzer language-specific error patterns. The teaching agent, hint system, and progress tracker are language-agnostic. You would update the PROGRAMMING_CONCEPTS list to reflect language-specific topics.
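One lightweight way to structure that swap is a per-language table of run steps. Everything here is illustrative (the table name, the assumed `Main` class for Java, the `/tmp` classpath), not a fixed API:

```python
# Illustrative per-language run commands. Compiled languages need a
# compile step before the run step, so each entry yields an ordered
# list of argv lists.
LANGUAGE_COMMANDS = {
    "python": lambda path: [["python3", path]],
    "javascript": lambda path: [["node", path]],
    "java": lambda path: [["javac", path],
                          ["java", "-cp", "/tmp", "Main"]],  # assumes class Main
}


def build_steps(language: str, path: str) -> list[list[str]]:
    """Return the ordered subprocess argv lists for one submission."""
    return LANGUAGE_COMMANDS[language](path)


print(build_steps("python", "/tmp/sol.py"))  # [['python3', '/tmp/sol.py']]
print(len(build_steps("java", "/tmp/Main.java")))  # 2 steps: compile, then run
```

The sandbox runner then loops over the steps, stopping (and handing the stderr to the error analyzer) at the first one that fails.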
#CodeEducation #ProgrammingTutor #SandboxExecution #Python #AgenticAI #LearnAI #AIEngineering