AI Agent for Code Teaching: Interactive Programming Tutorials with Live Feedback
Build an AI code teaching agent with sandboxed execution, intelligent error analysis, graduated hint generation, and progress tracking across programming concepts.
The Challenge of Teaching Code with AI
Teaching programming is fundamentally different from teaching other subjects because code either works or it does not. A student can have the right idea but a single misplaced colon produces an error that feels indistinguishable from complete confusion. An effective code teaching agent needs to do three things that generic chatbots cannot: execute code safely, analyze errors with educational context, and provide graduated hints that guide without revealing the answer.
Sandboxed Code Execution
The first component is a safe execution environment. Never run student code directly in your main process. Use subprocess isolation with strict resource limits:
```python
import os
import resource  # POSIX-only; used to cap the child's memory
import subprocess
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExecutionResult:
    stdout: str
    stderr: str
    return_code: int
    timed_out: bool
    error_type: Optional[str] = None
    error_line: Optional[int] = None


def execute_student_code(
    code: str,
    timeout_seconds: int = 10,
    max_memory_mb: int = 128,
) -> ExecutionResult:
    """Execute student code in an isolated subprocess."""

    def limit_memory():
        # Runs in the child before exec: cap its address space so a
        # runaway allocation fails inside the sandbox, not the host.
        limit = max_memory_mb * 1024 * 1024
        resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".py", delete=False
    ) as f:
        f.write(code)
        temp_path = f.name
    try:
        result = subprocess.run(
            ["python3", temp_path],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            preexec_fn=limit_memory,
            env={
                "PATH": os.environ.get("PATH", ""),
                "HOME": "/tmp",
            },
        )
        error_type, error_line = parse_error(result.stderr)
        return ExecutionResult(
            stdout=result.stdout,
            stderr=result.stderr,
            return_code=result.returncode,
            timed_out=False,
            error_type=error_type,
            error_line=error_line,
        )
    except subprocess.TimeoutExpired:
        return ExecutionResult(
            stdout="",
            stderr="Code execution timed out. Check for infinite loops.",
            return_code=-1,
            timed_out=True,
            error_type="TimeoutError",
        )
    finally:
        os.unlink(temp_path)


def parse_error(stderr: str) -> tuple[Optional[str], Optional[int]]:
    """Extract error type and line number from a Python traceback."""
    if not stderr:
        return None, None
    lines = stderr.strip().split("\n")
    error_type = None
    error_line = None
    for line in lines:
        # Keep the last "File ..., line N" entry: the innermost frame
        # is where the error actually occurred.
        if "line " in line and "File" in line:
            try:
                parts = line.split("line ")
                error_line = int(parts[-1].split(",")[0].split()[0])
            except (ValueError, IndexError):
                pass
    if lines and "Error" in lines[-1]:
        error_type = lines[-1].split(":")[0].strip()
    return error_type, error_line
```
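To see the parser in action without spawning a subprocess, you can generate a real traceback in-process and feed it through the same logic. This is a self-contained sketch, so it repeats the parsing code from above verbatim:

```python
import traceback


def parse_error(stderr: str):
    # Same parsing logic as above, repeated so this sketch runs standalone.
    if not stderr:
        return None, None
    lines = stderr.strip().split("\n")
    error_type = None
    error_line = None
    for line in lines:
        if "line " in line and "File" in line:
            try:
                error_line = int(line.split("line ")[-1].split(",")[0].split()[0])
            except (ValueError, IndexError):
                pass
    if lines and "Error" in lines[-1]:
        error_type = lines[-1].split(":")[0].strip()
    return error_type, error_line


# Produce a genuine traceback: the exec'd snippet references an
# undefined name, just as a student's submission might.
try:
    exec("total = subtotal + tax")
except Exception:
    tb = traceback.format_exc()

etype, eline = parse_error(tb)
print(etype, eline)  # NameError 1
```

The innermost frame in the traceback is the `<string>` frame at line 1, which is what the loop's last-match-wins behavior picks up.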
Error Analysis Agent
Raw error messages confuse beginners. The error analysis agent translates Python tracebacks into educational explanations that identify what went wrong and why:
```python
from agents import Agent, Runner

error_analyzer = Agent(
    name="Error Analyzer",
    instructions="""You analyze Python error messages for students
learning to program. For each error:

1. State what the error means in plain language
2. Identify the exact line and character causing it
3. Explain WHY this error occurs conceptually
4. Describe the common mistake pattern that causes it
5. Do NOT provide the fix — just explain the problem

Common beginner error patterns to recognize:
- IndentationError: mixing tabs/spaces, forgetting to indent after colon
- NameError: typos in variable names, using before assignment
- TypeError: operating on incompatible types, wrong argument count
- IndexError: off-by-one errors, empty list access
- SyntaxError: missing colons, unmatched parentheses

Adjust your explanation complexity based on the student's level.""",
)
```
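Before the traceback ever reaches the model, a cheap deterministic pre-filter can tag it with the course concepts it most likely touches, giving the agent (and the progress tracker) a head start. The mapping below is illustrative, not part of any library; the names `ERROR_CONCEPT_MAP` and `concepts_for_error` are assumptions for this sketch:

```python
# Hypothetical mapping from Python exception types to the course
# concepts they most often indicate.
ERROR_CONCEPT_MAP = {
    "IndentationError": ["conditionals", "for_loops", "functions"],
    "NameError": ["variables"],
    "TypeError": ["data_types", "parameters"],
    "IndexError": ["lists"],
    "KeyError": ["dictionaries"],
    "SyntaxError": ["conditionals", "functions"],
}


def concepts_for_error(error_type):
    """Return the concepts an error type usually points at."""
    return ERROR_CONCEPT_MAP.get(error_type, [])


print(concepts_for_error("IndexError"))  # ['lists']
print(concepts_for_error("ZeroDivisionError"))  # []
```

A lookup like this also lets you skip the LLM call entirely for the most common cases and serve a canned explanation instead.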
Graduated Hint System
The hint system provides three levels of assistance. The student gets the least revealing hint first and can ask for more if needed:
```python
import json

from agents import function_tool


@dataclass
class ChallengeState:
    challenge_id: str
    description: str
    solution_code: str
    hint_level: int = 0  # 0 = no hints, 1-3 = increasing help
    attempts: int = 0
    student_code: str = ""


HINT_STRATEGIES = {
    1: "Give a conceptual direction hint. Name the concept or "
       "approach needed without referencing specific code. Example: "
       "'Think about how you would repeat something N times.'",
    2: "Give a structural hint. Describe the code structure needed "
       "without writing actual code. Example: 'You need a loop that "
       "iterates over each character in the string, with an if-check "
       "inside.'",
    3: "Give a near-solution hint. Show pseudocode or a partial "
       "implementation with key parts replaced by comments. Do NOT "
       "show the complete solution.",
}


@function_tool
def get_hint(
    challenge_id: str,
    student_code: str,
    hint_level: int,
    challenge_description: str,
) -> str:
    """Generate a graduated hint for the student."""
    # Clamp to the 1-3 range so a first request (level 0 or 1) never
    # falls through to the near-solution strategy.
    level = max(1, min(hint_level, 3))
    strategy = HINT_STRATEGIES[level]
    return json.dumps({
        "hint_level": level,
        "strategy": strategy,
        "max_hints": 3,
        "remaining": max(0, 3 - level),
    })
```
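The escalation bookkeeping itself is simple: advance one level per request and cap at three. A condensed stand-in for the three strategies above, so the sketch runs on its own:

```python
# Shorthand names for the three HINT_STRATEGIES levels; illustrative only.
STRATEGY_NAMES = {1: "conceptual", 2: "structural", 3: "near-solution"}


def next_hint_level(current_level: int) -> int:
    """Advance one level per request, capped at level 3."""
    return min(current_level + 1, 3)


level = 0
seen = []
for _ in range(5):  # the student asks for a hint five times
    level = next_hint_level(level)
    seen.append(STRATEGY_NAMES[level])

print(seen)
# ['conceptual', 'structural', 'near-solution', 'near-solution', 'near-solution']
```

Capping rather than refusing repeat requests matters pedagogically: a stuck student can re-read the near-solution hint without the agent ever crossing into handing over the answer.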
Progress Tracking Across Concepts
Track which programming concepts the student has demonstrated understanding of:
```python
from dataclasses import dataclass, field


@dataclass
class ConceptMastery:
    concept: str
    times_demonstrated: int = 0
    times_struggled: int = 0
    challenges_completed: list[str] = field(default_factory=list)

    @property
    def mastery_score(self) -> float:
        total = self.times_demonstrated + self.times_struggled
        if total == 0:
            return 0.0
        return self.times_demonstrated / total


PROGRAMMING_CONCEPTS = [
    "variables", "data_types", "conditionals", "for_loops",
    "while_loops", "functions", "parameters", "return_values",
    "lists", "dictionaries", "string_methods", "list_comprehensions",
    "file_io", "error_handling", "classes", "inheritance",
]


@dataclass
class StudentProgress:
    student_id: str
    concepts: dict[str, ConceptMastery] = field(default_factory=dict)
    completed_challenges: list[str] = field(default_factory=list)

    def suggest_next_topic(self) -> str:
        """Suggest the next concept to learn based on prerequisites."""
        for concept in PROGRAMMING_CONCEPTS:
            mastery = self.concepts.get(concept)
            if mastery is None or mastery.mastery_score < 0.6:
                return concept
        return "advanced_topics"

    def record_attempt(self, concept: str, succeeded: bool):
        if concept not in self.concepts:
            self.concepts[concept] = ConceptMastery(concept=concept)
        entry = self.concepts[concept]
        if succeeded:
            entry.times_demonstrated += 1
        else:
            entry.times_struggled += 1
```
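A quick walkthrough of how the 0.6 threshold plays out. This sketch re-declares a condensed version of the classes above (and a shortened concept list) so it runs on its own:

```python
from dataclasses import dataclass


@dataclass
class ConceptMastery:
    concept: str
    times_demonstrated: int = 0
    times_struggled: int = 0

    @property
    def mastery_score(self) -> float:
        total = self.times_demonstrated + self.times_struggled
        return self.times_demonstrated / total if total else 0.0


ORDER = ["variables", "data_types", "conditionals"]  # shortened for the sketch


def suggest_next(concepts: dict) -> str:
    # The first concept below the 0.6 mastery threshold comes next.
    for name in ORDER:
        m = concepts.get(name)
        if m is None or m.mastery_score < 0.6:
            return name
    return "advanced_topics"


progress = {}
# Three successes and one struggle on "variables": score 0.75, above 0.6,
# so the tutor moves on to the first concept with no recorded attempts.
progress["variables"] = ConceptMastery("variables", times_demonstrated=3,
                                       times_struggled=1)
print(suggest_next(progress))  # data_types
```

Because the list is ordered by prerequisite, a single linear scan doubles as a curriculum: a student never gets pushed to `dictionaries` while still below threshold on `lists`.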
The Code Teaching Agent
Combine all components into the teaching agent that presents challenges, runs code, analyzes errors, and tracks progress:
```python
async def run_code_challenge(
    student_id: str,
    challenge: ChallengeState,
    student_code: str,
):
    # Execute the code in the sandbox
    result = execute_student_code(student_code)
    challenge.attempts += 1

    # Build context for the teaching agent
    context = {
        "challenge": challenge.description,
        "student_code": student_code,
        "execution_result": {
            "stdout": result.stdout,
            "stderr": result.stderr,
            "success": result.return_code == 0,
            "error_type": result.error_type,
        },
        "attempt_number": challenge.attempts,
        "hints_used": challenge.hint_level,
    }

    outcome = (
        "success" if result.return_code == 0
        else f"error: {result.error_type}"
    )

    teacher = Agent(
        name="Code Teacher",
        instructions=f"""You are teaching Python programming.

The student is working on this challenge:
{challenge.description}

Their code produced: {outcome}

If the code succeeded, congratulate them, explain what they did well,
and suggest how it could be improved.

If the code failed, analyze the error educationally. Do NOT give the
answer. Guide them to discover the fix themselves. Ask a targeted
question about the specific line that failed.""",
        tools=[get_hint],
    )

    response = await Runner.run(teacher, json.dumps(context))
    return response.final_output
```
FAQ
Is subprocess isolation secure enough for production use?
For a learning environment, subprocess isolation with timeouts and environment restrictions is a reasonable starting point. For production deployment serving untrusted users, use container-level isolation with Docker or a dedicated sandbox service like Pyodide running in WebAssembly. The key principle is the same — never execute student code in your main application process.
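As a concrete sketch of the container-level option, here is one way to assemble a locked-down `docker run` invocation for a single submission. The flags are standard Docker options, but the exact set (image tag, mount path, limits) is an assumption you should tune for your deployment:

```python
def docker_sandbox_argv(script_path: str, memory_mb: int = 128,
                        timeout_s: int = 10) -> list[str]:
    """Build a locked-down `docker run` command for one student script."""
    return [
        "docker", "run", "--rm",
        "--network", "none",               # no network access at all
        f"--memory={memory_mb}m",          # hard memory cap
        "--pids-limit", "64",              # bound fork bombs
        "--read-only",                     # immutable container filesystem
        "-v", f"{script_path}:/code/main.py:ro",  # mount submission read-only
        "python:3.12-slim",                # illustrative image choice
        "timeout", str(timeout_s), "python3", "/code/main.py",
    ]


argv = docker_sandbox_argv("/tmp/student.py")
print(" ".join(argv))
```

You would pass this list to `subprocess.run` in place of the bare `["python3", temp_path]` command, keeping the rest of `execute_student_code` unchanged.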
How does the agent decide which concept a student is struggling with from their code?
The agent analyzes both the error type and the code structure. A NameError on a loop variable suggests confusion about scope, while a TypeError in a function call suggests confusion about argument types. The error analyzer maps Python exception types to programming concepts, and the progress tracker records which concepts are associated with each challenge.
Can this approach scale to languages beyond Python?
Yes. The execution sandbox and the error parser are the main language-specific components. Replace `python3` with `node`, `javac && java`, or `rustc && ./a.out` for other languages, and give the error analyzer language-specific error patterns. The teaching agent, hint system, and progress tracker are language-agnostic. You would update the PROGRAMMING_CONCEPTS list to reflect language-specific topics.
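One lightweight way to structure that swap is a per-language table of run steps. Everything here is illustrative (the table name, the assumed `Main` class for Java, the `/tmp` classpath), not a fixed API:

```python
# Illustrative per-language run commands. Compiled languages need a
# compile step before the run step, so each entry yields an ordered
# list of argv lists.
LANGUAGE_COMMANDS = {
    "python": lambda path: [["python3", path]],
    "javascript": lambda path: [["node", path]],
    "java": lambda path: [["javac", path],
                          ["java", "-cp", "/tmp", "Main"]],  # assumes class Main
}


def build_steps(language: str, path: str) -> list[list[str]]:
    """Return the ordered subprocess argv lists for one submission."""
    return LANGUAGE_COMMANDS[language](path)


print(build_steps("python", "/tmp/sol.py"))  # [['python3', '/tmp/sol.py']]
print(len(build_steps("java", "/tmp/Main.java")))  # 2 steps: compile, then run
```

The sandbox runner then loops over the steps, stopping (and handing the stderr to the error analyzer) at the first one that fails.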
#CodeEducation #ProgrammingTutor #SandboxExecution #Python #AgenticAI #LearnAI #AIEngineering