
AI Agent for Interview Preparation: Mock Interviews with Adaptive Questions

Build an AI mock interview agent with adaptive question selection, response evaluation across behavioral and technical tracks, and detailed performance feedback with improvement coaching.

Why Mock Interviews Need Adaptive AI

Practicing for interviews with a static question list misses the most important dynamic: real interviews adapt. If a candidate gives a strong answer about system design, the interviewer probes deeper. If they struggle with concurrency, the interviewer might simplify or pivot to gauge the candidate's boundaries. An AI mock interview agent replicates this adaptive behavior, selecting follow-up questions based on the candidate's demonstrated strengths and weaknesses.

Interview Configuration Model

Start by defining the interview structure and question bank:

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class InterviewTrack(str, Enum):
    BEHAVIORAL = "behavioral"
    TECHNICAL = "technical"
    SYSTEM_DESIGN = "system_design"
    CODING = "coding"

class DifficultyLevel(str, Enum):
    EASY = "easy"
    MEDIUM = "medium"
    HARD = "hard"
    EXPERT = "expert"

@dataclass
class InterviewQuestion:
    question_id: str
    text: str
    track: InterviewTrack
    difficulty: DifficultyLevel
    follow_ups: list[str] = field(default_factory=list)
    evaluation_criteria: list[str] = field(default_factory=list)
    ideal_answer_points: list[str] = field(default_factory=list)
    time_limit_minutes: int = 5
    tags: list[str] = field(default_factory=list)

@dataclass
class InterviewConfig:
    role: str
    company: str
    tracks: list[InterviewTrack]
    total_duration_minutes: int = 45
    difficulty_start: DifficultyLevel = DifficultyLevel.MEDIUM
    questions_per_track: int = 3

BEHAVIORAL_QUESTIONS = [
    InterviewQuestion(
        question_id="beh-001",
        text="Tell me about a time you had to make a difficult "
             "technical decision with incomplete information.",
        track=InterviewTrack.BEHAVIORAL,
        difficulty=DifficultyLevel.MEDIUM,
        follow_ups=[
            "What data would have changed your decision?",
            "How did you communicate the risk to stakeholders?",
            "What would you do differently now?",
        ],
        evaluation_criteria=[
            "Uses STAR format (Situation, Task, Action, Result)",
            "Shows decision-making process, not just outcome",
            "Demonstrates learning and self-awareness",
            "Quantifies impact where possible",
        ],
        ideal_answer_points=[
            "Specific situation with context",
            "Clear description of the tradeoffs considered",
            "Action taken with reasoning",
            "Measurable result or outcome",
            "Reflection on what was learned",
        ],
    ),
]

Adaptive Question Selection

The question selector adjusts difficulty and topic based on how the candidate has performed so far:

@dataclass
class CandidatePerformance:
    candidate_id: str
    role: str
    answers: list[dict] = field(default_factory=list)
    track_scores: dict[str, list[float]] = field(default_factory=dict)
    current_difficulty: dict[str, str] = field(default_factory=dict)
    asked_questions: set[str] = field(default_factory=set)

    def get_track_average(self, track: str) -> float:
        scores = self.track_scores.get(track, [])
        if not scores:
            return 0.5
        return sum(scores) / len(scores)

    def adjust_difficulty(self, track: str, score: float):
        """Increase difficulty on strong answers, decrease on weak."""
        levels = list(DifficultyLevel)
        current = DifficultyLevel(
            self.current_difficulty.get(track, "medium")
        )
        current_idx = levels.index(current)

        if score >= 0.8 and current_idx < len(levels) - 1:
            self.current_difficulty[track] = levels[current_idx + 1].value
        elif score < 0.4 and current_idx > 0:
            self.current_difficulty[track] = levels[current_idx - 1].value

def select_next_question(
    performance: CandidatePerformance,
    question_bank: list[InterviewQuestion],
    track: InterviewTrack,
) -> Optional[InterviewQuestion]:
    """Select the next question based on adaptive difficulty."""
    target_difficulty = DifficultyLevel(
        performance.current_difficulty.get(track.value, "medium")
    )

    # Filter to unasked questions in the right track and difficulty
    candidates = [
        q for q in question_bank
        if q.track == track
        and q.difficulty == target_difficulty
        and q.question_id not in performance.asked_questions
    ]

    if not candidates:
        # Fall back to the nearest available difficulty in this track
        levels = list(DifficultyLevel)
        target_idx = levels.index(target_difficulty)
        candidates = sorted(
            (
                q for q in question_bank
                if q.track == track
                and q.question_id not in performance.asked_questions
            ),
            key=lambda q: abs(levels.index(q.difficulty) - target_idx),
        )

    if not candidates:
        return None

    # Bank order breaks ties; a production selector could also
    # weight toward tags where the candidate scored lowest
    return candidates[0]
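To see the adjustment thresholds in action, here is a standalone trace of the same ladder logic (0.8 or above steps up, below 0.4 steps down). The `step_difficulty` helper is hypothetical, written to mirror `adjust_difficulty` above:

```python
LEVELS = ["easy", "medium", "hard", "expert"]

def step_difficulty(current: str, score: float) -> str:
    """Step up on a strong answer (>= 0.8), down on a weak one
    (< 0.4), clamped at both ends of the ladder."""
    idx = LEVELS.index(current)
    if score >= 0.8 and idx < len(LEVELS) - 1:
        return LEVELS[idx + 1]
    if score < 0.4 and idx > 0:
        return LEVELS[idx - 1]
    return current

level = "medium"
for score in [0.9, 0.85, 0.3, 0.6]:
    level = step_difficulty(level, score)
    print(level)  # hard, expert, hard, hard
```

Two strong answers climb to expert, one weak answer backs off to hard, and a middling answer holds steady, which is the behavior a live interviewer would show.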

Response Evaluation Agent

The evaluation agent scores candidate responses against structured criteria:

from agents import Agent, Runner
from pydantic import BaseModel

class ResponseEvaluation(BaseModel):
    score: float  # 0.0 to 1.0
    criteria_met: list[str]
    criteria_missed: list[str]
    strengths: list[str]
    improvement_areas: list[str]
    follow_up_question: Optional[str] = None
    detailed_feedback: str

behavioral_evaluator = Agent(
    name="Behavioral Interview Evaluator",
    instructions="""You evaluate behavioral interview answers. Score
each response on a 0.0-1.0 scale using these criteria:

STAR FORMAT (25% of score):
- Clear Situation with relevant context
- Specific Task or challenge described
- Detailed Actions taken (not just "we" — what did YOU do?)
- Quantifiable Results or outcomes

COMMUNICATION (25% of score):
- Concise and focused (2-3 minutes ideal)
- Logical narrative flow
- Appropriate level of detail

SUBSTANCE (25% of score):
- Shows relevant skills for the role
- Demonstrates impact and ownership
- Includes specific metrics or outcomes

SELF-AWARENESS (25% of score):
- Acknowledges challenges honestly
- Shows learning and growth
- Demonstrates adaptability

If the answer is strong, generate a harder follow-up question that
probes deeper. If weak, note specific coaching advice.""",
    output_type=ResponseEvaluation,
)

technical_evaluator = Agent(
    name="Technical Interview Evaluator",
    instructions="""You evaluate technical interview answers. Score
each response on a 0.0-1.0 scale using these criteria:

ACCURACY (30% of score):
- Technically correct statements
- Proper use of terminology
- No critical misconceptions

DEPTH (25% of score):
- Goes beyond surface-level explanation
- Discusses tradeoffs and alternatives
- Shows understanding of WHY, not just WHAT

PROBLEM SOLVING (25% of score):
- Structured approach to the problem
- Considers edge cases
- Breaks down complex problems into parts

COMMUNICATION (20% of score):
- Explains technical concepts clearly
- Uses appropriate abstractions
- Asks clarifying questions when needed

Generate a follow-up that tests the boundaries of their knowledge.""",
    output_type=ResponseEvaluation,
)
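The rubric weights can also be applied mechanically on the Python side. This sketch assumes the evaluator is extended to return per-criterion ratings; `weighted_score` and the ratings dict are hypothetical, since the `ResponseEvaluation` above returns a single pre-combined score:

```python
# Weights from the technical rubric above
TECHNICAL_WEIGHTS = {
    "accuracy": 0.30,
    "depth": 0.25,
    "problem_solving": 0.25,
    "communication": 0.20,
}

def weighted_score(ratings: dict[str, float],
                   weights: dict[str, float]) -> float:
    """Combine per-criterion ratings (each 0.0-1.0) into a
    single 0.0-1.0 score using the rubric weights."""
    return round(sum(weights[k] * ratings[k] for k in weights), 3)

score = weighted_score(
    {"accuracy": 0.9, "depth": 0.6,
     "problem_solving": 0.7, "communication": 0.8},
    TECHNICAL_WEIGHTS,
)
print(score)  # 0.755
```

Keeping the combination step in code means the model only has to rate each dimension, which tends to be more stable than asking it for one holistic number.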

The Mock Interview Agent

Combine question selection, the interview flow, and evaluation into a cohesive interview session:

from agents import function_tool
import json

@function_tool
def evaluate_answer(
    question_id: str,
    candidate_answer: str,
    track: str,
    evaluation_criteria: str,
) -> str:
    """Evaluate a candidate's interview answer."""
    # In production, dispatch to behavioral or technical evaluator
    return json.dumps({
        "status": "evaluated",
        "question_id": question_id,
    })

def build_interviewer_instructions(
    config: InterviewConfig,
    performance: CandidatePerformance,
    current_question: InterviewQuestion,
) -> str:
    track_avg = performance.get_track_average(
        current_question.track.value
    )

    return f"""You are conducting a mock interview for a
{config.role} position at {config.company}.

CURRENT QUESTION: {current_question.text}
TRACK: {current_question.track.value}
DIFFICULTY: {current_question.difficulty.value}

EVALUATION CRITERIA:
{chr(10).join(f'- {c}' for c in current_question.evaluation_criteria)}

IDEAL ANSWER SHOULD COVER:
{chr(10).join(f'- {p}' for p in current_question.ideal_answer_points)}

Candidate's average performance in this track: {track_avg:.0%}

INTERVIEWER BEHAVIOR:
- Act like a real interviewer — professional but friendly
- Listen actively and ask relevant follow-up questions
- If the candidate is struggling, provide a gentle nudge
- After the candidate finishes, provide structured feedback
- Use follow-up questions to probe depth: {current_question.follow_ups}
- Never reveal the ideal answer — coach toward it instead"""

mock_interviewer = Agent(
    name="Mock Interviewer",
    instructions="Placeholder; rebuilt per question via build_interviewer_instructions",
    tools=[evaluate_answer],
)

Post-Interview Feedback Report

After the session, generate a comprehensive feedback report:

def generate_interview_report(
    performance: CandidatePerformance,
) -> dict:
    """Generate a post-interview performance report."""
    report = {
        "role": performance.role,
        "questions_answered": len(performance.answers),
        "track_performance": {},
        "overall_score": 0.0,
        "top_strengths": [],
        "priority_improvements": [],
        "practice_recommendations": [],
    }

    total_score = 0.0
    for track, scores in performance.track_scores.items():
        avg = sum(scores) / len(scores) if scores else 0.0
        total_score += avg
        if len(scores) > 1 and scores[-1] > scores[0]:
            trend = "improving"
        elif len(scores) > 1 and scores[-1] < scores[0]:
            trend = "declining"
        else:
            trend = "stable"
        report["track_performance"][track] = {
            "average_score": f"{avg:.0%}",
            "questions_answered": len(scores),
            "trend": trend,
        }

    num_tracks = len(performance.track_scores) or 1
    report["overall_score"] = f"{total_score / num_tracks:.0%}"

    return report
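As a worked example of the report arithmetic, here are hypothetical track scores run through the same averaging the function performs:

```python
track_scores = {
    "behavioral": [0.6, 0.8],      # trending up
    "technical": [0.9, 0.7, 0.5],  # trending down
}

# Per-track average, then a simple mean across tracks
averages = {t: sum(s) / len(s) for t, s in track_scores.items()}
overall = sum(averages.values()) / len(averages)
print(f"{overall:.0%}")  # 70%
```

Both tracks average 0.70 here, so the overall score is 70% even though one track is improving and the other declining, which is exactly why the report records a per-track trend alongside the averages.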

FAQ

How does the agent handle it when a candidate goes off-topic or gives an irrelevant answer?

The evaluator scores off-topic answers low on the "substance" and "accuracy" criteria, which triggers the difficulty adjustment to lower the next question's complexity. The interviewer agent is also instructed to gently redirect — just as a real interviewer would say "That is interesting, but I am specifically asking about..." The system prompt includes the evaluation criteria so the agent knows exactly what a relevant answer should address.

Can the agent simulate different interviewer styles like friendly vs. challenging?

Yes. The interviewer persona is controlled by the system prompt. A "challenging" mode adds instructions like "Push back on vague statements, ask for specific numbers, and express skepticism that forces the candidate to defend their claims." A "friendly" mode uses more encouragement and broader follow-up questions. Candidates should practice with both styles to prepare for real interview variability.
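One simple way to implement this is to prepend a persona block to the per-question instructions. The prefixes below are illustrative examples, not part of any library:

```python
# Hypothetical persona prefixes for the interviewer prompt
PERSONA_PREFIX = {
    "friendly": (
        "Be warm and encouraging. Acknowledge good points "
        "before probing further."
    ),
    "challenging": (
        "Push back on vague statements. Ask for specific numbers "
        "and make the candidate defend their claims."
    ),
}

def with_persona(style: str, base_instructions: str) -> str:
    """Prepend the chosen persona block to the question prompt."""
    return (
        f"INTERVIEWER STYLE:\n{PERSONA_PREFIX[style]}\n\n"
        f"{base_instructions}"
    )
```

Because the persona is just a prefix, the same question bank and evaluators work unchanged under either style.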

How do you ensure the mock interview covers enough breadth in a limited time?

The InterviewConfig specifies questions per track and total duration. The session manager tracks elapsed time and ensures each track gets proportional coverage. If the candidate spends too long on one question, the agent notes it in feedback and moves to the next topic — mirroring real interview time management expectations.
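The proportional-coverage idea can be sketched as a per-track time budget; `time_budget` is a hypothetical helper, not part of the code above:

```python
def time_budget(total_minutes: int, tracks: list[str]) -> dict[str, int]:
    """Split the session evenly across tracks; any remainder
    goes to the first track, which usually opens the interview."""
    per, remainder = divmod(total_minutes, len(tracks))
    budget = {t: per for t in tracks}
    budget[tracks[0]] += remainder
    return budget

print(time_budget(45, ["behavioral", "technical", "system_design"]))
# {'behavioral': 15, 'technical': 15, 'system_design': 15}
```

A session manager would check this budget after each answer and cut to the next track once a track's allocation is spent.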


#InterviewPrep #MockInterviews #CareerAI #Python #AgenticAI #LearnAI #AIEngineering
