AI Quiz Generator Agent: Creating Assessments from Any Content Source
Build an AI agent that analyzes text, lectures, or documents and automatically generates multiple-choice, short-answer, and true/false questions with calibrated difficulty levels.
The Problem with Manual Quiz Creation
Instructors spend hours crafting quiz questions that test the right concepts at the right difficulty level. A single well-written multiple-choice question requires identifying the key concept, writing a clear stem, creating one correct answer, and generating plausible distractors — wrong answers that would tempt a student with a specific misconception. Scaling this process across an entire course is time-consuming and error-prone.
An AI quiz generator agent automates this by analyzing source content, identifying testable concepts, and producing questions across multiple formats with calibrated difficulty. The agent does not just rephrase sentences as questions — it understands the underlying knowledge structure and generates assessments that probe genuine understanding.
Question Type Definitions
Start by defining a structured output format for different question types:
from pydantic import BaseModel, Field
from enum import Enum


class QuestionType(str, Enum):
    MULTIPLE_CHOICE = "multiple_choice"
    TRUE_FALSE = "true_false"
    SHORT_ANSWER = "short_answer"
    FILL_IN_BLANK = "fill_in_blank"


class Difficulty(str, Enum):
    RECALL = "recall"                # Remember facts
    UNDERSTANDING = "understanding"  # Explain concepts
    APPLICATION = "application"      # Apply to new situations
    ANALYSIS = "analysis"            # Break down and evaluate


class Distractor(BaseModel):
    text: str
    misconception: str = Field(
        description="The specific misconception this wrong answer targets"
    )


class QuizQuestion(BaseModel):
    question: str
    question_type: QuestionType
    difficulty: Difficulty
    correct_answer: str
    distractors: list[Distractor] = []
    explanation: str = Field(
        description="Why the correct answer is right"
    )
    source_concept: str = Field(
        description="The concept from the source material being tested"
    )
    bloom_level: str = Field(
        description="Bloom's taxonomy level: remember, understand, apply, "
                    "analyze, evaluate, create"
    )


class QuizOutput(BaseModel):
    title: str
    questions: list[QuizQuestion]
    coverage_summary: str
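Once questions conform to this schema, delivery-side tasks become mechanical. As a sketch, the hypothetical `render_question` helper below shuffles the correct answer in among the distractors before display; it takes a plain dict shaped like `QuizQuestion`, and the seed exists only to make generated quizzes reproducible:

```python
import random


def render_question(question: dict, seed: int = 0) -> tuple[str, int]:
    """Format a multiple-choice question as text and return the
    index of the correct option after shuffling."""
    options = [question["correct_answer"]] + [
        d["text"] for d in question["distractors"]
    ]
    rng = random.Random(seed)  # seeded so the same quiz renders identically
    rng.shuffle(options)
    correct_index = options.index(question["correct_answer"])
    lines = [question["question"]]
    lines += [f"  {chr(65 + i)}. {opt}" for i, opt in enumerate(options)]
    return "\n".join(lines), correct_index
```

Keeping the correct-answer index alongside the rendered text lets you build an answer key in the same pass.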
Content Analysis Pipeline
Before generating questions, the agent needs to extract key concepts from the source material. This two-stage approach produces much better questions than generating directly from raw text:
from agents import Agent, Runner
import json

concept_extractor = Agent(
    name="Concept Extractor",
    instructions="""Analyze the provided educational content and extract
a structured list of key concepts. For each concept, identify:

1. The concept name
2. A one-sentence definition
3. Prerequisites (other concepts it depends on)
4. Common misconceptions students have about it
5. The cognitive level required to understand it
   (remember/understand/apply/analyze)

Return a JSON array of concept objects. Focus on concepts that are
testable — skip transitional phrases and meta-commentary.""",
)


async def extract_concepts(content: str) -> list[dict]:
    result = await Runner.run(
        concept_extractor,
        f"Extract testable concepts from this content:\n\n{content}",
    )
    return json.loads(result.final_output)
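Calling `json.loads` directly on model output is brittle: models sometimes wrap JSON in a markdown code fence even when told not to. A small tolerant parser is a worthwhile guard (this helper is a sketch, not something the Agents SDK provides):

```python
import json
import re


def parse_json_output(raw: str):
    """Parse JSON from model output, tolerating a markdown code fence
    around the payload."""
    # Strip an optional ```json ... ``` wrapper the model may add
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", raw, re.DOTALL)
    text = match.group(1) if match else raw.strip()
    return json.loads(text)
```

Swapping this in for the bare `json.loads` call above makes the extraction stage noticeably more robust in practice.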
Distractor Generation Strategy
The quality of a multiple-choice question lives or dies on its distractors. Good distractors are plausible to a student with a specific misunderstanding but clearly wrong to a student who understands the concept:
distractor_agent = Agent(
    name="Distractor Generator",
    instructions="""You generate plausible wrong answers for
multiple-choice questions. Each distractor must:

1. Be grammatically consistent with the question stem
2. Be approximately the same length as the correct answer
3. Target a SPECIFIC misconception (document which one)
4. Never be partially correct or debatable
5. Never use absolute words like 'always' or 'never' that
   test-wise students would eliminate

For each distractor, explain the misconception it targets so
instructors can review the pedagogical reasoning.""",
)


async def generate_distractors(
    question: str, correct_answer: str, concept: dict, count: int = 3
) -> list[dict]:
    prompt = f"""Question: {question}
Correct answer: {correct_answer}
Concept: {concept['name']} — {concept['definition']}
Common misconceptions: {concept.get('misconceptions', [])}

Generate {count} distractors as a JSON array with 'text' and
'misconception' fields."""
    result = await Runner.run(distractor_agent, prompt)
    return json.loads(result.final_output)
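Even with these instructions, generated distractors occasionally duplicate each other or restate the correct answer in different casing, so it is worth filtering before assembling the question. A minimal post-check (`filter_distractors` is a hypothetical helper, not part of any SDK):

```python
def filter_distractors(
    correct_answer: str, distractors: list[dict]
) -> list[dict]:
    """Drop distractors that duplicate the correct answer or each other,
    comparing case-insensitively after stripping whitespace."""
    seen = {correct_answer.strip().lower()}
    kept = []
    for d in distractors:
        key = d["text"].strip().lower()
        if key not in seen:
            seen.add(key)
            kept.append(d)
    return kept
```

If filtering leaves fewer distractors than needed, re-invoke the distractor agent for replacements rather than padding with weak options.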
The Quiz Generator Agent
Now combine concept extraction, question generation, and distractor creation into a single orchestrating agent:
quiz_generator = Agent(
    name="Quiz Generator",
    instructions="""You are an expert assessment designer. Given a list
of extracted concepts, generate quiz questions that:

1. Cover all major concepts from the source material
2. Mix question types (multiple choice, true/false, short answer,
   fill-in-blank)
3. Distribute difficulty across Bloom's taxonomy levels
4. Include clear explanations for correct answers
5. For multiple-choice questions, generate 3 distractors that each
   target a specific student misconception

Difficulty calibration rules:
- 40% recall/understanding questions (foundational)
- 40% application questions (intermediate)
- 20% analysis questions (challenging)

Return the quiz in the specified JSON schema.""",
    output_type=QuizOutput,
)


async def generate_quiz(
    content: str, num_questions: int = 10
) -> QuizOutput:
    # Stage 1: Extract concepts
    concepts = await extract_concepts(content)

    # Stage 2: Generate calibrated quiz
    prompt = f"""Source concepts:
{json.dumps(concepts, indent=2)}

Generate a quiz with {num_questions} questions covering these concepts.
Ensure balanced difficulty distribution and question type variety."""
    result = await Runner.run(quiz_generator, prompt)
    return result.final_output_as(QuizOutput)
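Rather than trusting the model to hit the 40/40/20 split from percentages alone, you can compute explicit per-level counts and state them in the prompt. A sketch of that allocation (the choice to give the rounding remainder to application questions is an assumption, not from the article):

```python
def plan_question_counts(num_questions: int) -> dict[str, int]:
    """Turn the target difficulty mix into concrete question counts."""
    # Target mix from the generator instructions: 40% recall/understanding,
    # 40% application, 20% analysis
    targets = {"recall": 0.2, "understanding": 0.2,
               "application": 0.4, "analysis": 0.2}
    counts = {k: int(num_questions * p) for k, p in targets.items()}
    # Assign any rounding remainder to application questions
    remainder = num_questions - sum(counts.values())
    counts["application"] += remainder
    return counts
```

Interpolating these counts into the prompt ("generate exactly 4 application questions...") tends to produce a more reliable distribution than stating percentages.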
Difficulty Calibration
A common failure mode is generating questions that are all the same difficulty. The agent uses Bloom's taxonomy levels as a calibration framework and validates the distribution after generation:
def validate_difficulty_distribution(
    quiz: QuizOutput,
) -> dict[str, object]:
    counts: dict[str, int] = {}
    for q in quiz.questions:
        level = q.difficulty.value
        counts[level] = counts.get(level, 0) + 1
    total = len(quiz.questions)
    distribution = {k: v / total for k, v in counts.items()}

    # Check against the target distribution (recall and understanding
    # together make up the 40% foundational share)
    targets = {"recall": 0.2, "understanding": 0.2,
               "application": 0.4, "analysis": 0.2}
    warnings = []
    for level, target in targets.items():
        actual = distribution.get(level, 0.0)
        if abs(actual - target) > 0.15:
            warnings.append(
                f"{level}: target {target:.0%}, actual {actual:.0%}"
            )
    return {"distribution": distribution, "warnings": warnings}
FAQ
How do you ensure questions test understanding rather than just rephrasing the text?
The two-stage pipeline is key. By first extracting abstract concepts and their relationships, the question generation stage works from conceptual understanding rather than surface-level text. The Bloom's taxonomy classification forces the agent to create questions at the application and analysis levels, which inherently require deeper understanding than simple recall.
Can the agent generate questions from non-text sources like videos or slides?
Yes, with a preprocessing step. For videos, pass a transcript through the concept extractor. For slides, concatenate the text content with slide context. The concept extraction stage normalizes all source formats into the same structured representation, so the question generator works identically regardless of input format.
How do you prevent duplicate or near-duplicate questions?
Add a deduplication pass after generation that computes semantic similarity between question stems using embeddings. Questions with cosine similarity above 0.85 should be flagged, and the agent can be prompted to regenerate replacements that test the same concept from a different angle.
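If an embedding model is not available, `difflib.SequenceMatcher` from the standard library gives a cheap lexical proxy for the same check, though it will miss paraphrases that embeddings would catch:

```python
from difflib import SequenceMatcher


def flag_near_duplicates(
    stems: list[str], threshold: float = 0.85
) -> list[tuple[int, int]]:
    """Return index pairs of question stems that are lexically similar
    above the given threshold."""
    flagged = []
    for i in range(len(stems)):
        for j in range(i + 1, len(stems)):
            ratio = SequenceMatcher(
                None, stems[i].lower(), stems[j].lower()
            ).ratio()
            if ratio >= threshold:
                flagged.append((i, j))
    return flagged
```

The quadratic pairwise loop is fine at quiz scale (tens of questions); switch to embeddings plus a vector index if you are deduplicating across an entire question bank.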
CallSphere Team
Expert insights on AI voice agents and customer communication automation.