Building a Writing Coach Agent: Grammar, Style, and Structure Feedback
Create an AI writing coach that provides layered feedback on grammar, style, structure, and tone — with actionable revision suggestions and progress tracking across writing sessions.
Why Writing Feedback Needs Layers
Good writing feedback operates at multiple levels simultaneously. A grammar checker catches surface errors but ignores whether the argument is coherent. A structural review ensures logical flow but might miss awkward phrasing. An effective writing coach agent addresses all these layers in a prioritized way — fixing a thesis statement is more important than fixing a comma splice.
The agent provides feedback in four categories, from most impactful to least: Structure (organization and argument flow), Content (clarity of ideas and evidence), Style (voice, tone, and readability), and Mechanics (grammar, spelling, punctuation).
Feedback Data Model
Define structured feedback that organizes suggestions by category and priority:
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class FeedbackCategory(str, Enum):
    STRUCTURE = "structure"
    CONTENT = "content"
    STYLE = "style"
    MECHANICS = "mechanics"

class Severity(str, Enum):
    CRITICAL = "critical"      # Must fix: breaks understanding
    IMPORTANT = "important"    # Should fix: weakens writing
    SUGGESTION = "suggestion"  # Could improve: polish

@dataclass
class WritingIssue:
    category: FeedbackCategory
    severity: Severity
    location: str  # Paragraph or sentence reference
    original_text: str
    issue_description: str
    suggestion: str
    revised_text: Optional[str] = None
    rule_name: Optional[str] = None  # e.g., "passive_voice"

@dataclass
class WritingAnalysis:
    overall_score: float  # 0-100
    category_scores: dict[str, float] = field(default_factory=dict)
    issues: list[WritingIssue] = field(default_factory=list)
    strengths: list[str] = field(default_factory=list)
    word_count: int = 0
    readability_grade: float = 0.0
    sentence_variety_score: float = 0.0

    @property
    def critical_issues(self) -> list[WritingIssue]:
        return [i for i in self.issues if i.severity == Severity.CRITICAL]

    @property
    def issues_by_category(self) -> dict[str, list[WritingIssue]]:
        grouped: dict[str, list[WritingIssue]] = {}
        for issue in self.issues:
            grouped.setdefault(issue.category.value, []).append(issue)
        return grouped
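A quick usage sketch of the model (the sample issue and score are illustrative, not agent output):

issue = WritingIssue(
    category=FeedbackCategory.STYLE,
    severity=Severity.IMPORTANT,
    location="paragraph 2, sentence 3",
    original_text="The decision was made by the committee.",
    issue_description="Passive voice hides the actor.",
    suggestion="Name the actor as the subject.",
    revised_text="The committee made the decision.",
    rule_name="passive_voice",
)

analysis = WritingAnalysis(overall_score=72.5, issues=[issue])
print(analysis.issues_by_category["style"][0].suggestion)
print(len(analysis.critical_issues))  # 0 -- nothing critical yet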
Readability Analysis
Before the AI agent reviews the writing, compute quantitative metrics that inform the feedback:
import re

def compute_readability_metrics(text: str) -> dict:
    """Compute readability statistics for the text."""
    sentences = re.split(r'[.!?]+', text)
    sentences = [s.strip() for s in sentences if s.strip()]
    words = text.split()

    num_sentences = len(sentences)
    num_words = len(words)
    if num_sentences == 0 or num_words == 0:
        return {"error": "text too short to analyze"}

    syllable_count = sum(count_syllables(w) for w in words)

    # Flesch-Kincaid Grade Level
    avg_sentence_length = num_words / num_sentences
    avg_syllables_per_word = syllable_count / num_words
    fk_grade = (
        0.39 * avg_sentence_length
        + 11.8 * avg_syllables_per_word
        - 15.59
    )

    # Sentence length variety (standard deviation)
    lengths = [len(s.split()) for s in sentences]
    mean_length = sum(lengths) / len(lengths)
    variance = sum((n - mean_length) ** 2 for n in lengths) / len(lengths)
    std_dev = variance ** 0.5

    # Paragraph analysis
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]

    return {
        "word_count": num_words,
        "sentence_count": num_sentences,
        "paragraph_count": len(paragraphs),
        "avg_sentence_length": round(avg_sentence_length, 1),
        "sentence_length_std": round(std_dev, 1),
        "flesch_kincaid_grade": round(fk_grade, 1),
        "avg_syllables_per_word": round(avg_syllables_per_word, 2),
    }
def count_syllables(word: str) -> int:
    """Rough syllable count using vowel groups."""
    word = word.lower().strip(".,!?;:'\"")
    if not word:
        return 0
    vowels = "aeiouy"
    count = 0
    prev_vowel = False
    for char in word:
        is_vowel = char in vowels
        if is_vowel and not prev_vowel:
            count += 1
        prev_vowel = is_vowel
    # A trailing silent "e" usually does not add a syllable.
    if word.endswith("e") and count > 1:
        count -= 1
    return max(1, count)
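A quick check of the helper on a short sample:

sample = (
    "Clear writing takes practice. Most drafts improve with revision. "
    "Reading your work aloud exposes awkward phrasing quickly."
)
metrics = compute_readability_metrics(sample)
print(metrics["flesch_kincaid_grade"], metrics["avg_sentence_length"])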
The Multi-Layer Writing Coach
The writing coach agent operates as a pipeline of specialized reviewers, each focusing on one feedback category:
from agents import Agent, Runner
from pydantic import BaseModel

class StructureFeedback(BaseModel):
    thesis_clear: bool
    logical_flow: bool
    paragraph_transitions: list[str]
    organization_issues: list[str]
    suggestions: list[str]

structure_reviewer = Agent(
    name="Structure Reviewer",
    instructions="""Review the writing's organizational structure.
Evaluate:

1. THESIS/MAIN IDEA: Is there a clear central argument or purpose?
   If not, suggest where and how to add one.
2. LOGICAL FLOW: Do paragraphs follow a logical progression? Flag
   any jumps in logic or missing connections.
3. TRANSITIONS: Are transitions between paragraphs smooth? Identify
   abrupt shifts.
4. PARAGRAPH UNITY: Does each paragraph focus on one main idea?
   Flag paragraphs that try to cover too much.
5. INTRODUCTION/CONCLUSION: Does the intro set up the argument?
   Does the conclusion synthesize rather than merely repeat?

Focus ONLY on structure. Ignore grammar and style issues.""",
    output_type=StructureFeedback,
)

style_reviewer = Agent(
    name="Style Reviewer",
    instructions="""Review the writing's style and voice. Evaluate:

1. ACTIVE vs PASSIVE VOICE: Flag unnecessary passive constructions.
   "The ball was thrown by John" -> "John threw the ball"
2. WORDINESS: Identify phrases that can be shortened.
   "due to the fact that" -> "because"
3. SENTENCE VARIETY: Flag sections where sentence structure is
   monotonous (e.g., five Subject-Verb-Object sentences in a row).
4. TONE CONSISTENCY: Is the tone appropriate and consistent
   throughout? Flag shifts.
5. JARGON: Flag technical terms that are not defined for the audience.

Provide specific rewrites, not just general advice.""",
)
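The content and mechanics layers follow the same pattern. Here is a minimal sketch of a mechanics reviewer; its output schema and instruction text are assumptions written to mirror the reviewers above:

class MechanicsFeedback(BaseModel):
    grammar_errors: list[str]
    spelling_errors: list[str]
    punctuation_issues: list[str]

# Sketch only: the instruction wording is an assumption, not from the
# pipeline above -- adapt it to your own rubric.
mechanics_reviewer = Agent(
    name="Mechanics Reviewer",
    instructions="""Review the writing's mechanics. Flag grammar errors,
spelling mistakes, and punctuation problems (comma splices, missing
apostrophes, run-on sentences). Quote the exact text for each issue
and provide the corrected version. Ignore structure and style.""",
    output_type=MechanicsFeedback,
)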
Orchestrating the Review Pipeline
Run all reviewers in parallel and merge their feedback into a single prioritized report:
import asyncio
import json

async def full_writing_review(text: str, context: str = "") -> WritingAnalysis:
    """Run all review layers and produce a unified analysis."""
    metrics = compute_readability_metrics(text)

    prompt = f"Review this writing:\n\n{text}"
    if context:
        prompt += f"\n\nContext: {context}"

    # Run reviewers in parallel
    structure_task = Runner.run(structure_reviewer, prompt)
    style_task = Runner.run(style_reviewer, prompt)
    structure_result, style_result = await asyncio.gather(
        structure_task, style_task
    )

    analysis = WritingAnalysis(
        overall_score=0.0,
        word_count=metrics["word_count"],
        readability_grade=metrics["flesch_kincaid_grade"],
        sentence_variety_score=metrics["sentence_length_std"],
    )

    # Merge feedback from all reviewers and score
    # (In production, parse structured outputs into WritingIssue objects)
    analysis.overall_score = calculate_composite_score(
        metrics, structure_result, style_result
    )
    return analysis
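The pipeline above calls calculate_composite_score without defining it. One plausible sketch starts from 100 and subtracts penalties; the weights and thresholds here are assumptions, not values from this article:

def calculate_composite_score(metrics: dict, structure_result, style_result) -> float:
    """Illustrative scoring: start at 100 and subtract penalties.

    Penalty weights below are assumptions -- tune them to your rubric.
    """
    structure = structure_result.final_output  # StructureFeedback
    score = 100.0
    if not structure.thesis_clear:
        score -= 15
    if not structure.logical_flow:
        score -= 10
    score -= 3 * len(structure.organization_issues)
    # Penalize monotonous rhythm: low sentence-length variety reads flat.
    if metrics["sentence_length_std"] < 3:
        score -= 5
    # style_result.final_output is free text here; parsing it into scored
    # issues is left to the production merge step noted above.
    return max(0.0, min(100.0, score))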
Revision Suggestion Engine
Instead of just pointing out problems, the agent generates concrete revision options:
revision_agent = Agent(
    name="Revision Suggester",
    instructions="""Given a piece of writing and identified issues,
generate specific revision suggestions. For each issue:

1. Quote the exact original text
2. Explain what is wrong and why it matters
3. Provide 2-3 alternative phrasings ranked by quality
4. Explain why the top suggestion is best

Never rewrite the entire piece. Focus on targeted improvements
that the writer can learn from. The goal is to teach the writer
to self-edit, not to edit for them.

Format each suggestion clearly so the writer can accept or reject
individual changes.""",
)

async def get_revision_suggestions(
    text: str, issues: list[WritingIssue]
) -> str:
    issue_summary = json.dumps([
        {
            "category": i.category.value,
            "location": i.location,
            "description": i.issue_description,
            "original": i.original_text,
        }
        for i in issues[:10]  # Limit to top 10 issues
    ])
    result = await Runner.run(
        revision_agent,
        f"Writing:\n{text}\n\nIssues to address:\n{issue_summary}",
    )
    return result.final_output
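Tying the pieces together, a sketch of the end-to-end flow (draft.txt and the context string are hypothetical inputs):

async def main() -> None:
    draft = open("draft.txt").read()  # hypothetical input file
    analysis = await full_writing_review(draft, context="college application essay")
    suggestions = await get_revision_suggestions(draft, analysis.issues)
    print(f"Score: {analysis.overall_score:.0f}/100")
    print(suggestions)

asyncio.run(main())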
FAQ
How does the agent avoid overwhelming the writer with too many issues at once?
The severity classification (critical, important, suggestion) creates a natural triage. The agent presents critical issues first — things like an unclear thesis, broken logical flow, or sentences that are genuinely confusing. Style suggestions and minor mechanics come last. For first drafts, the agent might limit feedback to structure and content only, deferring style and mechanics to later revision rounds.
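A minimal sketch of that triage, assuming the WritingAnalysis model above (the first-draft cutoff is an assumption):

def triage(analysis: WritingAnalysis, first_draft: bool = False) -> list[WritingIssue]:
    """Return issues in presentation order: critical first, polish last."""
    issues = analysis.issues
    if first_draft:
        # Early drafts: defer style and mechanics to later revision rounds.
        issues = [
            i for i in issues
            if i.category in (FeedbackCategory.STRUCTURE, FeedbackCategory.CONTENT)
        ]
    order = [Severity.CRITICAL, Severity.IMPORTANT, Severity.SUGGESTION]
    return sorted(issues, key=lambda i: order.index(i.severity))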
Can the agent adapt to different writing contexts like academic vs. business vs. creative?
Yes. The context parameter passed to the review pipeline changes the evaluation criteria. Academic writing needs formal tone, citation support, and hedged claims. Business writing prioritizes brevity and clear action items. Creative writing tolerates rule-breaking for effect. The agent's system prompt includes context-specific rules so "Use active voice" becomes a firm rule in business writing but a suggestion in creative writing.
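One way to wire this up is a lookup of context-specific rules folded into the prompt that full_writing_review builds (the rule text here is illustrative):

CONTEXT_RULES = {
    "academic": "Require formal tone, hedged claims, and citation support.",
    "business": "Treat passive voice and wordiness as important issues; prioritize brevity.",
    "creative": "Treat style rules as suggestions; rule-breaking for effect is acceptable.",
}

def build_prompt(text: str, context: str) -> str:
    prompt = f"Review this writing:\n\n{text}"
    rules = CONTEXT_RULES.get(context)
    if rules:
        prompt += f"\n\nContext: {context}. {rules}"
    return prompt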
How do you track improvement over multiple writing sessions?
Store each WritingAnalysis result with a timestamp and compare category scores over time. A student who consistently improves their structure score from 60 to 80 but plateaus on style at 55 would see the agent shift its coaching emphasis toward style. Trend visualization and session-over-session diffs help the student see concrete progress.
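A minimal sketch of session tracking, assuming analyses are kept in memory (a real app would persist them to a database):

from datetime import datetime, timezone

history: list[tuple[datetime, WritingAnalysis]] = []

def record_session(analysis: WritingAnalysis) -> None:
    history.append((datetime.now(timezone.utc), analysis))

def category_trend(category: str) -> list[float]:
    """Score for one category across sessions, oldest first."""
    return [a.category_scores.get(category, 0.0) for _, a in history]

def coaching_focus() -> str:
    """Pick the weakest category from the latest session as the next focus."""
    _, latest = history[-1]  # assumes at least one recorded session
    return min(latest.category_scores, key=latest.category_scores.get)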