
Building a Writing Coach Agent: Grammar, Style, and Structure Feedback

Create an AI writing coach that provides layered feedback on grammar, style, structure, and tone — with actionable revision suggestions and progress tracking across writing sessions.

Why Writing Feedback Needs Layers

Good writing feedback operates at multiple levels simultaneously. A grammar checker catches surface errors but ignores whether the argument is coherent. A structural review ensures logical flow but might miss awkward phrasing. An effective writing coach agent addresses all these layers in a prioritized way — fixing a thesis statement is more important than fixing a comma splice.

The agent provides feedback in four categories, from most impactful to least: Structure (organization and argument flow), Content (clarity of ideas and evidence), Style (voice, tone, and readability), and Mechanics (grammar, spelling, punctuation).
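That ordering can be made explicit in code. A small sketch (the rank table and helper here are illustrative, not part of the data model that follows) sorts merged feedback so higher-impact categories surface first:

```python
# Category priority: lower rank = more impactful, surfaced first.
CATEGORY_RANK = {"structure": 0, "content": 1, "style": 2, "mechanics": 3}

def sort_feedback(issues: list[dict]) -> list[dict]:
    """Order issues so higher-impact categories come first."""
    return sorted(issues, key=lambda i: CATEGORY_RANK[i["category"]])

issues = [
    {"category": "mechanics", "note": "comma splice"},
    {"category": "structure", "note": "thesis unclear"},
]
ordered = sort_feedback(issues)
# → the structure issue comes before the mechanics issue
```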

Feedback Data Model

Define structured feedback that organizes suggestions by category and priority:

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class FeedbackCategory(str, Enum):
    STRUCTURE = "structure"
    CONTENT = "content"
    STYLE = "style"
    MECHANICS = "mechanics"

class Severity(str, Enum):
    CRITICAL = "critical"    # Must fix: breaks understanding
    IMPORTANT = "important"  # Should fix: weakens writing
    SUGGESTION = "suggestion"  # Could improve: polish

@dataclass
class WritingIssue:
    category: FeedbackCategory
    severity: Severity
    location: str  # Paragraph or sentence reference
    original_text: str
    issue_description: str
    suggestion: str
    revised_text: Optional[str] = None
    rule_name: Optional[str] = None  # e.g., "passive_voice"

@dataclass
class WritingAnalysis:
    overall_score: float  # 0-100
    category_scores: dict[str, float] = field(default_factory=dict)
    issues: list[WritingIssue] = field(default_factory=list)
    strengths: list[str] = field(default_factory=list)
    word_count: int = 0
    readability_grade: float = 0.0
    sentence_variety_score: float = 0.0

    @property
    def critical_issues(self) -> list[WritingIssue]:
        return [i for i in self.issues if i.severity == Severity.CRITICAL]

    @property
    def issues_by_category(self) -> dict[str, list[WritingIssue]]:
        grouped: dict[str, list[WritingIssue]] = {}
        for issue in self.issues:
            cat = issue.category.value
            if cat not in grouped:
                grouped[cat] = []
            grouped[cat].append(issue)
        return grouped

Readability Analysis

Before the AI agent reviews the writing, compute quantitative metrics that inform the feedback:

import re

def compute_readability_metrics(text: str) -> dict:
    """Compute readability statistics for the text."""
    sentences = re.split(r'[.!?]+', text)
    sentences = [s.strip() for s in sentences if s.strip()]
    words = text.split()
    syllable_count = sum(count_syllables(w) for w in words)

    num_sentences = len(sentences)
    num_words = len(words)

    if num_sentences == 0 or num_words == 0:
        return {"error": "text too short to analyze"}

    # Flesch-Kincaid Grade Level
    avg_sentence_length = num_words / num_sentences
    avg_syllables_per_word = syllable_count / num_words
    fk_grade = (
        0.39 * avg_sentence_length
        + 11.8 * avg_syllables_per_word
        - 15.59
    )

    # Sentence length variety (std deviation)
    lengths = [len(s.split()) for s in sentences]
    mean_length = sum(lengths) / len(lengths)
    variance = sum((n - mean_length) ** 2 for n in lengths) / len(lengths)
    std_dev = variance ** 0.5

    # Paragraph analysis
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]

    return {
        "word_count": num_words,
        "sentence_count": num_sentences,
        "paragraph_count": len(paragraphs),
        "avg_sentence_length": round(avg_sentence_length, 1),
        "sentence_length_std": round(std_dev, 1),
        "flesch_kincaid_grade": round(fk_grade, 1),
        "avg_syllables_per_word": round(avg_syllables_per_word, 2),
    }

def count_syllables(word: str) -> int:
    """Rough syllable count using vowel groups."""
    word = word.lower().strip(".,!?;:'\"")
    if not word:
        return 0
    vowels = "aeiouy"
    count = 0
    prev_vowel = False
    for char in word:
        is_vowel = char in vowels
        if is_vowel and not prev_vowel:
            count += 1
        prev_vowel = is_vowel
    if word.endswith("e") and count > 1:
        count -= 1
    return max(1, count)
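As a quick sanity check on the Flesch-Kincaid formula above, a passage averaging 20 words per sentence and 1.5 syllables per word works out to roughly a 10th-grade reading level:

```python
# Flesch-Kincaid Grade Level for a hypothetical passage.
avg_sentence_length = 20.0      # words per sentence
avg_syllables_per_word = 1.5    # syllables per word

fk_grade = 0.39 * avg_sentence_length + 11.8 * avg_syllables_per_word - 15.59
print(round(fk_grade, 2))  # → 9.91
```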

The Multi-Layer Writing Coach

The writing coach agent operates as a pipeline of specialized reviewers, each focusing on one feedback category:


from agents import Agent, Runner
from pydantic import BaseModel

class StructureFeedback(BaseModel):
    thesis_clear: bool
    logical_flow: bool
    paragraph_transitions: list[str]
    organization_issues: list[str]
    suggestions: list[str]

structure_reviewer = Agent(
    name="Structure Reviewer",
    instructions="""Review the writing's organizational structure.
Evaluate:

1. THESIS/MAIN IDEA: Is there a clear central argument or purpose?
   If not, suggest where and how to add one.
2. LOGICAL FLOW: Do paragraphs follow a logical progression? Flag
   any jumps in logic or missing connections.
3. TRANSITIONS: Are transitions between paragraphs smooth? Identify
   abrupt shifts.
4. PARAGRAPH UNITY: Does each paragraph focus on one main idea?
   Flag paragraphs that try to cover too much.
5. INTRODUCTION/CONCLUSION: Does the intro set up the argument?
   Does the conclusion synthesize rather than merely repeat?

Focus ONLY on structure. Ignore grammar and style issues.""",
    output_type=StructureFeedback,
)

style_reviewer = Agent(
    name="Style Reviewer",
    instructions="""Review the writing's style and voice. Evaluate:

1. ACTIVE vs PASSIVE VOICE: Flag unnecessary passive constructions.
   "The ball was thrown by John" -> "John threw the ball"
2. WORDINESS: Identify phrases that can be shortened.
   "due to the fact that" -> "because"
3. SENTENCE VARIETY: Flag sections where sentence structure is
   monotonous (e.g., five Subject-Verb-Object sentences in a row).
4. TONE CONSISTENCY: Is the tone appropriate and consistent
   throughout? Flag shifts.
5. JARGON: Flag technical terms that are not defined for the audience.

Provide specific rewrites, not just general advice.""",
)
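The passive-voice check can also be pre-screened deterministically before the style reviewer runs. This heuristic is deliberately naive (it matches a form of "to be" followed by a word ending in -ed or -en, so it misses irregular participles like "thrown" and flags some false positives), but it cheaply narrows what the agent must examine:

```python
import re

# Rough passive-voice heuristic: a form of "to be" followed by a word
# ending in -ed or -en. Misses irregular participles and over-matches
# some adjectives, so treat hits as candidates for review, not verdicts.
PASSIVE_RE = re.compile(
    r"\b(am|is|are|was|were|be|been|being)\s+(\w+(?:ed|en))\b",
    re.IGNORECASE,
)

def find_passive_candidates(text: str) -> list[str]:
    """Return matched 'to be + participle' phrases for review."""
    return [" ".join(m) for m in PASSIVE_RE.findall(text)]

hits = find_passive_candidates("The report was completed by the team.")
# → ["was completed"]
```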

Orchestrating the Review Pipeline

Run all reviewers in parallel and merge their feedback into a single prioritized report:

import asyncio
import json

async def full_writing_review(text: str, context: str = "") -> WritingAnalysis:
    """Run all review layers and produce a unified analysis."""
    metrics = compute_readability_metrics(text)

    prompt = f"Review this writing:\n\n{text}"
    if context:
        prompt += f"\n\nContext: {context}"

    # Run reviewers in parallel
    structure_task = Runner.run(structure_reviewer, prompt)
    style_task = Runner.run(style_reviewer, prompt)
    results = await asyncio.gather(structure_task, style_task)

    structure_result = results[0]
    style_result = results[1]

    analysis = WritingAnalysis(
        overall_score=0.0,
        word_count=metrics["word_count"],
        readability_grade=metrics["flesch_kincaid_grade"],
        sentence_variety_score=metrics["sentence_length_std"],
    )

    # Merge feedback from all reviewers and score. In production, parse
    # the structured outputs into WritingIssue objects;
    # calculate_composite_score is a helper you supply.
    analysis.overall_score = calculate_composite_score(
        metrics, structure_result, style_result
    )

    return analysis
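The composite scorer is referenced but not defined above. One reasonable sketch (the weights and the simplified signature here are illustrative assumptions) averages per-category scores weighted by impact, mirroring the structure > content > style > mechanics ordering:

```python
# Illustrative composite scorer: per-category scores (0-100) weighted
# by impact, so structural quality dominates the overall score.
WEIGHTS = {"structure": 0.35, "content": 0.30, "style": 0.20, "mechanics": 0.15}

def calculate_composite_score(category_scores: dict[str, float]) -> float:
    """Weighted average of per-category scores, on a 0-100 scale."""
    total = sum(WEIGHTS[c] * category_scores.get(c, 0.0) for c in WEIGHTS)
    return round(total, 1)

score = calculate_composite_score(
    {"structure": 80, "content": 70, "style": 60, "mechanics": 90}
)
# → 74.5
```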

Revision Suggestion Engine

Instead of just pointing out problems, the agent generates concrete revision options:

revision_agent = Agent(
    name="Revision Suggester",
    instructions="""Given a piece of writing and identified issues,
generate specific revision suggestions. For each issue:

1. Quote the exact original text
2. Explain what is wrong and why it matters
3. Provide 2-3 alternative phrasings ranked by quality
4. Explain why the top suggestion is best

Never rewrite the entire piece. Focus on targeted improvements
that the writer can learn from. The goal is to teach the writer
to self-edit, not to edit for them.

Format each suggestion clearly so the writer can accept or reject
individual changes.""",
)

async def get_revision_suggestions(
    text: str, issues: list[WritingIssue]
) -> str:
    issue_summary = json.dumps([
        {
            "category": i.category.value,
            "location": i.location,
            "description": i.issue_description,
            "original": i.original_text,
        }
        for i in issues[:10]  # Limit to top 10 issues
    ])

    result = await Runner.run(
        revision_agent,
        f"Writing:\n{text}\n\nIssues to address:\n{issue_summary}",
    )
    return result.final_output

FAQ

How does the agent avoid overwhelming the writer with too many issues at once?

The severity classification (critical, important, suggestion) creates a natural triage. The agent presents critical issues first — things like unclear thesis, broken logic flow, or sentences that are genuinely confusing. Style suggestions and minor mechanics come last. For first drafts, the agent might limit feedback to structure and content only, deferring style and mechanics to later revision rounds.
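The same triage can gate which categories are shown at all, keyed to the draft stage (the stage names and mapping here are illustrative):

```python
# Illustrative stage gating: early drafts see only high-impact
# categories; later rounds open up style and mechanics.
DRAFT_STAGES = {
    "first_draft": {"structure", "content"},
    "revision": {"structure", "content", "style"},
    "final": {"structure", "content", "style", "mechanics"},
}

def filter_for_stage(issues: list[dict], stage: str) -> list[dict]:
    """Keep only the issues whose category is visible at this stage."""
    allowed = DRAFT_STAGES[stage]
    return [i for i in issues if i["category"] in allowed]

issues = [
    {"category": "mechanics", "note": "missing comma"},
    {"category": "structure", "note": "no thesis"},
]
visible = filter_for_stage(issues, "first_draft")
# → only the structure issue survives the first-draft filter
```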

Can the agent adapt to different writing contexts like academic vs. business vs. creative?

Yes. The context parameter passed to the review pipeline changes the evaluation criteria. Academic writing needs formal tone, citation support, and hedged claims. Business writing prioritizes brevity and clear action items. Creative writing tolerates rule-breaking for effect. The agent's system prompt includes context-specific rules so "Use active voice" becomes a firm rule in business writing but a suggestion in creative writing.

How do you track improvement over multiple writing sessions?

Store each WritingAnalysis result with a timestamp and compare category scores over time. A student who consistently improves their structure score from 60 to 80 but plateaus on style at 55 would see the agent shift its coaching emphasis toward style. Trend visualization and session-over-session diffs help the student see concrete progress.
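A minimal sketch of that session log, assuming in-memory storage and a first-to-latest delta as the trend measure:

```python
from datetime import datetime, timezone

# Illustrative session log: store category scores per session and
# compute each category's change across sessions.
sessions: list[dict] = []

def record_session(category_scores: dict[str, float]) -> None:
    """Append one session's scores with a UTC timestamp."""
    sessions.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "scores": category_scores,
    })

def trend(category: str) -> float:
    """Change in a category's score from first to latest session."""
    history = [s["scores"][category] for s in sessions if category in s["scores"]]
    return history[-1] - history[0] if len(history) >= 2 else 0.0

record_session({"structure": 60.0, "style": 55.0})
record_session({"structure": 80.0, "style": 55.0})
# trend("structure") → 20.0; trend("style") → 0.0
```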


#WritingCoach #GrammarAnalysis #AIFeedback #Python #EducationAI #AgenticAI #LearnAI #AIEngineering

CallSphere Team
