
Text Summarization Techniques for AI Agents: Extractive vs Abstractive Methods

Compare extractive and abstractive summarization methods for AI agents, learn to control summary length and quality, and implement key point extraction pipelines in Python.

Why Agents Need Summarization

AI agents frequently work with documents, conversation histories, and knowledge bases that exceed context window limits. Summarization compresses information while preserving the key points an agent needs to make decisions. A customer support agent summarizing a 50-message conversation thread, a research agent condensing a 20-page report, or a monitoring agent digesting hundreds of log entries — all rely on summarization as a core capability.

The two fundamental approaches are extractive summarization (selecting the most important sentences verbatim) and abstractive summarization (generating new sentences that capture the meaning). Each has distinct strengths for different agent use cases.

Extractive Summarization

Extractive methods score each sentence in the source text and select the top-scoring ones. They never hallucinate because every word in the summary exists in the original. This makes them ideal for agents that need factual reliability.

import re

from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

def extractive_summary(text: str, num_sentences: int = 3) -> str:
    """Select the most informative sentences using TF-IDF scoring."""
    # Split on sentence-ending punctuation followed by any whitespace,
    # so newline-separated sentences are handled correctly
    sentences = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()
    ]
    if len(sentences) <= num_sentences:
        return text

    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf_matrix = vectorizer.fit_transform(sentences)

    # Score each sentence by the sum of its TF-IDF values
    scores = np.array(tfidf_matrix.sum(axis=1)).flatten()

    # Indices of the top-scoring sentences, restored to original order
    top_indices = sorted(np.argsort(scores)[-num_sentences:])

    # Each sentence already carries its own terminal punctuation
    return " ".join(sentences[i] for i in top_indices)

document = """Machine learning models require training data to learn patterns.
The quality of training data directly impacts model performance.
Data preprocessing includes cleaning, normalization, and feature engineering.
Feature selection reduces dimensionality and improves training speed.
Cross-validation helps estimate how well a model generalizes to unseen data.
Hyperparameter tuning optimizes model configuration for best results.
Ensemble methods combine multiple models to reduce variance and bias."""

print(extractive_summary(document, num_sentences=3))

Abstractive Summarization with Transformers

Abstractive models generate entirely new text, producing more natural and concise summaries. They can rephrase, merge concepts, and adjust the level of detail.

from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="facebook/bart-large-cnn",
)

def abstractive_summary(
    text: str,
    max_length: int = 130,
    min_length: int = 30,
) -> str:
    result = summarizer(
        text,
        max_length=max_length,
        min_length=min_length,
        do_sample=False,
        truncation=True,  # clip inputs that exceed the model's input limit
    )
    return result[0]["summary_text"]

article = """The European Union announced new regulations for artificial
intelligence systems on Wednesday. The AI Act categorizes AI applications
by risk level, from minimal risk to unacceptable risk. High-risk systems
used in critical infrastructure, education, and law enforcement will face
strict requirements including human oversight, transparency documentation,
and regular audits. Companies that violate the regulations could face
fines of up to 35 million euros or 7 percent of global revenue."""

print(abstractive_summary(article))

Length-Controlled Summarization

Agents often need summaries of specific lengths — a one-line preview for a notification, a paragraph for a dashboard, or a page for a report. Here is a utility that handles multiple formats.


from enum import Enum

class SummaryLength(Enum):
    HEADLINE = "headline"      # ~10 words
    SHORT = "short"            # ~30 words
    MEDIUM = "medium"          # ~80 words
    DETAILED = "detailed"      # ~150 words

LENGTH_CONFIG = {
    SummaryLength.HEADLINE: {"max_length": 20, "min_length": 5},
    SummaryLength.SHORT: {"max_length": 60, "min_length": 20},
    SummaryLength.MEDIUM: {"max_length": 150, "min_length": 60},
    SummaryLength.DETAILED: {"max_length": 300, "min_length": 100},
}

def summarize(text: str, length: SummaryLength) -> str:
    config = LENGTH_CONFIG[length]
    result = summarizer(
        text,
        max_length=config["max_length"],
        min_length=config["min_length"],
        do_sample=False,
    )
    return result[0]["summary_text"]

Key Point Extraction

Sometimes an agent does not need a flowing summary but a list of discrete key points. This hybrid approach extracts and then reformulates.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_key_points(text: str, max_points: int = 5) -> list[str]:
    """Extract key points from text using an LLM."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"""Extract up to {max_points} key points from this text.
Return each point on a new line, prefixed with a dash.
Be concise — each point should be one sentence.

Text: {text}""",
        }],
        temperature=0,
    )

    lines = response.choices[0].message.content.strip().split("\n")
    return [line.lstrip("- ").strip() for line in lines if line.strip()]

Measuring Summary Quality

Agents that summarize autonomously need quality guardrails. The ROUGE metric family measures overlap between generated and reference summaries.

from rouge_score import rouge_scorer

def evaluate_summary(generated: str, reference: str) -> dict:
    """Compute ROUGE scores between generated and reference summaries."""
    scorer = rouge_scorer.RougeScorer(
        ["rouge1", "rouge2", "rougeL"],
        use_stemmer=True,
    )
    scores = scorer.score(reference, generated)
    return {
        metric: {
            "precision": round(score.precision, 3),
            "recall": round(score.recall, 3),
            "f1": round(score.fmeasure, 3),
        }
        for metric, score in scores.items()
    }

reference = "The EU introduced risk-based AI regulations with fines."
generated = "New EU rules classify AI by risk level, with penalties up to 35M euros."
print(evaluate_summary(generated, reference))

ROUGE-1 measures unigram overlap, ROUGE-2 measures bigram overlap, and ROUGE-L measures the longest common subsequence. For agent summarization, a ROUGE-L F1 above 0.4 typically indicates adequate quality for downstream decision making.
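That threshold can be turned into a simple guardrail. The sketch below assumes a score dictionary shaped like the one `evaluate_summary` returns above; the 0.4 cutoff is illustrative and worth tuning per use case.

```python
def passes_quality_gate(
    scores: dict,
    metric: str = "rougeL",
    threshold: float = 0.4,
) -> bool:
    """Accept a summary only if the chosen ROUGE F1 clears the threshold."""
    return scores[metric]["f1"] >= threshold
```

An agent can call this after each summarization step and fall back to a retry or a longer summary when the gate fails.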

FAQ

When should an agent use extractive versus abstractive summarization?

Use extractive summarization when factual accuracy is paramount and hallucination is unacceptable — legal documents, medical records, or financial reports. Use abstractive summarization when you need concise, natural-sounding text and the source material is long or repetitive. Many production systems use a hybrid approach: extractive selection first to narrow the source text, then abstractive rewriting for the final summary.
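The hybrid approach can be sketched as a small pipeline. The function below is a minimal sketch that takes the two stages as callables, so it can be wired up with the `extractive_summary` and `abstractive_summary` functions defined earlier (e.g. `hybrid_summary(doc, extractive_summary, abstractive_summary)`).

```python
def hybrid_summary(
    text: str,
    extract_fn,
    rewrite_fn,
    keep_sentences: int = 5,
) -> str:
    """Extractive pre-selection followed by abstractive rewriting."""
    # Stage 1: narrow to verbatim sentences, limiting hallucination risk
    narrowed = extract_fn(text, keep_sentences)
    # Stage 2: compress the selection into fluent prose
    return rewrite_fn(narrowed)
```

Keeping the stages as parameters also makes the pipeline easy to test with stub functions before plugging in real models.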

How do I handle summarizing text that exceeds the model's context window?

Split the document into chunks that fit within the model's maximum input length, summarize each chunk independently, then summarize the intermediate summaries. This is called hierarchical or recursive summarization. For very long documents, use a sliding window with overlap to avoid losing information at chunk boundaries.
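A minimal sketch of that chunk-then-recurse pattern, using character-based windows for simplicity (a production version would chunk by tokens and should pass a `summarize_fn` that actually shortens its input, or the recursion will not terminate):

```python
def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += chunk_size - overlap  # slide forward, keeping some overlap
    return chunks

def hierarchical_summary(
    text: str,
    summarize_fn,
    chunk_size: int = 2000,
    overlap: int = 200,
) -> str:
    """Summarize each chunk, then recursively summarize the merged results."""
    chunks = chunk_text(text, chunk_size, overlap)
    if len(chunks) == 1:
        return summarize_fn(text)
    merged = " ".join(summarize_fn(chunk) for chunk in chunks)
    return hierarchical_summary(merged, summarize_fn, chunk_size, overlap)
```

The overlap means a sentence cut in half at one chunk boundary appears whole in the neighboring chunk.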

How can I detect when a summary contains hallucinated information?

Compare named entities and numerical values in the summary against the source text. If the summary mentions a person, date, or number not present in the original, flag it as a potential hallucination. You can also use NLI (Natural Language Inference) models to check whether each summary sentence is entailed by the source document.
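The entity-and-number comparison can be sketched with plain regular expressions. This is a crude heuristic (the capitalized-word pattern also matches sentence-initial words, and a proper implementation would use an NER model), but it catches the most damaging fabrications cheaply:

```python
import re

def find_unsupported_facts(summary: str, source: str) -> dict:
    """Flag numbers and capitalized tokens in the summary absent from the source."""
    def numbers(text: str) -> set[str]:
        return set(re.findall(r"\d+(?:\.\d+)?", text))

    def capitalized(text: str) -> set[str]:
        # crude proper-noun heuristic; also picks up sentence-initial words
        return set(re.findall(r"\b[A-Z][a-zA-Z]+\b", text))

    return {
        "numbers": sorted(numbers(summary) - numbers(source)),
        "names": sorted(capitalized(summary) - capitalized(source)),
    }
```

Any non-empty result is a signal to regenerate the summary or route it for review rather than act on it.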

